* [PATCH v13 00/15] KVM PCIe/MSI passthrough on ARM/ARM64
@ 2016-10-06  8:45 ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Following Robin's series [1], which addresses MSI IOMMU mapping for devices
attached to a DMA ops domain, quite a lot of changes (and simplifications)
were made with respect to the v12 iteration:

- the msi-iommu API role is now handled at the dma-iommu level
- the MSI doorbell registration API is still used for the security assessment
  and for computing the overall IOMMU page size of the doorbells
- MSI layer part II is no longer needed since the mapping is now done
  directly in the irqchip compose callback

The VFIO user API and the VFIO layer changes are unchanged. All the
patches are now gathered in a single series.

Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.

Dependency:
This series depends on Robin's generic-v7 branch:
[1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
http://www.spinics.net/lists/arm-kernel/msg531110.html

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v13

The above branch includes a temporary patch to work around a ThunderX PCI
bus reset crash (which I think is unrelated to this series):
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
Do not take this one for other platforms.

Eric Auger (14):
  iommu: Introduce DOMAIN_ATTR_MSI_GEOMETRY
  iommu/arm-smmu: Initialize the msi geometry
  genirq/msi: Introduce the MSI doorbell API
  genirq/msi: msi_doorbell_calc_pages
  irqchip/gic-v2m: Register the MSI doorbell
  irqchip/gicv3-its: Register the MSI doorbell
  vfio: Introduce a vfio_dma type field
  vfio/type1: vfio_find_dma accepting a type argument
  vfio/type1: Implement recursive vfio_find_dma_from_node
  vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  vfio: Allow reserved msi iova registration
  vfio/type1: Check doorbell safety
  iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
  vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO
    capability chains

Robin Murphy (1):
  iommu/dma: Allow MSI-only cookies

 drivers/iommu/Kconfig            |   2 +
 drivers/iommu/arm-smmu-v3.c      |   5 +-
 drivers/iommu/arm-smmu.c         |   6 +-
 drivers/iommu/dma-iommu.c        |  40 +++++++
 drivers/iommu/iommu.c            |   5 +
 drivers/irqchip/irq-gic-v2m.c    |  11 +-
 drivers/irqchip/irq-gic-v3-its.c |  14 +++
 drivers/vfio/Kconfig             |   1 +
 drivers/vfio/vfio_iommu_type1.c  | 244 ++++++++++++++++++++++++++++++++++++---
 include/linux/dma-iommu.h        |   9 ++
 include/linux/iommu.h            |  14 +++
 include/linux/msi-doorbell.h     |  92 +++++++++++++++
 include/uapi/linux/vfio.h        |  42 ++++++-
 kernel/irq/Kconfig               |   4 +
 kernel/irq/Makefile              |   1 +
 kernel/irq/msi-doorbell.c        | 162 ++++++++++++++++++++++++++
 16 files changed, 633 insertions(+), 19 deletions(-)
 create mode 100644 include/linux/msi-doorbell.h
 create mode 100644 kernel/irq/msi-doorbell.c

-- 
1.9.1

* [PATCH v13 01/15] iommu: Introduce DOMAIN_ATTR_MSI_GEOMETRY
  2016-10-06  8:45 ` Eric Auger
@ 2016-10-06  8:45   ` Eric Auger
  -1 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Introduce a new DOMAIN_ATTR_MSI_GEOMETRY domain attribute. It makes it
possible to query the aperture of the IOVA window dedicated to MSIs and
to test whether the MSIs must be IOMMU mapped.

x86 IOMMUs will typically expose an MSI aperture matching the 1MB
region [FEE0_0000h - FEF0_0000h] corresponding to the APIC
configuration space, and no support for MSI IOMMU mapping.

On ARM, the requirement to map MSIs is reported by setting
iommu_msi_supported to true.

A helper function is added to allow testing whether the aperture is valid.
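
A purely illustrative sketch (not part of this patch): a consumer such as
VFIO could query the new attribute along these lines; the helper name
need_msi_mapping is made up for the example:

  static bool need_msi_mapping(struct iommu_domain *domain)
  {
          struct iommu_domain_msi_geometry msi_geometry;

          /* copy the domain's MSI geometry into msi_geometry */
          if (iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_GEOMETRY,
                                    &msi_geometry))
                  return false;

          /* expected to be true on ARM, false on x86 */
          return msi_geometry.iommu_msi_supported;
  }

iommu_domain_msi_aperture_valid(domain) can then be used to check whether
a valid MSI IOVA window has been set up.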

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>

---
v12 -> v13:
- reword the commit message

v8 -> v9:
- rename programmable into iommu_msi_supported
- add iommu_domain_msi_aperture_valid

v8: creation
- deprecates DOMAIN_ATTR_MSI_MAPPING flag
---
 drivers/iommu/iommu.c |  5 +++++
 include/linux/iommu.h | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..617cb2b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1485,6 +1485,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 			  enum iommu_attr attr, void *data)
 {
 	struct iommu_domain_geometry *geometry;
+	struct iommu_domain_msi_geometry *msi_geometry;
 	bool *paging;
 	int ret = 0;
 	u32 *count;
@@ -1495,6 +1496,10 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
 		*geometry = domain->geometry;
 
 		break;
+	case DOMAIN_ATTR_MSI_GEOMETRY:
+		msi_geometry  = data;
+		*msi_geometry = domain->msi_geometry;
+		break;
 	case DOMAIN_ATTR_PAGING:
 		paging  = data;
 		*paging = (domain->pgsize_bitmap != 0UL);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..9f90735 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -52,6 +52,12 @@ struct iommu_domain_geometry {
 	bool force_aperture;       /* DMA only allowed in mappable range? */
 };
 
+struct iommu_domain_msi_geometry {
+	dma_addr_t aperture_start; /* First address used for MSI IOVA    */
+	dma_addr_t aperture_end;   /* Last address used for MSI IOVA     */
+	bool iommu_msi_supported;  /* Is MSI mapping supported?		 */
+};
+
 /* Domain feature flags */
 #define __IOMMU_DOMAIN_PAGING	(1U << 0)  /* Support for iommu_map/unmap */
 #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
@@ -83,6 +89,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	struct iommu_domain_msi_geometry msi_geometry;
 	void *iova_cookie;
 };
 
@@ -108,6 +115,7 @@ enum iommu_cap {
 
 enum iommu_attr {
 	DOMAIN_ATTR_GEOMETRY,
+	DOMAIN_ATTR_MSI_GEOMETRY,
 	DOMAIN_ATTR_PAGING,
 	DOMAIN_ATTR_WINDOWS,
 	DOMAIN_ATTR_FSL_PAMU_STASH,
@@ -352,6 +360,12 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 
+static inline bool iommu_domain_msi_aperture_valid(struct iommu_domain *domain)
+{
+	return (domain->msi_geometry.aperture_end >
+		domain->msi_geometry.aperture_start);
+}
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
1.9.1

* [PATCH v13 02/15] iommu/arm-smmu: Initialize the msi geometry
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On ARM, MSI write transactions are also translated by the SMMU.
Let's report that specificity by setting the iommu_msi_supported
field to true. A valid aperture window will need to be provided.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- reword the commit message

v8 -> v9:
- reword the title and patch description

v7 -> v8:
- use DOMAIN_ATTR_MSI_GEOMETRY

v4 -> v5:
- don't handle fsl_pamu_domain anymore
- handle arm-smmu-v3
---
 drivers/iommu/arm-smmu-v3.c | 2 ++
 drivers/iommu/arm-smmu.c    | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 15c01c3..f82eec3 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1382,6 +1382,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
+	struct iommu_domain_msi_geometry msi_geometry = {0, 0, true};
 
 	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
@@ -1400,6 +1401,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		kfree(smmu_domain);
 		return NULL;
 	}
+	smmu_domain->domain.msi_geometry = msi_geometry;
 
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ac4aab9..97ff1b4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1002,6 +1002,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
+	struct iommu_domain_msi_geometry msi_geometry = {0, 0, true};
 
 	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
@@ -1020,6 +1021,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		return NULL;
 	}
 
+	smmu_domain->domain.msi_geometry = msi_geometry;
+
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 
-- 
1.9.1

* [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

From: Robin Murphy <robin.murphy@arm.com>

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.
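
A minimal illustrative sketch (not taken from this series) of a user that
manages its own IOVA space dedicating a window to MSI remapping; the base
and size values below are placeholders:

  /*
   * Sketch only: dedicate a 1MB IOVA window starting at 0x8000000 to MSI
   * remapping on an UNMANAGED domain (the call rejects DMA-API domains).
   */
  dma_addr_t msi_base = 0x8000000;
  u64 msi_size = SZ_1M;
  int ret;

  ret = iommu_get_dma_msi_region_cookie(domain, msi_base, msi_size);
  if (ret)
          return ret;
  /* iommu_dma_map_msi_msg() can now remap MSI doorbells into this window */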

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2 (compared to Robin's version):
- add a NULL last param to iommu_dma_init_domain
- set the msi_geometry aperture
- remove:
    if (base < U64_MAX - size)
       reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
  I don't get why we would reserve something outside the scope of the iova
  domain. What do I miss?
---
 drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  9 +++++++++
 2 files changed, 49 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..11da1a0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 		msg->address_lo += lower_32_bits(msi_page->iova);
 	}
 }
+
+/**
+ * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
+ * @domain: IOMMU domain to prepare
+ * @base: Base address of IOVA region to use as the MSI remapping aperture
+ * @size: Size of the desired MSI aperture
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately.
+ */
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	struct iommu_dma_cookie *cookie;
+	struct iova_domain *iovad;
+	int ret;
+
+	if (domain->type == IOMMU_DOMAIN_DMA)
+		return -EINVAL;
+
+	ret = iommu_get_dma_cookie(domain);
+	if (ret)
+		return ret;
+
+	ret = iommu_dma_init_domain(domain, base, size, NULL);
+	if (ret) {
+		iommu_put_dma_cookie(domain);
+		return ret;
+	}
+
+	domain->msi_geometry.aperture_start = base;
+	domain->msi_geometry.aperture_end = base + size - 1;
+
+	cookie = domain->iova_cookie;
+	iovad = &cookie->iovad;
+
+	return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..1c55413 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 /* The DMA API isn't _quite_ the whole story, though... */
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size);
+
 #else
 
 struct iommu_domain;
@@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 {
 }
 
+static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+		dma_addr_t base, u64 size)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.1

* [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

We introduce a new msi-doorbell API that allows MSI controllers
to allocate and register their doorbells. This is useful when
those doorbells are likely to be IOMMU mapped (typically on ARM).
The VFIO layer will need to gather information about those doorbells:
whether they are safe (i.e. they implement IRQ remapping) and how
many IOMMU pages are required to map all of them.

This patch first introduces the dedicated msi_doorbell_info struct
and the registration/unregistration functions.

A doorbell region is characterized by its physical base address, its size,
and whether it is safe (i.e. it implements IRQ remapping). A doorbell
can be per-cpu or global. We currently only care about global doorbells.

A function returns whether all registered doorbells are safe.
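
An illustrative sketch only (not from this series) of how an MSI controller
driver could use the API; db_base, db_size and irq_remapping are placeholders
for the controller's doorbell address, size and IRQ remapping capability:

  struct msi_doorbell_info *dbinfo;
  int ret;

  /* register the controller's global doorbell */
  ret = msi_doorbell_register_global(db_base, db_size, irq_remapping,
                                     &dbinfo);
  if (ret)
          return ret;

  /* later, on teardown */
  msi_doorbell_unregister_global(dbinfo);

  /* VFIO can then assess whether all registered doorbells are safe */
  if (!msi_doorbell_safe())
          pr_warn("unsafe MSI doorbells: IRQ remapping is not implemented\n");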

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
- remove prot attribute
- move msi_doorbell_info struct definition in msi-doorbell.c
- change the commit title
- change proto of the registration function
- msi_doorbell_safe now in this patch

v11 -> v12:
- rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
  into msi_doorbell_list and irqchip_doorbell_mutex into
  msi_doorbell_mutex
- fix style issues: align msi_doorbell struct members, kernel-doc comments
- use kzalloc
- use container_of in msi_doorbell_unregister_global
- compute nb_unsafe_doorbells on registration/unregistration
- registration simply returns NULL if allocation failed

v10 -> v11:
- remove void *chip_data argument from register/unregister function
- remove lookup funtions since we restored the struct irq_chip
  msi_doorbell_info ops to realize this function
- reword commit message and title

---
 drivers/iommu/Kconfig        |  2 +
 include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
 kernel/irq/Kconfig           |  4 ++
 kernel/irq/Makefile          |  1 +
 kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 182 insertions(+)
 create mode 100644 include/linux/msi-doorbell.h
 create mode 100644 kernel/irq/msi-doorbell.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8ee54d7..0cc7fac 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
 	bool "ARM Ltd. System MMU (SMMU) Support"
 	depends on (ARM64 || ARM) && MMU
+	select MSI_DOORBELL
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU if ARM
@@ -310,6 +311,7 @@ config ARM_SMMU
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
+	select MSI_DOORBELL
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
new file mode 100644
index 0000000..c18a382
--- /dev/null
+++ b/include/linux/msi-doorbell.h
@@ -0,0 +1,77 @@
+/*
+ * API to register/query MSI doorbells likely to be IOMMU mapped
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _LINUX_MSI_DOORBELL_H
+#define _LINUX_MSI_DOORBELL_H
+
+struct msi_doorbell_info;
+
+#ifdef CONFIG_MSI_DOORBELL
+
+/**
+ * msi_doorbell_register_global - allocate and register a global doorbell
+ * @base: physical base address of the global doorbell
+ * @size: size of the global doorbell
+ * @prot: protection/memory attributes
+ * @safe: true if irq_remapping is implemented for this doorbell
+ * @dbinfo: returned doorbell info
+ *
+ * Return: 0 on success, -ENOMEM on allocation failure
+ */
+int msi_doorbell_register_global(phys_addr_t base, size_t size,
+				 bool safe,
+				 struct msi_doorbell_info **dbinfo);
+
+/**
+ * msi_doorbell_unregister_global - unregister a global doorbell
+ * @db: doorbell info to unregister
+ *
+ * remove the doorbell descriptor from the list of registered doorbells
+ * and deallocates it
+ */
+void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
+
+/**
+ * msi_doorbell_safe - return whether all registered doorbells are safe
+ *
+ * Safe doorbells are those which implement irq remapping
+ * Return: true if all doorbells are safe, false otherwise
+ */
+bool msi_doorbell_safe(void);
+
+#else
+
+static inline int
+msi_doorbell_register_global(phys_addr_t base, size_t size,
+			     bool safe,
+			     struct msi_doorbell_info **dbinfo)
+{
+	*dbinfo = NULL;
+	return 0;
+}
+
+static inline void
+msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
+
+static inline bool msi_doorbell_safe(void)
+{
+	return true;
+}
+#endif /* CONFIG_MSI_DOORBELL */
+
+#endif
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 3bbfd6a..d4faaaa 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
 config GENERIC_MSI_IRQ
 	bool
 
+# MSI doorbell support (for doorbell IOMMU mapping)
+config MSI_DOORBELL
+	bool
+
 # Generic MSI hierarchical interrupt domain support
 config GENERIC_MSI_IRQ_DOMAIN
 	bool
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 1d3ee31..5b04dd1 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
 obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
 obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
 obj-$(CONFIG_SMP) += affinity.o
+obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
new file mode 100644
index 0000000..60a262a
--- /dev/null
+++ b/kernel/irq/msi-doorbell.c
@@ -0,0 +1,98 @@
+/*
+ * API to register/query MSI doorbells likely to be IOMMU mapped
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ * Author: Eric Auger <eric.auger@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/msi-doorbell.h>
+
+/**
+ * struct msi_doorbell_info - MSI doorbell region descriptor
+ * @percpu_doorbells: per cpu doorbell base address
+ * @global_doorbell: base address of the doorbell
+ * @doorbell_is_percpu: is the doorbell per cpu or global?
+ * @safe: true if irq remapping is implemented
+ * @size: size of the doorbell
+ */
+struct msi_doorbell_info {
+	union {
+		phys_addr_t __percpu    *percpu_doorbells;
+		phys_addr_t             global_doorbell;
+	};
+	bool    doorbell_is_percpu;
+	bool    safe;
+	size_t  size;
+};
+
+struct msi_doorbell {
+	struct msi_doorbell_info	info;
+	struct list_head		next;
+};
+
+/* list of registered MSI doorbells */
+static LIST_HEAD(msi_doorbell_list);
+
+/* counts the number of unsafe registered doorbells */
+static uint nb_unsafe_doorbells;
+
+/* protects the list and nb_unsafe_doorbells */
+static DEFINE_MUTEX(msi_doorbell_mutex);
+
+int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
+				 struct msi_doorbell_info **dbinfo)
+{
+	struct msi_doorbell *db;
+
+	db = kzalloc(sizeof(*db), GFP_KERNEL);
+	if (!db)
+		return -ENOMEM;
+
+	db->info.global_doorbell = base;
+	db->info.size = size;
+	db->info.safe = safe;
+
+	mutex_lock(&msi_doorbell_mutex);
+	list_add(&db->next, &msi_doorbell_list);
+	if (!db->info.safe)
+		nb_unsafe_doorbells++;
+	mutex_unlock(&msi_doorbell_mutex);
+	*dbinfo = &db->info;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
+
+void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
+{
+	struct msi_doorbell *db;
+
+	db = container_of(dbinfo, struct msi_doorbell, info);
+
+	mutex_lock(&msi_doorbell_mutex);
+	list_del(&db->next);
+	if (!db->info.safe)
+		nb_unsafe_doorbells--;
+	mutex_unlock(&msi_doorbell_mutex);
+	kfree(db);
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
+
+bool msi_doorbell_safe(void)
+{
+	return !nb_unsafe_doorbells;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_safe);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger-H+wXaHxf7aLQT0dZR+AlfA,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	marc.zyngier-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: drjones-H+wXaHxf7aLQT0dZR+AlfA, kvm-u79uwXL29TY76Z2rM5mHXA,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ

We introduce a new msi-doorbell API that allows msi controllers
to allocate and register their doorbells. This is useful when
those doorbells are likely to be iommu mapped (typically on ARM).
The VFIO layer will need to gather information about those doorbells:
whether they are safe (ie. they implement irq remapping) and how
many IOMMU pages are requested to map all of them.

This patch first introduces the dedicated msi_doorbell_info struct
and the registration/unregistration functions.

A doorbell region is characterized by its physical address base, size,
and whether it its safe (ie. it implements IRQ remapping). A doorbell
can be per-cpu of global. We currently only care about global doorbells.

A function returns whether all doorbells are safe.

Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

---
v12 -> v13:
- directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
- remove prot attribute
- move msi_doorbell_info struct definition in msi-doorbell.c
- change the commit title
- change proto of the registration function
- msi_doorbell_safe now in this patch

v11 -> v12:
- rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
  into msi_doorbell_list and irqchip_doorbell_mutex into
  msi_doorbell_mutex
- fix style issues: align msi_doorbell struct members, kernel-doc comments
- use kzalloc
- use container_of in msi_doorbell_unregister_global
- compute nb_unsafe_doorbells on registration/unregistration
- registration simply returns NULL if allocation failed

v10 -> v11:
- remove void *chip_data argument from register/unregister function
- remove lookup funtions since we restored the struct irq_chip
  msi_doorbell_info ops to realize this function
- reword commit message and title

Conflicts:
	kernel/irq/Makefile

Conflicts:
	drivers/iommu/Kconfig
---
 drivers/iommu/Kconfig        |  2 +
 include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
 kernel/irq/Kconfig           |  4 ++
 kernel/irq/Makefile          |  1 +
 kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 182 insertions(+)
 create mode 100644 include/linux/msi-doorbell.h
 create mode 100644 kernel/irq/msi-doorbell.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8ee54d7..0cc7fac 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
 	bool "ARM Ltd. System MMU (SMMU) Support"
 	depends on (ARM64 || ARM) && MMU
+	select MSI_DOORBELL
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_DMA_USE_IOMMU if ARM
@@ -310,6 +311,7 @@ config ARM_SMMU
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
+	select MSI_DOORBELL
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
new file mode 100644
index 0000000..c18a382
--- /dev/null
+++ b/include/linux/msi-doorbell.h
@@ -0,0 +1,77 @@
+/*
+ * API to register/query MSI doorbells likely to be IOMMU mapped
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _LINUX_MSI_DOORBELL_H
+#define _LINUX_MSI_DOORBELL_H
+
+struct msi_doorbell_info;
+
+#ifdef CONFIG_MSI_DOORBELL
+
+/**
+ * msi_doorbell_register - allocate and register a global doorbell
+ * @base: physical base address of the global doorbell
+ * @size: size of the global doorbell
+ * @prot: protection/memory attributes
+ * @safe: true is irq_remapping implemented for this doorbell
+ * @dbinfo: returned doorbell info
+ *
+ * Return: 0 on success, -ENOMEM on allocation failure
+ */
+int msi_doorbell_register_global(phys_addr_t base, size_t size,
+				 bool safe,
+				 struct msi_doorbell_info **dbinfo);
+
+/**
+ * msi_doorbell_unregister_global - unregister a global doorbell
+ * @db: doorbell info to unregister
+ *
+ * remove the doorbell descriptor from the list of registered doorbells
+ * and deallocates it
+ */
+void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
+
+/**
+ * msi_doorbell_safe - return whether all registered doorbells are safe
+ *
+ * Safe doorbells are those which implement irq remapping
+ * Return: true if all doorbells are safe, false otherwise
+ */
+bool msi_doorbell_safe(void);
+
+#else
+
+static inline int
+msi_doorbell_register_global(phys_addr_t base, size_t size,
+			     int prot, bool safe,
+			     struct msi_doorbell_info **dbinfo)
+{
+	*dbinfo = NULL;
+	return 0;
+}
+
+static inline void
+msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
+
+static inline bool msi_doorbell_safe(void)
+{
+	return true;
+}
+#endif /* CONFIG_MSI_DOORBELL */
+
+#endif
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 3bbfd6a..d4faaaa 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
 config GENERIC_MSI_IRQ
 	bool
 
+# MSI doorbell support (for doorbell IOMMU mapping)
+config MSI_DOORBELL
+	bool
+
 # Generic MSI hierarchical interrupt domain support
 config GENERIC_MSI_IRQ_DOMAIN
 	bool
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 1d3ee31..5b04dd1 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
 obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
 obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
 obj-$(CONFIG_SMP) += affinity.o
+obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
new file mode 100644
index 0000000..60a262a
--- /dev/null
+++ b/kernel/irq/msi-doorbell.c
@@ -0,0 +1,98 @@
+/*
+ * API to register/query MSI doorbells likely to be IOMMU mapped
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ * Author: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/msi-doorbell.h>
+
+/**
+ * struct msi_doorbell_info - MSI doorbell region descriptor
+ * @percpu_doorbells: per cpu doorbell base address
+ * @global_doorbell: base address of the doorbell
+ * @doorbell_is_percpu: is the doorbell per cpu or global?
+ * @safe: true if irq remapping is implemented
+ * @size: size of the doorbell
+ */
+struct msi_doorbell_info {
+	union {
+		phys_addr_t __percpu    *percpu_doorbells;
+		phys_addr_t             global_doorbell;
+	};
+	bool    doorbell_is_percpu;
+	bool    safe;
+	size_t  size;
+};
+
+struct msi_doorbell {
+	struct msi_doorbell_info	info;
+	struct list_head		next;
+};
+
+/* list of registered MSI doorbells */
+static LIST_HEAD(msi_doorbell_list);
+
+/* counts the number of unsafe registered doorbells */
+static uint nb_unsafe_doorbells;
+
+/* protects the list and nb__unsafe_doorbells */
+static DEFINE_MUTEX(msi_doorbell_mutex);
+
+int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
+				 struct msi_doorbell_info **dbinfo)
+{
+	struct msi_doorbell *db;
+
+	db = kzalloc(sizeof(*db), GFP_KERNEL);
+	if (!db)
+		return -ENOMEM;
+
+	db->info.global_doorbell = base;
+	db->info.size = size;
+	db->info.safe = safe;
+
+	mutex_lock(&msi_doorbell_mutex);
+	list_add(&db->next, &msi_doorbell_list);
+	if (!db->info.safe)
+		nb_unsafe_doorbells++;
+	mutex_unlock(&msi_doorbell_mutex);
+	*dbinfo = &db->info;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
+
+void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
+{
+	struct msi_doorbell *db;
+
+	db = container_of(dbinfo, struct msi_doorbell, info);
+
+	mutex_lock(&msi_doorbell_mutex);
+	list_del(&db->next);
+	if (!db->info.safe)
+		nb_unsafe_doorbells--;
+	mutex_unlock(&msi_doorbell_mutex);
+	kfree(db);
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
+
+bool msi_doorbell_safe(void)
+{
+	return !nb_unsafe_doorbells;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_safe);
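
As an illustration only (not part of the patch), an irqchip driver is
expected to consume this API along the following lines. The foo_msi_chip
structure, its fields and the 4-byte non-remapping doorbell are invented
for the example; only the msi_doorbell_* calls come from the patch above.

#include <linux/types.h>
#include <linux/msi-doorbell.h>

struct foo_msi_chip {
	phys_addr_t doorbell_base;		/* hypothetical MMIO doorbell */
	struct msi_doorbell_info *dbinfo;
};

static int foo_register_doorbell(struct foo_msi_chip *chip)
{
	/* safe = false: this controller does not implement IRQ remapping */
	return msi_doorbell_register_global(chip->doorbell_base, sizeof(u32),
					     false, &chip->dbinfo);
}

static void foo_unregister_doorbell(struct foo_msi_chip *chip)
{
	msi_doorbell_unregister_global(chip->dbinfo);
}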
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 05/15] genirq/msi: msi_doorbell_calc_pages
  2016-10-06  8:45 ` Eric Auger
@ 2016-10-06  8:45   ` Eric Auger
  -1 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

msi_doorbell_calc_pages() sums up the number of IOMMU pages of a given
order required to map all the registered doorbells. This function allows
the caller to dimension the intermediate physical address (IPA) aperture
required to map the MSI doorbells.

Note this requirement cannot be computed at MSI doorbell registration time
since the IOMMU page order is not known.
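
As a worked example (values invented): with 64 KiB IOMMU pages (order 16),
a single 4-byte doorbell at 0x20010040 needs one page, which is what the
new helpers compute:

	/* mirrors calc_region_reqs() introduced below; example values only */
	unsigned int order = 16;				/* 64 KiB pages */
	phys_addr_t granule = 1ULL << order;			/* 0x10000 */
	phys_addr_t offset = 0x20010040ULL & (granule - 1);	/* 0x40 */
	size_t size = ALIGN(sizeof(u32) + offset, granule);	/* 0x10000 */
	unsigned int nb_pages = size >> order;			/* 1 page */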

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v11 -> v12:
- fix style issues: remove useless line break, remove pointless braces,
  fix kernel-doc comments
- reword commit message
- rename msi_doorbell_pages into msi_doorbell_calc_pages
- rename static compute* functions

v10: creation
---
 include/linux/msi-doorbell.h | 15 +++++++++++
 kernel/irq/msi-doorbell.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
index c18a382..f1106cb 100644
--- a/include/linux/msi-doorbell.h
+++ b/include/linux/msi-doorbell.h
@@ -54,6 +54,15 @@ void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
  */
 bool msi_doorbell_safe(void);
 
+/**
+ * msi_doorbell_calc_pages - compute the number of pages
+ * required to map all the registered doorbells
+ * @order: iommu page order
+ *
+ * Return: the number of required pages
+ */
+int msi_doorbell_calc_pages(unsigned int order);
+
 #else
 
 static inline int
@@ -72,6 +81,12 @@ static inline bool msi_doorbell_safe(void)
 {
 	return true;
 }
+
+static inline int msi_doorbell_calc_pages(unsigned int order)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MSI_DOORBELL */
 
 #endif
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
index 60a262a..d1cc525 100644
--- a/kernel/irq/msi-doorbell.c
+++ b/kernel/irq/msi-doorbell.c
@@ -96,3 +96,67 @@ bool msi_doorbell_safe(void)
 	return !nb_unsafe_doorbells;
 }
 EXPORT_SYMBOL_GPL(msi_doorbell_safe);
+
+/**
+ * calc_region_reqs - compute the number of pages required to map a region
+ *
+ * @addr: physical base address of the region
+ * @size: size of the region
+ * @order: the page order
+ *
+ * Return: the number of pages required to map this region
+ */
+static int calc_region_reqs(phys_addr_t addr, size_t size, unsigned int order)
+{
+	phys_addr_t offset, granule;
+	unsigned int nb_pages;
+
+	granule = 1ULL << order;
+	offset = addr & (granule - 1);
+	size = ALIGN(size + offset, granule);
+	nb_pages = size >> order;
+
+	return nb_pages;
+}
+
+/**
+ * calc_dbinfo_reqs - compute the number of pages required to map a given
+ * MSI doorbell
+ *
+ * @dbi: doorbell info descriptor
+ * @order: page order
+ *
+ * Return: the number of pages required to map this doorbell
+ */
+static int calc_dbinfo_reqs(struct msi_doorbell_info *dbi, unsigned int order)
+{
+	int ret = 0;
+
+	if (!dbi->doorbell_is_percpu) {
+		ret = calc_region_reqs(dbi->global_doorbell, dbi->size, order);
+	} else {
+		phys_addr_t __percpu *pbase;
+		int cpu;
+
+		for_each_possible_cpu(cpu) {
+			pbase = per_cpu_ptr(dbi->percpu_doorbells, cpu);
+			ret += calc_region_reqs(*pbase, dbi->size, order);
+		}
+	}
+	return ret;
+}
+
+int msi_doorbell_calc_pages(unsigned int order)
+{
+	struct msi_doorbell *db;
+	int ret = 0;
+
+	mutex_lock(&msi_doorbell_mutex);
+	list_for_each_entry(db, &msi_doorbell_list, next)
+		ret += calc_dbinfo_reqs(&db->info, order);
+
+	mutex_unlock(&msi_doorbell_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_calc_pages);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 06/15] irqchip/gic-v2m: Register the MSI doorbell
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Register the GICv2m global doorbell. The registered information is
used to set up the KVM passthrough use case.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- use new msi doorbell registration prototype
- remove iommu protection attributes
- add unregistration in teardown

v11 -> v12:
- use irq_get_msi_doorbell_info new name
- simplify error handling

v10 -> v11:
- use the new registration API and re-implement the msi_doorbell_info
  ops

v9 -> v10:
- introduce the registration concept in place of msi_doorbell_info
  callback

v8 -> v9:
- use global_doorbell instead of percpu_doorbells

v7 -> v8:
- gicv2m_msi_doorbell_info does not return a pointer to const
- remove spurious !v2m check
- add IOMMU_MMIO flag

v7: creation
---
 drivers/irqchip/irq-gic-v2m.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 863e073..343f19f 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/irqchip/arm-gic.h>
+#include <linux/msi-doorbell.h>
 
 /*
 * MSI_TYPER:
@@ -70,6 +71,7 @@ struct v2m_data {
 	u32 spi_offset;		/* offset to be subtracted from SPI number */
 	unsigned long *bm;	/* MSI vector bitmap */
 	u32 flags;		/* v2m flags for specific implementation */
+	struct msi_doorbell_info *doorbell_info; /* MSI doorbell */
 };
 
 static void gicv2m_mask_msi_irq(struct irq_data *d)
@@ -254,6 +256,7 @@ static void gicv2m_teardown(void)
 	struct v2m_data *v2m, *tmp;
 
 	list_for_each_entry_safe(v2m, tmp, &v2m_nodes, entry) {
+		msi_doorbell_unregister_global(v2m->doorbell_info);
 		list_del(&v2m->entry);
 		kfree(v2m->bm);
 		iounmap(v2m->base);
@@ -370,12 +373,18 @@ static int __init gicv2m_init_one(struct fwnode_handle *fwnode,
 		goto err_iounmap;
 	}
 
+	ret = msi_doorbell_register_global(v2m->res.start, sizeof(u32),
+					   false, &v2m->doorbell_info);
+	if (ret)
+		goto err_free_bm;
+
 	list_add_tail(&v2m->entry, &v2m_nodes);
 
 	pr_info("range%pR, SPI[%d:%d]\n", res,
 		v2m->spi_start, (v2m->spi_start + v2m->nr_spis - 1));
 	return 0;
-
+err_free_bm:
+	kfree(v2m->bm);
 err_iounmap:
 	iounmap(v2m->base);
 err_free_v2m:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 07/15] irqchip/gicv3-its: Register the MSI doorbell
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

This patch registers the ITS global doorbell. The registered
information is needed to set up the KVM passthrough use case.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- use new doorbell registration prototype

v11 -> v12:
- use new irq_get_msi_doorbell_info name
- simplify error handling

v10 -> v11:
- adapt to new doorbell registration API and implement msi_doorbell_info
---
 drivers/irqchip/irq-gic-v3-its.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 98ff669..3fc715e 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -30,6 +30,7 @@
 #include <linux/of_platform.h>
 #include <linux/percpu.h>
 #include <linux/slab.h>
+#include <linux/msi-doorbell.h>
 
 #include <linux/irqchip.h>
 #include <linux/irqchip/arm-gic-v3.h>
@@ -86,6 +87,7 @@ struct its_node {
 	u32			ite_size;
 	u32			device_ids;
 	int			numa_node;
+	struct msi_doorbell_info	*doorbell_info;
 };
 
 #define ITS_ITT_ALIGN		SZ_256
@@ -1717,6 +1719,7 @@ static int __init its_probe(struct device_node *node,
 
 	if (of_property_read_bool(node, "msi-controller")) {
 		struct msi_domain_info *info;
+		phys_addr_t translater;
 
 		info = kzalloc(sizeof(*info), GFP_KERNEL);
 		if (!info) {
@@ -1724,10 +1727,21 @@ static int __init its_probe(struct device_node *node,
 			goto out_free_tables;
 		}
 
+		translater = its->phys_base + GITS_TRANSLATER;
+		err = msi_doorbell_register_global(translater, sizeof(u32),
+						   true, &its->doorbell_info);
+
+		if (err) {
+			kfree(info);
+			goto out_free_tables;
+		}
+
+
 		inner_domain = irq_domain_add_tree(node, &its_domain_ops, its);
 		if (!inner_domain) {
 			err = -ENOMEM;
 			kfree(info);
+			msi_doorbell_unregister_global(its->doorbell_info);
 			goto out_free_tables;
 		}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 08/15] vfio: Introduce a vfio_dma type field
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

We introduce a vfio_dma type field since we will need to discriminate
between different types of dma slots:
- VFIO_IOVA_USER: IOVA region used to map user vaddr
- VFIO_IOVA_RESERVED_MSI: IOVA region reserved to map MSI doorbells
- VFIO_IOVA_ANY: matches any IOVA type (only used for lookups)

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v9 -> v10:
- renamed VFIO_IOVA_RESERVED into VFIO_IOVA_RESERVED_MSI
- explicitly set type to VFIO_IOVA_USER on dma_map

v6 -> v7:
- add VFIO_IOVA_ANY
- do not introduce yet any VFIO_IOVA_RESERVED handling
---
 drivers/vfio/vfio_iommu_type1.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..a9f8b93 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -53,6 +53,12 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+enum vfio_iova_type {
+	VFIO_IOVA_USER = 0,	/* standard IOVA used to map user vaddr */
+	VFIO_IOVA_RESERVED_MSI,	/* reserved to map MSI doorbells */
+	VFIO_IOVA_ANY,		/* matches any IOVA type */
+};
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct mutex		lock;
@@ -75,6 +81,7 @@ struct vfio_dma {
 	unsigned long		vaddr;		/* Process virtual addr */
 	size_t			size;		/* Map size (bytes) */
 	int			prot;		/* IOMMU_READ/WRITE */
+	enum vfio_iova_type	type;		/* type of IOVA */
 };
 
 struct vfio_group {
@@ -607,6 +614,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	dma->iova = iova;
 	dma->vaddr = vaddr;
 	dma->prot = prot;
+	dma->type = VFIO_IOVA_USER;
 
 	/* Insert zero-sized and grow as we map chunks of it */
 	vfio_link_dma(iommu, dma);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 09/15] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

In our RB-tree we now prepare to insert slots of different types
(USER and RESERVED). It therefore becomes useful to be able to search
for dma slots of a specific type, or of any type.

This patch introduces vfio_find_dma_from_node, which starts the
search from a given node and stops on the first node that matches
the @start and @size parameters. If this node also matches the
@type parameter, the node is returned; otherwise NULL is returned.

At the moment we only have USER slots, so the type always matches.

In a separate patch, this function will be enhanced to pursue the
search recursively when a node with a different type is
encountered.
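
For instance, the updated call sites below boil down to the following
(simplified excerpts, locking omitted): VFIO_IOVA_ANY for the overlap
check at map time, VFIO_IOVA_USER when walking user mappings at unmap
time.

	/* map: refuse to overlap any existing slot, whatever its type */
	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY))
		return -EEXIST;

	/* unmap: only user slots are candidates */
	dma = vfio_find_dma(iommu, unmap->iova, unmap->size, VFIO_IOVA_USER);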

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c | 53 +++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a9f8b93..cb7267a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -94,25 +94,56 @@ struct vfio_group {
  * into DMA'ble space using the IOMMU
  */
 
-static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
-				      dma_addr_t start, size_t size)
+/**
+ * vfio_find_dma_from_node: looks for a dma slot intersecting a window
+ * from a given rb tree node
+ * @top: top rb tree node where the search starts (including this node)
+ * @start: window start
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
+						dma_addr_t start, size_t size,
+						enum vfio_iova_type type)
 {
-	struct rb_node *node = iommu->dma_list.rb_node;
+	struct rb_node *node = top;
+	struct vfio_dma *dma;
 
 	while (node) {
-		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
-
+		dma = rb_entry(node, struct vfio_dma, node);
 		if (start + size <= dma->iova)
 			node = node->rb_left;
 		else if (start >= dma->iova + dma->size)
 			node = node->rb_right;
 		else
-			return dma;
+			break;
 	}
+	if (!node)
+		return NULL;
+
+	/* a dma slot intersects our window, check the type also matches */
+	if (type == VFIO_IOVA_ANY || dma->type == type)
+		return dma;
 
 	return NULL;
 }
 
+/**
+ * vfio_find_dma: find a dma slot intersecting a given window
+ * @iommu: vfio iommu handle
+ * @start: window base iova
+ * @size: window size
+ * @type: window type
+ */
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+				      dma_addr_t start, size_t size,
+				      enum vfio_iova_type type)
+{
+	struct rb_node *top_node = iommu->dma_list.rb_node;
+
+	return vfio_find_dma_from_node(top_node, start, size, type);
+}
+
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
@@ -484,19 +515,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	 * mappings within the range.
 	 */
 	if (iommu->v2) {
-		dma = vfio_find_dma(iommu, unmap->iova, 0);
+		dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
 		if (dma && dma->iova != unmap->iova) {
 			ret = -EINVAL;
 			goto unlock;
 		}
-		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
+		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
+				    VFIO_IOVA_USER);
 		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
 			ret = -EINVAL;
 			goto unlock;
 		}
 	}
 
-	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
+				    VFIO_IOVA_USER))) {
 		if (!iommu->v2 && unmap->iova > dma->iova)
 			break;
 		unmapped += dma->size;
@@ -600,7 +633,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
 	mutex_lock(&iommu->lock);
 
-	if (vfio_find_dma(iommu, iova, size)) {
+	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
 		mutex_unlock(&iommu->lock);
 		return -EEXIST;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 10/15] vfio/type1: Implement recursive vfio_find_dma_from_node
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

This patch handles the case where a node matching the @start and
@size arguments is encountered, but its type does not match the
@type argument. In that case we skip that node and pursue the search
in its subtrees. If @start is lower than the node's base, we first
search the left subtree recursively. If that search does not produce
any match and the window extends past the node, we search the right
subtree recursively.
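
A small example of the traversal (addresses invented), with the
RESERVED_MSI slot [0x1000, 0x2000) at the root and the USER slot
[0x2000, 0x3000) as its right child:

	vfio_find_dma_from_node(root, 0x0800, 0x2000, VFIO_IOVA_USER)
	  -> intersects the root node, but the type does not match
	  -> start < 0x1000: search the left subtree, no match
	  -> start + size > 0x2000: search the right subtree
	  -> returns the USER slot [0x2000, 0x3000)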

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10: creation
---
 drivers/vfio/vfio_iommu_type1.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cb7267a..65a4038 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -125,7 +125,17 @@ static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
 	if (type == VFIO_IOVA_ANY || dma->type == type)
 		return dma;
 
-	return NULL;
+	/* restart the search in the node's subtrees, skipping this node */
+	if (start < dma->iova) {
+		dma = vfio_find_dma_from_node(node->rb_left, start,
+					      size, type);
+		if (dma)
+			return dma;
+	}
+	if (start + size > dma->iova + dma->size)
+		dma = vfio_find_dma_from_node(node->rb_right, start,
+					      size, type);
+	return dma;
 }
 
 /**
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 11/15] vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
let's implement the expected behavior for removal and replay.

As opposed to user DMA slots, reserved IOVAs are not systematically bound
to PAs, and PAs are not pinned. VFIO just initializes the IOVA "aperture".
IOVAs are allocated outside of the VFIO framework, by the MSI layer which
is responsible for freeing and unmapping them. The MSI mapping resources
are freed by the IOMMU driver on domain destruction.

On the creation of a new domain, the "replay" of a reserved slot simply
needs to set the MSI aperture on the new domain.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- use dma-iommu iommu_get_dma_msi_region_cookie

v9 -> v10:
- replay of a reserved slot sets the MSI aperture on the new domain
- use VFIO_IOVA_RESERVED_MSI enum value instead of VFIO_IOVA_RESERVED

v7 -> v8:
- do not destroy anything anymore, just bypass unmap/unpin and iommu_map
  on replay
---
 drivers/vfio/Kconfig            |  1 +
 drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index da6e2ce..673ec79 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -1,6 +1,7 @@
 config VFIO_IOMMU_TYPE1
 	tristate
 	depends on VFIO
+	select IOMMU_DMA
 	default n
 
 config VFIO_IOMMU_SPAPR_TCE
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 65a4038..5bc5fc9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
+#include <linux/dma-iommu.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -387,7 +388,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	struct vfio_domain *domain, *d;
 	long unlocked = 0;
 
-	if (!dma->size)
+	if (!dma->size || dma->type != VFIO_IOVA_USER)
 		return;
 	/*
 	 * We use the IOMMU to track the physical addresses, otherwise we'd
@@ -724,6 +725,13 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 		dma = rb_entry(n, struct vfio_dma, node);
 		iova = dma->iova;
 
+		if (dma->type == VFIO_IOVA_RESERVED_MSI) {
+			ret = iommu_get_dma_msi_region_cookie(domain->domain,
+						     dma->iova, dma->size);
+			WARN_ON(ret);
+			continue;
+		}
+
 		while (iova < dma->iova + dma->size) {
 			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
 			size_t size;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 12/15] vfio: Allow reserved msi iova registration
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

The user is allowed to register a reserved MSI IOVA range by using the
DMA MAP API and setting the new flag VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA.
This region is stored in the vfio_dma rb tree. At that point the IOVA
range is not yet mapped to any target address. The host kernel will use
those IOVAs when needed, typically when MSIs are allocated.
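
As an illustration, a minimal userspace sketch of such a registration
(the helper name, file descriptor and values are hypothetical; the
ioctl, structure and flag are the ones introduced or extended by this
patch):

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* container_fd: an open, IOMMU-enabled VFIO container file descriptor */
static int register_msi_iova(int container_fd, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA,
		/* .vaddr is ignored for a reserved MSI registration */
		.iova  = iova,	/* must be aligned to the IOMMU page size */
		.size  = size,	/* likewise */
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}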

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>

---
v12 -> v13:
- use iommu_get_dma_msi_region_cookie

v9 -> v10
- use VFIO_IOVA_RESERVED_MSI enum value

v7 -> v8:
- use iommu_msi_set_aperture function. There is no notion of
  unregistration anymore since the reserved msi slot remains
  until the container gets closed.

v6 -> v7:
- use iommu_free_reserved_iova_domain
- convey prot attributes down to dma-reserved-iommu iova domain creation
- reserved bindings teardown now performed on iommu domain destruction
- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
         VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
- change title
- pass the protection attribute to dma-reserved-iommu API

v3 -> v4:
- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
- protect vfio_register_reserved_iova_range implementation with
  CONFIG_IOMMU_DMA_RESERVED
- handle unregistration by user-space and on vfio_iommu_type1 release

v1 -> v2:
- set returned value according to alloc_reserved_iova_domain result
- free the iova domains in case any error occurs

RFC v1 -> v1:
- takes into account Alex comments, based on
  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
- use the existing dma map/unmap ioctl interface with a flag to register
  a reserved IOVA range. A single reserved iova region is allowed.
---
 drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       | 10 +++++-
 2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5bc5fc9..c2f8bd9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_lock_acct(-unlocked);
 }
 
+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
+				dma_addr_t iova, size_t size)
+{
+	struct vfio_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	vfio_unmap_unpin(iommu, dma);
@@ -691,6 +705,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_register_msi_range(struct vfio_iommu *iommu,
+				   struct vfio_iommu_type1_dma_map *map)
+{
+	dma_addr_t iova = map->iova;
+	size_t size = map->size;
+	int ret = 0;
+	struct vfio_dma *dma;
+	unsigned long order;
+	uint64_t mask;
+
+	/* Verify that none of our __u64 fields overflow */
+	if (map->size != size || map->iova != iova)
+		return -EINVAL;
+
+	order =  __ffs(vfio_pgsize_bitmap(iommu));
+	mask = ((uint64_t)1 << order) - 1;
+
+	WARN_ON(mask & PAGE_MASK);
+
+	if (!size || (size | iova) & mask)
+		return -EINVAL;
+
+	/* Don't allow IOVA address wrap */
+	if (iova + size - 1 < iova)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
+		ret =  -EEXIST;
+		goto unlock;
+	}
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	dma->iova = iova;
+	dma->size = size;
+	dma->type = VFIO_IOVA_RESERVED_MSI;
+
+	ret = vfio_set_msi_aperture(iommu, iova, size);
+	if (ret)
+		goto free_unlock;
+
+	vfio_link_dma(iommu, dma);
+	goto unlock;
+
+free_unlock:
+	kfree(dma);
+unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_bus_type(struct device *dev, void *data)
 {
 	struct bus_type **bus = data;
@@ -1064,7 +1135,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
 		struct vfio_iommu_type1_dma_map map;
 		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+				VFIO_DMA_MAP_FLAG_WRITE |
+				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
@@ -1074,6 +1146,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		if (map.argsz < minsz || map.flags & ~mask)
 			return -EINVAL;
 
+		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
+			return vfio_register_msi_range(iommu, &map);
+
 		return vfio_dma_do_map(iommu, &map);
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..4a9dbc2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
@@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
  * Caller sets argsz.  The actual unmapped size is returned in the size
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 13/15] vfio/type1: Check doorbell safety
  2016-10-06  8:45 ` Eric Auger
@ 2016-10-06  8:45   ` Eric Auger
  -1 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On x86, IRQ remapping is abstracted by the IOMMU. On ARM, this is
abstracted by the MSI controller.

Since we currently have no way to detect whether the MSI controller is
upstream or downstream of the IOMMU, we rely on the MSI doorbell
information registered by the interrupt controllers. If at least one
doorbell does not implement proper isolation, we declare the assignment
unsafe with regard to interrupts. This is a coarse assessment, but it
should do until a better system description is available.

At this point the ARM SMMU still advertises IOMMU_CAP_INTR_REMAP. This
is removed in the next patch.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- coarse safety assessment based on MSI doorbell info

v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
  and irq_remapping into safe_irq_domains

v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
  CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
 drivers/vfio/vfio_iommu_type1.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c2f8bd9..dc3ee5d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
 #include <linux/vfio.h>
 #include <linux/workqueue.h>
 #include <linux/dma-iommu.h>
+#include <linux/msi-doorbell.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -921,8 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	INIT_LIST_HEAD(&domain->group_list);
 	list_add(&group->next, &domain->group_list);
 
+	/*
+	 * to advertise safe interrupts either the IOMMU or the MSI controllers
+	 * must support IRQ remapping (aka. interrupt translation)
+	 */
 	if (!allow_unsafe_interrupts &&
-	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
+		!msi_doorbell_safe())) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 14/15] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Do not advertise IOMMU_CAP_INTR_REMAP for arm-smmu(-v3). On ARM, the
IRQ remapping capability is abstracted on the irqchip side, as opposed
to the Intel IOMMU, which features IRQ remapping hardware.

So to check the IRQ remapping capability, the MSI domain needs to be
checked instead.

This commit affects platform and PCIe device assignment use cases on
any platform featuring an unsafe MSI controller (currently the ARM
GICv2m). For those platforms the vfio_iommu_type1 module must be
loaded with allow_unsafe_interrupts set to 1, as shown below.
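
For instance, assuming the type1 IOMMU backend is built as a module,
something along the lines of:

  modprobe vfio_iommu_type1 allow_unsafe_interrupts=1

(or the equivalent option line in /etc/modprobe.d) is needed before
assigning devices on such platforms.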

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- reword the commit message (allow_unsafe_interrupts)
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 drivers/iommu/arm-smmu.c    | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f82eec3..c0a34be 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1371,7 +1371,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 	case IOMMU_CAP_CACHE_COHERENCY:
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 97ff1b4..0c0cd9e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1361,7 +1361,8 @@ static bool arm_smmu_capable(enum iommu_cap cap)
 		 */
 		return true;
 	case IOMMU_CAP_INTR_REMAP:
-		return true; /* MSIs are just memory writes */
+		/* interrupt translation handled at MSI controller level */
+		return false;
 	case IOMMU_CAP_NOEXEC:
 		return true;
 	default:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-06  8:45   ` Eric Auger
  0 siblings, 0 replies; 109+ messages in thread
From: Eric Auger @ 2016-10-06  8:45 UTC (permalink / raw)
  To: eric.auger, eric.auger.pro, christoffer.dall, marc.zyngier,
	robin.murphy, alex.williamson, will.deacon, joro, tglx, jason,
	linux-arm-kernel
  Cc: kvm, drjones, linux-kernel, Bharat.Bhushan, pranav.sawargaonkar,
	p.fedin, iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

This patch allows userspace to retrieve the MSI geometry. The
implementation is based on capability chains, now also added to
VFIO_IOMMU_GET_INFO.

The returned info comprises:
- whether the MSI IOVAs are constrained to a reserved range (the x86
  case) and, if so, the start/end of the aperture,
- or whether the IOVA aperture needs to be set by userspace. In that
  case, the size and alignment of the IOVA window to be provided are
  returned.

When userspace must provide the IOVA aperture, we currently report a
size/alignment based on all the doorbells registered by the host
kernel. This may exceed the actual needs.
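
For illustration, a userspace sketch of retrieving this capability
(error handling trimmed; the helper name, container_fd and the 4 KiB
buffer are assumptions, while the structures, flags and capability ID
are the ones added by this patch):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int get_msi_geometry(int container_fd,
			    struct vfio_iommu_type1_info_cap_msi_geometry *geo)
{
	char buf[4096];
	struct vfio_iommu_type1_info *info = (void *)buf;
	struct vfio_info_cap_header *hdr;
	__u32 off;

	memset(buf, 0, sizeof(buf));
	info->argsz = sizeof(buf);

	if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, info))
		return -1;
	if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
		return -1;

	/* capability offsets are relative to the start of the info struct */
	for (off = info->cap_offset; off; off = hdr->next) {
		hdr = (struct vfio_info_cap_header *)(buf + off);
		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY) {
			memcpy(geo, hdr, sizeof(*geo));
			return 0;
		}
	}

	return -1;
}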

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v11 -> v11:
- msi_doorbell_pages was renamed msi_doorbell_calc_pages

v9 -> v10:
- move cap_offset after iova_pgsizes
- replace __u64 alignment by __u32 order
- introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
  fix alignment
- call msi-doorbell API to compute the size/alignment

v8 -> v9:
- use iommu_msi_supported flag instead of programmable
- replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
  capability chain, reporting the MSI geometry

v7 -> v8:
- use iommu_domain_msi_geometry

v6 -> v7:
- remove the computation of the number of IOVA pages to be provisioned.
  This number depends on the domain/group/device topology, which can
  dynamically change. Let's instead rely on an arbitrary max depending
  on the system

v4 -> v5:
- move msi_info and ret declaration within the conditional code

v3 -> v4:
- replace former vfio_domains_require_msi_mapping by
  more complex computation of MSI mapping requirements, especially the
  number of pages to be provided by the user-space.
- reword patch title

RFC v1 -> v1:
- derived from
  [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
- renamed allow_msi_reconfig into require_msi_mapping
- fixed VFIO_IOMMU_GET_INFO
---
 drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index dc3ee5d..ce5e7eb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -38,6 +38,8 @@
 #include <linux/workqueue.h>
 #include <linux/dma-iommu.h>
 #include <linux/msi-doorbell.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
@@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
+				     struct vfio_info_cap *caps)
+{
+	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
+	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
+	struct iommu_domain_msi_geometry msi_geometry;
+	struct vfio_info_cap_header *header;
+	struct vfio_domain *d;
+	bool reserved;
+	size_t size;
+
+	mutex_lock(&iommu->lock);
+	/* All domains have same require_msi_map property, pick first */
+	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
+	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
+			      &msi_geometry);
+	reserved = !msi_geometry.iommu_msi_supported;
+
+	mutex_unlock(&iommu->lock);
+
+	size = sizeof(*vfio_msi_geometry);
+	header = vfio_info_cap_add(caps, size,
+				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
+
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	vfio_msi_geometry = container_of(header,
+				struct vfio_iommu_type1_info_cap_msi_geometry,
+				header);
+
+	vfio_msi_geometry->flags = reserved;
+	if (reserved) {
+		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
+		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;
+		return 0;
+	}
+
+	vfio_msi_geometry->order = order;
+	/*
+	 * we compute a system-wide requirement based on all the registered
+	 * doorbells
+	 */
+	vfio_msi_geometry->size =
+		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
+
+	return 0;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		}
 	} else if (cmd == VFIO_IOMMU_GET_INFO) {
 		struct vfio_iommu_type1_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		int ret;
 
-		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
 		if (copy_from_user(&info, (void __user *)arg, minsz))
 			return -EFAULT;
@@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
+		ret = compute_msi_geometry_caps(iommu, &caps);
+		if (ret)
+			return ret;
+
+		if (caps.size) {
+			info.flags |= VFIO_IOMMU_INFO_CAPS;
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						sizeof(info), caps.buf,
+						caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
 		return copy_to_user((void __user *)arg, &info, minsz) ?
 			-EFAULT : 0;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4a9dbc2..8dae013 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
+	__u32	__resv;
+	__u32   cap_offset;	/* Offset within info struct of first cap */
+};
+
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
+
+/*
+ * The MSI geometry capability allows to report the MSI IOVA geometry:
+ * - either the MSI IOVAs are constrained within a reserved IOVA aperture
+ *   whose boundaries are given by [@aperture_start, @aperture_end].
+ *   this is typically the case on x86 host. The userspace is not allowed
+ *   to map userspace memory at IOVAs intersecting this range using
+ *   VFIO_IOMMU_MAP_DMA.
+ * - or the MSI IOVAs are not requested to belong to any reserved range;
+ *   in that case the userspace must provide an IOVA window characterized by
+ *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
+ */
+struct vfio_iommu_type1_info_cap_msi_geometry {
+	struct vfio_info_cap_header header;
+	__u32 flags;
+#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
+	/* not reserved */
+	__u32 order; /* iommu page order used for aperture alignment*/
+	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
+	/* reserved */
+	__u64 aperture_start;
+	__u64 aperture_end;
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
  * IOVA region that will be used on some platforms to map the host MSI frames.
  * In that specific case, vaddr is ignored. Once registered, an MSI reserved
  * IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
  */
 struct vfio_iommu_type1_dma_map {
 	__u32	argsz;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 02/15] iommu/arm-smmu: Initialize the msi geometry
@ 2016-10-06 20:16     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:16 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:18 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> On ARM, MSI write transactions also are translated by the smmu.
> Let's report that specificity by setting the iommu_msi_supported
> field to true. A valid aperture window will need to be provided.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v12 -> v13:
> - reword the commit message
> 
> v8 -> v9:
> - reword the title and patch description
> 
> v7 -> v8:
> - use DOMAIN_ATTR_MSI_GEOMETRY
> 
> v4 -> v5:
> - don't handle fsl_pamu_domain anymore
> - handle arm-smmu-v3
> ---
>  drivers/iommu/arm-smmu-v3.c | 2 ++
>  drivers/iommu/arm-smmu.c    | 3 +++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 15c01c3..f82eec3 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1382,6 +1382,7 @@ static bool arm_smmu_capable(enum iommu_cap cap)
>  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  {
>  	struct arm_smmu_domain *smmu_domain;
> +	struct iommu_domain_msi_geometry msi_geometry = {0, 0, true};

nit: this initialization makes it difficult to search for who sets
iommu_msi_supported. Could we perhaps be more explicit in the
initialization, i.e.
	{
		.aperture_start = 0,
		.aperture_end = 0,
		.iommu_msi_supported = true
	};

No change to the compiled version, but easier to find in the source.

>  
>  	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>  		return NULL;
> @@ -1400,6 +1401,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  		kfree(smmu_domain);
>  		return NULL;
>  	}
> +	smmu_domain->domain.msi_geometry = msi_geometry;
>  
>  	mutex_init(&smmu_domain->init_mutex);
>  	spin_lock_init(&smmu_domain->pgtbl_lock);
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index ac4aab9..97ff1b4 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1002,6 +1002,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  {
>  	struct arm_smmu_domain *smmu_domain;
> +	struct iommu_domain_msi_geometry msi_geometry = {0, 0, true};
>  
>  	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>  		return NULL;
> @@ -1020,6 +1021,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>  		return NULL;
>  	}
>  
> +	smmu_domain->domain.msi_geometry = msi_geometry;
> +
>  	mutex_init(&smmu_domain->init_mutex);
>  	spin_lock_init(&smmu_domain->pgtbl_lock);
>  

^ permalink raw reply	[flat|nested] 109+ messages in thread
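
For context only, a short sketch (not from this patch; it simply reuses the
DOMAIN_ATTR_MSI_GEOMETRY attribute and the field names quoted above) of how a
consumer such as VFIO reads the geometry back once the SMMU driver has set
iommu_msi_supported:

    struct iommu_domain_msi_geometry geometry;

    if (!iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_GEOMETRY, &geometry) &&
        geometry.iommu_msi_supported) {
            /*
             * MSI writes are translated by the SMMU; an IOVA aperture for
             * the doorbells has to be provided (in this series, by userspace).
             */
    }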

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-06 20:17     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:19 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> From: Robin Murphy <robin.murphy@arm.com>
> 
> IOMMU domain users such as VFIO face a similar problem to DMA API ops
> with regard to mapping MSI messages in systems where the MSI write is
> subject to IOMMU translation. With the relevant infrastructure now in
> place for managed DMA domains, it's actually really simple for other
> users to piggyback off that and reap the benefits without giving up
> their own IOVA management, and without having to reinvent their own
> wheel in the MSI layer.
> 
> Allow such users to opt into automatic MSI remapping by dedicating a
> region of their IOVA space to a managed cookie.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v1 -> v2:
> - compared to Robin's version
> - add NULL last param to iommu_dma_init_domain
> - set the msi_geometry aperture
> - I removed
>   if (base < U64_MAX - size)
>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>   don't get why we would reserve something out of the scope of the iova domain?
>   what do I miss?
> ---
>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h |  9 +++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab866..11da1a0 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>  		msg->address_lo += lower_32_bits(msi_page->iova);
>  	}
>  }
> +
> +/**
> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only

Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
along those lines?  I'm not sure what we're get'ing.  Thanks,

Alex

> + * @domain: IOMMU domain to prepare
> + * @base: Base address of IOVA region to use as the MSI remapping aperture
> + * @size: Size of the desired MSI aperture
> + *
> + * Users who manage their own IOVA allocation and do not want DMA API support,
> + * but would still like to take advantage of automatic MSI remapping, can use
> + * this to initialise their own domain appropriately.
> + */
> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> +		dma_addr_t base, u64 size)
> +{
> +	struct iommu_dma_cookie *cookie;
> +	struct iova_domain *iovad;
> +	int ret;
> +
> +	if (domain->type == IOMMU_DOMAIN_DMA)
> +		return -EINVAL;
> +
> +	ret = iommu_get_dma_cookie(domain);
> +	if (ret)
> +		return ret;
> +
> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
> +	if (ret) {
> +		iommu_put_dma_cookie(domain);
> +		return ret;
> +	}
> +
> +	domain->msi_geometry.aperture_start = base;
> +	domain->msi_geometry.aperture_end = base + size - 1;
> +
> +	cookie = domain->iova_cookie;
> +	iovad = &cookie->iovad;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 32c5890..1c55413 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>  /* The DMA API isn't _quite_ the whole story, though... */
>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>  
> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> +		dma_addr_t base, u64 size);
> +
>  #else
>  
>  struct iommu_domain;
> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>  {
>  }
>  
> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
> +		dma_addr_t base, u64 size)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif	/* CONFIG_IOMMU_DMA */
>  #endif	/* __KERNEL__ */
>  #endif	/* __DMA_IOMMU_H */

^ permalink raw reply	[flat|nested] 109+ messages in thread
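
As a usage illustration only (the aperture base and size below are made-up
placeholders; in this series they would ultimately come from userspace), the
kind of call a user that manages its own IOVA space would make:

    dma_addr_t msi_base = 0x08000000;  /* placeholder IOVA */
    u64 msi_size = SZ_1M;              /* placeholder window size */
    int ret;

    ret = iommu_get_dma_msi_region_cookie(domain, msi_base, msi_size);
    if (ret)
            return ret;  /* automatic MSI remapping unavailable for this domain */

    /* iommu_dma_map_msi_msg() can now allocate doorbell IOVAs in this window */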

* Re: [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
  2016-10-06  8:45   ` Eric Auger
@ 2016-10-06 20:17     ` Alex Williamson
  -1 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:20 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> We introduce a new msi-doorbell API that allows msi controllers
> to allocate and register their doorbells. This is useful when
> those doorbells are likely to be iommu mapped (typically on ARM).
> The VFIO layer will need to gather information about those doorbells:
> whether they are safe (ie. they implement irq remapping) and how
> many IOMMU pages are requested to map all of them.
> 
> This patch first introduces the dedicated msi_doorbell_info struct
> and the registration/unregistration functions.
> 
> A doorbell region is characterized by its physical address base, size,
> and whether it its safe (ie. it implements IRQ remapping). A doorbell
> can be per-cpu of global. We currently only care about global doorbells.
                 ^^ s/of/or/

> 
> A function returns whether all doorbells are safe.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v12 -> v13:
> - directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
> - remove prot attribute
> - move msi_doorbell_info struct definition in msi-doorbell.c
> - change the commit title
> - change proto of the registration function
> - msi_doorbell_safe now in this patch
> 
> v11 -> v12:
> - rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
>   into msi_doorbell_list and irqchip_doorbell_mutex into
>   msi_doorbell_mutex
> - fix style issues: align msi_doorbell struct members, kernel-doc comments
> - use kzalloc
> - use container_of in msi_doorbell_unregister_global
> - compute nb_unsafe_doorbells on registration/unregistration
> - registration simply returns NULL if allocation failed
> 
> v10 -> v11:
> - remove void *chip_data argument from register/unregister function
> - remove lookup funtions since we restored the struct irq_chip
>   msi_doorbell_info ops to realize this function
> - reword commit message and title
> 
> Conflicts:
> 	kernel/irq/Makefile
> 
> Conflicts:
> 	drivers/iommu/Kconfig
> ---
>  drivers/iommu/Kconfig        |  2 +
>  include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
>  kernel/irq/Kconfig           |  4 ++
>  kernel/irq/Makefile          |  1 +
>  kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 182 insertions(+)
>  create mode 100644 include/linux/msi-doorbell.h
>  create mode 100644 kernel/irq/msi-doorbell.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 8ee54d7..0cc7fac 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
>  config ARM_SMMU
>  	bool "ARM Ltd. System MMU (SMMU) Support"
>  	depends on (ARM64 || ARM) && MMU
> +	select MSI_DOORBELL
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select ARM_DMA_USE_IOMMU if ARM
> @@ -310,6 +311,7 @@ config ARM_SMMU
>  config ARM_SMMU_V3
>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>  	depends on ARM64
> +	select MSI_DOORBELL
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
> new file mode 100644
> index 0000000..c18a382
> --- /dev/null
> +++ b/include/linux/msi-doorbell.h
> @@ -0,0 +1,77 @@
> +/*
> + * API to register/query MSI doorbells likely to be IOMMU mapped
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _LINUX_MSI_DOORBELL_H
> +#define _LINUX_MSI_DOORBELL_H
> +
> +struct msi_doorbell_info;
> +
> +#ifdef CONFIG_MSI_DOORBELL
> +
> +/**
> + * msi_doorbell_register - allocate and register a global doorbell
> + * @base: physical base address of the global doorbell
> + * @size: size of the global doorbell
> + * @prot: protection/memory attributes
> + * @safe: true is irq_remapping implemented for this doorbell
> + * @dbinfo: returned doorbell info
> + *
> + * Return: 0 on success, -ENOMEM on allocation failure
> + */
> +int msi_doorbell_register_global(phys_addr_t base, size_t size,
> +				 bool safe,
> +				 struct msi_doorbell_info **dbinfo);
> +

Seems like alloc/free behavior vs register/unregister.  Also seems
cleaner to just return a struct msi_doorbell_info* and use PTR_ERR for
return codes.  These are of course superficial changes that could be
addressed in the future.

> +/**
> + * msi_doorbell_unregister_global - unregister a global doorbell
> + * @db: doorbell info to unregister
> + *
> + * remove the doorbell descriptor from the list of registered doorbells
> + * and deallocates it
> + */
> +void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
> +
> +/**
> + * msi_doorbell_safe - return whether all registered doorbells are safe
> + *
> + * Safe doorbells are those which implement irq remapping
> + * Return: true if all doorbells are safe, false otherwise
> + */
> +bool msi_doorbell_safe(void);
> +
> +#else
> +
> +static inline int
> +msi_doorbell_register_global(phys_addr_t base, size_t size,
> +			     int prot, bool safe,
> +			     struct msi_doorbell_info **dbinfo)
> +{
> +	*dbinfo = NULL;
> +	return 0;

If we return a struct*

return NULL;

> +}
> +
> +static inline void
> +msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
> +
> +static inline bool msi_doorbell_safe(void)
> +{
> +	return true;
> +}

Is it?

> +#endif /* CONFIG_MSI_DOORBELL */
> +
> +#endif
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index 3bbfd6a..d4faaaa 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
>  config GENERIC_MSI_IRQ
>  	bool
>  
> +# MSI doorbell support (for doorbell IOMMU mapping)
> +config MSI_DOORBELL
> +	bool
> +
>  # Generic MSI hierarchical interrupt domain support
>  config GENERIC_MSI_IRQ_DOMAIN
>  	bool
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index 1d3ee31..5b04dd1 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
>  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
>  obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
>  obj-$(CONFIG_SMP) += affinity.o
> +obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
> new file mode 100644
> index 0000000..60a262a
> --- /dev/null
> +++ b/kernel/irq/msi-doorbell.c
> @@ -0,0 +1,98 @@
> +/*
> + * API to register/query MSI doorbells likely to be IOMMU mapped
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + * Author: Eric Auger <eric.auger@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/irq.h>
> +#include <linux/msi-doorbell.h>
> +
> +/**
> + * struct msi_doorbell_info - MSI doorbell region descriptor
> + * @percpu_doorbells: per cpu doorbell base address
> + * @global_doorbell: base address of the doorbell
> + * @doorbell_is_percpu: is the doorbell per cpu or global?
> + * @safe: true if irq remapping is implemented
> + * @size: size of the doorbell
> + */
> +struct msi_doorbell_info {
> +	union {
> +		phys_addr_t __percpu    *percpu_doorbells;
> +		phys_addr_t             global_doorbell;
> +	};
> +	bool    doorbell_is_percpu;
> +	bool    safe;
> +	size_t  size;
> +};
> +
> +struct msi_doorbell {
> +	struct msi_doorbell_info	info;
> +	struct list_head		next;
> +};
> +
> +/* list of registered MSI doorbells */
> +static LIST_HEAD(msi_doorbell_list);
> +
> +/* counts the number of unsafe registered doorbells */
> +static uint nb_unsafe_doorbells;
> +
> +/* protects the list and nb__unsafe_doorbells */

Extra underscore

> +static DEFINE_MUTEX(msi_doorbell_mutex);
> +
> +int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
> +				 struct msi_doorbell_info **dbinfo)
> +{
> +	struct msi_doorbell *db;
> +
> +	db = kzalloc(sizeof(*db), GFP_KERNEL);
> +	if (!db)
> +		return -ENOMEM;
> +
> +	db->info.global_doorbell = base;
> +	db->info.size = size;
> +	db->info.safe = safe;
> +
> +	mutex_lock(&msi_doorbell_mutex);
> +	list_add(&db->next, &msi_doorbell_list);
> +	if (!db->info.safe)
> +		nb_unsafe_doorbells++;
> +	mutex_unlock(&msi_doorbell_mutex);
> +	*dbinfo = &db->info;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
> +
> +void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
> +{
> +	struct msi_doorbell *db;
> +
> +	db = container_of(dbinfo, struct msi_doorbell, info);
> +
> +	mutex_lock(&msi_doorbell_mutex);
> +	list_del(&db->next);
> +	if (!db->info.safe)
> +		nb_unsafe_doorbells--;
> +	mutex_unlock(&msi_doorbell_mutex);
> +	kfree(db);
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
> +
> +bool msi_doorbell_safe(void)
> +{
> +	return !nb_unsafe_doorbells;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_safe);

^ permalink raw reply	[flat|nested] 109+ messages in thread
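
To make the intended usage concrete, a sketch (not part of the patch; the
doorbell frame address is a placeholder) of an MSI controller registering its
doorbell with the prototype above, plus the consumer-side safety check:

    struct msi_doorbell_info *dbinfo;
    int ret;

    /* e.g. from an irqchip probe path; 'frame_base' is hypothetical */
    ret = msi_doorbell_register_global(frame_base, SZ_4K,
                                       false /* no IRQ remapping */, &dbinfo);
    if (ret)
            return ret;

    /* later, e.g. in VFIO, MSI assignment is only considered safe when: */
    if (!msi_doorbell_safe())
            pr_warn("MSI doorbells without IRQ remapping are registered\n");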

* [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
@ 2016-10-06 20:17     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu,  6 Oct 2016 08:45:20 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> We introduce a new msi-doorbell API that allows msi controllers
> to allocate and register their doorbells. This is useful when
> those doorbells are likely to be iommu mapped (typically on ARM).
> The VFIO layer will need to gather information about those doorbells:
> whether they are safe (ie. they implement irq remapping) and how
> many IOMMU pages are requested to map all of them.
> 
> This patch first introduces the dedicated msi_doorbell_info struct
> and the registration/unregistration functions.
> 
> A doorbell region is characterized by its physical address base, size,
> and whether it its safe (ie. it implements IRQ remapping). A doorbell
> can be per-cpu of global. We currently only care about global doorbells.
                 ^^ s/of/or/

> 
> A function returns whether all doorbells are safe.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v12 -> v13:
> - directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
> - remove prot attribute
> - move msi_doorbell_info struct definition in msi-doorbell.c
> - change the commit title
> - change proto of the registration function
> - msi_doorbell_safe now in this patch
> 
> v11 -> v12:
> - rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
>   into msi_doorbell_list and irqchip_doorbell_mutex into
>   msi_doorbell_mutex
> - fix style issues: align msi_doorbell struct members, kernel-doc comments
> - use kzalloc
> - use container_of in msi_doorbell_unregister_global
> - compute nb_unsafe_doorbells on registration/unregistration
> - registration simply returns NULL if allocation failed
> 
> v10 -> v11:
> - remove void *chip_data argument from register/unregister function
> - remove lookup funtions since we restored the struct irq_chip
>   msi_doorbell_info ops to realize this function
> - reword commit message and title
> 
> Conflicts:
> 	kernel/irq/Makefile
> 
> Conflicts:
> 	drivers/iommu/Kconfig
> ---
>  drivers/iommu/Kconfig        |  2 +
>  include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
>  kernel/irq/Kconfig           |  4 ++
>  kernel/irq/Makefile          |  1 +
>  kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 182 insertions(+)
>  create mode 100644 include/linux/msi-doorbell.h
>  create mode 100644 kernel/irq/msi-doorbell.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 8ee54d7..0cc7fac 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
>  config ARM_SMMU
>  	bool "ARM Ltd. System MMU (SMMU) Support"
>  	depends on (ARM64 || ARM) && MMU
> +	select MSI_DOORBELL
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select ARM_DMA_USE_IOMMU if ARM
> @@ -310,6 +311,7 @@ config ARM_SMMU
>  config ARM_SMMU_V3
>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>  	depends on ARM64
> +	select MSI_DOORBELL
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
> new file mode 100644
> index 0000000..c18a382
> --- /dev/null
> +++ b/include/linux/msi-doorbell.h
> @@ -0,0 +1,77 @@
> +/*
> + * API to register/query MSI doorbells likely to be IOMMU mapped
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _LINUX_MSI_DOORBELL_H
> +#define _LINUX_MSI_DOORBELL_H
> +
> +struct msi_doorbell_info;
> +
> +#ifdef CONFIG_MSI_DOORBELL
> +
> +/**
> + * msi_doorbell_register - allocate and register a global doorbell
> + * @base: physical base address of the global doorbell
> + * @size: size of the global doorbell
> + * @prot: protection/memory attributes
> + * @safe: true is irq_remapping implemented for this doorbell
> + * @dbinfo: returned doorbell info
> + *
> + * Return: 0 on success, -ENOMEM on allocation failure
> + */
> +int msi_doorbell_register_global(phys_addr_t base, size_t size,
> +				 bool safe,
> +				 struct msi_doorbell_info **dbinfo);
> +

Seems like alloc/free behavior vs register/unregister.  Also seems
cleaner to just return a struct msi_doorbell_info* and use PTR_ERR for
return codes.  These are of course superficial changes that could be
addressed in the future.

> +/**
> + * msi_doorbell_unregister_global - unregister a global doorbell
> + * @db: doorbell info to unregister
> + *
> + * remove the doorbell descriptor from the list of registered doorbells
> + * and deallocates it
> + */
> +void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
> +
> +/**
> + * msi_doorbell_safe - return whether all registered doorbells are safe
> + *
> + * Safe doorbells are those which implement irq remapping
> + * Return: true if all doorbells are safe, false otherwise
> + */
> +bool msi_doorbell_safe(void);
> +
> +#else
> +
> +static inline int
> +msi_doorbell_register_global(phys_addr_t base, size_t size,
> +			     int prot, bool safe,
> +			     struct msi_doorbell_info **dbinfo)
> +{
> +	*dbinfo = NULL;
> +	return 0;

If we return a struct*

return NULL;

> +}
> +
> +static inline void
> +msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
> +
> +static inline bool msi_doorbell_safe(void)
> +{
> +	return true;
> +}

Is it?

> +#endif /* CONFIG_MSI_DOORBELL */
> +
> +#endif
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index 3bbfd6a..d4faaaa 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
>  config GENERIC_MSI_IRQ
>  	bool
>  
> +# MSI doorbell support (for doorbell IOMMU mapping)
> +config MSI_DOORBELL
> +	bool
> +
>  # Generic MSI hierarchical interrupt domain support
>  config GENERIC_MSI_IRQ_DOMAIN
>  	bool
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index 1d3ee31..5b04dd1 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
>  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
>  obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
>  obj-$(CONFIG_SMP) += affinity.o
> +obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
> new file mode 100644
> index 0000000..60a262a
> --- /dev/null
> +++ b/kernel/irq/msi-doorbell.c
> @@ -0,0 +1,98 @@
> +/*
> + * API to register/query MSI doorbells likely to be IOMMU mapped
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + * Author: Eric Auger <eric.auger@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/irq.h>
> +#include <linux/msi-doorbell.h>
> +
> +/**
> + * struct msi_doorbell_info - MSI doorbell region descriptor
> + * @percpu_doorbells: per cpu doorbell base address
> + * @global_doorbell: base address of the doorbell
> + * @doorbell_is_percpu: is the doorbell per cpu or global?
> + * @safe: true if irq remapping is implemented
> + * @size: size of the doorbell
> + */
> +struct msi_doorbell_info {
> +	union {
> +		phys_addr_t __percpu    *percpu_doorbells;
> +		phys_addr_t             global_doorbell;
> +	};
> +	bool    doorbell_is_percpu;
> +	bool    safe;
> +	size_t  size;
> +};
> +
> +struct msi_doorbell {
> +	struct msi_doorbell_info	info;
> +	struct list_head		next;
> +};
> +
> +/* list of registered MSI doorbells */
> +static LIST_HEAD(msi_doorbell_list);
> +
> +/* counts the number of unsafe registered doorbells */
> +static uint nb_unsafe_doorbells;
> +
> +/* protects the list and nb__unsafe_doorbells */

Extra underscore

> +static DEFINE_MUTEX(msi_doorbell_mutex);
> +
> +int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
> +				 struct msi_doorbell_info **dbinfo)
> +{
> +	struct msi_doorbell *db;
> +
> +	db = kzalloc(sizeof(*db), GFP_KERNEL);
> +	if (!db)
> +		return -ENOMEM;
> +
> +	db->info.global_doorbell = base;
> +	db->info.size = size;
> +	db->info.safe = safe;
> +
> +	mutex_lock(&msi_doorbell_mutex);
> +	list_add(&db->next, &msi_doorbell_list);
> +	if (!db->info.safe)
> +		nb_unsafe_doorbells++;
> +	mutex_unlock(&msi_doorbell_mutex);
> +	*dbinfo = &db->info;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
> +
> +void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
> +{
> +	struct msi_doorbell *db;
> +
> +	db = container_of(dbinfo, struct msi_doorbell, info);
> +
> +	mutex_lock(&msi_doorbell_mutex);
> +	list_del(&db->next);
> +	if (!db->info.safe)
> +		nb_unsafe_doorbells--;
> +	mutex_unlock(&msi_doorbell_mutex);
> +	kfree(db);
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
> +
> +bool msi_doorbell_safe(void)
> +{
> +	return !nb_unsafe_doorbells;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_safe);
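
For illustration, a minimal sketch of how an irqchip driver might use the API
above at probe/teardown time; the driver context and names (v2m_*) are
assumptions here, only the three exported functions come from the quoted code:

	#include <linux/msi-doorbell.h>

	static struct msi_doorbell_info *v2m_dbinfo;

	/*
	 * Register a single global doorbell. GICv2m has no IRQ remapping,
	 * so the doorbell is reported as unsafe (safe = false).
	 */
	static int v2m_register_doorbell(phys_addr_t base, size_t size)
	{
		return msi_doorbell_register_global(base, size, false,
						    &v2m_dbinfo);
	}

	static void v2m_unregister_doorbell(void)
	{
		msi_doorbell_unregister_global(v2m_dbinfo);
	}

A consumer such as VFIO can then call msi_doorbell_safe() to decide whether
MSI injection can be considered isolated for the assigned device.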

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 08/15] vfio: Introduce a vfio_dma type field
@ 2016-10-06 20:18     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:18 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:24 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> We introduce a vfio_dma type since we will need to discriminate
> different types of dma slots:
> - VFIO_IOVA_USER: IOVA region used to map user vaddr
> - VFIO_IOVA_RESERVED_MSI: IOVA region reserved to map MSI doorbells
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Acked-by: Alex Williamson <alex.williamson@redhat.com>

> 
> ---
> v9 -> v10:
> - renamed VFIO_IOVA_RESERVED into VFIO_IOVA_RESERVED_MSI
> - explicitly set type to VFIO_IOVA_USER on dma_map
> 
> v6 -> v7:
> - add VFIO_IOVA_ANY
> - do not introduce yet any VFIO_IOVA_RESERVED handling
> ---
>  drivers/vfio/vfio_iommu_type1.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ba1942..a9f8b93 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -53,6 +53,12 @@ module_param_named(disable_hugepages,
>  MODULE_PARM_DESC(disable_hugepages,
>  		 "Disable VFIO IOMMU support for IOMMU hugepages.");
>  
> +enum vfio_iova_type {
> +	VFIO_IOVA_USER = 0,	/* standard IOVA used to map user vaddr */
> +	VFIO_IOVA_RESERVED_MSI,	/* reserved to map MSI doorbells */
> +	VFIO_IOVA_ANY,		/* matches any IOVA type */
> +};
> +
>  struct vfio_iommu {
>  	struct list_head	domain_list;
>  	struct mutex		lock;
> @@ -75,6 +81,7 @@ struct vfio_dma {
>  	unsigned long		vaddr;		/* Process virtual addr */
>  	size_t			size;		/* Map size (bytes) */
>  	int			prot;		/* IOMMU_READ/WRITE */
> +	enum vfio_iova_type	type;		/* type of IOVA */
>  };
>  
>  struct vfio_group {
> @@ -607,6 +614,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	dma->iova = iova;
>  	dma->vaddr = vaddr;
>  	dma->prot = prot;
> +	dma->type = VFIO_IOVA_USER;
>  
>  	/* Insert zero-sized and grow as we map chunks of it */
>  	vfio_link_dma(iommu, dma);

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 09/15] vfio/type1: vfio_find_dma accepting a type argument
@ 2016-10-06 20:18     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:18 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:25 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> In our RB-tree we are preparing to insert slots of different types
> (USER and RESERVED). It becomes useful to be able to search for dma
> slots of a specific type, or of any type.
> 
> This patch introduces vfio_find_dma_from_node, which starts the
> search from a given node and stops at the first node that matches
> the @start and @size parameters. If this node also matches the
> @type parameter, the node is returned; otherwise NULL is returned.
> 
> At the moment we only have USER slots, so the type always matches.
> 
> In a separate patch, this function will be enhanced to continue the
> search recursively when a node with a different type is
> encountered.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 53 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a9f8b93..cb7267a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -94,25 +94,56 @@ struct vfio_group {
>   * into DMA'ble space using the IOMMU
>   */
>  
> -static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> -				      dma_addr_t start, size_t size)
> +/**
> + * vfio_find_dma_from_node: looks for a dma slot intersecting a window
> + * from a given rb tree node
> + * @top: top rb tree node where the search starts (including this node)
> + * @start: window start
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
> +						dma_addr_t start, size_t size,
> +						enum vfio_iova_type type)
>  {
> -	struct rb_node *node = iommu->dma_list.rb_node;
> +	struct rb_node *node = top;
> +	struct vfio_dma *dma;
>  
>  	while (node) {
> -		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
> -
> +		dma = rb_entry(node, struct vfio_dma, node);
>  		if (start + size <= dma->iova)
>  			node = node->rb_left;
>  		else if (start >= dma->iova + dma->size)
>  			node = node->rb_right;
>  		else
> -			return dma;
> +			break;
>  	}
> +	if (!node)
> +		return NULL;
> +
> +	/* a dma slot intersects our window, check the type also matches */
> +	if (type == VFIO_IOVA_ANY || dma->type == type)
> +		return dma;
>  
>  	return NULL;
>  }
>  
> +/**
> + * vfio_find_dma: find a dma slot intersecting a given window
> + * @iommu: vfio iommu handle
> + * @start: window base iova
> + * @size: window size
> + * @type: window type
> + */
> +static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
> +				      dma_addr_t start, size_t size,
> +				      enum vfio_iova_type type)
> +{
> +	struct rb_node *top_node = iommu->dma_list.rb_node;
> +
> +	return vfio_find_dma_from_node(top_node, start, size, type);

nit, we could do without the top_node variable.
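
For reference, the simplified wrapper the nit suggests would be a one-liner
(sketch based on the quoted code):

	static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
					      dma_addr_t start, size_t size,
					      enum vfio_iova_type type)
	{
		/* start the typed search from the root of the dma rb-tree */
		return vfio_find_dma_from_node(iommu->dma_list.rb_node, start,
					       size, type);
	}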

> +}
> +
>  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
>  {
>  	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
> @@ -484,19 +515,21 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  	 * mappings within the range.
>  	 */
>  	if (iommu->v2) {
> -		dma = vfio_find_dma(iommu, unmap->iova, 0);
> +		dma = vfio_find_dma(iommu, unmap->iova, 0, VFIO_IOVA_USER);
>  		if (dma && dma->iova != unmap->iova) {
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
> -		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
> +		dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0,
> +				    VFIO_IOVA_USER);
>  		if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>  			ret = -EINVAL;
>  			goto unlock;
>  		}
>  	}
>  
> -	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
> +	while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size,
> +				    VFIO_IOVA_USER))) {
>  		if (!iommu->v2 && unmap->iova > dma->iova)
>  			break;
>  		unmapped += dma->size;
> @@ -600,7 +633,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  
>  	mutex_lock(&iommu->lock);
>  
> -	if (vfio_find_dma(iommu, iova, size)) {
> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>  		mutex_unlock(&iommu->lock);
>  		return -EEXIST;
>  	}

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 10/15] vfio/type1: Implement recursive vfio_find_dma_from_node
@ 2016-10-06 20:19     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:26 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> This patch handles the case where a node is encountered that matches
> the @start and @size arguments but not the @type argument.
> In that case, we need to skip that node and continue the search in the
> node's subtrees. If @start is below the node's base IOVA, we first
> search the left subtree. If that recursive search does not produce any
> match, we then search the right subtree recursively.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Acked-by: Alex Williamson <alex.williamson@redhat.com>
 
> ---
> 
> v10: creation
> ---
>  drivers/vfio/vfio_iommu_type1.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index cb7267a..65a4038 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -125,7 +125,17 @@ static struct vfio_dma *vfio_find_dma_from_node(struct rb_node *top,
>  	if (type == VFIO_IOVA_ANY || dma->type == type)
>  		return dma;
>  
> -	return NULL;
> +	/* restart 2 searches skipping the current node */
> +	if (start < dma->iova) {
> +		dma = vfio_find_dma_from_node(node->rb_left, start,
> +					      size, type);
> +		if (dma)
> +			return dma;
> +	}
> +	if (start + size > dma->iova + dma->size)
> +		dma = vfio_find_dma_from_node(node->rb_right, start,
> +					      size, type);
> +	return dma;
>  }
>  
>  /**
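
As a usage note, with the type argument in place a caller can restrict a
lookup to one kind of slot; for example, a hypothetical lookup of the
reserved MSI slot covering a window (names as in the quoted code):

	/*
	 * Return the reserved MSI slot intersecting [start, start + size),
	 * or NULL if only user mappings (or nothing) intersect it.
	 */
	dma = vfio_find_dma(iommu, start, size, VFIO_IOVA_RESERVED_MSI);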

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 11/15] vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-10-06 20:19     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:27 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
> let's implement the expected behavior for removal and replay.
> 
> As opposed to user dma slots, reserved IOVAs are not systematically bound
> to PAs and PAs are not pinned. VFIO just initializes the IOVA "aperture".
> IOVAs are allocated outside of the VFIO framework, by the MSI layer which
> is responsible to free and unmap them. The MSI mapping resources are freeed

nit, extra 'e', "freed"

> by the IOMMU driver on domain destruction.
> 
> On the creation of a new domain, the "replay" of a reserved slot simply
> needs to set the MSI aperture on the new domain.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v12 -> v13:
> - use dma-iommu iommu_get_dma_msi_region_cookie
> 
> v9 -> v10:
> - replay of a reserved slot sets the MSI aperture on the new domain
> - use VFIO_IOVA_RESERVED_MSI enum value instead of VFIO_IOVA_RESERVED
> 
> v7 -> v8:
> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>   on replay
> ---
>  drivers/vfio/Kconfig            |  1 +
>  drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index da6e2ce..673ec79 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -1,6 +1,7 @@
>  config VFIO_IOMMU_TYPE1
>  	tristate
>  	depends on VFIO
> +	select IOMMU_DMA
>  	default n
>  
>  config VFIO_IOMMU_SPAPR_TCE
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 65a4038..5bc5fc9 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
>  #include <linux/workqueue.h>
> +#include <linux/dma-iommu.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -387,7 +388,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	struct vfio_domain *domain, *d;
>  	long unlocked = 0;
>  
> -	if (!dma->size)
> +	if (!dma->size || dma->type != VFIO_IOVA_USER)
>  		return;
>  	/*
>  	 * We use the IOMMU to track the physical addresses, otherwise we'd
> @@ -724,6 +725,13 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  		dma = rb_entry(n, struct vfio_dma, node);
>  		iova = dma->iova;
>  
> +		if (dma->type == VFIO_IOVA_RESERVED_MSI) {
> +			ret = iommu_get_dma_msi_region_cookie(domain->domain,
> +						     dma->iova, dma->size);
> +			WARN_ON(ret);
> +			continue;
> +		}

Why is this a passable error?  We consider an iommu_map() error on any
entry a failure.
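
For instance, a sketch of what treating it as fatal could look like against
the quoted hunk (whether the already-replayed entries would then also need
unwinding is a separate question):

		if (dma->type == VFIO_IOVA_RESERVED_MSI) {
			ret = iommu_get_dma_msi_region_cookie(domain->domain,
							      dma->iova,
							      dma->size);
			if (ret)
				return ret;
			continue;
		}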

> +
>  		while (iova < dma->iova + dma->size) {
>  			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>  			size_t size;

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 12/15] vfio: Allow reserved msi iova registration
@ 2016-10-06 20:19     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:28 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the IOVA
> range is not yet mapped to any target address. The host kernel will use
> those IOVAs when needed, typically when MSIs are allocated.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> 
> ---
> v12 -> v13:
> - use iommu_get_dma_msi_region_cookie
> 
> v9 -> v10
> - use VFIO_IOVA_RESERVED_MSI enum value
> 
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
>   unregistration anymore since the reserved msi slot remains
>   until the container gets closed.
> 
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
> 
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
>   CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
> 
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
> 
> RFC v1 -> v1:
> - takes into account Alex comments, based on
>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
>   a reserved IOVA range. A single reserved iova region is allowed.
> ---
>  drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       | 10 +++++-
>  2 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5bc5fc9..c2f8bd9 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	vfio_lock_acct(-unlocked);
>  }
>  
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> +				dma_addr_t iova, size_t size)
> +{
> +	struct vfio_domain *d;
> +	int ret = 0;
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
> +		if (ret)
> +			break;
> +	}
> +	return ret;

Doesn't this need an unwind on failure loop?
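
A possible shape for such an unwind, assuming iommu_put_dma_cookie() is the
matching teardown for the MSI cookie (that pairing is an assumption here):

	static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
					 dma_addr_t iova, size_t size)
	{
		struct vfio_domain *d, *tmp;
		int ret = 0;

		list_for_each_entry(d, &iommu->domain_list, next) {
			ret = iommu_get_dma_msi_region_cookie(d->domain,
							      iova, size);
			if (ret)
				goto unwind;
		}
		return 0;

	unwind:
		/* release the cookie on the domains already set up */
		list_for_each_entry(tmp, &iommu->domain_list, next) {
			if (tmp == d)
				break;
			iommu_put_dma_cookie(tmp->domain);
		}
		return ret;
	}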

> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>  	vfio_unmap_unpin(iommu, dma);
> @@ -691,6 +705,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> +				   struct vfio_iommu_type1_dma_map *map)
> +{
> +	dma_addr_t iova = map->iova;
> +	size_t size = map->size;
> +	int ret = 0;
> +	struct vfio_dma *dma;
> +	unsigned long order;
> +	uint64_t mask;
> +
> +	/* Verify that none of our __u64 fields overflow */
> +	if (map->size != size || map->iova != iova)
> +		return -EINVAL;
> +
> +	order =  __ffs(vfio_pgsize_bitmap(iommu));
> +	mask = ((uint64_t)1 << order) - 1;
> +
> +	WARN_ON(mask & PAGE_MASK);
> +
> +	if (!size || (size | iova) & mask)
> +		return -EINVAL;
> +
> +	/* Don't allow IOVA address wrap */
> +	if (iova + size - 1 < iova)
> +		return -EINVAL;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> +		ret =  -EEXIST;
> +		goto unlock;
> +	}
> +
> +	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> +	if (!dma) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	dma->iova = iova;
> +	dma->size = size;
> +	dma->type = VFIO_IOVA_RESERVED_MSI;
> +
> +	ret = vfio_set_msi_aperture(iommu, iova, size);
> +	if (ret)
> +		goto free_unlock;
> +
> +	vfio_link_dma(iommu, dma);
> +	goto unlock;
> +
> +free_unlock:
> +	kfree(dma);
> +unlock:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static int vfio_bus_type(struct device *dev, void *data)
>  {
>  	struct bus_type **bus = data;
> @@ -1064,7 +1135,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>  		struct vfio_iommu_type1_dma_map map;
>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> -				VFIO_DMA_MAP_FLAG_WRITE;
> +				VFIO_DMA_MAP_FLAG_WRITE |
> +				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>  
> @@ -1074,6 +1146,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		if (map.argsz < minsz || map.flags & ~mask)
>  			return -EINVAL;
>  
> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> +			return vfio_register_msi_range(iommu, &map);
> +
>  		return vfio_dma_do_map(iommu, &map);
>  
>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>   *
>   * Map process virtual addresses to IO virtual addresses using the
>   * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>  	__u64	vaddr;				/* Process virtual address */
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>   * Caller sets argsz.  The actual unmapped size is returned in the size
>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>   * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_unmap {
>  	__u32	argsz;

What happens when an x86 user does a mapping with this new flag set?
It seems like we end up configuring everything just as we would on a
platform requiring MSI mapping, including setting the domain MSI
geometry.  Should we be testing the MSI geometry flag on the iommu to
see if this is supported?  Surprisingly few things seem to check that
flag.
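
A hypothetical shape for such a check, assuming the MSI geometry attribute
introduced earlier in the series exposes a "programmable" aperture flag (the
struct and field names here are assumptions):

	struct iommu_domain_msi_geometry geometry;
	struct vfio_domain *d;

	list_for_each_entry(d, &iommu->domain_list, next) {
		if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
					  &geometry))
			return -EINVAL;
		/* e.g. x86: MSIs are not IOMMU-translated, reject the flag */
		if (!geometry.programmable)
			return -EINVAL;
	}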

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v13 12/15] vfio: Allow reserved msi iova registration
@ 2016-10-06 20:19     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu,  6 Oct 2016 08:45:28 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> The user is allowed to register a reserved MSI IOVA range by using the
> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> This region is stored in the vfio_dma rb tree. At that point the iova
> range is not mapped to any target address yet. The host kernel will use
> those iova when needed, typically when MSIs are allocated.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> 
> ---
> v12 -> v13:
> - use iommu_get_dma_msi_region_cookie
> 
> v9 -> v10
> - use VFIO_IOVA_RESERVED_MSI enum value
> 
> v7 -> v8:
> - use iommu_msi_set_aperture function. There is no notion of
>   unregistration anymore since the reserved msi slot remains
>   until the container gets closed.
> 
> v6 -> v7:
> - use iommu_free_reserved_iova_domain
> - convey prot attributes downto dma-reserved-iommu iova domain creation
> - reserved bindings teardown now performed on iommu domain destruction
> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> - change title
> - pass the protection attribute to dma-reserved-iommu API
> 
> v3 -> v4:
> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> - protect vfio_register_reserved_iova_range implementation with
>   CONFIG_IOMMU_DMA_RESERVED
> - handle unregistration by user-space and on vfio_iommu_type1 release
> 
> v1 -> v2:
> - set returned value according to alloc_reserved_iova_domain result
> - free the iova domains in case any error occurs
> 
> RFC v1 -> v1:
> - takes into account Alex comments, based on
>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> - use the existing dma map/unmap ioctl interface with a flag to register
>   a reserved IOVA range. A single reserved iova region is allowed.
> ---
>  drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       | 10 +++++-
>  2 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5bc5fc9..c2f8bd9 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	vfio_lock_acct(-unlocked);
>  }
>  
> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> +				dma_addr_t iova, size_t size)
> +{
> +	struct vfio_domain *d;
> +	int ret = 0;
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
> +		if (ret)
> +			break;
> +	}
> +	return ret;

Doesn't this need an unwind on failure loop?
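
A minimal sketch of such an unwind (assuming a teardown counterpart to
iommu_get_dma_msi_region_cookie() exists; the iommu_put_dma_msi_region_cookie()
used below is purely illustrative, the series may name it differently):

static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
				 dma_addr_t iova, size_t size)
{
	struct vfio_domain *d, *tmp;
	int ret = 0;

	list_for_each_entry(d, &iommu->domain_list, next) {
		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* tear down the cookies already set up on the preceding domains */
	list_for_each_entry(tmp, &iommu->domain_list, next) {
		if (tmp == d)
			break;
		iommu_put_dma_msi_region_cookie(tmp->domain); /* hypothetical helper */
	}
	return ret;
}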

> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>  	vfio_unmap_unpin(iommu, dma);
> @@ -691,6 +705,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
> +				   struct vfio_iommu_type1_dma_map *map)
> +{
> +	dma_addr_t iova = map->iova;
> +	size_t size = map->size;
> +	int ret = 0;
> +	struct vfio_dma *dma;
> +	unsigned long order;
> +	uint64_t mask;
> +
> +	/* Verify that none of our __u64 fields overflow */
> +	if (map->size != size || map->iova != iova)
> +		return -EINVAL;
> +
> +	order =  __ffs(vfio_pgsize_bitmap(iommu));
> +	mask = ((uint64_t)1 << order) - 1;
> +
> +	WARN_ON(mask & PAGE_MASK);
> +
> +	if (!size || (size | iova) & mask)
> +		return -EINVAL;
> +
> +	/* Don't allow IOVA address wrap */
> +	if (iova + size - 1 < iova)
> +		return -EINVAL;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
> +		ret =  -EEXIST;
> +		goto unlock;
> +	}
> +
> +	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> +	if (!dma) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	dma->iova = iova;
> +	dma->size = size;
> +	dma->type = VFIO_IOVA_RESERVED_MSI;
> +
> +	ret = vfio_set_msi_aperture(iommu, iova, size);
> +	if (ret)
> +		goto free_unlock;
> +
> +	vfio_link_dma(iommu, dma);
> +	goto unlock;
> +
> +free_unlock:
> +	kfree(dma);
> +unlock:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static int vfio_bus_type(struct device *dev, void *data)
>  {
>  	struct bus_type **bus = data;
> @@ -1064,7 +1135,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>  		struct vfio_iommu_type1_dma_map map;
>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
> -				VFIO_DMA_MAP_FLAG_WRITE;
> +				VFIO_DMA_MAP_FLAG_WRITE |
> +				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>  
>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>  
> @@ -1074,6 +1146,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		if (map.argsz < minsz || map.flags & ~mask)
>  			return -EINVAL;
>  
> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
> +			return vfio_register_msi_range(iommu, &map);
> +
>  		return vfio_dma_do_map(iommu, &map);
>  
>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 255a211..4a9dbc2 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>   *
>   * Map process virtual addresses to IO virtual addresses using the
>   * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
> + *
> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
> + * IOVA region that will be used on some platforms to map the host MSI frames.
> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> + * IOVA region stays until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
> +/* reserved iova for MSI vectors*/
> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>  	__u64	vaddr;				/* Process virtual address */
>  	__u64	iova;				/* IO virtual address */
>  	__u64	size;				/* Size of mapping (bytes) */
> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>   * Caller sets argsz.  The actual unmapped size is returned in the size
>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>   * or size different from those used in the original mapping call will
> - * succeed.
> + * succeed. Once registered, an MSI region cannot be unmapped and stays
> + * until the container is closed.
>   */
>  struct vfio_iommu_type1_dma_unmap {
>  	__u32	argsz;

What happens when an x86 user does a mapping with this new flag set?
It seems like we end up configuring everything just as we would on a
platform requiring MSI mapping, including setting the domain MSI
geometry.  Should we be testing the MSI geometry flag on the iommu to
see if this is supported?  Surprisingly few things seem to check that
flag.
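
One way to do that, as a rough sketch (it reuses the DOMAIN_ATTR_MSI_GEOMETRY
attribute and the iommu_msi_supported field introduced earlier in the series;
placement and error code are illustrative):

static bool vfio_iommu_translates_msis(struct vfio_iommu *iommu)
{
	struct iommu_domain_msi_geometry geometry = { };
	struct vfio_domain *d;

	/* all domains are expected to share the same MSI geometry property */
	list_for_each_entry(d, &iommu->domain_list, next) {
		if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
					  &geometry) ||
		    !geometry.iommu_msi_supported)
			return false;
	}
	return true;
}

vfio_register_msi_range() could then fail with -EINVAL (or similar) when this
returns false, so an x86 user setting the flag gets an error rather than a
silently pointless registration.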

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 13/15] vfio/type1: Check doorbell safety
@ 2016-10-06 20:19     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:29 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
> by the msi controller.
> 
> Since we currently have no way to detect whether the MSI controller is
> upstream or downstream to the IOMMU we rely on the MSI doorbell information
> registered by the interrupt controllers. In case at least one doorbell
> does not implement proper isolation, we state the assignment is unsafe
> with regard to interrupts. This is a coase assessment but should allow to
> wait for a better system description.

s/coase/coarse/

> 
> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
> removed in next patch.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v9 -> v10:
> - coarse safety assessment based on MSI doorbell info
> 
> v3 -> v4:
> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>   and irq_remapping into safe_irq_domains
> 
> v2 -> v3:
> - protect vfio_msi_parent_irq_remapping_capable with
>   CONFIG_GENERIC_MSI_IRQ_DOMAIN
> ---
>  drivers/vfio/vfio_iommu_type1.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c2f8bd9..dc3ee5d 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,7 @@
>  #include <linux/vfio.h>
>  #include <linux/workqueue.h>
>  #include <linux/dma-iommu.h>
> +#include <linux/msi-doorbell.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -921,8 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	INIT_LIST_HEAD(&domain->group_list);
>  	list_add(&group->next, &domain->group_list);
>  
> +	/*
> +	 * to advertise safe interrupts either the IOMMU or the MSI controllers
> +	 * must support IRQ remapping (aka. interrupt translation)
> +	 */
>  	if (!allow_unsafe_interrupts &&
> -	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> +	    (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
> +		!msi_doorbell_safe())) {

I assume this is why you want msi_doorbell_safe() to return true when
!CONFIG_MSI_DOORBELL, but don't we really want to look at the iommu
geometry first, to see whether MSI mapping is supported, and then, once
we know the iommu is participating in MSI mapping, check whether it's
safe?
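
Roughly something like this (sketch only; it reuses the DOMAIN_ATTR_MSI_GEOMETRY
attribute from earlier in the series, and variable names follow the surrounding
attach_group code):

	struct iommu_domain_msi_geometry geometry = { };
	bool safe_irqs;

	/* sketch: does this IOMMU put MSIs through translation at all? */
	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_MSI_GEOMETRY,
			      &geometry);

	if (geometry.iommu_msi_supported)
		/* MSIs are translated: isolation depends on the doorbells */
		safe_irqs = msi_doorbell_safe();
	else
		/* x86-style: isolation is the IOMMU's IRQ remapping capability */
		safe_irqs = iommu_capable(bus, IOMMU_CAP_INTR_REMAP);

and then test !safe_irqs in the existing !allow_unsafe_interrupts check.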

>  		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
>  		       __func__);
>  		ret = -EPERM;

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-06 20:20     ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:20 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu,  6 Oct 2016 08:45:31 +0000
Eric Auger <eric.auger@redhat.com> wrote:

> This patch allows the user-space to retrieve the MSI geometry. The
> implementation is based on capability chains, now also added to
> VFIO_IOMMU_GET_INFO.
> 
> The returned info comprise:
> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>   in the positive, the start/end of the aperture,
> - or whether the IOVA aperture need to be set by the userspace. In that
>   case, the size and alignment of the IOVA window to be provided are
>   returned.
> 
> In case the userspace must provide the IOVA aperture, we currently report
> a size/alignment based on all the doorbells registered by the host kernel.
> This may exceed the actual needs.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v11 -> v11:
> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
> 
> v9 -> v10:
> - move cap_offset after iova_pgsizes
> - replace __u64 alignment by __u32 order
> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>   fix alignment
> - call msi-doorbell API to compute the size/alignment
> 
> v8 -> v9:
> - use iommu_msi_supported flag instead of programmable
> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>   capability chain, reporting the MSI geometry
> 
> v7 -> v8:
> - use iommu_domain_msi_geometry
> 
> v6 -> v7:
> - remove the computation of the number of IOVA pages to be provisionned.
>   This number depends on the domain/group/device topology which can
>   dynamically change. Let's rely instead rely on an arbitrary max depending
>   on the system
> 
> v4 -> v5:
> - move msi_info and ret declaration within the conditional code
> 
> v3 -> v4:
> - replace former vfio_domains_require_msi_mapping by
>   more complex computation of MSI mapping requirements, especially the
>   number of pages to be provided by the user-space.
> - reword patch title
> 
> RFC v1 -> v1:
> - derived from
>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> - renamed allow_msi_reconfig into require_msi_mapping
> - fixed VFIO_IOMMU_GET_INFO
> ---
>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
>  2 files changed, 108 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index dc3ee5d..ce5e7eb 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -38,6 +38,8 @@
>  #include <linux/workqueue.h>
>  #include <linux/dma-iommu.h>
>  #include <linux/msi-doorbell.h>
> +#include <linux/irqdomain.h>
> +#include <linux/msi.h>
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> +				     struct vfio_info_cap *caps)
> +{
> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
> +	struct iommu_domain_msi_geometry msi_geometry;
> +	struct vfio_info_cap_header *header;
> +	struct vfio_domain *d;
> +	bool reserved;
> +	size_t size;
> +
> +	mutex_lock(&iommu->lock);
> +	/* All domains have same require_msi_map property, pick first */
> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> +			      &msi_geometry);
> +	reserved = !msi_geometry.iommu_msi_supported;
> +
> +	mutex_unlock(&iommu->lock);
> +
> +	size = sizeof(*vfio_msi_geometry);
> +	header = vfio_info_cap_add(caps, size,
> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> +
> +	if (IS_ERR(header))
> +		return PTR_ERR(header);
> +
> +	vfio_msi_geometry = container_of(header,
> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> +				header);
> +
> +	vfio_msi_geometry->flags = reserved;

Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED

> +	if (reserved) {
> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;

But maybe nobody has set these, did you intend to use
iommu_domain_msi_aperture_valid(), which you defined early on but never
used?

> +		return 0;
> +	}
> +
> +	vfio_msi_geometry->order = order;

I'm tempted to suggest that a user could do the same math on their own
since we provide the supported bitmap already... could it ever not be
the same? 
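
(For what it's worth, the userspace side of that math is a one-liner -- sketch,
assuming the usual smallest-supported-page-size convention:)

/* order of the smallest supported IOMMU page size, from the pgsize bitmap */
unsigned int order = __builtin_ctzll(info.iova_pgsizes);
uint64_t alignment = 1ULL << order;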

> +	/*
> +	 * we compute a system-wide requirement based on all the registered
> +	 * doorbells
> +	 */
> +	vfio_msi_geometry->size =
> +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
> +
> +	return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		}
>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>  		struct vfio_iommu_type1_info info;
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		int ret;
>  
> -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
>  
>  		if (copy_from_user(&info, (void __user *)arg, minsz))
>  			return -EFAULT;
> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
> +		ret = compute_msi_geometry_caps(iommu, &caps);
> +		if (ret)
> +			return ret;
> +
> +		if (caps.size) {
> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> +			if (info.argsz < sizeof(info) + caps.size) {
> +				info.argsz = sizeof(info) + caps.size;
> +				info.cap_offset = 0;
> +			} else {
> +				vfio_info_cap_shift(&caps, sizeof(info));
> +				if (copy_to_user((void __user *)arg +
> +						sizeof(info), caps.buf,
> +						caps.size)) {
> +					kfree(caps.buf);
> +					return -EFAULT;
> +				}
> +				info.cap_offset = sizeof(info);
> +			}
> +
> +			kfree(caps.buf);
> +		}
> +
>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>  			-EFAULT : 0;
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 4a9dbc2..8dae013 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
>  	__u32	argsz;
>  	__u32	flags;
>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> +	__u32	__resv;
> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> +};

I understand the padding, but not the ordering.  Why not end with
padding?
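
i.e. presumably a layout along these lines (sketch of what the comment seems
to suggest; flag #defines unchanged):

struct vfio_iommu_type1_info {
	__u32	argsz;
	__u32	flags;
	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
	__u32	cap_offset;	/* Offset within info struct of first cap */
	__u32	__resv;		/* padding kept at the end */
};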

> +
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> +
> +/*
> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> + *   this is typically the case on x86 host. The userspace is not allowed
> + *   to map userspace memory at IOVAs intersecting this range using
> + *   VFIO_IOMMU_MAP_DMA.
> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> + *   in that case the userspace must provide an IOVA window characterized by
> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> + */
> +struct vfio_iommu_type1_info_cap_msi_geometry {
> +	struct vfio_info_cap_header header;
> +	__u32 flags;
> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
> +	/* not reserved */
> +	__u32 order; /* iommu page order used for aperture alignment*/
> +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
> +	/* reserved */
> +	__u64 aperture_start;
> +	__u64 aperture_end;

Should these be a union?  We never set them both.  Should the !reserved
case have a flag as well, so the user can positively identify what's
being provided?
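
For instance (sketch only; the _USER flag name is illustrative):

struct vfio_iommu_type1_info_cap_msi_geometry {
	struct vfio_info_cap_header header;
	__u32 flags;
#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved aperture */
#define VFIO_IOMMU_MSI_GEOMETRY_USER	 (1 << 1) /* illustrative: user-provided window */
	__u32 order;			/* only meaningful with _USER */
	union {
		struct {		/* valid with _RESERVED */
			__u64 aperture_start;
			__u64 aperture_end;
		};
		__u64 size;		/* valid with _USER: bytes to provide */
	};
};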

>  };
>  
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
>   * IOVA region that will be used on some platforms to map the host MSI frames.
>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>   * IOVA region stays until the container is closed.
> + * The requirement for provisioning such reserved IOVA range can be checked by
> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>   */
>  struct vfio_iommu_type1_dma_map {
>  	__u32	argsz;

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-06 20:42       ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-06 20:42 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, robin.murphy,
	will.deacon, joro, tglx, jason, linux-arm-kernel, kvm, drjones,
	linux-kernel, Bharat.Bhushan, pranav.sawargaonkar, p.fedin,
	iommu, Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On Thu, 6 Oct 2016 14:20:40 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu,  6 Oct 2016 08:45:31 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
> > This patch allows the user-space to retrieve the MSI geometry. The
> > implementation is based on capability chains, now also added to
> > VFIO_IOMMU_GET_INFO.
> > 
> > The returned info comprise:
> > - whether the MSI IOVA are constrained to a reserved range (x86 case) and
> >   in the positive, the start/end of the aperture,
> > - or whether the IOVA aperture need to be set by the userspace. In that
> >   case, the size and alignment of the IOVA window to be provided are
> >   returned.
> > 
> > In case the userspace must provide the IOVA aperture, we currently report
> > a size/alignment based on all the doorbells registered by the host kernel.
> > This may exceed the actual needs.
> > 
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > 
> > ---
> > v11 -> v11:
> > - msi_doorbell_pages was renamed msi_doorbell_calc_pages
> > 
> > v9 -> v10:
> > - move cap_offset after iova_pgsizes
> > - replace __u64 alignment by __u32 order
> > - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
> >   fix alignment
> > - call msi-doorbell API to compute the size/alignment
> > 
> > v8 -> v9:
> > - use iommu_msi_supported flag instead of programmable
> > - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> >   capability chain, reporting the MSI geometry
> > 
> > v7 -> v8:
> > - use iommu_domain_msi_geometry
> > 
> > v6 -> v7:
> > - remove the computation of the number of IOVA pages to be provisionned.
> >   This number depends on the domain/group/device topology which can
> >   dynamically change. Let's rely instead rely on an arbitrary max depending
> >   on the system
> > 
> > v4 -> v5:
> > - move msi_info and ret declaration within the conditional code
> > 
> > v3 -> v4:
> > - replace former vfio_domains_require_msi_mapping by
> >   more complex computation of MSI mapping requirements, especially the
> >   number of pages to be provided by the user-space.
> > - reword patch title
> > 
> > RFC v1 -> v1:
> > - derived from
> >   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> > - renamed allow_msi_reconfig into require_msi_mapping
> > - fixed VFIO_IOMMU_GET_INFO
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> >  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
> >  2 files changed, 108 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index dc3ee5d..ce5e7eb 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -38,6 +38,8 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/dma-iommu.h>
> >  #include <linux/msi-doorbell.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/msi.h>
> >  
> >  #define DRIVER_VERSION  "0.2"
> >  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> > @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >  	return ret;
> >  }
> >  
> > +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> > +				     struct vfio_info_cap *caps)
> > +{
> > +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> > +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
> > +	struct iommu_domain_msi_geometry msi_geometry;
> > +	struct vfio_info_cap_header *header;
> > +	struct vfio_domain *d;
> > +	bool reserved;
> > +	size_t size;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	/* All domains have same require_msi_map property, pick first */
> > +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> > +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> > +			      &msi_geometry);
> > +	reserved = !msi_geometry.iommu_msi_supported;
> > +
> > +	mutex_unlock(&iommu->lock);
> > +
> > +	size = sizeof(*vfio_msi_geometry);
> > +	header = vfio_info_cap_add(caps, size,
> > +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> > +
> > +	if (IS_ERR(header))
> > +		return PTR_ERR(header);
> > +
> > +	vfio_msi_geometry = container_of(header,
> > +				struct vfio_iommu_type1_info_cap_msi_geometry,
> > +				header);
> > +
> > +	vfio_msi_geometry->flags = reserved;  
> 
> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
> 
> > +	if (reserved) {
> > +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> > +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;  
> 
> But maybe nobody has set these, did you intend to use
> iommu_domain_msi_aperture_valid(), which you defined early on but never
> used?
> 
> > +		return 0;
> > +	}
> > +
> > +	vfio_msi_geometry->order = order;  
> 
> I'm tempted to suggest that a user could do the same math on their own
> since we provide the supported bitmap already... could it ever not be
> the same? 
> 
> > +	/*
> > +	 * we compute a system-wide requirement based on all the registered
> > +	 * doorbells
> > +	 */
> > +	vfio_msi_geometry->size =
> > +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
> > +
> > +	return 0;
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)
> >  {
> > @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		}
> >  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> >  		struct vfio_iommu_type1_info info;
> > +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> > +		int ret;
> >  
> > -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> > +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
> >  
> >  		if (copy_from_user(&info, (void __user *)arg, minsz))
> >  			return -EFAULT;
> > @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  
> >  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >  
> > +		ret = compute_msi_geometry_caps(iommu, &caps);
> > +		if (ret)
> > +			return ret;
> > +
> > +		if (caps.size) {
> > +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> > +			if (info.argsz < sizeof(info) + caps.size) {
> > +				info.argsz = sizeof(info) + caps.size;
> > +				info.cap_offset = 0;
> > +			} else {
> > +				vfio_info_cap_shift(&caps, sizeof(info));
> > +				if (copy_to_user((void __user *)arg +
> > +						sizeof(info), caps.buf,
> > +						caps.size)) {
> > +					kfree(caps.buf);
> > +					return -EFAULT;
> > +				}
> > +				info.cap_offset = sizeof(info);
> > +			}
> > +
> > +			kfree(caps.buf);
> > +		}
> > +
> >  		return copy_to_user((void __user *)arg, &info, minsz) ?
> >  			-EFAULT : 0;
> >  
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 4a9dbc2..8dae013 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
> >  	__u32	argsz;
> >  	__u32	flags;
> >  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> > -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> > +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> > +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> > +	__u32	__resv;
> > +	__u32   cap_offset;	/* Offset within info struct of first cap */
> > +};  
> 
> I understand the padding, but not the ordering.  Why not end with
> padding?
> 
> > +
> > +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> > +
> > +/*
> > + * The MSI geometry capability allows to report the MSI IOVA geometry:
> > + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> > + *   whose boundaries are given by [@aperture_start, @aperture_end].
> > + *   this is typically the case on x86 host. The userspace is not allowed
> > + *   to map userspace memory at IOVAs intersecting this range using
> > + *   VFIO_IOMMU_MAP_DMA.
> > + * - or the MSI IOVAs are not requested to belong to any reserved range;
> > + *   in that case the userspace must provide an IOVA window characterized by
> > + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> > + */
> > +struct vfio_iommu_type1_info_cap_msi_geometry {
> > +	struct vfio_info_cap_header header;
> > +	__u32 flags;
> > +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
> > +	/* not reserved */
> > +	__u32 order; /* iommu page order used for aperture alignment*/
> > +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
> > +	/* reserved */
> > +	__u64 aperture_start;
> > +	__u64 aperture_end;  
> 
> Should these be a union?  We never set them both.  Should the !reserved
> case have a flag as well, so the user can positively identify what's
> being provided?

Actually, is there really any need to fit both of these within the same
structure?  Part of the idea of the capability chains is we can create
a capability for each new thing we want to describe.  So, we could
simply define a generic reserved IOVA range capability with a 'start'
and 'end' and then another capability to define MSI mapping
requirements.  Thanks,

Alex
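
Something along these lines, perhaps (sketch only; capability names and ID
values are illustrative):

#define VFIO_IOMMU_TYPE1_INFO_CAP_RESV_IOVA_RANGE	1	/* illustrative ID */

/* an IOVA range the user must leave out of regular DMA mappings */
struct vfio_iommu_type1_info_cap_resv_iova_range {
	struct vfio_info_cap_header header;
	__u64 start;
	__u64 end;
};

#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_MAP_REQUIREMENTS	2	/* illustrative ID */

/* the user must register an MSI IOVA window with these characteristics */
struct vfio_iommu_type1_info_cap_msi_map_requirements {
	struct vfio_info_cap_header header;
	__u32 order;	/* required alignment, as an IOMMU page order */
	__u32 __resv;
	__u64 size;	/* minimum window size in bytes */
};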
 
> >  };
> >  
> >  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> > @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
> >   * IOVA region that will be used on some platforms to map the host MSI frames.
> >   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> >   * IOVA region stays until the container is closed.
> > + * The requirement for provisioning such reserved IOVA range can be checked by
> > + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> >   */
> >  struct vfio_iommu_type1_dma_map {
> >  	__u32	argsz;  
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

> >  
> >  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >  
> > +		ret = compute_msi_geometry_caps(iommu, &caps);
> > +		if (ret)
> > +			return ret;
> > +
> > +		if (caps.size) {
> > +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> > +			if (info.argsz < sizeof(info) + caps.size) {
> > +				info.argsz = sizeof(info) + caps.size;
> > +				info.cap_offset = 0;
> > +			} else {
> > +				vfio_info_cap_shift(&caps, sizeof(info));
> > +				if (copy_to_user((void __user *)arg +
> > +						sizeof(info), caps.buf,
> > +						caps.size)) {
> > +					kfree(caps.buf);
> > +					return -EFAULT;
> > +				}
> > +				info.cap_offset = sizeof(info);
> > +			}
> > +
> > +			kfree(caps.buf);
> > +		}
> > +
> >  		return copy_to_user((void __user *)arg, &info, minsz) ?
> >  			-EFAULT : 0;
> >  
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 4a9dbc2..8dae013 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
> >  	__u32	argsz;
> >  	__u32	flags;
> >  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> > -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> > +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> > +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> > +	__u32	__resv;
> > +	__u32   cap_offset;	/* Offset within info struct of first cap */
> > +};  
> 
> I understand the padding, but not the ordering.  Why not end with
> padding?
> 
> > +
> > +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> > +
> > +/*
> > + * The MSI geometry capability allows to report the MSI IOVA geometry:
> > + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> > + *   whose boundaries are given by [@aperture_start, @aperture_end].
> > + *   this is typically the case on x86 host. The userspace is not allowed
> > + *   to map userspace memory at IOVAs intersecting this range using
> > + *   VFIO_IOMMU_MAP_DMA.
> > + * - or the MSI IOVAs are not requested to belong to any reserved range;
> > + *   in that case the userspace must provide an IOVA window characterized by
> > + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> > + */
> > +struct vfio_iommu_type1_info_cap_msi_geometry {
> > +	struct vfio_info_cap_header header;
> > +	__u32 flags;
> > +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
> > +	/* not reserved */
> > +	__u32 order; /* iommu page order used for aperture alignment*/
> > +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
> > +	/* reserved */
> > +	__u64 aperture_start;
> > +	__u64 aperture_end;  
> 
> Should these be a union?  We never set them both.  Should the !reserved
> case have a flag as well, so the user can positively identify what's
> being provided?

Actually, is there really any need to fit both of these within the same
structure?  Part of the idea of the capability chains is we can create
a capability for each new thing we want to describe.  So, we could
simply define a generic reserved IOVA range capability with a 'start'
and 'end' and then another capability to define MSI mapping
requirements.  Thanks,

Alex
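
For illustration, such split capabilities could look roughly like the
following (hypothetical structure names and capability IDs, nothing this
series actually defines):

#define VFIO_IOMMU_TYPE1_INFO_CAP_RESV_IOVA_RANGE	2  /* hypothetical ID */

/* A reserved IOVA range the user must not target with VFIO_IOMMU_MAP_DMA */
struct vfio_iommu_type1_info_cap_resv_iova_range {
	struct vfio_info_cap_header header;
	__u64 start;	/* first IOVA of the reserved range */
	__u64 end;	/* last IOVA of the reserved range */
};

#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_MAP_REQS		3  /* hypothetical ID */

/* Requirements for the MSI IOVA window the user has to supply */
struct vfio_iommu_type1_info_cap_msi_map_reqs {
	struct vfio_info_cap_header header;
	__u32 order;	/* IOMMU page order used for alignment */
	__u32 __resv;
	__u64 size;	/* minimum window size in bytes */
};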
 
> >  };
> >  
> >  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> > @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
> >   * IOVA region that will be used on some platforms to map the host MSI frames.
> >   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
> >   * IOVA region stays until the container is closed.
> > + * The requirement for provisioning such reserved IOVA range can be checked by
> > + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
> >   */
> >  struct vfio_iommu_type1_dma_map {
> >  	__u32	argsz;  
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-07 17:10         ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

Hi Alex,

On 06/10/2016 22:42, Alex Williamson wrote:
> On Thu, 6 Oct 2016 14:20:40 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Thu,  6 Oct 2016 08:45:31 +0000
>> Eric Auger <eric.auger@redhat.com> wrote:
>>
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.
>>>
>>> The returned info comprises:
>>> - whether the MSI IOVAs are constrained to a reserved range (the x86 case)
>>>   and, if so, the start/end of the aperture,
>>> - or whether the IOVA aperture needs to be set by the userspace. In that
>>>   case, the size and alignment of the IOVA window to be provided are
>>>   returned.
>>>
>>> In case the userspace must provide the IOVA aperture, we currently report
>>> a size/alignment based on all the doorbells registered by the host kernel.
>>> This may exceed the actual needs.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>> v10 -> v11:
>>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
>>>
>>> v9 -> v10:
>>> - move cap_offset after iova_pgsizes
>>> - replace __u64 alignment by __u32 order
>>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>>>   fix alignment
>>> - call msi-doorbell API to compute the size/alignment
>>>
>>> v8 -> v9:
>>> - use iommu_msi_supported flag instead of programmable
>>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>>   capability chain, reporting the MSI geometry
>>>
>>> v7 -> v8:
>>> - use iommu_domain_msi_geometry
>>>
>>> v6 -> v7:
>>> - remove the computation of the number of IOVA pages to be provisioned.
>>>   This number depends on the domain/group/device topology, which can
>>>   dynamically change. Let's instead rely on an arbitrary max depending
>>>   on the system
>>>
>>> v4 -> v5:
>>> - move msi_info and ret declaration within the conditional code
>>>
>>> v3 -> v4:
>>> - replace former vfio_domains_require_msi_mapping by
>>>   more complex computation of MSI mapping requirements, especially the
>>>   number of pages to be provided by the user-space.
>>> - reword patch title
>>>
>>> RFC v1 -> v1:
>>> - derived from
>>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>>> - renamed allow_msi_reconfig into require_msi_mapping
>>> - fixed VFIO_IOMMU_GET_INFO
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>>>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
>>>  2 files changed, 108 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index dc3ee5d..ce5e7eb 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -38,6 +38,8 @@
>>>  #include <linux/workqueue.h>
>>>  #include <linux/dma-iommu.h>
>>>  #include <linux/msi-doorbell.h>
>>> +#include <linux/irqdomain.h>
>>> +#include <linux/msi.h>
>>>  
>>>  #define DRIVER_VERSION  "0.2"
>>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>  	return ret;
>>>  }
>>>  
>>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>>> +				     struct vfio_info_cap *caps)
>>> +{
>>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>>> +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	struct iommu_domain_msi_geometry msi_geometry;
>>> +	struct vfio_info_cap_header *header;
>>> +	struct vfio_domain *d;
>>> +	bool reserved;
>>> +	size_t size;
>>> +
>>> +	mutex_lock(&iommu->lock);
>>> +	/* All domains have same require_msi_map property, pick first */
>>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>>> +			      &msi_geometry);
>>> +	reserved = !msi_geometry.iommu_msi_supported;
>>> +
>>> +	mutex_unlock(&iommu->lock);
>>> +
>>> +	size = sizeof(*vfio_msi_geometry);
>>> +	header = vfio_info_cap_add(caps, size,
>>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>>> +
>>> +	if (IS_ERR(header))
>>> +		return PTR_ERR(header);
>>> +
>>> +	vfio_msi_geometry = container_of(header,
>>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
>>> +				header);
>>> +
>>> +	vfio_msi_geometry->flags = reserved;  
>>
>> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
>>
>>> +	if (reserved) {
>>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;  
>>
>> But maybe nobody has set these, did you intend to use
>> iommu_domain_msi_aperture_valid(), which you defined early on but never
>> used?
>>
>>> +		return 0;
>>> +	}
>>> +
>>> +	vfio_msi_geometry->order = order;  
>>
>> I'm tempted to suggest that a user could do the same math on their own
>> since we provide the supported bitmap already... could it ever not be
>> the same? 
>>
>>> +	/*
>>> +	 * we compute a system-wide requirement based on all the registered
>>> +	 * doorbells
>>> +	 */
>>> +	vfio_msi_geometry->size =
>>> +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>  				   unsigned int cmd, unsigned long arg)
>>>  {
>>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>  		}
>>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>>>  		struct vfio_iommu_type1_info info;
>>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>> +		int ret;
>>>  
>>> -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>> +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
>>>  
>>>  		if (copy_from_user(&info, (void __user *)arg, minsz))
>>>  			return -EFAULT;
>>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>  
>>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>  
>>> +		ret = compute_msi_geometry_caps(iommu, &caps);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		if (caps.size) {
>>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
>>> +			if (info.argsz < sizeof(info) + caps.size) {
>>> +				info.argsz = sizeof(info) + caps.size;
>>> +				info.cap_offset = 0;
>>> +			} else {
>>> +				vfio_info_cap_shift(&caps, sizeof(info));
>>> +				if (copy_to_user((void __user *)arg +
>>> +						sizeof(info), caps.buf,
>>> +						caps.size)) {
>>> +					kfree(caps.buf);
>>> +					return -EFAULT;
>>> +				}
>>> +				info.cap_offset = sizeof(info);
>>> +			}
>>> +
>>> +			kfree(caps.buf);
>>> +		}
>>> +
>>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>>>  			-EFAULT : 0;
>>>  
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 4a9dbc2..8dae013 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
>>>  	__u32	argsz;
>>>  	__u32	flags;
>>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
>>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
>>> +	__u32	__resv;
>>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
>>> +};  
>>
>> I understand the padding, but not the ordering.  Why not end with
>> padding?
>>
>>> +
>>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
>>> +
>>> +/*
>>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
>>> + *   this is typically the case on x86 host. The userspace is not allowed
>>> + *   to map userspace memory at IOVAs intersecting this range using
>>> + *   VFIO_IOMMU_MAP_DMA.
>>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>>> + *   in that case the userspace must provide an IOVA window characterized by
>>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>>> + */
>>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>>> +	struct vfio_info_cap_header header;
>>> +	__u32 flags;
>>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
>>> +	/* not reserved */
>>> +	__u32 order; /* iommu page order used for aperture alignment*/
>>> +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
>>> +	/* reserved */
>>> +	__u64 aperture_start;
>>> +	__u64 aperture_end;  
>>
>> Should these be a union?  We never set them both.  Should the !reserved
>> case have a flag as well, so the user can positively identify what's
>> being provided?
> 
> Actually, is there really any need to fit both of these within the same
> structure?  Part of the idea of the capability chains is we can create
> a capability for each new thing we want to describe.  So, we could
> simply define a generic reserved IOVA range capability with a 'start'
> and 'end' and then another capability to define MSI mapping
> requirements.  Thanks,
Yes, your suggested approach makes sense to me.

One reason why I proceeded that way is that we are mixing things at the
iommu.h level too. Personally I would have preferred to separate things:
1) add a new IOMMU_CAP_TRANSLATE_MSI capability in iommu_cap
2) rename iommu_msi_supported into a "programmable" bool, reporting
whether the aperture is reserved or programmable (a rough sketch of this
split follows below).

In the early releases I think it was split like that, but over the
iterations we drifted towards a mixed description.
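
A minimal sketch of that separation (hypothetical; the field types mirror
the series' iommu_domain_msi_geometry and the exact naming is of course
open for discussion):

/* include/linux/iommu.h, sketch only */
enum iommu_cap {
	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA */
	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
	IOMMU_CAP_TRANSLATE_MSI,	/* MSI transactions are translated by the IOMMU */
};

struct iommu_domain_msi_geometry {
	dma_addr_t	aperture_start;	/* first IOVA usable for MSIs */
	dma_addr_t	aperture_end;	/* last IOVA usable for MSIs */
	bool		programmable;	/* true: userspace sets the aperture,
					 * false: the aperture is reserved/fixed
					 */
};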

What do you think?

Thank you for the whole review!

Eric

> 
> Alex
>  
>>>  };
>>>  
>>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>> @@ -503,6 +531,8 @@ struct vfio_iommu_type1_info {
>>>   * IOVA region that will be used on some platforms to map the host MSI frames.
>>>   * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>>>   * IOVA region stays until the container is closed.
>>> + * The requirement for provisioning such reserved IOVA range can be checked by
>>> + * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY capability.
>>>   */
>>>  struct vfio_iommu_type1_dma_map {
>>>  	__u32	argsz;  
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 12/15] vfio: Allow reserved msi iova registration
@ 2016-10-07 17:11       ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

Hi Alex,

On 06/10/2016 22:19, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:28 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
>>
>> ---
>> v12 -> v13:
>> - use iommu_get_dma_msi_region_cookie
>>
>> v9 -> v10
>> - use VFIO_IOVA_RESERVED_MSI enum value
>>
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>>   unregistration anymore since the reserved msi slot remains
>>   until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>>   CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>>   a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++++++++++++++++++++-
>>  include/uapi/linux/vfio.h       | 10 +++++-
>>  2 files changed, 85 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 5bc5fc9..c2f8bd9 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_lock_acct(-unlocked);
>>  }
>>  
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> +				dma_addr_t iova, size_t size)
>> +{
>> +	struct vfio_domain *d;
>> +	int ret = 0;
>> +
>> +	list_for_each_entry(d, &iommu->domain_list, next) {
>> +		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
>> +		if (ret)
>> +			break;
>> +	}
>> +	return ret;
> 
> Doesn't this need an unwind on failure loop?
At the moment the de-allocation is done by the SMMU driver, in its
domain_free ops, which calls iommu_put_dma_cookie. In case
iommu_get_dma_msi_region_cookie fails on a given VFIO domain, there is
currently no other way but to destroy all VFIO domains and redo everything.

So yes, I plan to unwind everything, i.e. call iommu_put_dma_cookie for
each domain that was already set up.
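
A possible shape for that unwind, on top of the vfio_set_msi_aperture()
above (sketch only):

static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
				 dma_addr_t iova, size_t size)
{
	struct vfio_domain *d, *tmp;
	int ret = 0;

	list_for_each_entry(d, &iommu->domain_list, next) {
		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* put the cookie on every domain that was already set up */
	list_for_each_entry(tmp, &iommu->domain_list, next) {
		if (tmp == d)
			break;
		iommu_put_dma_cookie(tmp->domain);
	}
	return ret;
}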
> 
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  	vfio_unmap_unpin(iommu, dma);
>> @@ -691,6 +705,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	return ret;
>>  }
>>  
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> +				   struct vfio_iommu_type1_dma_map *map)
>> +{
>> +	dma_addr_t iova = map->iova;
>> +	size_t size = map->size;
>> +	int ret = 0;
>> +	struct vfio_dma *dma;
>> +	unsigned long order;
>> +	uint64_t mask;
>> +
>> +	/* Verify that none of our __u64 fields overflow */
>> +	if (map->size != size || map->iova != iova)
>> +		return -EINVAL;
>> +
>> +	order =  __ffs(vfio_pgsize_bitmap(iommu));
>> +	mask = ((uint64_t)1 << order) - 1;
>> +
>> +	WARN_ON(mask & PAGE_MASK);
>> +
>> +	if (!size || (size | iova) & mask)
>> +		return -EINVAL;
>> +
>> +	/* Don't allow IOVA address wrap */
>> +	if (iova + size - 1 < iova)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&iommu->lock);
>> +
>> +	if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> +		ret =  -EEXIST;
>> +		goto unlock;
>> +	}
>> +
>> +	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> +	if (!dma) {
>> +		ret = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	dma->iova = iova;
>> +	dma->size = size;
>> +	dma->type = VFIO_IOVA_RESERVED_MSI;
>> +
>> +	ret = vfio_set_msi_aperture(iommu, iova, size);
>> +	if (ret)
>> +		goto free_unlock;
>> +
>> +	vfio_link_dma(iommu, dma);
>> +	goto unlock;
>> +
>> +free_unlock:
>> +	kfree(dma);
>> +unlock:
>> +	mutex_unlock(&iommu->lock);
>> +	return ret;
>> +}
>> +
>>  static int vfio_bus_type(struct device *dev, void *data)
>>  {
>>  	struct bus_type **bus = data;
>> @@ -1064,7 +1135,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
>>  		struct vfio_iommu_type1_dma_map map;
>>  		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>> -				VFIO_DMA_MAP_FLAG_WRITE;
>> +				VFIO_DMA_MAP_FLAG_WRITE |
>> +				VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
>>  
>>  		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
>>  
>> @@ -1074,6 +1146,9 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  		if (map.argsz < minsz || map.flags & ~mask)
>>  			return -EINVAL;
>>  
>> +		if (map.flags & VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA)
>> +			return vfio_register_msi_range(iommu, &map);
>> +
>>  		return vfio_dma_do_map(iommu, &map);
>>  
>>  	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 255a211..4a9dbc2 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -498,12 +498,19 @@ struct vfio_iommu_type1_info {
>>   *
>>   * Map process virtual addresses to IO virtual addresses using the
>>   * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
>> + *
>> + * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
>> + * IOVA region that will be used on some platforms to map the host MSI frames.
>> + * In that specific case, vaddr is ignored. Once registered, an MSI reserved
>> + * IOVA region stays until the container is closed.
>>   */
>>  struct vfio_iommu_type1_dma_map {
>>  	__u32	argsz;
>>  	__u32	flags;
>>  #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
>>  #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
>> +/* reserved iova for MSI vectors*/
>> +#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
>>  	__u64	vaddr;				/* Process virtual address */
>>  	__u64	iova;				/* IO virtual address */
>>  	__u64	size;				/* Size of mapping (bytes) */
>> @@ -519,7 +526,8 @@ struct vfio_iommu_type1_dma_map {
>>   * Caller sets argsz.  The actual unmapped size is returned in the size
>>   * field.  No guarantee is made to the user that arbitrary unmaps of iova
>>   * or size different from those used in the original mapping call will
>> - * succeed.
>> + * succeed. Once registered, an MSI region cannot be unmapped and stays
>> + * until the container is closed.
>>   */
>>  struct vfio_iommu_type1_dma_unmap {
>>  	__u32	argsz;
> 
> What happens when an x86 user does a mapping with this new flag set?
> It seems like we end up configuring everything just as we would on a
> platform requiring MSI mapping, including setting the domain MSI
> geometry.  Should we be testing the MSI geometry flag on the iommu to
> see if this is supported?  Surprisingly few things seem to check that
> flag.

Yes, I need to test the capability first and return -EINVAL in case the
capability is not supported.
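
For instance with a helper like the one below (a rough sketch only: the
helper name is mine and I use "programmable" as a placeholder for the
MSI geometry field telling whether the domain expects a user-provided
MSI aperture). vfio_register_msi_range would call it under iommu->lock
and bail out with -EINVAL when it returns false:

/* return true if all domains expect a user-provided MSI IOVA aperture */
static bool vfio_iommu_has_msi_geometry(struct vfio_iommu *iommu)
{
	struct iommu_domain_msi_geometry geometry;
	struct vfio_domain *d;

	list_for_each_entry(d, &iommu->domain_list, next) {
		if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
					  &geometry) ||
		    !geometry.programmable)
			return false;
	}
	return true;
}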

Thanks

Eric
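
PS: for reference, the userspace side of this registration is just a
VFIO_IOMMU_MAP_DMA call with the new flag set, along these lines (a
sketch; the iova base and size below are arbitrary placeholders and
container_fd is whatever fd the VFIO container was opened on):

	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA,
		/* .vaddr is ignored with this flag */
		.iova  = 0x8000000,	/* placeholder base */
		.size  = 0x100000,	/* placeholder size, IOMMU page aligned */
	};

	if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
		perror("reserved MSI IOVA registration");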
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 11/15] vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
@ 2016-10-07 17:11       ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

Hi Alex,

On 06/10/2016 22:19, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:27 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay.
>>
>> As opposed to user dma slots, reserved IOVAs are not systematically bound
>> to PAs and PAs are not pinned. VFIO just initializes the IOVA "aperture".
>> IOVAs are allocated outside of the VFIO framework, by the MSI layer which
>> is responsible to free and unmap them. The MSI mapping resources are freeed
> 
> nit, extra 'e', "freed"
> 
>> by the IOMMU driver on domain destruction.
>>
>> On the creation of a new domain, the "replay" of a reserved slot simply
>> needs to set the MSI aperture on the new domain.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v12 -> v13:
>> - use dma-iommu iommu_get_dma_msi_region_cookie
>>
>> v9 -> v10:
>> - replay of a reserved slot sets the MSI aperture on the new domain
>> - use VFIO_IOVA_RESERVED_MSI enum value instead of VFIO_IOVA_RESERVED
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>>   on replay
>> ---
>>  drivers/vfio/Kconfig            |  1 +
>>  drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>> index da6e2ce..673ec79 100644
>> --- a/drivers/vfio/Kconfig
>> +++ b/drivers/vfio/Kconfig
>> @@ -1,6 +1,7 @@
>>  config VFIO_IOMMU_TYPE1
>>  	tristate
>>  	depends on VFIO
>> +	select IOMMU_DMA
>>  	default n
>>  
>>  config VFIO_IOMMU_SPAPR_TCE
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 65a4038..5bc5fc9 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include <linux/uaccess.h>
>>  #include <linux/vfio.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/dma-iommu.h>
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>> @@ -387,7 +388,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	struct vfio_domain *domain, *d;
>>  	long unlocked = 0;
>>  
>> -	if (!dma->size)
>> +	if (!dma->size || dma->type != VFIO_IOVA_USER)
>>  		return;
>>  	/*
>>  	 * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -724,6 +725,13 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>  		dma = rb_entry(n, struct vfio_dma, node);
>>  		iova = dma->iova;
>>  
>> +		if (dma->type == VFIO_IOVA_RESERVED_MSI) {
>> +			ret = iommu_get_dma_msi_region_cookie(domain->domain,
>> +						     dma->iova, dma->size);
>> +			WARN_ON(ret);
>> +			continue;
>> +		}
> 
> Why is this a passable error?  We consider an iommu_map() error on any
> entry a failure.
Yes I agree.
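Concretely the hunk would become something like (sketch):

		if (dma->type == VFIO_IOVA_RESERVED_MSI) {
			ret = iommu_get_dma_msi_region_cookie(domain->domain,
						     dma->iova, dma->size);
			if (ret)
				return ret;
			continue;
		}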

Thanks

Eric
> 
>> +
>>  		while (iova < dma->iova + dma->size) {
>>  			phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>>  			size_t size;
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
@ 2016-10-07 17:13       ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

Hi Alex,

On 06/10/2016 22:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:20 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> We introduce a new msi-doorbell API that allows msi controllers
>> to allocate and register their doorbells. This is useful when
>> those doorbells are likely to be iommu mapped (typically on ARM).
>> The VFIO layer will need to gather information about those doorbells:
>> whether they are safe (ie. they implement irq remapping) and how
>> many IOMMU pages are requested to map all of them.
>>
>> This patch first introduces the dedicated msi_doorbell_info struct
>> and the registration/unregistration functions.
>>
>> A doorbell region is characterized by its physical address base, size,
>> and whether it its safe (ie. it implements IRQ remapping). A doorbell
>> can be per-cpu of global. We currently only care about global doorbells.
>                  ^^ s/of/or/
OK
> 
>>
>> A function returns whether all doorbells are safe.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v12 -> v13:
>> - directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
>> - remove prot attribute
>> - move msi_doorbell_info struct definition in msi-doorbell.c
>> - change the commit title
>> - change proto of the registration function
>> - msi_doorbell_safe now in this patch
>>
>> v11 -> v12:
>> - rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
>>   into msi_doorbell_list and irqchip_doorbell_mutex into
>>   msi_doorbell_mutex
>> - fix style issues: align msi_doorbell struct members, kernel-doc comments
>> - use kzalloc
>> - use container_of in msi_doorbell_unregister_global
>> - compute nb_unsafe_doorbells on registration/unregistration
>> - registration simply returns NULL if allocation failed
>>
>> v10 -> v11:
>> - remove void *chip_data argument from register/unregister function
>> - remove lookup funtions since we restored the struct irq_chip
>>   msi_doorbell_info ops to realize this function
>> - reword commit message and title
>>
>> Conflicts:
>> 	kernel/irq/Makefile
>>
>> Conflicts:
>> 	drivers/iommu/Kconfig
>> ---
>>  drivers/iommu/Kconfig        |  2 +
>>  include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
>>  kernel/irq/Kconfig           |  4 ++
>>  kernel/irq/Makefile          |  1 +
>>  kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+)
>>  create mode 100644 include/linux/msi-doorbell.h
>>  create mode 100644 kernel/irq/msi-doorbell.c
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 8ee54d7..0cc7fac 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
>>  config ARM_SMMU
>>  	bool "ARM Ltd. System MMU (SMMU) Support"
>>  	depends on (ARM64 || ARM) && MMU
>> +	select MSI_DOORBELL
>>  	select IOMMU_API
>>  	select IOMMU_IO_PGTABLE_LPAE
>>  	select ARM_DMA_USE_IOMMU if ARM
>> @@ -310,6 +311,7 @@ config ARM_SMMU
>>  config ARM_SMMU_V3
>>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>  	depends on ARM64
>> +	select MSI_DOORBELL
>>  	select IOMMU_API
>>  	select IOMMU_IO_PGTABLE_LPAE
>>  	select GENERIC_MSI_IRQ_DOMAIN
>> diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
>> new file mode 100644
>> index 0000000..c18a382
>> --- /dev/null
>> +++ b/include/linux/msi-doorbell.h
>> @@ -0,0 +1,77 @@
>> +/*
>> + * API to register/query MSI doorbells likely to be IOMMU mapped
>> + *
>> + * Copyright (C) 2016 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef _LINUX_MSI_DOORBELL_H
>> +#define _LINUX_MSI_DOORBELL_H
>> +
>> +struct msi_doorbell_info;
>> +
>> +#ifdef CONFIG_MSI_DOORBELL
>> +
>> +/**
>> + * msi_doorbell_register - allocate and register a global doorbell
>> + * @base: physical base address of the global doorbell
>> + * @size: size of the global doorbell
>> + * @prot: protection/memory attributes
>> + * @safe: true is irq_remapping implemented for this doorbell
>> + * @dbinfo: returned doorbell info
>> + *
>> + * Return: 0 on success, -ENOMEM on allocation failure
>> + */
>> +int msi_doorbell_register_global(phys_addr_t base, size_t size,
>> +				 bool safe,
>> +				 struct msi_doorbell_info **dbinfo);
>> +
> 
> Seems like alloc/free behavior vs register/unregister.  Also seems
> cleaner to just return a struct msi_doorbell_info* and use PTR_ERR for
> return codes.  These are of course superficial changes that could be
> addressed in the future.
Sure
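
Something like this then (a sketch of the reworked registration; callers
would then check the result with IS_ERR()):

struct msi_doorbell_info *
msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe)
{
	struct msi_doorbell *db;

	db = kzalloc(sizeof(*db), GFP_KERNEL);
	if (!db)
		return ERR_PTR(-ENOMEM);

	db->info.global_doorbell = base;
	db->info.size = size;
	db->info.safe = safe;

	mutex_lock(&msi_doorbell_mutex);
	list_add(&db->next, &msi_doorbell_list);
	if (!db->info.safe)
		nb_unsafe_doorbells++;
	mutex_unlock(&msi_doorbell_mutex);

	return &db->info;
}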
> 
>> +/**
>> + * msi_doorbell_unregister_global - unregister a global doorbell
>> + * @db: doorbell info to unregister
>> + *
>> + * remove the doorbell descriptor from the list of registered doorbells
>> + * and deallocates it
>> + */
>> +void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
>> +
>> +/**
>> + * msi_doorbell_safe - return whether all registered doorbells are safe
>> + *
>> + * Safe doorbells are those which implement irq remapping
>> + * Return: true if all doorbells are safe, false otherwise
>> + */
>> +bool msi_doorbell_safe(void);
>> +
>> +#else
>> +
>> +static inline int
>> +msi_doorbell_register_global(phys_addr_t base, size_t size,
>> +			     int prot, bool safe,
>> +			     struct msi_doorbell_info **dbinfo)
>> +{
>> +	*dbinfo = NULL;
>> +	return 0;
> 
> If we return a struct*
> 
> return NULL;
Yep
> 
>> +}
>> +
>> +static inline void
>> +msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
>> +
>> +static inline bool msi_doorbell_safe(void)
>> +{
>> +	return true;
>> +}
> 
> Is it?
Yes, I will return false there and change the safety check in
vfio_iommu_type1.c accordingly.
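
i.e. the !CONFIG_MSI_DOORBELL stub becomes:

static inline bool msi_doorbell_safe(void)
{
	/* without any doorbell information we cannot claim safety */
	return false;
}

and the interrupt-safety test in vfio_iommu_type1.c then combines it
with the IOMMU capability, something like
iommu_capable(bus, IOMMU_CAP_INTR_REMAP) || msi_doorbell_safe()
(my guess at the exact combination; the point is just that an absent
doorbell API must not report "safe").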

Thanks

Eric
> 
>> +#endif /* CONFIG_MSI_DOORBELL */
>> +
>> +#endif
>> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
>> index 3bbfd6a..d4faaaa 100644
>> --- a/kernel/irq/Kconfig
>> +++ b/kernel/irq/Kconfig
>> @@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
>>  config GENERIC_MSI_IRQ
>>  	bool
>>  
>> +# MSI doorbell support (for doorbell IOMMU mapping)
>> +config MSI_DOORBELL
>> +	bool
>> +
>>  # Generic MSI hierarchical interrupt domain support
>>  config GENERIC_MSI_IRQ_DOMAIN
>>  	bool
>> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
>> index 1d3ee31..5b04dd1 100644
>> --- a/kernel/irq/Makefile
>> +++ b/kernel/irq/Makefile
>> @@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
>>  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
>>  obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
>>  obj-$(CONFIG_SMP) += affinity.o
>> +obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> new file mode 100644
>> index 0000000..60a262a
>> --- /dev/null
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * API to register/query MSI doorbells likely to be IOMMU mapped
>> + *
>> + * Copyright (C) 2016 Red Hat, Inc.
>> + * Author: Eric Auger <eric.auger@redhat.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/slab.h>
>> +#include <linux/irq.h>
>> +#include <linux/msi-doorbell.h>
>> +
>> +/**
>> + * struct msi_doorbell_info - MSI doorbell region descriptor
>> + * @percpu_doorbells: per cpu doorbell base address
>> + * @global_doorbell: base address of the doorbell
>> + * @doorbell_is_percpu: is the doorbell per cpu or global?
>> + * @safe: true if irq remapping is implemented
>> + * @size: size of the doorbell
>> + */
>> +struct msi_doorbell_info {
>> +	union {
>> +		phys_addr_t __percpu    *percpu_doorbells;
>> +		phys_addr_t             global_doorbell;
>> +	};
>> +	bool    doorbell_is_percpu;
>> +	bool    safe;
>> +	size_t  size;
>> +};
>> +
>> +struct msi_doorbell {
>> +	struct msi_doorbell_info	info;
>> +	struct list_head		next;
>> +};
>> +
>> +/* list of registered MSI doorbells */
>> +static LIST_HEAD(msi_doorbell_list);
>> +
>> +/* counts the number of unsafe registered doorbells */
>> +static uint nb_unsafe_doorbells;
>> +
>> +/* protects the list and nb__unsafe_doorbells */
> 
> Extra underscore
> 
>> +static DEFINE_MUTEX(msi_doorbell_mutex);
>> +
>> +int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
>> +				 struct msi_doorbell_info **dbinfo)
>> +{
>> +	struct msi_doorbell *db;
>> +
>> +	db = kzalloc(sizeof(*db), GFP_KERNEL);
>> +	if (!db)
>> +		return -ENOMEM;
>> +
>> +	db->info.global_doorbell = base;
>> +	db->info.size = size;
>> +	db->info.safe = safe;
>> +
>> +	mutex_lock(&msi_doorbell_mutex);
>> +	list_add(&db->next, &msi_doorbell_list);
>> +	if (!db->info.safe)
>> +		nb_unsafe_doorbells++;
>> +	mutex_unlock(&msi_doorbell_mutex);
>> +	*dbinfo = &db->info;
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
>> +
>> +void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
>> +{
>> +	struct msi_doorbell *db;
>> +
>> +	db = container_of(dbinfo, struct msi_doorbell, info);
>> +
>> +	mutex_lock(&msi_doorbell_mutex);
>> +	list_del(&db->next);
>> +	if (!db->info.safe)
>> +		nb_unsafe_doorbells--;
>> +	mutex_unlock(&msi_doorbell_mutex);
>> +	kfree(db);
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +bool msi_doorbell_safe(void)
>> +{
>> +	return !nb_unsafe_doorbells;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_safe);
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API
@ 2016-10-07 17:13       ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Alex,

On 06/10/2016 22:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:20 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> We introduce a new msi-doorbell API that allows msi controllers
>> to allocate and register their doorbells. This is useful when
>> those doorbells are likely to be iommu mapped (typically on ARM).
>> The VFIO layer will need to gather information about those doorbells:
>> whether they are safe (ie. they implement irq remapping) and how
>> many IOMMU pages are requested to map all of them.
>>
>> This patch first introduces the dedicated msi_doorbell_info struct
>> and the registration/unregistration functions.
>>
>> A doorbell region is characterized by its physical address base, size,
>> and whether it its safe (ie. it implements IRQ remapping). A doorbell
>> can be per-cpu of global. We currently only care about global doorbells.
>                  ^^ s/of/or/
OK
> 
>>
>> A function returns whether all doorbells are safe.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v12 -> v13:
>> - directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
>> - remove prot attribute
>> - move msi_doorbell_info struct definition in msi-doorbell.c
>> - change the commit title
>> - change proto of the registration function
>> - msi_doorbell_safe now in this patch
>>
>> v11 -> v12:
>> - rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
>>   into msi_doorbell_list and irqchip_doorbell_mutex into
>>   msi_doorbell_mutex
>> - fix style issues: align msi_doorbell struct members, kernel-doc comments
>> - use kzalloc
>> - use container_of in msi_doorbell_unregister_global
>> - compute nb_unsafe_doorbells on registration/unregistration
>> - registration simply returns NULL if allocation failed
>>
>> v10 -> v11:
>> - remove void *chip_data argument from register/unregister function
>> - remove lookup funtions since we restored the struct irq_chip
>>   msi_doorbell_info ops to realize this function
>> - reword commit message and title
>>
>> Conflicts:
>> 	kernel/irq/Makefile
>>
>> Conflicts:
>> 	drivers/iommu/Kconfig
>> ---
>>  drivers/iommu/Kconfig        |  2 +
>>  include/linux/msi-doorbell.h | 77 ++++++++++++++++++++++++++++++++++
>>  kernel/irq/Kconfig           |  4 ++
>>  kernel/irq/Makefile          |  1 +
>>  kernel/irq/msi-doorbell.c    | 98 ++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+)
>>  create mode 100644 include/linux/msi-doorbell.h
>>  create mode 100644 kernel/irq/msi-doorbell.c
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 8ee54d7..0cc7fac 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
>>  config ARM_SMMU
>>  	bool "ARM Ltd. System MMU (SMMU) Support"
>>  	depends on (ARM64 || ARM) && MMU
>> +	select MSI_DOORBELL
>>  	select IOMMU_API
>>  	select IOMMU_IO_PGTABLE_LPAE
>>  	select ARM_DMA_USE_IOMMU if ARM
>> @@ -310,6 +311,7 @@ config ARM_SMMU
>>  config ARM_SMMU_V3
>>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>  	depends on ARM64
>> +	select MSI_DOORBELL
>>  	select IOMMU_API
>>  	select IOMMU_IO_PGTABLE_LPAE
>>  	select GENERIC_MSI_IRQ_DOMAIN
>> diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
>> new file mode 100644
>> index 0000000..c18a382
>> --- /dev/null
>> +++ b/include/linux/msi-doorbell.h
>> @@ -0,0 +1,77 @@
>> +/*
>> + * API to register/query MSI doorbells likely to be IOMMU mapped
>> + *
>> + * Copyright (C) 2016 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef _LINUX_MSI_DOORBELL_H
>> +#define _LINUX_MSI_DOORBELL_H
>> +
>> +struct msi_doorbell_info;
>> +
>> +#ifdef CONFIG_MSI_DOORBELL
>> +
>> +/**
>> + * msi_doorbell_register_global - allocate and register a global doorbell
>> + * @base: physical base address of the global doorbell
>> + * @size: size of the global doorbell
>> + * @safe: true if irq_remapping is implemented for this doorbell
>> + * @dbinfo: returned doorbell info
>> + *
>> + * Return: 0 on success, -ENOMEM on allocation failure
>> + */
>> +int msi_doorbell_register_global(phys_addr_t base, size_t size,
>> +				 bool safe,
>> +				 struct msi_doorbell_info **dbinfo);
>> +
> 
> Seems like alloc/free behavior vs register/unregister.  Also seems
> cleaner to just return a struct msi_doorbell_info* and use PTR_ERR for
> return codes.  These are of course superficial changes that could be
> addressed in the future.
Sure
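
For illustration, the allocation-style variant suggested above could look
roughly as below (a sketch derived from the v13 body quoted further down,
not what was actually posted):

struct msi_doorbell_info *
msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe)
{
	struct msi_doorbell *db;

	db = kzalloc(sizeof(*db), GFP_KERNEL);
	if (!db)
		return ERR_PTR(-ENOMEM);

	db->info.global_doorbell = base;
	db->info.size = size;
	db->info.safe = safe;

	mutex_lock(&msi_doorbell_mutex);
	list_add(&db->next, &msi_doorbell_list);
	if (!safe)
		nb_unsafe_doorbells++;
	mutex_unlock(&msi_doorbell_mutex);

	return &db->info;
}

Callers would then test the result with IS_ERR() instead of checking an int
return plus a separate output pointer.
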
> 
>> +/**
>> + * msi_doorbell_unregister_global - unregister a global doorbell
>> + * @db: doorbell info to unregister
>> + *
>> + * Removes the doorbell descriptor from the list of registered doorbells
>> + * and deallocates it.
>> + */
>> +void msi_doorbell_unregister_global(struct msi_doorbell_info *db);
>> +
>> +/**
>> + * msi_doorbell_safe - return whether all registered doorbells are safe
>> + *
>> + * Safe doorbells are those which implement irq remapping
>> + * Return: true if all doorbells are safe, false otherwise
>> + */
>> +bool msi_doorbell_safe(void);
>> +
>> +#else
>> +
>> +static inline int
>> +msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
>> +			     struct msi_doorbell_info **dbinfo)
>> +{
>> +	*dbinfo = NULL;
>> +	return 0;
> 
> If we return a struct*
> 
> return NULL;
Yep
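
A sketch of the corresponding !CONFIG_MSI_DOORBELL stub, assuming the
struct-returning prototype above:

static inline struct msi_doorbell_info *
msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe)
{
	return NULL;
}
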
> 
>> +}
>> +
>> +static inline void
>> +msi_doorbell_unregister_global(struct msi_doorbell_info *db) {}
>> +
>> +static inline bool msi_doorbell_safe(void)
>> +{
>> +	return true;
>> +}
> 
> Is it?
Yes I will return false and change the safety check in vfio_iommu_type1.c
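
A sketch of what that check could become on the VFIO side (illustrative only;
allow_unsafe_interrupts is the existing vfio_iommu_type1 module parameter, and
the exact condition and message are not taken from the posted series):

	/* somewhere in the vfio_iommu_type1 attach path */
	if (!allow_unsafe_interrupts &&
	    !iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
	    !msi_doorbell_safe()) {
		pr_warn("%s: no interrupt remapping support\n", __func__);
		return -EPERM;
	}

With the stub returning false, x86 would still pass this check through the
IOMMU_CAP_INTR_REMAP capability.
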

Thanks

Eric
> 
>> +#endif /* CONFIG_MSI_DOORBELL */
>> +
>> +#endif
>> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
>> index 3bbfd6a..d4faaaa 100644
>> --- a/kernel/irq/Kconfig
>> +++ b/kernel/irq/Kconfig
>> @@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
>>  config GENERIC_MSI_IRQ
>>  	bool
>>  
>> +# MSI doorbell support (for doorbell IOMMU mapping)
>> +config MSI_DOORBELL
>> +	bool
>> +
>>  # Generic MSI hierarchical interrupt domain support
>>  config GENERIC_MSI_IRQ_DOMAIN
>>  	bool
>> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
>> index 1d3ee31..5b04dd1 100644
>> --- a/kernel/irq/Makefile
>> +++ b/kernel/irq/Makefile
>> @@ -10,3 +10,4 @@ obj-$(CONFIG_PM_SLEEP) += pm.o
>>  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
>>  obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
>>  obj-$(CONFIG_SMP) += affinity.o
>> +obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> new file mode 100644
>> index 0000000..60a262a
>> --- /dev/null
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * API to register/query MSI doorbells likely to be IOMMU mapped
>> + *
>> + * Copyright (C) 2016 Red Hat, Inc.
>> + * Author: Eric Auger <eric.auger@redhat.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/slab.h>
>> +#include <linux/irq.h>
>> +#include <linux/msi-doorbell.h>
>> +
>> +/**
>> + * struct msi_doorbell_info - MSI doorbell region descriptor
>> + * @percpu_doorbells: per cpu doorbell base address
>> + * @global_doorbell: base address of the doorbell
>> + * @doorbell_is_percpu: is the doorbell per cpu or global?
>> + * @safe: true if irq remapping is implemented
>> + * @size: size of the doorbell
>> + */
>> +struct msi_doorbell_info {
>> +	union {
>> +		phys_addr_t __percpu    *percpu_doorbells;
>> +		phys_addr_t             global_doorbell;
>> +	};
>> +	bool    doorbell_is_percpu;
>> +	bool    safe;
>> +	size_t  size;
>> +};
>> +
>> +struct msi_doorbell {
>> +	struct msi_doorbell_info	info;
>> +	struct list_head		next;
>> +};
>> +
>> +/* list of registered MSI doorbells */
>> +static LIST_HEAD(msi_doorbell_list);
>> +
>> +/* counts the number of unsafe registered doorbells */
>> +static uint nb_unsafe_doorbells;
>> +
>> +/* protects the list and nb__unsafe_doorbells */
> 
> Extra underscore
> 
>> +static DEFINE_MUTEX(msi_doorbell_mutex);
>> +
>> +int msi_doorbell_register_global(phys_addr_t base, size_t size, bool safe,
>> +				 struct msi_doorbell_info **dbinfo)
>> +{
>> +	struct msi_doorbell *db;
>> +
>> +	db = kzalloc(sizeof(*db), GFP_KERNEL);
>> +	if (!db)
>> +		return -ENOMEM;
>> +
>> +	db->info.global_doorbell = base;
>> +	db->info.size = size;
>> +	db->info.safe = safe;
>> +
>> +	mutex_lock(&msi_doorbell_mutex);
>> +	list_add(&db->next, &msi_doorbell_list);
>> +	if (!db->info.safe)
>> +		nb_unsafe_doorbells++;
>> +	mutex_unlock(&msi_doorbell_mutex);
>> +	*dbinfo = &db->info;
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
>> +
>> +void msi_doorbell_unregister_global(struct msi_doorbell_info *dbinfo)
>> +{
>> +	struct msi_doorbell *db;
>> +
>> +	db = container_of(dbinfo, struct msi_doorbell, info);
>> +
>> +	mutex_lock(&msi_doorbell_mutex);
>> +	list_del(&db->next);
>> +	if (!db->info.safe)
>> +		nb_unsafe_doorbells--;
>> +	mutex_unlock(&msi_doorbell_mutex);
>> +	kfree(db);
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +bool msi_doorbell_safe(void)
>> +{
>> +	return !nb_unsafe_doorbells;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_safe);
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-07 17:14       ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-07 17:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

Hi Alex,

On 06/10/2016 22:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:19 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> From: Robin Murphy <robin.murphy@arm.com>
>>
>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>> with regard to mapping MSI messages in systems where the MSI write is
>> subject to IOMMU translation. With the relevant infrastructure now in
>> place for managed DMA domains, it's actually really simple for other
>> users to piggyback off that and reap the benefits without giving up
>> their own IOVA management, and without having to reinvent their own
>> wheel in the MSI layer.
>>
>> Allow such users to opt into automatic MSI remapping by dedicating a
>> region of their IOVA space to a managed cookie.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v1 -> v2:
>> - compared to Robin's version
>> - add NULL last param to iommu_dma_init_domain
>> - set the msi_geometry aperture
>> - I removed
>>   if (base < U64_MAX - size)
>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>   I don't get why we would reserve something outside the scope of the iova
>>   domain. What am I missing?
>> ---
>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dma-iommu.h |  9 +++++++++
>>  2 files changed, 49 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab866..11da1a0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>  	}
>>  }
>> +
>> +/**
>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
> 
> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
> along those lines.  I'm not sure what we're get'ing.  Thanks,
The name was chosen by analogy with the existing iommu_get_dma_cookie/
iommu_put_dma_cookie. In practice, though, it does both the cookie get and
the iommu_dma_init_domain() call.

I plan to rename it to iommu_setup_dma_msi_region if there is no objection.
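
For context, a hypothetical VFIO-side caller (the call site and variable names
below are illustrative, not part of this patch):

	/* userspace picked an MSI IOVA window [base, base + size) */
	ret = iommu_get_dma_msi_region_cookie(domain, base, size);
	if (ret)
		return ret;

	/*
	 * iommu_dma_map_msi_msg() can now allocate doorbell IOVAs from that
	 * window when composing MSIs for devices attached to this domain.
	 */
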

Thanks

Eric

> 
> Alex
> 
>> + * @domain: IOMMU domain to prepare
>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>> + * @size: Size of the desired MSI aperture
>> + *
>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>> + * but would still like to take advantage of automatic MSI remapping, can use
>> + * this to initialise their own domain appropriately.
>> + */
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size)
>> +{
>> +	struct iommu_dma_cookie *cookie;
>> +	struct iova_domain *iovad;
>> +	int ret;
>> +
>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>> +		return -EINVAL;
>> +
>> +	ret = iommu_get_dma_cookie(domain);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>> +	if (ret) {
>> +		iommu_put_dma_cookie(domain);
>> +		return ret;
>> +	}
>> +
>> +	domain->msi_geometry.aperture_start = base;
>> +	domain->msi_geometry.aperture_end = base + size - 1;
>> +
>> +	cookie = domain->iova_cookie;
>> +	iovad = &cookie->iovad;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 32c5890..1c55413 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>  /* The DMA API isn't _quite_ the whole story, though... */
>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>  {
>>  }
>>  
>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  #endif	/* CONFIG_IOMMU_DMA */
>>  #endif	/* __KERNEL__ */
>>  #endif	/* __DMA_IOMMU_H */
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-07 20:38           ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-07 20:38 UTC (permalink / raw)
  To: Auger Eric
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

On Fri, 7 Oct 2016 19:10:27 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 06/10/2016 22:42, Alex Williamson wrote:
> > On Thu, 6 Oct 2016 14:20:40 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> >> On Thu,  6 Oct 2016 08:45:31 +0000
> >> Eric Auger <eric.auger@redhat.com> wrote:
> >>  
> >>> This patch allows the user-space to retrieve the MSI geometry. The
> >>> implementation is based on capability chains, now also added to
> >>> VFIO_IOMMU_GET_INFO.
> >>>
> >>> The returned info comprises:
> >>> - whether the MSI IOVAs are constrained to a reserved range (x86 case) and,
> >>>   if so, the start/end of the aperture,
> >>> - or whether the IOVA aperture needs to be set by the userspace. In that
> >>>   case, the size and alignment of the IOVA window to be provided are
> >>>   returned.
> >>>
> >>> In case the userspace must provide the IOVA aperture, we currently report
> >>> a size/alignment based on all the doorbells registered by the host kernel.
> >>> This may exceed the actual needs.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> ---
> >>> v11 -> v11:
> >>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
> >>>
> >>> v9 -> v10:
> >>> - move cap_offset after iova_pgsizes
> >>> - replace __u64 alignment by __u32 order
> >>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
> >>>   fix alignment
> >>> - call msi-doorbell API to compute the size/alignment
> >>>
> >>> v8 -> v9:
> >>> - use iommu_msi_supported flag instead of programmable
> >>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> >>>   capability chain, reporting the MSI geometry
> >>>
> >>> v7 -> v8:
> >>> - use iommu_domain_msi_geometry
> >>>
> >>> v6 -> v7:
> >>> - remove the computation of the number of IOVA pages to be provisioned.
> >>>   This number depends on the domain/group/device topology which can
> >>>   dynamically change. Let's instead rely on an arbitrary max depending
> >>>   on the system
> >>>
> >>> v4 -> v5:
> >>> - move msi_info and ret declaration within the conditional code
> >>>
> >>> v3 -> v4:
> >>> - replace former vfio_domains_require_msi_mapping by
> >>>   more complex computation of MSI mapping requirements, especially the
> >>>   number of pages to be provided by the user-space.
> >>> - reword patch title
> >>>
> >>> RFC v1 -> v1:
> >>> - derived from
> >>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> >>> - renamed allow_msi_reconfig into require_msi_mapping
> >>> - fixed VFIO_IOMMU_GET_INFO
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> >>>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
> >>>  2 files changed, 108 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index dc3ee5d..ce5e7eb 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -38,6 +38,8 @@
> >>>  #include <linux/workqueue.h>
> >>>  #include <linux/dma-iommu.h>
> >>>  #include <linux/msi-doorbell.h>
> >>> +#include <linux/irqdomain.h>
> >>> +#include <linux/msi.h>
> >>>  
> >>>  #define DRIVER_VERSION  "0.2"
> >>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> >>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>  	return ret;
> >>>  }
> >>>  
> >>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> >>> +				     struct vfio_info_cap *caps)
> >>> +{
> >>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> >>> +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
> >>> +	struct iommu_domain_msi_geometry msi_geometry;
> >>> +	struct vfio_info_cap_header *header;
> >>> +	struct vfio_domain *d;
> >>> +	bool reserved;
> >>> +	size_t size;
> >>> +
> >>> +	mutex_lock(&iommu->lock);
> >>> +	/* All domains have same require_msi_map property, pick first */
> >>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> >>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> >>> +			      &msi_geometry);
> >>> +	reserved = !msi_geometry.iommu_msi_supported;
> >>> +
> >>> +	mutex_unlock(&iommu->lock);
> >>> +
> >>> +	size = sizeof(*vfio_msi_geometry);
> >>> +	header = vfio_info_cap_add(caps, size,
> >>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> >>> +
> >>> +	if (IS_ERR(header))
> >>> +		return PTR_ERR(header);
> >>> +
> >>> +	vfio_msi_geometry = container_of(header,
> >>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> >>> +				header);
> >>> +
> >>> +	vfio_msi_geometry->flags = reserved;    
> >>
> >> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
> >>  
> >>> +	if (reserved) {
> >>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> >>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;    
> >>
> >> But maybe nobody has set these, did you intend to use
> >> iommu_domain_msi_aperture_valid(), which you defined early on but never
> >> used?
> >>  
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	vfio_msi_geometry->order = order;    
> >>
> >> I'm tempted to suggest that a user could do the same math on their own
> >> since we provide the supported bitmap already... could it ever not be
> >> the same? 
> >>  
> >>> +	/*
> >>> +	 * we compute a system-wide requirement based on all the registered
> >>> +	 * doorbells
> >>> +	 */
> >>> +	vfio_msi_geometry->size =
> >>> +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  				   unsigned int cmd, unsigned long arg)
> >>>  {
> >>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  		}
> >>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> >>>  		struct vfio_iommu_type1_info info;
> >>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> >>> +		int ret;
> >>>  
> >>> -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> >>> +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
> >>>  
> >>>  		if (copy_from_user(&info, (void __user *)arg, minsz))
> >>>  			return -EFAULT;
> >>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  
> >>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >>>  
> >>> +		ret = compute_msi_geometry_caps(iommu, &caps);
> >>> +		if (ret)
> >>> +			return ret;
> >>> +
> >>> +		if (caps.size) {
> >>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> >>> +			if (info.argsz < sizeof(info) + caps.size) {
> >>> +				info.argsz = sizeof(info) + caps.size;
> >>> +				info.cap_offset = 0;
> >>> +			} else {
> >>> +				vfio_info_cap_shift(&caps, sizeof(info));
> >>> +				if (copy_to_user((void __user *)arg +
> >>> +						sizeof(info), caps.buf,
> >>> +						caps.size)) {
> >>> +					kfree(caps.buf);
> >>> +					return -EFAULT;
> >>> +				}
> >>> +				info.cap_offset = sizeof(info);
> >>> +			}
> >>> +
> >>> +			kfree(caps.buf);
> >>> +		}
> >>> +
> >>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
> >>>  			-EFAULT : 0;
> >>>  
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 4a9dbc2..8dae013 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
> >>>  	__u32	argsz;
> >>>  	__u32	flags;
> >>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> >>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> >>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> >>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> >>> +	__u32	__resv;
> >>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> >>> +};    
> >>
> >> I understand the padding, but not the ordering.  Why not end with
> >> padding?
> >>  
> >>> +
> >>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> >>> +
> >>> +/*
> >>> + * The MSI geometry capability allows reporting the MSI IOVA geometry:
> >>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> >>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> >>> + *   This is typically the case on x86 hosts. The userspace is not allowed
> >>> + *   to map userspace memory at IOVAs intersecting this range using
> >>> + *   VFIO_IOMMU_MAP_DMA.
> >>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> >>> + *   in that case the userspace must provide an IOVA window characterized by
> >>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> >>> + */
> >>> +struct vfio_iommu_type1_info_cap_msi_geometry {
> >>> +	struct vfio_info_cap_header header;
> >>> +	__u32 flags;
> >>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
> >>> +	/* not reserved */
> >>> +	__u32 order; /* iommu page order used for aperture alignment */
> >>> +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
> >>> +	/* reserved */
> >>> +	__u64 aperture_start;
> >>> +	__u64 aperture_end;    
> >>
> >> Should these be a union?  We never set them both.  Should the !reserved
> >> case have a flag as well, so the user can positively identify what's
> >> being provided?  
> > 
> > Actually, is there really any need to fit both of these within the same
> > structure?  Part of the idea of the capability chains is we can create
> > a capability for each new thing we want to describe.  So, we could
> > simply define a generic reserved IOVA range capability with a 'start'
> > and 'end' and then another capability to define MSI mapping
> > requirements.  Thanks,  
> Yes your suggested approach makes sense to me.
> 
> One reason why I proceeded that way is we are mixing things at iommu.h
> level too. Personally I would have preferred to separate things:
> 1) add a new IOMMU_CAP_TRANSLATE_MSI capability in iommu_cap
> 2) rename iommu_msi_supported into "programmable" bool: reporting
> whether the aperture is reserved or programmable.
> 
> In the early releases I think it was as above, but over time we gradually
> moved to a mixed description.
> 
> What do you think?

The API certainly doesn't seem like it has a cohesive feel to me.  It's
not entirely clear to me how we know when we need to register a DMA MSI
cookie, or how we know that the MSI doorbell API is actually
initialized and in use by the MSI/IOMMU layer, or exactly what the
MSI geometry is telling me.  Perhaps this is why the code doesn't seem to
have a good rejection mechanism for architectures that need it versus
those that don't, it's too hard to tell.

Maybe we can look at what we think the user API should be and work
backwards.  For x86 we simply have a reserved range of IOVA.  I'm not
entirely sure it adds to the user API to know that it's for MSI; it's
just a range of IOVAs that we cannot allocate for regular DMA.  In
fact, we currently lack a mechanism for describing the IOVA space of
the IOMMU at all, so rather than focusing on a mechanism to describe a
hole in the IOVA space, we might simply want to focus on a mechanism to
describe the available IOVA space.  Everybody needs that, not just
x86.  That sort of sounds like a VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
that perhaps looks like:

struct vfio_iommu_type1_info_cap_iova_range {
	struct vfio_info_cap_header header;
	u64 start;
	u64 end;
};

Clearly we need to allow multiple of these in the capability chain
since the existing x86 MSI range bisects this address space.

To support this, we basically need the same information from the IOMMU
API.  We already have DOMAIN_ATTR_GEOMETRY, which should give us the
base IOVA range, but we don't have anything describing the gaps.  We
don't know how many sources of gaps we'll have in the future, but let's
keep it simple and assume we can look for MSI gaps and add other
possible sources of gaps in the future; it's an internal API, after all.
So we can use DOMAIN_ATTR_MSI_GEOMETRY to tell us about the (we assume
one) MSI range of reserved IOVA within DOMAIN_ATTR_GEOMETRY.  For x86
this is fixed, for SMMU this is a zero range until someone programs it.

Now, what does a user need to know to add a reserved MSI IOVA range?
They need to know a) that it needs to be done, and b) how big to make
it (and maybe alignment requirements).  Really all we need to describe
then is b) since b) implies a). So maybe that gives us another
capability chain entry:

struct vfio_iommu_type1_info_cap_msi_resv {
	struct vfio_info_cap_header header;
	u64 size;
	u64 alignment;
};

It doesn't seem like we need to waste a flag bit on
vfio_iommu_type1_info.flags for this since the existence of this
capability would imply that VFIO_IOMMU_MAP_DMA supports an MSI_RESV
flag.
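
For what it's worth, a userspace consumer of such a chain could be as simple
as the sketch below (the two capability IDs are the hypothetical ones
discussed here, not existing definitions, and the usual argsz re-allocation
dance is omitted):

	struct vfio_info_cap_header *hdr;
	__u32 off;

	ioctl(container, VFIO_IOMMU_GET_INFO, info);

	if (info->flags & VFIO_IOMMU_INFO_CAPS) {
		for (off = info->cap_offset; off; off = hdr->next) {
			hdr = (struct vfio_info_cap_header *)((char *)info + off);
			switch (hdr->id) {
			case VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE: /* proposed */
				/* record one usable [start, end] IOVA range */
				break;
			case VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV: /* proposed */
				/*
				 * reserve size bytes of IOVA, aligned to
				 * alignment, and hand them to
				 * VFIO_IOMMU_MAP_DMA with an MSI_RESV flag
				 */
				break;
			}
		}
	}
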

So what do we need from the kernel infrastructure to make that happen?
Well, we need a) and b) above, and again b) can imply a), so if the
IOMMU API provided a DOMAIN_ATTR_MSI_RESV, providing the same
size/alignment, then we're nearly there.  Then we just need a way to
set that range, which I'd probably try to plumb through the IOMMU API
rather than pulling in separate doorbell APIs and DMA cookie APIs.  If
it's going to pull together all those different things, let's at least
only do that in one place so we can expose a consistent API through the
IOMMU API.  Obviously once a range is set, DOMAIN_ATTR_MSI_RESV should
report that range, so if the user were to look at the type1 info
capability chain again, the available IOVA ranges would reflect the now
reserved range.
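
In code terms, the IOMMU API side of that could boil down to something like
the following (names purely illustrative, none of this exists today):

	/* reported by the IOMMU driver before any window has been set */
	struct iommu_domain_msi_resv {
		u64	size;		/* IOVA amount the MSI mappings need */
		u64	alignment;	/* required alignment of that window */
	};

	/* VFIO queries the requirement ... */
	iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_RESV, &msi_resv);

	/* ... and, once userspace has chosen a base, sets the window,
	 * hiding the doorbell and DMA cookie plumbing behind the IOMMU API */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_MSI_RESV_BASE, &base);
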

Maybe that's more than you're asking for, but that's the approach I
would take to solidify the API.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
@ 2016-10-07 20:38           ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-07 20:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 7 Oct 2016 19:10:27 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 06/10/2016 22:42, Alex Williamson wrote:
> > On Thu, 6 Oct 2016 14:20:40 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> >> On Thu,  6 Oct 2016 08:45:31 +0000
> >> Eric Auger <eric.auger@redhat.com> wrote:
> >>  
> >>> This patch allows the user-space to retrieve the MSI geometry. The
> >>> implementation is based on capability chains, now also added to
> >>> VFIO_IOMMU_GET_INFO.
> >>>
> >>> The returned info comprises:
> >>> - whether the MSI IOVAs are constrained to a reserved range (x86 case) and,
> >>>   if so, the start/end of the aperture,
> >>> - or whether the IOVA aperture needs to be set by the userspace. In that
> >>>   case, the size and alignment of the IOVA window to be provided are
> >>>   returned.
> >>>
> >>> In case the userspace must provide the IOVA aperture, we currently report
> >>> a size/alignment based on all the doorbells registered by the host kernel.
> >>> This may exceed the actual needs.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> ---
> >>> v11 -> v11:
> >>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
> >>>
> >>> v9 -> v10:
> >>> - move cap_offset after iova_pgsizes
> >>> - replace __u64 alignment by __u32 order
> >>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
> >>>   fix alignment
> >>> - call msi-doorbell API to compute the size/alignment
> >>>
> >>> v8 -> v9:
> >>> - use iommu_msi_supported flag instead of programmable
> >>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
> >>>   capability chain, reporting the MSI geometry
> >>>
> >>> v7 -> v8:
> >>> - use iommu_domain_msi_geometry
> >>>
> >>> v6 -> v7:
> >>> - remove the computation of the number of IOVA pages to be provisioned.
> >>>   This number depends on the domain/group/device topology which can
> >>>   dynamically change. Let's instead rely on an arbitrary max depending
> >>>   on the system
> >>>
> >>> v4 -> v5:
> >>> - move msi_info and ret declaration within the conditional code
> >>>
> >>> v3 -> v4:
> >>> - replace former vfio_domains_require_msi_mapping by
> >>>   more complex computation of MSI mapping requirements, especially the
> >>>   number of pages to be provided by the user-space.
> >>> - reword patch title
> >>>
> >>> RFC v1 -> v1:
> >>> - derived from
> >>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
> >>> - renamed allow_msi_reconfig into require_msi_mapping
> >>> - fixed VFIO_IOMMU_GET_INFO
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> >>>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
> >>>  2 files changed, 108 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index dc3ee5d..ce5e7eb 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -38,6 +38,8 @@
> >>>  #include <linux/workqueue.h>
> >>>  #include <linux/dma-iommu.h>
> >>>  #include <linux/msi-doorbell.h>
> >>> +#include <linux/irqdomain.h>
> >>> +#include <linux/msi.h>
> >>>  
> >>>  #define DRIVER_VERSION  "0.2"
> >>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> >>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>  	return ret;
> >>>  }
> >>>  
> >>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
> >>> +				     struct vfio_info_cap *caps)
> >>> +{
> >>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
> >>> +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
> >>> +	struct iommu_domain_msi_geometry msi_geometry;
> >>> +	struct vfio_info_cap_header *header;
> >>> +	struct vfio_domain *d;
> >>> +	bool reserved;
> >>> +	size_t size;
> >>> +
> >>> +	mutex_lock(&iommu->lock);
> >>> +	/* All domains have same require_msi_map property, pick first */
> >>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
> >>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
> >>> +			      &msi_geometry);
> >>> +	reserved = !msi_geometry.iommu_msi_supported;
> >>> +
> >>> +	mutex_unlock(&iommu->lock);
> >>> +
> >>> +	size = sizeof(*vfio_msi_geometry);
> >>> +	header = vfio_info_cap_add(caps, size,
> >>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
> >>> +
> >>> +	if (IS_ERR(header))
> >>> +		return PTR_ERR(header);
> >>> +
> >>> +	vfio_msi_geometry = container_of(header,
> >>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
> >>> +				header);
> >>> +
> >>> +	vfio_msi_geometry->flags = reserved;    
> >>
> >> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
> >>  
> >>> +	if (reserved) {
> >>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
> >>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;    
> >>
> >> But maybe nobody has set these, did you intend to use
> >> iommu_domain_msi_aperture_valid(), which you defined early on but never
> >> used?
> >>  
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	vfio_msi_geometry->order = order;    
> >>
> >> I'm tempted to suggest that a user could do the same math on their own
> >> since we provide the supported bitmap already... could it ever not be
> >> the same? 
> >>  
> >>> +	/*
> >>> +	 * we compute a system-wide requirement based on all the registered
> >>> +	 * doorbells
> >>> +	 */
> >>> +	vfio_msi_geometry->size =
> >>> +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  				   unsigned int cmd, unsigned long arg)
> >>>  {
> >>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  		}
> >>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
> >>>  		struct vfio_iommu_type1_info info;
> >>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> >>> +		int ret;
> >>>  
> >>> -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> >>> +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
> >>>  
> >>>  		if (copy_from_user(&info, (void __user *)arg, minsz))
> >>>  			return -EFAULT;
> >>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>  
> >>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >>>  
> >>> +		ret = compute_msi_geometry_caps(iommu, &caps);
> >>> +		if (ret)
> >>> +			return ret;
> >>> +
> >>> +		if (caps.size) {
> >>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
> >>> +			if (info.argsz < sizeof(info) + caps.size) {
> >>> +				info.argsz = sizeof(info) + caps.size;
> >>> +				info.cap_offset = 0;
> >>> +			} else {
> >>> +				vfio_info_cap_shift(&caps, sizeof(info));
> >>> +				if (copy_to_user((void __user *)arg +
> >>> +						sizeof(info), caps.buf,
> >>> +						caps.size)) {
> >>> +					kfree(caps.buf);
> >>> +					return -EFAULT;
> >>> +				}
> >>> +				info.cap_offset = sizeof(info);
> >>> +			}
> >>> +
> >>> +			kfree(caps.buf);
> >>> +		}
> >>> +
> >>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
> >>>  			-EFAULT : 0;
> >>>  
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 4a9dbc2..8dae013 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
> >>>  	__u32	argsz;
> >>>  	__u32	flags;
> >>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
> >>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
> >>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
> >>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
> >>> +	__u32	__resv;
> >>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
> >>> +};    
> >>
> >> I understand the padding, but not the ordering.  Why not end with
> >> padding?
> >>  
> >>> +
> >>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
> >>> +
> >>> +/*
> >>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
> >>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
> >>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
> >>> + *   this is typically the case on x86 host. The userspace is not allowed
> >>> + *   to map userspace memory at IOVAs intersecting this range using
> >>> + *   VFIO_IOMMU_MAP_DMA.
> >>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
> >>> + *   in that case the userspace must provide an IOVA window characterized by
> >>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
> >>> + */
> >>> +struct vfio_iommu_type1_info_cap_msi_geometry {
> >>> +	struct vfio_info_cap_header header;
> >>> +	__u32 flags;
> >>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
> >>> +	/* not reserved */
> >>> +	__u32 order; /* iommu page order used for aperture alignment*/
> >>> +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
> >>> +	/* reserved */
> >>> +	__u64 aperture_start;
> >>> +	__u64 aperture_end;    
> >>
> >> Should these be a union?  We never set them both.  Should the !reserved
> >> case have a flag as well, so the user can positively identify what's
> >> being provided?  
> > 
> > Actually, is there really any need to fit both of these within the same
> > structure?  Part of the idea of the capability chains is we can create
> > a capability for each new thing we want to describe.  So, we could
> > simply define a generic reserved IOVA range capability with a 'start'
> > and 'end' and then another capability to define MSI mapping
> > requirements.  Thanks,  
> Yes your suggested approach makes sense to me.
> 
> One reason why I proceeded that way is we are mixing things at iommu.h
> level too. Personally I would have preferred to separate things:
> 1) add a new IOMMU_CAP_TRANSLATE_MSI capability in iommu_cap
> 2) rename iommu_msi_supported into "programmable" bool: reporting
> whether the aperture is reserved or programmable.
> 
In the early releases I think it was as above, but we gradually moved to a
mixed description.
> 
> What do you think?

The API certainly doesn't seem like it has a cohesive feel to me.  It's
not entirely clear to me how we know when we need to register a DMA MSI
cookie, or how we know that the MSI doorbell API is actually
initialized and in use by the MSI/IOMMU layer, or exactly what the
MSI geometry is telling me.  Perhaps this is why the code doesn't seem to
have a good rejection mechanism for architectures that need it versus
those that don't; it's too hard to tell.

Maybe we can look at what we think the user API should be and work
backwards.  For x86 we simply have a reserved range of IOVA.  I'm not
entirely sure it adds to the user API to know that it's for MSI, it's
just a range of IOVAs that we cannot allocate for regular DMA.  In
fact, we currently lack a mechanism for describing the IOVA space of
the IOMMU at all, so rather than focusing on a mechanism to describe a
hole in the IOVA space, we might simply want to focus on a mechanism to
describe the available IOVA space.  Everybody needs that, not just
x86.  That sort of sounds like a VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
that perhaps looks like:

struct vfio_iommu_type1_info_cap_iova_range {
	struct vfio_info_cap_header header;
	u64 start;
	u64 end;
};

Clearly we need to allow multiple of these in the capability chain
since the existing x86 MSI range bisects this address space.
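
For what it's worth, a rough sketch of how userspace might walk the
resulting chain (assuming the cap_offset field added by this series, the
existing vfio_info_cap_header {id, version, next} layout, and the
obviously hypothetical IOVA_RANGE capability id above):

	/* buf holds info->argsz bytes returned by VFIO_IOMMU_GET_INFO */
	struct vfio_iommu_type1_info *info = buf;
	__u32 off = info->cap_offset;

	while (off) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)((char *)buf + off);

		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
			struct vfio_iommu_type1_info_cap_iova_range *range =
				(void *)hdr;
			/* [range->start, range->end] is allocatable IOVA */
		}
		off = hdr->next;
	}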

To support this, we basically need the same information from the IOMMU
API.  We already have DOMAIN_ATTR_GEOMETRY, which should give us the
base IOVA range, but we don't have anything describing the gaps.  We
don't know how many sources of gaps we'll have in the future, but let's
keep it simple and assume we can look for MSI gaps and add other
possible sources of gaps in the future, it's an internal API after all.
So we can use DOMAIN_ATTR_MSI_GEOMETRY to tell us about the (we assume
one) MSI range of reserved IOVA within DOMAIN_ATTR_GEOMETRY.  For x86
this is fixed, for SMMU this is a zero range until someone programs it.
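
In code that's roughly (sketch; iommu_domain_geometry is the existing
attribute payload, iommu_domain_msi_geometry the one added by this
series):

	struct iommu_domain_geometry geo;
	struct iommu_domain_msi_geometry msi;

	iommu_domain_get_attr(domain, DOMAIN_ATTR_GEOMETRY, &geo);
	iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_GEOMETRY, &msi);

	/*
	 * When the MSI range is non-empty and sits inside the geometry,
	 * report the usable IOVA space as two capability entries:
	 * [geo.aperture_start, msi.aperture_start - 1] and
	 * [msi.aperture_end + 1, geo.aperture_end]; otherwise a single
	 * [geo.aperture_start, geo.aperture_end] entry.
	 */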

Now, what does a user need to know to add a reserved MSI IOVA range?
They need to know a) that it needs to be done, and b) how big to make
it (and maybe alignment requirements).  Really all we need to describe
then is b) since b) implies a). So maybe that gives us another
capability chain entry:

struct vfio_iommu_type1_info_cap_msi_resv {
	struct vfio_info_cap_header header;
	u64 size;
	u64 alignment;
};

It doesn't seem like we need to waste a flag bit on
vfio_iommu_type1_info.flags for this since the existence of this
capability would imply that VFIO_IOMMU_MAP_DMA supports an MSI_RESV
flag.
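
To make the flow concrete, userspace would then pick an IOVA window and
register it; a sketch, reusing the RESERVED_MSI_IOVA flag name from this
series (msi_base, msi_resv and container_fd are placeholders):

	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA,
		/* vaddr unused: the range is not backed by user memory */
		.iova  = msi_base,	/* aligned to msi_resv.alignment */
		.size  = msi_resv.size,	/* at least the advertised size */
	};

	if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
		/* no MSI window available, fail the assignment */;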

So what do we need from the kernel infrastructure to make that happen?
Well, we need a) and b) above, and again b) can imply a), so if the
IOMMU API provided a DOMAIN_ATTR_MSI_RESV, providing the same
size/alignment, then we're nearly there.  Then we just need a way to
set that range, which I'd probably try to plumb through the IOMMU API
rather than pulling in separate doorbell APIs and DMA cookie APIs.  If
it's going to pull together all those different things, let's at least
only do that in one place so we can expose a consistent API through the
IOMMU API.  Obviously once a range is set, DOMAIN_ATTR_MSI_RESV should
report that range, so if the user were to look at the type1 info
capability chain again, the available IOVA ranges would reflect the now
reserved range.
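
Filling the capability would then boil down to something along these
lines in type1 (a sketch only; DOMAIN_ATTR_MSI_RESV, struct
iommu_msi_resv and the MSI_RESV capability id are the hypothetical names
discussed above):

	struct iommu_msi_resv resv = {};
	struct vfio_iommu_type1_info_cap_msi_resv *cap;
	struct vfio_info_cap_header *header;

	if (iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_RESV, &resv) ||
	    !resv.size)
		return 0;	/* nothing required, don't expose the cap */

	header = vfio_info_cap_add(caps, sizeof(*cap),
				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV, 1);
	if (IS_ERR(header))
		return PTR_ERR(header);

	cap = container_of(header,
			   struct vfio_iommu_type1_info_cap_msi_resv, header);
	cap->size = resv.size;
	cap->alignment = resv.alignment;
	return 0;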

Maybe that's more than you're asking for, but that's the approach I
would take to solidify the API.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 12/15] vfio: Allow reserved msi iova registration
@ 2016-10-07 20:45         ` Alex Williamson
  0 siblings, 0 replies; 109+ messages in thread
From: Alex Williamson @ 2016-10-07 20:45 UTC (permalink / raw)
  To: Auger Eric
  Cc: yehuday, drjones, jason, kvm, marc.zyngier, p.fedin, joro,
	will.deacon, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, linux-arm-kernel, tglx, robin.murphy,
	Manish.Jaggi, christoffer.dall, eric.auger.pro

On Fri, 7 Oct 2016 19:11:43 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 06/10/2016 22:19, Alex Williamson wrote:
> > On Thu,  6 Oct 2016 08:45:28 +0000
> > Eric Auger <eric.auger@redhat.com> wrote:
> >   
> >> The user is allowed to register a reserved MSI IOVA range by using the
> >> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
> >> This region is stored in the vfio_dma rb tree. At that point the iova
> >> range is not mapped to any target address yet. The host kernel will use
> >> those iova when needed, typically when MSIs are allocated.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
> >>
> >> ---
> >> v12 -> v13:
> >> - use iommu_get_dma_msi_region_cookie
> >>
> >> v9 -> v10
> >> - use VFIO_IOVA_RESERVED_MSI enum value
> >>
> >> v7 -> v8:
> >> - use iommu_msi_set_aperture function. There is no notion of
> >>   unregistration anymore since the reserved msi slot remains
> >>   until the container gets closed.
> >>
> >> v6 -> v7:
> >> - use iommu_free_reserved_iova_domain
> >> - convey prot attributes down to dma-reserved-iommu iova domain creation
> >> - reserved bindings teardown now performed on iommu domain destruction
> >> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> >>          VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
> >> - change title
> >> - pass the protection attribute to dma-reserved-iommu API
> >>
> >> v3 -> v4:
> >> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
> >> - protect vfio_register_reserved_iova_range implementation with
> >>   CONFIG_IOMMU_DMA_RESERVED
> >> - handle unregistration by user-space and on vfio_iommu_type1 release
> >>
> >> v1 -> v2:
> >> - set returned value according to alloc_reserved_iova_domain result
> >> - free the iova domains in case any error occurs
> >>
> >> RFC v1 -> v1:
> >> - takes into account Alex comments, based on
> >>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
> >> - use the existing dma map/unmap ioctl interface with a flag to register
> >>   a reserved IOVA range. A single reserved iova region is allowed.
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++++++++++++++++++++-
> >>  include/uapi/linux/vfio.h       | 10 +++++-
> >>  2 files changed, 85 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 5bc5fc9..c2f8bd9 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >>  	vfio_lock_acct(-unlocked);
> >>  }
> >>  
> >> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
> >> +				dma_addr_t iova, size_t size)
> >> +{
> >> +	struct vfio_domain *d;
> >> +	int ret = 0;
> >> +
> >> +	list_for_each_entry(d, &iommu->domain_list, next) {
> >> +		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
> >> +		if (ret)
> >> +			break;
> >> +	}
> >> +	return ret;  
> > 
> > Doesn't this need an unwind on failure loop?  
> At the moment the de-allocation is done by the smmu driver, in the
> domain_free ops, which calls iommu_put_dma_cookie. In case
> iommu_get_dma_msi_region_cookie fails on a given VFIO domain, currently
> there is no other way but to destroy all VFIO domains and redo everything.
> 
> So yes, I plan to unwind everything, i.e. call iommu_put_dma_cookie for
> each domain.
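> 
> Something along these lines (just a sketch of the error path, assuming
> list_for_each_entry_continue_reverse over the same domain_list):
> 
> 	list_for_each_entry(d, &iommu->domain_list, next) {
> 		ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
> 		if (ret)
> 			goto unwind;
> 	}
> 	return 0;
> unwind:
> 	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next)
> 		iommu_put_dma_cookie(d->domain);
> 	return ret;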

That's a pretty harsh user experience, isn't it?  They potentially have
some domains where the cookie is set up and others without, and they have
no means to recover except to tear it all down and start over?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-10 14:26       ` Robin Murphy
  0 siblings, 0 replies; 109+ messages in thread
From: Robin Murphy @ 2016-10-10 14:26 UTC (permalink / raw)
  To: Alex Williamson, Eric Auger
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, will.deacon,
	joro, tglx, jason, linux-arm-kernel, kvm, drjones, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Hi Alex, Eric,

On 06/10/16 21:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:19 +0000
> Eric Auger <eric.auger@redhat.com> wrote:
> 
>> From: Robin Murphy <robin.murphy@arm.com>
>>
>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>> with regard to mapping MSI messages in systems where the MSI write is
>> subject to IOMMU translation. With the relevant infrastructure now in
>> place for managed DMA domains, it's actually really simple for other
>> users to piggyback off that and reap the benefits without giving up
>> their own IOVA management, and without having to reinvent their own
>> wheel in the MSI layer.
>>
>> Allow such users to opt into automatic MSI remapping by dedicating a
>> region of their IOVA space to a managed cookie.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v1 -> v2:
>> - compared to Robin's version
>> - add NULL last param to iommu_dma_init_domain
>> - set the msi_geometry aperture
>> - I removed
>>   if (base < U64_MAX - size)
>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>   don't get why we would reserve something out of the scope of the iova domain?
>>   what do I miss?
>> ---
>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dma-iommu.h |  9 +++++++++
>>  2 files changed, 49 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab866..11da1a0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>  	}
>>  }
>> +
>> +/**
>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
> 
> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
> along those lines.  I'm not sure what we're get'ing.  Thanks,

What we're getting is private third-party resources for the iommu_domain
given in the argument. It's a get/put rather than alloc/free model since
we operate opaquely on the domain as a container, rather than on the
actual resource in question (an IOVA allocator).

Since this particular use case is slightly different from the normal
flow and has special initialisation requirements, it seemed a lot
cleaner to simply combine that initialisation operation with the
prerequisite "get" into a single call. Especially as it helps emphasise
that this is not 'normal' DMA cookie usage.

> 
> Alex
> 
>> + * @domain: IOMMU domain to prepare
>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>> + * @size: Size of the desired MSI aperture
>> + *
>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>> + * but would still like to take advantage of automatic MSI remapping, can use
>> + * this to initialise their own domain appropriately.
>> + */
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size)
>> +{
>> +	struct iommu_dma_cookie *cookie;
>> +	struct iova_domain *iovad;
>> +	int ret;
>> +
>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>> +		return -EINVAL;
>> +
>> +	ret = iommu_get_dma_cookie(domain);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>> +	if (ret) {
>> +		iommu_put_dma_cookie(domain);
>> +		return ret;
>> +	}

It *is* necessary to explicitly reserve the upper part of the IOVA
domain here - the aforementioned "special initialisation" - because
dma_32bit_pfn is only an optimisation hint to prevent the allocator
walking down from the very top of the tree every time devices
with different DMA masks share a domain (I'm in two minds as to whether
to tweak the way the iommu-dma code uses it in this respect, now that I
fully understand things). The only actual upper limit to allocation is
the DMA mask passed into each alloc_iova() call, so if we want to ensure
IOVAs are really allocated within this specific region, we have to carve
out everything above it.
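
i.e. roughly the hunk dropped in the changelog quoted above (sketch):

	if (base < U64_MAX - size)
		reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);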

Robin.

>> +
>> +	domain->msi_geometry.aperture_start = base;
>> +	domain->msi_geometry.aperture_end = base + size - 1;
>> +
>> +	cookie = domain->iova_cookie;
>> +	iovad = &cookie->iovad;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 32c5890..1c55413 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>  /* The DMA API isn't _quite_ the whole story, though... */
>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>  {
>>  }
>>  
>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +		dma_addr_t base, u64 size)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  #endif	/* CONFIG_IOMMU_DMA */
>>  #endif	/* __KERNEL__ */
>>  #endif	/* __DMA_IOMMU_H */
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-10 14:47         ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-10 14:47 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, will.deacon,
	joro, tglx, jason, linux-arm-kernel, kvm, drjones, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, yehuday, Manish.Jaggi

Hi Robin,

On 10/10/2016 16:26, Robin Murphy wrote:
> Hi Alex, Eric,
> 
> On 06/10/16 21:17, Alex Williamson wrote:
>> On Thu,  6 Oct 2016 08:45:19 +0000
>> Eric Auger <eric.auger@redhat.com> wrote:
>>
>>> From: Robin Murphy <robin.murphy@arm.com>
>>>
>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>> with regard to mapping MSI messages in systems where the MSI write is
>>> subject to IOMMU translation. With the relevant infrastructure now in
>>> place for managed DMA domains, it's actually really simple for other
>>> users to piggyback off that and reap the benefits without giving up
>>> their own IOVA management, and without having to reinvent their own
>>> wheel in the MSI layer.
>>>
>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>> region of their IOVA space to a managed cookie.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - compared to Robin's version
>>> - add NULL last param to iommu_dma_init_domain
>>> - set the msi_geometry aperture
>>> - I removed
>>>   if (base < U64_MAX - size)
>>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>>   don't get why we would reserve something out of the scope of the iova domain?
>>>   what do I miss?
>>> ---
>>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h |  9 +++++++++
>>>  2 files changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index c5ab866..11da1a0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>>  	}
>>>  }
>>> +
>>> +/**
>>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
>>
>> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
>> along those lines.  I'm not sure what we're get'ing.  Thanks,
> 
> What we're getting is private third-party resources for the iommu_domain
> given in the argument. It's a get/put rather than alloc/free model since
> we operate opaquely on the domain as a container, rather than on the
> actual resource in question (an IOVA allocator).
> 
> Since this particular use case is slightly different from the normal
> flow and has special initialisation requirements, it seemed a lot
> cleaner to simply combine that initialisation operation with the
> prerequisite "get" into a single call. Especially as it helps emphasise
> that this is not 'normal' DMA cookie usage.

I renamed iommu_get_dma_msi_region_cookie to
iommu_setup_dma_msi_region. Is that a problem for you?
> 
>>
>> Alex
>>
>>> + * @domain: IOMMU domain to prepare
>>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>>> + * @size: Size of the desired MSI aperture
>>> + *
>>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>>> + * but would still like to take advantage of automatic MSI remapping, can use
>>> + * this to initialise their own domain appropriately.
>>> + */
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	struct iommu_dma_cookie *cookie;
>>> +	struct iova_domain *iovad;
>>> +	int ret;
>>> +
>>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>>> +		return -EINVAL;
>>> +
>>> +	ret = iommu_get_dma_cookie(domain);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>>> +	if (ret) {
>>> +		iommu_put_dma_cookie(domain);
>>> +		return ret;
>>> +	}
> 
> It *is* necessary to explicitly reserve the upper part of the IOVA
> domain here - the aforementioned "special initialisation" - because
> dma_32bit_pfn is only an optimisation hint to prevent the allocator
> walking down from the very top of the the tree every time when devices
> with different DMA masks share a domain (I'm in two minds as to whether
> to tweak the way the iommu-dma code uses it in this respect, now that I
> fully understand things). The only actual upper limit to allocation is
> the DMA mask passed into each alloc_iova() call, so if we want to ensure
> IOVAs are really allocated within this specific region, we have to carve
> out everything above it.

Thank you for the explanation. So I will restore the reservation then.

Thanks

Eric
> 
> Robin.
> 
>>> +
>>> +	domain->msi_geometry.aperture_start = base;
>>> +	domain->msi_geometry.aperture_end = base + size - 1;
>>> +
>>> +	cookie = domain->iova_cookie;
>>> +	iovad = &cookie->iovad;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>>> index 32c5890..1c55413 100644
>>> --- a/include/linux/dma-iommu.h
>>> +++ b/include/linux/dma-iommu.h
>>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>>  /* The DMA API isn't _quite_ the whole story, though... */
>>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>>  
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size);
>>> +
>>>  #else
>>>  
>>>  struct iommu_domain;
>>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  {
>>>  }
>>>  
>>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +
>>>  #endif	/* CONFIG_IOMMU_DMA */
>>>  #endif	/* __KERNEL__ */
>>>  #endif	/* __DMA_IOMMU_H */
>>
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-10 14:47         ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-10 14:47 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson
  Cc: yehuday-eYqpPyKDWXRBDgjK7y7TUQ, drjones-H+wXaHxf7aLQT0dZR+AlfA,
	jason-NLaQJdtUoK4Be96aLqz0jA, kvm-u79uwXL29TY76Z2rM5mHXA,
	marc.zyngier-5wv7dgnIgG8, p.fedin-Sze3O3UU22JBDgjK7y7TUQ,
	will.deacon-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pranav.sawargaonkar-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tglx-hfZtesqFncYOwBW4kG4KsQ,
	Manish.Jaggi-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	christoffer.dall-QSEj5FYQhm4dnm+yROfE0A,
	eric.auger.pro-Re5JQEeQqe8AvxtiuMwx3w

Hi Robin,

On 10/10/2016 16:26, Robin Murphy wrote:
> Hi Alex, Eric,
> 
> On 06/10/16 21:17, Alex Williamson wrote:
>> On Thu,  6 Oct 2016 08:45:19 +0000
>> Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>
>>> From: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>>>
>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>> with regard to mapping MSI messages in systems where the MSI write is
>>> subject to IOMMU translation. With the relevant infrastructure now in
>>> place for managed DMA domains, it's actually really simple for other
>>> users to piggyback off that and reap the benefits without giving up
>>> their own IOVA management, and without having to reinvent their own
>>> wheel in the MSI layer.
>>>
>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>> region of their IOVA space to a managed cookie.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>>> Signed-off-by: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - compared to Robin's version
>>> - add NULL last param to iommu_dma_init_domain
>>> - set the msi_geometry aperture
>>> - I removed
>>>   if (base < U64_MAX - size)
>>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>>   don't get why we would reserve something out of the scope of the iova domain?
>>>   what do I miss?
>>> ---
>>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h |  9 +++++++++
>>>  2 files changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index c5ab866..11da1a0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>>  	}
>>>  }
>>> +
>>> +/**
>>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
>>
>> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
>> along those lines.  I'm not sure what we're get'ing.  Thanks,
> 
> What we're getting is private third-party resources for the iommu_domain
> given in the argument. It's a get/put rather than alloc/free model since
> we operate opaquely on the domain as a container, rather than on the
> actual resource in question (an IOVA allocator).
> 
> Since this particular use case is slightly different from the normal
> flow and has special initialisation requirements, it seemed a lot
> cleaner to simply combine that initialisation operation with the
> prerequisite "get" into a single call. Especially as it helps emphasise
> that this is not 'normal' DMA cookie usage.

I renamed iommu_get_dma_msi_region_cookie into
iommu_setup_dma_msi_region. Is it a problem for you?
> 
>>
>> Alex
>>
>>> + * @domain: IOMMU domain to prepare
>>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>>> + * @size: Size of the desired MSI aperture
>>> + *
>>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>>> + * but would still like to take advantage of automatic MSI remapping, can use
>>> + * this to initialise their own domain appropriately.
>>> + */
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	struct iommu_dma_cookie *cookie;
>>> +	struct iova_domain *iovad;
>>> +	int ret;
>>> +
>>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>>> +		return -EINVAL;
>>> +
>>> +	ret = iommu_get_dma_cookie(domain);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>>> +	if (ret) {
>>> +		iommu_put_dma_cookie(domain);
>>> +		return ret;
>>> +	}
> 
> It *is* necessary to explicitly reserve the upper part of the IOVA
> domain here - the aforementioned "special initialisation" - because
> dma_32bit_pfn is only an optimisation hint to prevent the allocator
> walking down from the very top of the the tree every time when devices
> with different DMA masks share a domain (I'm in two minds as to whether
> to tweak the way the iommu-dma code uses it in this respect, now that I
> fully understand things). The only actual upper limit to allocation is
> the DMA mask passed into each alloc_iova() call, so if we want to ensure
> IOVAs are really allocated within this specific region, we have to carve
> out everything above it.

thank you for the explanation. So I will restore the reserve then.

Thanks

Eric
> 
> Robin.
> 
>>> +
>>> +	domain->msi_geometry.aperture_start = base;
>>> +	domain->msi_geometry.aperture_end = base + size - 1;
>>> +
>>> +	cookie = domain->iova_cookie;
>>> +	iovad = &cookie->iovad;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>>> index 32c5890..1c55413 100644
>>> --- a/include/linux/dma-iommu.h
>>> +++ b/include/linux/dma-iommu.h
>>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>>  /* The DMA API isn't _quite_ the whole story, though... */
>>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>>  
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size);
>>> +
>>>  #else
>>>  
>>>  struct iommu_domain;
>>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  {
>>>  }
>>>  
>>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +
>>>  #endif	/* CONFIG_IOMMU_DMA */
>>>  #endif	/* __KERNEL__ */
>>>  #endif	/* __DMA_IOMMU_H */
>>
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-10 14:47         ` Auger Eric
  0 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-10 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 10/10/2016 16:26, Robin Murphy wrote:
> Hi Alex, Eric,
> 
> On 06/10/16 21:17, Alex Williamson wrote:
>> On Thu,  6 Oct 2016 08:45:19 +0000
>> Eric Auger <eric.auger@redhat.com> wrote:
>>
>>> From: Robin Murphy <robin.murphy@arm.com>
>>>
>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>> with regard to mapping MSI messages in systems where the MSI write is
>>> subject to IOMMU translation. With the relevant infrastructure now in
>>> place for managed DMA domains, it's actually really simple for other
>>> users to piggyback off that and reap the benefits without giving up
>>> their own IOVA management, and without having to reinvent their own
>>> wheel in the MSI layer.
>>>
>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>> region of their IOVA space to a managed cookie.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - compared to Robin's version
>>> - add NULL last param to iommu_dma_init_domain
>>> - set the msi_geometry aperture
>>> - I removed
>>>   if (base < U64_MAX - size)
>>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>>   don't get why we would reserve something out of the scope of the iova domain?
>>>   what do I miss?
>>> ---
>>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/dma-iommu.h |  9 +++++++++
>>>  2 files changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index c5ab866..11da1a0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>>  	}
>>>  }
>>> +
>>> +/**
>>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
>>
>> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
>> along those lines.  I'm not sure what we're get'ing.  Thanks,
> 
> What we're getting is private third-party resources for the iommu_domain
> given in the argument. It's a get/put rather than alloc/free model since
> we operate opaquely on the domain as a container, rather than on the
> actual resource in question (an IOVA allocator).
> 
> Since this particular use case is slightly different from the normal
> flow and has special initialisation requirements, it seemed a lot
> cleaner to simply combine that initialisation operation with the
> prerequisite "get" into a single call. Especially as it helps emphasise
> that this is not 'normal' DMA cookie usage.

I renamed iommu_get_dma_msi_region_cookie into
iommu_setup_dma_msi_region. Is it a problem for you?
> 
>>
>> Alex
>>
>>> + * @domain: IOMMU domain to prepare
>>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>>> + * @size: Size of the desired MSI aperture
>>> + *
>>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>>> + * but would still like to take advantage of automatic MSI remapping, can use
>>> + * this to initialise their own domain appropriately.
>>> + */
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	struct iommu_dma_cookie *cookie;
>>> +	struct iova_domain *iovad;
>>> +	int ret;
>>> +
>>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>>> +		return -EINVAL;
>>> +
>>> +	ret = iommu_get_dma_cookie(domain);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>>> +	if (ret) {
>>> +		iommu_put_dma_cookie(domain);
>>> +		return ret;
>>> +	}
> 
> It *is* necessary to explicitly reserve the upper part of the IOVA
> domain here - the aforementioned "special initialisation" - because
> dma_32bit_pfn is only an optimisation hint to prevent the allocator
> walking down from the very top of the the tree every time when devices
> with different DMA masks share a domain (I'm in two minds as to whether
> to tweak the way the iommu-dma code uses it in this respect, now that I
> fully understand things). The only actual upper limit to allocation is
> the DMA mask passed into each alloc_iova() call, so if we want to ensure
> IOVAs are really allocated within this specific region, we have to carve
> out everything above it.

Thank you for the explanation. So I will restore the reserve_iova() carve-out then.
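
For reference, the restored carve-out is the snippet quoted in the changelog
above, placed right after iommu_dma_init_domain() succeeds (a minimal sketch
taken from that snippet, not the final patch):

	/* carve out everything above [base, base + size) so that
	 * alloc_iova() cannot hand out IOVAs beyond the MSI window
	 */
	if (base < U64_MAX - size)
		reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);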

Thanks

Eric
> 
> Robin.
> 
>>> +
>>> +	domain->msi_geometry.aperture_start = base;
>>> +	domain->msi_geometry.aperture_end = base + size - 1;
>>> +
>>> +	cookie = domain->iova_cookie;
>>> +	iovad = &cookie->iovad;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>>> index 32c5890..1c55413 100644
>>> --- a/include/linux/dma-iommu.h
>>> +++ b/include/linux/dma-iommu.h
>>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>>  /* The DMA API isn't _quite_ the whole story, though... */
>>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>>  
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size);
>>> +
>>>  #else
>>>  
>>>  struct iommu_domain;
>>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>  {
>>>  }
>>>  
>>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +		dma_addr_t base, u64 size)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +
>>>  #endif	/* CONFIG_IOMMU_DMA */
>>>  #endif	/* __KERNEL__ */
>>>  #endif	/* __DMA_IOMMU_H */
>>
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains
  2016-10-07 20:38           ` Alex Williamson
@ 2016-10-10 15:01             ` Auger Eric
  -1 siblings, 0 replies; 109+ messages in thread
From: Auger Eric @ 2016-10-10 15:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: yehuday, jason, kvm, marc.zyngier, will.deacon, joro, p.fedin,
	drjones, linux-kernel, Bharat.Bhushan, Jean-Philippe.Brucker,
	iommu, pranav.sawargaonkar, christoffer.dall, tglx, robin.murphy,
	Manish.Jaggi, linux-arm-kernel, eric.auger.pro

Hi Alex,
On 07/10/2016 22:38, Alex Williamson wrote:
> On Fri, 7 Oct 2016 19:10:27 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Alex,
>>
>> On 06/10/2016 22:42, Alex Williamson wrote:
>>> On Thu, 6 Oct 2016 14:20:40 -0600
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>   
>>>> On Thu,  6 Oct 2016 08:45:31 +0000
>>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>>  
>>>>> This patch allows the user-space to retrieve the MSI geometry. The
>>>>> implementation is based on capability chains, now also added to
>>>>> VFIO_IOMMU_GET_INFO.
>>>>>
>>>>> The returned info comprise:
>>>>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>>>>>   in the positive, the start/end of the aperture,
>>>>> - or whether the IOVA aperture need to be set by the userspace. In that
>>>>>   case, the size and alignment of the IOVA window to be provided are
>>>>>   returned.
>>>>>
>>>>> In case the userspace must provide the IOVA aperture, we currently report
>>>>> a size/alignment based on all the doorbells registered by the host kernel.
>>>>> This may exceed the actual needs.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> ---
>>>>> v11 -> v11:
>>>>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
>>>>>
>>>>> v9 -> v10:
>>>>> - move cap_offset after iova_pgsizes
>>>>> - replace __u64 alignment by __u32 order
>>>>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>>>>>   fix alignment
>>>>> - call msi-doorbell API to compute the size/alignment
>>>>>
>>>>> v8 -> v9:
>>>>> - use iommu_msi_supported flag instead of programmable
>>>>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>>>>   capability chain, reporting the MSI geometry
>>>>>
>>>>> v7 -> v8:
>>>>> - use iommu_domain_msi_geometry
>>>>>
>>>>> v6 -> v7:
>>>>> - remove the computation of the number of IOVA pages to be provisionned.
>>>>>   This number depends on the domain/group/device topology which can
>>>>>   dynamically change. Let's rely instead rely on an arbitrary max depending
>>>>>   on the system
>>>>>
>>>>> v4 -> v5:
>>>>> - move msi_info and ret declaration within the conditional code
>>>>>
>>>>> v3 -> v4:
>>>>> - replace former vfio_domains_require_msi_mapping by
>>>>>   more complex computation of MSI mapping requirements, especially the
>>>>>   number of pages to be provided by the user-space.
>>>>> - reword patch title
>>>>>
>>>>> RFC v1 -> v1:
>>>>> - derived from
>>>>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>>>>> - renamed allow_msi_reconfig into require_msi_mapping
>>>>> - fixed VFIO_IOMMU_GET_INFO
>>>>> ---
>>>>>  drivers/vfio/vfio_iommu_type1.c | 78 ++++++++++++++++++++++++++++++++++++++++-
>>>>>  include/uapi/linux/vfio.h       | 32 ++++++++++++++++-
>>>>>  2 files changed, 108 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>> index dc3ee5d..ce5e7eb 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -38,6 +38,8 @@
>>>>>  #include <linux/workqueue.h>
>>>>>  #include <linux/dma-iommu.h>
>>>>>  #include <linux/msi-doorbell.h>
>>>>> +#include <linux/irqdomain.h>
>>>>> +#include <linux/msi.h>
>>>>>  
>>>>>  #define DRIVER_VERSION  "0.2"
>>>>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>>>>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>>>>> +				     struct vfio_info_cap *caps)
>>>>> +{
>>>>> +	struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>>>>> +	unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
>>>>> +	struct iommu_domain_msi_geometry msi_geometry;
>>>>> +	struct vfio_info_cap_header *header;
>>>>> +	struct vfio_domain *d;
>>>>> +	bool reserved;
>>>>> +	size_t size;
>>>>> +
>>>>> +	mutex_lock(&iommu->lock);
>>>>> +	/* All domains have same require_msi_map property, pick first */
>>>>> +	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>>>>> +	iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>>>>> +			      &msi_geometry);
>>>>> +	reserved = !msi_geometry.iommu_msi_supported;
>>>>> +
>>>>> +	mutex_unlock(&iommu->lock);
>>>>> +
>>>>> +	size = sizeof(*vfio_msi_geometry);
>>>>> +	header = vfio_info_cap_add(caps, size,
>>>>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>>>>> +
>>>>> +	if (IS_ERR(header))
>>>>> +		return PTR_ERR(header);
>>>>> +
>>>>> +	vfio_msi_geometry = container_of(header,
>>>>> +				struct vfio_iommu_type1_info_cap_msi_geometry,
>>>>> +				header);
>>>>> +
>>>>> +	vfio_msi_geometry->flags = reserved;    
>>>>
>>>> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
>>>>  
>>>>> +	if (reserved) {
>>>>> +		vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>>>>> +		vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;    
>>>>
>>>> But maybe nobody has set these, did you intend to use
>>>> iommu_domain_msi_aperture_valid(), which you defined early on but never
>>>> used?
>>>>  
>>>>> +		return 0;
>>>>> +	}
>>>>> +
>>>>> +	vfio_msi_geometry->order = order;    
>>>>
>>>> I'm tempted to suggest that a user could do the same math on their own
>>>> since we provide the supported bitmap already... could it ever not be
>>>> the same? 
>>>>  
>>>>> +	/*
>>>>> +	 * we compute a system-wide requirement based on all the registered
>>>>> +	 * doorbells
>>>>> +	 */
>>>>> +	vfio_msi_geometry->size =
>>>>> +		msi_doorbell_calc_pages(order) * ((uint64_t) 1 << order);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>  				   unsigned int cmd, unsigned long arg)
>>>>>  {
>>>>> @@ -1122,8 +1173,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>  		}
>>>>>  	} else if (cmd == VFIO_IOMMU_GET_INFO) {
>>>>>  		struct vfio_iommu_type1_info info;
>>>>> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>>>> +		int ret;
>>>>>  
>>>>> -		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>>>>> +		minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
>>>>>  
>>>>>  		if (copy_from_user(&info, (void __user *)arg, minsz))
>>>>>  			return -EFAULT;
>>>>> @@ -1135,6 +1188,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>  
>>>>>  		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>>>  
>>>>> +		ret = compute_msi_geometry_caps(iommu, &caps);
>>>>> +		if (ret)
>>>>> +			return ret;
>>>>> +
>>>>> +		if (caps.size) {
>>>>> +			info.flags |= VFIO_IOMMU_INFO_CAPS;
>>>>> +			if (info.argsz < sizeof(info) + caps.size) {
>>>>> +				info.argsz = sizeof(info) + caps.size;
>>>>> +				info.cap_offset = 0;
>>>>> +			} else {
>>>>> +				vfio_info_cap_shift(&caps, sizeof(info));
>>>>> +				if (copy_to_user((void __user *)arg +
>>>>> +						sizeof(info), caps.buf,
>>>>> +						caps.size)) {
>>>>> +					kfree(caps.buf);
>>>>> +					return -EFAULT;
>>>>> +				}
>>>>> +				info.cap_offset = sizeof(info);
>>>>> +			}
>>>>> +
>>>>> +			kfree(caps.buf);
>>>>> +		}
>>>>> +
>>>>>  		return copy_to_user((void __user *)arg, &info, minsz) ?
>>>>>  			-EFAULT : 0;
>>>>>  
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 4a9dbc2..8dae013 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -488,7 +488,35 @@ struct vfio_iommu_type1_info {
>>>>>  	__u32	argsz;
>>>>>  	__u32	flags;
>>>>>  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>>>>> -	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>>>>> +#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
>>>>> +	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
>>>>> +	__u32	__resv;
>>>>> +	__u32   cap_offset;	/* Offset within info struct of first cap */
>>>>> +};    
>>>>
>>>> I understand the padding, but not the ordering.  Why not end with
>>>> padding?
>>>>  
>>>>> +
>>>>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY	1
>>>>> +
>>>>> +/*
>>>>> + * The MSI geometry capability allows to report the MSI IOVA geometry:
>>>>> + * - either the MSI IOVAs are constrained within a reserved IOVA aperture
>>>>> + *   whose boundaries are given by [@aperture_start, @aperture_end].
>>>>> + *   this is typically the case on x86 host. The userspace is not allowed
>>>>> + *   to map userspace memory at IOVAs intersecting this range using
>>>>> + *   VFIO_IOMMU_MAP_DMA.
>>>>> + * - or the MSI IOVAs are not requested to belong to any reserved range;
>>>>> + *   in that case the userspace must provide an IOVA window characterized by
>>>>> + *   @size and @alignment using VFIO_IOMMU_MAP_DMA with RESERVED_MSI_IOVA flag.
>>>>> + */
>>>>> +struct vfio_iommu_type1_info_cap_msi_geometry {
>>>>> +	struct vfio_info_cap_header header;
>>>>> +	__u32 flags;
>>>>> +#define VFIO_IOMMU_MSI_GEOMETRY_RESERVED (1 << 0) /* reserved geometry */
>>>>> +	/* not reserved */
>>>>> +	__u32 order; /* iommu page order used for aperture alignment*/
>>>>> +	__u64 size; /* IOVA aperture size (bytes) the userspace must provide */
>>>>> +	/* reserved */
>>>>> +	__u64 aperture_start;
>>>>> +	__u64 aperture_end;    
>>>>
>>>> Should these be a union?  We never set them both.  Should the !reserved
>>>> case have a flag as well, so the user can positively identify what's
>>>> being provided?  
>>>
>>> Actually, is there really any need to fit both of these within the same
>>> structure?  Part of the idea of the capability chains is we can create
>>> a capability for each new thing we want to describe.  So, we could
>>> simply define a generic reserved IOVA range capability with a 'start'
>>> and 'end' and then another capability to define MSI mapping
>>> requirements.  Thanks,  
>> Yes your suggested approach makes sense to me.
>>
>> One reason why I proceeded that way is we are mixing things at iommu.h
>> level too. Personally I would have preferred to separate things:
>> 1) add a new IOMMU_CAP_TRANSLATE_MSI capability in iommu_cap
>> 2) rename iommu_msi_supported into "programmable" bool: reporting
>> whether the aperture is reserved or programmable.
>>
>> In the early releases I think it was as above but slightly we moved to a
>> mixed description.
>>
>> What do you think?
> 
> The API certainly doesn't seem like it has a cohesive feel to me.  It's
> not entirely clear to me how we know when we need to register a DMA MSI
> cookie, or how we know that the MSI doorbell API is actually
> initialized and in use by the MSI/IOMMU layer, or exactly what is the
> MSI geometry telling me.  Perhaps this is why the code doesn't seem to
> have a good rejection mechanism for architectures that need it versus
> those that don't, it's too hard to tell.
> 
> Maybe we can look at what we think the user API should be and work
> backwards.  For x86 we simply have a reserved range of IOVA.  I'm not
> entirely sure it adds to the user API to know that it's for MSI, it's
> just a range of IOVAs that we cannot allocate for regular DMA.  In
> fact, we currently lack a mechanism for describing the IOVA space of
> the IOMMU at all, so rather than focusing on a mechanism to describe a
> hole in the IOVA space, we might simply want to focus on a mechanism to
> describe the available IOVA space.  Everybody needs that, not just
> x86.  That sort of sounds like a VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> that perhaps looks like:
> 
> struct vfio_iommu_type1_info_cap_iova_range {
> 	struct vfio_info_cap_header header;
> 	u64 start;
> 	u64 end;
> };
> 
> Clearly we need to allow multiple of these in the capability chain
> since the existing x86 MSI range bisects this address space.
> 
> To support this, we basically need the same information from the IOMMU
> API.  We already have DOMAIN_ATTR_GEOMETRY, which should give us the
> base IOVA range, but we don't have anything describing the gaps.  We
> don't know how many sources of gaps we'll have in the future, but let's
> keep it simple and assume we can look for MSI gaps and add other
> possible sources of gaps in the future, it's an internal API after all.
> So we can use DOMAIN_ATTR_MSI_GEOMETRY to tell us about the (we assume
> one) MSI range of reserved IOVA within DOMAIN_ATTR_GEOMETRY.  For x86
> this is fixed, for SMMU this is a zero range until someone programs it.
> 
> Now, what does a user need to know to add a reserved MSI IOVA range?
> They need to know a) that it needs to be done, and b) how big to make
> it (and maybe alignment requirements).  Really all we need to describe
> then is b) since b) implies a). So maybe that gives us another
> capability chain entry:
> 
> struct vfio_iommu_type1_info_cap_msi_resv {
> 	struct vfio_info_cap_header header;
> 	u64 size;
> 	u64 alignment;
> };

I like the approach, and I like the idea of separating the two issues into
separate structs, both at the VFIO level and the IOMMU level. It makes even
more sense now that we have the additional requirement to handle the host
PCIe host bridge window.
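
To make that concrete, userspace consumption of such capabilities through
VFIO_IOMMU_GET_INFO could look roughly like the sketch below. It assumes the
uapi changes quoted above are applied; VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV and
probe_msi_resv() are placeholder names, not part of this series.

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* placeholder ID for the capability proposed above; not in any header yet */
#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV 2

static void probe_msi_resv(int container_fd)
{
	size_t argsz = sizeof(struct vfio_iommu_type1_info) + 1024;
	struct vfio_iommu_type1_info *info = calloc(1, argsz);
	struct vfio_info_cap_header *hdr;

	if (!info)
		return;
	info->argsz = argsz;

	if (!ioctl(container_fd, VFIO_IOMMU_GET_INFO, info) &&
	    (info->flags & VFIO_IOMMU_INFO_CAPS) && info->cap_offset) {
		/* each capability header stores the offset of the next one */
		hdr = (void *)((char *)info + info->cap_offset);
		for (;;) {
			if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV) {
				/* read size/alignment here, then create an
				 * MSI window of at least that size via
				 * VFIO_IOMMU_MAP_DMA with the MSI_RESV flag
				 */
			}
			if (!hdr->next)
				break;
			hdr = (void *)((char *)info + hdr->next);
		}
	}
	free(info);
}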
> 
> It doesn't seem like we need to waste a flag bit on
> vfio_iommu_type1_info.flags for this since the existence of this
> capability would imply that VFIO_IOMMU_MAP_DMA supports an MSI_RESV
> flag.
I agree.
> 
> So what do we need from the kernel infrastructure to make that happen?
> Well, we need a) and b) above, and again b) can imply a), so if the
> IOMMU API provided a DOMAIN_ATTR_MSI_RESV, providing the same
> size/alignment, then we're nearly there.
Agreed
> Then we just need a way to
> set that range, which I'd probably try to plumb through the IOMMU API
> rather than pulling in separate doorbell APIs and DMA cookie APIs.  If
> it's going to pull together all those different things, let's at least
> only do that in one place so we can expose a consistent API through the
> IOMMU API.  Obviously once a range is set, DOMAIN_ATTR_MSI_RESV should
> report that range, so if the user were to look at the type1 info
> capability chain again, the available IOVA ranges would reflect the now
> reserved range.
So my plan is to respin the passthrough series with
vfio_iommu_type1_info_cap_msi_resv and the associated iommu struct.
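
Purely as an illustration of what that associated struct and attribute might
look like (DOMAIN_ATTR_MSI_RESV, struct iommu_domain_msi_resv and the helper
below are hypothetical names following the proposal above, not code from this
series):

struct iommu_domain_msi_resv {
	u64 size;	/* IOVA window size userspace must provide */
	u64 alignment;	/* required alignment of the window base */
};

static int vfio_query_msi_resv(struct iommu_domain *domain,
			       struct iommu_domain_msi_resv *resv)
{
	/* the IOMMU layer would aggregate the doorbell requirements behind
	 * this single attribute instead of VFIO pulling in the doorbell and
	 * DMA-cookie APIs directly
	 */
	return iommu_domain_get_attr(domain, DOMAIN_ATTR_MSI_RESV, resv);
}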

I would prefer to send a separate series to report the usable IOVA address
space.

Thanks

Eric
> 
> Maybe that's more than you're asking for, but that's the approach I
> would take to solidify the API.  Thanks,
> 
> Alex
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies
@ 2016-10-10 15:52           ` Robin Murphy
  0 siblings, 0 replies; 109+ messages in thread
From: Robin Murphy @ 2016-10-10 15:52 UTC (permalink / raw)
  To: Auger Eric, Alex Williamson
  Cc: eric.auger.pro, christoffer.dall, marc.zyngier, will.deacon,
	joro, tglx, jason, linux-arm-kernel, kvm, drjones, linux-kernel,
	Bharat.Bhushan, pranav.sawargaonkar, p.fedin, iommu,
	Jean-Philippe.Brucker, yehuday, Manish.Jaggi

On 10/10/16 15:47, Auger Eric wrote:
> Hi Robin,
> 
> On 10/10/2016 16:26, Robin Murphy wrote:
>> Hi Alex, Eric,
>>
>> On 06/10/16 21:17, Alex Williamson wrote:
>>> On Thu,  6 Oct 2016 08:45:19 +0000
>>> Eric Auger <eric.auger@redhat.com> wrote:
>>>
>>>> From: Robin Murphy <robin.murphy@arm.com>
>>>>
>>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>>> with regard to mapping MSI messages in systems where the MSI write is
>>>> subject to IOMMU translation. With the relevant infrastructure now in
>>>> place for managed DMA domains, it's actually really simple for other
>>>> users to piggyback off that and reap the benefits without giving up
>>>> their own IOVA management, and without having to reinvent their own
>>>> wheel in the MSI layer.
>>>>
>>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>>> region of their IOVA space to a managed cookie.
>>>>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v1 -> v2:
>>>> - compared to Robin's version
>>>> - add NULL last param to iommu_dma_init_domain
>>>> - set the msi_geometry aperture
>>>> - I removed
>>>>   if (base < U64_MAX - size)
>>>>      reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>>>   don't get why we would reserve something out of the scope of the iova domain?
>>>>   what do I miss?
>>>> ---
>>>>  drivers/iommu/dma-iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/dma-iommu.h |  9 +++++++++
>>>>  2 files changed, 49 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>>> index c5ab866..11da1a0 100644
>>>> --- a/drivers/iommu/dma-iommu.c
>>>> +++ b/drivers/iommu/dma-iommu.c
>>>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>>  		msg->address_lo += lower_32_bits(msi_page->iova);
>>>>  	}
>>>>  }
>>>> +
>>>> +/**
>>>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
>>>
>>> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
>>> along those lines.  I'm not sure what we're get'ing.  Thanks,
>>
>> What we're getting is private third-party resources for the iommu_domain
>> given in the argument. It's a get/put rather than alloc/free model since
>> we operate opaquely on the domain as a container, rather than on the
>> actual resource in question (an IOVA allocator).
>>
>> Since this particular use case is slightly different from the normal
>> flow and has special initialisation requirements, it seemed a lot
>> cleaner to simply combine that initialisation operation with the
>> prerequisite "get" into a single call. Especially as it helps emphasise
>> that this is not 'normal' DMA cookie usage.
> 
> I renamed iommu_get_dma_msi_region_cookie into
> iommu_setup_dma_msi_region. Is it a problem for you?

I'd still prefer not to completely disguise the fact that it's
performing a get_cookie(), which ultimately still needs to be matched by
a put_cookie() somewhere. Really, VFIO should be doing the latter itself
before freeing the domain, as there's not strictly any guarantee that
the underlying IOMMU driver knows anything about this.
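
Concretely, whatever the setup helper ends up being called, the VFIO side
would need to pair it roughly like this (sketch only, not code from this
series):

	/* domain setup: "get" the cookie and initialise the MSI window */
	ret = iommu_get_dma_msi_region_cookie(domain, base, size);

	...

	/* domain teardown: VFIO puts the cookie itself before freeing the
	 * domain, since the IOMMU driver may know nothing about it
	 */
	iommu_put_dma_cookie(domain);
	iommu_domain_free(domain);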

Robin.

>>
>>>
>>> Alex
>>>
>>>> + * @domain: IOMMU domain to prepare
>>>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>>>> + * @size: Size of the desired MSI aperture
>>>> + *
>>>> + * Users who manage their own IOVA allocation and do not want DMA API support,
>>>> + * but would still like to take advantage of automatic MSI remapping, can use
>>>> + * this to initialise their own domain appropriately.
>>>> + */
>>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>> +		dma_addr_t base, u64 size)
>>>> +{
>>>> +	struct iommu_dma_cookie *cookie;
>>>> +	struct iova_domain *iovad;
>>>> +	int ret;
>>>> +
>>>> +	if (domain->type == IOMMU_DOMAIN_DMA)
>>>> +		return -EINVAL;
>>>> +
>>>> +	ret = iommu_get_dma_cookie(domain);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	ret = iommu_dma_init_domain(domain, base, size, NULL);
>>>> +	if (ret) {
>>>> +		iommu_put_dma_cookie(domain);
>>>> +		return ret;
>>>> +	}
>>
>> It *is* necessary to explicitly reserve the upper part of the IOVA
>> domain here - the aforementioned "special initialisation" - because
>> dma_32bit_pfn is only an optimisation hint to prevent the allocator
>> walking down from the very top of the the tree every time when devices
>> with different DMA masks share a domain (I'm in two minds as to whether
>> to tweak the way the iommu-dma code uses it in this respect, now that I
>> fully understand things). The only actual upper limit to allocation is
>> the DMA mask passed into each alloc_iova() call, so if we want to ensure
>> IOVAs are really allocated within this specific region, we have to carve
>> out everything above it.
> 
> thank you for the explanation. So I will restore the reserve then.
> 
> Thanks
> 
> Eric
>>
>> Robin.
>>
>>>> +
>>>> +	domain->msi_geometry.aperture_start = base;
>>>> +	domain->msi_geometry.aperture_end = base + size - 1;
>>>> +
>>>> +	cookie = domain->iova_cookie;
>>>> +	iovad = &cookie->iovad;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>>>> index 32c5890..1c55413 100644
>>>> --- a/include/linux/dma-iommu.h
>>>> +++ b/include/linux/dma-iommu.h
>>>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
>>>>  /* The DMA API isn't _quite_ the whole story, though... */
>>>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>>>  
>>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>> +		dma_addr_t base, u64 size);
>>>> +
>>>>  #else
>>>>  
>>>>  struct iommu_domain;
>>>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>>>  {
>>>>  }
>>>>  
>>>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>>> +		dma_addr_t base, u64 size)
>>>> +{
>>>> +	return -ENODEV;
>>>> +}
>>>> +
>>>>  #endif	/* CONFIG_IOMMU_DMA */
>>>>  #endif	/* __KERNEL__ */
>>>>  #endif	/* __DMA_IOMMU_H */
>>>
>>
> 

^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2016-10-10 15:52 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-06  8:45 [PATCH v13 00/15] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
2016-10-06  8:45 ` Eric Auger
2016-10-06  8:45 ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 01/15] iommu: Introduce DOMAIN_ATTR_MSI_GEOMETRY Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 02/15] iommu/arm-smmu: Initialize the msi geometry Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:16   ` Alex Williamson
2016-10-06 20:16     ` Alex Williamson
2016-10-06 20:16     ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:17   ` Alex Williamson
2016-10-06 20:17     ` Alex Williamson
2016-10-06 20:17     ` Alex Williamson
2016-10-07 17:14     ` Auger Eric
2016-10-07 17:14       ` Auger Eric
2016-10-07 17:14       ` Auger Eric
2016-10-10 14:26     ` Robin Murphy
2016-10-10 14:26       ` Robin Murphy
2016-10-10 14:26       ` Robin Murphy
2016-10-10 14:47       ` Auger Eric
2016-10-10 14:47         ` Auger Eric
2016-10-10 14:47         ` Auger Eric
2016-10-10 15:52         ` Robin Murphy
2016-10-10 15:52           ` Robin Murphy
2016-10-10 15:52           ` Robin Murphy
2016-10-06  8:45 ` [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:17   ` Alex Williamson
2016-10-06 20:17     ` Alex Williamson
2016-10-07 17:13     ` Auger Eric
2016-10-07 17:13       ` Auger Eric
2016-10-07 17:13       ` Auger Eric
2016-10-06  8:45 ` [PATCH v13 05/15] genirq/msi: msi_doorbell_calc_pages Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 06/15] irqchip/gic-v2m: Register the MSI doorbell Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 07/15] irqchip/gicv3-its: " Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 08/15] vfio: Introduce a vfio_dma type field Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:18   ` Alex Williamson
2016-10-06 20:18     ` Alex Williamson
2016-10-06 20:18     ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 09/15] vfio/type1: vfio_find_dma accepting a type argument Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:18   ` Alex Williamson
2016-10-06 20:18     ` Alex Williamson
2016-10-06 20:18     ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 10/15] vfio/type1: Implement recursive vfio_find_dma_from_node Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:19   ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 11/15] vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:19   ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-07 17:11     ` Auger Eric
2016-10-07 17:11       ` Auger Eric
2016-10-07 17:11       ` Auger Eric
2016-10-06  8:45 ` [PATCH v13 12/15] vfio: Allow reserved msi iova registration Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:19   ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-07 17:11     ` Auger Eric
2016-10-07 17:11       ` Auger Eric
2016-10-07 17:11       ` Auger Eric
2016-10-07 20:45       ` Alex Williamson
2016-10-07 20:45         ` Alex Williamson
2016-10-07 20:45         ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 13/15] vfio/type1: Check doorbell safety Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:19   ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06 20:19     ` Alex Williamson
2016-10-06  8:45 ` [PATCH v13 14/15] iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45 ` [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06  8:45   ` Eric Auger
2016-10-06 20:20   ` Alex Williamson
2016-10-06 20:20     ` Alex Williamson
2016-10-06 20:20     ` Alex Williamson
2016-10-06 20:42     ` Alex Williamson
2016-10-06 20:42       ` Alex Williamson
2016-10-06 20:42       ` Alex Williamson
2016-10-07 17:10       ` Auger Eric
2016-10-07 17:10         ` Auger Eric
2016-10-07 17:10         ` Auger Eric
2016-10-07 20:38         ` Alex Williamson
2016-10-07 20:38           ` Alex Williamson
2016-10-07 20:38           ` Alex Williamson
2016-10-10 15:01           ` Auger Eric
2016-10-10 15:01             ` Auger Eric

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.