* [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
@ 2015-11-24 13:50 Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 1/3] vfio: Introduce map and unmap operations Pavel Fedin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Pavel Fedin @ 2015-11-24 13:50 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Alex Williamson, Manish Jaggi, Thomas Gleixner,
	Jason Cooper

On some architectures (e.g. ARM64), if the device is behind an IOMMU and
is being mapped by VFIO, it is necessary to also add mappings for the MSI
translation register for interrupts to work. This series implements the
necessary API to do this, and makes use of this API for the GICv3 ITS on
ARM64.

v1 => v2:
- Added a dependency on CONFIG_GENERIC_MSI_IRQ_DOMAIN in some parts of the
  code; this should fix the build without this option

Pavel Fedin (3):
  vfio: Introduce map and unmap operations
  gicv3, its: Introduce VFIO map and unmap operations
  vfio: Introduce generic MSI mapping operations

 drivers/irqchip/irq-gic-v3-its.c   |  31 ++++++++++
 drivers/vfio/pci/vfio_pci_intrs.c  |  11 ++++
 drivers/vfio/vfio.c                | 116 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c    |  29 ++++++++++
 include/linux/irqchip/arm-gic-v3.h |   2 +
 include/linux/msi.h                |  12 ++++
 include/linux/vfio.h               |  17 +++++-
 7 files changed, 217 insertions(+), 1 deletion(-)

-- 
2.4.4


* [PATCH v2 1/3] vfio: Introduce map and unmap operations
  2015-11-24 13:50 [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Pavel Fedin
@ 2015-11-24 13:50 ` Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 2/3] gicv3, its: Introduce VFIO " Pavel Fedin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Fedin @ 2015-11-24 13:50 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Alex Williamson, Manish Jaggi, Thomas Gleixner,
	Jason Cooper

These new functions allow direct mapping and unmapping of addresses on the
given IOMMU. They will be used for mapping MSI hardware.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 drivers/vfio/vfio_iommu_type1.c | 29 +++++++++++++++++++++++++++++
 include/linux/vfio.h            |  4 +++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 59d47cb..17506eb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -558,6 +558,33 @@ unwind:
 	return ret;
 }
 
+static int vfio_iommu_type1_map(void *iommu_data, dma_addr_t iova,
+			      phys_addr_t paddr, long npage, int prot)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+	ret = vfio_iommu_map(iommu, iova, paddr >> PAGE_SHIFT, npage, prot);
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
+static void vfio_iommu_type1_unmap(void *iommu_data, dma_addr_t iova,
+				   long npage)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_domain *d;
+
+	mutex_lock(&iommu->lock);
+
+	list_for_each_entry_reverse(d, &iommu->domain_list, next)
+		iommu_unmap(d->domain, iova, npage << PAGE_SHIFT);
+
+	mutex_unlock(&iommu->lock);
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1046,6 +1073,8 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.ioctl		= vfio_iommu_type1_ioctl,
 	.attach_group	= vfio_iommu_type1_attach_group,
 	.detach_group	= vfio_iommu_type1_detach_group,
+	.map		= vfio_iommu_type1_map,
+	.unmap		= vfio_iommu_type1_unmap,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 610a86a..061038a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -75,7 +75,9 @@ struct vfio_iommu_driver_ops {
 					struct iommu_group *group);
 	void		(*detach_group)(void *iommu_data,
 					struct iommu_group *group);
-
+	int		(*map)(void *iommu_data, dma_addr_t iova,
+			       phys_addr_t paddr, long npage, int prot);
+	void		(*unmap)(void *iommu_data, dma_addr_t iova, long npage);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
-- 
2.4.4


* [PATCH v2 2/3] gicv3, its: Introduce VFIO map and unmap operations
  2015-11-24 13:50 [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 1/3] vfio: Introduce map and unmap operations Pavel Fedin
@ 2015-11-24 13:50 ` Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 3/3] vfio: Introduce generic MSI mapping operations Pavel Fedin
  2015-12-02 21:32 ` [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Alex Williamson
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Fedin @ 2015-11-24 13:50 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Alex Williamson, Manish Jaggi, Thomas Gleixner,
	Jason Cooper

These new functions use the supplied IOMMU to map and unmap the MSI
translation register(s).

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 drivers/irqchip/irq-gic-v3-its.c   | 31 +++++++++++++++++++++++++++++++
 include/linux/irqchip/arm-gic-v3.h |  2 ++
 include/linux/msi.h                | 12 ++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index e23d1d1..b97dfd7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -29,6 +29,7 @@
 #include <linux/of_platform.h>
 #include <linux/percpu.h>
 #include <linux/slab.h>
+#include <linux/vfio.h>
 
 #include <linux/irqchip.h>
 #include <linux/irqchip/arm-gic-v3.h>
@@ -1257,8 +1258,38 @@ out:
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_VFIO)
+
+static int its_vfio_map(struct irq_domain *domain,
+			const struct vfio_iommu_driver_ops *ops,
+			void *iommu_data)
+{
+	struct msi_domain_info *msi_info = msi_get_domain_info(domain);
+	struct its_node *its = msi_info->data;
+	u64 addr = its->phys_base + GIC_V3_ITS_CONTROL_SIZE;
+
+	return ops->map(iommu_data, addr, addr, 1, IOMMU_READ|IOMMU_WRITE);
+}
+
+static void its_vfio_unmap(struct irq_domain *domain,
+			   const struct vfio_iommu_driver_ops *ops,
+			   void *iommu_data)
+{
+	struct msi_domain_info *msi_info = msi_get_domain_info(domain);
+	struct its_node *its = msi_info->data;
+	u64 addr = its->phys_base + GIC_V3_ITS_CONTROL_SIZE;
+
+	ops->unmap(iommu_data, addr, 1);
+}
+
+#endif
+
 static struct msi_domain_ops its_msi_domain_ops = {
 	.msi_prepare	= its_msi_prepare,
+#if IS_ENABLED(CONFIG_VFIO)
+	.vfio_map	= its_vfio_map,
+	.vfio_unmap	= its_vfio_unmap,
+#endif
 };
 
 static int its_irq_gic_domain_alloc(struct irq_domain *domain,
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index bff3eee..dfd2bed 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -241,6 +241,8 @@
 #define GITS_BASER_TYPE_RESERVED6	6
 #define GITS_BASER_TYPE_RESERVED7	7
 
+#define GIC_V3_ITS_CONTROL_SIZE		0x10000
+
 /*
  * ITS commands
  */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index f71a25e..48faea9 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -155,6 +155,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_teardown_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);
 
+struct vfio_iommu_driver_ops;
+
 struct msi_controller {
 	struct module *owner;
 	struct device *dev;
@@ -189,6 +191,8 @@ struct msi_domain_info;
  * @msi_finish:		Optional callbacl to finalize the allocation
  * @set_desc:		Set the msi descriptor for an interrupt
  * @handle_error:	Optional error handler if the allocation fails
+ * @vfio_map:		Map the MSI hardware for VFIO
+ * @vfio_unmap:		Unmap the MSI hardware for VFIO
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -218,6 +222,14 @@ struct msi_domain_ops {
 				    struct msi_desc *desc);
 	int		(*handle_error)(struct irq_domain *domain,
 					struct msi_desc *desc, int error);
+#if IS_ENABLED(CONFIG_VFIO)
+	int		(*vfio_map)(struct irq_domain *domain,
+				    const struct vfio_iommu_driver_ops *ops,
+				    void *iommu_data);
+	void		(*vfio_unmap)(struct irq_domain *domain,
+				      const struct vfio_iommu_driver_ops *ops,
+				      void *iommu_data);
+#endif
 };
 
 /**
-- 
2.4.4


* [PATCH v2 3/3] vfio: Introduce generic MSI mapping operations
  2015-11-24 13:50 [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 1/3] vfio: Introduce map and unmap operations Pavel Fedin
  2015-11-24 13:50 ` [PATCH v2 2/3] gicv3, its: Introduce VFIO " Pavel Fedin
@ 2015-11-24 13:50 ` Pavel Fedin
  2015-12-02 21:32 ` [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Alex Williamson
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Fedin @ 2015-11-24 13:50 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Alex Williamson, Marc Zyngier, Thomas Gleixner, Jason Cooper,
	Manish Jaggi

These operations are used to map and unmap the MSI translation registers
for the device, allowing it to send MSIs to the host while it is mapped
via the IOMMU.

Usage of MSI controllers is tracked on a per-device basis using reference
counting. An MSI controller remains mapped as long as there's at least one
device referring to it using MSI.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c |  11 ++++
 drivers/vfio/vfio.c               | 116 ++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h              |  13 +++++
 3 files changed, 140 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3b3ba15..3c8be59 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -259,12 +259,19 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 	if (!vdev->ctx)
 		return -ENOMEM;
 
+	ret = vfio_device_map_msi(&pdev->dev);
+	if (ret) {
+		kfree(vdev->ctx);
+		return ret;
+	}
+
 	if (msix) {
 		int i;
 
 		vdev->msix = kzalloc(nvec * sizeof(struct msix_entry),
 				     GFP_KERNEL);
 		if (!vdev->msix) {
+			vfio_device_unmap_msi(&pdev->dev);
 			kfree(vdev->ctx);
 			return -ENOMEM;
 		}
@@ -277,6 +284,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 			if (ret > 0)
 				pci_disable_msix(pdev);
 			kfree(vdev->msix);
+			vfio_device_unmap_msi(&pdev->dev);
 			kfree(vdev->ctx);
 			return ret;
 		}
@@ -285,6 +293,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 		if (ret < nvec) {
 			if (ret > 0)
 				pci_disable_msi(pdev);
+			vfio_device_unmap_msi(&pdev->dev);
 			kfree(vdev->ctx);
 			return ret;
 		}
@@ -413,6 +422,8 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 	} else
 		pci_disable_msi(pdev);
 
+	vfio_device_unmap_msi(&pdev->dev);
+
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
 	kfree(vdev->ctx);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..37d99f5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -21,9 +21,11 @@
 #include <linux/fs.h>
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/irqdomain.h>
 #include <linux/list.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/msi.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/rwsem.h>
@@ -63,6 +65,8 @@ struct vfio_container {
 	struct vfio_iommu_driver	*iommu_driver;
 	void				*iommu_data;
 	bool				noiommu;
+	struct list_head		msi_list;
+	struct mutex			msi_lock;
 };
 
 struct vfio_unbound_dev {
@@ -97,6 +101,13 @@ struct vfio_device {
 	void				*device_data;
 };
 
+struct vfio_msi {
+	struct kref			kref;
+	struct list_head		msi_next;
+	struct vfio_container		*container;
+	struct irq_domain		*domain;
+};
+
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;
 module_param_named(enable_unsafe_noiommu_support,
@@ -882,6 +893,109 @@ void *vfio_device_data(struct vfio_device *device)
 }
 EXPORT_SYMBOL_GPL(vfio_device_data);
 
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+
+int vfio_device_map_msi(struct device *dev)
+{
+	struct irq_domain *msi_domain = dev_get_msi_domain(dev);
+	struct msi_domain_info *info;
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_msi *vmsi;
+	int ret;
+
+	if (!msi_domain)
+		return 0;
+	info = msi_domain->host_data;
+	if (!info->ops->vfio_map)
+		return 0;
+
+	device = dev_get_drvdata(dev);
+	container = device->group->container;
+
+	if (!container->iommu_driver->ops->map)
+		return -EINVAL;
+
+	mutex_lock(&container->msi_lock);
+
+	list_for_each_entry(vmsi, &container->msi_list, msi_next) {
+		if (vmsi->domain == msi_domain) {
+			kref_get(&vmsi->kref);
+			mutex_unlock(&container->msi_lock);
+			return 0;
+		}
+	}
+
+	vmsi = kmalloc(sizeof(*vmsi), GFP_KERNEL);
+	if (!vmsi) {
+		mutex_unlock(&container->msi_lock);
+		return -ENOMEM;
+	}
+
+	ret = info->ops->vfio_map(msi_domain, container->iommu_driver->ops,
+				  container->iommu_data);
+	if (ret) {
+		mutex_unlock(&container->msi_lock);
+		kfree(vmsi);
+		return ret;
+	}
+
+	kref_init(&vmsi->kref);
+	vmsi->container = container;
+	vmsi->domain = msi_domain;
+	list_add(&vmsi->msi_next, &container->msi_list);
+
+	mutex_unlock(&container->msi_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(vfio_device_map_msi);
+
+static void msi_release(struct kref *kref)
+{
+	struct vfio_msi *vmsi = container_of(kref, struct vfio_msi, kref);
+	struct vfio_container *container = vmsi->container;
+	struct msi_domain_info *info = vmsi->domain->host_data;
+
+	info->ops->vfio_unmap(vmsi->domain, container->iommu_driver->ops,
+			      container->iommu_data);
+
+	list_del(&vmsi->msi_next);
+	kfree(vmsi);
+}
+
+void vfio_device_unmap_msi(struct device *dev)
+{
+	struct irq_domain *msi_domain = dev_get_msi_domain(dev);
+	struct msi_domain_info *info;
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_msi *vmsi;
+
+	if (!msi_domain)
+		return;
+	info = msi_domain->host_data;
+	if (!info->ops->vfio_unmap)
+		return;
+
+	device = dev_get_drvdata(dev);
+	container = device->group->container;
+
+	mutex_lock(&container->msi_lock);
+
+	list_for_each_entry(vmsi, &container->msi_list, msi_next) {
+		if (vmsi->domain == msi_domain) {
+			kref_put(&vmsi->kref, msi_release);
+			break;
+		}
+	}
+
+	mutex_unlock(&container->msi_lock);
+}
+EXPORT_SYMBOL(vfio_device_unmap_msi);
+
+#endif
+
 /* Given a referenced group, check if it contains the device */
 static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
 {
@@ -1170,6 +1284,8 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
 
 	INIT_LIST_HEAD(&container->group_list);
 	init_rwsem(&container->group_lock);
+	INIT_LIST_HEAD(&container->msi_list);
+	mutex_init(&container->msi_lock);
 	kref_init(&container->kref);
 
 	filep->private_data = container;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 061038a..7b87b02 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -93,6 +93,19 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+int vfio_device_map_msi(struct device *dev);
+void vfio_device_unmap_msi(struct device *dev);
+#else
+static inline int vfio_device_map_msi(struct device *dev)
+{
+	return 0;
+}
+
+static inline void vfio_device_unmap_msi(struct device *dev)
+{
+}
+#endif
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
2.4.4



* Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
  2015-11-24 13:50 [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Pavel Fedin
                   ` (2 preceding siblings ...)
  2015-11-24 13:50 ` [PATCH v2 3/3] vfio: Introduce generic MSI mapping operations Pavel Fedin
@ 2015-12-02 21:32 ` Alex Williamson
  2015-12-03 13:16   ` Pavel Fedin
  3 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2015-12-02 21:32 UTC (permalink / raw)
  To: Pavel Fedin
  Cc: Jason Cooper, kvm, Manish Jaggi, Marc Zyngier, Thomas Gleixner, kvmarm

On Tue, 2015-11-24 at 16:50 +0300, Pavel Fedin wrote:
> On some architectures (e.g. ARM64), if the device is behind an IOMMU and
> is being mapped by VFIO, it is necessary to also add mappings for the MSI
> translation register for interrupts to work. This series implements the
> necessary API to do this, and makes use of this API for the GICv3 ITS on
> ARM64.
> 
> v1 => v2:
> - Added a dependency on CONFIG_GENERIC_MSI_IRQ_DOMAIN in some parts of the
>   code; this should fix the build without this option
> 
> Pavel Fedin (3):
>   vfio: Introduce map and unmap operations
>   gicv3, its: Introduce VFIO map and unmap operations
>   vfio: Introduce generic MSI mapping operations
> 
>  drivers/irqchip/irq-gic-v3-its.c   |  31 ++++++++++
>  drivers/vfio/pci/vfio_pci_intrs.c  |  11 ++++
>  drivers/vfio/vfio.c                | 116 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c    |  29 ++++++++++
>  include/linux/irqchip/arm-gic-v3.h |   2 +
>  include/linux/msi.h                |  12 ++++
>  include/linux/vfio.h               |  17 +++++-
>  7 files changed, 217 insertions(+), 1 deletion(-)


Some good points and bad.  I like that you're making this transparent
for the user, but at the same time, directly calling function pointers
through the msi_domain_ops is quite ugly.  There needs to be a real
interface there that isn't specific to vfio.  The downside of making it
transparent to the user is that parts of their IOVA space are being
claimed and they have no way to figure out what they are.  In fact, the
IOMMU mappings bypass the rb-tree that the type1 driver uses, so these
mappings might stomp on existing mappings for the user or the user might
stomp on these, neither of which would be much fun to debug.

There have been previous efforts to support MSI mapping in VFIO[1,2],
but none of them have really gone anywhere.  Whatever solution we use
needs to support everyone who needs it.  Thanks,

Alex

[1] http://www.spinics.net/lists/kvm/msg121669.html, http://www.spinics.net/lists/kvm/msg121662.html
[2] http://www.spinics.net/lists/kvm/msg119236.html


* RE: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
  2015-12-02 21:32 ` [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO Alex Williamson
@ 2015-12-03 13:16   ` Pavel Fedin
  2015-12-03 14:49     ` Marc Zyngier
  2015-12-03 17:05     ` Alex Williamson
  0 siblings, 2 replies; 8+ messages in thread
From: Pavel Fedin @ 2015-12-03 13:16 UTC (permalink / raw)
  To: 'Alex Williamson'
  Cc: kvm, kvmarm, 'Marc Zyngier', 'Thomas Gleixner',
	'Jason Cooper', 'Manish Jaggi',
	'Pranavkumar Sawargaonkar', 'Bharat Bhushan'

 Hello!

> I like that you're making this transparent
> for the user, but at the same time, directly calling function pointers
> through the msi_domain_ops is quite ugly.

 Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce some wrappers in include/linux/msi.h like msi_domain_vfio_map()/msi_domain_vfio_unmap(), but this would not conceptually change anything.
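
 For illustration only, such wrappers could look roughly like this (a minimal sketch; the names and placement are hypothetical, assuming the vfio_map/vfio_unmap ops from this series):

static inline int msi_domain_vfio_map(struct irq_domain *domain,
				      const struct vfio_iommu_driver_ops *ops,
				      void *iommu_data)
{
	struct msi_domain_info *info = msi_get_domain_info(domain);

	/* Quietly succeed if the irqchip does not implement the hook */
	if (!info->ops->vfio_map)
		return 0;
	return info->ops->vfio_map(domain, ops, iommu_data);
}

static inline void msi_domain_vfio_unmap(struct irq_domain *domain,
					 const struct vfio_iommu_driver_ops *ops,
					 void *iommu_data)
{
	struct msi_domain_info *info = msi_get_domain_info(domain);

	if (info->ops->vfio_unmap)
		info->ops->vfio_unmap(domain, ops, iommu_data);
}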

>  There needs to be a real, interface there that isn't specific to vfio.

 Hm... What else is going to use this?
 Actually, in my implementation the only thing specific to vfio is using struct vfio_iommu_driver_ops. This is because we have to perform MSI mapping for all "vfio domains" registered for this container. At least this is how the original type1 driver works.
 Can anybody explain to me what these "vfio domains" are? From the code it looks like we can have several IOMMU instances belonging to one VFIO container, and in this case one IOMMU == one "vfio domain". So is my understanding correct that a "vfio domain" is an IOMMU instance?
 And here come completely different ideas...
 First of all, can anybody explain why I perform all mappings on a per-IOMMU basis, not on a per-device basis? AFAIK at least the ARM SMMU knows about "stream IDs", and therefore it should be capable of distinguishing between different devices. So can I have per-device mappings? This would make things much simpler.
 So:
 Idea 1: do per-device mappings. In this case I don't have to track down which devices belong to which group and which IOMMU...
 Idea 2: What if we indeed simply simulate x86 behavior? What if we just do a 1:1 mapping for the MSI register when the IOMMU is initialized and forget about it, so that MSI messages are guaranteed to reach the host? Or would this mean that we would have to do a 1:1 mapping for the whole address range? Looks like [1] tried to do something similar, with address reservation.
 Idea 3: Is a single device guaranteed to correspond to a single "vfio domain" (and, as a consequence, to a single IOMMU)? In that case it would be very easy to unlink the interface introduced by patch 0002 of my series from vfio and pass just a raw struct iommu_domain * without any driver_ops; the irqchip code would then only need iommu_map() and iommu_unmap(), with no calling back to the vfio layer. A rough sketch follows below.
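
 Purely as an illustration of Idea 3 (a sketch only, assuming a single struct iommu_domain * could be handed down to the irqchip; not tested):

static int its_vfio_map(struct irq_domain *domain, struct iommu_domain *iommu)
{
	struct msi_domain_info *msi_info = msi_get_domain_info(domain);
	struct its_node *its = msi_info->data;
	u64 addr = its->phys_base + GIC_V3_ITS_CONTROL_SIZE;

	/* Identity-map one page of the ITS translation register frame */
	return iommu_map(iommu, addr, addr, PAGE_SIZE,
			 IOMMU_READ | IOMMU_WRITE);
}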

Cc'ed to authors of all mentioned series

> [1] http://www.spinics.net/lists/kvm/msg121669.html,
> http://www.spinics.net/lists/kvm/msg121662.html
> [2] http://www.spinics.net/lists/kvm/msg119236.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




* Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
  2015-12-03 13:16   ` Pavel Fedin
@ 2015-12-03 14:49     ` Marc Zyngier
  2015-12-03 17:05     ` Alex Williamson
  1 sibling, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2015-12-03 14:49 UTC (permalink / raw)
  To: Pavel Fedin
  Cc: 'Jason Cooper', kvm, 'Manish Jaggi',
	'Alex Williamson', 'Thomas Gleixner',
	kvmarm

On Thu, 3 Dec 2015 16:16:54 +0300
Pavel Fedin <p.fedin@samsung.com> wrote:

Pavel,

>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce some wrappers in include/linux/msi.h like msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually change anything.
> 
> >  There needs to be a real, interface there that isn't specific to vfio.
> 
>  Hm... What else is going to use this?
>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to
> perform MSI mapping for all "vfio domains" registered for this
> container. At least this is how original type1 driver works.

>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So
> is my understanding correct that "vfio domain" is IOMMU instance?

>  And here come completely different ideas...

>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

Not exactly. You can have a bunch of devices behind a single StreamID
(making them completely indistinguishable). This is a system
integration decision.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

As explained above, this doesn't work.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

Neither. We want userspace to be in control of the memory map, and it
is the kernel's job to tell us whether or not this matches the HW
capabilities. A fixed mapping may completely clash with the
memory map I want (think emulating HW x on platform y), and there is no
reason why we should have the restrictions x86 has.

My understanding is that userspace should be in charge here, and maybe
obtain a range of acceptable IOVAs for the MSI doorbell to be mapped.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.


* Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
  2015-12-03 13:16   ` Pavel Fedin
  2015-12-03 14:49     ` Marc Zyngier
@ 2015-12-03 17:05     ` Alex Williamson
  1 sibling, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2015-12-03 17:05 UTC (permalink / raw)
  To: Pavel Fedin
  Cc: kvm, kvmarm, 'Marc Zyngier', 'Thomas Gleixner',
	'Jason Cooper', 'Manish Jaggi',
	'Pranavkumar Sawargaonkar', 'Bharat Bhushan'

On Thu, 2015-12-03 at 16:16 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can
> introduce some wrappers in include/linux/msi.h like
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not
> conceptually change anything.

But otherwise we're parsing data structures that are possibly intended
to be private.  An interface also abstracts the implementation from the
caller.

> >  There needs to be a real, interface there that isn't specific to
> vfio.
> 
>  Hm... What else is going to use this?

I don't know, but pushing vfio-specific data structures and concepts
into core kernel callbacks is clearly the wrong direction to go.

>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to perform
> MSI mapping for all "vfio domains" registered for this container. At
> least this is how original type1 driver works.
>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So is
> my understanding correct that "vfio domain" is IOMMU instance?

There's no such thing as a vfio domain; I think you mean iommu domains.
A vfio container represents a user iommu context.  All of the groups
(and thus devices) within a container have the same iommu mappings.
However, not all of the groups are necessarily behind iommu hardware
units that support the same set of features.  We might therefore need to
mirror the user iommu context for the container across multiple physical
iommu contexts (aka domains).  When we walk the iommu->domain_list,
we're mirroring mappings across these multiple iommu domains within the
container.
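
To make that concrete, the replay into each domain in the existing type1
driver looks roughly like this (simplified; the real vfio_iommu_map()
also unwinds the domains already mapped when one of them fails):

	list_for_each_entry(d, &iommu->domain_list, next) {
		ret = iommu_map(d->domain, iova,
				(phys_addr_t)pfn << PAGE_SHIFT,
				npage << PAGE_SHIFT, prot | d->prot);
		if (ret)
			return ret;
	}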

>  And here come completely different ideas...
>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

vfio is built on iommu groups with the premise being that an iommu group
represents the smallest set of devices that are isolated from all other
devices both by iommu visibility and by DMA isolation (peer-to-peer).
Therefore we base iommu mappings on an iommu group because we cannot
enforce userspace isolation at a sub-group level.  In a system or
topology that is well architected for device isolation, there will be a
one-to-one mapping of iommu groups to devices.

So, iommu mappings are always on a per-group basis, but the user may
attach multiple groups to a single container, which as discussed above
represents a single iommu context.  That single context may be backed by
one or more iommu domains, depending on the capabilities of the iommu
hardware.  You're therefore not performing all mappings on a per-iommu
basis unless you have a user defined container spanning multiple iommus
which are incompatible in a way that requires us to manage them with
separate iommu domains.

The iommu's ability to do per device mappings here is irrelevant.
You're working within a user defined IOMMU context where they have
decided that all of the devices should have the same context.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

Nak, that doesn't solve anything.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

x86 isn't problem-free in this space.  An x86 VM is going to know that
the 0xfee00000 address range is special, it won't be backed by RAM and
won't be a DMA target, thus we'll never attempt to map it for an iova
address.  However, if we run a non-x86 VM or a userspace driver, it
doesn't necessarily know that there's anything special about that range
of iovas.  I intend to resolve this with an extension to the iommu info
ioctl that describes the available iova space for the iommu.  The
interrupt region would simply be excluded.
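
Just to sketch the idea (the structure name and layout below are
hypothetical, not a concrete proposal): the info ioctl could return an
array of usable iova ranges, and the interrupt region would simply not
be covered by any of them.

	struct vfio_iova_range {	/* hypothetical */
		__u64	start;	/* first usable iova in the range */
		__u64	end;	/* last usable iova, inclusive */
	};

Userspace would then know to place its mappings only inside the
reported ranges.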

This may be an option for you too, but you need to consider whether it
precludes things like hotplug.  Even in the x86 case, if we have a
non-x86 VM and we try to hot-add a PCI device, we can't dynamically
remove the RAM that would interfere with the MSI vector block.  I
don't know what that looks like on your platform, whether you can pick a
fixed range for the VM and use it regardless of the devices attached
later.

>  Idea 3: Is single device guaranteed to correspond to a single "vfio
> domain" (and, as a consequence, to a single IOMMU)? In this case it's
> very easy to unlink interface introduced by 0002 of my series from
> vfio, and pass just raw struct iommu_domain * without any driver_ops?
> irqchip code would only need iommu_map() and iommu_unmap() then, no
> calling back to vfio layer.

Again, the context is per container.  It would certainly be more correct
to set up the mapping for each iommu domain than to introduce any
vfio-related concepts down at the MSI mapping level, but you're still
imposing a mapping into the user's iommu context.  That needs to be
dealt with.  Thanks,

Alex


