linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/4] Virtio uses DMA API for all devices
@ 2018-07-20  3:59 Anshuman Khandual
  2018-07-20  3:59 ` [RFC 1/4] virtio: Define virtio_direct_dma_ops structure Anshuman Khandual
                   ` (6 more replies)
  0 siblings, 7 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, khandual, linuxram, haren, paulus, srikar

This patch series is the follow up on the discussions we had before about
the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
for virito devices (https://patchwork.kernel.org/patch/10417371/). There
were suggestions about doing away with two different paths of transactions
with the host/QEMU, first being the direct GPA and the other being the DMA
API based translations.

First patch attempts to create a direct GPA mapping based DMA operations
structure called 'virtio_direct_dma_ops' with exact same implementation
of the direct GPA path which virtio core currently has but just wrapped in
a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
existing semantics. The second patch does exactly that inside the function
virtio_finalize_features(). The third patch removes the default direct GPA
path from virtio core forcing it to use DMA API callbacks for all devices.
Now with that change, every device must have a DMA operations structure
associated with it. The fourth patch adds an additional hook which gives
the platform an opportunity to do yet another override if required. This
platform hook can be used on POWER Ultravisor based protected guests to
load up SWIOTLB DMA callbacks to do the required (as discussed previously
in the above mentioned thread how host is allowed to access only parts of
the guest GPA range) bounce buffering into the shared memory for all I/O
scatter gather buffers to be consumed on the host side.

Please go through these patches and review whether this approach broadly
makes sense. I will appreciate suggestions, inputs, comments regarding
the patches or the approach in general. Thank you.

Anshuman Khandual (4):
  virtio: Define virtio_direct_dma_ops structure
  virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  virtio: Force virtio core to use DMA API callbacks for all virtio devices
  virtio: Add platform specific DMA API translation for virito devices

 arch/powerpc/include/asm/dma-mapping.h |  6 +++
 arch/powerpc/platforms/pseries/iommu.c |  6 +++
 drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_pci_common.h     |  3 ++
 drivers/virtio/virtio_ring.c           | 65 +-----------------------------
 5 files changed, 89 insertions(+), 63 deletions(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
@ 2018-07-20  3:59 ` Anshuman Khandual
  2018-07-30  9:24   ` Christoph Hellwig
  2018-07-20  3:59 ` [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively Anshuman Khandual
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, khandual, linuxram, haren, paulus, srikar

Current implementation of DMA API inside virtio core calls device's DMA OPS
callback functions when the flag VIRTIO_F_IOMMU_PLATFORM flag is set. But
in absence of the flag, virtio core falls back calling basic transformation
of the incoming SG addresses as GPA. Going forward virtio should only call
DMA API based transformations generating either GPA or IOVA depending on
QEMU expectations again based on VIRTIO_F_IOMMU_PLATFORM flag. It requires
removing existing fallback code path for GPA transformation and replacing
that with a direct map DMA OPS structure. This adds that direct mapping DMA
OPS structure to be used in later patches which will make virtio core call
DMA API all the time for all virtio devices.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 drivers/virtio/virtio.c            | 60 ++++++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_pci_common.h |  3 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 59e36ef..7907ad3 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -3,6 +3,7 @@
 #include <linux/virtio_config.h>
 #include <linux/module.h>
 #include <linux/idr.h>
+#include <linux/dma-mapping.h>
 #include <uapi/linux/virtio_ids.h>
 
 /* Unique numbering for virtio devices. */
@@ -442,3 +443,62 @@ core_initcall(virtio_init);
 module_exit(virtio_exit);
 
 MODULE_LICENSE("GPL");
+
+/*
+ * Virtio direct mapping DMA API operations structure
+ *
+ * This defines DMA API structure for all virtio devices which would not
+ * either bring in their own DMA OPS from architecture or they would not
+ * like to use architecture specific IOMMU based DMA OPS because QEMU
+ * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
+ */
+dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
+			    unsigned long offset, size_t size,
+			    enum dma_data_direction dir,
+			    unsigned long attrs)
+{
+	return page_to_phys(page) + offset;
+}
+
+void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
+			size_t size, enum dma_data_direction dir,
+			unsigned long attrs)
+{
+}
+
+int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
+{
+	return 0;
+}
+
+void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		gfp_t gfp, unsigned long attrs)
+{
+	void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
+
+	if (queue) {
+		phys_addr_t phys_addr = virt_to_phys(queue);
+		*dma_handle = (dma_addr_t)phys_addr;
+
+		if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
+			free_pages_exact(queue, PAGE_ALIGN(size));
+			return NULL;
+		}
+	}
+	return queue;
+}
+
+void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_addr, unsigned long attrs)
+{
+	free_pages_exact(vaddr, PAGE_ALIGN(size));
+}
+
+const struct dma_map_ops virtio_direct_dma_ops = {
+	.alloc			= virtio_direct_alloc,
+	.free			= virtio_direct_free,
+	.map_page		= virtio_direct_map_page,
+	.unmap_page		= virtio_direct_unmap_page,
+	.mapping_error		= virtio_direct_mapping_error,
+};
+EXPORT_SYMBOL(virtio_direct_dma_ops);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 135ee3c..ec44d2f 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -31,6 +31,9 @@
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
 
+extern struct dma_map_ops virtio_direct_dma_ops;
+
+
 struct virtio_pci_vq_info {
 	/* the actual virtqueue */
 	struct virtqueue *vq;
-- 
2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
  2018-07-20  3:59 ` [RFC 1/4] virtio: Define virtio_direct_dma_ops structure Anshuman Khandual
@ 2018-07-20  3:59 ` Anshuman Khandual
  2018-07-28  8:56   ` Anshuman Khandual
  2018-07-30  9:25   ` Christoph Hellwig
  2018-07-20  3:59 ` [RFC 3/4] virtio: Force virtio core to use DMA API callbacks for all virtio devices Anshuman Khandual
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, khandual, linuxram, haren, paulus, srikar

Now that virtio core always needs all virtio devices to have DMA OPS, we
need to make sure that the structure it points is the right one. In the
absence of VIRTIO_F_IOMMU_PLATFORM flag QEMU expects GPA from guest kernel.
In such case, virtio device must use default virtio_direct_dma_ops DMA OPS
structure which transforms scatter gather buffer addresses as GPA. This
DMA OPS override must happen as early as possible during virtio device
initializatin sequence before virtio core starts using given device's DMA
OPS callbacks for I/O transactions. This change detects device's IOMMU flag
and does the override in case the flag is cleared.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 drivers/virtio/virtio.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7907ad3..6b13987 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
 }
 EXPORT_SYMBOL_GPL(virtio_add_status);
 
+const struct dma_map_ops virtio_direct_dma_ops;
+
 int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
@@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
 	if (ret)
 		return ret;
 
+	if (virtio_has_iommu_quirk(dev))
+		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
+
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;
 
-- 
2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* [RFC 3/4] virtio: Force virtio core to use DMA API callbacks for all virtio devices
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
  2018-07-20  3:59 ` [RFC 1/4] virtio: Define virtio_direct_dma_ops structure Anshuman Khandual
  2018-07-20  3:59 ` [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively Anshuman Khandual
@ 2018-07-20  3:59 ` Anshuman Khandual
  2018-07-20  3:59 ` [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices Anshuman Khandual
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, khandual, linuxram, haren, paulus, srikar

Virtio core should use DMA API callbacks for all virtio devices which may
generate either GAP or IOVA depending on VIRTIO_F_IOMMU_PLATFORM flag and
resulting QEMU expectations. This implies that every virtio device needs
to have a DMA OPS structure. This removes previous GPA fallback code paths.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 drivers/virtio/virtio_ring.c | 65 ++------------------------------------------
 1 file changed, 2 insertions(+), 63 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 814b395..c265964 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -141,26 +141,6 @@ struct vring_virtqueue {
  * unconditionally on data path.
  */
 
-static bool vring_use_dma_api(struct virtio_device *vdev)
-{
-	if (!virtio_has_iommu_quirk(vdev))
-		return true;
-
-	/* Otherwise, we are left to guess. */
-	/*
-	 * In theory, it's possible to have a buggy QEMU-supposed
-	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
-	 * such a configuration, virtio has never worked and will
-	 * not work without an even larger kludge.  Instead, enable
-	 * the DMA API if we're a Xen guest, which at least allows
-	 * all of the sensible Xen configurations to work correctly.
-	 */
-	if (xen_domain())
-		return true;
-
-	return false;
-}
-
 /*
  * The DMA ops on various arches are rather gnarly right now, and
  * making all of the arch DMA ops work on the vring device itself
@@ -176,9 +156,6 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 				   struct scatterlist *sg,
 				   enum dma_data_direction direction)
 {
-	if (!vring_use_dma_api(vq->vq.vdev))
-		return (dma_addr_t)sg_phys(sg);
-
 	/*
 	 * We can't use dma_map_sg, because we don't use scatterlists in
 	 * the way it expects (we don't guarantee that the scatterlist
@@ -193,9 +170,6 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
 				   void *cpu_addr, size_t size,
 				   enum dma_data_direction direction)
 {
-	if (!vring_use_dma_api(vq->vq.vdev))
-		return (dma_addr_t)virt_to_phys(cpu_addr);
-
 	return dma_map_single(vring_dma_dev(vq),
 			      cpu_addr, size, direction);
 }
@@ -205,9 +179,6 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vring_use_dma_api(vq->vq.vdev))
-		return;
-
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
 
 	if (flags & VRING_DESC_F_INDIRECT) {
@@ -228,9 +199,6 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
 static int vring_mapping_error(const struct vring_virtqueue *vq,
 			       dma_addr_t addr)
 {
-	if (!vring_use_dma_api(vq->vq.vdev))
-		return 0;
-
 	return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
@@ -1016,43 +984,14 @@ EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
 static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
 			      dma_addr_t *dma_handle, gfp_t flag)
 {
-	if (vring_use_dma_api(vdev)) {
-		return dma_alloc_coherent(vdev->dev.parent, size,
+	return dma_alloc_coherent(vdev->dev.parent, size,
 					  dma_handle, flag);
-	} else {
-		void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
-		if (queue) {
-			phys_addr_t phys_addr = virt_to_phys(queue);
-			*dma_handle = (dma_addr_t)phys_addr;
-
-			/*
-			 * Sanity check: make sure we dind't truncate
-			 * the address.  The only arches I can find that
-			 * have 64-bit phys_addr_t but 32-bit dma_addr_t
-			 * are certain non-highmem MIPS and x86
-			 * configurations, but these configurations
-			 * should never allocate physical pages above 32
-			 * bits, so this is fine.  Just in case, throw a
-			 * warning and abort if we end up with an
-			 * unrepresentable address.
-			 */
-			if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
-				free_pages_exact(queue, PAGE_ALIGN(size));
-				return NULL;
-			}
-		}
-		return queue;
-	}
 }
 
 static void vring_free_queue(struct virtio_device *vdev, size_t size,
 			     void *queue, dma_addr_t dma_handle)
 {
-	if (vring_use_dma_api(vdev)) {
-		dma_free_coherent(vdev->dev.parent, size, queue, dma_handle);
-	} else {
-		free_pages_exact(queue, PAGE_ALIGN(size));
-	}
+	dma_free_coherent(vdev->dev.parent, size, queue, dma_handle);
 }
 
 struct virtqueue *vring_create_virtqueue(
-- 
2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
                   ` (2 preceding siblings ...)
  2018-07-20  3:59 ` [RFC 3/4] virtio: Force virtio core to use DMA API callbacks for all virtio devices Anshuman Khandual
@ 2018-07-20  3:59 ` Anshuman Khandual
  2018-07-20 13:15   ` Michael S. Tsirkin
  2018-07-20 13:16 ` [RFC 0/4] Virtio uses DMA API for all devices Michael S. Tsirkin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, khandual, linuxram, haren, paulus, srikar

This adds a hook which a platform can define in order to allow it to
override virtio device's DMA OPS irrespective of whether it has the
flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
bounce-buffering of data on the new secure pSeries platform, currently
under development, where a KVM host cannot access all of the memory
space of a secure KVM guest.  The host can only access the pages which
the guest has explicitly requested to be shared with the host, thus
the virtio implementation in the guest has to copy data to and from
shared pages.

With this hook, the platform code in the secure guest can force the
use of swiotlb for virtio buffers, with a back-end for swiotlb which
will use a pool of pre-allocated shared pages.  Thus all data being
sent or received by virtio devices will be copied through pages which
the host has access to.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
 arch/powerpc/platforms/pseries/iommu.c | 6 ++++++
 drivers/virtio/virtio.c                | 7 +++++++
 3 files changed, 19 insertions(+)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa3945..bc5a9d3 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
 
 #endif /* __KERNEL__ */
 #endif	/* _ASM_DMA_MAPPING_H */
+
+#define platform_override_dma_ops platform_override_dma_ops
+
+struct virtio_device;
+
+extern void platform_override_dma_ops(struct virtio_device *vdev);
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 06f0296..5773bc7 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -38,6 +38,7 @@
 #include <linux/of.h>
 #include <linux/iommu.h>
 #include <linux/rculist.h>
+#include <linux/virtio.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
 __setup("multitce=", disable_multitce);
 
 machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
+
+void platform_override_dma_ops(struct virtio_device *vdev)
+{
+	/* Override vdev->parent.dma_ops if required */
+}
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 6b13987..432c332 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
 
 const struct dma_map_ops virtio_direct_dma_ops;
 
+#ifndef platform_override_dma_ops
+static inline void platform_override_dma_ops(struct virtio_device *vdev)
+{
+}
+#endif
+
 int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
@@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
 	if (virtio_has_iommu_quirk(dev))
 		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
 
+	platform_override_dma_ops(dev);
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;
 
-- 
2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
  2018-07-20  3:59 ` [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices Anshuman Khandual
@ 2018-07-20 13:15   ` Michael S. Tsirkin
  2018-07-23  2:16     ` Anshuman Khandual
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-20 13:15 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
> virito devices

s/virito/virtio/

> This adds a hook which a platform can define in order to allow it to
> override virtio device's DMA OPS irrespective of whether it has the
> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
> bounce-buffering of data on the new secure pSeries platform, currently
> under development, where a KVM host cannot access all of the memory
> space of a secure KVM guest.  The host can only access the pages which
> the guest has explicitly requested to be shared with the host, thus
> the virtio implementation in the guest has to copy data to and from
> shared pages.
> 
> With this hook, the platform code in the secure guest can force the
> use of swiotlb for virtio buffers, with a back-end for swiotlb which
> will use a pool of pre-allocated shared pages.  Thus all data being
> sent or received by virtio devices will be copied through pages which
> the host has access to.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
>  arch/powerpc/platforms/pseries/iommu.c | 6 ++++++
>  drivers/virtio/virtio.c                | 7 +++++++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa3945..bc5a9d3 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>  
>  #endif /* __KERNEL__ */
>  #endif	/* _ASM_DMA_MAPPING_H */
> +
> +#define platform_override_dma_ops platform_override_dma_ops
> +
> +struct virtio_device;
> +
> +extern void platform_override_dma_ops(struct virtio_device *vdev);
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..5773bc7 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -38,6 +38,7 @@
>  #include <linux/of.h>
>  #include <linux/iommu.h>
>  #include <linux/rculist.h>
> +#include <linux/virtio.h>
>  #include <asm/io.h>
>  #include <asm/prom.h>
>  #include <asm/rtas.h>
> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>  __setup("multitce=", disable_multitce);
>  
>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> +
> +void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> +	/* Override vdev->parent.dma_ops if required */
> +}
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 6b13987..432c332 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>  
>  const struct dma_map_ops virtio_direct_dma_ops;
>  
> +#ifndef platform_override_dma_ops
> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> +}
> +#endif
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (virtio_has_iommu_quirk(dev))
>  		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>  
> +	platform_override_dma_ops(dev);

Is there a single place where virtio_has_iommu_quirk is called now?
If so, we could put this into virtio_has_iommu_quirk then.

>  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>  		return 0;
>  
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
                   ` (3 preceding siblings ...)
  2018-07-20  3:59 ` [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices Anshuman Khandual
@ 2018-07-20 13:16 ` Michael S. Tsirkin
  2018-07-23  6:28   ` Anshuman Khandual
  2018-07-27  9:58 ` Will Deacon
  2018-08-02 20:55 ` Michael S. Tsirkin
  6 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-20 13:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

I like how patches 1-3 look. Could you test performance
with/without to see whether the extra indirection through
use of DMA ops causes a measurable slow-down?

> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virito devices
> 
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_pci_common.h     |  3 ++
>  drivers/virtio/virtio_ring.c           | 65 +-----------------------------
>  5 files changed, 89 insertions(+), 63 deletions(-)
> 
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
  2018-07-20 13:15   ` Michael S. Tsirkin
@ 2018-07-23  2:16     ` Anshuman Khandual
  2018-07-25  4:30       ` Anshuman Khandual
  2018-07-25 13:31       ` Michael S. Tsirkin
  0 siblings, 2 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-23  2:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On 07/20/2018 06:45 PM, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>> Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
>> virito devices
> 
> s/virito/virtio/

Oops, will fix it. Thanks for pointing out.

> 
>> This adds a hook which a platform can define in order to allow it to
>> override virtio device's DMA OPS irrespective of whether it has the
>> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
>> bounce-buffering of data on the new secure pSeries platform, currently
>> under development, where a KVM host cannot access all of the memory
>> space of a secure KVM guest.  The host can only access the pages which
>> the guest has explicitly requested to be shared with the host, thus
>> the virtio implementation in the guest has to copy data to and from
>> shared pages.
>>
>> With this hook, the platform code in the secure guest can force the
>> use of swiotlb for virtio buffers, with a back-end for swiotlb which
>> will use a pool of pre-allocated shared pages.  Thus all data being
>> sent or received by virtio devices will be copied through pages which
>> the host has access to.
>>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
>>  arch/powerpc/platforms/pseries/iommu.c | 6 ++++++
>>  drivers/virtio/virtio.c                | 7 +++++++
>>  3 files changed, 19 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
>> index 8fa3945..bc5a9d3 100644
>> --- a/arch/powerpc/include/asm/dma-mapping.h
>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>>  
>>  #endif /* __KERNEL__ */
>>  #endif	/* _ASM_DMA_MAPPING_H */
>> +
>> +#define platform_override_dma_ops platform_override_dma_ops
>> +
>> +struct virtio_device;
>> +
>> +extern void platform_override_dma_ops(struct virtio_device *vdev);
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>> index 06f0296..5773bc7 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -38,6 +38,7 @@
>>  #include <linux/of.h>
>>  #include <linux/iommu.h>
>>  #include <linux/rculist.h>
>> +#include <linux/virtio.h>
>>  #include <asm/io.h>
>>  #include <asm/prom.h>
>>  #include <asm/rtas.h>
>> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>>  __setup("multitce=", disable_multitce);
>>  
>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
>> +
>> +void platform_override_dma_ops(struct virtio_device *vdev)
>> +{
>> +	/* Override vdev->parent.dma_ops if required */
>> +}
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 6b13987..432c332 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>>  
>>  const struct dma_map_ops virtio_direct_dma_ops;
>>  
>> +#ifndef platform_override_dma_ops
>> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
>> +{
>> +}
>> +#endif
>> +
>>  int virtio_finalize_features(struct virtio_device *dev)
>>  {
>>  	int ret = dev->config->finalize_features(dev);
>> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>>  	if (virtio_has_iommu_quirk(dev))
>>  		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>  
>> +	platform_override_dma_ops(dev);
> 
> Is there a single place where virtio_has_iommu_quirk is called now?

Not other than this one. But in the proposed implementation of
platform_override_dma_ops on powerpc, we will again check on
virtio_has_iommu_quirk before overriding it with SWIOTLB.

void platform_override_dma_ops(struct virtio_device *vdev)
{
        if (is_ultravisor_platform() && virtio_has_iommu_quirk(vdev))
                set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
}

> If so, we could put this into virtio_has_iommu_quirk then.

Did you mean platform_override_dma_ops instead ? If so, yes that
is possible. Default implementation of platform_override_dma_ops
should just check on VIRTIO_F_IOMMU_PLATFORM feature and override
with virtio_direct_dma_ops but arch implementation can check on
what ever else they would like and override appropriately.

Default platform_override_dma_ops will be like this

#ifndef platform_override_dma_ops
static inline void platform_override_dma_ops(struct virtio_device *vdev)
{
	if(!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
}
#endif

Proposed powerpc implementation will be like this instead

void platform_override_dma_ops(struct virtio_device *vdev)
{
	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return;

        if (is_ultravisor_platform())
                set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
	else
		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
	
}

There is a redundant definition of virtio_has_iommu_quirk in the tools
directory (tools/virtio/linux/virtio_config.h) which does not seem to
be used any where. I guess that can be removed without problem.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20 13:16 ` [RFC 0/4] Virtio uses DMA API for all devices Michael S. Tsirkin
@ 2018-07-23  6:28   ` Anshuman Khandual
  2018-07-23  9:08     ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-23  6:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> This patch series is the follow up on the discussions we had before about
>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>> were suggestions about doing away with two different paths of transactions
>> with the host/QEMU, first being the direct GPA and the other being the DMA
>> API based translations.
>>
>> First patch attempts to create a direct GPA mapping based DMA operations
>> structure called 'virtio_direct_dma_ops' with exact same implementation
>> of the direct GPA path which virtio core currently has but just wrapped in
>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>> existing semantics. The second patch does exactly that inside the function
>> virtio_finalize_features(). The third patch removes the default direct GPA
>> path from virtio core forcing it to use DMA API callbacks for all devices.
>> Now with that change, every device must have a DMA operations structure
>> associated with it. The fourth patch adds an additional hook which gives
>> the platform an opportunity to do yet another override if required. This
>> platform hook can be used on POWER Ultravisor based protected guests to
>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>> in the above mentioned thread how host is allowed to access only parts of
>> the guest GPA range) bounce buffering into the shared memory for all I/O
>> scatter gather buffers to be consumed on the host side.
>>
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
> I like how patches 1-3 look. Could you test performance
> with/without to see whether the extra indirection through
> use of DMA ops causes a measurable slow-down?

I ran this simple DD command 10 times where /dev/vda is a virtio block
device of 10GB size.

dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct

With and without patches bandwidth which has a bit wide range does not
look that different from each other.

Without patches
===============

---------- 1 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
---------- 2 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
---------- 3 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
---------- 4 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
---------- 5 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
---------- 6 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
---------- 7 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
---------- 8 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
---------- 9 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
---------- 10 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s


With patches
============

---------- 1 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
---------- 2 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
---------- 3 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
---------- 4 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
---------- 5 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
---------- 6 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
---------- 7 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
---------- 8 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
---------- 9 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
---------- 10 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s

Does this look okay ?


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-23  6:28   ` Anshuman Khandual
@ 2018-07-23  9:08     ` Michael S. Tsirkin
  2018-07-25  3:26       ` Anshuman Khandual
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-23  9:08 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> >> This patch series is the follow up on the discussions we had before about
> >> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> >> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> >> were suggestions about doing away with two different paths of transactions
> >> with the host/QEMU, first being the direct GPA and the other being the DMA
> >> API based translations.
> >>
> >> First patch attempts to create a direct GPA mapping based DMA operations
> >> structure called 'virtio_direct_dma_ops' with exact same implementation
> >> of the direct GPA path which virtio core currently has but just wrapped in
> >> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> >> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> >> existing semantics. The second patch does exactly that inside the function
> >> virtio_finalize_features(). The third patch removes the default direct GPA
> >> path from virtio core forcing it to use DMA API callbacks for all devices.
> >> Now with that change, every device must have a DMA operations structure
> >> associated with it. The fourth patch adds an additional hook which gives
> >> the platform an opportunity to do yet another override if required. This
> >> platform hook can be used on POWER Ultravisor based protected guests to
> >> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> >> in the above mentioned thread how host is allowed to access only parts of
> >> the guest GPA range) bounce buffering into the shared memory for all I/O
> >> scatter gather buffers to be consumed on the host side.
> >>
> >> Please go through these patches and review whether this approach broadly
> >> makes sense. I will appreciate suggestions, inputs, comments regarding
> >> the patches or the approach in general. Thank you.
> > I like how patches 1-3 look. Could you test performance
> > with/without to see whether the extra indirection through
> > use of DMA ops causes a measurable slow-down?
> 
> I ran this simple DD command 10 times where /dev/vda is a virtio block
> device of 10GB size.
> 
> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
> 
> With and without patches bandwidth which has a bit wide range does not
> look that different from each other.
> 
> Without patches
> ===============
> 
> ---------- 1 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
> ---------- 2 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
> ---------- 3 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
> ---------- 4 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
> ---------- 5 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
> ---------- 6 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
> ---------- 7 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
> ---------- 8 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
> ---------- 9 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
> ---------- 10 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
> 
> 
> With patches
> ============
> 
> ---------- 1 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
> ---------- 2 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
> ---------- 3 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
> ---------- 4 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
> ---------- 5 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
> ---------- 6 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
> ---------- 7 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
> ---------- 8 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
> ---------- 9 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
> ---------- 10 ---------
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s
> 
> Does this look okay ?

You want to test IOPS with lots of small writes and using
raw ramdisk on host.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-23  9:08     ` Michael S. Tsirkin
@ 2018-07-25  3:26       ` Anshuman Khandual
  2018-07-27 11:31         ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-25  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
>>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>>>> This patch series is the follow up on the discussions we had before about
>>>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>>>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>>>> were suggestions about doing away with two different paths of transactions
>>>> with the host/QEMU, first being the direct GPA and the other being the DMA
>>>> API based translations.
>>>>
>>>> First patch attempts to create a direct GPA mapping based DMA operations
>>>> structure called 'virtio_direct_dma_ops' with exact same implementation
>>>> of the direct GPA path which virtio core currently has but just wrapped in
>>>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>>>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>>>> existing semantics. The second patch does exactly that inside the function
>>>> virtio_finalize_features(). The third patch removes the default direct GPA
>>>> path from virtio core forcing it to use DMA API callbacks for all devices.
>>>> Now with that change, every device must have a DMA operations structure
>>>> associated with it. The fourth patch adds an additional hook which gives
>>>> the platform an opportunity to do yet another override if required. This
>>>> platform hook can be used on POWER Ultravisor based protected guests to
>>>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>>>> in the above mentioned thread how host is allowed to access only parts of
>>>> the guest GPA range) bounce buffering into the shared memory for all I/O
>>>> scatter gather buffers to be consumed on the host side.
>>>>
>>>> Please go through these patches and review whether this approach broadly
>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
>>>> the patches or the approach in general. Thank you.
>>> I like how patches 1-3 look. Could you test performance
>>> with/without to see whether the extra indirection through
>>> use of DMA ops causes a measurable slow-down?
>>
>> I ran this simple DD command 10 times where /dev/vda is a virtio block
>> device of 10GB size.
>>
>> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
>>
>> With and without patches bandwidth which has a bit wide range does not
>> look that different from each other.
>>
>> Without patches
>> ===============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
>>
>>
>> With patches
>> ============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s
>>
>> Does this look okay ?
> 
> You want to test IOPS with lots of small writes and using
> raw ramdisk on host.

Hello Michael,

I have conducted the following experiments and here are the results.

TEST SETUP
==========

A virtio block disk is mounted on the guest as follows.


    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' ioeventfd='off'/>
      <source file='/mnt/disk2.img'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

In the host back end its an QEMU raw image on tmpfs file system.

disk:

-rw-r--r-- 1 libvirt-qemu kvm  5.0G Jul 24 06:26 disk2.img

mount:

size=21G on /mnt type tmpfs (rw,relatime,size=22020096k)

TEST CONFIG
===========

FIO (https://linux.die.net/man/1/fio) is being run with and without
the patches.

Read test config:

[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=read
unlink=1
iodepth=256


Write test config:

[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=write
unlink=1
iodepth=256

The virtio block device comes up as /dev/vda on the guest with

/sys/block/vda/queue/nr_requests=128

TEST RESULTS
============

Without the patches
-------------------

Read test:

Run status group 0 (all jobs):
   READ: bw=550MiB/s (577MB/s), 33.2MiB/s-35.6MiB/s (34.9MB/s-37.4MB/s), io=161GiB (173GB), run=300001-300009msec

Disk stats (read/write):
  vda: ios=42249926/0, merge=0/0, ticks=1499920/0, in_queue=35672384, util=100.00%


Write test:

Run status group 0 (all jobs):
  WRITE: bw=514MiB/s (539MB/s), 31.5MiB/s-34.6MiB/s (33.0MB/s-36.2MB/s), io=151GiB (162GB), run=300001-300009msec

Disk stats (read/write):
  vda: ios=29/39459261, merge=0/0, ticks=0/1570580, in_queue=35745992, util=100.00%

With the patches
----------------

Read test:

Run status group 0 (all jobs):
   READ: bw=572MiB/s (600MB/s), 35.0MiB/s-37.2MiB/s (36.7MB/s-38.0MB/s), io=168GiB (180GB), run=300001-300006msec

Disk stats (read/write):
  vda: ios=43917611/0, merge=0/0, ticks=1934268/0, in_queue=35531688, util=100.00%
  
Write test:

Run status group 0 (all jobs):
  WRITE: bw=546MiB/s (572MB/s), 33.7MiB/s-35.0MiB/s (35.3MB/s-36.7MB/s), io=160GiB (172GB), run=300001-300007msec

Disk stats (read/write):
  vda: ios=14/41893878, merge=0/0, ticks=8/2107816, in_queue=35535716, util=100.00%

Results with and without the patches are similar.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
  2018-07-23  2:16     ` Anshuman Khandual
@ 2018-07-25  4:30       ` Anshuman Khandual
  2018-07-25 13:31       ` Michael S. Tsirkin
  1 sibling, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-25  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On 07/23/2018 07:46 AM, Anshuman Khandual wrote:
> On 07/20/2018 06:45 PM, Michael S. Tsirkin wrote:
>> On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>>> Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
>>> virito devices
>>
>> s/virito/virtio/
> 
> Oops, will fix it. Thanks for pointing out.
> 
>>
>>> This adds a hook which a platform can define in order to allow it to
>>> override virtio device's DMA OPS irrespective of whether it has the
>>> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
>>> bounce-buffering of data on the new secure pSeries platform, currently
>>> under development, where a KVM host cannot access all of the memory
>>> space of a secure KVM guest.  The host can only access the pages which
>>> the guest has explicitly requested to be shared with the host, thus
>>> the virtio implementation in the guest has to copy data to and from
>>> shared pages.
>>>
>>> With this hook, the platform code in the secure guest can force the
>>> use of swiotlb for virtio buffers, with a back-end for swiotlb which
>>> will use a pool of pre-allocated shared pages.  Thus all data being
>>> sent or received by virtio devices will be copied through pages which
>>> the host has access to.
>>>
>>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
>>>  arch/powerpc/platforms/pseries/iommu.c | 6 ++++++
>>>  drivers/virtio/virtio.c                | 7 +++++++
>>>  3 files changed, 19 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
>>> index 8fa3945..bc5a9d3 100644
>>> --- a/arch/powerpc/include/asm/dma-mapping.h
>>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>>> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>>>  
>>>  #endif /* __KERNEL__ */
>>>  #endif	/* _ASM_DMA_MAPPING_H */
>>> +
>>> +#define platform_override_dma_ops platform_override_dma_ops
>>> +
>>> +struct virtio_device;
>>> +
>>> +extern void platform_override_dma_ops(struct virtio_device *vdev);
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 06f0296..5773bc7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -38,6 +38,7 @@
>>>  #include <linux/of.h>
>>>  #include <linux/iommu.h>
>>>  #include <linux/rculist.h>
>>> +#include <linux/virtio.h>
>>>  #include <asm/io.h>
>>>  #include <asm/prom.h>
>>>  #include <asm/rtas.h>
>>> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>>>  __setup("multitce=", disable_multitce);
>>>  
>>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
>>> +
>>> +void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +	/* Override vdev->parent.dma_ops if required */
>>> +}
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 6b13987..432c332 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>>>  
>>>  const struct dma_map_ops virtio_direct_dma_ops;
>>>  
>>> +#ifndef platform_override_dma_ops
>>> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  int virtio_finalize_features(struct virtio_device *dev)
>>>  {
>>>  	int ret = dev->config->finalize_features(dev);
>>> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>>>  	if (virtio_has_iommu_quirk(dev))
>>>  		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>>  
>>> +	platform_override_dma_ops(dev);
>>
>> Is there a single place where virtio_has_iommu_quirk is called now?
> 
> Not other than this one. But in the proposed implementation of
> platform_override_dma_ops on powerpc, we will again check on
> virtio_has_iommu_quirk before overriding it with SWIOTLB.
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
>         if (is_ultravisor_platform() && virtio_has_iommu_quirk(vdev))
>                 set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> }
> 
>> If so, we could put this into virtio_has_iommu_quirk then.
> 
> Did you mean platform_override_dma_ops instead ? If so, yes that
> is possible. Default implementation of platform_override_dma_ops
> should just check on VIRTIO_F_IOMMU_PLATFORM feature and override
> with virtio_direct_dma_ops but arch implementation can check on
> what ever else they would like and override appropriately.
> 
> Default platform_override_dma_ops will be like this
> 
> #ifndef platform_override_dma_ops
> static inline void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if(!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> }
> #endif
> 
> Proposed powerpc implementation will be like this instead
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		return;
> 
>         if (is_ultravisor_platform())
>                 set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> 	else
> 		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> 	
> }
> 
> There is a redundant definition of virtio_has_iommu_quirk in the tools
> directory (tools/virtio/linux/virtio_config.h) which does not seem to
> be used any where. I guess that can be removed without problem.

Does this sound okay ? It will merge patch 3 and 4 into a single one.
On the other hand it also passes the responsibility of dealing with
VIRTIO_F_IOMMU_PLATFORM flag to the architecture callback which might
be bit problematic. Keeping VIRTIO_F_IOMMU_PLATFORM handling in virtio
core at least makes sure that the device has a working DMA ops to fall
back on if the arch hook fails to take care of it somehow.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
  2018-07-23  2:16     ` Anshuman Khandual
  2018-07-25  4:30       ` Anshuman Khandual
@ 2018-07-25 13:31       ` Michael S. Tsirkin
  1 sibling, 0 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-25 13:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Mon, Jul 23, 2018 at 07:46:09AM +0530, Anshuman Khandual wrote:
> There is a redundant definition of virtio_has_iommu_quirk in the tools
> directory (tools/virtio/linux/virtio_config.h) which does not seem to
> be used any where. I guess that can be removed without problem.

It's there just to make tools/virtio build.
Try
	make -C tools/virtio/
In fact I see there's a missing definition for
dma_rmb/dma_wmb there, I'll post a patch.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
                   ` (4 preceding siblings ...)
  2018-07-20 13:16 ` [RFC 0/4] Virtio uses DMA API for all devices Michael S. Tsirkin
@ 2018-07-27  9:58 ` Will Deacon
  2018-07-27 10:58   ` Anshuman Khandual
  2018-07-30  9:34   ` Christoph Hellwig
  2018-08-02 20:55 ` Michael S. Tsirkin
  6 siblings, 2 replies; 119+ messages in thread
From: Will Deacon @ 2018-07-27  9:58 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

Hi Anshuman,

On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

I just wanted to say that this patch series provides a means for us to
force the coherent DMA ops for legacy virtio devices on arm64, which in turn
means that we can enable the SMMU with legacy devices in our fastmodel
emulation platform (which is slowly being upgraded to virtio 1.0) without
hanging during boot. Patch below.

So:

Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>

Thanks!

Will

--->8

From 4ef39e9de2c87c97bf046816ca762832f92e39b5 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Fri, 27 Jul 2018 10:49:25 +0100
Subject: [PATCH] arm64: dma: Override DMA ops for legacy virtio devices

Virtio devices are always cache-coherent, so force use of the coherent
DMA ops for legacy virtio devices where the dma-coherent is known to
be omitted by QEMU for the MMIO transport.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/dma-mapping.h |  6 ++++++
 arch/arm64/mm/dma-mapping.c          | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index b7847eb8a7bb..30aa8fb62dc3 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -44,6 +44,12 @@ void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
 #endif
 
+#ifdef CONFIG_VIRTIO
+struct virtio_device;
+void platform_override_dma_ops(struct virtio_device *vdev);
+#define platform_override_dma_ops	platform_override_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 61e93f0b5482..f9ca61b1b34d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -891,3 +891,22 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	}
 #endif
 }
+
+#ifdef CONFIG_VIRTIO
+#include <linux/virtio_config.h>
+
+void platform_override_dma_ops(struct virtio_device *vdev)
+{
+	struct device *dev = vdev->dev.parent;
+	const struct dma_map_ops *dma_ops = &arm64_swiotlb_dma_ops;
+
+	if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+		return;
+
+	dev->archdata.dma_coherent = true;
+	if (iommu_get_domain_for_dev(dev))
+		dma_ops = &iommu_dma_ops;
+
+	set_dma_ops(dev, dma_ops);
+}
+#endif	/* CONFIG_VIRTIO */
-- 
2.1.4


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-27  9:58 ` Will Deacon
@ 2018-07-27 10:58   ` Anshuman Khandual
  2018-07-30  9:34   ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-27 10:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: robh, srikar, mst, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, jean-philippe.brucker, paulus, marc.zyngier,
	joe, robin.murphy, linuxppc-dev, elfring, haren, david

On 07/27/2018 03:28 PM, Will Deacon wrote:
> Hi Anshuman,
> 
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> This patch series is the follow up on the discussions we had before about
>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>> were suggestions about doing away with two different paths of transactions
>> with the host/QEMU, first being the direct GPA and the other being the DMA
>> API based translations.
>>
>> First patch attempts to create a direct GPA mapping based DMA operations
>> structure called 'virtio_direct_dma_ops' with exact same implementation
>> of the direct GPA path which virtio core currently has but just wrapped in
>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>> existing semantics. The second patch does exactly that inside the function
>> virtio_finalize_features(). The third patch removes the default direct GPA
>> path from virtio core forcing it to use DMA API callbacks for all devices.
>> Now with that change, every device must have a DMA operations structure
>> associated with it. The fourth patch adds an additional hook which gives
>> the platform an opportunity to do yet another override if required. This
>> platform hook can be used on POWER Ultravisor based protected guests to
>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>> in the above mentioned thread how host is allowed to access only parts of
>> the guest GPA range) bounce buffering into the shared memory for all I/O
>> scatter gather buffers to be consumed on the host side.
>>
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in turn
> means that we can enable the SMMU with legacy devices in our fastmodel
> emulation platform (which is slowly being upgraded to virtio 1.0) without
> hanging during boot. Patch below.
> 
> So:
> 
> Acked-by: Will Deacon <will.deacon@arm.com>
> Tested-by: Will Deacon <will.deacon@arm.com>

Thanks Will.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-25  3:26       ` Anshuman Khandual
@ 2018-07-27 11:31         ` Michael S. Tsirkin
  2018-07-28  8:37           ` Anshuman Khandual
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-27 11:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote:
> Results with and without the patches are similar.

Thanks! And another thing to try is virtio-net with
a fast NIC backend (40G and up). Unfortunately
at this point loopback tests stress the host
scheduler too much.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-27 11:31         ` Michael S. Tsirkin
@ 2018-07-28  8:37           ` Anshuman Khandual
  0 siblings, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-28  8:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david

On 07/27/2018 05:01 PM, Michael S. Tsirkin wrote:
> On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote:
>> Results with and without the patches are similar.
> 
> Thanks! And another thing to try is virtio-net with
> a fast NIC backend (40G and up). Unfortunately
> at this point loopback tests stress the host
> scheduler too much.
> 

Sure. Will look around for a 40G NIC system. BTW I have been testing
virtio-net with a TAP device as back end.

ip tuntap add dev tap0 mode tap user $(whoami)
ip link set tap0 master virbr0
ip link set dev virbr0 up
ip link set dev tap0 up

which is exported into the guest as follows

-device virtio-net,netdev=network0,mac=52:55:00:d1:55:01 \
-netdev tap,id=network0,ifname=tap0,script=no,downscript=no \

But I have not run any network benchmarks on it though.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-20  3:59 ` [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively Anshuman Khandual
@ 2018-07-28  8:56   ` Anshuman Khandual
  2018-07-28 21:16     ` Michael S. Tsirkin
  2018-07-30  9:25   ` Christoph Hellwig
  1 sibling, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-28  8:56 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, mst, hch, linuxram, haren, paulus, srikar

On 07/20/2018 09:29 AM, Anshuman Khandual wrote:
> Now that virtio core always needs all virtio devices to have DMA OPS, we
> need to make sure that the structure it points is the right one. In the
> absence of VIRTIO_F_IOMMU_PLATFORM flag QEMU expects GPA from guest kernel.
> In such case, virtio device must use default virtio_direct_dma_ops DMA OPS
> structure which transforms scatter gather buffer addresses as GPA. This
> DMA OPS override must happen as early as possible during virtio device
> initializatin sequence before virtio core starts using given device's DMA
> OPS callbacks for I/O transactions. This change detects device's IOMMU flag
> and does the override in case the flag is cleared.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  drivers/virtio/virtio.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 7907ad3..6b13987 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
> 
> +const struct dma_map_ops virtio_direct_dma_ops;
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (ret)
>  		return ret;


The previous patch removed the code block for XEN guests which forced
the use of DMA API all the time irrespective of VIRTIO_F_IOMMU_PLATFORM
flag on the device. Here is what I have removed with patch 2/4 which
breaks the existing semantics on XEN guests.

-static bool vring_use_dma_api(struct virtio_device *vdev)
-{
-	if (!virtio_has_iommu_quirk(vdev))
-		return true;
-
-	/* Otherwise, we are left to guess. */
-	/*
-	 * In theory, it's possible to have a buggy QEMU-supposed
-	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
-	 * such a configuration, virtio has never worked and will
-	 * not work without an even larger kludge.  Instead, enable
-	 * the DMA API if we're a Xen guest, which at least allows
-	 * all of the sensible Xen configurations to work correctly.
-	 */
-	if (xen_domain())
-		return true;
-
-	return false;
-}

XEN guests would not like override with virtio_direct_dma_ops in any
case irrespective of the flag VIRTIO_F_IOMMU_PLATFORM. So the existing
semantics can be preserved with something like this. It just assumes
that dev->dma_ops is non-NULL and a valid one set by the architecture.
If required we can add those tests here before skipping the override.

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7907ad3..6b13987 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
 }
 EXPORT_SYMBOL_GPL(virtio_add_status);

+const struct dma_map_ops virtio_direct_dma_ops;
+
 int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
@@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
 	if (ret)
 		return ret;
+
+	if (xen_domain())
+		goto skip_override;
+
+	if (virtio_has_iommu_quirk(dev))
+		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
+
+ skip_override:
+
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0

Will incorporate these changes in the next version.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-28  8:56   ` Anshuman Khandual
@ 2018-07-28 21:16     ` Michael S. Tsirkin
  2018-07-30  4:15       ` Anshuman Khandual
  2018-07-30  9:30       ` Christoph Hellwig
  0 siblings, 2 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-28 21:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Sat, Jul 28, 2018 at 02:26:24PM +0530, Anshuman Khandual wrote:
> On 07/20/2018 09:29 AM, Anshuman Khandual wrote:
> > Now that virtio core always needs all virtio devices to have DMA OPS, we
> > need to make sure that the structure it points is the right one. In the
> > absence of VIRTIO_F_IOMMU_PLATFORM flag QEMU expects GPA from guest kernel.
> > In such case, virtio device must use default virtio_direct_dma_ops DMA OPS
> > structure which transforms scatter gather buffer addresses as GPA. This
> > DMA OPS override must happen as early as possible during virtio device
> > initializatin sequence before virtio core starts using given device's DMA
> > OPS callbacks for I/O transactions. This change detects device's IOMMU flag
> > and does the override in case the flag is cleared.
> > 
> > Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> > ---
> >  drivers/virtio/virtio.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 7907ad3..6b13987 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_add_status);
> > 
> > +const struct dma_map_ops virtio_direct_dma_ops;
> > +
> >  int virtio_finalize_features(struct virtio_device *dev)
> >  {
> >  	int ret = dev->config->finalize_features(dev);
> > @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
> >  	if (ret)
> >  		return ret;
> 
> 
> The previous patch removed the code block for XEN guests which forced
> the use of DMA API all the time irrespective of VIRTIO_F_IOMMU_PLATFORM
> flag on the device. Here is what I have removed with patch 2/4 which
> breaks the existing semantics on XEN guests.
> 
> -static bool vring_use_dma_api(struct virtio_device *vdev)
> -{
> -	if (!virtio_has_iommu_quirk(vdev))
> -		return true;
> -
> -	/* Otherwise, we are left to guess. */
> -	/*
> -	 * In theory, it's possible to have a buggy QEMU-supposed
> -	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
> -	 * such a configuration, virtio has never worked and will
> -	 * not work without an even larger kludge.  Instead, enable
> -	 * the DMA API if we're a Xen guest, which at least allows
> -	 * all of the sensible Xen configurations to work correctly.
> -	 */
> -	if (xen_domain())
> -		return true;
> -
> -	return false;
> -}
> 
> XEN guests would not like override with virtio_direct_dma_ops in any
> case irrespective of the flag VIRTIO_F_IOMMU_PLATFORM. So the existing
> semantics can be preserved with something like this. It just assumes
> that dev->dma_ops is non-NULL and a valid one set by the architecture.
> If required we can add those tests here before skipping the override.
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 7907ad3..6b13987 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
> 
> +const struct dma_map_ops virtio_direct_dma_ops;
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (ret)
>  		return ret;
> +
> +	if (xen_domain())
> +		goto skip_override;
> +
> +	if (virtio_has_iommu_quirk(dev))
> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> +
> + skip_override:
> +

I prefer normal if scoping as opposed to goto spaghetti pls.
Better yet move vring_use_dma_api here and use it.
Less of a chance something will break.

>  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>  		return 0
> 
> Will incorporate these changes in the next version.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-28 21:16     ` Michael S. Tsirkin
@ 2018-07-30  4:15       ` Anshuman Khandual
  2018-07-30  9:30       ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-30  4:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On 07/29/2018 02:46 AM, Michael S. Tsirkin wrote:
> On Sat, Jul 28, 2018 at 02:26:24PM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 09:29 AM, Anshuman Khandual wrote:
>>> Now that virtio core always needs all virtio devices to have DMA OPS, we
>>> need to make sure that the structure it points is the right one. In the
>>> absence of VIRTIO_F_IOMMU_PLATFORM flag QEMU expects GPA from guest kernel.
>>> In such case, virtio device must use default virtio_direct_dma_ops DMA OPS
>>> structure which transforms scatter gather buffer addresses as GPA. This
>>> DMA OPS override must happen as early as possible during virtio device
>>> initializatin sequence before virtio core starts using given device's DMA
>>> OPS callbacks for I/O transactions. This change detects device's IOMMU flag
>>> and does the override in case the flag is cleared.
>>>
>>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>>> ---
>>>  drivers/virtio/virtio.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 7907ad3..6b13987 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_add_status);
>>>
>>> +const struct dma_map_ops virtio_direct_dma_ops;
>>> +
>>>  int virtio_finalize_features(struct virtio_device *dev)
>>>  {
>>>  	int ret = dev->config->finalize_features(dev);
>>> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>>>  	if (ret)
>>>  		return ret;
>>
>>
>> The previous patch removed the code block for XEN guests which forced
>> the use of DMA API all the time irrespective of VIRTIO_F_IOMMU_PLATFORM
>> flag on the device. Here is what I have removed with patch 2/4 which
>> breaks the existing semantics on XEN guests.
>>
>> -static bool vring_use_dma_api(struct virtio_device *vdev)
>> -{
>> -	if (!virtio_has_iommu_quirk(vdev))
>> -		return true;
>> -
>> -	/* Otherwise, we are left to guess. */
>> -	/*
>> -	 * In theory, it's possible to have a buggy QEMU-supposed
>> -	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
>> -	 * such a configuration, virtio has never worked and will
>> -	 * not work without an even larger kludge.  Instead, enable
>> -	 * the DMA API if we're a Xen guest, which at least allows
>> -	 * all of the sensible Xen configurations to work correctly.
>> -	 */
>> -	if (xen_domain())
>> -		return true;
>> -
>> -	return false;
>> -}
>>
>> XEN guests would not like override with virtio_direct_dma_ops in any
>> case irrespective of the flag VIRTIO_F_IOMMU_PLATFORM. So the existing
>> semantics can be preserved with something like this. It just assumes
>> that dev->dma_ops is non-NULL and a valid one set by the architecture.
>> If required we can add those tests here before skipping the override.
>>
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 7907ad3..6b13987 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -166,6 +166,8 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_add_status);
>>
>> +const struct dma_map_ops virtio_direct_dma_ops;
>> +
>>  int virtio_finalize_features(struct virtio_device *dev)
>>  {
>>  	int ret = dev->config->finalize_features(dev);
>> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>>  	if (ret)
>>  		return ret;
>> +
>> +	if (xen_domain())
>> +		goto skip_override;
>> +
>> +	if (virtio_has_iommu_quirk(dev))
>> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>> +
>> + skip_override:
>> +
> 
> I prefer normal if scoping as opposed to goto spaghetti pls.
> Better yet move vring_use_dma_api here and use it.
> Less of a chance something will break.

Sure, will move vring_use_dma_api() function in here.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
  2018-07-20  3:59 ` [RFC 1/4] virtio: Define virtio_direct_dma_ops structure Anshuman Khandual
@ 2018-07-30  9:24   ` Christoph Hellwig
  2018-07-31  4:01     ` Anshuman Khandual
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-30  9:24 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
	paulus, srikar

> +/*
> + * Virtio direct mapping DMA API operations structure
> + *
> + * This defines DMA API structure for all virtio devices which would not
> + * either bring in their own DMA OPS from architecture or they would not
> + * like to use architecture specific IOMMU based DMA OPS because QEMU
> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
> + */
> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
> +			    unsigned long offset, size_t size,
> +			    enum dma_data_direction dir,
> +			    unsigned long attrs)

All these functions should probably be marked static.

> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
> +			size_t size, enum dma_data_direction dir,
> +			unsigned long attrs)
> +{
> +}

No need to implement no-op callbacks in struct dma_map_ops.

> +
> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
> +{
> +	return 0;
> +}

Including this one.

> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> +		gfp_t gfp, unsigned long attrs)
> +{
> +	void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
> +
> +	if (queue) {
> +		phys_addr_t phys_addr = virt_to_phys(queue);
> +		*dma_handle = (dma_addr_t)phys_addr;
> +
> +		if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
> +			free_pages_exact(queue, PAGE_ALIGN(size));
> +			return NULL;
> +		}
> +	}
> +	return queue;

queue is a very odd name in a generic memory allocator.

> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
> +		dma_addr_t dma_addr, unsigned long attrs)
> +{
> +	free_pages_exact(vaddr, PAGE_ALIGN(size));
> +}
> +
> +const struct dma_map_ops virtio_direct_dma_ops = {
> +	.alloc			= virtio_direct_alloc,
> +	.free			= virtio_direct_free,
> +	.map_page		= virtio_direct_map_page,
> +	.unmap_page		= virtio_direct_unmap_page,
> +	.mapping_error		= virtio_direct_mapping_error,
> +};

This is missing a dma_map_sg implementation.  In general this is
mandatory for dma_ops.  So either you implement it or explain in
a common why you think you can skip it.

> +EXPORT_SYMBOL(virtio_direct_dma_ops);

EXPORT_SYMBOL_GPL like all virtio symbols, please.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-20  3:59 ` [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively Anshuman Khandual
  2018-07-28  8:56   ` Anshuman Khandual
@ 2018-07-30  9:25   ` Christoph Hellwig
  2018-07-31  7:00     ` Anshuman Khandual
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-30  9:25 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren,
	paulus, srikar

> +const struct dma_map_ops virtio_direct_dma_ops;

This belongs into a header if it is non-static.  If you only
use it in this file anyway please mark it static and avoid a forward
declaration.

> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (ret)
>  		return ret;
>  
> +	if (virtio_has_iommu_quirk(dev))
> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

This needs a big fat comment explaining what is going on here.

Also not new, but I find the existance of virtio_has_iommu_quirk and its
name horribly confusing.  It might be better to open code it here once
only a single caller is left.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-28 21:16     ` Michael S. Tsirkin
  2018-07-30  4:15       ` Anshuman Khandual
@ 2018-07-30  9:30       ` Christoph Hellwig
  2018-07-31  6:39         ` Anshuman Khandual
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-30  9:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, benh, mpe, hch,
	linuxram, haren, paulus, srikar

> > +
> > +	if (xen_domain())
> > +		goto skip_override;
> > +
> > +	if (virtio_has_iommu_quirk(dev))
> > +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> > +
> > + skip_override:
> > +
> 
> I prefer normal if scoping as opposed to goto spaghetti pls.
> Better yet move vring_use_dma_api here and use it.
> Less of a chance something will break.

I agree about avoid pointless gotos here, but we can do things
perfectly well without either gotos or a confusing helper here
if we structure it right. E.g.:

	// suitably detailed comment here
	if (!xen_domain() &&
	    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

and while we're at it - modifying dma ops for the parent looks very
dangerous.  I don't think we can do that, as it could break iommu
setup interactions.  IFF we set a specific dma map ops it has to be
on the virtio device itself, of which we have full control.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-27  9:58 ` Will Deacon
  2018-07-27 10:58   ` Anshuman Khandual
@ 2018-07-30  9:34   ` Christoph Hellwig
  2018-07-30 10:28     ` Michael S. Tsirkin
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-30  9:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, benh, mpe, mst, hch,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Fri, Jul 27, 2018 at 10:58:05AM +0100, Will Deacon wrote:
> 
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in turn
> means that we can enable the SMMU with legacy devices in our fastmodel
> emulation platform (which is slowly being upgraded to virtio 1.0) without
> hanging during boot. Patch below.

Yikes, this is a nightmare.  That is exactly where I do not want things
to end up.  We really need to distinguish between legacy virtual crappy
virtio (and that includes v1) that totally ignores the bus it pretends
to be on, and sane virtio (to be defined) that sit on a real (or
properly emulated including iommu and details for dma mapping) bus.

Having a mumble jumble of arch specific undocumented magic as in
the powerpc patch replied to or this arm patch is a complete no-go.

Nacked-by: Christoph Hellwig <hch@lst.de>

for both.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30  9:34   ` Christoph Hellwig
@ 2018-07-30 10:28     ` Michael S. Tsirkin
  2018-07-30 11:18       ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-30 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sit on a real (or
> properly emulated including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first:
So virtio devices can run on another CPU or on a PCI bus. Configuration
can happen over mupltiple transports.  There is a discovery protocol to
figure out where it is. It has some warts but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy
virtio". virtio devices that actually sit on a PCI bus aren't "sane"
simply because the DMA is more convoluted on some architectures.

Performance impact of the optimizations possible when you know
your "device" is in fact just another CPU has been measured,
it is real, so we aren't interested in adding all that overhead back
just so we can use DMA API. The "correct then fast" mantra doesn't
apply to something that is as widely deployed as virtio.

And I can accept an argument that maybe the DMA API isn't designed to
support such virtual DMA. Whether it should I don't know.

With this out of my system:
I agree these approaches are hacky. I think it is generally better to
have virtio feature negotiation tell you whether device runs on a CPU or
not rather than rely on platform specific ways for this. To this end
there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
e.g.  what if it's a VF - is that real or not? But I can see something
like e.g. VIRTIO_F_PLATFORM_DMA gaining support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30 10:28     ` Michael S. Tsirkin
@ 2018-07-30 11:18       ` Christoph Hellwig
  2018-07-30 13:26         ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-30 11:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, linuxram, haren, paulus,
	srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> Let me reply to the "crappy" part first:
> So virtio devices can run on another CPU or on a PCI bus. Configuration
> can happen over mupltiple transports.  There is a discovery protocol to
> figure out where it is. It has some warts but any real system has warts.
> 
> So IMHO virtio running on another CPU isn't "legacy virtual crappy
> virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> simply because the DMA is more convoluted on some architectures.

All of what you said would be true if virtio didn't claim to be
a PCI device.  Once it claims to be a PCI device and we also see
real hardware written to the interface I stand to all what I said
above.

> With this out of my system:
> I agree these approaches are hacky. I think it is generally better to
> have virtio feature negotiation tell you whether device runs on a CPU or
> not rather than rely on platform specific ways for this. To this end
> there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> e.g.  what if it's a VF - is that real or not? But I can see something
> like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> 
> We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

I don't really care about the exact naming, and indeed a device that
sets the flag doesn't have to be a 'real' device - it just has to act
like one.  I explained all the issues that this means (at least relating
to DMA) in one of the previous threads.

The important bit is that we can specify exact behavior for both
devices that sets the "I'm real!" flag and that ones that don't exactly
in the spec.  And that very much excludes arch-specific (or
Xen-specific) overrides.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30 11:18       ` Christoph Hellwig
@ 2018-07-30 13:26         ` Michael S. Tsirkin
  2018-07-31 17:30           ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-07-30 13:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh,
	mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 04:18:02AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> > Let me reply to the "crappy" part first:
> > So virtio devices can run on another CPU or on a PCI bus. Configuration
> > can happen over mupltiple transports.  There is a discovery protocol to
> > figure out where it is. It has some warts but any real system has warts.
> > 
> > So IMHO virtio running on another CPU isn't "legacy virtual crappy
> > virtio". virtio devices that actually sit on a PCI bus aren't "sane"
> > simply because the DMA is more convoluted on some architectures.
> 
> All of what you said would be true if virtio didn't claim to be
> a PCI device.  

There's nothing virtio claims to be.  It's a PV device that uses PCI for
its configuration.  Configuration is enumerated on the virtual PCI bus.
That part of the interface is emulated PCI. Data path is through a
PV device enumerated on the virtio bus.

> Once it claims to be a PCI device and we also see
> real hardware written to the interface I stand to all what I said
> above.

Real hardware would reuse parts of the interface but by necessity it
needs to behave slightly differently on some platforms.  However for
some platforms (such as x86) a PV virtio driver will by luck work with a
PCI device backend without changes. As these platforms and drivers are
widely deployed, some people will deploy hardware like that.  Should be
a non issue as by definition it's transparent to guests.

> > With this out of my system:
> > I agree these approaches are hacky. I think it is generally better to
> > have virtio feature negotiation tell you whether device runs on a CPU or
> > not rather than rely on platform specific ways for this. To this end
> > there was a recent proposal to rename VIRTIO_F_IO_BARRIER to
> > VIRTIO_F_REAL_DEVICE.  It got stuck since "real" sounds vague to people,
> > e.g.  what if it's a VF - is that real or not? But I can see something
> > like e.g. VIRTIO_F_PLATFORM_DMA gaining support.
> > 
> > We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> > and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.
> 
> I don't really care about the exact naming, and indeed a device that
> sets the flag doesn't have to be a 'real' device - it just has to act
> like one.  I explained all the issues that this means (at least relating
> to DMA) in one of the previous threads.

I believe you refer to this:
https://lkml.org/lkml/2018/6/7/15
that was a very helpful list outlining the problems we need to solve,
thanks a lot for that!

> The important bit is that we can specify exact behavior for both
> devices that sets the "I'm real!" flag and that ones that don't exactly
> in the spec.

I would very much like that, yes.

> And that very much excludes arch-specific (or
> Xen-specific) overrides.

We already committed to a xen specific hack but generally I prefer
devices that describe how they work instead of platforms magically
guessing, yes.

However the question people raise is that DMA API is already full of
arch-specific tricks the likes of which are outlined in your post linked
above. How is this one much worse?

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 1/4] virtio: Define virtio_direct_dma_ops structure
  2018-07-30  9:24   ` Christoph Hellwig
@ 2018-07-31  4:01     ` Anshuman Khandual
  0 siblings, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-31  4:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, mst, linuxram, haren,
	paulus, srikar

On 07/30/2018 02:54 PM, Christoph Hellwig wrote:
>> +/*
>> + * Virtio direct mapping DMA API operations structure
>> + *
>> + * This defines DMA API structure for all virtio devices which would not
>> + * either bring in their own DMA OPS from architecture or they would not
>> + * like to use architecture specific IOMMU based DMA OPS because QEMU
>> + * expects GPA instead of an IOVA in absence of VIRTIO_F_IOMMU_PLATFORM.
>> + */
>> +dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
>> +			    unsigned long offset, size_t size,
>> +			    enum dma_data_direction dir,
>> +			    unsigned long attrs)
> 
> All these functions should probably be marked static.

Sure.

> 
>> +void virtio_direct_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>> +			size_t size, enum dma_data_direction dir,
>> +			unsigned long attrs)
>> +{
>> +}
> 
> No need to implement no-op callbacks in struct dma_map_ops.

Okay.

> 
>> +
>> +int virtio_direct_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
>> +{
>> +	return 0;
>> +}
> 
> Including this one.
> 
>> +void *virtio_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>> +		gfp_t gfp, unsigned long attrs)
>> +{
>> +	void *queue = alloc_pages_exact(PAGE_ALIGN(size), gfp);
>> +
>> +	if (queue) {
>> +		phys_addr_t phys_addr = virt_to_phys(queue);
>> +		*dma_handle = (dma_addr_t)phys_addr;
>> +
>> +		if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
>> +			free_pages_exact(queue, PAGE_ALIGN(size));
>> +			return NULL;
>> +		}
>> +	}
>> +	return queue;
> 
> queue is a very odd name in a generic memory allocator.

Will change it to addr.

> 
>> +void virtio_direct_free(struct device *dev, size_t size, void *vaddr,
>> +		dma_addr_t dma_addr, unsigned long attrs)
>> +{
>> +	free_pages_exact(vaddr, PAGE_ALIGN(size));
>> +}
>> +
>> +const struct dma_map_ops virtio_direct_dma_ops = {
>> +	.alloc			= virtio_direct_alloc,
>> +	.free			= virtio_direct_free,
>> +	.map_page		= virtio_direct_map_page,
>> +	.unmap_page		= virtio_direct_unmap_page,
>> +	.mapping_error		= virtio_direct_mapping_error,
>> +};
> 
> This is missing a dma_map_sg implementation.  In general this is
> mandatory for dma_ops.  So either you implement it or explain in
> a common why you think you can skip it.

Hmm. IIUC virtio core never used dma_map_sg(). Am I missing something
here ? The only reference to dma_map_sg() is inside a comment.

$git grep dma_map_sg drivers/virtio/
drivers/virtio/virtio_ring.c:    * We can't use dma_map_sg, because we don't use scatterlists in

> 
>> +EXPORT_SYMBOL(virtio_direct_dma_ops);
> 
> EXPORT_SYMBOL_GPL like all virtio symbols, please.

I am planning to drop EXPORT_SYMBOL from virtio_direct_dma_ops structure.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-30  9:30       ` Christoph Hellwig
@ 2018-07-31  6:39         ` Anshuman Khandual
  0 siblings, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-31  6:39 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, linuxram, haren, paulus,
	srikar

On 07/30/2018 03:00 PM, Christoph Hellwig wrote:
>>> +
>>> +	if (xen_domain())
>>> +		goto skip_override;
>>> +
>>> +	if (virtio_has_iommu_quirk(dev))
>>> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>> +
>>> + skip_override:
>>> +
>>
>> I prefer normal if scoping as opposed to goto spaghetti pls.
>> Better yet move vring_use_dma_api here and use it.
>> Less of a chance something will break.
> 
> I agree about avoid pointless gotos here, but we can do things
> perfectly well without either gotos or a confusing helper here
> if we structure it right. E.g.:
> 
> 	// suitably detailed comment here
> 	if (!xen_domain() &&
> 	    !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

I had updated this patch calling vring_use_dma_api() as a helper
as suggested by Michael but yes we can have the above condition
with a comment block. I will change this patch accordingly.

> 
> and while we're at it - modifying dma ops for the parent looks very
> dangerous.  I don't think we can do that, as it could break iommu
> setup interactions.  IFF we set a specific dma map ops it has to be
> on the virtio device itself, of which we have full control.

I understand your concern. At present virtio core calls parent's DMA
ops callbacks when device has VIRTIO_F_IOMMU_PLATFORM flag set. Most
likely those DMA OPS are architecture specific ones which can really
configure IOMMU. Most probably all devices and their parents share
the same DMA ops callback. IIUC as long as the entire system has a
single DMA ops structure, it should be okay. But I may be missing
other implications. I tried changing virtio core so that it always
calls device's DMA ops instead of it's parent DMA ops, it hit the
following WARN_ON for devices without IOMMU flag and hit both the
WARN_ON and BUG_ON for devices with the IOMMU flag.

static inline void *dma_alloc_attrs(struct device *dev, size_t size,
                                       dma_addr_t *dma_handle, gfp_t flag,
                                       unsigned long attrs)
{
        const struct dma_map_ops *ops = get_dma_ops(dev);
        void *cpu_addr;

        BUG_ON(!ops);
        WARN_ON_ONCE(dev && !dev->coherent_dma_mask);

--------

Seems like virtio device's DMA ops and coherent_dma_mask was never
set correctly assuming that virtio core always called parent's DMA
OPS all the time. We may have to change virtio device init to fix
this. Any thoughts ?


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  2018-07-30  9:25   ` Christoph Hellwig
@ 2018-07-31  7:00     ` Anshuman Khandual
  0 siblings, 0 replies; 119+ messages in thread
From: Anshuman Khandual @ 2018-07-31  7:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: robh, srikar, mst, aik, jasowang, linuxram, linux-kernel,
	virtualization, paulus, joe, linuxppc-dev, elfring, haren, david

On 07/30/2018 02:55 PM, Christoph Hellwig wrote:
>> +const struct dma_map_ops virtio_direct_dma_ops;
> 
> This belongs into a header if it is non-static.  If you only
> use it in this file anyway please mark it static and avoid a forward
> declaration.

Sure, will make it static, move the definition up in the file to avoid
forward declaration.
 
> 
>> +
>>  int virtio_finalize_features(struct virtio_device *dev)
>>  {
>>  	int ret = dev->config->finalize_features(dev);
>> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (virtio_has_iommu_quirk(dev))
>> +		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> 
> This needs a big fat comment explaining what is going on here.

Sure, will do. Also talk about the XEN domain exception as well once
that goes into this conditional statement.

> 
> Also not new, but I find the existance of virtio_has_iommu_quirk and its
> name horribly confusing.  It might be better to open code it here once
> only a single caller is left.

Sure will do. There is one definition in the tools directory which can
be removed and then this will be the only one left.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30 13:26         ` Michael S. Tsirkin
@ 2018-07-31 17:30           ` Christoph Hellwig
  2018-07-31 20:36             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-07-31 17:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, linuxram, haren, paulus,
	srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 04:26:32PM +0300, Michael S. Tsirkin wrote:
> Real hardware would reuse parts of the interface but by necessity it
> needs to behave slightly differently on some platforms.  However for
> some platforms (such as x86) a PV virtio driver will by luck work with a
> PCI device backend without changes. As these platforms and drivers are
> widely deployed, some people will deploy hardware like that.  Should be
> a non issue as by definition it's transparent to guests.

On some x86.  As soon as you have an iommu or strange PCI root ports
things are going to start breaking apart.

> > And that very much excludes arch-specific (or
> > Xen-specific) overrides.
> 
> We already committed to a xen specific hack but generally I prefer
> devices that describe how they work instead of platforms magically
> guessing, yes.

For legacy reasons I guess we'll have to keep it, but we really need
to avoid adding more junk than this.

> However the question people raise is that DMA API is already full of
> arch-specific tricks the likes of which are outlined in your post linked
> above. How is this one much worse?

None of these warts is visible to the driver, they are all handled in
the architecture (possibly on a per-bus basis).

So for virtio we really need to decide if it has one set of behavior
as specified in the virtio spec, or if it behaves exactly as if it
was on a PCI bus, or in fact probably both as you lined up.  But no
magic arch specific behavior inbetween.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-31 17:30           ` Christoph Hellwig
@ 2018-07-31 20:36             ` Benjamin Herrenschmidt
  2018-08-01  8:16               ` Will Deacon
  2018-08-01 21:56               ` Michael S. Tsirkin
  0 siblings, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-31 20:36 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > However the question people raise is that DMA API is already full of
> > arch-specific tricks the likes of which are outlined in your post linked
> > above. How is this one much worse?
> 
> None of these warts is visible to the driver, they are all handled in
> the architecture (possibly on a per-bus basis).
> 
> So for virtio we really need to decide if it has one set of behavior
> as specified in the virtio spec, or if it behaves exactly as if it
> was on a PCI bus, or in fact probably both as you lined up.  But no
> magic arch specific behavior inbetween.

The only arch specific behaviour is needed in the case where it doesn't
behave like PCI. In this case, the PCI DMA ops are not suitable, but in
our secure VMs, we still need to make it use swiotlb in order to bounce
through non-secure pages.

It would be nice if "real PCI" was the default but it's not, VMs are
created in "legacy" mode all the times and we don't know at VM creation
time whether it will become a secure VM or not. The way our secure VMs
work is that they start as a normal VM, load a secure "payload" and
call the Ultravisor to "become" secure.

So we're in a bit of a bind here. We need that one-liner optional arch
hook to make virtio use swiotlb in that "IOMMU bypass" case.

Ben.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-31 20:36             ` Benjamin Herrenschmidt
@ 2018-08-01  8:16               ` Will Deacon
  2018-08-01  8:36                 ` Christoph Hellwig
  2018-08-05  0:27                 ` Michael S. Tsirkin
  2018-08-01 21:56               ` Michael S. Tsirkin
  1 sibling, 2 replies; 119+ messages in thread
From: Will Deacon @ 2018-08-01  8:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > However the question people raise is that DMA API is already full of
> > > arch-specific tricks the likes of which are outlined in your post linked
> > > above. How is this one much worse?
> > 
> > None of these warts is visible to the driver, they are all handled in
> > the architecture (possibly on a per-bus basis).
> > 
> > So for virtio we really need to decide if it has one set of behavior
> > as specified in the virtio spec, or if it behaves exactly as if it
> > was on a PCI bus, or in fact probably both as you lined up.  But no
> > magic arch specific behavior inbetween.
> 
> The only arch specific behaviour is needed in the case where it doesn't
> behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> our secure VMs, we still need to make it use swiotlb in order to bounce
> through non-secure pages.

On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
transport (so definitely not PCI) have historically been advertised by qemu
as not being cache coherent, but because the virtio core has bypassed DMA
ops then everything has happened to work. If we blindly enable the arch DMA
ops, we'll plumb in the non-coherent ops and start getting data corruption,
so we do need a way to quirk virtio as being "always coherent" if we want to
use the DMA ops (which we do, because our emulation platforms have an IOMMU
for all virtio devices).

Will

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  8:16               ` Will Deacon
@ 2018-08-01  8:36                 ` Christoph Hellwig
  2018-08-01  9:05                   ` Will Deacon
                                     ` (2 more replies)
  2018-08-05  0:27                 ` Michael S. Tsirkin
  1 sibling, 3 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-01  8:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Michael S. Tsirkin,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed DMA
> ops then everything has happened to work. If we blindly enable the arch DMA
> ops,

No one is suggesting that as far as I can tell.

> we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).

From all that I've gather so far: no you do not want that.  We really
need to figure out virtio "dma" interacts with the host / device.

If you look at the current iommu spec it does talk of physical address
with a little careveout for VIRTIO_F_IOMMU_PLATFORM.

So between that and our discussion in this thread and its previous
iterations I think we need to stick to the current always physical,
bypass system dma ops mode of virtio operation as the default.

We just need to figure out how to deal with devices that deviate
from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
dma tweaks (offsets, cache flushing), which seems well in spirit of
the original design.  The other issue is VIRTIO_F_IO_BARRIER
which is very vaguely defined, and which needs a better definition.
And last but not least we'll need some text explaining the challenges
of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
is what would basically cover them, but a good description including
an explanation of why these matter.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  8:36                 ` Christoph Hellwig
@ 2018-08-01  9:05                   ` Will Deacon
  2018-08-01 22:41                     ` Michael S. Tsirkin
  2018-08-01 22:35                   ` Michael S. Tsirkin
  2018-08-02 15:24                   ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 119+ messages in thread
From: Will Deacon @ 2018-08-01  9:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Benjamin Herrenschmidt, Michael S. Tsirkin, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

Hi Christoph,

On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops,
> 
> No one is suggesting that as far as I can tell.

Apologies: it's me that wants the DMA ops enabled to handle legacy devices
behind an IOMMU, but see below.

> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> 
> From all that I've gather so far: no you do not want that.  We really
> need to figure out virtio "dma" interacts with the host / device.
> 
> If you look at the current iommu spec it does talk of physical address
> with a little careveout for VIRTIO_F_IOMMU_PLATFORM.

That's true, although that doesn't exist in the legacy virtio spec, and we
have an existing emulation platform which puts legacy virtio devices behind
an IOMMU. Currently, Linux is unable to boot on this platform unless the
IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
then it works perfectly.

> So between that and our discussion in this thread and its previous
> iterations I think we need to stick to the current always physical,
> bypass system dma ops mode of virtio operation as the default.

As above -- that means we hang during boot because we get stuck trying to
bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
answer is "just upgrade to latest virtio and advertise the presence of the
IOMMU". I'm pushing for that in future platforms, but it seems a shame not
to support the current platform, especially given that other systems do have
hacks in mainline to get virtio working.

> We just need to figure out how to deal with devices that deviate
> from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design.  The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

I agree that this makes sense for future revisions of virtio (or perhaps
it can just be a clarification to virtio 1.0), but we're still left in the
dark with legacy devices and it would be nice to have them work on the
systems which currently exist, even if it's a legacy-only hack in the arch
code.

Will

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-31 20:36             ` Benjamin Herrenschmidt
  2018-08-01  8:16               ` Will Deacon
@ 2018-08-01 21:56               ` Michael S. Tsirkin
  2018-08-02 15:33                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-01 21:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > However the question people raise is that DMA API is already full of
> > > arch-specific tricks the likes of which are outlined in your post linked
> > > above. How is this one much worse?
> > 
> > None of these warts is visible to the driver, they are all handled in
> > the architecture (possibly on a per-bus basis).
> > 
> > So for virtio we really need to decide if it has one set of behavior
> > as specified in the virtio spec, or if it behaves exactly as if it
> > was on a PCI bus, or in fact probably both as you lined up.  But no
> > magic arch specific behavior inbetween.
> 
> The only arch specific behaviour is needed in the case where it doesn't
> behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> our secure VMs, we still need to make it use swiotlb in order to bounce
> through non-secure pages.
> 
> It would be nice if "real PCI" was the default

I think you are mixing "real PCI" which isn't coded up yet and IOMMU
bypass which is. IOMMU bypass will maybe with time become unnecessary
since it seems that one can just program an IOMMU in a bypass mode
instead.

It's hard to blame you since right now if you disable IOMMU bypass
you get a real PCI mode. But they are distinct and to allow people
to enable IOMMU by default we will need to teach someone
(virtio or DMA API) about this mode that does follow
translation and protection rules in the IOMMU but runs
on a CPU and so does not need cache flushes and whatnot.

OTOH real PCI mode as opposed to default hypervisor mode does not perform as
well when what you actually have is a hypervisor.

So we'll likely have a mix of these two modes for a while.

> but it's not, VMs are
> created in "legacy" mode all the times and we don't know at VM creation
> time whether it will become a secure VM or not. The way our secure VMs
> work is that they start as a normal VM, load a secure "payload" and
> call the Ultravisor to "become" secure.
> 
> So we're in a bit of a bind here. We need that one-liner optional arch
> hook to make virtio use swiotlb in that "IOMMU bypass" case.
> 
> Ben.

And just to make sure I understand, on your platform DMA APIs do include
some of the cache flushing tricks and this is why you don't want to
declare iommu support in the hypervisor?

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  8:36                 ` Christoph Hellwig
  2018-08-01  9:05                   ` Will Deacon
@ 2018-08-01 22:35                   ` Michael S. Tsirkin
  2018-08-02 15:24                   ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-01 22:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Benjamin Herrenschmidt, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops,
> 
> No one is suggesting that as far as I can tell.
> 
> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> 
> >From all that I've gather so far: no you do not want that.  We really
> need to figure out virtio "dma" interacts with the host / device.
> 
> If you look at the current iommu spec it does talk of physical address
> with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
> 
> So between that and our discussion in this thread and its previous
> iterations I think we need to stick to the current always physical,
> bypass system dma ops mode of virtio operation as the default.
> 
> We just need to figure out how to deal with devices that deviate
> from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design.

Well I wouldn't say that. VIRTIO_F_IOMMU_PLATFORM is for guest
programmable protection which is designed for things like userspace
drivers but still very much which a CPU doing the accesses. I think
VIRTIO_F_IO_BARRIER needs to be extended to VIRTIO_F_PLATFORM_DMA.

>  The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

I think VIRTIO_F_IOMMU_PLATFORM + VIRTIO_F_PLATFORM_DMA but yea.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  9:05                   ` Will Deacon
@ 2018-08-01 22:41                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-01 22:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoph Hellwig, Benjamin Herrenschmidt, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, Aug 01, 2018 at 10:05:35AM +0100, Will Deacon wrote:
> Hi Christoph,
> 
> On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > > transport (so definitely not PCI) have historically been advertised by qemu
> > > as not being cache coherent, but because the virtio core has bypassed DMA
> > > ops then everything has happened to work. If we blindly enable the arch DMA
> > > ops,
> > 
> > No one is suggesting that as far as I can tell.
> 
> Apologies: it's me that wants the DMA ops enabled to handle legacy devices
> behind an IOMMU, but see below.
> 
> > > we'll plumb in the non-coherent ops and start getting data corruption,
> > > so we do need a way to quirk virtio as being "always coherent" if we want to
> > > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > > for all virtio devices).
> > 
> > From all that I've gather so far: no you do not want that.  We really
> > need to figure out virtio "dma" interacts with the host / device.
> > 
> > If you look at the current iommu spec it does talk of physical address
> > with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
> 
> That's true, although that doesn't exist in the legacy virtio spec, and we
> have an existing emulation platform which puts legacy virtio devices behind
> an IOMMU. Currently, Linux is unable to boot on this platform unless the
> IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
> then it works perfectly.
> 
> > So between that and our discussion in this thread and its previous
> > iterations I think we need to stick to the current always physical,
> > bypass system dma ops mode of virtio operation as the default.
> 
> As above -- that means we hang during boot because we get stuck trying to
> bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
> answer is "just upgrade to latest virtio and advertise the presence of the
> IOMMU". I'm pushing for that in future platforms, but it seems a shame not
> to support the current platform, especially given that other systems do have
> hacks in mainline to get virtio working.
> 
> > We just need to figure out how to deal with devices that deviate
> > from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> > dma tweaks (offsets, cache flushing), which seems well in spirit of
> > the original design.  The other issue is VIRTIO_F_IO_BARRIER
> > which is very vaguely defined, and which needs a better definition.
> > And last but not least we'll need some text explaining the challenges
> > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > is what would basically cover them, but a good description including
> > an explanation of why these matter.
> 
> I agree that this makes sense for future revisions of virtio (or perhaps
> it can just be a clarification to virtio 1.0), but we're still left in the
> dark with legacy devices and it would be nice to have them work on the
> systems which currently exist, even if it's a legacy-only hack in the arch
> code.
> 
> Will


Myself I'm sympathetic to this use-case and I see more uses to this
than just legacy support. But more work is required IMHO.
Will post tomorrow though - it's late here ...

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  8:36                 ` Christoph Hellwig
  2018-08-01  9:05                   ` Will Deacon
  2018-08-01 22:35                   ` Michael S. Tsirkin
@ 2018-08-02 15:24                   ` Benjamin Herrenschmidt
  2018-08-02 15:41                     ` Michael S. Tsirkin
  2 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-02 15:24 UTC (permalink / raw)
  To: Christoph Hellwig, Will Deacon
  Cc: Michael S. Tsirkin, Anshuman Khandual, virtualization,
	linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
	jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote:
> We just need to figure out how to deal with devices that deviate
> from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design. 

I don't completely agree:

1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
for example. It indicates that the peer bypasses the normal platform
iommu. The platform code in the guest has no real way to know that this
is the case, this is a specific "feature" of the qemu implementation.

2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
property of the guest platform itself (not qemu), there's no way the
"peer" can advertize it via the virtio negociated flags. At least for
us. I don't know for sure whether that would be workable for the ARM
case. In our case, qemu has no idea at VM creation time that the VM
will turn itself into a secure VM and thus will require bounce
buffering for IOs (including virtio).

So unless we have another hook for the arch code to set
VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
guest itself, I don't see that as a way to deal with it.

>  The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.

Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01 21:56               ` Michael S. Tsirkin
@ 2018-08-02 15:33                 ` Benjamin Herrenschmidt
  2018-08-02 20:53                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-02 15:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-02 at 00:56 +0300, Michael S. Tsirkin wrote:
> > but it's not, VMs are
> > created in "legacy" mode all the times and we don't know at VM creation
> > time whether it will become a secure VM or not. The way our secure VMs
> > work is that they start as a normal VM, load a secure "payload" and
> > call the Ultravisor to "become" secure.
> > 
> > So we're in a bit of a bind here. We need that one-liner optional arch
> > hook to make virtio use swiotlb in that "IOMMU bypass" case.
> > 
> > Ben.
> 
> And just to make sure I understand, on your platform DMA APIs do include
> some of the cache flushing tricks and this is why you don't want to
> declare iommu support in the hypervisor?

I'm not sure I parse what you mean.

We don't need cache flushing tricks. The problem we have with our
"secure" VMs is that:

 - At VM creation time we have no idea it's going to become a secure
VM, qemu doesn't know anything about it, and thus qemu (or other
management tools, libvirt etc...) are going to create "legacy" (ie
iommu bypass) virtio devices.

 - Once the VM goes secure (early during boot but too late for qemu),
it will need to make virtio do bounce-buffering via swiotlb because
qemu cannot physically access most VM pages (blocked by HW security
features), we need to bounce buffer using some unsecure pages that are
accessible to qemu.

That said, I wouldn't object for us to more generally switch long run
to changing qemu so that virtio on powerpc starts using the IOMMU as a
default provided we fix our guest firmware to understand it (it
currently doesn't), and provided we verify that the performance impact
on things like vhost is negligible.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 15:24                   ` Benjamin Herrenschmidt
@ 2018-08-02 15:41                     ` Michael S. Tsirkin
  2018-08-02 16:01                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 15:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, Aug 02, 2018 at 10:24:57AM -0500, Benjamin Herrenschmidt wrote:
> On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote:
> > We just need to figure out how to deal with devices that deviate
> > from the default.  One things is that VIRTIO_F_IOMMU_PLATFORM really
> > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> > dma tweaks (offsets, cache flushing), which seems well in spirit of
> > the original design. 
> 
> I don't completely agree:
> 
> 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
> for example. It indicates that the peer bypasses the normal platform
> iommu. The platform code in the guest has no real way to know that this
> is the case, this is a specific "feature" of the qemu implementation.
> 
> 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
> property of the guest platform itself (not qemu), there's no way the
> "peer" can advertize it via the virtio negociated flags. At least for
> us. I don't know for sure whether that would be workable for the ARM
> case. In our case, qemu has no idea at VM creation time that the VM
> will turn itself into a secure VM and thus will require bounce
> buffering for IOs (including virtio).
> 
> So unless we have another hook for the arch code to set
> VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
> guest itself, I don't see that as a way to deal with it.
> 
> >  The other issue is VIRTIO_F_IO_BARRIER
> > which is very vaguely defined, and which needs a better definition.
> > And last but not least we'll need some text explaining the challenges
> > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > is what would basically cover them, but a good description including
> > an explanation of why these matter.
> 
> Ben.
> 

So is it true that from qemu point of view there is nothing special
going on?  You pass in a PA, host writes there.


-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 15:41                     ` Michael S. Tsirkin
@ 2018-08-02 16:01                       ` Benjamin Herrenschmidt
  2018-08-02 17:19                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-02 16:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-02 at 18:41 +0300, Michael S. Tsirkin wrote:
> 
> > I don't completely agree:
> > 
> > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
> > for example. It indicates that the peer bypasses the normal platform
> > iommu. The platform code in the guest has no real way to know that this
> > is the case, this is a specific "feature" of the qemu implementation.
> > 
> > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
> > property of the guest platform itself (not qemu), there's no way the
> > "peer" can advertize it via the virtio negociated flags. At least for
> > us. I don't know for sure whether that would be workable for the ARM
> > case. In our case, qemu has no idea at VM creation time that the VM
> > will turn itself into a secure VM and thus will require bounce
> > buffering for IOs (including virtio).
> > 
> > So unless we have another hook for the arch code to set
> > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
> > guest itself, I don't see that as a way to deal with it.
> > 
> > >  The other issue is VIRTIO_F_IO_BARRIER
> > > which is very vaguely defined, and which needs a better definition.
> > > And last but not least we'll need some text explaining the challenges
> > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > > is what would basically cover them, but a good description including
> > > an explanation of why these matter.
> > 
> > Ben.
> > 
> 
> So is it true that from qemu point of view there is nothing special
> going on?  You pass in a PA, host writes there.

Yes, qemu doesn't see a different. It's the guest that will bounce the
pages via a pool of "insecure" pages that qemu can access. Normal pages
in a secure VM come from PAs that qemu cannot physically access.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 16:01                       ` Benjamin Herrenschmidt
@ 2018-08-02 17:19                         ` Michael S. Tsirkin
  2018-08-02 17:53                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, Aug 02, 2018 at 11:01:26AM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 18:41 +0300, Michael S. Tsirkin wrote:
> > 
> > > I don't completely agree:
> > > 
> > > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu
> > > for example. It indicates that the peer bypasses the normal platform
> > > iommu. The platform code in the guest has no real way to know that this
> > > is the case, this is a specific "feature" of the qemu implementation.
> > > 
> > > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a
> > > property of the guest platform itself (not qemu), there's no way the
> > > "peer" can advertize it via the virtio negociated flags. At least for
> > > us. I don't know for sure whether that would be workable for the ARM
> > > case. In our case, qemu has no idea at VM creation time that the VM
> > > will turn itself into a secure VM and thus will require bounce
> > > buffering for IOs (including virtio).
> > > 
> > > So unless we have another hook for the arch code to set
> > > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the
> > > guest itself, I don't see that as a way to deal with it.
> > > 
> > > >  The other issue is VIRTIO_F_IO_BARRIER
> > > > which is very vaguely defined, and which needs a better definition.
> > > > And last but not least we'll need some text explaining the challenges
> > > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > > > is what would basically cover them, but a good description including
> > > > an explanation of why these matter.
> > > 
> > > Ben.
> > > 
> > 
> > So is it true that from qemu point of view there is nothing special
> > going on?  You pass in a PA, host writes there.
> 
> Yes, qemu doesn't see a different. It's the guest that will bounce the
> pages via a pool of "insecure" pages that qemu can access. Normal pages
> in a secure VM come from PAs that qemu cannot physically access.
> 
> Cheers,
> Ben.
> 

I see. So yes, given that device does not know or care, using
virtio features is an awkward fit.

So let's say as a quick fix for you maybe we could generalize the
xen_domain hack, instead of just checking xen_domain check some static
branch.  Then teach xen and others to enable that.
OK but problem then becomes this: if you do this and virtio device appears
behind a vIOMMU and it does not advertize the IOMMU flag, the
code will try to use the vIOMMU mappings and fail.

It does look like even with trick above, you need a special version of
DMA ops that does just swiotlb but not any of the other things DMA API
might do.

Thoughts?

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 17:19                         ` Michael S. Tsirkin
@ 2018-08-02 17:53                           ` Benjamin Herrenschmidt
  2018-08-02 20:52                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-02 17:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote:
> 
> I see. So yes, given that device does not know or care, using
> virtio features is an awkward fit.
> 
> So let's say as a quick fix for you maybe we could generalize the
> xen_domain hack, instead of just checking xen_domain check some static
> branch.  Then teach xen and others to enable that.>

> OK but problem then becomes this: if you do this and virtio device appears
> behind a vIOMMU and it does not advertize the IOMMU flag, the
> code will try to use the vIOMMU mappings and fail.
>
> It does look like even with trick above, you need a special version of
> DMA ops that does just swiotlb but not any of the other things DMA API
> might do.
> 
> Thoughts?

Yes, this is the purpose of Anshuman original patch (I haven't looked
at the details of the patch in a while but that's what I told him to
implement ;-) :

 - Make virtio always use DMA ops to simplify the code path (with a set
of "transparent" ops for legacy)

 and

 -  Provide an arch hook allowing us to "override" those "transparent"
DMA ops with some custom ones that do the appropriate swiotlb gunk.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 17:53                           ` Benjamin Herrenschmidt
@ 2018-08-02 20:52                             ` Michael S. Tsirkin
  2018-08-02 21:13                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 20:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, Aug 02, 2018 at 12:53:41PM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote:
> > 
> > I see. So yes, given that device does not know or care, using
> > virtio features is an awkward fit.
> > 
> > So let's say as a quick fix for you maybe we could generalize the
> > xen_domain hack, instead of just checking xen_domain check some static
> > branch.  Then teach xen and others to enable that.>
> 
> > OK but problem then becomes this: if you do this and virtio device appears
> > behind a vIOMMU and it does not advertize the IOMMU flag, the
> > code will try to use the vIOMMU mappings and fail.
> >
> > It does look like even with trick above, you need a special version of
> > DMA ops that does just swiotlb but not any of the other things DMA API
> > might do.
> > 
> > Thoughts?
> 
> Yes, this is the purpose of Anshuman original patch (I haven't looked
> at the details of the patch in a while but that's what I told him to
> implement ;-) :
> 
>  - Make virtio always use DMA ops to simplify the code path (with a set
> of "transparent" ops for legacy)
> 
>  and
> 
>  -  Provide an arch hook allowing us to "override" those "transparent"
> DMA ops with some custom ones that do the appropriate swiotlb gunk.
> 
> Cheers,
> Ben.
> 

Right but as I tried to say doing that brings us to a bunch of issues
with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
guest to hypervisor communication.

When we do (as is the case with PLATFORM_IOMMU right now) this adds a
bunch of overhead which we need to get rid of if we are to switch to
PLATFORM_IOMMU by default.  We need to fix that.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 15:33                 ` Benjamin Herrenschmidt
@ 2018-08-02 20:53                   ` Michael S. Tsirkin
  2018-08-03  7:06                     ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 20:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, Aug 02, 2018 at 10:33:05AM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 00:56 +0300, Michael S. Tsirkin wrote:
> > > but it's not, VMs are
> > > created in "legacy" mode all the times and we don't know at VM creation
> > > time whether it will become a secure VM or not. The way our secure VMs
> > > work is that they start as a normal VM, load a secure "payload" and
> > > call the Ultravisor to "become" secure.
> > > 
> > > So we're in a bit of a bind here. We need that one-liner optional arch
> > > hook to make virtio use swiotlb in that "IOMMU bypass" case.
> > > 
> > > Ben.
> > 
> > And just to make sure I understand, on your platform DMA APIs do include
> > some of the cache flushing tricks and this is why you don't want to
> > declare iommu support in the hypervisor?
> 
> I'm not sure I parse what you mean.
> 
> We don't need cache flushing tricks.

You don't but do real devices on same platform need them?

> The problem we have with our
> "secure" VMs is that:
> 
>  - At VM creation time we have no idea it's going to become a secure
> VM, qemu doesn't know anything about it, and thus qemu (or other
> management tools, libvirt etc...) are going to create "legacy" (ie
> iommu bypass) virtio devices.
> 
>  - Once the VM goes secure (early during boot but too late for qemu),
> it will need to make virtio do bounce-buffering via swiotlb because
> qemu cannot physically access most VM pages (blocked by HW security
> features), we need to bounce buffer using some unsecure pages that are
> accessible to qemu.
> 
> That said, I wouldn't object for us to more generally switch long run
> to changing qemu so that virtio on powerpc starts using the IOMMU as a
> default provided we fix our guest firmware to understand it (it
> currently doesn't), and provided we verify that the performance impact
> on things like vhost is negligible.
> 
> Cheers,
> Ben.
> 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
                   ` (5 preceding siblings ...)
  2018-07-27  9:58 ` Will Deacon
@ 2018-08-02 20:55 ` Michael S. Tsirkin
  2018-08-03  2:41   ` Jason Wang
  6 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 20:55 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

Jason did some work on profiling this. Unfortunately he reports
about 4% extra overhead from this switch on x86 with no vIOMMU.

I expect he's writing up the data in more detail, but
just wanted to let you know this would be one more
thing to debug before we can just switch to DMA APIs.


> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virito devices
> 
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_pci_common.h     |  3 ++
>  drivers/virtio/virtio_ring.c           | 65 +-----------------------------
>  5 files changed, 89 insertions(+), 63 deletions(-)
> 
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 20:52                             ` Michael S. Tsirkin
@ 2018-08-02 21:13                               ` Benjamin Herrenschmidt
  2018-08-02 21:51                                 ` Michael S. Tsirkin
  2018-08-03  7:05                                 ` Christoph Hellwig
  0 siblings, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-02 21:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:
> > Yes, this is the purpose of Anshuman original patch (I haven't looked
> > at the details of the patch in a while but that's what I told him to
> > implement ;-) :
> > 
> >   - Make virtio always use DMA ops to simplify the code path (with a set
> > of "transparent" ops for legacy)
> > 
> >   and
> > 
> >   -  Provide an arch hook allowing us to "override" those "transparent"
> > DMA ops with some custom ones that do the appropriate swiotlb gunk.
> > 
> > Cheers,
> > Ben.
> > 
> 
> Right but as I tried to say doing that brings us to a bunch of issues
> with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
> guest to hypervisor communication.

I'm not sure I see the problem, see below

> When we do (as is the case with PLATFORM_IOMMU right now) this adds a
> bunch of overhead which we need to get rid of if we are to switch to
> PLATFORM_IOMMU by default.  We need to fix that.

So let's differenciate the two problems of having an IOMMU (real or
emulated) which indeeds adds overhead etc... and using the DMA API.

At the moment, virtio does this all over the place:

	if (use_dma_api)
		dma_map/alloc_something(...)
	else
		use_pa

The idea of the patch set is to do two, somewhat orthogonal, changes
that together achieve what we want. Let me know where you think there
is "a bunch of issues" because I'm missing it:

 1- Replace the above if/else constructs with just calling the DMA API,
and have virtio, at initialization, hookup its own dma_ops that just
"return pa" (roughly) when the IOMMU stuff isn't used.

This adds an indirect function call to the path that previously didn't
have one (the else case above). Is that a significant/measurable
overhead ?

This change stands alone, and imho "cleans" up virtio by avoiding all
that if/else "2 path" and unless it adds a measurable overhead, should
probably be done.

 2- Make virtio use the DMA API with our custom platform-provided
swiotlb callbacks when needed, that is when not using IOMMU *and*
running on a secure VM in our case.

This benefits from -1- by making us just plumb in a different set of
DMA ops we would have cooked up specifically for virtio in our arch
code (or in virtio itself but build arch-conditionally in a separate
file). But it doesn't strictly need it -1-:

Now, -2- doesn't strictly needs -1-. We could have just done another
xen-like hack that forces the DMA API "ON" for virtio when running in a
secure VM.

The problem if we do that however is that we also then need the arch
PCI code to make sure it hooks up the virtio PCI devices with the
special "magic" DMA ops that avoid the iommu but still do swiotlb, ie,
not the same as other PCI devices. So it will have to play games such
as checking vendor/device IDs for virtio, checking the IOMMU flag,
etc... from the arch code which really bloody sucks when assigning PCI
DMA ops.

However, if we do it the way we plan here, on top of -1-, with a hook
called from virtio into the arch to "override" the virtio DMA ops, then
we avoid the problem completely: The arch hook would only be called by
virtio if the IOMMU flag is *not* set. IE only when using that special
"hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal
PCI dma ops as usual.

That way, we have a very clear semantic: This hook is purely about
replacing those "null" DMA ops that just return PA introduced in -1-
with some arch provided specially cooked up DMA ops for non-IOMMU
virtio that know about the arch special requirements. For us bounce
buffering.

Is there something I'm missing ?

Cheers,
Ben.





^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 21:13                               ` Benjamin Herrenschmidt
@ 2018-08-02 21:51                                 ` Michael S. Tsirkin
  2018-08-03  7:05                                 ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-02 21:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:
> > > Yes, this is the purpose of Anshuman original patch (I haven't looked
> > > at the details of the patch in a while but that's what I told him to
> > > implement ;-) :
> > > 
> > >   - Make virtio always use DMA ops to simplify the code path (with a set
> > > of "transparent" ops for legacy)
> > > 
> > >   and
> > > 
> > >   -  Provide an arch hook allowing us to "override" those "transparent"
> > > DMA ops with some custom ones that do the appropriate swiotlb gunk.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > 
> > Right but as I tried to say doing that brings us to a bunch of issues
> > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
> > guest to hypervisor communication.
> 
> I'm not sure I see the problem, see below
> 
> > When we do (as is the case with PLATFORM_IOMMU right now) this adds a
> > bunch of overhead which we need to get rid of if we are to switch to
> > PLATFORM_IOMMU by default.  We need to fix that.
> 
> So let's differenciate the two problems of having an IOMMU (real or
> emulated) which indeeds adds overhead etc... and using the DMA API.

Well actually it's the other way around. An iommu in theory doesn't need
to bring overhead if you set it in bypass mode.  Which does imply the
iommu supports bypass mode. Is that universally the case?  DMA API does
see Christoph's list of things it does some of which add overhead.

> At the moment, virtio does this all over the place:
> 
> 	if (use_dma_api)
> 		dma_map/alloc_something(...)
> 	else
> 		use_pa
> 
> The idea of the patch set is to do two, somewhat orthogonal, changes
> that together achieve what we want. Let me know where you think there
> is "a bunch of issues" because I'm missing it:
> 
>  1- Replace the above if/else constructs with just calling the DMA API,
> and have virtio, at initialization, hookup its own dma_ops that just
> "return pa" (roughly) when the IOMMU stuff isn't used.
> 
> This adds an indirect function call to the path that previously didn't
> have one (the else case above). Is that a significant/measurable
> overhead ?

Seems to be :( Jason reports about 4%. I wonder whether we can support
map_sg and friends being NULL, then use that when mapping is an
identity. A conditional branch there is likely very cheap.

Would this cover all platforms with kvm (which is where we care
most about performance)?

> This change stands alone, and imho "cleans" up virtio by avoiding all
> that if/else "2 path" and unless it adds a measurable overhead, should
> probably be done.
> 
>  2- Make virtio use the DMA API with our custom platform-provided
> swiotlb callbacks when needed, that is when not using IOMMU *and*
> running on a secure VM in our case.
> 
> This benefits from -1- by making us just plumb in a different set of
> DMA ops we would have cooked up specifically for virtio in our arch
> code (or in virtio itself but build arch-conditionally in a separate
> file). But it doesn't strictly need it -1-:
> 
> Now, -2- doesn't strictly needs -1-. We could have just done another
> xen-like hack that forces the DMA API "ON" for virtio when running in a
> secure VM.
> 
> The problem if we do that however is that we also then need the arch
> PCI code to make sure it hooks up the virtio PCI devices with the
> special "magic" DMA ops that avoid the iommu but still do swiotlb, ie,
> not the same as other PCI devices. So it will have to play games such
> as checking vendor/device IDs for virtio, checking the IOMMU flag,
> etc... from the arch code which really bloody sucks when assigning PCI
> DMA ops.
> 
> However, if we do it the way we plan here, on top of -1-, with a hook
> called from virtio into the arch to "override" the virtio DMA ops, then
> we avoid the problem completely: The arch hook would only be called by
> virtio if the IOMMU flag is *not* set. IE only when using that special
> "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal
> PCI dma ops as usual.
> 
> That way, we have a very clear semantic: This hook is purely about
> replacing those "null" DMA ops that just return PA introduced in -1-
> with some arch provided specially cooked up DMA ops for non-IOMMU
> virtio that know about the arch special requirements. For us bounce
> buffering.
> 
> Is there something I'm missing ?
> 
> Cheers,
> Ben.

Right so I was trying to write it up in a systematic way, but just to
give you one example, if there is a system where DMA API handles
coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU
flag causes that to happen.

And we kinda worked around this without the IOMMU by basically saying
"ok we do not really need DMA API so let's just bypass it" and
it was kind of ok except now everyone is switching to vIOMMU
just in case. So now people do want some parts of what DMA API does,
such as the bounce buffer use, or IOMMU mappings.

And maybe in the end the solution is going to be to do something similar
to virt_Xmb except for DMA APIs: add APIs that handle just the
addressing bits but without the overhead. See
commit 6a65d26385bf487926a0616650927303058551e3
    asm-generic: implement virt_xxx memory barriers
for reference, it's a similar set of issues.

So it's not a problem with your patches as such, it's just that
they don't solve that harder problem.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 20:55 ` Michael S. Tsirkin
@ 2018-08-03  2:41   ` Jason Wang
  2018-08-03 19:08     ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Jason Wang @ 2018-08-03  2:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, benh, mpe, hch, linuxram, haren, paulus, srikar



On 2018年08月03日 04:55, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> This patch series is the follow up on the discussions we had before about
>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>> were suggestions about doing away with two different paths of transactions
>> with the host/QEMU, first being the direct GPA and the other being the DMA
>> API based translations.
>>
>> First patch attempts to create a direct GPA mapping based DMA operations
>> structure called 'virtio_direct_dma_ops' with exact same implementation
>> of the direct GPA path which virtio core currently has but just wrapped in
>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>> existing semantics. The second patch does exactly that inside the function
>> virtio_finalize_features(). The third patch removes the default direct GPA
>> path from virtio core forcing it to use DMA API callbacks for all devices.
>> Now with that change, every device must have a DMA operations structure
>> associated with it. The fourth patch adds an additional hook which gives
>> the platform an opportunity to do yet another override if required. This
>> platform hook can be used on POWER Ultravisor based protected guests to
>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>> in the above mentioned thread how host is allowed to access only parts of
>> the guest GPA range) bounce buffering into the shared memory for all I/O
>> scatter gather buffers to be consumed on the host side.
>>
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
> Jason did some work on profiling this. Unfortunately he reports
> about 4% extra overhead from this switch on x86 with no vIOMMU.

The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) 
in guest and measure PPS on tap on host.

Thanks

>
> I expect he's writing up the data in more detail, but
> just wanted to let you know this would be one more
> thing to debug before we can just switch to DMA APIs.
>
>
>> Anshuman Khandual (4):
>>    virtio: Define virtio_direct_dma_ops structure
>>    virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>>    virtio: Force virtio core to use DMA API callbacks for all virtio devices
>>    virtio: Add platform specific DMA API translation for virito devices
>>
>>   arch/powerpc/include/asm/dma-mapping.h |  6 +++
>>   arch/powerpc/platforms/pseries/iommu.c |  6 +++
>>   drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
>>   drivers/virtio/virtio_pci_common.h     |  3 ++
>>   drivers/virtio/virtio_ring.c           | 65 +-----------------------------
>>   5 files changed, 89 insertions(+), 63 deletions(-)
>>
>> -- 
>> 2.9.3


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 21:13                               ` Benjamin Herrenschmidt
  2018-08-02 21:51                                 ` Michael S. Tsirkin
@ 2018-08-03  7:05                                 ` Christoph Hellwig
  2018-08-03 15:58                                   ` Benjamin Herrenschmidt
  2018-08-03 19:17                                   ` Michael S. Tsirkin
  1 sibling, 2 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-03  7:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> So let's differenciate the two problems of having an IOMMU (real or
> emulated) which indeeds adds overhead etc... and using the DMA API.
> 
> At the moment, virtio does this all over the place:
> 
> 	if (use_dma_api)
> 		dma_map/alloc_something(...)
> 	else
> 		use_pa
> 
> The idea of the patch set is to do two, somewhat orthogonal, changes
> that together achieve what we want. Let me know where you think there
> is "a bunch of issues" because I'm missing it:
> 
>  1- Replace the above if/else constructs with just calling the DMA API,
> and have virtio, at initialization, hookup its own dma_ops that just
> "return pa" (roughly) when the IOMMU stuff isn't used.
> 
> This adds an indirect function call to the path that previously didn't
> have one (the else case above). Is that a significant/measurable
> overhead ?

If you call it often enough it does:

https://www.spinics.net/lists/netdev/msg495413.html

>  2- Make virtio use the DMA API with our custom platform-provided
> swiotlb callbacks when needed, that is when not using IOMMU *and*
> running on a secure VM in our case.

And total NAK the customer platform-provided part of this.  We need
a flag passed in from the hypervisor that the device needs all bus
specific dma api treatment, and then just use the normal plaform
dma mapping setup.  To get swiotlb you'll need to then use the DT/ACPI
dma-range property to limit the addressable range, and a swiotlb
capable plaform will use swiotlb automatically.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-02 20:53                   ` Michael S. Tsirkin
@ 2018-08-03  7:06                     ` Christoph Hellwig
  0 siblings, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-03  7:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Thu, Aug 02, 2018 at 11:53:08PM +0300, Michael S. Tsirkin wrote:
> > We don't need cache flushing tricks.
> 
> You don't but do real devices on same platform need them?

IBMs power plaforms are always cache coherent.  There are some powerpc
platforms have not cache coherent DMA, but I guess this scheme isn't
intended for them.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03  7:05                                 ` Christoph Hellwig
@ 2018-08-03 15:58                                   ` Benjamin Herrenschmidt
  2018-08-03 16:02                                     ` Christoph Hellwig
  2018-08-03 19:07                                     ` Michael S. Tsirkin
  2018-08-03 19:17                                   ` Michael S. Tsirkin
  1 sibling, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-03 15:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> >   2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
> 
> And total NAK the customer platform-provided part of this.  We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal plaform
> dma mapping setup. 

Christoph, as I have explained already, we do NOT have a way to provide
such a flag as neither the hypervisor nor qemu knows anything about
this when the VM is created.

>  To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable plaform will use swiotlb automatically.

This cannot be done as you describe it.

The VM is created as a *normal* VM. The DT stuff is generated by qemu
at a point where it has *no idea* that the VM will later become secure
and thus will have to restrict which pages can be used for "DMA".

The VM will *at runtime* turn itself into a secure VM via interactions
with the security HW and the Ultravisor layer (which sits below the
HV). This happens way after the DT has been created and consumed, the
qemu devices instanciated etc...

Only the guest kernel knows because it initates the transition. When
that happens, the virtio devices have already been used by the guest
firmware, bootloader, possibly another kernel that kexeced the "secure"
one, etc... 

So instead of running around saying NAK NAK NAK, please explain how we
can solve that differently.

Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 15:58                                   ` Benjamin Herrenschmidt
@ 2018-08-03 16:02                                     ` Christoph Hellwig
  2018-08-03 18:58                                       ` Benjamin Herrenschmidt
  2018-08-03 19:07                                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-03 16:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > >   2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> > 
> > And total NAK the customer platform-provided part of this.  We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup. 
> 
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

Well, if your setup is so fucked up I see no way to support it in Linux.

Let's end the discussion right now then.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 16:02                                     ` Christoph Hellwig
@ 2018-08-03 18:58                                       ` Benjamin Herrenschmidt
  2018-08-04  8:21                                         ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-03 18:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> Well, if your setup is so fucked up I see no way to support it in Linux.
> 
> Let's end the discussion right now then.

You are saying something along the lines of "I don't like an
instruction in your ISA, let's not support your entire CPU architecture
in Linux".

Our setup is not fucked. It makes a LOT of sense and it's a very
sensible design. It's hitting a problem due to a corner case oddity in
virtio bypassing the MMU, we've worked around such corner cases many
times in the past without any problem, I fail to see what the problem
is here.

We aren't going to cancel years of HW and SW development for our
security infrastructure bcs you don't like a 2 lines hook into virtio
to make things work and aren't willing to even consider the options.

Ben.
 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 15:58                                   ` Benjamin Herrenschmidt
  2018-08-03 16:02                                     ` Christoph Hellwig
@ 2018-08-03 19:07                                     ` Michael S. Tsirkin
  2018-08-04  1:11                                       ` Benjamin Herrenschmidt
                                                         ` (3 more replies)
  1 sibling, 4 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-03 19:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > >   2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> > 
> > And total NAK the customer platform-provided part of this.  We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup. 
> 
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

I think the fact you can't add flags from the hypervisor is
a sign of a problematic architecture, you should look at
adding that down the road - you will likely need it at some point.

However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph do you see a problem with that?

> >  To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable plaform will use swiotlb automatically.
> 
> This cannot be done as you describe it.
> 
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
> 
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instanciated etc...
> 
> Only the guest kernel knows because it initates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc... 
> 
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
> 
> Ben.
> 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03  2:41   ` Jason Wang
@ 2018-08-03 19:08     ` Michael S. Tsirkin
  2018-08-04  1:21       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-03 19:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, benh, mpe, hch, linuxram, haren,
	paulus, srikar

On Fri, Aug 03, 2018 at 10:41:41AM +0800, Jason Wang wrote:
> 
> 
> On 2018年08月03日 04:55, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> > > This patch series is the follow up on the discussions we had before about
> > > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> > > for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> > > were suggestions about doing away with two different paths of transactions
> > > with the host/QEMU, first being the direct GPA and the other being the DMA
> > > API based translations.
> > > 
> > > First patch attempts to create a direct GPA mapping based DMA operations
> > > structure called 'virtio_direct_dma_ops' with exact same implementation
> > > of the direct GPA path which virtio core currently has but just wrapped in
> > > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> > > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> > > existing semantics. The second patch does exactly that inside the function
> > > virtio_finalize_features(). The third patch removes the default direct GPA
> > > path from virtio core forcing it to use DMA API callbacks for all devices.
> > > Now with that change, every device must have a DMA operations structure
> > > associated with it. The fourth patch adds an additional hook which gives
> > > the platform an opportunity to do yet another override if required. This
> > > platform hook can be used on POWER Ultravisor based protected guests to
> > > load up SWIOTLB DMA callbacks to do the required (as discussed previously
> > > in the above mentioned thread how host is allowed to access only parts of
> > > the guest GPA range) bounce buffering into the shared memory for all I/O
> > > scatter gather buffers to be consumed on the host side.
> > > 
> > > Please go through these patches and review whether this approach broadly
> > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > the patches or the approach in general. Thank you.
> > Jason did some work on profiling this. Unfortunately he reports
> > about 4% extra overhead from this switch on x86 with no vIOMMU.
> 
> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> guest and measure PPS on tap on host.
> 
> Thanks

Could you supply host configuration involved please?

> > 
> > I expect he's writing up the data in more detail, but
> > just wanted to let you know this would be one more
> > thing to debug before we can just switch to DMA APIs.
> > 
> > 
> > > Anshuman Khandual (4):
> > >    virtio: Define virtio_direct_dma_ops structure
> > >    virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
> > >    virtio: Force virtio core to use DMA API callbacks for all virtio devices
> > >    virtio: Add platform specific DMA API translation for virito devices
> > > 
> > >   arch/powerpc/include/asm/dma-mapping.h |  6 +++
> > >   arch/powerpc/platforms/pseries/iommu.c |  6 +++
> > >   drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
> > >   drivers/virtio/virtio_pci_common.h     |  3 ++
> > >   drivers/virtio/virtio_ring.c           | 65 +-----------------------------
> > >   5 files changed, 89 insertions(+), 63 deletions(-)
> > > 
> > > -- 
> > > 2.9.3

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03  7:05                                 ` Christoph Hellwig
  2018-08-03 15:58                                   ` Benjamin Herrenschmidt
@ 2018-08-03 19:17                                   ` Michael S. Tsirkin
  2018-08-04  8:15                                     ` Christoph Hellwig
  1 sibling, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-03 19:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> > So let's differenciate the two problems of having an IOMMU (real or
> > emulated) which indeeds adds overhead etc... and using the DMA API.
> > 
> > At the moment, virtio does this all over the place:
> > 
> > 	if (use_dma_api)
> > 		dma_map/alloc_something(...)
> > 	else
> > 		use_pa
> > 
> > The idea of the patch set is to do two, somewhat orthogonal, changes
> > that together achieve what we want. Let me know where you think there
> > is "a bunch of issues" because I'm missing it:
> > 
> >  1- Replace the above if/else constructs with just calling the DMA API,
> > and have virtio, at initialization, hookup its own dma_ops that just
> > "return pa" (roughly) when the IOMMU stuff isn't used.
> > 
> > This adds an indirect function call to the path that previously didn't
> > have one (the else case above). Is that a significant/measurable
> > overhead ?
> 
> If you call it often enough it does:
> 
> https://www.spinics.net/lists/netdev/msg495413.html
> 
> >  2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
> 
> And total NAK the customer platform-provided part of this.  We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal plaform
> dma mapping setup.  To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable plaform will use swiotlb automatically.

It seems reasonable to teach a platform to override dma-range
for a specific device e.g. in case it knows about bugs in ACPI.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07                                     ` Michael S. Tsirkin
@ 2018-08-04  1:11                                       ` Benjamin Herrenschmidt
  2018-08-04  1:16                                       ` Benjamin Herrenschmidt
                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04  1:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07                                     ` Michael S. Tsirkin
  2018-08-04  1:11                                       ` Benjamin Herrenschmidt
@ 2018-08-04  1:16                                       ` Benjamin Herrenschmidt
  2018-08-05  0:22                                         ` Michael S. Tsirkin
  2018-08-04  1:18                                       ` Benjamin Herrenschmidt
  2018-08-04  1:22                                       ` Benjamin Herrenschmidt
  3 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04  1:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07                                     ` Michael S. Tsirkin
  2018-08-04  1:11                                       ` Benjamin Herrenschmidt
  2018-08-04  1:16                                       ` Benjamin Herrenschmidt
@ 2018-08-04  1:18                                       ` Benjamin Herrenschmidt
  2018-08-04  1:22                                       ` Benjamin Herrenschmidt
  3 siblings, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04  1:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:08     ` Michael S. Tsirkin
@ 2018-08-04  1:21       ` Benjamin Herrenschmidt
  2018-08-05  0:24         ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04  1:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, mpe, hch, linuxram, haren,
	paulus, srikar

On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > > Please go through these patches and review whether this approach broadly
> > > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > > the patches or the approach in general. Thank you.
> > > 
> > > Jason did some work on profiling this. Unfortunately he reports
> > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > 
> > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > guest and measure PPS on tap on host.
> > 
> > Thanks
> 
> Could you supply host configuration involved please?

I wonder how much of that could be caused by Spectre mitigations
blowing up indirect function calls...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07                                     ` Michael S. Tsirkin
                                                         ` (2 preceding siblings ...)
  2018-08-04  1:18                                       ` Benjamin Herrenschmidt
@ 2018-08-04  1:22                                       ` Benjamin Herrenschmidt
  2018-08-05  0:23                                         ` Michael S. Tsirkin
  3 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04  1:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > > 
> > > And total NAK the customer platform-provided part of this.  We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup. 
> > 
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
> 
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

(Appologies if you got this twice, my mailer had a brain fart and I don't
know if the first one got through & am about to disappear in a plane for 17h)

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader etc... are just operating
normally etc...

Later on, (we may have even already run Linux at that point,
unsecurely, as we can use Linux as a bootloader under some
circumstances), we start a "secure image".

This is a kernel zImage that includes a "ticket" that has the
appropriate signature etc... so that when that kernel starts, it can
authenticate with the ultravisor, be verified (along with its ramdisk)
etc... and copied (by the UV) into secure memory & run from there.

At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "Switch" all the virtio devices to use the
IOMMU I suppose, but it feels a lot grosser to me.

That's the only other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virito side but probably not as much as having a qemu
one that walks of the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user* or
libvirt or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration so that's ok. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > >  To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable plaform will use swiotlb automatically.
> > 
> > This cannot be done as you describe it.
> > 
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> > 
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instanciated etc...
> > 
> > Only the guest kernel knows because it initates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc... 
> > 
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> > 
> > Ben.
> > 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:17                                   ` Michael S. Tsirkin
@ 2018-08-04  8:15                                     ` Christoph Hellwig
  2018-08-05  0:09                                       ` Michael S. Tsirkin
  2018-08-05  0:53                                       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-04  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Benjamin Herrenschmidt, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> It seems reasonable to teach a platform to override dma-range
> for a specific device e.g. in case it knows about bugs in ACPI.

A platform will be able override dma-range using the dev->bus_dma_mask
field starting in 4.19.  But we'll still need a way how to

  a) document in the virtio spec that all bus dma quirks are to be
     applied
  b) a way to document in a virtio-related spec how the bus handles
     dma for Ben's totally fucked up hypervisor.  Without that there
     is not way we'll get interoperable implementations.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 18:58                                       ` Benjamin Herrenschmidt
@ 2018-08-04  8:21                                         ` Christoph Hellwig
  2018-08-05  1:10                                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-04  8:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Fri, Aug 03, 2018 at 01:58:46PM -0500, Benjamin Herrenschmidt wrote:
> You are saying something along the lines of "I don't like an
> instruction in your ISA, let's not support your entire CPU architecture
> in Linux".

No.  I'm saying if you can't describe your architecture in the virtio
spec document it is bogus.

> Our setup is not fucked. It makes a LOT of sense and it's a very
> sensible design. It's hitting a problem due to a corner case oddity in
> virtio bypassing the MMU, we've worked around such corner cases many
> times in the past without any problem, I fail to see what the problem
> is here.

No matter if you like it or not (I don't!) virtio is defined to bypass
dma translations, it is very clearly stated in the spec.  It has some
ill-defined bits to bypass it, so if you want the dma mapping API
to be used you'll have to set that bit (in its original form, a refined
form, or an entirely newly defined sane form) and make sure your
hypersivors always sets it.  It's not rocket science, just a little bit
for work to make sure your setup is actually going to work reliably
and portably.

> We aren't going to cancel years of HW and SW development for our

Maybe you should have actually read the specs you are claiming to
implemented before spending all that effort.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  8:15                                     ` Christoph Hellwig
@ 2018-08-05  0:09                                       ` Michael S. Tsirkin
  2018-08-05  1:11                                         ` Benjamin Herrenschmidt
  2018-08-05  7:25                                         ` Christoph Hellwig
  2018-08-05  0:53                                       ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05  0:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> > It seems reasonable to teach a platform to override dma-range
> > for a specific device e.g. in case it knows about bugs in ACPI.
> 
> A platform will be able override dma-range using the dev->bus_dma_mask
> field starting in 4.19.  But we'll still need a way how to
> 
>   a) document in the virtio spec that all bus dma quirks are to be
>      applied

I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU
should be extended to cover that. But see below.

>   b) a way to document in a virtio-related spec how the bus handles
>      dma for Ben's totally fucked up hypervisor.  Without that there
>      is not way we'll get interoperable implementations.


So in this case however I'm not sure what exactly do we want to add. It
seems that from point of view of the device, there is nothing special -
it just gets a PA and writes there.  It also seems that guest does not
need to get any info from the device either. Instead guest itself needs
device to DMA into specific addresses, for its own reasons.

It seems that the fact that within guest it's implemented using a bounce
buffer and that it's easiest to do by switching virtio to use the DMA API
isn't something virtio spec concerns itself with.

I'm open to suggestions.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  1:16                                       ` Benjamin Herrenschmidt
@ 2018-08-05  0:22                                         ` Michael S. Tsirkin
  2018-08-05  4:52                                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05  0:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > >   2- Make virtio use the DMA API with our custom platform-provided
> > > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > > running on a secure VM in our case.
> > > > 
> > > > And total NAK the customer platform-provided part of this.  We need
> > > > a flag passed in from the hypervisor that the device needs all bus
> > > > specific dma api treatment, and then just use the normal plaform
> > > > dma mapping setup. 
> > > 
> > > Christoph, as I have explained already, we do NOT have a way to provide
> > > such a flag as neither the hypervisor nor qemu knows anything about
> > > this when the VM is created.
> > 
> > I think the fact you can't add flags from the hypervisor is
> > a sign of a problematic architecture, you should look at
> > adding that down the road - you will likely need it at some point.
> 
> Well, we can later in the boot process. At VM creation time, it's just
> a normal VM. The VM firmware, bootloader etc... are just operating
> normally etc...

I see the allure of this, but I think down the road you will
discover passing a flag in libvirt XML saying
"please use a secure mode" or whatever is a good idea.

Even thought it is probably not required to address this
specific issue.

For example, I don't think ballooning works in secure mode,
you will be able to teach libvirt not to try to add a
balloon to the guest.

> Later on, (we may have even already run Linux at that point,
> unsecurely, as we can use Linux as a bootloader under some
> circumstances), we start a "secure image".
> 
> This is a kernel zImage that includes a "ticket" that has the
> appropriate signature etc... so that when that kernel starts, it can
> authenticate with the ultravisor, be verified (along with its ramdisk)
> etc... and copied (by the UV) into secure memory & run from there.
> 
> At that point, the hypervisor is informed that the VM has become
> secure.
> 
> So at that point, we could exit to qemu to inform it of the change,

That's probably a good idea too.

> and
> have it walk the qtree and "Switch" all the virtio devices to use the
> IOMMU I suppose, but it feels a lot grosser to me.

That part feels gross, yes.

> That's the only other option I can think of.
> 
> > However in this specific case, the flag does not need to come from the
> > hypervisor, it can be set by arch boot code I think.
> > Christoph do you see a problem with that?
> 
> The above could do that yes. Another approach would be to do it from a
> small virtio "quirk" that pokes a bit in the device to force it to
> iommu mode when it detects that we are running in a secure VM. That's a
> bit warty on the virito side but probably not as much as having a qemu
> one that walks of the virtio devices to change how they behave.
> 
> What do you reckon ?

I think you are right that for the dma limit the hypervisor doesn't seem
to need to know.

> What we want to avoid is to expose any of this to the *end user* or
> libvirt or any other higher level of the management stack. We really
> want that stuff to remain contained between the VM itself, KVM and
> maybe qemu.
>
> We will need some other qemu changes for migration so that's ok. But
> the minute you start touching libvirt and the higher levels it becomes
> a nightmare.
> 
> Cheers,
> Ben.

I don't believe you'll be able to avoid that entirely. The split between
libvirt and qemu is more about community than about code, random bits of
functionality tend to land on random sides of that fence.  Better add a
tag in domain XML early is my advice. Having said that, it's your
hypervisor. I'm just suggesting that when hypervisor does somehow need
to care then I suspect most people won't be receptive to the argument
that changing libvirt is a nightmare.

> > > >  To get swiotlb you'll need to then use the DT/ACPI
> > > > dma-range property to limit the addressable range, and a swiotlb
> > > > capable plaform will use swiotlb automatically.
> > > 
> > > This cannot be done as you describe it.
> > > 
> > > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > > at a point where it has *no idea* that the VM will later become secure
> > > and thus will have to restrict which pages can be used for "DMA".
> > > 
> > > The VM will *at runtime* turn itself into a secure VM via interactions
> > > with the security HW and the Ultravisor layer (which sits below the
> > > HV). This happens way after the DT has been created and consumed, the
> > > qemu devices instanciated etc...
> > > 
> > > Only the guest kernel knows because it initates the transition. When
> > > that happens, the virtio devices have already been used by the guest
> > > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > > one, etc... 
> > > 
> > > So instead of running around saying NAK NAK NAK, please explain how we
> > > can solve that differently.
> > > 
> > > Ben.
> > > 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  1:22                                       ` Benjamin Herrenschmidt
@ 2018-08-05  0:23                                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05  0:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, Aug 03, 2018 at 08:22:11PM -0500, Benjamin Herrenschmidt wrote:
> (Appologies if you got this twice, my mailer had a brain fart and I don't
> know if the first one got through & am about to disappear in a plane for 17h)

I got like 3 of these. I hope that's true for everyone as I replied to
1st one.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  1:21       ` Benjamin Herrenschmidt
@ 2018-08-05  0:24         ` Michael S. Tsirkin
  2018-08-06  9:02           ` Anshuman Khandual
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05  0:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jason Wang, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, mpe, hch, linuxram,
	haren, paulus, srikar

On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > > > Please go through these patches and review whether this approach broadly
> > > > > makes sense. I will appreciate suggestions, inputs, comments regarding
> > > > > the patches or the approach in general. Thank you.
> > > > 
> > > > Jason did some work on profiling this. Unfortunately he reports
> > > > about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > 
> > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > > guest and measure PPS on tap on host.
> > > 
> > > Thanks
> > 
> > Could you supply host configuration involved please?
> 
> I wonder how much of that could be caused by Spectre mitigations
> blowing up indirect function calls...
> 
> Cheers,
> Ben.

I won't be surprised. If yes I suggested a way to mitigate the overhead.

-- 
MSR 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-01  8:16               ` Will Deacon
  2018-08-01  8:36                 ` Christoph Hellwig
@ 2018-08-05  0:27                 ` Michael S. Tsirkin
  2018-08-06 14:05                   ` Will Deacon
  1 sibling, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05  0:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > However the question people raise is that DMA API is already full of
> > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > above. How is this one much worse?
> > > 
> > > None of these warts is visible to the driver, they are all handled in
> > > the architecture (possibly on a per-bus basis).
> > > 
> > > So for virtio we really need to decide if it has one set of behavior
> > > as specified in the virtio spec, or if it behaves exactly as if it
> > > was on a PCI bus, or in fact probably both as you lined up.  But no
> > > magic arch specific behavior inbetween.
> > 
> > The only arch specific behaviour is needed in the case where it doesn't
> > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > our secure VMs, we still need to make it use swiotlb in order to bounce
> > through non-secure pages.
> 
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed DMA
> ops then everything has happened to work. If we blindly enable the arch DMA
> ops, we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).
> 
> Will

Right that's not very different from placing the device within the IOMMU
domain but in fact bypassing the IOMMU. I wonder whether anyone ever
needs a non coherent virtio-mmio. If yes we can extend
PLATFORM_IOMMU to cover that or add another bit.

What exactly do the non-coherent ops do that causes the corruption?

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  8:15                                     ` Christoph Hellwig
  2018-08-05  0:09                                       ` Michael S. Tsirkin
@ 2018-08-05  0:53                                       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05  0:53 UTC (permalink / raw)
  To: Christoph Hellwig, Michael S. Tsirkin
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Sat, 2018-08-04 at 01:15 -0700, Christoph Hellwig wrote:
>   b) a way to document in a virtio-related spec how the bus handles
>      dma for Ben's totally fucked up hypervisor.  Without that there
>      is not way we'll get interoperable implementations.

Christoph, this isn't a totally fucked up hypervisor. It's not even
about the hypervisor itself, I mean seriously, man, can you at least
bother reading what I described is going on with the security
architecture ?

Anyway, Michael is onto what could possibly be an alternative approach,
by having us tell qemu to flip to iommu mode at secure VM boot time.
Let's see where that leads.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  8:21                                         ` Christoph Hellwig
@ 2018-08-05  1:10                                           ` Benjamin Herrenschmidt
  2018-08-05  7:29                                             ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05  1:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Sat, 2018-08-04 at 01:21 -0700, Christoph Hellwig wrote:
> No matter if you like it or not (I don't!) virtio is defined to bypass
> dma translations, it is very clearly stated in the spec.  It has some
> ill-defined bits to bypass it, so if you want the dma mapping API
> to be used you'll have to set that bit (in its original form, a refined
> form, or an entirely newly defined sane form) and make sure your
> hypersivors always sets it.  It's not rocket science, just a little bit
> for work to make sure your setup is actually going to work reliably
> and portably.

I think you are conflating completely different things, let me try to
clarify, we might actually be talking past each other.

> > We aren't going to cancel years of HW and SW development for our
> 
> Maybe you should have actually read the specs you are claiming to
> implemented before spending all that effort.

Anyway, let's cool our respective jets and sort that out, there are
indeed other approaches than overriding the DMA ops with special ones,
though I find them less tasty ... but here's my attempt at a (simpler)
description.

Bear with me for the long-ish email, this tries to describe the system
so you get an idea where we come from, and options we can use to get
out of this.

So we *are* implementing the spec, since qemu is currently unmodified:

Default virtio will bypass the iommu emulated by qemu as per spec etc..

On the Linux side, thus, virtio "sees" a normal iommu-bypassing device
and will treat it as such.

The problem is the assumption in the middle that qemu can access all
guest pages directly, which holds true for traditional VMs, but breaks
when the VM in our case turns itself into a secure VM. This isn't under
the action (or due to changes in) the hypervisor. KVM operates (almost)
normally here.

But there's this (very thin and open source btw) layer underneath
called ultravisor, which exploits some HW facilities to maintain a
separate pool of "secure" memory, which cannot be physically accessed
by a non-secure entity.

So in our scenario, qemu and KVM create a VM totally normally, there is
no changes required to the VM firmware, bootloader(s), etc... in fact
we support Linux based bootloaders, and those will work as normal linux
would in a VM, virtio works normally, etc...

Until that VM (via grub or kexec for example) loads a "secure image".

That secure image is a Linux kernel which has been "wrapped" (to simply
imagine a modified zImage wrapper though that's not entirely exact).

When that is run, before it modifies it's .data, it will interact with
the ultravisor using a specific HW facility to make itself secure. What
happens then is that the UV cryptographically verifies the kernel and
ramdisk, and copies them to the secure memory where execution returns.

The Ultravisor is then involved as a small shim for hypercalls between
the secure VM and KVM to prevent leakage of information (sanitize
registers etc...).

Now at this point, qemu can no longer access the secure VM pages
(there's more to this, such as using HMM to allow migration/encryption
accross etc... but let's not get bogged down).

So virtio can no longer access any page in the VM.

Now the VM *can* request from the Ultravisor some selected pages to be
made "insecure" and thus shared with qemu. This is how we handle some
of the pages used in our paravirt stuff, and that's how we want to deal
with virtio, by creating an insecure swiotlb pool.

At this point, thus, there are two options.

 - One you have rejected, which is to have a way for "no-iommu" virtio
(which still doesn't use an iommu on the qemu side and doesn't need
to), to be forced to use some custom DMA ops on the VM side.

 - One, which sadly has more overhead and will require modifying more
pieces of the puzzle, which is to make qemu uses an emulated iommu.
Once we make qemu do that, we can then layer swiotlb on top of the
emulated iommu on the guest side, and pass that as dma_ops to virtio.

Now, assuming you still absolutely want us to go down the second
option, there are several ways to get there. We would prefer to avoid
requiring the user to pass some special option to qemu. That has an
impact up the food chain (libvirt, management tools etc...) and users
probably won't understand what it's about. In fact the *end user* might
not even need to know a VM is secure, though applications inside might.

There's the additional annoyance that currently our guest FW (SLOF)
cannot deal with virtio in IOMMU mode, but that's fixable.

From there, refer to the email chain between Michael and I where we are
discussing options to "switch" virtio at runtime on the qemu side.

Any comment or suggestion ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  0:09                                       ` Michael S. Tsirkin
@ 2018-08-05  1:11                                         ` Benjamin Herrenschmidt
  2018-08-05  7:25                                         ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05  1:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier

On Sun, 2018-08-05 at 03:09 +0300, Michael S. Tsirkin wrote:
> It seems that the fact that within guest it's implemented using a bounce
> buffer and that it's easiest to do by switching virtio to use the DMA API
> isn't something virtio spec concerns itself with.

Right, this is my reasoning as well. See this other (long) email I just
sent to Christoph to explain the whole flow.

> I'm open to suggestions.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  0:22                                         ` Michael S. Tsirkin
@ 2018-08-05  4:52                                           ` Benjamin Herrenschmidt
  2018-08-06 13:46                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05  4:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote:
> I see the allure of this, but I think down the road you will
> discover passing a flag in libvirt XML saying
> "please use a secure mode" or whatever is a good idea.
> 
> Even thought it is probably not required to address this
> specific issue.
> 
> For example, I don't think ballooning works in secure mode,
> you will be able to teach libvirt not to try to add a
> balloon to the guest.

Right, we'll need some quirk to disable balloons  in the guest I
suppose.

Passing something from libvirt is cumbersome because the end user may
not even need to know about secure VMs. There are use cases where the
security is a contract down to some special application running inside
the secure VM, the sysadmin knows nothing about.

Also there's repercussions all the way to admin tools, web UIs etc...
so it's fairly wide ranging.

So as long as we only need to quirk a couple of devices, it's much
better contained that way.

> > Later on, (we may have even already run Linux at that point,
> > unsecurely, as we can use Linux as a bootloader under some
> > circumstances), we start a "secure image".
> > 
> > This is a kernel zImage that includes a "ticket" that has the
> > appropriate signature etc... so that when that kernel starts, it can
> > authenticate with the ultravisor, be verified (along with its ramdisk)
> > etc... and copied (by the UV) into secure memory & run from there.
> > 
> > At that point, the hypervisor is informed that the VM has become
> > secure.
> > 
> > So at that point, we could exit to qemu to inform it of the change,
> 
> That's probably a good idea too.

We probably will have to tell qemu eventually for migration, as we'll
need some kind of key exchange phase etc... to deal with the crypto
aspects (the actual page copy is sorted via encrypting the secure pages
back to normal pages in qemu, but we'll need extra metadata).

> > and
> > have it walk the qtree and "Switch" all the virtio devices to use the
> > IOMMU I suppose, but it feels a lot grosser to me.
> 
> That part feels gross, yes.
> 
> > That's the only other option I can think of.
> > 
> > > However in this specific case, the flag does not need to come from the
> > > hypervisor, it can be set by arch boot code I think.
> > > Christoph do you see a problem with that?
> > 
> > The above could do that yes. Another approach would be to do it from a
> > small virtio "quirk" that pokes a bit in the device to force it to
> > iommu mode when it detects that we are running in a secure VM. That's a
> > bit warty on the virito side but probably not as much as having a qemu
> > one that walks of the virtio devices to change how they behave.
> > 
> > What do you reckon ?
> 
> I think you are right that for the dma limit the hypervisor doesn't seem
> to need to know.

It's not just a limit mind you. It's a range, at least if we allocate
just a single pool of insecure pages. swiotlb feels like a better
option for us.

> > What we want to avoid is to expose any of this to the *end user* or
> > libvirt or any other higher level of the management stack. We really
> > want that stuff to remain contained between the VM itself, KVM and
> > maybe qemu.
> > 
> > We will need some other qemu changes for migration so that's ok. But
> > the minute you start touching libvirt and the higher levels it becomes
> > a nightmare.
> > 
> > Cheers,
> > Ben.
> 
> I don't believe you'll be able to avoid that entirely. The split between
> libvirt and qemu is more about community than about code, random bits of
> functionality tend to land on random sides of that fence.  Better add a
> tag in domain XML early is my advice. Having said that, it's your
> hypervisor. I'm just suggesting that when hypervisor does somehow need
> to care then I suspect most people won't be receptive to the argument
> that changing libvirt is a nightmare.

It only needs to care at runtime. The problem isn't changing libvirt
per-se, I don't have a problem with that. The problem is that it means
creating two categories of machines "secure" and "non-secure", which is
end-user visible, and thus has to be escalated to all the various
management stacks, UIs, etc... out there.

In addition, there are some cases where the individual creating the VMs
may not have any idea that they are secure.

But yes, if we have to, we'll do it. However, so far, we don't think
it's a great idea.

Cheers,
Ben.

> > > > >   To get swiotlb you'll need to then use the DT/ACPI
> > > > > dma-range property to limit the addressable range, and a swiotlb
> > > > > capable plaform will use swiotlb automatically.
> > > > 
> > > > This cannot be done as you describe it.
> > > > 
> > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > > > at a point where it has *no idea* that the VM will later become secure
> > > > and thus will have to restrict which pages can be used for "DMA".
> > > > 
> > > > The VM will *at runtime* turn itself into a secure VM via interactions
> > > > with the security HW and the Ultravisor layer (which sits below the
> > > > HV). This happens way after the DT has been created and consumed, the
> > > > qemu devices instanciated etc...
> > > > 
> > > > Only the guest kernel knows because it initates the transition. When
> > > > that happens, the virtio devices have already been used by the guest
> > > > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > > > one, etc... 
> > > > 
> > > > So instead of running around saying NAK NAK NAK, please explain how we
> > > > can solve that differently.
> > > > 
> > > > Ben.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  0:09                                       ` Michael S. Tsirkin
  2018-08-05  1:11                                         ` Benjamin Herrenschmidt
@ 2018-08-05  7:25                                         ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-05  7:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Benjamin Herrenschmidt, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Sun, Aug 05, 2018 at 03:09:55AM +0300, Michael S. Tsirkin wrote:
> So in this case however I'm not sure what exactly do we want to add. It
> seems that from point of view of the device, there is nothing special -
> it just gets a PA and writes there.  It also seems that guest does not
> need to get any info from the device either. Instead guest itself needs
> device to DMA into specific addresses, for its own reasons.
> 
> It seems that the fact that within guest it's implemented using a bounce
> buffer and that it's easiest to do by switching virtio to use the DMA API
> isn't something virtio spec concerns itself with.

And that is exactly what we added bus_dma_mask for - the case where
the device itself has not limitation (or a bigger limitation), but
the platform limits the accessible dma ranges.  One typical case is
a PCIe root port that is only connected to the CPU through an
interconnect that is limited to 32 address bits for example.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  1:10                                           ` Benjamin Herrenschmidt
@ 2018-08-05  7:29                                             ` Christoph Hellwig
  2018-08-05 21:16                                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-05  7:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote:
>  - One you have rejected, which is to have a way for "no-iommu" virtio
> (which still doesn't use an iommu on the qemu side and doesn't need
> to), to be forced to use some custom DMA ops on the VM side.
> 
>  - One, which sadly has more overhead and will require modifying more
> pieces of the puzzle, which is to make qemu uses an emulated iommu.
> Once we make qemu do that, we can then layer swiotlb on top of the
> emulated iommu on the guest side, and pass that as dma_ops to virtio.

Or number three:  have a a virtio feature bit that tells the VM
to use whatever dma ops the platform thinks are appropinquate for
the bus it pretends to be on.  Then set a dma-range that is limited
to your secure memory range (if you really need it to be runtime
enabled only after a device reset that rescans) and use the normal
dma mapping code to bounce buffer.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  7:29                                             ` Christoph Hellwig
@ 2018-08-05 21:16                                               ` Benjamin Herrenschmidt
  2018-08-05 21:30                                                 ` Benjamin Herrenschmidt
  2018-08-06  9:42                                                 ` Christoph Hellwig
  0 siblings, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05 21:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Sun, 2018-08-05 at 00:29 -0700, Christoph Hellwig wrote:
> On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote:
> >  - One you have rejected, which is to have a way for "no-iommu" virtio
> > (which still doesn't use an iommu on the qemu side and doesn't need
> > to), to be forced to use some custom DMA ops on the VM side.
> > 
> >  - One, which sadly has more overhead and will require modifying more
> > pieces of the puzzle, which is to make qemu uses an emulated iommu.
> > Once we make qemu do that, we can then layer swiotlb on top of the
> > emulated iommu on the guest side, and pass that as dma_ops to virtio.
> 
> Or number three:  have a a virtio feature bit that tells the VM
> to use whatever dma ops the platform thinks are appropinquate for
> the bus it pretends to be on.  Then set a dma-range that is limited
> to your secure memory range (if you really need it to be runtime
> enabled only after a device reset that rescans) and use the normal
> dma mapping code to bounce buffer.

Who would set this bit ? qemu ? Under what circumstances ?

What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
ie, what would qemu do and what would Linux do ? I'm not sure I fully
understand your idea.

I'm trying to understand because the limitation is not a device side
limitation, it's not a qemu limitation, it's actually more of a VM
limitation. It has most of its memory pages made inaccessible for
security reasons. The platform from a qemu/KVM perspective is almost
entirely normal.

So I don't understand when would qemu set this bit, or should it be set
by the VM at runtime ?

Cheers,
Ben.






^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05 21:16                                               ` Benjamin Herrenschmidt
@ 2018-08-05 21:30                                                 ` Benjamin Herrenschmidt
  2018-08-06  9:42                                                 ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-05 21:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 07:16 +1000, Benjamin Herrenschmidt wrote:
> I'm trying to understand because the limitation is not a device side
> limitation, it's not a qemu limitation, it's actually more of a VM
> limitation. It has most of its memory pages made inaccessible for
> security reasons. The platform from a qemu/KVM perspective is almost
> entirely normal.

In fact this is probably the best image of what's going on:

It's a normal VM from a KVM/qemu perspective (and thus virtio). It
boots normally, can run firmware, linux, etc... normally, it's not
created with any different XML or qemu command line definition etc...

It just that once it reaches the kernel with the secure stuff enabled
(could be via kexec from a normal kernel), that kernel will "stash
away" most of the VM's memory into some secure space that nothing else
(not even the hypervisor) can access.

It can keep around a pool or two of normal memory for bounce buferring
IOs but that's about it.

I think that's the clearest way I could find to explain what's going
on, and why I'm so resistant on adding things on qemu side.

That said, we *can* (and will) notify KVM and qemu of the transition,
and we can/will do so after virtio has been instanciated and used by
the bootloader, but before it will be used (or even probed) by the
secure VM itself, so there's an opportunity to poke at things, either
from the VM itself (a quirk poking at virtio config space for example)
or from qemu (though I find the idea of iterating all virtio devices
from qemu to change a setting rather gross).

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  0:24         ` Michael S. Tsirkin
@ 2018-08-06  9:02           ` Anshuman Khandual
  2018-08-06 13:36             ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Anshuman Khandual @ 2018-08-06  9:02 UTC (permalink / raw)
  To: Michael S. Tsirkin, Benjamin Herrenschmidt
  Cc: robh, srikar, aik, Jason Wang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, david, linuxppc-dev, elfring,
	haren

On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
>> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
>>>>>> Please go through these patches and review whether this approach broadly
>>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
>>>>>> the patches or the approach in general. Thank you.
>>>>>
>>>>> Jason did some work on profiling this. Unfortunately he reports
>>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
>>>>
>>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
>>>> guest and measure PPS on tap on host.
>>>>
>>>> Thanks
>>>
>>> Could you supply host configuration involved please?
>>
>> I wonder how much of that could be caused by Spectre mitigations
>> blowing up indirect function calls...
>>
>> Cheers,
>> Ben.
> 
> I won't be surprised. If yes I suggested a way to mitigate the overhead.

Did we get better results (lower regression due to indirect calls) with
the suggested mitigation ? Just curious.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05 21:16                                               ` Benjamin Herrenschmidt
  2018-08-05 21:30                                                 ` Benjamin Herrenschmidt
@ 2018-08-06  9:42                                                 ` Christoph Hellwig
  2018-08-06 19:52                                                   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-06  9:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> Who would set this bit ? qemu ? Under what circumstances ?

I don't really care who sets what.  The implementation might not even
involved qemu.

It is your job to write a coherent interface specification that does
not depend on the used components.  The hypervisor might be PAPR,
Linux + qemu, VMware, Hyperv or something so secret that you'd have
to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
AIX, OS400 or a Hipster project of the day in Rust.  As long as we
properly specify the interface it simplify does not matter.

> What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> ie, what would qemu do and what would Linux do ? I'm not sure I fully
> understand your idea.

In a perfect would we'd just reuse VIRTIO_F_IOMMU and clarify the
description which currently is rather vague but basically captures
the use case.  Currently is is:

VIRTIO_F_IOMMU_PLATFORM(33)
    This feature indicates that the device is behind an IOMMU that
    translates bus addresses from the device into physical addresses in
    memory. If this feature bit is set to 0, then the device emits
    physical addresses which are not translated further, even though an
    IOMMU may be present.

And I'd change it to something like:

VIRTIO_F_PLATFORM_DMA(33)
    This feature indicates that the device emits platform specific
    bus addresses that might not be identical to physical address.
    The translation of physical to bus address is platform speific
    and defined by the plaform specification for the bus that the virtio
    device is attached to.
    If this feature bit is set to 0, then the device emits
    physical addresses which are not translated further, even if
    the platform would normally require translations for the bus that
    the virtio device is attached to.

If we can't change the defintion any more we should deprecate the
old VIRTIO_F_IOMMU_PLATFORM bit, and require the VIRTIO_F_IOMMU_PLATFORM
and VIRTIO_F_PLATFORM_DMA to be not set at the same time.

> I'm trying to understand because the limitation is not a device side
> limitation, it's not a qemu limitation, it's actually more of a VM
> limitation. It has most of its memory pages made inaccessible for
> security reasons. The platform from a qemu/KVM perspective is almost
> entirely normal.

Well, find a way to describe this either in the qemu specification using
new feature bits, or by using something like the above.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06  9:02           ` Anshuman Khandual
@ 2018-08-06 13:36             ` Michael S. Tsirkin
  2018-08-06 15:24               ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 13:36 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram,
	linux-kernel, virtualization, hch, paulus, joe, david,
	linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> >>>>>> Please go through these patches and review whether this approach broadly
> >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
> >>>>>> the patches or the approach in general. Thank you.
> >>>>>
> >>>>> Jason did some work on profiling this. Unfortunately he reports
> >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
> >>>>
> >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> >>>> guest and measure PPS on tap on host.
> >>>>
> >>>> Thanks
> >>>
> >>> Could you supply host configuration involved please?
> >>
> >> I wonder how much of that could be caused by Spectre mitigations
> >> blowing up indirect function calls...
> >>
> >> Cheers,
> >> Ben.
> > 
> > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> 
> Did we get better results (lower regression due to indirect calls) with
> the suggested mitigation ? Just curious.

I'm referring to this:
	I wonder whether we can support map_sg and friends being NULL, then use
	that when mapping is an identity. A conditional branch there is likely
	very cheap.

I don't think anyone tried implementing this yes.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  4:52                                           ` Benjamin Herrenschmidt
@ 2018-08-06 13:46                                             ` Michael S. Tsirkin
  2018-08-06 19:56                                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 13:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Sun, Aug 05, 2018 at 02:52:54PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote:
> > I see the allure of this, but I think down the road you will
> > discover passing a flag in libvirt XML saying
> > "please use a secure mode" or whatever is a good idea.
> > 
> > Even thought it is probably not required to address this
> > specific issue.
> > 
> > For example, I don't think ballooning works in secure mode,
> > you will be able to teach libvirt not to try to add a
> > balloon to the guest.
> 
> Right, we'll need some quirk to disable balloons  in the guest I
> suppose.
> 
> Passing something from libvirt is cumbersome because the end user may
> not even need to know about secure VMs. There are use cases where the
> security is a contract down to some special application running inside
> the secure VM, the sysadmin knows nothing about.
> 
> Also there's repercussions all the way to admin tools, web UIs etc...
> so it's fairly wide ranging.
> 
> So as long as we only need to quirk a couple of devices, it's much
> better contained that way.

So just the balloon thing already means that yes management and all the
way to the user tools must know this is going on. Otherwise
user will try to inflate the balloon and wonder why this does not work.

> > > Later on, (we may have even already run Linux at that point,
> > > unsecurely, as we can use Linux as a bootloader under some
> > > circumstances), we start a "secure image".
> > > 
> > > This is a kernel zImage that includes a "ticket" that has the
> > > appropriate signature etc... so that when that kernel starts, it can
> > > authenticate with the ultravisor, be verified (along with its ramdisk)
> > > etc... and copied (by the UV) into secure memory & run from there.
> > > 
> > > At that point, the hypervisor is informed that the VM has become
> > > secure.
> > > 
> > > So at that point, we could exit to qemu to inform it of the change,
> > 
> > That's probably a good idea too.
> 
> We probably will have to tell qemu eventually for migration, as we'll
> need some kind of key exchange phase etc... to deal with the crypto
> aspects (the actual page copy is sorted via encrypting the secure pages
> back to normal pages in qemu, but we'll need extra metadata).
> 
> > > and
> > > have it walk the qtree and "Switch" all the virtio devices to use the
> > > IOMMU I suppose, but it feels a lot grosser to me.
> > 
> > That part feels gross, yes.
> > 
> > > That's the only other option I can think of.
> > > 
> > > > However in this specific case, the flag does not need to come from the
> > > > hypervisor, it can be set by arch boot code I think.
> > > > Christoph do you see a problem with that?
> > > 
> > > The above could do that yes. Another approach would be to do it from a
> > > small virtio "quirk" that pokes a bit in the device to force it to
> > > iommu mode when it detects that we are running in a secure VM. That's a
> > > bit warty on the virito side but probably not as much as having a qemu
> > > one that walks of the virtio devices to change how they behave.
> > > 
> > > What do you reckon ?
> > 
> > I think you are right that for the dma limit the hypervisor doesn't seem
> > to need to know.
> 
> It's not just a limit mind you. It's a range, at least if we allocate
> just a single pool of insecure pages. swiotlb feels like a better
> option for us.
> 
> > > What we want to avoid is to expose any of this to the *end user* or
> > > libvirt or any other higher level of the management stack. We really
> > > want that stuff to remain contained between the VM itself, KVM and
> > > maybe qemu.
> > > 
> > > We will need some other qemu changes for migration so that's ok. But
> > > the minute you start touching libvirt and the higher levels it becomes
> > > a nightmare.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I don't believe you'll be able to avoid that entirely. The split between
> > libvirt and qemu is more about community than about code, random bits of
> > functionality tend to land on random sides of that fence.  Better add a
> > tag in domain XML early is my advice. Having said that, it's your
> > hypervisor. I'm just suggesting that when hypervisor does somehow need
> > to care then I suspect most people won't be receptive to the argument
> > that changing libvirt is a nightmare.
> 
> It only needs to care at runtime. The problem isn't changing libvirt
> per-se, I don't have a problem with that. The problem is that it means
> creating two categories of machines "secure" and "non-secure", which is
> end-user visible, and thus has to be escalated to all the various
> management stacks, UIs, etc... out there.
> 
> In addition, there are some cases where the individual creating the VMs
> may not have any idea that they are secure.
> 
> But yes, if we have to, we'll do it. However, so far, we don't think
> it's a great idea.
> 
> Cheers,
> Ben.

Here's another example: you can't migrate a secure vm to hypervisor
which doesn't support this feature. Again management tools above libvirt
need to know otherwise they will try.

> > > > > >   To get swiotlb you'll need to then use the DT/ACPI
> > > > > > dma-range property to limit the addressable range, and a swiotlb
> > > > > > capable plaform will use swiotlb automatically.
> > > > > 
> > > > > This cannot be done as you describe it.
> > > > > 
> > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > > > > at a point where it has *no idea* that the VM will later become secure
> > > > > and thus will have to restrict which pages can be used for "DMA".
> > > > > 
> > > > > The VM will *at runtime* turn itself into a secure VM via interactions
> > > > > with the security HW and the Ultravisor layer (which sits below the
> > > > > HV). This happens way after the DT has been created and consumed, the
> > > > > qemu devices instanciated etc...
> > > > > 
> > > > > Only the guest kernel knows because it initates the transition. When
> > > > > that happens, the virtio devices have already been used by the guest
> > > > > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > > > > one, etc... 
> > > > > 
> > > > > So instead of running around saying NAK NAK NAK, please explain how we
> > > > > can solve that differently.
> > > > > 
> > > > > Ben.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-05  0:27                 ` Michael S. Tsirkin
@ 2018-08-06 14:05                   ` Will Deacon
  0 siblings, 0 replies; 119+ messages in thread
From: Will Deacon @ 2018-08-06 14:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

Hi Michael,

On Sun, Aug 05, 2018 at 03:27:42AM +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> > > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > > > However the question people raise is that DMA API is already full of
> > > > > arch-specific tricks the likes of which are outlined in your post linked
> > > > > above. How is this one much worse?
> > > > 
> > > > None of these warts is visible to the driver, they are all handled in
> > > > the architecture (possibly on a per-bus basis).
> > > > 
> > > > So for virtio we really need to decide if it has one set of behavior
> > > > as specified in the virtio spec, or if it behaves exactly as if it
> > > > was on a PCI bus, or in fact probably both as you lined up.  But no
> > > > magic arch specific behavior inbetween.
> > > 
> > > The only arch specific behaviour is needed in the case where it doesn't
> > > behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> > > our secure VMs, we still need to make it use swiotlb in order to bounce
> > > through non-secure pages.
> > 
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops, we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
> > 
> > Will
> 
> Right that's not very different from placing the device within the IOMMU
> domain but in fact bypassing the IOMMU

Hmm, I'm not sure I follow you here -- the IOMMU bypassing is handled
inside the IOMMU driver, so we'd still end up with non-coherent DMA ops
for the guest accesses. The presence of an IOMMU doesn't imply coherency for
us. Or am I missing your point here?

> I wonder whether anyone ever needs a non coherent virtio-mmio. If yes we
> can extend PLATFORM_IOMMU to cover that or add another bit.

I think that's probably the right way around: assume that legacy virtio-mmio
devices are coherent by default.

> What exactly do the non-coherent ops do that causes the corruption?

The non-coherent ops mean that the guest ends up allocating the vring queues
using non-cacheable mappings, whereas qemu/hypervisor uses a cacheable
mapping despite not advertising the devices as being cache-coherent.

This hits something in the architecture known as "mismatched aliases", which
means that coherency is lost between the guest and the hypervisor,
consequently resulting in data not being visible and ordering not being
guaranteed. The usual symptom is that the device appears to lock up iirc,
because the guest and the hypervisor are unable to communicate with each
other.

Does that help to clarify things?

Thanks,

Will

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 13:36             ` Michael S. Tsirkin
@ 2018-08-06 15:24               ` Christoph Hellwig
  2018-08-06 16:06                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-06 15:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik,
	Jason Wang, linuxram, linux-kernel, virtualization, hch, paulus,
	joe, david, linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > >>>>>> Please go through these patches and review whether this approach broadly
> > >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
> > >>>>>> the patches or the approach in general. Thank you.
> > >>>>>
> > >>>>> Jason did some work on profiling this. Unfortunately he reports
> > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
> > >>>>
> > >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > >>>> guest and measure PPS on tap on host.
> > >>>>
> > >>>> Thanks
> > >>>
> > >>> Could you supply host configuration involved please?
> > >>
> > >> I wonder how much of that could be caused by Spectre mitigations
> > >> blowing up indirect function calls...
> > >>
> > >> Cheers,
> > >> Ben.
> > > 
> > > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> > 
> > Did we get better results (lower regression due to indirect calls) with
> > the suggested mitigation ? Just curious.
> 
> I'm referring to this:
> 	I wonder whether we can support map_sg and friends being NULL, then use
> 	that when mapping is an identity. A conditional branch there is likely
> 	very cheap.
> 
> I don't think anyone tried implementing this yes.

I've done something very similar in the thread I posted a few years
ago. I plan to get a version of that upstream for 4.20, but it won't
cover the virtio case, just the real direct mapping.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 15:24               ` Christoph Hellwig
@ 2018-08-06 16:06                 ` Michael S. Tsirkin
  2018-08-06 16:10                   ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 16:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik,
	Jason Wang, linuxram, linux-kernel, virtualization, paulus, joe,
	david, linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 08:24:06AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote:
> > > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote:
> > > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote:
> > > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote:
> > > >>>>>> Please go through these patches and review whether this approach broadly
> > > >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
> > > >>>>>> the patches or the approach in general. Thank you.
> > > >>>>>
> > > >>>>> Jason did some work on profiling this. Unfortunately he reports
> > > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU.
> > > >>>>
> > > >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in
> > > >>>> guest and measure PPS on tap on host.
> > > >>>>
> > > >>>> Thanks
> > > >>>
> > > >>> Could you supply host configuration involved please?
> > > >>
> > > >> I wonder how much of that could be caused by Spectre mitigations
> > > >> blowing up indirect function calls...
> > > >>
> > > >> Cheers,
> > > >> Ben.
> > > > 
> > > > I won't be surprised. If yes I suggested a way to mitigate the overhead.
> > > 
> > > Did we get better results (lower regression due to indirect calls) with
> > > the suggested mitigation ? Just curious.
> > 
> > I'm referring to this:
> > 	I wonder whether we can support map_sg and friends being NULL, then use
> > 	that when mapping is an identity. A conditional branch there is likely
> > 	very cheap.
> > 
> > I don't think anyone tried implementing this yes.
> 
> I've done something very similar in the thread I posted a few years
> ago.

Right so that was before spectre where a virtual call was cheaper :(

> I plan to get a version of that upstream for 4.20, but it won't
> cover the virtio case, just the real direct mapping.

I guess this RFC will have to be reworked on top and performance retested.

Thanks,

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 16:06                 ` Michael S. Tsirkin
@ 2018-08-06 16:10                   ` Christoph Hellwig
  2018-08-06 16:13                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-06 16:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Anshuman Khandual, Benjamin Herrenschmidt,
	robh, srikar, aik, Jason Wang, linuxram, linux-kernel,
	virtualization, paulus, joe, david, linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote:
> > I've done something very similar in the thread I posted a few years
> > ago.
> 
> Right so that was before spectre where a virtual call was cheaper :(

Sorry, I meant days, not years.  The whole point of the thread was the
slowdowns due to retpolines, which are the software spectre mitigation.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 16:10                   ` Christoph Hellwig
@ 2018-08-06 16:13                     ` Michael S. Tsirkin
  2018-08-06 16:34                       ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 16:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik,
	Jason Wang, linuxram, linux-kernel, virtualization, paulus, joe,
	david, linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 09:10:40AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote:
> > > I've done something very similar in the thread I posted a few years
> > > ago.
> > 
> > Right so that was before spectre where a virtual call was cheaper :(
> 
> Sorry, I meant days, not years.  The whole point of the thread was the
> slowdowns due to retpolines, which are the software spectre mitigation.

Oh that makes sense then. Could you post a pointer pls so
this patchset is rebased on top (there are things to
change about 4/4 but 1-3 could go in if they don't add
overhead)?

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 16:13                     ` Michael S. Tsirkin
@ 2018-08-06 16:34                       ` Christoph Hellwig
  0 siblings, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-06 16:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Anshuman Khandual, Benjamin Herrenschmidt,
	robh, srikar, aik, Jason Wang, linuxram, linux-kernel,
	virtualization, paulus, joe, david, linuxppc-dev, elfring, haren

On Mon, Aug 06, 2018 at 07:13:32PM +0300, Michael S. Tsirkin wrote:
> Oh that makes sense then. Could you post a pointer pls so
> this patchset is rebased on top (there are things to
> change about 4/4 but 1-3 could go in if they don't add
> overhead)?

The dma mapping direct calls will need a major work vs what I posted.
I plan to start that work in about two weeks once returning from my
vacation.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06  9:42                                                 ` Christoph Hellwig
@ 2018-08-06 19:52                                                   ` Benjamin Herrenschmidt
  2018-08-07  6:21                                                     ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 19:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 02:42 -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> > Who would set this bit ? qemu ? Under what circumstances ?
> 
> I don't really care who sets what.  The implementation might not even
> involved qemu.
> 
> It is your job to write a coherent interface specification that does
> not depend on the used components.  The hypervisor might be PAPR,
> Linux + qemu, VMware, Hyperv or something so secret that you'd have
> to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
> AIX, OS400 or a Hipster project of the day in Rust.  As long as we
> properly specify the interface it simplify does not matter.

That's the point Christoph. The interface is today's interface. It does
NOT change. That information is not part of the interface.

It's the VM itself that is stashing away its memory in a secret place,
and thus needs to do bounce buffering. There is no change to the virtio
interface per-se.

> > What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> > ie, what would qemu do and what would Linux do ? I'm not sure I fully
> > understand your idea.
> 
> In a perfect would we'd just reuse VIRTIO_F_IOMMU and clarify the
> description which currently is rather vague but basically captures
> the use case.  Currently is is:
> 
> VIRTIO_F_IOMMU_PLATFORM(33)
>     This feature indicates that the device is behind an IOMMU that
>     translates bus addresses from the device into physical addresses in
>     memory. If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even though an
>     IOMMU may be present.
> 
> And I'd change it to something like:
> 
> VIRTIO_F_PLATFORM_DMA(33)
>     This feature indicates that the device emits platform specific
>     bus addresses that might not be identical to physical address.
>     The translation of physical to bus address is platform speific
>     and defined by the plaform specification for the bus that the virtio
>     device is attached to.
>     If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even if
>     the platform would normally require translations for the bus that
>     the virtio device is attached to.
> 
> If we can't change the defintion any more we should deprecate the
> old VIRTIO_F_IOMMU_PLATFORM bit, and require the VIRTIO_F_IOMMU_PLATFORM
> and VIRTIO_F_PLATFORM_DMA to be not set at the same time.

But this doesn't really change our problem does it ?

None of what happens in our case is part of the "interface". The
suggestion to force the iommu ON was simply that it was a "workaround"
as by doing so, we get to override the DMA ops, but that's just a
trick.

Fundamentally, what we need to solve is pretty much entirely a guest
problem.

> > I'm trying to understand because the limitation is not a device side
> > limitation, it's not a qemu limitation, it's actually more of a VM
> > limitation. It has most of its memory pages made inaccessible for
> > security reasons. The platform from a qemu/KVM perspective is almost
> > entirely normal.
> 
> Well, find a way to describe this either in the qemu specification using
> new feature bits, or by using something like the above.

But again, why do you want to involve the interface, and thus the
hypervisor for something that is essentially what the guest is doign to
itself ?

It really is something we need to solve locally to the guest, it's not
part of the interface.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 13:46                                             ` Michael S. Tsirkin
@ 2018-08-06 19:56                                               ` Benjamin Herrenschmidt
  2018-08-06 20:35                                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 19:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> 
> > Right, we'll need some quirk to disable balloons  in the guest I
> > suppose.
> > 
> > Passing something from libvirt is cumbersome because the end user may
> > not even need to know about secure VMs. There are use cases where the
> > security is a contract down to some special application running inside
> > the secure VM, the sysadmin knows nothing about.
> > 
> > Also there's repercussions all the way to admin tools, web UIs etc...
> > so it's fairly wide ranging.
> > 
> > So as long as we only need to quirk a couple of devices, it's much
> > better contained that way.
> 
> So just the balloon thing already means that yes management and all the
> way to the user tools must know this is going on. Otherwise
> user will try to inflate the balloon and wonder why this does not work.

There is *dozens* of management systems out there, not even all open
source, we won't ever be able to see the end of the tunnel if we need
to teach every single of them, including end users, about platform
specific new VM flags like that.

.../...

> Here's another example: you can't migrate a secure vm to hypervisor
> which doesn't support this feature. Again management tools above libvirt
> need to know otherwise they will try.

There will have to be a new machine type for that I suppose, yes,
though it's not just the hypervisor that needs to know about the
modified migration stream, it's also the need to have a compatible
ultravisor with the right keys on the other side.

So migration is going to be special and require extra admin work in all
cases yes. But not all secure VMs are meant to be migratable.

In any case, back to the problem at hand. What a qemu flag gives us is
just a way to force iommu at VM creation time.

This is rather sub-optimal, we don't really want the iommu in the way,
so it's at best a "workaround", and it's not really solving the real
problem.

As I said replying to Christoph, we are "leaking" into the interface
something here that is really what's the VM is doing to itself, which
is to stash its memory away in an inaccessible place.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 19:56                                               ` Benjamin Herrenschmidt
@ 2018-08-06 20:35                                                 ` Michael S. Tsirkin
  2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 20:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> > 
> > > Right, we'll need some quirk to disable balloons  in the guest I
> > > suppose.
> > > 
> > > Passing something from libvirt is cumbersome because the end user may
> > > not even need to know about secure VMs. There are use cases where the
> > > security is a contract down to some special application running inside
> > > the secure VM, the sysadmin knows nothing about.
> > > 
> > > Also there's repercussions all the way to admin tools, web UIs etc...
> > > so it's fairly wide ranging.
> > > 
> > > So as long as we only need to quirk a couple of devices, it's much
> > > better contained that way.
> > 
> > So just the balloon thing already means that yes management and all the
> > way to the user tools must know this is going on. Otherwise
> > user will try to inflate the balloon and wonder why this does not work.
> 
> There is *dozens* of management systems out there, not even all open
> source, we won't ever be able to see the end of the tunnel if we need
> to teach every single of them, including end users, about platform
> specific new VM flags like that.
> 
> .../...

In the end I suspect you will find you have to.

> > Here's another example: you can't migrate a secure vm to hypervisor
> > which doesn't support this feature. Again management tools above libvirt
> > need to know otherwise they will try.
> 
> There will have to be a new machine type for that I suppose, yes,
> though it's not just the hypervisor that needs to know about the
> modified migration stream, it's also the need to have a compatible
> ultravisor with the right keys on the other side.
> 
> So migration is going to be special and require extra admin work in all
> cases yes. But not all secure VMs are meant to be migratable.
> 
> In any case, back to the problem at hand. What a qemu flag gives us is
> just a way to force iommu at VM creation time.

I don't think a qemu flag is strictly required for a problem at hand.

> This is rather sub-optimal, we don't really want the iommu in the way,
> so it's at best a "workaround", and it's not really solving the real
> problem.

This specific problem, I think I agree.

> As I said replying to Christoph, we are "leaking" into the interface
> something here that is really what's the VM is doing to itself, which
> is to stash its memory away in an inaccessible place.
> 
> Cheers,
> Ben.

I think Christoph merely objects to the specific implementation.  If
instead you do something like tweak dev->bus_dma_mask for the virtio
device I think he won't object.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 20:35                                                 ` Michael S. Tsirkin
@ 2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
  2018-08-06 21:46                                                     ` Michael S. Tsirkin
  2018-08-07  6:16                                                     ` Christoph Hellwig
  2018-08-06 23:18                                                   ` Benjamin Herrenschmidt
  2018-08-07  6:12                                                   ` Christoph Hellwig
  2 siblings, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 21:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > As I said replying to Christoph, we are "leaking" into the interface
> > something here that is really what's the VM is doing to itself, which
> > is to stash its memory away in an inaccessible place.
> > 
> > Cheers,
> > Ben.
> 
> I think Christoph merely objects to the specific implementation.  If
> instead you do something like tweak dev->bus_dma_mask for the virtio
> device I think he won't object.

Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?

So, something like that would be a possibility, but the problem is that
the current virtio (guest side) implementation doesn't honor this when
not using dma ops and will not use dma ops if not using iommu, so back
to square one.

Christoph seems to be wanting to use a flag in the interface to make
the guest use dma_ops which is what I don't understand.

What would be needed then would be something along the lines of virtio
noticing that dma_mask isn't big enough to cover all of memory (which
isn't something generic code can easily do here for various reasons I
can elaborate if you want, but that specific test more/less has to be
arch specific), and in that case, force itself to use DMA ops routed to
swiotlb.

I'd rather have arch code do the bulk of that work, don't you think ?

Which brings me back to this option, which may be the simplest and
avoids the overhead of the proposed series (I found the series to be a
nice cleanup but retpoline does kick us in the nuts here).

So what about this ?

--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
*vdev)
         * the DMA API if we're a Xen guest, which at least allows
         * all of the sensible Xen configurations to work correctly.
         */
-       if (xen_domain())
+       if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
                return true;
 
        return false;

(Passing the dev allows the arch to know this is a virtio device in
"direct" mode or whatever we want to call the !iommu case, and
construct appropriate DMA ops for it, which aren't the same as the DMA
ops of any other PCI device who *do* use the iommu).

Otherwise, the harder option would be for us to hack so that
xen_domain() returns true in our setup (gross), and have the arch code,
when it sets up PCI device DMA ops, have a gross hack to identify
virtio PCI devices, checks their F_IOMMU flag itself, and sets up the
different ops at that point.

As for those "special" ops, they are of course just normal swiotlb ops,
there's nothing "special" other that they aren't the ops that other PCI
device on that bus use.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
@ 2018-08-06 21:46                                                     ` Michael S. Tsirkin
  2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
  2018-08-07  6:18                                                       ` Christoph Hellwig
  2018-08-07  6:16                                                     ` Christoph Hellwig
  1 sibling, 2 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 21:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > As I said replying to Christoph, we are "leaking" into the interface
> > > something here that is really what's the VM is doing to itself, which
> > > is to stash its memory away in an inaccessible place.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > I think Christoph merely objects to the specific implementation.  If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
> 
> Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> 
> So, something like that would be a possibility, but the problem is that
> the current virtio (guest side) implementation doesn't honor this when
> not using dma ops and will not use dma ops if not using iommu, so back
> to square one.

Well we have the RFC for that - the switch to using DMA ops unconditionally isn't
problematic itself IMHO, for now that RFC is blocked
by its perfromance overhead for now but Christoph says
he's trying to remove that for direct mappings,
so we should hopefully be able to get there in X weeks.

> Christoph seems to be wanting to use a flag in the interface to make
> the guest use dma_ops which is what I don't understand.
> 
> What would be needed then would be something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here for various reasons I
> can elaborate if you want, but that specific test more/less has to be
> arch specific), and in that case, force itself to use DMA ops routed to
> swiotlb.
> 
> I'd rather have arch code do the bulk of that work, don't you think ?
> 
> Which brings me back to this option, which may be the simplest and
> avoids the overhead of the proposed series (I found the series to be a
> nice cleanup but retpoline does kick us in the nuts here).
> 
> So what about this ?
> 
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> *vdev)
>          * the DMA API if we're a Xen guest, which at least allows
>          * all of the sensible Xen configurations to work correctly.
>          */
> -       if (xen_domain())
> +       if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
>                 return true;
>  
>         return false;

Right but can't we fix the retpoline overhead such that
vring_use_dma_api will not be called on data path any longer, making
this a setup time check?


> (Passing the dev allows the arch to know this is a virtio device in
> "direct" mode or whatever we want to call the !iommu case, and
> construct appropriate DMA ops for it, which aren't the same as the DMA
> ops of any other PCI device who *do* use the iommu).

I think that's where Christoph might have specific ideas about it.

> Otherwise, the harder option would be for us to hack so that
> xen_domain() returns true in our setup (gross), and have the arch code,
> when it sets up PCI device DMA ops, have a gross hack to identify
> virtio PCI devices, checks their F_IOMMU flag itself, and sets up the
> different ops at that point.
> 
> As for those "special" ops, they are of course just normal swiotlb ops,
> there's nothing "special" other that they aren't the ops that other PCI
> device on that bus use.
> 
> Cheers,
> Ben.

-- 
MST

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 21:46                                                     ` Michael S. Tsirkin
@ 2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
  2018-08-06 23:16                                                         ` Benjamin Herrenschmidt
                                                                           ` (2 more replies)
  2018-08-07  6:18                                                       ` Christoph Hellwig
  1 sibling, 3 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 22:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > something here that is really what's the VM is doing to itself, which
> > > > is to stash its memory away in an inaccessible place.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > 
> > > I think Christoph merely objects to the specific implementation.  If
> > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > device I think he won't object.
> > 
> > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> > 
> > So, something like that would be a possibility, but the problem is that
> > the current virtio (guest side) implementation doesn't honor this when
> > not using dma ops and will not use dma ops if not using iommu, so back
> > to square one.
> 
> Well we have the RFC for that - the switch to using DMA ops unconditionally isn't
> problematic itself IMHO, for now that RFC is blocked
> by its perfromance overhead for now but Christoph says
> he's trying to remove that for direct mappings,
> so we should hopefully be able to get there in X weeks.

That would be good yes.

 ../..

> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> > *vdev)
> >          * the DMA API if we're a Xen guest, which at least allows
> >          * all of the sensible Xen configurations to work correctly.
> >          */
> > -       if (xen_domain())
> > +       if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> >                 return true;
> >  
> >         return false;
> 
> Right but can't we fix the retpoline overhead such that
> vring_use_dma_api will not be called on data path any longer, making
> this a setup time check?

Yes it needs to be a setup time check regardless actually !

The above is broken, sorry I was a bit quick here (too early in the
morning... ugh). We don't want the arch to go override the dma ops
every time that is callled.

But yes, if we can fix the overhead, it becomes just a matter of
setting up the "right" ops automatically.

> > (Passing the dev allows the arch to know this is a virtio device in
> > "direct" mode or whatever we want to call the !iommu case, and
> > construct appropriate DMA ops for it, which aren't the same as the DMA
> > ops of any other PCI device who *do* use the iommu).
> 
> I think that's where Christoph might have specific ideas about it.

OK well, assuming Christoph can solve the direct case in a way that
also work for the virtio !iommu case, we still want some bit of logic
somewhere that will "switch" to swiotlb based ops if the DMA mask is
limited.

You mentioned an RFC for that ? Do you happen to have a link ?

It would be indeed ideal if all we had to do was setup some kind of
bus_dma_mask on all PCI devices and have virtio automagically insert
swiotlb when necessary.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
@ 2018-08-06 23:16                                                         ` Benjamin Herrenschmidt
  2018-08-06 23:45                                                         ` Michael S. Tsirkin
  2018-08-07  6:27                                                         ` Christoph Hellwig
  2 siblings, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 23:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> 
> OK well, assuming Christoph can solve the direct case in a way that
> also work for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
> 
> You mentioned an RFC for that ? Do you happen to have a link ?
> 
> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.

Actually... I can think of a simpler option (Anshuman, didn't you
prototype this earlier ?):

Since that limitaiton of requiring bounce buffering via swiotlb is true
of any device in a secure VM, whether it goes through the iommu or not,
the iommu remapping is essentially pointless.

Thus, we could ensure that the iommu maps 1:1 the swiotlb bounce buffer
(either that or we configure it as "disabled" which is equivalent in
this case).

That way, we can now use the basic swiotlb ops everywhere, the same
dma_ops (swiotlb) will work whether a device uses the iommu or not.

Which boils down now to only making virtio use dma ops, there is no
need to override the dma_ops.

Which means all we have to do is either make xen_domain() return true
(yuck) or replace that one test with arch_virtio_force_dma_api() which
resolves to xen_domain() on x86 and can do something else for us.

As to using a virtio feature flag for that, which is what Christoph
proposes, I'm not too fan of it because this means effectively exposing
this to the peer, ie the interface. I don't think it belong there. The
interface, from the hypervisor perspective, whether it's qemu, vmware,
hyperz etc... have no business knowing how the guest manages its dma
operations, and may not even be aware of the access limitations (in our
case they are somewhat guest self-imposed).

Now, if this flag really is what we have to do, then we'd probably need
a qemu hack which will go set that flag on all virtio devices when it
detects that the VM is going secure.

But I don't think that's where that information "need to use the dma
API even for direct mode" belongs.

Cheers,
Ben.





^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 20:35                                                 ` Michael S. Tsirkin
  2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
@ 2018-08-06 23:18                                                   ` Benjamin Herrenschmidt
  2018-08-07  6:12                                                   ` Christoph Hellwig
  2 siblings, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 23:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > Right, we'll need some quirk to disable balloons  in the guest I
> > > > suppose.
> > > > 
> > > > Passing something from libvirt is cumbersome because the end user may
> > > > not even need to know about secure VMs. There are use cases where the
> > > > security is a contract down to some special application running inside
> > > > the secure VM, the sysadmin knows nothing about.
> > > > 
> > > > Also there's repercussions all the way to admin tools, web UIs etc...
> > > > so it's fairly wide ranging.
> > > > 
> > > > So as long as we only need to quirk a couple of devices, it's much
> > > > better contained that way.
> > > 
> > > So just the balloon thing already means that yes management and all the
> > > way to the user tools must know this is going on. Otherwise
> > > user will try to inflate the balloon and wonder why this does not work.
> > 
> > There is *dozens* of management systems out there, not even all open
> > source, we won't ever be able to see the end of the tunnel if we need
> > to teach every single of them, including end users, about platform
> > specific new VM flags like that.
> > 
> > .../...
> 
> In the end I suspect you will find you have to.

Maybe... we'll tackle this if/when we have to.

For balloon I suspect it's not such a big deal because once secure, all
the guest memory goes into the secure memory which isn't visible or
accounted by the hypervisor, so there's nothing to steal but the guest
is also using no HV memory (other than the few "non-secure" pages used
for swiotlb and a couple of other kernel things).

Future versions of our secure architecture might allow to turn
arbitrary pages of memory secure/non-secure rather than relying on a
separate physical pool, in which case, the balloon will be able to work
normally.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
  2018-08-06 23:16                                                         ` Benjamin Herrenschmidt
@ 2018-08-06 23:45                                                         ` Michael S. Tsirkin
  2018-08-07  0:18                                                           ` Benjamin Herrenschmidt
  2018-08-07  6:32                                                           ` Christoph Hellwig
  2018-08-07  6:27                                                         ` Christoph Hellwig
  2 siblings, 2 replies; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-06 23:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote:
> > > > > As I said replying to Christoph, we are "leaking" into the interface
> > > > > something here that is really what's the VM is doing to itself, which
> > > > > is to stash its memory away in an inaccessible place.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > 
> > > > I think Christoph merely objects to the specific implementation.  If
> > > > instead you do something like tweak dev->bus_dma_mask for the virtio
> > > > device I think he won't object.
> > > 
> > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?
> > > 
> > > So, something like that would be a possibility, but the problem is that
> > > the current virtio (guest side) implementation doesn't honor this when
> > > not using dma ops and will not use dma ops if not using iommu, so back
> > > to square one.
> > 
> > Well we have the RFC for that - the switch to using DMA ops unconditionally isn't
> > problematic itself IMHO, for now that RFC is blocked
> > by its perfromance overhead for now but Christoph says
> > he's trying to remove that for direct mappings,
> > so we should hopefully be able to get there in X weeks.
> 
> That would be good yes.
> 
>  ../..
> 
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device
> > > *vdev)
> > >          * the DMA API if we're a Xen guest, which at least allows
> > >          * all of the sensible Xen configurations to work correctly.
> > >          */
> > > -       if (xen_domain())
> > > +       if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
> > >                 return true;
> > >  
> > >         return false;
> > 
> > Right but can't we fix the retpoline overhead such that
> > vring_use_dma_api will not be called on data path any longer, making
> > this a setup time check?
> 
> Yes it needs to be a setup time check regardless actually !
> 
> The above is broken, sorry I was a bit quick here (too early in the
> morning... ugh). We don't want the arch to go override the dma ops
> every time that is callled.
> 
> But yes, if we can fix the overhead, it becomes just a matter of
> setting up the "right" ops automatically.
> 
> > > (Passing the dev allows the arch to know this is a virtio device in
> > > "direct" mode or whatever we want to call the !iommu case, and
> > > construct appropriate DMA ops for it, which aren't the same as the DMA
> > > ops of any other PCI device who *do* use the iommu).
> > 
> > I think that's where Christoph might have specific ideas about it.
> 
> OK well, assuming Christoph can solve the direct case in a way that
> also work for the virtio !iommu case, we still want some bit of logic
> somewhere that will "switch" to swiotlb based ops if the DMA mask is
> limited.
> 
> You mentioned an RFC for that ? Do you happen to have a link ?

No but Christoph did I think.

> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.
> 
> Cheers,
> Ben.
> 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 23:45                                                         ` Michael S. Tsirkin
@ 2018-08-07  0:18                                                           ` Benjamin Herrenschmidt
  2018-08-07  6:32                                                           ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07  0:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 02:45 +0300, Michael S. Tsirkin wrote:
> > OK well, assuming Christoph can solve the direct case in a way that
> > also work for the virtio !iommu case, we still want some bit of logic
> > somewhere that will "switch" to swiotlb based ops if the DMA mask is
> > limited.
> > 
> > You mentioned an RFC for that ? Do you happen to have a link ?
> 
> No but Christoph did I think.

Ok I missed that, sorry, I'll dig it out. Thanks.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 20:35                                                 ` Michael S. Tsirkin
  2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
  2018-08-06 23:18                                                   ` Benjamin Herrenschmidt
@ 2018-08-07  6:12                                                   ` Christoph Hellwig
  2 siblings, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Mon, Aug 06, 2018 at 11:35:39PM +0300, Michael S. Tsirkin wrote:
> > As I said replying to Christoph, we are "leaking" into the interface
> > something here that is really what's the VM is doing to itself, which
> > is to stash its memory away in an inaccessible place.
> > 
> > Cheers,
> > Ben.
> 
> I think Christoph merely objects to the specific implementation.  If
> instead you do something like tweak dev->bus_dma_mask for the virtio
> device I think he won't object.

As long as we also document how dev->bus_dma_mask is tweaked for this
particular virtual bus, yes.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
  2018-08-06 21:46                                                     ` Michael S. Tsirkin
@ 2018-08-07  6:16                                                     ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote:
> > I think Christoph merely objects to the specific implementation.  If
> > instead you do something like tweak dev->bus_dma_mask for the virtio
> > device I think he won't object.
> 
> Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ?

It will be new in 4.19:

http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/f07d141fe9430cdf9f8a65a87c41

> So, something like that would be a possibility, but the problem is that
> the current virtio (guest side) implementation doesn't honor this when
> not using dma ops and will not use dma ops if not using iommu, so back
> to square one.
> 
> Christoph seems to be wanting to use a flag in the interface to make
> the guest use dma_ops which is what I don't understand.

As-is virtio devices are very clearly and explcitly defined to use
physical addresses in the spec.  dma ops will often do platform
based translations (iommu, offsets), so we can't just use the plaform
default dma ops and will need to opt into them.

> What would be needed then would be something along the lines of virtio
> noticing that dma_mask isn't big enough to cover all of memory (which
> isn't something generic code can easily do here for various reasons I
> can elaborate if you want, but that specific test more/less has to be
> arch specific), and in that case, force itself to use DMA ops routed to
> swiotlb.
> 
> I'd rather have arch code do the bulk of that work, don't you think ?

There is nothing architecture specific about that.  I've been working
hard to remove all the bullshit architectures have done in their DMA
ops and consolidating them into common code based on rules.  The last
thing I want is another vector for weird underspecified arch
interfaction with DMA ops, which is exactly what your patch below
does.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 21:46                                                     ` Michael S. Tsirkin
  2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
@ 2018-08-07  6:18                                                       ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 12:46:34AM +0300, Michael S. Tsirkin wrote:
> Well we have the RFC for that - the switch to using DMA ops unconditionally isn't
> problematic itself IMHO, for now that RFC is blocked
> by its perfromance overhead for now but Christoph says
> he's trying to remove that for direct mappings,
> so we should hopefully be able to get there in X weeks.

The direct calls to dma_direct_ops aren't going to help you with legacy
virtio, given that virtio is specified to deal with physical addresses,
while dma-direct is not in many cases.

It would however help with the case where qemu always sets the platform
dma flag, as we'd avoid the indirect calls for that.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 19:52                                                   ` Benjamin Herrenschmidt
@ 2018-08-07  6:21                                                     ` Christoph Hellwig
  2018-08-07  6:42                                                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > It is your job to write a coherent interface specification that does
> > not depend on the used components.  The hypervisor might be PAPR,
> > Linux + qemu, VMware, Hyperv or something so secret that you'd have
> > to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
> > AIX, OS400 or a Hipster project of the day in Rust.  As long as we
> > properly specify the interface it simplify does not matter.
> 
> That's the point Christoph. The interface is today's interface. It does
> NOT change. That information is not part of the interface.
> 
> It's the VM itself that is stashing away its memory in a secret place,
> and thus needs to do bounce buffering. There is no change to the virtio
> interface per-se.

Any guest that doesn't know about your magic limited adressing is simply
not going to work, so we need to communicate that fact.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
  2018-08-06 23:16                                                         ` Benjamin Herrenschmidt
  2018-08-06 23:45                                                         ` Michael S. Tsirkin
@ 2018-08-07  6:27                                                         ` Christoph Hellwig
  2018-08-07  6:44                                                           ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> It would be indeed ideal if all we had to do was setup some kind of
> bus_dma_mask on all PCI devices and have virtio automagically insert
> swiotlb when necessary.

For 4.20 I plan to remove the swiotlb ops and instead do the bounce
buffering in the common code, including a direct call to the direct
ops to avoid retpoline overhead.  For that you still need a flag
in virtio that instead of blindly working physical addresses it needs
to be treated like a real device in terms of DMA.

And for powerpc to make use of that I need to get the dma series I
posted last week reviewed and included, otherwise powerpc will have
to be excepted (like arm, where rmk didn't like the way the code
was factored, everything else has already been taken care of).

https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 23:45                                                         ` Michael S. Tsirkin
  2018-08-07  0:18                                                           ` Benjamin Herrenschmidt
@ 2018-08-07  6:32                                                           ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07  6:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Benjamin Herrenschmidt, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 02:45:25AM +0300, Michael S. Tsirkin wrote:
> > > I think that's where Christoph might have specific ideas about it.
> > 
> > OK well, assuming Christoph can solve the direct case in a way that
> > also work for the virtio !iommu case, we still want some bit of logic
> > somewhere that will "switch" to swiotlb based ops if the DMA mask is
> > limited.
> > 
> > You mentioned an RFC for that ? Do you happen to have a link ?
> 
> No but Christoph did I think.

Do you mean the direct map retpoline mitigation?  It is here:

https://www.spinics.net/lists/netdev/msg495413.html

https://www.spinics.net/lists/netdev/msg495785.html

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07  6:21                                                     ` Christoph Hellwig
@ 2018-08-07  6:42                                                       ` Benjamin Herrenschmidt
  2018-08-07 13:55                                                         ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07  6:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 23:21 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > > It is your job to write a coherent interface specification that does
> > > not depend on the used components.  The hypervisor might be PAPR,
> > > Linux + qemu, VMware, Hyperv or something so secret that you'd have
> > > to shoot me if you had to tell me.  The guest might be Linux, FreeBSD,
> > > AIX, OS400 or a Hipster project of the day in Rust.  As long as we
> > > properly specify the interface it simplify does not matter.
> > 
> > That's the point Christoph. The interface is today's interface. It does
> > NOT change. That information is not part of the interface.
> > 
> > It's the VM itself that is stashing away its memory in a secret place,
> > and thus needs to do bounce buffering. There is no change to the virtio
> > interface per-se.
> 
> Any guest that doesn't know about your magic limited adressing is simply
> not going to work, so we need to communicate that fact.

The guest does. It's the guest itself that initiates it. That's my
point, it's not a factor of the hypervisor, which is unchanged in that
area.

It's the guest itself, that makes the decision early on, to stash it's
memory away in a secure place, and thus needs to establish some kind of
bouce buffering via a few left over "insecure" pages.

It's all done by the guest: initiated by the guest and controlled by
the guest.

That's why I don't see why this specifically needs to involve the
hypervisor side, and thus a VIRTIO feature bit.

Note that I can make it so that the same DMA ops (basically standard
swiotlb ops without arch hacks) work for both "direct virtio" and
"normal PCI" devices.

The trick is simply in the arch to setup the iommu to map the swiotlb
bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
ignored without affecting the physical addresses.

If I do that, *all* I need is a way, from the guest itself (again, the
other side dosn't know anything about it), to force virtio to use the
DMA ops as if there was an iommu, that is, use whatever dma ops were
setup by the platform for the pci device.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07  6:27                                                         ` Christoph Hellwig
@ 2018-08-07  6:44                                                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07  6:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 23:27 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote:
> > It would be indeed ideal if all we had to do was setup some kind of
> > bus_dma_mask on all PCI devices and have virtio automagically insert
> > swiotlb when necessary.
> 
> For 4.20 I plan to remove the swiotlb ops and instead do the bounce
> buffering in the common code, including a direct call to the direct
> ops to avoid retpoline overhead.  For that you still need a flag
> in virtio that instead of blindly working physical addresses it needs
> to be treated like a real device in terms of DMA.

But you will still call the swiotlb infrastructure, right ? IE, I sitll
need to control where/how the swiotlb "pool" is allocated.
> 
> And for powerpc to make use of that I need to get the dma series I
> posted last week reviewed and included, otherwise powerpc will have
> to be excepted (like arm, where rmk didn't like the way the code
> was factored, everything else has already been taken care of).
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html

Yes, I saw your series. I'm just back from a week of travel, I plan to
start reviewing it this week if Michael doesn't beat me to it.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07  6:42                                                       ` Benjamin Herrenschmidt
@ 2018-08-07 13:55                                                         ` Christoph Hellwig
  2018-08-07 20:32                                                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-07 13:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> Note that I can make it so that the same DMA ops (basically standard
> swiotlb ops without arch hacks) work for both "direct virtio" and
> "normal PCI" devices.
> 
> The trick is simply in the arch to setup the iommu to map the swiotlb
> bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
> ignored without affecting the physical addresses.
> 
> If I do that, *all* I need is a way, from the guest itself (again, the
> other side dosn't know anything about it), to force virtio to use the
> DMA ops as if there was an iommu, that is, use whatever dma ops were
> setup by the platform for the pci device.

In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
do the work (even if that isn't strictly what the current definition
of the flag actually means).  On the qemu side you'll need to make
sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating
an iommu, but with code to take dma offsets into account if your
plaform has any (various power plaforms seem to have them, not sure
if it affects your config).

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07 13:55                                                         ` Christoph Hellwig
@ 2018-08-07 20:32                                                           ` Benjamin Herrenschmidt
  2018-08-08  6:31                                                             ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07 20:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 06:55 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> > Note that I can make it so that the same DMA ops (basically standard
> > swiotlb ops without arch hacks) work for both "direct virtio" and
> > "normal PCI" devices.
> > 
> > The trick is simply in the arch to setup the iommu to map the swiotlb
> > bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
> > ignored without affecting the physical addresses.
> > 
> > If I do that, *all* I need is a way, from the guest itself (again, the
> > other side dosn't know anything about it), to force virtio to use the
> > DMA ops as if there was an iommu, that is, use whatever dma ops were
> > setup by the platform for the pci device.
> 
> In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
> do the work (even if that isn't strictly what the current definition
> of the flag actually means).  On the qemu side you'll need to make
> sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating
> an iommu, but with code to take dma offsets into account if your
> plaform has any (various power plaforms seem to have them, not sure
> if it affects your config).

Something like that yes. I prefer a slightly different way, see below,
any but in both cases, it should alleviate your concerns since it means
there would be no particular mucking around with DMA ops at all, virtio
would just use whatever "normal" ops we establish for all PCI devices
on that platform, which will be standard ones.

(swiotlb ones today and the new "integrates" ones you're cooking
tomorrow).

As for the flag itself, while we could set it from qemu when we get
notified that the guest is going secure, both Michael and I think it's
rather gross, it requires qemu to go iterate all virtio devices and
"poke" something into them.

It also means qemu will need some other internal nasty flag that says
"set that bit but don't do iommu".

It's nicer if we have a way in the guest virtio driver to do something
along the lines of

	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())

Which would have the same effect and means the issue is entirely
contained in the guest.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07 20:32                                                           ` Benjamin Herrenschmidt
@ 2018-08-08  6:31                                                             ` Christoph Hellwig
  2018-08-08 10:07                                                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-08  6:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Wed, Aug 08, 2018 at 06:32:45AM +1000, Benjamin Herrenschmidt wrote:
> As for the flag itself, while we could set it from qemu when we get
> notified that the guest is going secure, both Michael and I think it's
> rather gross, it requires qemu to go iterate all virtio devices and
> "poke" something into them.

You don't need to set them the time you go secure.  You just need to
set the flag from the beginning on any VM you might want to go secure.
Or for simplicity just any VM - if the DT/ACPI tables exposed by
qemu are good enough that will always exclude a iommu and not set a
DMA offset, so nothing will change on the qemu side of he processing,
and with the new direct calls for the direct dma ops performance in
the guest won't change either.

> It's nicer if we have a way in the guest virtio driver to do something
> along the lines of
> 
> 	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> 
> Which would have the same effect and means the issue is entirely
> contained in the guest.

It would not be the same effect.  The problem with that is that you must
now assumes that your qemu knows that for example you might be passing
a dma offset if the bus otherwise requires it.  Or in other words:
you potentially break the contract between qemu and the guest of always
passing down physical addresses.  If we explicitly change that contract
through using a flag that says you pass bus address everything is fine.

Note that in practice your scheme will probably just work for your
initial prototype, but chances are it will get us in trouble later on.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08  6:31                                                             ` Christoph Hellwig
@ 2018-08-08 10:07                                                               ` Benjamin Herrenschmidt
  2018-08-08 12:30                                                                 ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-08 10:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 23:31 -0700, Christoph Hellwig wrote:
> 
> You don't need to set them the time you go secure.  You just need to
> set the flag from the beginning on any VM you might want to go secure.
> Or for simplicity just any VM - if the DT/ACPI tables exposed by
> qemu are good enough that will always exclude a iommu and not set a
> DMA offset, so nothing will change on the qemu side of he processing,
> and with the new direct calls for the direct dma ops performance in
> the guest won't change either.

So that's where I'm not sure things are "good enough" due to how
pseries works. (remember it's paravirtualized).

A pseries system starts with a default iommu on all devices, that uses
translation using 4k entires with a "pinhole" window (usually 2G with
qemu iirc). There's no "pass through" by default.

Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
is not set (default) but there's nothing in the device-tree to tell the
guest about this since it's a violation of our pseries architecture, so
we just rely on Linux virtio "knowing" that it happens. It's a bit
yucky but that's now history...

Essentially pseries "architecturally" does not have the concept of not
having an iommu in the way and qemu violates that architecture today.

(Remember it comes from pHyp, our priorietary HV, which we are somewhat
mimmicing here).

So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
through that iommu and performance will suffer (esp vhost I suspect),
especially since adding/removing translations in the iommu is a
hypercall.

Now, we do have HV APIs to create a second window that's "permanently
mapped" to the guest memory, thus avoiding dynamic map/unmaps, and
Linux can make use of this but I don't know if that works with qemu and
the performance impact with vhost.

So the situation isn't that great.... On the other hand, I think the
other approach works for us:

> > It's nicer if we have a way in the guest virtio driver to do something
> > along the lines of
> > 
> > 	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> > 
> > Which would have the same effect and means the issue is entirely
> > contained in the guest.
> 
> It would not be the same effect.  The problem with that is that you must
> now assumes that your qemu knows that for example you might be passing
> a dma offset if the bus otherwise requires it. 

I would assume that arch_virtio_wants_dma_ops() only returns true when
no such offsets are involved, at least in our case that would be what
happens.

>  Or in other words:
> you potentially break the contract between qemu and the guest of always
> passing down physical addresses.  If we explicitly change that contract
> through using a flag that says you pass bus address everything is fine.

For us a "bus address" is behind the iommu so that's what
VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
bus address that is different. I suppose it's an ARMism to have DMA
offsets that are separate from iommus ? 

> Note that in practice your scheme will probably just work for your
> initial prototype, but chances are it will get us in trouble later on.

Not on pseries, at least not in any way I can think of mind you... but
maybe other architectures would abuse it... We could add a WARN_ON if
that calls returns true on a bus with an offset I suppose.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 10:07                                                               ` Benjamin Herrenschmidt
@ 2018-08-08 12:30                                                                 ` Christoph Hellwig
  2018-08-08 13:18                                                                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-08 12:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote:
> Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
> is not set (default) but there's nothing in the device-tree to tell the
> guest about this since it's a violation of our pseries architecture, so
> we just rely on Linux virtio "knowing" that it happens. It's a bit
> yucky but that's now history...

That is ugly as hell, but it is how virtio works everywhere, so nothing
special so far.

> Essentially pseries "architecturally" does not have the concept of not
> having an iommu in the way and qemu violates that architecture today.
> 
> (Remember it comes from pHyp, our priorietary HV, which we are somewhat
> mimmicing here).

It shouldnt be too hard to have a dt property that communicates this,
should it?

> So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
> through that iommu and performance will suffer (esp vhost I suspect),
> especially since adding/removing translations in the iommu is a
> hypercall.

Well, we'd nee to make sure that for this particular bus we skip the
actualy iommu.

> > It would not be the same effect.  The problem with that is that you must
> > now assumes that your qemu knows that for example you might be passing
> > a dma offset if the bus otherwise requires it. 
> 
> I would assume that arch_virtio_wants_dma_ops() only returns true when
> no such offsets are involved, at least in our case that would be what
> happens.

That would work, but we're really piling hacĸs ontop of hacks here.

> >  Or in other words:
> > you potentially break the contract between qemu and the guest of always
> > passing down physical addresses.  If we explicitly change that contract
> > through using a flag that says you pass bus address everything is fine.
> 
> For us a "bus address" is behind the iommu so that's what
> VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
> bus address that is different. I suppose it's an ARMism to have DMA
> offsets that are separate from iommus ? 

No, a lot of platforms support a bus address that has an offset from
the physical address. including a lot of power platforms:

arch/powerpc/kernel/pci-common.c:       set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
arch/powerpc/platforms/cell/iommu.c:            set_dma_offset(dev, cell_dma_nommu_offset);
arch/powerpc/platforms/cell/iommu.c:            set_dma_offset(dev, addr);
arch/powerpc/platforms/powernv/pci-ioda.c:      set_dma_offset(&pdev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/powernv/pci-ioda.c:                      set_dma_offset(&pdev->dev, (1ULL << 32));
arch/powerpc/platforms/powernv/pci-ioda.c:              set_dma_offset(&dev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/pseries/iommu.c:                         set_dma_offset(dev, dma_offset);
arch/powerpc/sysdev/dart_iommu.c:               set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
arch/powerpc/sysdev/fsl_pci.c:          set_dma_offset(dev, pci64_dma_offset);

to make things worse some platforms (at least on arm/arm64/mips/x86) can
also require additional banking where it isn't even a single linear map
but multiples windows.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 12:30                                                                 ` Christoph Hellwig
@ 2018-08-08 13:18                                                                   ` Benjamin Herrenschmidt
  2018-08-08 20:31                                                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-08 13:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, 2018-08-08 at 05:30 -0700, Christoph Hellwig wrote:
> On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote:
> > Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
> > is not set (default) but there's nothing in the device-tree to tell the
> > guest about this since it's a violation of our pseries architecture, so
> > we just rely on Linux virtio "knowing" that it happens. It's a bit
> > yucky but that's now history...
> 
> That is ugly as hell, but it is how virtio works everywhere, so nothing
> special so far.

Yup.

> > Essentially pseries "architecturally" does not have the concept of not
> > having an iommu in the way and qemu violates that architecture today.
> > 
> > (Remember it comes from pHyp, our priorietary HV, which we are somewhat
> > mimmicing here).
> 
> It shouldnt be too hard to have a dt property that communicates this,
> should it?

We could invent something I suppose. The additional problem then (yeah
I know ... what a mess) is that qemu doesn't create the DT for PCI
devices, the firmware (SLOF) inside the guest does using normal PCI
probing.

That said, that FW could know about all the virtio vendor/device IDs,
check the VIRTIO_F_IOMMU_PLATFORM and set that property accordingly...
messy but doable. It's not a bus property (see my other reply below as
this could complicate things with your bus mask).

But we are drifting from the problem at hand :-) You propose we do set
VIRTIO_F_IOMMU_PLATFORM so we aren't in the above case, and the bypass
stuff works, so no need to touch it.

See my recap at the end of the email to make sure I understand fully
what you suggest.

> > So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
> > through that iommu and performance will suffer (esp vhost I suspect),
> > especially since adding/removing translations in the iommu is a
> > hypercall.

> Well, we'd nee to make sure that for this particular bus we skip the
> actualy iommu.

It's not a bus property. Qemu will happily mix up everything on the
same bus, that includes emulated devices that go through the emulated
iommu, real VFIO devices that go through an actual HW iommu and virtio
that bypasses everything.

This makes things tricky in general (not just in my powerpc secure VM
case) since, at least on powerpc but I suppose elsewhere too, iommu
related properties tend to be per "bus" while here, qemu will mix and
match.

But again, I think we are drifting away from the topic, see below

> > > It would not be the same effect.  The problem with that is that you must
> > > now assumes that your qemu knows that for example you might be passing
> > > a dma offset if the bus otherwise requires it. 
> > 
> > I would assume that arch_virtio_wants_dma_ops() only returns true when
> > no such offsets are involved, at least in our case that would be what
> > happens.
> 
> That would work, but we're really piling hacĸs ontop of hacks here.

Sort-of :-) At least none of what we are discussing now involves
touching the dma_ops themselves so we are not in the way of your big
cleanup operation here. But yeah, let's continue discussing your other
solution below.

> > >  Or in other words:
> > > you potentially break the contract between qemu and the guest of always
> > > passing down physical addresses.  If we explicitly change that contract
> > > through using a flag that says you pass bus address everything is fine.
> > 
> > For us a "bus address" is behind the iommu so that's what
> > VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
> > bus address that is different. I suppose it's an ARMism to have DMA
> > offsets that are separate from iommus ? 
> 
> No, a lot of platforms support a bus address that has an offset from
> the physical address. including a lot of power platforms:

Ok, just talking past each other :-) For all the powerpc ones, these
*do* go through the iommu, which is what I meant. It's just a window of
the iommu that provides some kind of direct mapping of memory.

For pseries, there is no such thing however. What we do to avoid
constant map/unmap of iommu PTEs in pseries guests is that we use
hypercalls to create a 64-bit window and populate all its PTEs with an
identity mapping. But that's not as efficient as a real bypass.

There are good historical reasons for that, since pseries is a guest
platform, its memory is never really where the guest thinks it is, so
you always need an iommu to remap. Even for virtual devices, since for
most of them, in the "IBM" pHyp model, the "peer" is actually another
partition, so the virtual iommu handles translating accross the two
partitions.

Same goes with cell in HW, no real bypass, just the iommu being
confiured with very large pages and a fixed mapping.

powernv has a separate physical window that can be configured as a real
bypass though, so does the U4 DART. Not sure about the FSL one.

But yeah, your point stands, this is just implementation details.

> arch/powerpc/kernel/pci-common.c:       set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
> arch/powerpc/platforms/cell/iommu.c:            set_dma_offset(dev, cell_dma_nommu_offset);
> arch/powerpc/platforms/cell/iommu.c:            set_dma_offset(dev, addr);
> arch/powerpc/platforms/powernv/pci-ioda.c:      set_dma_offset(&pdev->dev, pe->tce_bypass_base);
> arch/powerpc/platforms/powernv/pci-ioda.c:                      set_dma_offset(&pdev->dev, (1ULL << 32));
> arch/powerpc/platforms/powernv/pci-ioda.c:              set_dma_offset(&dev->dev, pe->tce_bypass_base);
> arch/powerpc/platforms/pseries/iommu.c:                         set_dma_offset(dev, dma_offset);
> arch/powerpc/sysdev/dart_iommu.c:               set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
> arch/powerpc/sysdev/fsl_pci.c:          set_dma_offset(dev, pci64_dma_offset);
> 
> to make things worse some platforms (at least on arm/arm64/mips/x86) can
> also require additional banking where it isn't even a single linear map
> but multiples windows.

Sure, but all of this is just the configuration of the iommu. But I
think we agree here, and your point remains valid, indeed my proposed
hack:

>       if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())

Will only work if the IOMMU and non-IOMMU path are completely equivalent.

We can provide that guarantee for our secure VM case, but not generally so if
we were to go down the route of a quirk in virtio, it might be better to
make it painfully obvious that it's specific to that one case with a different
kind of turd:

-	if (xen_domain())
+	if (xen_domain() || pseries_secure_vm())
		return true;

So to summarize, and make sure I'm not missing something, the two approaches
at hand are either:

 1- The above, which is a one liner and contained in the guest, so that's nice, but
also means another turd in virtio which isn't ...

 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
architecture on our side that will force virtio to always go through an emulated
iommu, as pseries doesn't have the concept of a real bypass window, and thus will
impact performance for both secure and non-secure VMs.

 3- Invent a property that can be put in selected PCI device tree nodes that
indicates that for that device specifically, the iommu can be bypassed, along with
a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
but its DT nodes would also have that property and Linux would notice it and turn
bypass on.

The resulting properties of those options are:

1- Is what I want because it's the simplest, provides the best performance now,
   and works without code changes to qemu or non-secure Linux. However it does
   add a tiny turd to virtio which is annoying.

2- This works but it puts the iommu in the way always, thus reducing virtio performance
   accross the board for pseries unless we only do that for secure VMs but that is
   difficult (as discussed earlier).

3- This would recover the performance lost in -2-, however it requires qemu *and*
   guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
   performance hit of -2- unless modified to call that 'enable bypass' call, which
   isn't great.

So imho we have to chose one of 3 not-great solutions here... Unless I missed
something in your ideas of course.

Cheers,
Ben.




^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 13:18                                                                   ` Benjamin Herrenschmidt
@ 2018-08-08 20:31                                                                     ` Michael S. Tsirkin
  2018-08-08 22:13                                                                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 119+ messages in thread
From: Michael S. Tsirkin @ 2018-08-08 20:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> Sure, but all of this is just the configuration of the iommu. But I
> think we agree here, and your point remains valid, indeed my proposed
> hack:
> 
> >       if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> 
> Will only work if the IOMMU and non-IOMMU path are completely equivalent.
> 
> We can provide that guarantee for our secure VM case, but not generally so if
> we were to go down the route of a quirk in virtio, it might be better to
> make it painfully obvious that it's specific to that one case with a different
> kind of turd:
> 
> -	if (xen_domain())
> +	if (xen_domain() || pseries_secure_vm())
> 		return true;

I don't think it's pseries specific actually. E.g. I suspect AMD SEV
might benefit from the same kind of hack.


> So to summarize, and make sure I'm not missing something, the two approaches
> at hand are either:
> 
>  1- The above, which is a one liner and contained in the guest, so that's nice, but
> also means another turd in virtio which isn't ...
> 
>  2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> architecture on our side that will force virtio to always go through an emulated
> iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> impact performance for both secure and non-secure VMs.
> 
>  3- Invent a property that can be put in selected PCI device tree nodes that
> indicates that for that device specifically, the iommu can be bypassed, along with
> a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
> but its DT nodes would also have that property and Linux would notice it and turn
> bypass on.

For completeness, virtio could also have its own bounce buffer
outside of DMA API one. I don't see lots of benefits to this
though.


> The resulting properties of those options are:
> 
> 1- Is what I want because it's the simplest, provides the best performance now,
>    and works without code changes to qemu or non-secure Linux. However it does
>    add a tiny turd to virtio which is annoying.
> 
> 2- This works but it puts the iommu in the way always, thus reducing virtio performance
>    accross the board for pseries unless we only do that for secure VMs but that is
>    difficult (as discussed earlier).
> 
> 3- This would recover the performance lost in -2-, however it requires qemu *and*
>    guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
>    performance hit of -2- unless modified to call that 'enable bypass' call, which
>    isn't great.
> 
> So imho we have to chose one of 3 not-great solutions here... Unless I missed
> something in your ideas of course.
> 
> Cheers,
> Ben.
> 
> 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 20:31                                                                     ` Michael S. Tsirkin
@ 2018-08-08 22:13                                                                       ` Benjamin Herrenschmidt
  2018-08-09  2:00                                                                         ` Benjamin Herrenschmidt
  2018-08-09  5:40                                                                         ` Christoph Hellwig
  0 siblings, 2 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-08 22:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Wed, 2018-08-08 at 23:31 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> > Sure, but all of this is just the configuration of the iommu. But I
> > think we agree here, and your point remains valid, indeed my proposed
> > hack:
> > 
> > >       if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> > 
> > Will only work if the IOMMU and non-IOMMU path are completely equivalent.
> > 
> > We can provide that guarantee for our secure VM case, but not generally so if
> > we were to go down the route of a quirk in virtio, it might be better to
> > make it painfully obvious that it's specific to that one case with a different
> > kind of turd:
> > 
> > -	if (xen_domain())
> > +	if (xen_domain() || pseries_secure_vm())
> > 		return true;
> 
> I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> might benefit from the same kind of hack.

As long as they can provide the same guarantee that the DMA ops are
completely equivalent between virtio and other PCI devices, at least on
the same bus, ie, we don't have to go hack special DMA ops.

I think the latter is really what Christoph wants to avoid for good
reasons.

> > So to summarize, and make sure I'm not missing something, the two approaches
> > at hand are either:
> > 
> >  1- The above, which is a one liner and contained in the guest, so that's nice, but
> > also means another turd in virtio which isn't ...
> > 
> >  2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> > architecture on our side that will force virtio to always go through an emulated
> > iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> > impact performance for both secure and non-secure VMs.
> > 
> >  3- Invent a property that can be put in selected PCI device tree nodes that
> > indicates that for that device specifically, the iommu can be bypassed, along with
> > a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
> > but its DT nodes would also have that property and Linux would notice it and turn
> > bypass on.
> 
> For completeness, virtio could also have its own bounce buffer
> outside of DMA API one. I don't see lots of benefits to this
> though.

Not fan of that either...

> > The resulting properties of those options are:
> > 
> > 1- Is what I want because it's the simplest, provides the best performance now,
> >    and works without code changes to qemu or non-secure Linux. However it does
> >    add a tiny turd to virtio which is annoying.
> > 
> > 2- This works but it puts the iommu in the way always, thus reducing virtio performance
> >    accross the board for pseries unless we only do that for secure VMs but that is
> >    difficult (as discussed earlier).
> > 
> > 3- This would recover the performance lost in -2-, however it requires qemu *and*
> >    guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
> >    performance hit of -2- unless modified to call that 'enable bypass' call, which
> >    isn't great.
> > 
> > So imho we have to chose one of 3 not-great solutions here... Unless I missed
> > something in your ideas of course.
> > 


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 22:13                                                                       ` Benjamin Herrenschmidt
@ 2018-08-09  2:00                                                                         ` Benjamin Herrenschmidt
  2018-08-09  5:40                                                                         ` Christoph Hellwig
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-09  2:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-09 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> > For completeness, virtio could also have its own bounce buffer
> > outside of DMA API one. I don't see lots of benefits to this
> > though.
> 
> Not fan of that either...

To elaborate a bit ...

For our secure VMs, we will need bounce buffering for everything
anyway. virtio, emulated PCI, or vfio.

By ensuring that we create an identity mapping in the IOMMU for
the bounce buffering pool, we enable virtio "legacy/direct" to
use the same mapping ops as things using the iommu.

That said, we still need somewhere in arch/powerpc a set of dma
ops which we'll attach to all PCI devices of a secure VM to force
bouncing always, rather than just based on address (which is what
the standard swiotlb ones do)... Unless we can tweak the swiotlb
"threshold" for example by using an empty mask.

We'll need the same set of DMA ops for VIO devices too, not just PCI.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 22:13                                                                       ` Benjamin Herrenschmidt
  2018-08-09  2:00                                                                         ` Benjamin Herrenschmidt
@ 2018-08-09  5:40                                                                         ` Christoph Hellwig
  2018-09-07  0:09                                                                           ` Jiandi An
  1 sibling, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-08-09  5:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon,
	Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
	aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren,
	paulus, srikar, robin.murphy, jean-philippe.brucker,
	marc.zyngier

On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
> > > -	if (xen_domain())
> > > +	if (xen_domain() || pseries_secure_vm())
> > > 		return true;
> > 
> > I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> > might benefit from the same kind of hack.
> 
> As long as they can provide the same guarantee that the DMA ops are
> completely equivalent between virtio and other PCI devices, at least on
> the same bus, ie, we don't have to go hack special DMA ops.
> 
> I think the latter is really what Christoph wants to avoid for good
> reasons.

Yes.  I also generally want to avoid too much arch specific magic.

FYI, I'm off to a week-long vacation today, don't expect quick replies.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-09  5:40                                                                         ` Christoph Hellwig
@ 2018-09-07  0:09                                                                           ` Jiandi An
  2018-09-10  6:19                                                                             ` Christoph Hellwig
  0 siblings, 1 reply; 119+ messages in thread
From: Jiandi An @ 2018-09-07  0:09 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual,
	virtualization, linux-kernel, linuxppc-dev, aik, robh, joe,
	elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar,
	robin.murphy, jean-philippe.brucker, marc.zyngier,
	thomas.lendacky, brijesh.singh, jiandi.an



On 08/09/2018 12:40 AM, Christoph Hellwig wrote:
> On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
>>>> -	if (xen_domain())
>>>> +	if (xen_domain() || pseries_secure_vm())
>>>> 		return true;
>>>
>>> I don't think it's pseries specific actually. E.g. I suspect AMD SEV
>>> might benefit from the same kind of hack.
>>
>> As long as they can provide the same guarantee that the DMA ops are
>> completely equivalent between virtio and other PCI devices, at least on
>> the same bus, ie, we don't have to go hack special DMA ops.
>>
>> I think the latter is really what Christoph wants to avoid for good
>> reasons.
> 
> Yes.  I also generally want to avoid too much arch specific magic.
> 
> FYI, I'm off to a week-long vacation today, don't expect quick replies.
> 
> 
> 

I've been following this RFC series as this has impact on AMD SEV.
Could you guys keep us in the loop on this (thomas.lendacky@amd.com,
brijesh.singh@amd.com, jiandi.an@amd.com are on cc).

AMD SEV today sets swiotlb_force to SWIOTLB_FORCE early on in
x86_64_start_kernel.  During start_kernel, mem_encrypt_init() sets
dma_ops to swiotlb_dma_ops if SEV is on as it uses SWIOTLB to bounce
buffer DMA operation and it's marked as decrypted.

For virtio device we have to pass in iommu_platform=true flag for
this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
QEMU has the use of iommu_platform attribute disabled for virtio-gpu
device.  So would also like to move towards not having to specify
the VIRTIO_F_IOMMU_PLATFORM flag.

Anshuman's patch [RFC,2/4] virtio: Override device's DMA OPS with
virtio_direct_dma_ops selectively sets the default dma ops of
virtio device's parent pci to direct_dma_ops if VIRTIO_F_IOMMU_PLATFORM
is set.  And later platform specific code can override the dma ops again.

 int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
@@ -174,6 +176,9 @@  int virtio_finalize_features(struct virtio_device *dev)
 	if (ret)
 		return ret;
 
+	if (virtio_has_iommu_quirk(dev))
+		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
+
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;

Would like to be in the loop and put in the platform specific code for
AMD SEV at the same time along with this.  Or does it make sense that if
swiotlb force is set don't override virtio device's parent dma_ops to
direct_dma_ops?  What if someone passes in swiotlb=force from kernel
boot command line parameter, this would override the parent pci device's
dma ops to direct_dma_ops.

So where is this RFC standing currently.  I saw Michael's comment mentioning
this RFC is blocked by its performance overhead for now for switching to
using DMA ops unconditionally.

-Jiandi


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-09-07  0:09                                                                           ` Jiandi An
@ 2018-09-10  6:19                                                                             ` Christoph Hellwig
  2018-09-10  8:53                                                                               ` Gerd Hoffmann
  0 siblings, 1 reply; 119+ messages in thread
From: Christoph Hellwig @ 2018-09-10  6:19 UTC (permalink / raw)
  To: Jiandi An
  Cc: Christoph Hellwig, Benjamin Herrenschmidt, Michael S. Tsirkin,
	Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
	linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe,
	linuxram, haren, paulus, srikar, robin.murphy,
	jean-philippe.brucker, marc.zyngier, thomas.lendacky,
	brijesh.singh, jiandi.an

On Thu, Sep 06, 2018 at 07:09:09PM -0500, Jiandi An wrote:
> For virtio device we have to pass in iommu_platform=true flag for
> this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> device.  So would also like to move towards not having to specify
> the VIRTIO_F_IOMMU_PLATFORM flag.

Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your 
platform given that you can't directly use physical addresses.
Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

Also just as I said for the power folks: you should really work with
the qemu folks that VIRTIO_F_IOMMU_PLATFORM (or whatever we call the
properly documented flag) can be set by default, and no pointless
performance overhead is implied by having a sane and simple
implementation.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-09-10  6:19                                                                             ` Christoph Hellwig
@ 2018-09-10  8:53                                                                               ` Gerd Hoffmann
  0 siblings, 0 replies; 119+ messages in thread
From: Gerd Hoffmann @ 2018-09-10  8:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jiandi An, brijesh.singh, srikar, Michael S. Tsirkin,
	Benjamin Herrenschmidt, Will Deacon, virtualization, paulus,
	elfring, Anshuman Khandual, robh, jean-philippe.brucker, mpe,
	thomas.lendacky, marc.zyngier, linuxram, david, linuxppc-dev,
	linux-kernel, joe, jiandi.an, robin.murphy, haren

> > this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> > QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> > device.  So would also like to move towards not having to specify
> > the VIRTIO_F_IOMMU_PLATFORM flag.
> 
> Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your 
> platform given that you can't directly use physical addresses.
> Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

This needs both host and guest side changes btw.

Guest side patch is in drm-misc (a3b815f09bb8) and should land in the
next merge window.

Host side patches are here:
  https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-iommu

Should also land in the next qemu version.

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 119+ messages in thread

end of thread, other threads:[~2018-09-10  8:53 UTC | newest]

Thread overview: 119+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-20  3:59 [RFC 0/4] Virtio uses DMA API for all devices Anshuman Khandual
2018-07-20  3:59 ` [RFC 1/4] virtio: Define virtio_direct_dma_ops structure Anshuman Khandual
2018-07-30  9:24   ` Christoph Hellwig
2018-07-31  4:01     ` Anshuman Khandual
2018-07-20  3:59 ` [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively Anshuman Khandual
2018-07-28  8:56   ` Anshuman Khandual
2018-07-28 21:16     ` Michael S. Tsirkin
2018-07-30  4:15       ` Anshuman Khandual
2018-07-30  9:30       ` Christoph Hellwig
2018-07-31  6:39         ` Anshuman Khandual
2018-07-30  9:25   ` Christoph Hellwig
2018-07-31  7:00     ` Anshuman Khandual
2018-07-20  3:59 ` [RFC 3/4] virtio: Force virtio core to use DMA API callbacks for all virtio devices Anshuman Khandual
2018-07-20  3:59 ` [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices Anshuman Khandual
2018-07-20 13:15   ` Michael S. Tsirkin
2018-07-23  2:16     ` Anshuman Khandual
2018-07-25  4:30       ` Anshuman Khandual
2018-07-25 13:31       ` Michael S. Tsirkin
2018-07-20 13:16 ` [RFC 0/4] Virtio uses DMA API for all devices Michael S. Tsirkin
2018-07-23  6:28   ` Anshuman Khandual
2018-07-23  9:08     ` Michael S. Tsirkin
2018-07-25  3:26       ` Anshuman Khandual
2018-07-27 11:31         ` Michael S. Tsirkin
2018-07-28  8:37           ` Anshuman Khandual
2018-07-27  9:58 ` Will Deacon
2018-07-27 10:58   ` Anshuman Khandual
2018-07-30  9:34   ` Christoph Hellwig
2018-07-30 10:28     ` Michael S. Tsirkin
2018-07-30 11:18       ` Christoph Hellwig
2018-07-30 13:26         ` Michael S. Tsirkin
2018-07-31 17:30           ` Christoph Hellwig
2018-07-31 20:36             ` Benjamin Herrenschmidt
2018-08-01  8:16               ` Will Deacon
2018-08-01  8:36                 ` Christoph Hellwig
2018-08-01  9:05                   ` Will Deacon
2018-08-01 22:41                     ` Michael S. Tsirkin
2018-08-01 22:35                   ` Michael S. Tsirkin
2018-08-02 15:24                   ` Benjamin Herrenschmidt
2018-08-02 15:41                     ` Michael S. Tsirkin
2018-08-02 16:01                       ` Benjamin Herrenschmidt
2018-08-02 17:19                         ` Michael S. Tsirkin
2018-08-02 17:53                           ` Benjamin Herrenschmidt
2018-08-02 20:52                             ` Michael S. Tsirkin
2018-08-02 21:13                               ` Benjamin Herrenschmidt
2018-08-02 21:51                                 ` Michael S. Tsirkin
2018-08-03  7:05                                 ` Christoph Hellwig
2018-08-03 15:58                                   ` Benjamin Herrenschmidt
2018-08-03 16:02                                     ` Christoph Hellwig
2018-08-03 18:58                                       ` Benjamin Herrenschmidt
2018-08-04  8:21                                         ` Christoph Hellwig
2018-08-05  1:10                                           ` Benjamin Herrenschmidt
2018-08-05  7:29                                             ` Christoph Hellwig
2018-08-05 21:16                                               ` Benjamin Herrenschmidt
2018-08-05 21:30                                                 ` Benjamin Herrenschmidt
2018-08-06  9:42                                                 ` Christoph Hellwig
2018-08-06 19:52                                                   ` Benjamin Herrenschmidt
2018-08-07  6:21                                                     ` Christoph Hellwig
2018-08-07  6:42                                                       ` Benjamin Herrenschmidt
2018-08-07 13:55                                                         ` Christoph Hellwig
2018-08-07 20:32                                                           ` Benjamin Herrenschmidt
2018-08-08  6:31                                                             ` Christoph Hellwig
2018-08-08 10:07                                                               ` Benjamin Herrenschmidt
2018-08-08 12:30                                                                 ` Christoph Hellwig
2018-08-08 13:18                                                                   ` Benjamin Herrenschmidt
2018-08-08 20:31                                                                     ` Michael S. Tsirkin
2018-08-08 22:13                                                                       ` Benjamin Herrenschmidt
2018-08-09  2:00                                                                         ` Benjamin Herrenschmidt
2018-08-09  5:40                                                                         ` Christoph Hellwig
2018-09-07  0:09                                                                           ` Jiandi An
2018-09-10  6:19                                                                             ` Christoph Hellwig
2018-09-10  8:53                                                                               ` Gerd Hoffmann
2018-08-03 19:07                                     ` Michael S. Tsirkin
2018-08-04  1:11                                       ` Benjamin Herrenschmidt
2018-08-04  1:16                                       ` Benjamin Herrenschmidt
2018-08-05  0:22                                         ` Michael S. Tsirkin
2018-08-05  4:52                                           ` Benjamin Herrenschmidt
2018-08-06 13:46                                             ` Michael S. Tsirkin
2018-08-06 19:56                                               ` Benjamin Herrenschmidt
2018-08-06 20:35                                                 ` Michael S. Tsirkin
2018-08-06 21:26                                                   ` Benjamin Herrenschmidt
2018-08-06 21:46                                                     ` Michael S. Tsirkin
2018-08-06 22:13                                                       ` Benjamin Herrenschmidt
2018-08-06 23:16                                                         ` Benjamin Herrenschmidt
2018-08-06 23:45                                                         ` Michael S. Tsirkin
2018-08-07  0:18                                                           ` Benjamin Herrenschmidt
2018-08-07  6:32                                                           ` Christoph Hellwig
2018-08-07  6:27                                                         ` Christoph Hellwig
2018-08-07  6:44                                                           ` Benjamin Herrenschmidt
2018-08-07  6:18                                                       ` Christoph Hellwig
2018-08-07  6:16                                                     ` Christoph Hellwig
2018-08-06 23:18                                                   ` Benjamin Herrenschmidt
2018-08-07  6:12                                                   ` Christoph Hellwig
2018-08-04  1:18                                       ` Benjamin Herrenschmidt
2018-08-04  1:22                                       ` Benjamin Herrenschmidt
2018-08-05  0:23                                         ` Michael S. Tsirkin
2018-08-03 19:17                                   ` Michael S. Tsirkin
2018-08-04  8:15                                     ` Christoph Hellwig
2018-08-05  0:09                                       ` Michael S. Tsirkin
2018-08-05  1:11                                         ` Benjamin Herrenschmidt
2018-08-05  7:25                                         ` Christoph Hellwig
2018-08-05  0:53                                       ` Benjamin Herrenschmidt
2018-08-05  0:27                 ` Michael S. Tsirkin
2018-08-06 14:05                   ` Will Deacon
2018-08-01 21:56               ` Michael S. Tsirkin
2018-08-02 15:33                 ` Benjamin Herrenschmidt
2018-08-02 20:53                   ` Michael S. Tsirkin
2018-08-03  7:06                     ` Christoph Hellwig
2018-08-02 20:55 ` Michael S. Tsirkin
2018-08-03  2:41   ` Jason Wang
2018-08-03 19:08     ` Michael S. Tsirkin
2018-08-04  1:21       ` Benjamin Herrenschmidt
2018-08-05  0:24         ` Michael S. Tsirkin
2018-08-06  9:02           ` Anshuman Khandual
2018-08-06 13:36             ` Michael S. Tsirkin
2018-08-06 15:24               ` Christoph Hellwig
2018-08-06 16:06                 ` Michael S. Tsirkin
2018-08-06 16:10                   ` Christoph Hellwig
2018-08-06 16:13                     ` Michael S. Tsirkin
2018-08-06 16:34                       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).