* [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
@ 2019-01-10 13:44 Joerg Roedel
  2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel
From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk
  Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel

Hi,

there is a problem with virtio-blk devices when the
virtio-ring uses SWIOTLB through the DMA-API. This happens,
for example, in AMD-SEV-enabled guests, where the guest RAM
is mostly encrypted and all emulated DMA has to happen
to/from the SWIOTLB aperture.

The problem is a limitation of the SWIOTLB implementation,
which does not support allocations larger than 256kb. When
the virtio-blk driver tries to read/write a block larger
than that, the allocation of the dma-handle fails and an IO
error is reported.

This patch-set adds a check to the virtio code for whether it
might be using SWIOTLB bounce buffering and, in that case,
limits the maximum segment size in the virtio-blk driver so
that it doesn't try to do larger reads/writes.

Please review.

Thanks,

	Joerg

Joerg Roedel (3):
  swiotlb: Export maximum allocation size
  virtio: Introduce virtio_max_dma_size()
  virtio-blk: Consider virtio_max_dma_size() for maximum segment size

 drivers/block/virtio_blk.c   | 10 ++++++----
 drivers/virtio/virtio_ring.c | 11 +++++++++++
 include/linux/swiotlb.h      | 12 ++++++++++++
 include/linux/virtio.h       |  2 ++
 4 files changed, 31 insertions(+), 4 deletions(-)

-- 
2.17.1



* [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel
@ 2019-01-10 13:44 ` Joerg Roedel
  2019-01-10 17:02   ` Konrad Rzeszutek Wilk
  2019-01-10 13:44 ` [PATCH 2/3] virtio: Introduce virtio_max_dma_size() Joerg Roedel
From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk
  Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The SWIOTLB implementation has a maximum size it can
allocate dma-handles for. This needs to be exported so that
device drivers don't try to allocate larger chunks.

This is especially important for block device drivers like
virtio-blk, which might do DMA through SWIOTLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 include/linux/swiotlb.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..0bcc80a97036 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 	return paddr >= io_tlb_start && paddr < io_tlb_end;
 }
 
+static inline size_t swiotlb_max_alloc_size(void)
+{
+	return ((1UL << IO_TLB_SHIFT) * IO_TLB_SEGSIZE);
+}
+
 bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs);
 void __init swiotlb_exit(void);
@@ -95,6 +100,13 @@ static inline unsigned int swiotlb_max_segment(void)
 {
 	return 0;
 }
+
+static inline size_t swiotlb_max_alloc_size(void)
+{
+	/* There is no limit when SWIOTLB isn't used */
+	return ~0UL;
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
-- 
2.17.1


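For reference: with the SWIOTLB defaults in kernels of this era
(IO_TLB_SHIFT = 11, i.e. 2kB slots, and IO_TLB_SEGSIZE = 128 contiguous
slots per segment), the helper above works out to the 256kB limit
mentioned in the cover letter:

	swiotlb_max_alloc_size() = (1UL << IO_TLB_SHIFT) * IO_TLB_SEGSIZE
	                         = 2048 bytes * 128
	                         = 262144 bytes = 256kB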

* [PATCH 2/3] virtio: Introduce virtio_max_dma_size()
  2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel
  2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel
@ 2019-01-10 13:44 ` Joerg Roedel
  2019-01-10 13:44 ` [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size Joerg Roedel
From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk
  Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

This function returns the maximum segment size for a single
DMA transaction of a virtio device. The possible limit comes
from the SWIOTLB implementation in the Linux kernel, which
has an upper limit of (currently) 256kB on the contiguous
memory it can map.

The function returns this lower limit when SWIOTLB is used for
DMA transactions of the device, and -1U otherwise.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/virtio/virtio_ring.c | 11 +++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..c8d229da203e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -266,6 +266,17 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
 	return false;
 }
 
+size_t virtio_max_dma_size(struct virtio_device *vdev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(&vdev->dev);
+	size_t max_segment_size = -1U;
+
+	if (vring_use_dma_api(vdev) && dma_is_direct(ops))
+		max_segment_size = swiotlb_max_alloc_size();
+
+	return max_segment_size;
+}
+
 static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
 			      dma_addr_t *dma_handle, gfp_t flag)
 {
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index fa1b5da2804e..673fe3ef3607 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -157,6 +157,8 @@ int virtio_device_freeze(struct virtio_device *dev);
 int virtio_device_restore(struct virtio_device *dev);
 #endif
 
+size_t virtio_max_dma_size(struct virtio_device *vdev);
+
 #define virtio_device_for_each_vq(vdev, vq) \
 	list_for_each_entry(vq, &vdev->vqs, list)
 
-- 
2.17.1



* [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size
  2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel
  2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel
  2019-01-10 13:44 ` [PATCH 2/3] virtio: Introduce virtio_max_dma_size() Joerg Roedel
@ 2019-01-10 13:44 ` Joerg Roedel
  2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig
  2019-01-11  3:29 ` Jason Wang
From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk
  Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Segments can't be larger than the maximum DMA allocation
size supported by virtio on the platform. Take that into
account when setting the maximum segment size for a block
device.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/block/virtio_blk.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b16a887bbd02..6062faa8d213 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -723,7 +723,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	struct request_queue *q;
 	int err, index;
 
-	u32 v, blk_size, sg_elems, opt_io_size;
+	u32 v, blk_size, seg_size, sg_elems, opt_io_size;
 	u16 min_io_size;
 	u8 physical_block_exp, alignment_offset;
 
@@ -826,14 +826,16 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* No real sector limit. */
 	blk_queue_max_hw_sectors(q, -1U);
 
+	seg_size = virtio_max_dma_size(vdev);
+
 	/* Host can optionally specify maximum segment size and number of
 	 * segments. */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
 				   struct virtio_blk_config, size_max, &v);
 	if (!err)
-		blk_queue_max_segment_size(q, v);
-	else
-		blk_queue_max_segment_size(q, -1U);
+		seg_size = min(v, seg_size);
+
+	blk_queue_max_segment_size(q, seg_size);
 
 	/* Host can optionally specify the block size of the device */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
-- 
2.17.1



* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel
  2019-01-10 13:44 ` [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size Joerg Roedel
@ 2019-01-10 13:59 ` Christoph Hellwig
  2019-01-10 14:26   ` Joerg Roedel
  2019-01-11  3:29 ` Jason Wang
From: Christoph Hellwig @ 2019-01-10 13:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk,
	Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch

On Thu, Jan 10, 2019 at 02:44:30PM +0100, Joerg Roedel wrote:
> Hi,
> 
> there is a problem with virtio-blk driven devices when
> virtio-ring uses SWIOTLB through the DMA-API. This happens
> for example in AMD-SEV enabled guests, where the guest RAM
> is mostly encrypted and all emulated DMA has to happen
> to/from the SWIOTLB aperture.
> 
> The problem is a limitation of the SWIOTLB implementation,
> which does not support allocations larger than 256kb. When
> the virtio-blk driver tries to read/write a block larger
> than that, the allocation of the dma-handle fails and an IO
> error is reported.

s/allocations/mappings/, right?  We don't use the swiotlb
buffer for the coherent allocations anymore, and I don't think
virtio-blk uses them anyway.

> This patch-set adds a check to the virtio-code whether it
> might be using SWIOTLB bounce buffering and limits the
> maximum segment size in the virtio-blk driver in this case,
> so that it doesn't try to do larger reads/writes.

I really don't like the fact that we hardcode swiotlb specifics.
This needs to be indirect through the dma ops or struct device,
as various iommu implementations also have limits (although
usually much larger ones).

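As a rough illustration of the indirection asked for here, the limit
could be exposed through the dma_map_ops rather than a SWIOTLB-specific
helper. The sketch below is hypothetical: it assumes a new
max_mapping_size callback in struct dma_map_ops, which did not exist at
this point:

	/*
	 * Hypothetical sketch: a per-device mapping-size limit exposed
	 * through the DMA API, assuming struct dma_map_ops grew a new
	 * callback:
	 *
	 *	size_t (*max_mapping_size)(struct device *dev);
	 */
	static inline size_t dma_max_mapping_size(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		if (ops && ops->max_mapping_size)
			return ops->max_mapping_size(dev);

		return SIZE_MAX;	/* no known limit */
	}

A SWIOTLB-backed dma-direct setup could then implement the callback via
swiotlb_max_alloc_size(), while IOMMU implementations could report their
own (usually much larger) limits.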

* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig
@ 2019-01-10 14:26   ` Joerg Roedel
From: Joerg Roedel @ 2019-01-10 14:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk,
	Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh

Hi Christoph,

On Thu, Jan 10, 2019 at 02:59:52PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 02:44:30PM +0100, Joerg Roedel wrote:

> > The problem is a limitation of the SWIOTLB implementation,
> > which does not support allocations larger than 256kb. When
> > the virtio-blk driver tries to read/write a block larger
> > than that, the allocation of the dma-handle fails and an IO
> > error is reported.
> 
> s/allocations/mappings/, right?  We don't use the swiotlb
> buffer for the coherent allocations anymore, and I don't think
> virtio-blk uses them anyway.

'Allocation' in the sense that address ranges are allocated, not
memory; but 'mappings' would also be right.

> I really don't like the fact that we hardcode swiotlb specific.
> This needs to be indirect through the dma ops or struct device,
> as various iommu implementations also have limits (although
> usually much larger ones).

I thought about exposing it through the DMA-API, but didn't go that
route as I didn't want to extend a generic API for some SWIOTLB
specifics. But if there are more implementations that have a size
limitation, it certainly makes sense to put it into the DMA-API. I'll
change that in the next version.

Thanks,

	Joerg


* Re: [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel
@ 2019-01-10 17:02   ` Konrad Rzeszutek Wilk
  2019-01-11  9:12     ` Joerg Roedel
From: Konrad Rzeszutek Wilk @ 2019-01-10 17:02 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization,
	linux-block, linux-kernel, iommu, jfehlig, jon.grimm,
	brijesh.singh, hch, Joerg Roedel

On Thu, Jan 10, 2019 at 02:44:31PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The SWIOTLB implementation has a maximum size it can
> allocate dma-handles for. This needs to be exported so that
> device drivers don't try to allocate larger chunks.
> 
> This is especially important for block device drivers like
> virtio-blk, that might do DMA through SWIOTLB.

Why not use swiotlb_nr_tbl()? That is how drivers/gpu/drm figures out
whether it needs to limit the size of pages.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  include/linux/swiotlb.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 7c007ed7505f..0bcc80a97036 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,6 +72,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  	return paddr >= io_tlb_start && paddr < io_tlb_end;
>  }
>  
> +static inline size_t swiotlb_max_alloc_size(void)
> +{
> +	return ((1UL << IO_TLB_SHIFT) * IO_TLB_SEGSIZE);
> +}
> +
>  bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs);
>  void __init swiotlb_exit(void);
> @@ -95,6 +100,13 @@ static inline unsigned int swiotlb_max_segment(void)
>  {
>  	return 0;
>  }
> +
> +static inline size_t swiotlb_max_alloc_size(void)
> +{
> +	/* There is no limit when SWIOTLB isn't used */
> +	return ~0UL;
> +}
> +
>  #endif /* CONFIG_SWIOTLB */
>  
>  extern void swiotlb_print_info(void);
> -- 
> 2.17.1
> 


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel
  2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig
@ 2019-01-11  3:29 ` Jason Wang
  2019-01-11  9:15   ` Joerg Roedel
From: Jason Wang @ 2019-01-11  3:29 UTC (permalink / raw)
  To: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk
  Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch


On 2019/1/10 9:44 PM, Joerg Roedel wrote:
> Hi,
>
> there is a problem with virtio-blk driven devices when
> virtio-ring uses SWIOTLB through the DMA-API. This happens
> for example in AMD-SEV enabled guests, where the guest RAM
> is mostly encrypted and all emulated DMA has to happen
> to/from the SWIOTLB aperture.
>
> The problem is a limitation of the SWIOTLB implementation,
> which does not support allocations larger than 256kb. When
> the virtio-blk driver tries to read/write a block larger
> than that, the allocation of the dma-handle fails and an IO
> error is reported.
>
> This patch-set adds a check to the virtio-code whether it
> might be using SWIOTLB bounce buffering and limits the
> maximum segment size in the virtio-blk driver in this case,
> so that it doesn't try to do larger reads/writes.


Just wondering if my understanding is correct: IOMMU_PLATFORM must be
set for all virtio devices under AMD-SEV guests?

Thanks


>
> Please review.
>
> Thanks,
>
> 	Joerg
>
> Joerg Roedel (3):
>    swiotlb: Export maximum allocation size
>    virtio: Introduce virtio_max_dma_size()
>    virtio-blk: Consider virtio_max_dma_size() for maximum segment size
>
>   drivers/block/virtio_blk.c   | 10 ++++++----
>   drivers/virtio/virtio_ring.c | 11 +++++++++++
>   include/linux/swiotlb.h      | 12 ++++++++++++
>   include/linux/virtio.h       |  2 ++
>   4 files changed, 31 insertions(+), 4 deletions(-)
>


* Re: [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-10 17:02   ` Konrad Rzeszutek Wilk
@ 2019-01-11  9:12     ` Joerg Roedel
  2019-01-14 20:49       ` Konrad Rzeszutek Wilk
From: Joerg Roedel @ 2019-01-11  9:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization,
	linux-block, linux-kernel, iommu, jfehlig, jon.grimm,
	brijesh.singh, hch, Joerg Roedel

On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote:
> Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they
> need to limit the size of pages.

That function just exports the overall size of the swiotlb aperture, no?
What I need here is the maximum size for a single mapping.

Regards,

	Joerg


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-11  3:29 ` Jason Wang
@ 2019-01-11  9:15   ` Joerg Roedel
  2019-01-14  9:41     ` Jason Wang
From: Joerg Roedel @ 2019-01-11  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh, hch

On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
> all virtio devices under AMD-SEV guests?

Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
aperture, because that memory is not encrypted. The guest then bounces
the data to its encrypted memory.

Regards,

	Joerg


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-11  9:15   ` Joerg Roedel
@ 2019-01-14  9:41     ` Jason Wang
  2019-01-14  9:50       ` Christoph Hellwig
From: Jason Wang @ 2019-01-14  9:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh, hch


On 2019/1/11 5:15 PM, Joerg Roedel wrote:
> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>> all virtio devices under AMD-SEV guests?
> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> aperture, because that memory is not encrypted. The guest bounces the
> data then to its encrypted memory.
>
> Regards,
>
> 	Joerg


Thanks, have you tested vhost-net in this case? I suspect it may not
work.

* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14  9:41     ` Jason Wang
@ 2019-01-14  9:50       ` Christoph Hellwig
  2019-01-14 12:41         ` Jason Wang
From: Christoph Hellwig @ 2019-01-14  9:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk,
	Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh, hch

On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
>
>> On 2019/1/11 5:15 PM, Joerg Roedel wrote:
>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>>> all virtio devices under AMD-SEV guests?
>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
>> aperture, because that memory is not encrypted. The guest bounces the
>> data then to its encrypted memory.
>>
>> Regards,
>>
>> 	Joerg
>
>
> Thanks, have you tested vhost-net in this case. I suspect it may not work

Which brings me back to my pet peeve that we need to take action so
that virtio uses the proper dma mapping API by default, with quirks
for legacy cases.  The magic bypass it uses is just causing problem
after problem.


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14  9:50       ` Christoph Hellwig
@ 2019-01-14 12:41         ` Jason Wang
  2019-01-14 18:20           ` Michael S. Tsirkin
From: Jason Wang @ 2019-01-14 12:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk,
	Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh


On 2019/1/14 5:50 PM, Christoph Hellwig wrote:
> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
>> On 2019/1/11 5:15 PM, Joerg Roedel wrote:
>>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>>>> all virtio devices under AMD-SEV guests?
>>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
>>> aperture, because that memory is not encrypted. The guest bounces the
>>> data then to its encrypted memory.
>>>
>>> Regards,
>>>
>>> 	Joerg
>>
>> Thanks, have you tested vhost-net in this case. I suspect it may not work
> Which brings me back to my pet pevee that we need to take actions
> that virtio uses the proper dma mapping API by default with quirks
> for legacy cases.  The magic bypass it uses is just causing problems
> over problems.


Yes, I fully agree with you. This is probably an exact example of such
a problem.

Thanks



* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 12:41         ` Jason Wang
@ 2019-01-14 18:20           ` Michael S. Tsirkin
  2019-01-14 19:09             ` Singh, Brijesh
From: Michael S. Tsirkin @ 2019-01-14 18:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Joerg Roedel, Konrad Rzeszutek Wilk,
	Jens Axboe, virtualization, linux-block, linux-kernel, iommu,
	jfehlig, jon.grimm, brijesh.singh

On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
> 
> On 2019/1/14 5:50 PM, Christoph Hellwig wrote:
> > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
> > > On 2019/1/11 5:15 PM, Joerg Roedel wrote:
> > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
> > > > > all virtio devices under AMD-SEV guests?
> > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> > > > aperture, because that memory is not encrypted. The guest bounces the
> > > > data then to its encrypted memory.
> > > > 
> > > > Regards,
> > > > 
> > > > 	Joerg
> > > 
> > > Thanks, have you tested vhost-net in this case. I suspect it may not work
> > Which brings me back to my pet pevee that we need to take actions
> > that virtio uses the proper dma mapping API by default with quirks
> > for legacy cases.  The magic bypass it uses is just causing problems
> > over problems.
> 
> 
> Yes, I fully agree with you. This is probably an exact example of such
> problem.
> 
> Thanks

I don't think so - the issue is really that the DMA API does not yet
handle the SEV case 100% correctly. I suspect passthrough devices would
have the same issue.

In fact whoever sets IOMMU_PLATFORM is completely unaffected by
Christoph's pet peeve.

Christoph is saying that !IOMMU_PLATFORM devices should hide the
compatibility code in a special per-device DMA API implementation.
Which would be fine especially if we can manage not to introduce a bunch
of indirect calls all over the place and hurt performance.  It's just
that the benefit is unlikely to be big (e.g. we can't also get rid of
the virtio specific memory barriers) so no one was motivated enough to
work on it.

-- 
MST


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 18:20           ` Michael S. Tsirkin
@ 2019-01-14 19:09             ` Singh, Brijesh
  2019-01-14 19:12             ` Robin Murphy
From: Singh, Brijesh @ 2019-01-14 19:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Singh, Brijesh, Christoph Hellwig, Joerg Roedel,
	Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block,
	linux-kernel, iommu, jfehlig, Grimm, Jon



On 1/14/19 12:20 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
>>
>> On 2019/1/14 5:50 PM, Christoph Hellwig wrote:
>>> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
>>>>> On 2019/1/11 5:15 PM, Joerg Roedel wrote:
>>>>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>>>>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>>>>>> all virtio devices under AMD-SEV guests?
>>>>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
>>>>> aperture, because that memory is not encrypted. The guest bounces the
>>>>> data then to its encrypted memory.
>>>>>
>>>>> Regards,
>>>>>
>>>>> 	Joerg
>>>>
>>>> Thanks, have you tested vhost-net in this case. I suspect it may not work
>>> Which brings me back to my pet pevee that we need to take actions
>>> that virtio uses the proper dma mapping API by default with quirks
>>> for legacy cases.  The magic bypass it uses is just causing problems
>>> over problems.
>>
>>
>> Yes, I fully agree with you. This is probably an exact example of such
>> problem.
>>
>> Thanks
> 
> I don't think so - the issue is really that DMA API does not yet handle
> the SEV case 100% correctly. I suspect passthrough devices would have
> the same issue.
> 


In case of SEV, emulated DMA is performed through the SWIOTLB
(which bounces the encrypted buffers). The issue reported here will
happen on any platform which is making use of SWIOTLB. We could
easily reproduce the virtio-blk failure if we configure
swiotlb=force in a non-SEV guest. Unfortunately, in case of SEV the
SWIOTLB is a must. As Joerg highlighted, the main issue is a
limitation of the SWIOTLB: it does not support allocations/mappings
larger than 256kB.


> In fact whoever sets IOMMU_PLATFORM is completely unaffected by
> Christoph's pet peeve.
> 
> Christoph is saying that !IOMMU_PLATFORM devices should hide the
> compatibility code in a special per-device DMA API implementation.
> Which would be fine especially if we can manage not to introduce a bunch
> of indirect calls all over the place and hurt performance.  It's just
> that the benefit is unlikely to be big (e.g. we can't also get rid of
> the virtio specific memory barriers) so no one was motivated enough to
> work on it.
> 


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 18:20           ` Michael S. Tsirkin
  2019-01-14 19:09             ` Singh, Brijesh
@ 2019-01-14 19:12             ` Robin Murphy
  2019-01-14 20:22               ` Christoph Hellwig
  2019-01-14 20:29               ` Michael S. Tsirkin
  2019-01-14 20:19             ` Christoph Hellwig
  2019-01-15  8:37             ` Joerg Roedel
From: Robin Murphy @ 2019-01-14 19:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Jens Axboe, brijesh.singh, Konrad Rzeszutek Wilk, jon.grimm,
	jfehlig, linux-kernel, virtualization, linux-block, iommu,
	Christoph Hellwig

On 14/01/2019 18:20, Michael S. Tsirkin wrote:
> On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
>>
>> On 2019/1/14 5:50 PM, Christoph Hellwig wrote:
>>> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
>>>> On 2019/1/11 5:15 PM, Joerg Roedel wrote:
>>>>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>>>>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>>>>>> all virtio devices under AMD-SEV guests?
>>>>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
>>>>> aperture, because that memory is not encrypted. The guest bounces the
>>>>> data then to its encrypted memory.
>>>>>
>>>>> Regards,
>>>>>
>>>>> 	Joerg
>>>>
>>>> Thanks, have you tested vhost-net in this case. I suspect it may not work
>>> Which brings me back to my pet pevee that we need to take actions
>>> that virtio uses the proper dma mapping API by default with quirks
>>> for legacy cases.  The magic bypass it uses is just causing problems
>>> over problems.
>>
>>
>> Yes, I fully agree with you. This is probably an exact example of such
>> problem.
>>
>> Thanks
> 
> I don't think so - the issue is really that DMA API does not yet handle
> the SEV case 100% correctly. I suspect passthrough devices would have
> the same issue.

Huh? Regardless of which virtio devices use it or not, the DMA API is 
handling the SEV case as correctly as it possibly can, by forcing 
everything through the unencrypted bounce buffer. If the segments being 
mapped are too big for that bounce buffer in the first place, there's 
nothing it can possibly do except fail, gracefully or otherwise.

Now, in theory, yes, the real issue at hand is not unique to virtio-blk 
nor SEV - any driver whose device has a sufficiently large DMA segment 
size and who manages to get sufficient physically-contiguous memory 
could technically generate a scatterlist segment longer than SWIOTLB can 
handle. However, in practice that basically never happens, not least 
because very few drivers ever override the default 64K DMA segment 
limit. AFAICS nothing in drivers/virtio is calling 
dma_set_max_seg_size() or otherwise assigning any dma_parms to replace 
the defaults either, so the really interesting question here is how are 
these apparently-out-of-spec 256K segments getting generated at all?

Robin.

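For context on the 64K default mentioned above: a driver normally
advertises its per-segment DMA limit through dev->dma_parms, and
dma_get_max_seg_size() falls back to 64K when nothing was set. A
minimal sketch follows; foo_setup_dma() is a made-up example, and
dma_set_max_seg_size() fails with -EIO unless the bus code has
allocated dev->dma_parms:

	#include <linux/dma-mapping.h>
	#include <linux/sizes.h>

	/* Raise the per-device DMA segment limit above the 64K default
	 * that dma_get_max_seg_size() otherwise reports. */
	static int foo_setup_dma(struct device *dev)
	{
		return dma_set_max_seg_size(dev, SZ_256K);
	}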

* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 18:20           ` Michael S. Tsirkin
  2019-01-14 19:09             ` Singh, Brijesh
  2019-01-14 19:12             ` Robin Murphy
@ 2019-01-14 20:19             ` Christoph Hellwig
  2019-01-14 20:48               ` Michael S. Tsirkin
  2019-01-15  8:37             ` Joerg Roedel
From: Christoph Hellwig @ 2019-01-14 20:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Christoph Hellwig, Joerg Roedel,
	Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block,
	linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh

On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote:
> I don't think so - the issue is really that DMA API does not yet handle
> the SEV case 100% correctly. I suspect passthrough devices would have
> the same issue.

The DMA API handles the SEV case perfectly.  It's just that virtio_blk
supports huge segments that virtio does not generally support, but that
is not related to SEV in any way.

> In fact whoever sets IOMMU_PLATFORM is completely unaffected by
> Christoph's pet peeve.

No, the above happens only when we set IOMMU_PLATFORM.

> Christoph is saying that !IOMMU_PLATFORM devices should hide the
> compatibility code in a special per-device DMA API implementation.
> Which would be fine especially if we can manage not to introduce a bunch
> of indirect calls all over the place and hurt performance.  It's just
> that the benefit is unlikely to be big (e.g. we can't also get rid of
> the virtio specific memory barriers) so no one was motivated enough to
> work on it.

No.  The problem is that we still haven't fully specified what
IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean.  Your
"ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo
improves it a little bit, but it is still far from enough.

As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM
absolutely MUST be set for hardware implementations.  Otherwise said
hardware has no chance of working on anything but the most x86-like
systems.

Second, software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM,
because otherwise we can't add proper handling for things like SEV or
the IBM "secure hypervisor" thing.

Last but not least a lot of wording outside the area describing these
flags really needs some major updates in terms of DMA access.


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 19:12             ` Robin Murphy
@ 2019-01-14 20:22               ` Christoph Hellwig
  2019-01-14 20:29               ` Michael S. Tsirkin
From: Christoph Hellwig @ 2019-01-14 20:22 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, brijesh.singh,
	Konrad Rzeszutek Wilk, jon.grimm, jfehlig, linux-kernel,
	virtualization, linux-block, iommu, Christoph Hellwig

On Mon, Jan 14, 2019 at 07:12:08PM +0000, Robin Murphy wrote:
> Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor 
> SEV - any driver whose device has a sufficiently large DMA segment size and 
> who manages to get sufficient physically-contiguous memory could 
> technically generate a scatterlist segment longer than SWIOTLB can handle. 
> However, in practice that basically never happens, not least because very 
> few drivers ever override the default 64K DMA segment limit. AFAICS nothing 
> in drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning 
> any dma_parms to replace the defaults either, so the really interesting 
> question here is how are these apparently-out-of-spec 256K segments getting 
> generated at all?

drivers/block/virtio_blk.c:virtblk_probe():

	/* Host can optionally specify maximum segment size and number of
         * segments. */
	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
				   struct virtio_blk_config, size_max, &v);
        if (!err)
		blk_queue_max_segment_size(q, v);
	else
		blk_queue_max_segment_size(q, -1U);

So it really is virtio_blk that is special here.  The host could
set VIRTIO_BLK_F_SIZE_MAX to paper over the problem, but I really
think we need a dma_max_segment_size API that is wired up through
the dma_map_ops to sort this out for real.


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 19:12             ` Robin Murphy
  2019-01-14 20:22               ` Christoph Hellwig
@ 2019-01-14 20:29               ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2019-01-14 20:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Wang, Jens Axboe, brijesh.singh, Konrad Rzeszutek Wilk,
	jon.grimm, jfehlig, linux-kernel, virtualization, linux-block,
	iommu, Christoph Hellwig

On Mon, Jan 14, 2019 at 07:12:08PM +0000, Robin Murphy wrote:
> On 14/01/2019 18:20, Michael S. Tsirkin wrote:
> > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
> > > 
> > > On 2019/1/14 5:50 PM, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
> > > > > On 2019/1/11 5:15 PM, Joerg Roedel wrote:
> > > > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> > > > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
> > > > > > > all virtio devices under AMD-SEV guests?
> > > > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> > > > > > aperture, because that memory is not encrypted. The guest bounces the
> > > > > > data then to its encrypted memory.
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > 	Joerg
> > > > > 
> > > > > Thanks, have you tested vhost-net in this case. I suspect it may not work
> > > > Which brings me back to my pet pevee that we need to take actions
> > > > that virtio uses the proper dma mapping API by default with quirks
> > > > for legacy cases.  The magic bypass it uses is just causing problems
> > > > over problems.
> > > 
> > > 
> > > Yes, I fully agree with you. This is probably an exact example of such
> > > problem.
> > > 
> > > Thanks
> > 
> > I don't think so - the issue is really that DMA API does not yet handle
> > the SEV case 100% correctly. I suspect passthrough devices would have
> > the same issue.
> 
> Huh? Regardless of which virtio devices use it or not, the DMA API is
> handling the SEV case as correctly as it possibly can, by forcing everything
> through the unencrypted bounce buffer. If the segments being mapped are too
> big for that bounce buffer in the first place, there's nothing it can
> possibly do except fail, gracefully or otherwise.

It seems reasonable to be able to ask it what its capabilities are,
though.

> Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor
> SEV - any driver whose device has a sufficiently large DMA segment size and
> who manages to get sufficient physically-contiguous memory could technically
> generate a scatterlist segment longer than SWIOTLB can handle. However, in
> practice that basically never happens, not least because very few drivers
> ever override the default 64K DMA segment limit. AFAICS nothing in
> drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning any
> dma_parms to replace the defaults either, so the really interesting question
> here is how are these apparently-out-of-spec 256K segments getting generated
> at all?
> 
> Robin.

I guess this is what you are looking for:

        /* Host can optionally specify maximum segment size and number of
         * segments. */
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
                                   struct virtio_blk_config, size_max, &v);
        if (!err)
                blk_queue_max_segment_size(q, v);
        else
                blk_queue_max_segment_size(q, -1U);

virtio isn't the only caller with a value >64K:

$ git grep -A1 blk_queue_max_segment_size
Documentation/block/biodoc.txt: blk_queue_max_segment_size(q, max_seg_size)
Documentation/block/biodoc.txt-         Maximum size of a clustered segment, 64kB default.
--
block/blk-settings.c: * blk_queue_max_segment_size - set max segment size for blk_rq_map_sg
block/blk-settings.c- * @q:  the request queue for the device
--
block/blk-settings.c:void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
block/blk-settings.c-{
--
block/blk-settings.c:EXPORT_SYMBOL(blk_queue_max_segment_size);
block/blk-settings.c-
--
drivers/block/mtip32xx/mtip32xx.c:      blk_queue_max_segment_size(dd->queue, 0x400000);
drivers/block/mtip32xx/mtip32xx.c-      blk_queue_io_min(dd->queue, 4096);
--
drivers/block/nbd.c:    blk_queue_max_segment_size(disk->queue, UINT_MAX);
drivers/block/nbd.c-    blk_queue_max_segments(disk->queue, USHRT_MAX);
--
drivers/block/ps3disk.c:        blk_queue_max_segment_size(queue, dev->bounce_size);
drivers/block/ps3disk.c-
--
drivers/block/ps3vram.c:        blk_queue_max_segment_size(queue, BLK_MAX_SEGMENT_SIZE);
drivers/block/ps3vram.c-        blk_queue_max_hw_sectors(queue, BLK_SAFE_MAX_SECTORS);
--
drivers/block/rbd.c:    blk_queue_max_segment_size(q, UINT_MAX);
drivers/block/rbd.c-    blk_queue_io_min(q, objset_bytes);
--
drivers/block/sunvdc.c: blk_queue_max_segment_size(q, PAGE_SIZE);
drivers/block/sunvdc.c-
--
drivers/block/virtio_blk.c:             blk_queue_max_segment_size(q, v);
drivers/block/virtio_blk.c-     else
drivers/block/virtio_blk.c:             blk_queue_max_segment_size(q, -1U);
drivers/block/virtio_blk.c-
--
drivers/block/xen-blkfront.c:   blk_queue_max_segment_size(rq, PAGE_SIZE);
drivers/block/xen-blkfront.c-
--
drivers/cdrom/gdrom.c:  blk_queue_max_segment_size(gd.gdrom_rq, 0x40000);
drivers/cdrom/gdrom.c-  gd.disk->queue = gd.gdrom_rq;
--
drivers/memstick/core/ms_block.c:       blk_queue_max_segment_size(msb->queue,
drivers/memstick/core/ms_block.c-                                  MS_BLOCK_MAX_PAGES * msb->page_size);
--
drivers/memstick/core/mspro_block.c:    blk_queue_max_segment_size(msb->queue,
drivers/memstick/core/mspro_block.c-                               MSPRO_BLOCK_MAX_PAGES * msb->page_size);
--
drivers/mmc/core/queue.c:       blk_queue_max_segment_size(mq->queue, host->max_seg_size);
drivers/mmc/core/queue.c-
--
drivers/s390/block/dasd.c:      blk_queue_max_segment_size(q, PAGE_SIZE);
drivers/s390/block/dasd.c-      blk_queue_segment_boundary(q, PAGE_SIZE - 1);
--
drivers/scsi/be2iscsi/be_main.c:        blk_queue_max_segment_size(sdev->request_queue, 65536);
drivers/scsi/be2iscsi/be_main.c-        return 0;
--
drivers/scsi/scsi_debug.c:      blk_queue_max_segment_size(sdp->request_queue, -1U);
drivers/scsi/scsi_debug.c-      if (sdebug_no_uld)
--
drivers/scsi/scsi_lib.c:        blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
drivers/scsi/scsi_lib.c-
--
drivers/scsi/ufs/ufshcd.c:      blk_queue_max_segment_size(q, PRDT_DATA_BYTE_COUNT_MAX);
drivers/scsi/ufs/ufshcd.c-
--
include/linux/blkdev.h:extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
include/linux/blkdev.h-extern void blk_queue_max_discard_sectors(struct request_queue *q,
--
include/linux/mmc/host.h:       unsigned int            max_seg_size;   /* see blk_queue_max_segment_size */
include/linux/mmc/host.h-       unsigned short          max_segs;       /* see blk_queue_max_segments */


Some of these devices are probably not going to work well if
passed through to a SEV guest.

Going back to virtio, at some level virtio is like a stacking device so
it does not necessarily need a limit.

-- 
MST


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 20:19             ` Christoph Hellwig
@ 2019-01-14 20:48               ` Michael S. Tsirkin
  2019-01-15 13:09                 ` Christoph Hellwig
From: Michael S. Tsirkin @ 2019-01-14 20:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Wang, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh

On Mon, Jan 14, 2019 at 09:19:35PM +0100, Christoph Hellwig wrote:
> > Christoph is saying that !IOMMU_PLATFORM devices should hide the
> > compatibility code in a special per-device DMA API implementation.
> > Which would be fine especially if we can manage not to introduce a bunch
> > of indirect calls all over the place and hurt performance.  It's just
> > that the benefit is unlikely to be big (e.g. we can't also get rid of
> > the virtio specific memory barriers) so no one was motivated enough to
> > work on it.
> 
> No.  

Oh ok then.

> The problem is that we still haven't fully specified what
> IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean.  Your
> "ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo
> improves it a little bit, but it is still far from enough.
> 
> As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM
> absolutely MUST be set for hardware implementations.  Otherwise said
> hardware has no chance of working on anything but the most x86-like
> systems.

I think you are exaggerating a little here.  Limited access, with e.g.
the need to flush caches, is common on lower-end systems but less
common on the higher-end systems used for virtualization.  As you
point out, that changes with e.g. SEV, but then SEV is a pretty niche
thing so far.

So I would make that a SHOULD, probably.

> Second software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM,
> because otherwise we can't add proper handling for things like SEV or
> the IBM "secure hypervisor" thing.

Yes. If the host does not have access to all of memory, it SHOULD set
VIRTIO_F_ACCESS_PLATFORM.

It seems rather straightforward to add conformance clauses
for this, thanks for reminding me.


> Last but not least a lot of wording outside the area describing these
> flags really needs some major updates in terms of DMA access.

Thanks for the reminder. Yes, generally the spec tends to still say
e.g. "physical address" where it really means a "DMA address" in Linux
terms. Whether changing that will make things much clearer I'm not
sure; e.g. hardware designers don't much care, I guess.

If you want to list the most problematic cases, e.g. in a github issue,
we might get to clarifying them sooner.

Thanks!

-- 
MST


* Re: [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-11  9:12     ` Joerg Roedel
@ 2019-01-14 20:49       ` Konrad Rzeszutek Wilk
  2019-01-14 21:59         ` Michael S. Tsirkin
From: Konrad Rzeszutek Wilk @ 2019-01-14 20:49 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization,
	linux-block, linux-kernel, iommu, jfehlig, jon.grimm,
	brijesh.singh, hch, Joerg Roedel

On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote:
> On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they
> > need to limit the size of pages.
> 
> That function just exports the overall size of the swiotlb aperture, no?
> What I need here is the maximum size for a single mapping.

Yes. The other drivers just assumed that if there is SWIOTLB they would
use the smaller size by default (as in, they knew the limitation).

But I agree it would be better to have something smarter - and also to
convert the DRM drivers to piggyback on this.

Or alternatively we could make SWIOTLB handle bigger sizes.
> 
> Regards,
> 
> 	Joerg


* Re: [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-14 20:49       ` Konrad Rzeszutek Wilk
@ 2019-01-14 21:59         ` Michael S. Tsirkin
  2019-01-15 13:05           ` Christoph Hellwig
From: Michael S. Tsirkin @ 2019-01-14 21:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Joerg Roedel, Jason Wang, Jens Axboe, virtualization,
	linux-block, linux-kernel, iommu, jfehlig, jon.grimm,
	brijesh.singh, hch, Joerg Roedel

On Mon, Jan 14, 2019 at 03:49:07PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote:
> > On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they
> > > need to limit the size of pages.
> > 
> > That function just exports the overall size of the swiotlb aperture, no?
> > What I need here is the maximum size for a single mapping.
> 
> Yes. The other drivers just assumed that if there is SWIOTLB they would use
> the smaller size by default (as in they knew the limitation).
> 
> But I agree it would be better to have something smarter - and also convert the
> DRM drivers to piggy back on this.
> 
> Or alternatively we could make SWIOTLB handle bigger sizes..


Just a thought: is it a good idea to teach blk_queue_max_segment_size()
to take the DMA size into account? This would help us find other
devices possibly missing this check.


> > 
> > Regards,
> > 
> > 	Joerg

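A sketch of what such a check could look like at a call site, along the
lines of the drivers/scsi/scsi_lib.c hunk in the grep output earlier in
the thread; driver_max_seg, dev and q stand in for the driver's own
limit, its struct device and its request_queue:

	/* Clamp the driver's requested segment size against the limit
	 * the DMA layer reports for this device. */
	unsigned int max_seg = min_t(unsigned int, driver_max_seg,
				     dma_get_max_seg_size(dev));

	blk_queue_max_segment_size(q, max_seg);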

* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 18:20           ` Michael S. Tsirkin
  2019-01-14 20:19             ` Christoph Hellwig
@ 2019-01-15  8:37             ` Joerg Roedel
  2019-01-15 13:20               ` Christoph Hellwig
From: Joerg Roedel @ 2019-01-15  8:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Christoph Hellwig, Konrad Rzeszutek Wilk, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh

On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote:
> Which would be fine especially if we can manage not to introduce a bunch
> of indirect calls all over the place and hurt performance.

Which indirect calls? In case of unset dma_ops the DMA-API functions
call directly into the dma-direct implementation, no indirect calls at
all.

Regards,

	Joerg

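A simplified version of the dispatch described above, modeled on the
include/linux/dma-mapping.h of this era (debug hooks and sanity checks
trimmed):

	static inline dma_addr_t dma_map_page_attrs(struct device *dev,
			struct page *page, size_t offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		/* NULL ops: calls straight into dma-direct, with no
		 * indirect branch involved. */
		if (dma_is_direct(ops))
			return dma_direct_map_page(dev, page, offset,
						   size, dir, attrs);

		return ops->map_page(dev, page, offset, size, dir, attrs);
	}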

* Re: [PATCH 1/3] swiotlb: Export maximum allocation size
  2019-01-14 21:59         ` Michael S. Tsirkin
@ 2019-01-15 13:05           ` Christoph Hellwig
From: Christoph Hellwig @ 2019-01-15 13:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Konrad Rzeszutek Wilk, Joerg Roedel, Jason Wang, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh, hch, Joerg Roedel

On Mon, Jan 14, 2019 at 04:59:27PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 14, 2019 at 03:49:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote:
> > > On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they
> > > > need to limit the size of pages.
> > > 
> > > That function just exports the overall size of the swiotlb aperture, no?
> > > What I need here is the maximum size for a single mapping.
> > 
> > Yes. The other drivers just assumed that if there is SWIOTLB they would use
> > the smaller size by default (as in they knew the limitation).
> > 
> > But I agree it would be better to have something smarter - and also convert the
> > DRM drivers to piggy back on this.
> > 
> > Or alternatively we could make SWIOTLB handle bigger sizes..
> 
> 
> Just a thought: is it a good idea to teach blk_queue_max_segment_size
> to get the dma size? This will help us find other devices
> possibly missing this check.

Yes, we should.  Both the existing DMA size communicated through
dma_parms, which is set by the driver, and this new DMA-ops-exposed
one, which needs to be added.  I'm working on some preliminary patches
for the first part, as I think I introduced a bug related to that in
the SCSI layer in 5.0.


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-14 20:48               ` Michael S. Tsirkin
@ 2019-01-15 13:09                 ` Christoph Hellwig
From: Christoph Hellwig @ 2019-01-15 13:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Jason Wang, Joerg Roedel,
	Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block,
	linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh

On Mon, Jan 14, 2019 at 03:48:47PM -0500, Michael S. Tsirkin wrote:
> I think you are exaggerating a little here.  Limited access with e.g.
> need to flush caches is common on lower end but less common on higher
> end systems used for virtualization.  As you point out that changes with
> e.g. SEV but then SEV is a pretty niche thing so far.
> 
> So I would make that a SHOULD probably.

The problem is that without using DMA ops you can't just plug the device
into a random system, which is a total no-go for peripherals.

And cache flushing is not that uncommon, e.g. every sparc system
needs it, many arm/arm64 ones, some power ones, lots of mips ones, etc.

But cache flushing is just one side of it - lots of systems have either
mandatory or optional IOMMUs, and without the platform access flag that
doesn't work.  The same goes for the case where you have a DMA offset,
which is extremely common on arm and power systems.  So you might find
a few non-x86 systems you can plug it into and it'll just work, but it
won't be all that many.  And even on x86 your device will be completely
broken if the user dares to use an IOMMU.

> > flags really needs some major updates in terms of DMA access.
> 
> Thanks for the reminder. Yes generally spec tends to still say e.g.
> "physical address" where it really means a "DMA address" in Linux terms.
> Whether changing that will make things much clearer I'm not sure. E.g.
> hardware designers don't much care I guess.

At least say bus address - as said above, the CPU and bus concepts of
physical addresses are different in many systems, including x86
when you use IOMMUs.


* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-15  8:37             ` Joerg Roedel
@ 2019-01-15 13:20               ` Christoph Hellwig
  2019-01-16 14:16                 ` Michael S. Tsirkin
From: Christoph Hellwig @ 2019-01-15 13:20 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Michael S. Tsirkin, Jason Wang, Christoph Hellwig,
	Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block,
	linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh

On Tue, Jan 15, 2019 at 09:37:42AM +0100, Joerg Roedel wrote:
> On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote:
> > Which would be fine especially if we can manage not to introduce a bunch
> > of indirect calls all over the place and hurt performance.
> 
> Which indirect calls? In case of unset dma_ops the DMA-API functions
> call directly into the dma-direct implementation, no indirect calls at
> all.

True.  But the NULL-ops dma direct case still has two issues that might
not work for virtio:

 (a) if the architecture supports devices that are not DMA coherent it
     will dip into a special allocator and do cache maintenance for
     streaming mappings.  Although it would be a bit of a hack, we could
     work around this in virtio by doing something like:

#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
	dev->dma_coherent = true;
#endif

    except that won't work for mips, which has a mode where it uses a
    system-level instead of a device-level coherency flag and thus
    doesn't look at this struct device field.

 (b) we can still mangle the DMA address, either using the
     dma_pfn_offset field in struct device, or by a full override
     of __phys_to_dma / __dma_to_phys by the architecture code.
     The first could be solved using a hack like the one above,
     but the latter would be a little harder.  In the long run
     I'd love to get rid of that hook and have everyone use the
     generic offset code (a sketch of it follows below this mail),
     but for that we first need to support multiple ranges with
     different offsets and do quite some nasty arch code surgery.
 
So for the legacy virtio case I fear we need to keep a local dma mapping
implementation for now.  I just wish no recent hypervisor would ever
offer devices in this broken legacy mode.

> 
> Regards,
> 
> 	Joerg
---end quoted text---

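The generic offset code referenced in (b), roughly as it looked in the
dma-direct header of this era (simplified; architectures that select
ARCH_HAS_PHYS_TO_DMA override this entirely, which is the harder case
mentioned above):

	static inline dma_addr_t __phys_to_dma(struct device *dev,
					       phys_addr_t paddr)
	{
		dma_addr_t dev_addr = (dma_addr_t)paddr;

		return dev_addr -
		       ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
	}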

* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB
  2019-01-15 13:20               ` Christoph Hellwig
@ 2019-01-16 14:16                 ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2019-01-16 14:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joerg Roedel, Jason Wang, Konrad Rzeszutek Wilk, Jens Axboe,
	virtualization, linux-block, linux-kernel, iommu, jfehlig,
	jon.grimm, brijesh.singh

On Tue, Jan 15, 2019 at 02:20:19PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 15, 2019 at 09:37:42AM +0100, Joerg Roedel wrote:
> > On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote:
> > > Which would be fine especially if we can manage not to introduce a bunch
> > > of indirect calls all over the place and hurt performance.
> > 
> > Which indirect calls? In case of unset dma_ops the DMA-API functions
> > call directly into the dma-direct implementation, no indirect calls at
> > all.
> 
> True.  But the NULL-ops dma direct case still has two issues that might
> not work for virtio:
> 
>  (a) if the architecture supports devices that are not DMA coherent it
>      will dip into a special allocator and do cache maintainance for
>      streaming mappings.  Although it would be a bit of a hack we could
>      work around this in virtio doings something like:
> 
> #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
>     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> 	dev->dma_coherent = true;
> #endif
> 
>     except that won't work for mips, which has a mode where it does
>     a system instead of device level coherency flag and thus doesn't
>     look at this struct device field
> 
>  (b) we can still mangle the DMA address, either using the
>      dma_pfn_offset field in struct device, or by a full override
>      of __phys_to_dma / __dma_to_phys by the architecture code.
>      The first could be solved using a hack like the one above,
>      but the latter would be a little harder.  In the long run
>      I'd love to get rid of that hook and have everyone use the
>      generic offset code, but for that we first need to support
>      multiple ranges with different offset and do quite some
>      nasty arch code surgery.
>  
> So for the legacy virtio case I fear we need to keep local dma mapping
> implementation for now.  I just wish now recent hypervisor would ever
> offer devices in this broken legacy mode..

IIUC some emulated hardware has no reasonable way to specify that
some devices bypass the IOMMU, and no cheap way to cover all
memory with an IOMMU mapping.

So I suspect !ACCESS_PLATFORM will stay a supported configuration
forever :(



> > 
> > Regards,
> > 
> > 	Joerg
> ---end quoted text---

