* [PATCH 0/3] Fix virtio-blk issue with SWIOTLB @ 2019-01-10 13:44 Joerg Roedel 2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel ` (4 more replies) 0 siblings, 5 replies; 27+ messages in thread From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw) To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel Hi, there is a problem with virtio-blk driven devices when virtio-ring uses SWIOTLB through the DMA-API. This happens for example in AMD-SEV enabled guests, where the guest RAM is mostly encrypted and all emulated DMA has to happen to/from the SWIOTLB aperture. The problem is a limitation of the SWIOTLB implementation, which does not support allocations larger than 256kb. When the virtio-blk driver tries to read/write a block larger than that, the allocation of the dma-handle fails and an IO error is reported. This patch-set adds a check to the virtio-code whether it might be using SWIOTLB bounce buffering and limits the maximum segment size in the virtio-blk driver in this case, so that it doesn't try to do larger reads/writes. Please review. Thanks, Joerg Joerg Roedel (3): swiotlb: Export maximum allocation size virtio: Introduce virtio_max_dma_size() virtio-blk: Consider virtio_max_dma_size() for maximum segment size drivers/block/virtio_blk.c | 10 ++++++---- drivers/virtio/virtio_ring.c | 11 +++++++++++ include/linux/swiotlb.h | 12 ++++++++++++ include/linux/virtio.h | 2 ++ 4 files changed, 31 insertions(+), 4 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel @ 2019-01-10 13:44 ` Joerg Roedel 2019-01-10 17:02 ` Konrad Rzeszutek Wilk 2019-01-10 13:44 ` [PATCH 2/3] virtio: Introduce virtio_max_dma_size() Joerg Roedel ` (3 subsequent siblings) 4 siblings, 1 reply; 27+ messages in thread From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw) To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel From: Joerg Roedel <jroedel@suse.de> The SWIOTLB implementation has a maximum size it can allocate dma-handles for. This needs to be exported so that device drivers don't try to allocate larger chunks. This is especially important for block device drivers like virtio-blk, that might do DMA through SWIOTLB. Signed-off-by: Joerg Roedel <jroedel@suse.de> --- include/linux/swiotlb.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7c007ed7505f..0bcc80a97036 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -72,6 +72,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr) return paddr >= io_tlb_start && paddr < io_tlb_end; } +static inline size_t swiotlb_max_alloc_size(void) +{ + return ((1UL << IO_TLB_SHIFT) * IO_TLB_SEGSIZE); +} + bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr, size_t size, enum dma_data_direction dir, unsigned long attrs); void __init swiotlb_exit(void); @@ -95,6 +100,13 @@ static inline unsigned int swiotlb_max_segment(void) { return 0; } + +static inline size_t swiotlb_max_alloc_size(void) +{ + /* There is no limit when SWIOTLB isn't used */ + return ~0UL; +} + #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); -- 2.17.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel @ 2019-01-10 17:02 ` Konrad Rzeszutek Wilk 2019-01-11 9:12 ` Joerg Roedel 0 siblings, 1 reply; 27+ messages in thread From: Konrad Rzeszutek Wilk @ 2019-01-10 17:02 UTC (permalink / raw) To: Joerg Roedel Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel On Thu, Jan 10, 2019 at 02:44:31PM +0100, Joerg Roedel wrote: > From: Joerg Roedel <jroedel@suse.de> > > The SWIOTLB implementation has a maximum size it can > allocate dma-handles for. This needs to be exported so that > device drivers don't try to allocate larger chunks. > > This is especially important for block device drivers like > virtio-blk, that might do DMA through SWIOTLB. Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they need to limit the size of pages. > > Signed-off-by: Joerg Roedel <jroedel@suse.de> > --- > include/linux/swiotlb.h | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > index 7c007ed7505f..0bcc80a97036 100644 > --- a/include/linux/swiotlb.h > +++ b/include/linux/swiotlb.h > @@ -72,6 +72,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr) > return paddr >= io_tlb_start && paddr < io_tlb_end; > } > > +static inline size_t swiotlb_max_alloc_size(void) > +{ > + return ((1UL << IO_TLB_SHIFT) * IO_TLB_SEGSIZE); > +} > + > bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr, > size_t size, enum dma_data_direction dir, unsigned long attrs); > void __init swiotlb_exit(void); > @@ -95,6 +100,13 @@ static inline unsigned int swiotlb_max_segment(void) > { > return 0; > } > + > +static inline size_t swiotlb_max_alloc_size(void) > +{ > + /* There is no limit when SWIOTLB isn't used */ > + return ~0UL; > +} > + > #endif /* 
CONFIG_SWIOTLB */ > > extern void swiotlb_print_info(void); > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-10 17:02 ` Konrad Rzeszutek Wilk @ 2019-01-11 9:12 ` Joerg Roedel 2019-01-14 20:49 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 27+ messages in thread From: Joerg Roedel @ 2019-01-11 9:12 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote: > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they > need to limit the size of pages. That function just exports the overall size of the swiotlb aperture, no? What I need here is the maximum size for a single mapping. Regards, Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-11 9:12 ` Joerg Roedel @ 2019-01-14 20:49 ` Konrad Rzeszutek Wilk 2019-01-14 21:59 ` Michael S. Tsirkin 0 siblings, 1 reply; 27+ messages in thread From: Konrad Rzeszutek Wilk @ 2019-01-14 20:49 UTC (permalink / raw) To: Joerg Roedel Cc: Michael S . Tsirkin, Jason Wang, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote: > On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote: > > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they > > need to limit the size of pages. > > That function just exports the overall size of the swiotlb aperture, no? > What I need here is the maximum size for a single mapping. Yes. The other drivers just assumed that if there is SWIOTLB they would use the smaller size by default (as in they knew the limitation). But I agree it would be better to have something smarter - and also convert the DRM drivers to piggy back on this. Or alternatively we could make SWIOTLB handle bigger sizes.. > > Regards, > > Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-14 20:49 ` Konrad Rzeszutek Wilk @ 2019-01-14 21:59 ` Michael S. Tsirkin 2019-01-15 13:05 ` Christoph Hellwig 0 siblings, 1 reply; 27+ messages in thread From: Michael S. Tsirkin @ 2019-01-14 21:59 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Joerg Roedel, Jason Wang, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel On Mon, Jan 14, 2019 at 03:49:07PM -0500, Konrad Rzeszutek Wilk wrote: > On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote: > > On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote: > > > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they > > > need to limit the size of pages. > > > > That function just exports the overall size of the swiotlb aperture, no? > > What I need here is the maximum size for a single mapping. > > Yes. The other drivers just assumed that if there is SWIOTLB they would use > the smaller size by default (as in they knew the limitation). > > But I agree it would be better to have something smarter - and also convert the > DRM drivers to piggy back on this. > > Or alternatively we could make SWIOTLB handle bigger sizes.. Just a thought: is it a good idea to teach blk_queue_max_segment_size to get the dma size? This will help us find other devices possibly missing this check. > > > > Regards, > > > > Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] swiotlb: Export maximum allocation size 2019-01-14 21:59 ` Michael S. Tsirkin @ 2019-01-15 13:05 ` Christoph Hellwig 0 siblings, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2019-01-15 13:05 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Konrad Rzeszutek Wilk, Joerg Roedel, Jason Wang, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel On Mon, Jan 14, 2019 at 04:59:27PM -0500, Michael S. Tsirkin wrote: > On Mon, Jan 14, 2019 at 03:49:07PM -0500, Konrad Rzeszutek Wilk wrote: > > On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote: > > > On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote: > > > > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if they > > > > need to limit the size of pages. > > > > > > That function just exports the overall size of the swiotlb aperture, no? > > > What I need here is the maximum size for a single mapping. > > > > Yes. The other drivers just assumed that if there is SWIOTLB they would use > > the smaller size by default (as in they knew the limitation). > > > > But I agree it would be better to have something smarter - and also convert the > > DRM drivers to piggy back on this. > > > > Or alternatively we could make SWIOTLB handle bigger sizes.. > > > Just a thought: is it a good idea to teach blk_queue_max_segment_size > to get the dma size? This will help us find other devices > possibly missing this check. Yes, we should. Both the existing DMA size communicated through dma_params which is set by the driver, and this new DMA-ops exposed one which needs to be added. I'm working on some preliminary patches for the first part, as I think I introduced a bug related to that in the SCSI layer in 5.0.. ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/3] virtio: Introduce virtio_max_dma_size() 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel 2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel @ 2019-01-10 13:44 ` Joerg Roedel 2019-01-10 13:44 ` [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size Joerg Roedel ` (2 subsequent siblings) 4 siblings, 0 replies; 27+ messages in thread From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw) To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel From: Joerg Roedel <jroedel@suse.de> This function returns the maximum segment size for a single dma transaction of a virtio device. The possible limit comes from the SWIOTLB implementation in the Linux kernel, that has an upper limit of (currently) 256kb of contiguous memory it can map. The function returns the lower limit when SWIOTLB is used for DMA transactions of the device and -1U otherwise. 
Signed-off-by: Joerg Roedel <jroedel@suse.de> --- drivers/virtio/virtio_ring.c | 11 +++++++++++ include/linux/virtio.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index cd7e755484e3..c8d229da203e 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -266,6 +266,17 @@ static bool vring_use_dma_api(struct virtio_device *vdev) return false; } +size_t virtio_max_dma_size(struct virtio_device *vdev) +{ + const struct dma_map_ops *ops = get_dma_ops(&vdev->dev); + size_t max_segment_size = -1U; + + if (vring_use_dma_api(vdev) && dma_is_direct(ops)) + max_segment_size = swiotlb_max_alloc_size(); + + return max_segment_size; +} + static void *vring_alloc_queue(struct virtio_device *vdev, size_t size, dma_addr_t *dma_handle, gfp_t flag) { diff --git a/include/linux/virtio.h b/include/linux/virtio.h index fa1b5da2804e..673fe3ef3607 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -157,6 +157,8 @@ int virtio_device_freeze(struct virtio_device *dev); int virtio_device_restore(struct virtio_device *dev); #endif +size_t virtio_max_dma_size(struct virtio_device *vdev); + #define virtio_device_for_each_vq(vdev, vq) \ list_for_each_entry(vq, &vdev->vqs, list) -- 2.17.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel 2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel 2019-01-10 13:44 ` [PATCH 2/3] virtio: Introduce virtio_max_dma_size() Joerg Roedel @ 2019-01-10 13:44 ` Joerg Roedel 2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig 2019-01-11 3:29 ` Jason Wang 4 siblings, 0 replies; 27+ messages in thread From: Joerg Roedel @ 2019-01-10 13:44 UTC (permalink / raw) To: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch, Joerg Roedel From: Joerg Roedel <jroedel@suse.de> Segments can't be larger than the maximum DMA allocation size supported by virtio on the platform. Take that into account when setting the maximum segment size for a block device. Signed-off-by: Joerg Roedel <jroedel@suse.de> --- drivers/block/virtio_blk.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index b16a887bbd02..6062faa8d213 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -723,7 +723,7 @@ static int virtblk_probe(struct virtio_device *vdev) struct request_queue *q; int err, index; - u32 v, blk_size, sg_elems, opt_io_size; + u32 v, blk_size, seg_size, sg_elems, opt_io_size; u16 min_io_size; u8 physical_block_exp, alignment_offset; @@ -826,14 +826,16 @@ static int virtblk_probe(struct virtio_device *vdev) /* No real sector limit. */ blk_queue_max_hw_sectors(q, -1U); + seg_size = virtio_max_dma_size(vdev); + /* Host can optionally specify maximum segment size and number of * segments. 
*/ err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX, struct virtio_blk_config, size_max, &v); if (!err) - blk_queue_max_segment_size(q, v); - else - blk_queue_max_segment_size(q, -1U); + seg_size = min(v, seg_size); + + blk_queue_max_segment_size(q, seg_size); /* Host can optionally specify the block size of the device */ err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE, -- 2.17.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel ` (2 preceding siblings ...) 2019-01-10 13:44 ` [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size Joerg Roedel @ 2019-01-10 13:59 ` Christoph Hellwig 2019-01-10 14:26 ` Joerg Roedel 2019-01-11 3:29 ` Jason Wang 4 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2019-01-10 13:59 UTC (permalink / raw) To: Joerg Roedel Cc: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch On Thu, Jan 10, 2019 at 02:44:30PM +0100, Joerg Roedel wrote: > Hi, > > there is a problem with virtio-blk driven devices when > virtio-ring uses SWIOTLB through the DMA-API. This happens > for example in AMD-SEV enabled guests, where the guest RAM > is mostly encrypted and all emulated DMA has to happen > to/from the SWIOTLB aperture. > > The problem is a limitation of the SWIOTLB implementation, > which does not support allocations larger than 256kb. When > the virtio-blk driver tries to read/write a block larger > than that, the allocation of the dma-handle fails and an IO > error is reported. s/allocations/mappings/, right? We don't use the swiotlb buffer for the coherent allocations anymore, and I don't think virtio-blk uses them anyway. > This patch-set adds a check to the virtio-code whether it > might be using SWIOTLB bounce buffering and limits the > maximum segment size in the virtio-blk driver in this case, > so that it doesn't try to do larger reads/writes. I really don't like the fact that we hardcode swiotlb specific. This needs to be indirect through the dma ops or struct device, as various iommu implementations also have limits (although usually much larger ones). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig @ 2019-01-10 14:26 ` Joerg Roedel 0 siblings, 0 replies; 27+ messages in thread From: Joerg Roedel @ 2019-01-10 14:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S . Tsirkin, Jason Wang, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh Hi Christoph, On Thu, Jan 10, 2019 at 02:59:52PM +0100, Christoph Hellwig wrote: > On Thu, Jan 10, 2019 at 02:44:30PM +0100, Joerg Roedel wrote: > > The problem is a limitation of the SWIOTLB implementation, > > which does not support allocations larger than 256kb. When > > the virtio-blk driver tries to read/write a block larger > > than that, the allocation of the dma-handle fails and an IO > > error is reported. > > s/allocations/mappings/, right? We don't use the swiotlb > buffer for the coherent allocations anymore, and I don't think > virtio-blk uses them anyway. 'Allocation' in the sense that there are address ranges allocated, not memory, but mappings would also be right. > I really don't like the fact that we hardcode swiotlb specific. > This needs to be indirect through the dma ops or struct device, > as various iommu implementations also have limits (although > usually much larger ones). I thought about exposing it through the DMA-API, but didn't go that route as I didn't want to extend a generic API for some SWIOTLB specifics. But if there are more implementations that have a size limitation it certainly makes sense to put it into the DMA-API. I'll change that in the next version. Thanks, Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel ` (3 preceding siblings ...) 2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig @ 2019-01-11 3:29 ` Jason Wang 2019-01-11 9:15 ` Joerg Roedel 4 siblings, 1 reply; 27+ messages in thread From: Jason Wang @ 2019-01-11 3:29 UTC (permalink / raw) To: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk Cc: Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch On 2019/1/10 下午9:44, Joerg Roedel wrote: > Hi, > > there is a problem with virtio-blk driven devices when > virtio-ring uses SWIOTLB through the DMA-API. This happens > for example in AMD-SEV enabled guests, where the guest RAM > is mostly encrypted and all emulated DMA has to happen > to/from the SWIOTLB aperture. > > The problem is a limitation of the SWIOTLB implementation, > which does not support allocations larger than 256kb. When > the virtio-blk driver tries to read/write a block larger > than that, the allocation of the dma-handle fails and an IO > error is reported. > > This patch-set adds a check to the virtio-code whether it > might be using SWIOTLB bounce buffering and limits the > maximum segment size in the virtio-blk driver in this case, > so that it doesn't try to do larger reads/writes. Just wonder if my understanding is correct IOMMU_PLATFORM must be set for all virtio devices under AMD-SEV guests? Thanks > > Please review. 
> > Thanks, > > Joerg > > Joerg Roedel (3): > swiotlb: Export maximum allocation size > virtio: Introduce virtio_max_dma_size() > virtio-blk: Consider virtio_max_dma_size() for maximum segment size > > drivers/block/virtio_blk.c | 10 ++++++---- > drivers/virtio/virtio_ring.c | 11 +++++++++++ > include/linux/swiotlb.h | 12 ++++++++++++ > include/linux/virtio.h | 2 ++ > 4 files changed, 31 insertions(+), 4 deletions(-) > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-11 3:29 ` Jason Wang @ 2019-01-11 9:15 ` Joerg Roedel 2019-01-14 9:41 ` Jason Wang 0 siblings, 1 reply; 27+ messages in thread From: Joerg Roedel @ 2019-01-11 9:15 UTC (permalink / raw) To: Jason Wang Cc: Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: > Just wonder if my understanding is correct IOMMU_PLATFORM must be set for > all virtio devices under AMD-SEV guests? Yes, that is correct. Emulated DMA can only happen on the SWIOTLB aperture, because that memory is not encrypted. The guest bounces the data then to its encrypted memory. Regards, Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-11 9:15 ` Joerg Roedel @ 2019-01-14 9:41 ` Jason Wang 2019-01-14 9:50 ` Christoph Hellwig 0 siblings, 1 reply; 27+ messages in thread From: Jason Wang @ 2019-01-14 9:41 UTC (permalink / raw) To: Joerg Roedel Cc: Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch On 2019/1/11 下午5:15, Joerg Roedel wrote: > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: >> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for >> all virtio devices under AMD-SEV guests? > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB > aperture, because that memory is not encrypted. The guest bounces the > data then to its encrypted memory. > > Regards, > > Joerg Thanks, have you tested vhost-net in this case. I suspect it may not work ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 9:41 ` Jason Wang @ 2019-01-14 9:50 ` Christoph Hellwig 2019-01-14 12:41 ` Jason Wang 0 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2019-01-14 9:50 UTC (permalink / raw) To: Jason Wang Cc: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh, hch On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: > > On 2019/1/11 下午5:15, Joerg Roedel wrote: >> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: >>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for >>> all virtio devices under AMD-SEV guests? >> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB >> aperture, because that memory is not encrypted. The guest bounces the >> data then to its encrypted memory. >> >> Regards, >> >> Joerg > > > Thanks, have you tested vhost-net in this case. I suspect it may not work Which brings me back to my pet peeve that we need to take actions that virtio uses the proper dma mapping API by default with quirks for legacy cases. The magic bypass it uses is just causing problems over problems. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 9:50 ` Christoph Hellwig @ 2019-01-14 12:41 ` Jason Wang 2019-01-14 18:20 ` Michael S. Tsirkin 0 siblings, 1 reply; 27+ messages in thread From: Jason Wang @ 2019-01-14 12:41 UTC (permalink / raw) To: Christoph Hellwig Cc: Joerg Roedel, Michael S . Tsirkin, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On 2019/1/14 下午5:50, Christoph Hellwig wrote: > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: >> On 2019/1/11 下午5:15, Joerg Roedel wrote: >>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: >>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for >>>> all virtio devices under AMD-SEV guests? >>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB >>> aperture, because that memory is not encrypted. The guest bounces the >>> data then to its encrypted memory. >>> >>> Regards, >>> >>> Joerg >> >> Thanks, have you tested vhost-net in this case. I suspect it may not work > Which brings me back to my pet peeve that we need to take actions > that virtio uses the proper dma mapping API by default with quirks > for legacy cases. The magic bypass it uses is just causing problems > over problems. Yes, I fully agree with you. This is probably an exact example of such problem. Thanks ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 12:41 ` Jason Wang @ 2019-01-14 18:20 ` Michael S. Tsirkin 2019-01-14 19:09 ` Singh, Brijesh ` (3 more replies) 0 siblings, 4 replies; 27+ messages in thread From: Michael S. Tsirkin @ 2019-01-14 18:20 UTC (permalink / raw) To: Jason Wang Cc: Christoph Hellwig, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote: > > On 2019/1/14 下午5:50, Christoph Hellwig wrote: > > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: > > > On 2019/1/11 下午5:15, Joerg Roedel wrote: > > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: > > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be set for > > > > > all virtio devices under AMD-SEV guests? > > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB > > > > aperture, because that memory is not encrypted. The guest bounces the > > > > data then to its encrypted memory. > > > > > > > > Regards, > > > > > > > > Joerg > > > > > > Thanks, have you tested vhost-net in this case. I suspect it may not work > > Which brings me back to my pet peeve that we need to take actions > > that virtio uses the proper dma mapping API by default with quirks > > for legacy cases. The magic bypass it uses is just causing problems > > over problems. > > > Yes, I fully agree with you. This is probably an exact example of such > problem. > > Thanks I don't think so - the issue is really that DMA API does not yet handle the SEV case 100% correctly. I suspect passthrough devices would have the same issue. In fact whoever sets IOMMU_PLATFORM is completely unaffected by Christoph's pet peeve. Christoph is saying that !IOMMU_PLATFORM devices should hide the compatibility code in a special per-device DMA API implementation. 
Which would be fine especially if we can manage not to introduce a bunch of indirect calls all over the place and hurt performance. It's just that the benefit is unlikely to be big (e.g. we can't also get rid of the virtio specific memory barriers) so no one was motivated enough to work on it. -- MST ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 18:20 ` Michael S. Tsirkin @ 2019-01-14 19:09 ` Singh, Brijesh 2019-01-14 19:12 ` Robin Murphy ` (2 subsequent siblings) 3 siblings, 0 replies; 27+ messages in thread From: Singh, Brijesh @ 2019-01-14 19:09 UTC (permalink / raw) To: Michael S. Tsirkin, Jason Wang Cc: Singh, Brijesh, Christoph Hellwig, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, Grimm, Jon On 1/14/19 12:20 PM, Michael S. Tsirkin wrote: > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote: >> >> On 2019/1/14 下午5:50, Christoph Hellwig wrote: >>> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: >>>> On 2019/1/11 下午5:15, Joerg Roedel wrote: >>>>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: >>>>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for >>>>>> all virtio devices under AMD-SEV guests? >>>>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB >>>>> aperture, because that memory is not encrypted. The guest bounces the >>>>> data then to its encrypted memory. >>>>> >>>>> Regards, >>>>> >>>>> Joerg >>>> >>>> Thanks, have you tested vhost-net in this case. I suspect it may not work >>> Which brings me back to my pet pevee that we need to take actions >>> that virtio uses the proper dma mapping API by default with quirks >>> for legacy cases. The magic bypass it uses is just causing problems >>> over problems. >> >> >> Yes, I fully agree with you. This is probably an exact example of such >> problem. >> >> Thanks > > I don't think so - the issue is really that DMA API does not yet handle > the SEV case 100% correctly. I suspect passthrough devices would have > the same issue. > In case of SEV, emulated DMA is performed through the SWIOTLB (which bounces the encrypted buffers). The issue reported here will happen on any platform which is making use of SWIOTLB. 
We could easily reproduce the virtio-blk failure if we configure swiotlb=force in a non-SEV guest. Unfortunately in case of SEV the SWIOTLB is a must. As Joerg highlighted, the main issue is a limitation of the SWIOTLB: it does not support allocations/mappings larger than 256Kb. > In fact whoever sets IOMMU_PLATFORM is completely unaffected by > Christoph's pet peeve. > > Christoph is saying that !IOMMU_PLATFORM devices should hide the > compatibility code in a special per-device DMA API implementation. > Which would be fine especially if we can manage not to introduce a bunch > of indirect calls all over the place and hurt performance. It's just > that the benefit is unlikely to be big (e.g. we can't also get rid of > the virtio specific memory barriers) so no one was motivated enough to > work on it. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 18:20 ` Michael S. Tsirkin 2019-01-14 19:09 ` Singh, Brijesh @ 2019-01-14 19:12 ` Robin Murphy 2019-01-14 20:22 ` Christoph Hellwig 2019-01-14 20:29 ` Michael S. Tsirkin 2019-01-14 20:19 ` Christoph Hellwig 2019-01-15 8:37 ` Joerg Roedel 3 siblings, 2 replies; 27+ messages in thread From: Robin Murphy @ 2019-01-14 19:12 UTC (permalink / raw) To: Michael S. Tsirkin, Jason Wang Cc: Jens Axboe, brijesh.singh, Konrad Rzeszutek Wilk, jon.grimm, jfehlig, linux-kernel, virtualization, linux-block, iommu, Christoph Hellwig On 14/01/2019 18:20, Michael S. Tsirkin wrote: > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote: >> >> On 2019/1/14 下午5:50, Christoph Hellwig wrote: >>> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: >>>> On 2019/1/11 下午5:15, Joerg Roedel wrote: >>>>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: >>>>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for >>>>>> all virtio devices under AMD-SEV guests? >>>>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB >>>>> aperture, because that memory is not encrypted. The guest bounces the >>>>> data then to its encrypted memory. >>>>> >>>>> Regards, >>>>> >>>>> Joerg >>>> >>>> Thanks, have you tested vhost-net in this case. I suspect it may not work >>> Which brings me back to my pet peeve that we need to take actions >>> that virtio uses the proper dma mapping API by default with quirks >>> for legacy cases. The magic bypass it uses is just causing problems >>> over problems. >> >> >> Yes, I fully agree with you. This is probably an exact example of such >> problem. >> >> Thanks > > I don't think so - the issue is really that DMA API does not yet handle > the SEV case 100% correctly. I suspect passthrough devices would have > the same issue. Huh? 
Regardless of which virtio devices use it or not, the DMA API is handling the SEV case as correctly as it possibly can, by forcing everything through the unencrypted bounce buffer. If the segments being mapped are too big for that bounce buffer in the first place, there's nothing it can possibly do except fail, gracefully or otherwise. Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor SEV - any driver whose device has a sufficiently large DMA segment size and who manages to get sufficient physically-contiguous memory could technically generate a scatterlist segment longer than SWIOTLB can handle. However, in practice that basically never happens, not least because very few drivers ever override the default 64K DMA segment limit. AFAICS nothing in drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning any dma_parms to replace the defaults either, so the really interesting question here is how are these apparently-out-of-spec 256K segments getting generated at all? Robin. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 19:12 ` Robin Murphy @ 2019-01-14 20:22 ` Christoph Hellwig 2019-01-14 20:29 ` Michael S. Tsirkin 1 sibling, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2019-01-14 20:22 UTC (permalink / raw) To: Robin Murphy Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, brijesh.singh, Konrad Rzeszutek Wilk, jon.grimm, jfehlig, linux-kernel, virtualization, linux-block, iommu, Christoph Hellwig On Mon, Jan 14, 2019 at 07:12:08PM +0000, Robin Murphy wrote: > Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor > SEV - any driver whose device has a sufficiently large DMA segment size and > who manages to get sufficient physically-contiguous memory could > technically generate a scatterlist segment longer than SWIOTLB can handle. > However, in practice that basically never happens, not least because very > few drivers ever override the default 64K DMA segment limit. AFAICS nothing > in drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning > any dma_parms to replace the defaults either, so the really interesting > question here is how are these apparently-out-of-spec 256K segments getting > generated at all? drivers/block/virtio_blk.c:virtblk_probe(): /* Host can optionally specify maximum segment size and number of * segments. */ err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX, struct virtio_blk_config, size_max, &v); if (!err) blk_queue_max_segment_size(q, v); else blk_queue_max_segment_size(q, -1U); So it really is virtio_blk that is special here. The host could set VIRTIO_BLK_F_SIZE_MAX to paper over the problem, but I really think we need a dma_max_segment_size API that is wired up through the dma_map_ops to sort this out for real. ^ permalink raw reply [flat|nested] 27+ messages in thread
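[Editor's note: the query Christoph asks for here needs a per-backend hook. A hypothetical sketch of the shape such a dma_max_segment_size-style API could take, wired through dma_map_ops — names and the fallback value are illustrative, not the final kernel interface:]

```c
#include <stddef.h>
#include <stdint.h>

struct device;

/* Illustrative subset of dma_map_ops with the proposed query added. */
struct dma_map_ops {
	size_t (*max_mapping_size)(struct device *dev);
};

struct device {
	const struct dma_map_ops *dma_ops;
};

/* Backends with a bounce-buffer limit implement the hook;
 * everyone else falls back to "no limit". */
static size_t dma_max_mapping_size(struct device *dev)
{
	const struct dma_map_ops *ops = dev->dma_ops;

	if (ops && ops->max_mapping_size)
		return ops->max_mapping_size(dev);
	return SIZE_MAX;
}

/* SWIOTLB backend: 128 slabs of 2 kB each, as in this series. */
static size_t swiotlb_max_mapping_size(struct device *dev)
{
	(void)dev;
	return (1UL << 11) * 128;	/* 256 kB */
}
```

[With this shape, virtio-blk could simply clamp its advertised segment size to what the query returns, instead of hard-coding knowledge of SWIOTLB.]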
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 19:12 ` Robin Murphy 2019-01-14 20:22 ` Christoph Hellwig @ 2019-01-14 20:29 ` Michael S. Tsirkin 1 sibling, 0 replies; 27+ messages in thread From: Michael S. Tsirkin @ 2019-01-14 20:29 UTC (permalink / raw) To: Robin Murphy Cc: Jason Wang, Jens Axboe, brijesh.singh, Konrad Rzeszutek Wilk, jon.grimm, jfehlig, linux-kernel, virtualization, linux-block, iommu, Christoph Hellwig On Mon, Jan 14, 2019 at 07:12:08PM +0000, Robin Murphy wrote: > On 14/01/2019 18:20, Michael S. Tsirkin wrote: > > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote: > > > > > > On 2019/1/14 下午5:50, Christoph Hellwig wrote: > > > > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote: > > > > > On 2019/1/11 下午5:15, Joerg Roedel wrote: > > > > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote: > > > > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be set for > > > > > > > all virtio devices under AMD-SEV guests? > > > > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB > > > > > > aperture, because that memory is not encrypted. The guest bounces the > > > > > > data then to its encrypted memory. > > > > > > > > > > > > Regards, > > > > > > > > > > > > Joerg > > > > > > > > > > Thanks, have you tested vhost-net in this case. I suspect it may not work > > > > Which brings me back to my pet pevee that we need to take actions > > > > that virtio uses the proper dma mapping API by default with quirks > > > > for legacy cases. The magic bypass it uses is just causing problems > > > > over problems. > > > > > > > > > Yes, I fully agree with you. This is probably an exact example of such > > > problem. > > > > > > Thanks > > > > I don't think so - the issue is really that DMA API does not yet handle > > the SEV case 100% correctly. I suspect passthrough devices would have > > the same issue. > > Huh? 
Regardless of which virtio devices use it or not, the DMA API is > handling the SEV case as correctly as it possibly can, by forcing everything > through the unencrypted bounce buffer. If the segments being mapped are too > big for that bounce buffer in the first place, there's nothing it can > possibly do except fail, gracefully or otherwise. It seems reasonable to be able to ask it what its capabilities are though. > Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor > SEV - any driver whose device has a sufficiently large DMA segment size and > who manages to get sufficient physically-contiguous memory could technically > generate a scatterlist segment longer than SWIOTLB can handle. However, in > practice that basically never happens, not least because very few drivers > ever override the default 64K DMA segment limit. AFAICS nothing in > drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning any > dma_parms to replace the defaults either, so the really interesting question > here is how are these apparently-out-of-spec 256K segments getting generated > at all? > > Robin. I guess this is what you are looking for: /* Host can optionally specify maximum segment size and number of * segments. */ err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX, struct virtio_blk_config, size_max, &v); if (!err) blk_queue_max_segment_size(q, v); else blk_queue_max_segment_size(q, -1U); virtio isn't the only caller with a value >64K: $ git grep -A1 blk_queue_max_segment_size Documentation/block/biodoc.txt: blk_queue_max_segment_size(q, max_seg_size) Documentation/block/biodoc.txt- Maximum size of a clustered segment, 64kB default. 
-- block/blk-settings.c: * blk_queue_max_segment_size - set max segment size for blk_rq_map_sg block/blk-settings.c- * @q: the request queue for the device -- block/blk-settings.c:void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size) block/blk-settings.c-{ -- block/blk-settings.c:EXPORT_SYMBOL(blk_queue_max_segment_size); block/blk-settings.c- -- drivers/block/mtip32xx/mtip32xx.c: blk_queue_max_segment_size(dd->queue, 0x400000); drivers/block/mtip32xx/mtip32xx.c- blk_queue_io_min(dd->queue, 4096); -- drivers/block/nbd.c: blk_queue_max_segment_size(disk->queue, UINT_MAX); drivers/block/nbd.c- blk_queue_max_segments(disk->queue, USHRT_MAX); -- drivers/block/ps3disk.c: blk_queue_max_segment_size(queue, dev->bounce_size); drivers/block/ps3disk.c- -- drivers/block/ps3vram.c: blk_queue_max_segment_size(queue, BLK_MAX_SEGMENT_SIZE); drivers/block/ps3vram.c- blk_queue_max_hw_sectors(queue, BLK_SAFE_MAX_SECTORS); -- drivers/block/rbd.c: blk_queue_max_segment_size(q, UINT_MAX); drivers/block/rbd.c- blk_queue_io_min(q, objset_bytes); -- drivers/block/sunvdc.c: blk_queue_max_segment_size(q, PAGE_SIZE); drivers/block/sunvdc.c- -- drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, v); drivers/block/virtio_blk.c- else drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, -1U); drivers/block/virtio_blk.c- -- drivers/block/xen-blkfront.c: blk_queue_max_segment_size(rq, PAGE_SIZE); drivers/block/xen-blkfront.c- -- drivers/cdrom/gdrom.c: blk_queue_max_segment_size(gd.gdrom_rq, 0x40000); drivers/cdrom/gdrom.c- gd.disk->queue = gd.gdrom_rq; -- drivers/memstick/core/ms_block.c: blk_queue_max_segment_size(msb->queue, drivers/memstick/core/ms_block.c- MS_BLOCK_MAX_PAGES * msb->page_size); -- drivers/memstick/core/mspro_block.c: blk_queue_max_segment_size(msb->queue, drivers/memstick/core/mspro_block.c- MSPRO_BLOCK_MAX_PAGES * msb->page_size); -- drivers/mmc/core/queue.c: blk_queue_max_segment_size(mq->queue, host->max_seg_size); 
drivers/mmc/core/queue.c- -- drivers/s390/block/dasd.c: blk_queue_max_segment_size(q, PAGE_SIZE); drivers/s390/block/dasd.c- blk_queue_segment_boundary(q, PAGE_SIZE - 1); -- drivers/scsi/be2iscsi/be_main.c: blk_queue_max_segment_size(sdev->request_queue, 65536); drivers/scsi/be2iscsi/be_main.c- return 0; -- drivers/scsi/scsi_debug.c: blk_queue_max_segment_size(sdp->request_queue, -1U); drivers/scsi/scsi_debug.c- if (sdebug_no_uld) -- drivers/scsi/scsi_lib.c: blk_queue_max_segment_size(q, dma_get_max_seg_size(dev)); drivers/scsi/scsi_lib.c- -- drivers/scsi/ufs/ufshcd.c: blk_queue_max_segment_size(q, PRDT_DATA_BYTE_COUNT_MAX); drivers/scsi/ufs/ufshcd.c- -- include/linux/blkdev.h:extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); include/linux/blkdev.h-extern void blk_queue_max_discard_sectors(struct request_queue *q, -- include/linux/mmc/host.h: unsigned int max_seg_size; /* see blk_queue_max_segment_size */ include/linux/mmc/host.h- unsigned short max_segs; /* see blk_queue_max_segments */ Some of these devices are probably not going to work well if passed through to a SEV guest. Going back to virtio, at some level virtio is like a stacking device so it does not necessarily need a limit. -- MST ^ permalink raw reply [flat|nested] 27+ messages in thread
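[Editor's note: putting the two halves of the discussion together, the fix in this series amounts to clamping whatever the host advertises by what the DMA path can actually map. A sketch, under the assumption that virtio_max_dma_size() reports the SWIOTLB bound when bounce buffering is active and "unlimited" otherwise — the helper below is a stand-in, not the kernel function:]

```c
#include <stddef.h>
#include <stdint.h>

#define SWIOTLB_MAX_ALLOC	(128UL * 2048)	/* 256 kB, as discussed above */

/* Assumed stand-in for the helper added in patch 2: the real function asks
 * the DMA layer whether bounce buffering limits mapping sizes. */
static size_t virtio_max_dma_size(int using_swiotlb)
{
	return using_swiotlb ? SWIOTLB_MAX_ALLOC : SIZE_MAX;
}

/* Mirrors the virtblk_probe() logic quoted above, with the new clamp:
 * host-advertised VIRTIO_BLK_F_SIZE_MAX (0 = feature not offered)
 * is limited by what the DMA path can map. */
static size_t virtblk_max_segment_size(size_t host_size_max, int using_swiotlb)
{
	size_t max = virtio_max_dma_size(using_swiotlb);

	if (host_size_max && host_size_max < max)
		max = host_size_max;
	return max;
}
```

[So with SWIOTLB in use and no VIRTIO_BLK_F_SIZE_MAX offered, the driver ends up advertising 256 kB instead of -1U, and the oversized segments never reach the bounce buffer.]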
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 18:20 ` Michael S. Tsirkin 2019-01-14 19:09 ` Singh, Brijesh 2019-01-14 19:12 ` Robin Murphy @ 2019-01-14 20:19 ` Christoph Hellwig 2019-01-14 20:48 ` Michael S. Tsirkin 2019-01-15 8:37 ` Joerg Roedel 3 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2019-01-14 20:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Christoph Hellwig, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote: > I don't think so - the issue is really that DMA API does not yet handle > the SEV case 100% correctly. I suspect passthrough devices would have > the same issue. The DMA API handles the SEV case perfectly. Its just that virtio_blk supports huge segments that virtio does not generally support, but that is not related to SEV in any way. > In fact whoever sets IOMMU_PLATFORM is completely unaffected by > Christoph's pet peeve. No, the above happens only when we set IOMMU_PLATFORM. > Christoph is saying that !IOMMU_PLATFORM devices should hide the > compatibility code in a special per-device DMA API implementation. > Which would be fine especially if we can manage not to introduce a bunch > of indirect calls all over the place and hurt performance. It's just > that the benefit is unlikely to be big (e.g. we can't also get rid of > the virtio specific memory barriers) so no one was motivated enough to > work on it. No. The problem is that we still haven't fully specified what IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean. Your "ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo improves it a little bit, but it is still far from enough. As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM absolutely MUST be set for hardware implementations. 
Otherwise said hardware has no chance of working on anything but the most x86-like systems. Second software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM, because otherwise we can't add proper handling for things like SEV or the IBM "secure hypervisor" thing. Last but not least a lot of wording outside the area describing these flags really needs some major updates in terms of DMA access. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 20:19 ` Christoph Hellwig @ 2019-01-14 20:48 ` Michael S. Tsirkin 2019-01-15 13:09 ` Christoph Hellwig 0 siblings, 1 reply; 27+ messages in thread From: Michael S. Tsirkin @ 2019-01-14 20:48 UTC (permalink / raw) To: Christoph Hellwig Cc: Jason Wang, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Mon, Jan 14, 2019 at 09:19:35PM +0100, Christoph Hellwig wrote: > > Christoph is saying that !IOMMU_PLATFORM devices should hide the > > compatibility code in a special per-device DMA API implementation. > > Which would be fine especially if we can manage not to introduce a bunch > > of indirect calls all over the place and hurt performance. It's just > > that the benefit is unlikely to be big (e.g. we can't also get rid of > > the virtio specific memory barriers) so no one was motivated enough to > > work on it. > > No. Oh ok then. > The problem is that we still haven't fully specified what > IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean. Your > "ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo > improves it a little bit, but it is still far from enough. > > As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM > absolutely MUST be set for hardware implementations. Otherwise said > hardware has no chance of working on anything but the most x86-like > systems. I think you are exaggerating a little here. Limited access with e.g. need to flush caches is common on lower end but less common on higher end systems used for virtualization. As you point out that changes with e.g. SEV but then SEV is a pretty niche thing so far. So I would make that a SHOULD probably. > Second software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM, > because otherwise we can't add proper handling for things like SEV or > the IBM "secure hypervisor" thing. Yes. 
If host does not have access to all of memory it SHOULD set VIRTIO_F_ACCESS_PLATFORM. It seems rather straight-forward to add conformance clauses for this, thanks for reminding me. > Last but not least a lot of wording outside the area describing these > flags really needs some major updates in terms of DMA access. Thanks for the reminder. Yes generally spec tends to still say e.g. "physical address" where it really means a "DMA address" in Linux terms. Whether changing that will make things much clearer I'm not sure. E.g. hardware designers don't much care I guess. If you want to list the most problematic cases, e.g. in a github issue, we might get to clarifying them sooner. Thanks! -- MST ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 20:48 ` Michael S. Tsirkin @ 2019-01-15 13:09 ` Christoph Hellwig 0 siblings, 0 replies; 27+ messages in thread From: Christoph Hellwig @ 2019-01-15 13:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Jason Wang, Joerg Roedel, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Mon, Jan 14, 2019 at 03:48:47PM -0500, Michael S. Tsirkin wrote: > I think you are exaggerating a little here. Limited access with e.g. > need to flush caches is common on lower end but less common on higher > end systems used for virtualization. As you point out that changes with > e.g. SEV but then SEV is a pretty niche thing so far. > > So I would make that a SHOULD probably. The problem is that without using DMA ops you can't just plug the device into a random system, which is a total no-go for peripherals. And cache flushing is not that uncommon, e.g. every sparc system needs it, many arm/arm64 ones, some power ones, lots of mips ones, etc. But cache flushing is just one side of it - lots of systems have either mandatory or optional IOMMUs, and without the platform access flag that doesn't work. As does the case where you have a DMA offset, which is extremely common on arm and power systems. So you might find a few non-x86 systems you can plug it in and it'll just work, but it won't be all that many. And even on x86 your device will be completely broken if the user dares to use an IOMMU. > > flags really needs some major updates in terms of DMA access. > > Thanks for the reminder. Yes generally spec tends to still say e.g. > "physical address" where it really means a "DMA address" in Linux terms. > Whether changing that will make things much clearer I'm not sure. E.g. > hardware designers don't much care I guess. At least say bus address - as said above the CPU and bus concepts of physical addresses are different in many systems. 
Including x86 when you use IOMMUs. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-14 18:20 ` Michael S. Tsirkin ` (2 preceding siblings ...) 2019-01-14 20:19 ` Christoph Hellwig @ 2019-01-15 8:37 ` Joerg Roedel 2019-01-15 13:20 ` Christoph Hellwig 3 siblings, 1 reply; 27+ messages in thread From: Joerg Roedel @ 2019-01-15 8:37 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Christoph Hellwig, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote: > Which would be fine especially if we can manage not to introduce a bunch > of indirect calls all over the place and hurt performance. Which indirect calls? In case of unset dma_ops the DMA-API functions call directly into the dma-direct implementation, no indirect calls at all. Regards, Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
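[Editor's note: Joerg's point can be seen in the shape of the 5.0-era dispatch. With dma_ops unset, the wrapper branches straight into dma-direct, so no indirect call is taken on the fast path. A simplified sketch — the identity mapping stands in for the real dma-direct code, which also handles offsets and bounce buffering:]

```c
#include <stddef.h>

typedef unsigned long long dma_addr_t;

struct device;
struct dma_map_ops {
	dma_addr_t (*map_page)(struct device *dev, unsigned long paddr,
			       size_t size);
};
struct device {
	const struct dma_map_ops *dma_ops;
};

/* Stand-in for the dma-direct implementation. */
static dma_addr_t dma_direct_map_page(struct device *dev, unsigned long paddr,
				      size_t size)
{
	(void)dev; (void)size;
	return paddr;
}

static dma_addr_t dma_map_page_sketch(struct device *dev, unsigned long paddr,
				      size_t size)
{
	const struct dma_map_ops *ops = dev->dma_ops;

	if (!ops)	/* direct case: a plain, inlinable function call */
		return dma_direct_map_page(dev, paddr, size);
	return ops->map_page(dev, paddr, size);	/* IOMMU etc.: indirect call */
}
```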
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-15 8:37 ` Joerg Roedel @ 2019-01-15 13:20 ` Christoph Hellwig 2019-01-16 14:16 ` Michael S. Tsirkin 0 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2019-01-15 13:20 UTC (permalink / raw) To: Joerg Roedel Cc: Michael S. Tsirkin, Jason Wang, Christoph Hellwig, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Tue, Jan 15, 2019 at 09:37:42AM +0100, Joerg Roedel wrote: > On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote: > > Which would be fine especially if we can manage not to introduce a bunch > > of indirect calls all over the place and hurt performance. > > Which indirect calls? In case of unset dma_ops the DMA-API functions > call directly into the dma-direct implementation, no indirect calls at > all. True. But the NULL-ops dma direct case still has two issues that might not work for virtio: (a) if the architecture supports devices that are not DMA coherent it will dip into a special allocator and do cache maintenance for streaming mappings. Although it would be a bit of a hack we could work around this in virtio doing something like: #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) dev->dma_coherent = true; #endif except that won't work for mips, which has a mode where it uses a system-level instead of a device-level coherency flag and thus doesn't look at this struct device field (b) we can still mangle the DMA address, either using the dma_pfn_offset field in struct device, or by a full override of __phys_to_dma / __dma_to_phys by the architecture code. The first could be solved using a hack like the one above, but the latter would be a little harder. 
In the long run I'd love to get rid of that hook and have everyone use the generic offset code, but for that we first need to support multiple ranges with different offset and do quite some nasty arch code surgery. So for the legacy virtio case I fear we need to keep a local dma mapping implementation for now. I just wish no recent hypervisor would ever offer devices in this broken legacy mode.. > > Regards, > > Joerg ---end quoted text--- ^ permalink raw reply [flat|nested] 27+ messages in thread
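[Editor's note: point (b) above — the address mangling — can be illustrated with the kernel's generic dma_pfn_offset scheme, where the bus address a device must use is the CPU physical address minus a per-device page-frame offset. The offset value used in the example is made up; real platforms set it from firmware or devicetree:]

```c
typedef unsigned long long dma_addr_t;
typedef unsigned long long phys_addr_t;

#define PAGE_SHIFT 12

/* Sketch of the generic offset translation, as in the default
 * __phys_to_dma(): subtract the per-device PFN offset, scaled to bytes. */
static dma_addr_t phys_to_dma_sketch(unsigned long dma_pfn_offset,
				     phys_addr_t paddr)
{
	return paddr - ((dma_addr_t)dma_pfn_offset << PAGE_SHIFT);
}
```

[For example, on a hypothetical system where RAM at CPU physical 0x80000000 appears to devices at bus address 0x0, dma_pfn_offset would be 0x80000 pages. A full architecture override of __phys_to_dma, by contrast, can implement arbitrary translations, which is why Christoph calls that case harder.]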
* Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB 2019-01-15 13:20 ` Christoph Hellwig @ 2019-01-16 14:16 ` Michael S. Tsirkin 0 siblings, 0 replies; 27+ messages in thread From: Michael S. Tsirkin @ 2019-01-16 14:16 UTC (permalink / raw) To: Christoph Hellwig Cc: Joerg Roedel, Jason Wang, Konrad Rzeszutek Wilk, Jens Axboe, virtualization, linux-block, linux-kernel, iommu, jfehlig, jon.grimm, brijesh.singh On Tue, Jan 15, 2019 at 02:20:19PM +0100, Christoph Hellwig wrote: > On Tue, Jan 15, 2019 at 09:37:42AM +0100, Joerg Roedel wrote: > > On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote: > > > Which would be fine especially if we can manage not to introduce a bunch > > > of indirect calls all over the place and hurt performance. > > > > Which indirect calls? In case of unset dma_ops the DMA-API functions > > call directly into the dma-direct implementation, no indirect calls at > > all. > > True. But the NULL-ops dma direct case still has two issues that might > not work for virtio: > > (a) if the architecture supports devices that are not DMA coherent it > will dip into a special allocator and do cache maintainance for > streaming mappings. Although it would be a bit of a hack we could > work around this in virtio doings something like: > > #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ > defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ > defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) > dev->dma_coherent = true; > #endif > > except that won't work for mips, which has a mode where it does > a system instead of device level coherency flag and thus doesn't > look at this struct device field > > (b) we can still mangle the DMA address, either using the > dma_pfn_offset field in struct device, or by a full override > of __phys_to_dma / __dma_to_phys by the architecture code. > The first could be solved using a hack like the one above, > but the latter would be a little harder. 
In the long run > I'd love to get rid of that hook and have everyone use the > generic offset code, but for that we first need to support > multiple ranges with different offset and do quite some > nasty arch code surgery. > > So for the legacy virtio case I fear we need to keep a local dma mapping > implementation for now. I just wish no recent hypervisor would ever > offer devices in this broken legacy mode.. IIUC some emulated hardware has no reasonable way to specify that some devices bypass the IOMMU, and no cheap way to cover all memory with an IOMMU mapping. So I suspect !ACCESS_PLATFORM will stay a supported configuration forever :( > > > > Regards, > > > > Joerg > ---end quoted text--- ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2019-01-16 14:16 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-10 13:44 [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Joerg Roedel 2019-01-10 13:44 ` [PATCH 1/3] swiotlb: Export maximum allocation size Joerg Roedel 2019-01-10 17:02 ` Konrad Rzeszutek Wilk 2019-01-11 9:12 ` Joerg Roedel 2019-01-14 20:49 ` Konrad Rzeszutek Wilk 2019-01-14 21:59 ` Michael S. Tsirkin 2019-01-15 13:05 ` Christoph Hellwig 2019-01-10 13:44 ` [PATCH 2/3] virtio: Introduce virtio_max_dma_size() Joerg Roedel 2019-01-10 13:44 ` [PATCH 3/3] virtio-blk: Consider virtio_max_dma_size() for maximum segment size Joerg Roedel 2019-01-10 13:59 ` [PATCH 0/3] Fix virtio-blk issue with SWIOTLB Christoph Hellwig 2019-01-10 14:26 ` Joerg Roedel 2019-01-11 3:29 ` Jason Wang 2019-01-11 9:15 ` Joerg Roedel 2019-01-14 9:41 ` Jason Wang 2019-01-14 9:50 ` Christoph Hellwig 2019-01-14 12:41 ` Jason Wang 2019-01-14 18:20 ` Michael S. Tsirkin 2019-01-14 19:09 ` Singh, Brijesh 2019-01-14 19:12 ` Robin Murphy 2019-01-14 20:22 ` Christoph Hellwig 2019-01-14 20:29 ` Michael S. Tsirkin 2019-01-14 20:19 ` Christoph Hellwig 2019-01-14 20:48 ` Michael S. Tsirkin 2019-01-15 13:09 ` Christoph Hellwig 2019-01-15 8:37 ` Joerg Roedel 2019-01-15 13:20 ` Christoph Hellwig 2019-01-16 14:16 ` Michael S. Tsirkin