Linux-Block Archive on lore.kernel.org
* [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures
@ 2020-02-13 12:37 Halil Pasic
  2020-02-13 12:37 ` [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error Halil Pasic
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Halil Pasic @ 2020-02-13 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Halil Pasic, Paolo Bonzini, Stefan Hajnoczi, Jens Axboe,
	virtualization, linux-block, linux-kernel, linux-s390,
	Christian Borntraeger, Viktor Mihajlovski, Cornelia Huck,
	Ram Pai, Thiago Jung Bauermann, Lendacky, Thomas

These two patches handle new edge cases introduced by doing DMA mappings
(which can fail) in the virtio core.

I stumbled upon this while stress testing I/O for Protected Virtual
Machines. I deliberately chose a tiny swiotlb size and generated
load with fio. With more than one virtio-blk disk in use I experienced
hangs.

The goal of this series is to fix those hangs.

Halil Pasic (2):
  virtio-blk: fix hw_queue stopped on arbitrary error
  virtio-blk: improve virtqueue error to BLK_STS

 drivers/block/virtio_blk.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)


base-commit: 39bed42de2e7d74686a2d5a45638d6a5d7e7d473
-- 
2.17.1



* [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-13 12:37 [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures Halil Pasic
@ 2020-02-13 12:37 ` Halil Pasic
  2020-02-14 18:20   ` dongli.zhang
  2020-02-18  2:21   ` Ming Lei
  2020-02-13 12:37 ` [PATCH 2/2] virtio-blk: improve virtqueue error to BLK_STS Halil Pasic
  2020-02-19 15:11 ` [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures Stefan Hajnoczi
  2 siblings, 2 replies; 10+ messages in thread
From: Halil Pasic @ 2020-02-13 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Halil Pasic, Jens Axboe, Paolo Bonzini, Stefan Hajnoczi,
	virtualization, linux-block, linux-kernel, linux-s390,
	Christian Borntraeger, Viktor Mihajlovski, Cornelia Huck,
	Ram Pai, Thiago Jung Bauermann, Lendacky, Thomas

Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
necessarily sufficient to ensure that the queue will get started again.
In case of a global resource outage (-ENOMEM because of a mapping
failure caused by a full swiotlb) our virtqueue may be empty and we can
get stuck with a stopped hw_queue.

Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() when it makes sense to carry on
submitting requests. Let us also remove a stale comment.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails")
---

I'm in doubt with regards to the Fixes tag. The thing is, virtio-blk
probably made an assumption about virtqueue_add: a failure is either
because the virtqueue is full, or the failure is fatal. In both cases it
seems acceptable to stop the queue, although the fatal case is arguable.
Since commit f7728002c1c7, a failed DMA mapping returns -ENOMEM
and not -EIO, and thus we have a recoverable failure that isn't
virtqueue full. So I lean towards a Fixes tag that references that
commit, although the commit itself isn't broken. Alternatively one could
say 'Fixes: e467cde23818 ("Block driver using virtio.")', if the
aforementioned assumption shouldn't have been made in the first place.
---
 drivers/block/virtio_blk.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 54158766334b..adfe43f5ffe4 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -245,10 +245,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	err = virtblk_add_req(vblk->vqs[qid].vq, vbr, vbr->sg, num);
 	if (err) {
 		virtqueue_kick(vblk->vqs[qid].vq);
-		blk_mq_stop_hw_queue(hctx);
+		/* Don't stop the queue if -ENOMEM: we may have failed to
+		 * bounce the buffer due to global resource outage.
+		 */
+		if (err == -ENOSPC)
+			blk_mq_stop_hw_queue(hctx);
 		spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
-		/* Out of mem doesn't actually happen, since we fall back
-		 * to direct descriptors */
 		if (err == -ENOMEM || err == -ENOSPC)
 			return BLK_STS_DEV_RESOURCE;
 		return BLK_STS_IOERR;
-- 
2.17.1
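
The hang described in the commit message can be sketched with a small
userspace model. This is only an illustration under simplifying
assumptions: struct model_queue and the model_*() helpers are hypothetical
names, not kernel APIs, and the model ignores locking and kicks entirely.
It captures just one point: the restart happens only from the completion
path, so a queue stopped while nothing is in flight stays stopped.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_queue {
	bool stopped;   /* stands in for a stopped blk-mq hw_queue */
	int in_flight;  /* requests currently sitting in the virtqueue */
};

/*
 * Models the error path of virtio_queue_rq().
 * patched == false: stop the queue on any error (old behaviour).
 * patched == true:  stop the queue only on -ENOSPC (behaviour after the patch).
 */
static void model_submit_error(struct model_queue *q, int err, bool patched)
{
	if (!patched || err == -ENOSPC)
		q->stopped = true;
}

/*
 * Models virtblk_done(): it only runs when a request completes, and it is
 * the only place in this model where the queue gets restarted.
 * Returns true if a completion was processed.
 */
static bool model_completion(struct model_queue *q)
{
	if (q->in_flight == 0)
		return false;   /* empty virtqueue: no completion will ever arrive */
	q->in_flight--;
	q->stopped = false;     /* stands in for blk_mq_start_stopped_hw_queues() */
	return true;
}

/* A queue is stuck when it is stopped and nothing is left to restart it. */
static bool model_is_stuck(const struct model_queue *q)
{
	return q->stopped && q->in_flight == 0;
}
```

With an empty virtqueue, model_submit_error(&q, -ENOMEM, false) leaves the
model stuck, while the patched variant does not stop the queue at all. A
full virtqueue (-ENOSPC) always has requests in flight, so a later
completion restarts it.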



* [PATCH 2/2] virtio-blk: improve virtqueue error to BLK_STS
  2020-02-13 12:37 [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures Halil Pasic
  2020-02-13 12:37 ` [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error Halil Pasic
@ 2020-02-13 12:37 ` Halil Pasic
  2020-02-19 15:11 ` [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures Stefan Hajnoczi
  2 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2020-02-13 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Halil Pasic, Paolo Bonzini, Stefan Hajnoczi, Jens Axboe,
	virtualization, linux-block, linux-kernel, linux-s390,
	Christian Borntraeger, Viktor Mihajlovski, Cornelia Huck,
	Ram Pai, Thiago Jung Bauermann, Lendacky, Thomas

Let's change the mapping from virtqueue_add errors to BLK_STS
statuses, so that -ENOSPC, which indicates a full virtqueue, is still
mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM, which indicates a
non-device-specific resource outage, is mapped to BLK_STS_RESOURCE.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---
See comment about BLK_STS_DEV_RESOURCE in include/linux/blk_types.h
---
 drivers/block/virtio_blk.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index adfe43f5ffe4..0736248999b0 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -251,9 +251,14 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 		if (err == -ENOSPC)
 			blk_mq_stop_hw_queue(hctx);
 		spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
-		if (err == -ENOMEM || err == -ENOSPC)
+		switch (err) {
+		case -ENOSPC:
 			return BLK_STS_DEV_RESOURCE;
-		return BLK_STS_IOERR;
+		case -ENOMEM:
+			return BLK_STS_RESOURCE;
+		default:
+			return BLK_STS_IOERR;
+		}
 	}
 
 	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
-- 
2.17.1
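
The resulting mapping can be exercised outside the kernel with a small
stand-alone sketch. The enum and function below are hypothetical stand-ins
(the kernel uses blk_status_t and the BLK_STS_* constants, whose numeric
values are not reproduced here); only the shape of the switch mirrors the
diff above.

```c
#include <errno.h>

/*
 * Stand-ins for the kernel's BLK_STS_* statuses. The numeric values are
 * arbitrary and chosen only for this illustration.
 */
enum model_blk_status {
	MODEL_STS_OK,
	MODEL_STS_RESOURCE,      /* generic resource shortage, not tied to the device */
	MODEL_STS_DEV_RESOURCE,  /* device-specific resource shortage */
	MODEL_STS_IOERR,         /* everything else */
};

/* Mirrors the switch added to the error path of virtio_queue_rq(). */
static enum model_blk_status model_err_to_status(int err)
{
	switch (err) {
	case 0:
		return MODEL_STS_OK;
	case -ENOSPC:
		/* virtqueue full: a completion on this device will free a slot */
		return MODEL_STS_DEV_RESOURCE;
	case -ENOMEM:
		/* e.g. swiotlb exhausted: the shortage is global, not per device */
		return MODEL_STS_RESOURCE;
	default:
		return MODEL_STS_IOERR;
	}
}
```

The case for 0 is an addition for the sketch only; in the driver the switch
is reached solely on the error path.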



* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-13 12:37 ` [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error Halil Pasic
@ 2020-02-14 18:20   ` dongli.zhang
  2020-02-17 13:08     ` Halil Pasic
  2020-02-18  2:21   ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: dongli.zhang @ 2020-02-14 18:20 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin, Jason Wang
  Cc: Jens Axboe, linux-block, linux-s390, Cornelia Huck, Ram Pai,
	linux-kernel, virtualization, Christian Borntraeger,
	Stefan Hajnoczi, Paolo Bonzini, Lendacky, Thomas,
	Viktor Mihajlovski

Hi Halil,

When swiotlb full is hit for virtio_blk, the warning below is printed once (the
warning is not caused by this patch set). Is this expected, or just a false positive?

[   54.767257] virtio-pci 0000:00:04.0: swiotlb buffer is full (sz: 16 bytes), total 32768 (slots), used 258 (slots)
[   54.767260] virtio-pci 0000:00:04.0: overflow 0x0000000075770110+16 of DMA mask ffffffffffffffff bus limit 0
[   54.769192] ------------[ cut here ]------------
[   54.769200] WARNING: CPU: 3 PID: 102 at kernel/dma/direct.c:35 report_addr+0x71/0x77
[   54.769200] Modules linked in:
[   54.769203] CPU: 3 PID: 102 Comm: kworker/u8:2 Not tainted 5.6.0-rc1+ #2
[   54.769204] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   54.769208] Workqueue: writeback wb_workfn (flush-253:0)
[   54.769211] RIP: 0010:report_addr+0x71/0x77
... ...
[   54.769226] Call Trace:
[   54.769241]  dma_direct_map_page+0xc9/0xe0
[   54.769245]  virtqueue_add+0x172/0xaa0
[   54.769248]  virtqueue_add_sgs+0x85/0xa0
[   54.769251]  virtio_queue_rq+0x292/0x480
[   54.769255]  __blk_mq_try_issue_directly+0x13e/0x1f0
[   54.769257]  blk_mq_request_issue_directly+0x48/0xa0
[   54.769259]  blk_mq_try_issue_list_directly+0x3c/0xb0
[   54.769261]  blk_mq_sched_insert_requests+0xb6/0x100
[   54.769263]  blk_mq_flush_plug_list+0x146/0x210
[   54.769264]  blk_flush_plug_list+0xba/0xe0
[   54.769266]  blk_mq_make_request+0x331/0x5b0
[   54.769268]  generic_make_request+0x10d/0x2e0
[   54.769270]  submit_bio+0x69/0x130
[   54.769273]  ext4_io_submit+0x44/0x50
[   54.769275]  ext4_writepages+0x56f/0xd30
[   54.769278]  ? cpumask_next_and+0x19/0x20
[   54.769280]  ? find_busiest_group+0x11a/0xa40
[   54.769283]  do_writepages+0x15/0x70
[   54.769288]  __writeback_single_inode+0x38/0x330
[   54.769290]  writeback_sb_inodes+0x219/0x4c0
[   54.769292]  __writeback_inodes_wb+0x82/0xb0
[   54.769293]  wb_writeback+0x267/0x300
[   54.769295]  wb_workfn+0x1aa/0x430
[   54.769298]  process_one_work+0x156/0x360
[   54.769299]  worker_thread+0x41/0x3b0
[   54.769300]  kthread+0xf3/0x130
[   54.769302]  ? process_one_work+0x360/0x360
[   54.769303]  ? kthread_bind+0x10/0x10
[   54.769305]  ret_from_fork+0x35/0x40
[   54.769307] ---[ end trace 923a87a9ce0e777a ]---

Thank you very much!

Dongli Zhang


* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-14 18:20   ` dongli.zhang
@ 2020-02-17 13:08     ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2020-02-17 13:08 UTC (permalink / raw)
  To: dongli.zhang
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, linux-block,
	linux-s390, Cornelia Huck, Ram Pai, linux-kernel, virtualization,
	Christian Borntraeger, Stefan Hajnoczi, Paolo Bonzini, Lendacky,
	Thomas, Viktor Mihajlovski

On Fri, 14 Feb 2020 10:20:44 -0800
dongli.zhang@oracle.com wrote:

> Hi Halil,
> 
> When swiotlb full is hit for virtio_blk, the warning below is printed once (the
> warning is not caused by this patch set). Is this expected, or just a false positive?

The warning is kind of expected. Certainly not a false positive, but it
probably looks more dramatic than I would like it to look.

If the swiotlb command-line parameter can be chosen so that the swiotlb
won't run out of space, that is certainly preferable. But running out of
swiotlb space should merely result in performance degradation (provided
the device drivers properly handle the condition).

Thanks for having a look! 

Regards,
Halil
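
For reference when sizing the swiotlb, the "total 32768 (slots)" figure in
the reported warning can be converted to bytes. The sketch below assumes a
slot size of 2 KiB (a shift of 11, which I believe matches IO_TLB_SHIFT in
kernels of this era; please check against your tree). The macro and
function names are made up for the illustration.

```c
#include <stdint.h>

/* Assumed slot size: 1 << 11 = 2048 bytes. Verify against your kernel. */
#define MODEL_IO_TLB_SHIFT 11

/* Convert a number of swiotlb slots to bytes under the assumption above. */
static uint64_t model_swiotlb_bytes(uint64_t slots)
{
	return slots << MODEL_IO_TLB_SHIFT;
}
```

Under that assumption, 32768 slots correspond to 64 MiB, so a deliberately
tiny swiotlb for stress testing would be a much smaller slot count.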


* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-13 12:37 ` [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error Halil Pasic
  2020-02-14 18:20   ` dongli.zhang
@ 2020-02-18  2:21   ` Ming Lei
  2020-02-18 12:35     ` Halil Pasic
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2020-02-18  2:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, linux-block,
	linux-s390, Cornelia Huck, Ram Pai, Linux Kernel Mailing List,
	Linux Virtualization, Christian Borntraeger, Stefan Hajnoczi,
	Paolo Bonzini, Lendacky, Thomas, Viktor Mihajlovski

On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> Since nobody else is going to restart our hw_queue for us, the
> blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
> necessarily sufficient to ensure that the queue will get started again.
> In case of a global resource outage (-ENOMEM because of a mapping
> failure caused by a full swiotlb) our virtqueue may be empty and we can
> get stuck with a stopped hw_queue.
>
> Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
> indicates a full virtqueue, where the hw_queue is guaranteed to get
> started by virtblk_done() when it makes sense to carry on
> submitting requests. Let us also remove a stale comment.

The generic solution may be to stop the queue only when there is an
in-flight request that has not completed.

Checking -ENOMEM may not be enough, given that -EIO can be returned from
virtqueue_add() too in case of a DMA mapping failure.

Thanks,


* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-18  2:21   ` Ming Lei
@ 2020-02-18 12:35     ` Halil Pasic
  2020-02-19  1:46       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Halil Pasic @ 2020-02-18 12:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, linux-block,
	linux-s390, Cornelia Huck, Ram Pai, Linux Kernel Mailing List,
	Linux Virtualization, Christian Borntraeger, Stefan Hajnoczi,
	Paolo Bonzini, Lendacky, Thomas, Viktor Mihajlovski

On Tue, 18 Feb 2020 10:21:18 +0800
Ming Lei <tom.leiming@gmail.com> wrote:

> On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic@linux.ibm.com> wrote:
> >
> > Since nobody else is going to restart our hw_queue for us, the
> > blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
> > necessarily sufficient to ensure that the queue will get started again.
> > In case of a global resource outage (-ENOMEM because of a mapping
> > failure caused by a full swiotlb) our virtqueue may be empty and we can
> > get stuck with a stopped hw_queue.
> >
> > Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
> > indicates a full virtqueue, where the hw_queue is guaranteed to get
> > started by virtblk_done() when it makes sense to carry on
> > submitting requests. Let us also remove a stale comment.
> 
> The generic solution may be to stop the queue only when there is an
> in-flight request that has not completed.
> 

I think this is pretty close to that. The queue is stopped only on
-ENOSPC, which means the virtqueue is full.

> Checking -ENOMEM may not be enough, given that -EIO can be returned from
> virtqueue_add() too in case of a DMA mapping failure.

I'm not checking for -ENOMEM; the check is for -ENOSPC. So the queue
would not be stopped on -EIO.
Maybe I'm misunderstanding something. In any case, please have another
look at the diff, and if your concerns persist please help me understand.

Thanks for having a look!

Regards,
Halil

> 
> Thanks,



* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-18 12:35     ` Halil Pasic
@ 2020-02-19  1:46       ` Ming Lei
  2020-02-19 15:42         ` Halil Pasic
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2020-02-19  1:46 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, linux-block,
	linux-s390, Cornelia Huck, Ram Pai, Linux Kernel Mailing List,
	Linux Virtualization, Christian Borntraeger, Stefan Hajnoczi,
	Paolo Bonzini, Lendacky, Thomas, Viktor Mihajlovski

On Tue, Feb 18, 2020 at 8:35 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> On Tue, 18 Feb 2020 10:21:18 +0800
> Ming Lei <tom.leiming@gmail.com> wrote:
>
> > On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic@linux.ibm.com> wrote:
> > >
> > > Since nobody else is going to restart our hw_queue for us, the
> > > blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
> > > necessarily sufficient to ensure that the queue will get started again.
> > > In case of a global resource outage (-ENOMEM because of a mapping
> > > failure caused by a full swiotlb) our virtqueue may be empty and we can
> > > get stuck with a stopped hw_queue.
> > >
> > > Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
> > > indicates a full virtqueue, where the hw_queue is guaranteed to get
> > > started by virtblk_done() when it makes sense to carry on
> > > submitting requests. Let us also remove a stale comment.
> >
> > The generic solution may be to stop the queue only when there is an
> > in-flight request that has not completed.
> >
>
> I think this is pretty close to that. The queue is stopped only on
> -ENOSPC, which means the virtqueue is full.
>
> > Checking -ENOMEM may not be enough, given that -EIO can be returned from
> > virtqueue_add() too in case of a DMA mapping failure.
>
> I'm not checking for -ENOMEM; the check is for -ENOSPC. So the queue
> would not be stopped on -EIO.
> Maybe I'm misunderstanding something. In any case, please have another
> look at the diff, and if your concerns persist please help me understand.

Looks like I misread the patch, and this patch is fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming Lei


* Re: [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures
  2020-02-13 12:37 [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures Halil Pasic
  2020-02-13 12:37 ` [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error Halil Pasic
  2020-02-13 12:37 ` [PATCH 2/2] virtio-blk: improve virtqueue error to BLK_STS Halil Pasic
@ 2020-02-19 15:11 ` Stefan Hajnoczi
  2 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2020-02-19 15:11 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Jason Wang, Paolo Bonzini, Jens Axboe,
	virtualization, linux-block, linux-kernel, linux-s390,
	Christian Borntraeger, Viktor Mihajlovski, Cornelia Huck,
	Ram Pai, Thiago Jung Bauermann, Lendacky, Thomas


On Thu, Feb 13, 2020 at 01:37:26PM +0100, Halil Pasic wrote:
> These two patches handle new edge cases introduced by doing DMA mappings
> (which can fail) in the virtio core.
> 
> I stumbled upon this while stress testing I/O for Protected Virtual
> Machines. I deliberately chose a tiny swiotlb size and generated
> load with fio. With more than one virtio-blk disk in use I experienced
> hangs.
> 
> The goal of this series is to fix those hangs.
> 
> Halil Pasic (2):
>   virtio-blk: fix hw_queue stopped on arbitrary error
>   virtio-blk: improve virtqueue error to BLK_STS
> 
>  drivers/block/virtio_blk.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> 
> base-commit: 39bed42de2e7d74686a2d5a45638d6a5d7e7d473
> -- 
> 2.17.1
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
  2020-02-19  1:46       ` Ming Lei
@ 2020-02-19 15:42         ` Halil Pasic
  0 siblings, 0 replies; 10+ messages in thread
From: Halil Pasic @ 2020-02-19 15:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Michael S. Tsirkin, Jason Wang, Jens Axboe, linux-block,
	linux-s390, Cornelia Huck, Ram Pai, Linux Kernel Mailing List,
	Linux Virtualization, Christian Borntraeger, Stefan Hajnoczi,
	Paolo Bonzini, Lendacky, Thomas, Viktor Mihajlovski

On Wed, 19 Feb 2020 09:46:56 +0800
Ming Lei <tom.leiming@gmail.com> wrote:

> On Tue, Feb 18, 2020 at 8:35 PM Halil Pasic <pasic@linux.ibm.com> wrote:
> >
> > On Tue, 18 Feb 2020 10:21:18 +0800
> > Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > > On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic@linux.ibm.com> wrote:
> > > >
> > > > Since nobody else is going to restart our hw_queue for us, the
> > > > blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
> > > > necessarily sufficient to ensure that the queue will get started again.
> > > > In case of a global resource outage (-ENOMEM because of a mapping
> > > > failure caused by a full swiotlb) our virtqueue may be empty and we can
> > > > get stuck with a stopped hw_queue.
> > > >
> > > > Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
> > > > indicates a full virtqueue, where the hw_queue is guaranteed to get
> > > > started by virtblk_done() when it makes sense to carry on
> > > > submitting requests. Let us also remove a stale comment.
> > >
> > > The generic solution may be to stop the queue only when there is an
> > > in-flight request that has not completed.
> > >
> >
> > I think this is pretty close to that. The queue is stopped only on
> > -ENOSPC, which means the virtqueue is full.
> >
> > > Checking -ENOMEM may not be enough, given that -EIO can be returned from
> > > virtqueue_add() too in case of a DMA mapping failure.
> >
> > I'm not checking for -ENOMEM; the check is for -ENOSPC. So the queue
> > would not be stopped on -EIO.
> > Maybe I'm misunderstanding something. In any case, please have another
> > look at the diff, and if your concerns persist please help me understand.
> 
> Looks like I misread the patch, and this patch is fine:
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thank you very much!

Regards,
Halil

> 
> 
> Thanks,
> Ming Lei


