linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: virtualization@lists.linux-foundation.org,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org
Cc: mst@redhat.com, jasowang@redhat.com, pbonzini@redhat.com,
	stefanha@redhat.com, jejb@linux.ibm.com,
	martin.petersen@oracle.com, joe.jin@oracle.com,
	junxiao.bi@oracle.com, srinivas.eeda@oracle.com
Subject: [RFC] virtio_scsi: to poll and kick the virtqueue in timeout handler
Date: Sat, 22 May 2021 23:38:43 -0700	[thread overview]
Message-ID: <20210523063843.1177-1-dongli.zhang@oracle.com> (raw)

This RFC is to trigger the discussion about to poll and kick the
virtqueue on purpose in virtio-scsi timeout handler.

The virtio-scsi relies on the virtio vring shared between VM and host.
The VM side produces requests to vring and kicks the virtqueue, while the
host side produces responses to vring and interrupts the VM side.

By default the virtio-scsi handler depends on the host timeout handler
by BLK_EH_RESET_TIMER to give host a chance to perform EH.

However, this is not helpful for the case that the responses are available
on vring but the notification from host to VM is lost.


Would you mind share your feedback on this idea to poll the vring on
purpose in timeout handler to give VM a chance to recover from stalled
IO? In addition, how about to kick for an extra time, in case the
stalled IO is due to the loss of VM-to-host notification?


I have tested the IO can be recovered after interrupt is lost on purpose
with below debug patch.

https://github.com/finallyjustice/patchset/blob/master/scsi-virtio_scsi-to-lose-an-interrupt-on-purpose.patch


In addition, the virtio-blk may implement the timeout handler as well,
with the helper function in below patchset from Stefan.

https://lore.kernel.org/linux-block/20210520141305.355961-1-stefanha@redhat.com/T/#t


Thank you very much!

Dongli Zhang

--------

Considering there might be loss of interrupt or kick issue, the timeout
handler may poll or kick the virtqueue on purpose, in order to recover the
IO.

If the response is already available on vring, it indicates the host side
has already processed the request and it is not helpful by giving host a
chance to perform EH.

There will be syslog like below by the timeout handler:

[  135.830746] virtio_scsi: Virtio SCSI HBA 0: I/O 196 QID 3 timeout, completion polled

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 drivers/scsi/virtio_scsi.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b9c86a7e3b97..633950b6336a 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -36,6 +36,9 @@
 #define VIRTIO_SCSI_EVENT_LEN 8
 #define VIRTIO_SCSI_VQ_BASE 2
 
+static bool __read_mostly timed_out_enabled;
+module_param(timed_out_enabled, bool, 0644);
+
 /* Command queue element */
 struct virtio_scsi_cmd {
 	struct scsi_cmnd *sc;
@@ -732,9 +735,38 @@ static void virtscsi_commit_rqs(struct Scsi_Host *shost, u16 hwq)
  * The host guarantees to respond to each command, although I/O
  * latencies might be higher than on bare metal.  Reset the timer
  * unconditionally to give the host a chance to perform EH.
+ *
+ * If 'timed_out_enabled' is set, the timeout handler polls or kicks the
+ * virtqueue on purpose, in order to recover the IO, in case there is loss
+ * of interrupt or kick issue with virtio.
  */
 static enum blk_eh_timer_return virtscsi_eh_timed_out(struct scsi_cmnd *scmnd)
 {
+	struct Scsi_Host *shost;
+	struct virtio_scsi *vscsi;
+	struct virtio_scsi_vq *req_vq;
+
+	if (!timed_out_enabled)
+		return BLK_EH_RESET_TIMER;
+
+	shost = scmnd->device->host;
+	vscsi = shost_priv(shost);
+	req_vq = virtscsi_pick_vq_mq(vscsi, scmnd);
+
+	virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd);
+
+	if (test_bit(SCMD_STATE_COMPLETE, &scmnd->state)) {
+		pr_warn("Virtio SCSI HBA %u: I/O %u QID %d timeout, completion polled\n",
+			shost->host_no, scmnd->tag, req_vq->vq->index);
+		return BLK_EH_DONE;
+	}
+
+	/*
+	 * To kick the virtqueue in case the timeout is due to the loss of
+	 * one prior notification.
+	 */
+	virtqueue_notify(req_vq->vq);
+
 	return BLK_EH_RESET_TIMER;
 }
 
-- 
2.17.1


             reply	other threads:[~2021-05-23  6:38 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-23  6:38 Dongli Zhang [this message]
2021-05-23  7:39 ` [RFC] virtio_scsi: to poll and kick the virtqueue in timeout handler Hannes Reinecke
2021-05-24 13:24   ` Stefan Hajnoczi
2021-05-25  6:33     ` Dongli Zhang
2021-05-25 16:47       ` Stefan Hajnoczi
2021-05-25 17:24         ` Hannes Reinecke
2021-05-25 17:34           ` Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210523063843.1177-1-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=jasowang@redhat.com \
    --cc=jejb@linux.ibm.com \
    --cc=joe.jin@oracle.com \
    --cc=junxiao.bi@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=srinivas.eeda@oracle.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).