* [PATCH v2] virtio_scsi: Reject commands when virtqueue is broken
From: Eric Farman @ 2017-01-13 17:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: jejb, martin.petersen, famz, Eric Farman

While doing some disruptive testing with QEMU/KVM, I encountered guest
problems during hot unplug of virtio-scsi devices, depending on the
order in which the operations are performed.  The following notes
describe my setup (s390x), how I reproduce the error, and how I
test the attached fix.

In both the "working" and "failing" cases, the detaches appear to work
just fine.  Signs of a problem only appear later, depending on
other actions I perform, such as powering off the guest system.

Host:
 # lsscsi -g | grep sg6
 [6:0:6:1074151456]disk    IBM      2107900          .217  /dev/sdg   /dev/sg6 

QEMU:
 - Include the following parameters
    -device virtio-scsi-ccw,id=scsi0
    -drive file=/dev/sg6,if=none,id=drive0,format=raw
    -device scsi-generic,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive0,id=hostdev6
 - QMP commands (working)
    - device_del hostdev6
    - device_del scsi0
 - QMP commands (failing)
    - device_del scsi0

Libvirt:
 - Note: A preventative fix went into Libvirt 2.5.0
   (libvirt commit 655429a0d4a5 ("qemu: Prevent detaching SCSI controller used by hostdev"))
 - Include the following XML
    # cat scsicontroller.xml 
        <controller type='scsi' model='virtio-scsi' index='0'/>
    # cat scsihostdev.xml 
        <hostdev mode='subsystem' type='scsi'>
          <source>
            <adapter name='scsi_host6'/>
            <address bus='0' target='6' unit='1074151456'/>
          </source>
        </hostdev>
 - virsh commands (working)
    - virsh detach-device guest scsihostdev.xml
    - virsh detach-device guest scsicontroller.xml
 - virsh commands (failing)
    - virsh detach-device guest scsicontroller.xml

v1->v2:
 - Hold vq_lock across virtscsi_complete_cmd call (Fam Zheng)

Eric Farman (1):
  virtio_scsi: Reject commands when virtqueue is broken

 drivers/scsi/virtio_scsi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

-- 
1.9.1



* [PATCH v2] virtio_scsi: Reject commands when virtqueue is broken
From: Eric Farman @ 2017-01-13 17:48 UTC (permalink / raw)
  To: linux-scsi; +Cc: jejb, martin.petersen, famz, Eric Farman

In the case of a graceful set of detaches, where the virtio-scsi-ccw
disk is removed from the guest prior to the controller, the guest
behaves quite normally.  Specifically, the detach gets us into
sd_sync_cache to issue a Synchronize Cache(10) command, which
immediately fails (and is retried a couple of times) because the
device has been removed.  Later, the removal of the controller
sees two CRWs presented, but there's no further indication of the
removal from the guest viewpoint.

 [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

However, on s390, the SCSI disks can be removed "by surprise" when
an entire controller (host) is removed and all associated disks
are removed via the loop in scsi_forget_host.  The same call to
sd_sync_cache is made, but because the controller has already
been removed, the Synchronize Cache(10) command is neither issued
(and then failed) nor rejected.

That the I/O isn't returned means the guest cannot have other devices
added or removed, and other tasks (such as shutdown or reboot) issued
by the guest will not complete either.  The virtio ring has already
been marked as broken (via virtio_break_device in virtio_ccw_remove),
but we still attempt to queue the command, only to have it remain there.
The calling sequence provides a bit of distinction for us:

  virtscsi_queuecommand()
   -> virtscsi_kick_cmd()
    -> virtscsi_add_cmd()
     -> virtqueue_add_sgs()
      -> virtqueue_add()
         if success
           return 0
         elseif vq->broken or vring_mapping_error()
           return -EIO
         else
           return -ENOSPC
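
The three outcomes above can be modeled in user space.  The following is
a minimal sketch, not the kernel code: struct model_vq, its fields, and
model_virtqueue_add() are hypothetical names chosen for illustration, and
the descriptor accounting is heavily simplified.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the virtqueue state that matters here. */
struct model_vq {
	bool broken;           /* set once the device has been marked broken */
	unsigned int num_free; /* free descriptors left in the ring */
};

/*
 * Simplified model of the outcomes listed for virtqueue_add():
 * a broken queue or a mapping error is permanent (-EIO), a full
 * ring is temporary (-ENOSPC), anything else succeeds (0).
 */
int model_virtqueue_add(struct model_vq *vq, unsigned int needed,
			bool mapping_error)
{
	if (vq->broken || mapping_error)
		return -EIO;
	if (vq->num_free < needed)
		return -ENOSPC;
	vq->num_free -= needed;
	return 0;
}
```

In this model, a request that fits on a working ring returns 0, one that
does not fit returns -ENOSPC, and once broken is set every request
returns -EIO regardless of how much space is free.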

A return of ENOSPC is generally a temporary condition, so returning
"host busy" from virtscsi_queuecommand makes sense here, to have it
redriven in a moment or two.  But the EIO return code is more of a
permanent error and so it would be wise to return the I/O itself and
allow the calling thread to finish gracefully.  The result is these
four kernel messages in the guest (the fourth one does not occur
prior to this patch):

 [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
 [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

I opted to fill in the same response data that is returned from the
more graceful device detach, where the disk device is removed prior
to the controller device.
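
The resulting three-way decision in virtscsi_queuecommand can be
sketched as a pure function.  This is only an illustration under stated
assumptions: enum model_action and model_classify_kick_result() are
hypothetical names that do not exist in the driver, and the locking
around the completion call is not modeled.

```c
#include <errno.h>

/* Hypothetical labels for what the caller does with the kick result. */
enum model_action {
	MODEL_QUEUED,               /* command is on the ring; return 0 */
	MODEL_COMPLETE_BAD_TARGET,  /* fail the command now; return 0 */
	MODEL_HOST_BUSY,            /* return SCSI_MLQUEUE_HOST_BUSY */
};

/*
 * Map the return value of the kick to an action:
 * 0 means queued, -EIO is treated as permanent, and any other
 * nonzero value (for example -ENOSPC) is treated as temporary.
 */
enum model_action model_classify_kick_result(int ret)
{
	if (ret == 0)
		return MODEL_QUEUED;
	if (ret == -EIO)
		return MODEL_COMPLETE_BAD_TARGET;
	return MODEL_HOST_BUSY;
}
```

Only the -EIO case is new behavior; before this patch, every nonzero
return was reported as "host busy" and retried.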

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
---
 drivers/scsi/virtio_scsi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index ec91bd0..c680d76 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -534,7 +534,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 {
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc);
+	unsigned long flags;
 	int req_size;
+	int ret;
 
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
 
@@ -562,8 +564,15 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		req_size = sizeof(cmd->req.cmd);
 	}
 
-	if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 0)
+	ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
+	if (ret == -EIO) {
+		cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
+		spin_lock_irqsave(&req_vq->vq_lock, flags);
+		virtscsi_complete_cmd(vscsi, cmd);
+		spin_unlock_irqrestore(&req_vq->vq_lock, flags);
+	} else if (ret != 0) {
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	return 0;
 }
 
-- 
1.9.1



* Re: [PATCH v2] virtio_scsi: Reject commands when virtqueue is broken
From: Fam Zheng @ 2017-01-16  5:39 UTC (permalink / raw)
  To: Eric Farman; +Cc: linux-scsi, jejb, martin.petersen

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [PATCH v2] virtio_scsi: Reject commands when virtqueue is broken
From: Martin K. Petersen @ 2017-01-21  0:18 UTC (permalink / raw)
  To: Eric Farman; +Cc: linux-scsi, jejb, martin.petersen, famz

>>>>> "Eric" == Eric Farman <farman@linux.vnet.ibm.com> writes:

Eric> In the case of a graceful set of detaches, where the
Eric> virtio-scsi-ccw disk is removed from the guest prior to the
Eric> controller, the guest behaves quite normally.  Specifically, the
Eric> detach gets us into sd_sync_cache to issue a Synchronize Cache(10)
Eric> command, which immediately fails (and is retried a couple of
Eric> times) because the device has been removed.  Later, the removal of
Eric> the controller sees two CRWs presented, but there's no further
Eric> indication of the removal from the guest viewpoint.

Applied to 4.10/scsi-fixes.

-- 
Martin K. Petersen	Oracle Linux Engineering

