linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: mwilck@suse.com
Cc: Alasdair G Kergon <agk@redhat.com>,
	Bart Van Assche <Bart.VanAssche@sandisk.com>,
	dm-devel@redhat.com, Hannes Reinecke <hare@suse.de>,
	Daniel Wagner <dwagner@suse.de>,
	linux-block@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Benjamin Marzinski <bmarzins@redhat.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
Date: Tue, 15 Jun 2021 13:10:50 -0400	[thread overview]
Message-ID: <YMjfGh9hJjLkol9V@redhat.com> (raw)
In-Reply-To: <20210611202509.5426-3-mwilck@suse.com>

On Fri, Jun 11 2021 at  4:25P -0400,
mwilck@suse.com <mwilck@suse.com> wrote:

> From: Martin Wilck <mwilck@suse.com>
> 
> In virtual deployments, SCSI passthrough over dm-multipath devices is a
> common setup. The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.
> 
> In this configuration, guests send SCSI IO to the hypervisor in the form of
> SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> switch paths in dm_blk_ioctl() code path"), no path switching was done at
> all. Worse though, if an SG_IO call fails because of a path error,
> dm-multipath doesn't retry the IO on a another path; rather, the failure is
> passed back to the guest, and paths are not marked as faulty.  This is in
> stark contrast with regular block IO of guests on dm-multipath devices, and
> certainly comes as a surprise to users who switch to SCSI passthrough in
> qemu. In general, users of dm-multipath devices would probably expect failover
> to work at least in a basic way.
> 
> This patch fixes this by taking a special code path for SG_IO on request-
> based device mapper targets if CONFIG_DM_MULTIPATH_SG_IO is set.  Rather then
> just choosing a single path, sending the IO to it, and failing to the caller
> if the IO on the path failed, it retries the same IO on another path for
> certain error codes, using blk_path_error() to determine if a retry would
> make sense for the given error code. Moreover, it sends a message to the
> multipath target to mark the path as failed.
> 
> One problem remains open: if all paths in a multipath map are failed,
> normal multipath IO may switch to queueing mode (depending on
> configuration). This isn't possible for SG_IO, as SG_IO requests can't
> easily be queued like regular block I/O. Thus in the "no path" case, the
> guest will still see an error.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  block/scsi_ioctl.c         |   5 +-
>  drivers/md/Kconfig         |  11 ++++
>  drivers/md/Makefile        |   4 ++
>  drivers/md/dm-core.h       |   5 ++
>  drivers/md/dm-rq.h         |  11 ++++
>  drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
>  drivers/md/dm.c            |  20 +++++-
>  include/linux/blkdev.h     |   2 +
>  8 files changed, 180 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/md/dm-scsi_ioctl.c
> 
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index b39e0835600f..38771f4bcf18 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
>  	return ret;
>  }
>  
> -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> -		struct sg_io_hdr *hdr, fmode_t mode)
> +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> +	  struct sg_io_hdr *hdr, fmode_t mode)
>  {
>  	unsigned long start_time;
>  	ssize_t ret = 0;
> @@ -365,6 +365,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>  	blk_put_request(rq);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(sg_io);
>  
>  /**
>   * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index f2014385d48b..f28f29e3bd11 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
>  
>  	  If unsure, say N.
>  
> +config DM_MULTIPATH_SG_IO
> +       bool "Retry SCSI generic I/O on multipath devices"
> +       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
> +       help
> +	 With this option, SCSI generic (SG) requests issued on multipath
> +	 devices will behave similar to regular block I/O: upon failure,
> +	 they are repeated on a different path, and the erroring device
> +	 is marked as failed.
> +
> +	 If unsure, say N.
> +

Given how this is all about multipath, there is no reason to bolt this
on to DM core and then play games to issuing multipath target specific
DM message ("fail_path") from generic code.

So the SG_IO ioctl handling code should be in dm-mpath.c and the DM
target interface extended as needed.

>  config DM_DELAY
>  	tristate "I/O delaying target"
>  	depends on BLK_DEV_DM
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index ef7ddc27685c..187ea469f64a 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -88,6 +88,10 @@ ifeq ($(CONFIG_DM_INIT),y)
>  dm-mod-objs			+= dm-init.o
>  endif
>  
> +ifeq ($(CONFIG_DM_MULTIPATH_SG_IO),y)
> +dm-mod-objs			+= dm-scsi_ioctl.o
> +endif
> +
>  ifeq ($(CONFIG_DM_UEVENT),y)
>  dm-mod-objs			+= dm-uevent.o
>  endif
> diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
> index 5953ff2bd260..8bd8a8e3916e 100644
> --- a/drivers/md/dm-core.h
> +++ b/drivers/md/dm-core.h
> @@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
>  extern wait_queue_head_t dm_global_eventq;
>  void dm_issue_global_event(void);
>  
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +		       struct block_device **bdev,
> +		       struct dm_target **target);
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
> +
>  #endif
> diff --git a/drivers/md/dm-rq.h b/drivers/md/dm-rq.h
> index 1eea0da641db..c6d2853e4d1d 100644
> --- a/drivers/md/dm-rq.h
> +++ b/drivers/md/dm-rq.h
> @@ -44,4 +44,15 @@ ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, ch
>  ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
>  						     const char *buf, size_t count);
>  
> +#ifdef CONFIG_DM_MULTIPATH_SG_IO
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +		   unsigned int cmd, unsigned long uarg);
> +#else
> +static inline int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +				 unsigned int cmd, unsigned long uarg)
> +{
> +	return -ENOTTY;
> +}
> +#endif
> +
>  #endif

There is no reason, that I can see, why bio-based dm-multipath
shouldn't handle SG_IO too.  Why did you constrain it to
request-based?


> diff --git a/drivers/md/dm-scsi_ioctl.c b/drivers/md/dm-scsi_ioctl.c
> new file mode 100644
> index 000000000000..a696e2a6557e
> --- /dev/null
> +++ b/drivers/md/dm-scsi_ioctl.c
> @@ -0,0 +1,127 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Martin Wilck, SUSE LLC
> + */
> +
> +#include "dm-core.h"
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/uaccess.h>
> +#include <linux/blk_types.h>
> +#include <linux/blkdev.h>
> +#include <linux/device-mapper.h>
> +#include <scsi/sg.h>
> +#include <scsi/scsi_cmnd.h>
> +
> +#define DM_MSG_PREFIX "sg_io"
> +
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +		   unsigned int cmd, unsigned long uarg)
> +{
> +	struct mapped_device *md = bdev->bd_disk->private_data;
> +	struct sg_io_hdr hdr;
> +	void __user *arg = (void __user *)uarg;
> +	int rc, srcu_idx;
> +	char path_name[BDEVNAME_SIZE];
> +
> +	if (cmd != SG_IO)
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&hdr, arg, sizeof(hdr)))
> +		return -EFAULT;
> +
> +	if (hdr.interface_id != 'S')
> +		return -EINVAL;
> +
> +	if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
> +		return -EIO;
> +
> +	for (;;) {
> +		struct dm_target *tgt;
> +		struct sg_io_hdr rhdr;
> +
> +		rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
> +		if (rc < 0) {
> +			DMERR("%s: failed to get path: %d",
> +			      __func__, rc);
> +			goto out;
> +		}
> +
> +		rhdr = hdr;
> +
> +		rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, &rhdr, mode);
> +
> +		DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
> +			bdevname(bdev, path_name), rc,
> +			rhdr.driver_status, rhdr.host_status,
> +			rhdr.msg_status, rhdr.status);
> +
> +		/*
> +		 * Errors resulting from invalid parameters shouldn't be retried
> +		 * on another path.
> +		 */
> +		switch (rc) {
> +		case -ENOIOCTLCMD:
> +		case -EFAULT:
> +		case -EINVAL:
> +		case -EPERM:
> +			goto out;
> +		default:
> +			break;
> +		}
> +
> +		if (rhdr.info & SG_INFO_CHECK) {
> +			int result;
> +			blk_status_t sts;
> +
> +			result = rhdr.status |
> +				(rhdr.msg_status << 8) |
> +				(rhdr.host_status << 16) |
> +				(rhdr.driver_status << 24);
> +
> +			sts = __scsi_result_to_blk_status(&result, result);
> +			rhdr.host_status = host_byte(result);

It is the open-coded SCSI specific sg_io_hdr and result work that
feels like it doesn't belong open-coded in DM.  So maybe the above
code from this SG_INFO_CHECK block could go into an
block/scsi_ioctl.c:sg_io_info_check() method?

> +
> +			/* See if this is a target or path error. */
> +			if (sts == BLK_STS_OK)
> +				rc = 0;
> +			else if (blk_path_error(sts))
> +				rc = -EIO;
> +			else {
> +				rc = blk_status_to_errno(sts);
> +				goto out;
> +			}
> +		}
> +
> +		if (rc == 0) {
> +			/* success */
> +			if (copy_to_user(arg, &rhdr, sizeof(rhdr)))
> +				rc = -EFAULT;
> +			goto out;
> +		}
> +
> +		/* Failure - fail path by sending a message to the target */
> +		if (!tgt->type->message) {
> +			DMWARN("invalid target!");
> +			rc = -EIO;
> +			goto out;
> +		} else {
> +			char bdbuf[BDEVT_SIZE];
> +			char *argv[2] = { "fail_path", bdbuf };
> +
> +			scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
> +				  MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
> +
> +			DMDEBUG("sending \"%s %s\" to target", argv[0], argv[1]);
> +			rc = tgt->type->message(tgt, 2, argv, NULL, 0);
> +			if (rc < 0)
> +				goto out;
> +		}

If you factor things how I suggest below (introducing
dm-mpath.c:dm_mpath_ioctl) then you'll have direct access to
dm-mpath.c:fail_path() so need need to mess around with constructing
DM messages from kernel code.

> +
> +		dm_unprepare_ioctl(md, srcu_idx);
> +	}
> +out:
> +	dm_unprepare_ioctl(md, srcu_idx);
> +	return rc;
> +}
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index ca2aedd8ee7d..29b93fb3929e 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -522,8 +522,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
>  #define dm_blk_report_zones		NULL
>  #endif /* CONFIG_BLK_DEV_ZONED */
>  
> -static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> -			    struct block_device **bdev)
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +		       struct block_device **bdev,
> +		       struct dm_target **target)
>  {
>  	struct dm_target *tgt;
>  	struct dm_table *map;
> @@ -553,10 +554,19 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
>  		goto retry;
>  	}
>  
> +	if (r >= 0 && target)
> +		*target = tgt;
> +
>  	return r;
>  }

For dm-multipath you can leverage md->immutable_target always being
multipath.

So after dm_blk_ioctl's dm_prepare_ioctl: 

if (cmd == SG_IO && md->immutable_target &&
    md->immutable_target->ioctl)
    r = md->immutable_target->ioctl(bdev, mode, cmd, arg);

(doing check for SG_IO here just avoids needless call into ->ioctl for
now, but it could be other ioctls will need specialization in future
so checking 'cmd' should be pushed down to md->immutable_target->ioctl
at that time?)

But I'm not following you use of a for (;;) in dm_sg_io_ioctl() --
other than to retry infinitely (you aren't checking for no paths!?,
etc).

Anyway, best to have md->immutable_target->ioctl return
-EAGAIN and goto top of dm_blk_ioctl as needed?

>  
> -static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
> +static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +			    struct block_device **bdev)
> +{
> +	return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
> +}
> +
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
>  {
>  	dm_put_live_table(md, srcu_idx);
>  }
> @@ -567,6 +577,10 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
>  	struct mapped_device *md = bdev->bd_disk->private_data;
>  	int r, srcu_idx;
>  
> +	if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
> +	    ((r = dm_sg_io_ioctl(bdev, mode, cmd, arg)) != -ENOTTY))
> +		return r;
> +

Again, bio-based multipath should work fine with SG_IO too.

>  	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
>  	if (r < 0)
>  		goto out;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 48497a77428d..b8f1d603cc7a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  			  unsigned int, void __user *);
>  extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  			 struct scsi_ioctl_command __user *);
> +extern int sg_io(struct request_queue *, struct gendisk *,
> +		 struct sg_io_hdr *, fmode_t);
>  extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
>  extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
>  
> -- 
> 2.31.1
> 

Think adding block/scsi_ioctl.c:sg_io_info_check() and exporting it
and sg_io() via blkdev.h should be done as a separate patch 2.

Then patch 3 is purely DM changes to use sg_io() and
sg_io_info_check()

Mike

  reply	other threads:[~2021-06-15 17:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
2021-06-15 17:10   ` Mike Snitzer [this message]
2021-06-16  9:56     ` Martin Wilck
2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
2021-06-15 10:54   ` Martin Wilck
2021-06-15 17:11     ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMjfGh9hJjLkol9V@redhat.com \
    --to=snitzer@redhat.com \
    --cc=Bart.VanAssche@sandisk.com \
    --cc=agk@redhat.com \
    --cc=bmarzins@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=dwagner@suse.de \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mwilck@suse.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).