From: Mike Snitzer <snitzer@redhat.com>
To: mwilck@suse.com
Cc: Alasdair G Kergon <agk@redhat.com>,
Bart Van Assche <Bart.VanAssche@sandisk.com>,
dm-devel@redhat.com, Hannes Reinecke <hare@suse.de>,
Daniel Wagner <dwagner@suse.de>,
linux-block@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Christoph Hellwig <hch@lst.de>,
Benjamin Marzinski <bmarzins@redhat.com>,
linux-scsi@vger.kernel.org
Subject: Re: [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
Date: Tue, 15 Jun 2021 13:10:50 -0400 [thread overview]
Message-ID: <YMjfGh9hJjLkol9V@redhat.com> (raw)
In-Reply-To: <20210611202509.5426-3-mwilck@suse.com>
On Fri, Jun 11 2021 at 4:25P -0400,
mwilck@suse.com <mwilck@suse.com> wrote:
> From: Martin Wilck <mwilck@suse.com>
>
> In virtual deployments, SCSI passthrough over dm-multipath devices is a
> common setup. The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.
>
> In this configuration, guests send SCSI IO to the hypervisor in the form of
> SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> switch paths in dm_blk_ioctl() code path"), no path switching was done at
> all. Worse though, if an SG_IO call fails because of a path error,
> dm-multipath doesn't retry the IO on a another path; rather, the failure is
> passed back to the guest, and paths are not marked as faulty. This is in
> stark contrast with regular block IO of guests on dm-multipath devices, and
> certainly comes as a surprise to users who switch to SCSI passthrough in
> qemu. In general, users of dm-multipath devices would probably expect failover
> to work at least in a basic way.
>
> This patch fixes this by taking a special code path for SG_IO on request-
> based device mapper targets if CONFIG_DM_MULTIPATH_SG_IO is set. Rather then
> just choosing a single path, sending the IO to it, and failing to the caller
> if the IO on the path failed, it retries the same IO on another path for
> certain error codes, using blk_path_error() to determine if a retry would
> make sense for the given error code. Moreover, it sends a message to the
> multipath target to mark the path as failed.
>
> One problem remains open: if all paths in a multipath map are failed,
> normal multipath IO may switch to queueing mode (depending on
> configuration). This isn't possible for SG_IO, as SG_IO requests can't
> easily be queued like regular block I/O. Thus in the "no path" case, the
> guest will still see an error.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> block/scsi_ioctl.c | 5 +-
> drivers/md/Kconfig | 11 ++++
> drivers/md/Makefile | 4 ++
> drivers/md/dm-core.h | 5 ++
> drivers/md/dm-rq.h | 11 ++++
> drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
> drivers/md/dm.c | 20 +++++-
> include/linux/blkdev.h | 2 +
> 8 files changed, 180 insertions(+), 5 deletions(-)
> create mode 100644 drivers/md/dm-scsi_ioctl.c
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index b39e0835600f..38771f4bcf18 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
> return ret;
> }
>
> -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> - struct sg_io_hdr *hdr, fmode_t mode)
> +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> + struct sg_io_hdr *hdr, fmode_t mode)
> {
> unsigned long start_time;
> ssize_t ret = 0;
> @@ -365,6 +365,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> blk_put_request(rq);
> return ret;
> }
> +EXPORT_SYMBOL_GPL(sg_io);
>
> /**
> * sg_scsi_ioctl -- handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index f2014385d48b..f28f29e3bd11 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
>
> If unsure, say N.
>
> +config DM_MULTIPATH_SG_IO
> + bool "Retry SCSI generic I/O on multipath devices"
> + depends on DM_MULTIPATH && BLK_SCSI_REQUEST
> + help
> + With this option, SCSI generic (SG) requests issued on multipath
> + devices will behave similar to regular block I/O: upon failure,
> + they are repeated on a different path, and the erroring device
> + is marked as failed.
> +
> + If unsure, say N.
> +
Given how this is all about multipath, there is no reason to bolt this
on to DM core and then play games to issuing multipath target specific
DM message ("fail_path") from generic code.
So the SG_IO ioctl handling code should be in dm-mpath.c and the DM
target interface extended as needed.
> config DM_DELAY
> tristate "I/O delaying target"
> depends on BLK_DEV_DM
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index ef7ddc27685c..187ea469f64a 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -88,6 +88,10 @@ ifeq ($(CONFIG_DM_INIT),y)
> dm-mod-objs += dm-init.o
> endif
>
> +ifeq ($(CONFIG_DM_MULTIPATH_SG_IO),y)
> +dm-mod-objs += dm-scsi_ioctl.o
> +endif
> +
> ifeq ($(CONFIG_DM_UEVENT),y)
> dm-mod-objs += dm-uevent.o
> endif
> diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
> index 5953ff2bd260..8bd8a8e3916e 100644
> --- a/drivers/md/dm-core.h
> +++ b/drivers/md/dm-core.h
> @@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
> extern wait_queue_head_t dm_global_eventq;
> void dm_issue_global_event(void);
>
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> + struct block_device **bdev,
> + struct dm_target **target);
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
> +
> #endif
> diff --git a/drivers/md/dm-rq.h b/drivers/md/dm-rq.h
> index 1eea0da641db..c6d2853e4d1d 100644
> --- a/drivers/md/dm-rq.h
> +++ b/drivers/md/dm-rq.h
> @@ -44,4 +44,15 @@ ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, ch
> ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
> const char *buf, size_t count);
>
> +#ifdef CONFIG_DM_MULTIPATH_SG_IO
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> + unsigned int cmd, unsigned long uarg);
> +#else
> +static inline int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> + unsigned int cmd, unsigned long uarg)
> +{
> + return -ENOTTY;
> +}
> +#endif
> +
> #endif
There is no reason, that I can see, why bio-based dm-multipath
shouldn't handle SG_IO too. Why did you constrain it to
request-based?
> diff --git a/drivers/md/dm-scsi_ioctl.c b/drivers/md/dm-scsi_ioctl.c
> new file mode 100644
> index 000000000000..a696e2a6557e
> --- /dev/null
> +++ b/drivers/md/dm-scsi_ioctl.c
> @@ -0,0 +1,127 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Martin Wilck, SUSE LLC
> + */
> +
> +#include "dm-core.h"
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/uaccess.h>
> +#include <linux/blk_types.h>
> +#include <linux/blkdev.h>
> +#include <linux/device-mapper.h>
> +#include <scsi/sg.h>
> +#include <scsi/scsi_cmnd.h>
> +
> +#define DM_MSG_PREFIX "sg_io"
> +
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> + unsigned int cmd, unsigned long uarg)
> +{
> + struct mapped_device *md = bdev->bd_disk->private_data;
> + struct sg_io_hdr hdr;
> + void __user *arg = (void __user *)uarg;
> + int rc, srcu_idx;
> + char path_name[BDEVNAME_SIZE];
> +
> + if (cmd != SG_IO)
> + return -ENOTTY;
> +
> + if (copy_from_user(&hdr, arg, sizeof(hdr)))
> + return -EFAULT;
> +
> + if (hdr.interface_id != 'S')
> + return -EINVAL;
> +
> + if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
> + return -EIO;
> +
> + for (;;) {
> + struct dm_target *tgt;
> + struct sg_io_hdr rhdr;
> +
> + rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
> + if (rc < 0) {
> + DMERR("%s: failed to get path: %d",
> + __func__, rc);
> + goto out;
> + }
> +
> + rhdr = hdr;
> +
> + rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, &rhdr, mode);
> +
> + DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
> + bdevname(bdev, path_name), rc,
> + rhdr.driver_status, rhdr.host_status,
> + rhdr.msg_status, rhdr.status);
> +
> + /*
> + * Errors resulting from invalid parameters shouldn't be retried
> + * on another path.
> + */
> + switch (rc) {
> + case -ENOIOCTLCMD:
> + case -EFAULT:
> + case -EINVAL:
> + case -EPERM:
> + goto out;
> + default:
> + break;
> + }
> +
> + if (rhdr.info & SG_INFO_CHECK) {
> + int result;
> + blk_status_t sts;
> +
> + result = rhdr.status |
> + (rhdr.msg_status << 8) |
> + (rhdr.host_status << 16) |
> + (rhdr.driver_status << 24);
> +
> + sts = __scsi_result_to_blk_status(&result, result);
> + rhdr.host_status = host_byte(result);
It is the open-coded SCSI specific sg_io_hdr and result work that
feels like it doesn't belong open-coded in DM. So maybe the above
code from this SG_INFO_CHECK block could go into an
block/scsi_ioctl.c:sg_io_info_check() method?
> +
> + /* See if this is a target or path error. */
> + if (sts == BLK_STS_OK)
> + rc = 0;
> + else if (blk_path_error(sts))
> + rc = -EIO;
> + else {
> + rc = blk_status_to_errno(sts);
> + goto out;
> + }
> + }
> +
> + if (rc == 0) {
> + /* success */
> + if (copy_to_user(arg, &rhdr, sizeof(rhdr)))
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + /* Failure - fail path by sending a message to the target */
> + if (!tgt->type->message) {
> + DMWARN("invalid target!");
> + rc = -EIO;
> + goto out;
> + } else {
> + char bdbuf[BDEVT_SIZE];
> + char *argv[2] = { "fail_path", bdbuf };
> +
> + scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
> + MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
> +
> + DMDEBUG("sending \"%s %s\" to target", argv[0], argv[1]);
> + rc = tgt->type->message(tgt, 2, argv, NULL, 0);
> + if (rc < 0)
> + goto out;
> + }
If you factor things how I suggest below (introducing
dm-mpath.c:dm_mpath_ioctl) then you'll have direct access to
dm-mpath.c:fail_path() so need need to mess around with constructing
DM messages from kernel code.
> +
> + dm_unprepare_ioctl(md, srcu_idx);
> + }
> +out:
> + dm_unprepare_ioctl(md, srcu_idx);
> + return rc;
> +}
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index ca2aedd8ee7d..29b93fb3929e 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -522,8 +522,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
> #define dm_blk_report_zones NULL
> #endif /* CONFIG_BLK_DEV_ZONED */
>
> -static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> - struct block_device **bdev)
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> + struct block_device **bdev,
> + struct dm_target **target)
> {
> struct dm_target *tgt;
> struct dm_table *map;
> @@ -553,10 +554,19 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> goto retry;
> }
>
> + if (r >= 0 && target)
> + *target = tgt;
> +
> return r;
> }
For dm-multipath you can leverage md->immutable_target always being
multipath.
So after dm_blk_ioctl's dm_prepare_ioctl:
if (cmd == SG_IO && md->immutable_target &&
md->immutable_target->ioctl)
r = md->immutable_target->ioctl(bdev, mode, cmd, arg);
(doing check for SG_IO here just avoids needless call into ->ioctl for
now, but it could be other ioctls will need specialization in future
so checking 'cmd' should be pushed down to md->immutable_target->ioctl
at that time?)
But I'm not following you use of a for (;;) in dm_sg_io_ioctl() --
other than to retry infinitely (you aren't checking for no paths!?,
etc).
Anyway, best to have md->immutable_target->ioctl return
-EAGAIN and goto top of dm_blk_ioctl as needed?
>
> -static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
> +static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> + struct block_device **bdev)
> +{
> + return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
> +}
> +
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
> {
> dm_put_live_table(md, srcu_idx);
> }
> @@ -567,6 +577,10 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
> struct mapped_device *md = bdev->bd_disk->private_data;
> int r, srcu_idx;
>
> + if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
> + ((r = dm_sg_io_ioctl(bdev, mode, cmd, arg)) != -ENOTTY))
> + return r;
> +
Again, bio-based multipath should work fine with SG_IO too.
> r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
> if (r < 0)
> goto out;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 48497a77428d..b8f1d603cc7a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
> unsigned int, void __user *);
> extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
> struct scsi_ioctl_command __user *);
> +extern int sg_io(struct request_queue *, struct gendisk *,
> + struct sg_io_hdr *, fmode_t);
> extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
> extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
>
> --
> 2.31.1
>
Think adding block/scsi_ioctl.c:sg_io_info_check() and exporting it
and sg_io() via blkdev.h should be done as a separate patch 2.
Then patch 3 is purely DM changes to use sg_io() and
sg_io_info_check()
Mike
next prev parent reply other threads:[~2021-06-15 17:10 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
2021-06-15 17:10 ` Mike Snitzer [this message]
2021-06-16 9:56 ` Martin Wilck
2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
2021-06-15 10:54 ` Martin Wilck
2021-06-15 17:11 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YMjfGh9hJjLkol9V@redhat.com \
--to=snitzer@redhat.com \
--cc=Bart.VanAssche@sandisk.com \
--cc=agk@redhat.com \
--cc=bmarzins@redhat.com \
--cc=dm-devel@redhat.com \
--cc=dwagner@suse.de \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mwilck@suse.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).