linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2]  dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
@ 2021-06-11 20:25 mwilck
  2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: mwilck @ 2021-06-11 20:25 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche, dm-devel,
	Hannes Reinecke
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Hello Mike,

here is v3 of my attempt to add retry logic to SG_IO on dm-multipath devices.
Sorry that it took such a long time.

Regards
Martin

Changes v2->v3:

 - un-inlined scsi_result_to_blk_status again, and move the helper
   __scsi_result_to_blk_status to block/scsi_ioctl.c instead
   (Bart v. Assche)
 - open-coded the status/msg/host/driver-byte -> result conversion
   where the standard SCSI helpers aren't usable (Bart v. Assche)
    
Changes v1->v2:

 - applied modifications from Mike Snitzer
 - moved SG_IO dependent code to a separate file, no scsi includes in
   dm.c any more
 - made the new code depend on a configuration option 
 - separated out scsi changes, made scsi_result_to_blk_status()
   inline to avoid dependency of dm_mod from scsi_mod (Paolo Bonzini)

Martin Wilck (2):
  scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c
  dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath

 block/scsi_ioctl.c         |  52 ++++++++++++++-
 drivers/md/Kconfig         |  11 ++++
 drivers/md/Makefile        |   4 ++
 drivers/md/dm-core.h       |   5 ++
 drivers/md/dm-rq.h         |  11 ++++
 drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
 drivers/md/dm.c            |  20 +++++-
 drivers/scsi/scsi_lib.c    |  29 +--------
 include/linux/blkdev.h     |   3 +
 9 files changed, 229 insertions(+), 33 deletions(-)
 create mode 100644 drivers/md/dm-scsi_ioctl.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c
  2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
@ 2021-06-11 20:25 ` mwilck
  2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
  2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
  2 siblings, 0 replies; 8+ messages in thread
From: mwilck @ 2021-06-11 20:25 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche, dm-devel,
	Hannes Reinecke
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This makes it possible to use scsi_result_to_blk_status() from
code that shouldn't depend on scsi_mod (e.g. device mapper).

scsi_ioctl.c is selected by CONFIG_BLK_SCSI_REQUEST, which is automatically
selected by CONFIG_SCSI.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c      | 47 +++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c | 29 +------------------------
 include/linux/blkdev.h  |  1 +
 3 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 1b3fe99b83a6..b39e0835600f 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -879,6 +879,53 @@ void scsi_req_init(struct scsi_request *req)
 }
 EXPORT_SYMBOL(scsi_req_init);
 
+/* See set_host_byte() in include/scsi/scsi_cmnd.h */
+static void __set_host_byte(int *result, char status)
+{
+	*result = (*result & 0xff00ffff) | (status << 16);
+}
+
+/**
+ * __scsi_result_to_blk_status - translate a SCSI result code into blk_status_t
+ * @cmd:	SCSI command
+ * @cmd_result: pointer to scsi cmnd result code to be possibly changed
+ *
+ * Translate a SCSI result code into a blk_status_t value. May reset the host
+ * byte of @cmd_result.
+ */
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result)
+{
+	switch (host_byte(result)) {
+	case DID_OK:
+		/*
+		 * Also check the other bytes than the status byte in result
+		 * to handle the case when a SCSI LLD sets result to
+		 * DRIVER_SENSE << 24 without setting SAM_STAT_CHECK_CONDITION.
+		 */
+		if (scsi_status_is_good(result) && (result & ~0xff) == 0)
+			return BLK_STS_OK;
+		return BLK_STS_IOERR;
+	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
+		return BLK_STS_TRANSPORT;
+	case DID_TARGET_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_TARGET;
+	case DID_NEXUS_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NEXUS;
+	case DID_ALLOC_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NOSPC;
+	case DID_MEDIUM_ERROR:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_MEDIUM;
+	default:
+		return BLK_STS_IOERR;
+	}
+}
+EXPORT_SYMBOL(__scsi_result_to_blk_status);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 532304d42f00..add243c5dba7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -586,34 +586,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
  */
 static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 {
-	switch (host_byte(result)) {
-	case DID_OK:
-		/*
-		 * Also check the other bytes than the status byte in result
-		 * to handle the case when a SCSI LLD sets result to
-		 * DRIVER_SENSE << 24 without setting SAM_STAT_CHECK_CONDITION.
-		 */
-		if (scsi_status_is_good(result) && (result & ~0xff) == 0)
-			return BLK_STS_OK;
-		return BLK_STS_IOERR;
-	case DID_TRANSPORT_FAILFAST:
-	case DID_TRANSPORT_MARGINAL:
-		return BLK_STS_TRANSPORT;
-	case DID_TARGET_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_TARGET;
-	case DID_NEXUS_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NEXUS;
-	case DID_ALLOC_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NOSPC;
-	case DID_MEDIUM_ERROR:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_MEDIUM;
-	default:
-		return BLK_STS_IOERR;
-	}
+	return __scsi_result_to_blk_status(&cmd->result, result);
 }
 
 /* Helper for scsi_io_completion() when "reprep" action required. */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1255823b2bc0..48497a77428d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2021,4 +2021,5 @@ int fsync_bdev(struct block_device *bdev);
 int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result);
 #endif /* _LINUX_BLKDEV_H */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
  2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
  2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
@ 2021-06-11 20:25 ` mwilck
  2021-06-15 17:10   ` Mike Snitzer
  2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
  2 siblings, 1 reply; 8+ messages in thread
From: mwilck @ 2021-06-11 20:25 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche, dm-devel,
	Hannes Reinecke
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

In virtual deployments, SCSI passthrough over dm-multipath devices is a
common setup. The qemu "pr-helper" was specifically invented for it. I
believe that this is the most important real-world scenario for sending
SG_IO ioctls to device-mapper devices.

In this configuration, guests send SCSI IO to the hypervisor in the form of
SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
switch paths in dm_blk_ioctl() code path"), no path switching was done at
all. Worse though, if an SG_IO call fails because of a path error,
dm-multipath doesn't retry the IO on a another path; rather, the failure is
passed back to the guest, and paths are not marked as faulty.  This is in
stark contrast with regular block IO of guests on dm-multipath devices, and
certainly comes as a surprise to users who switch to SCSI passthrough in
qemu. In general, users of dm-multipath devices would probably expect failover
to work at least in a basic way.

This patch fixes this by taking a special code path for SG_IO on request-
based device mapper targets if CONFIG_DM_MULTIPATH_SG_IO is set.  Rather then
just choosing a single path, sending the IO to it, and failing to the caller
if the IO on the path failed, it retries the same IO on another path for
certain error codes, using blk_path_error() to determine if a retry would
make sense for the given error code. Moreover, it sends a message to the
multipath target to mark the path as failed.

One problem remains open: if all paths in a multipath map are failed,
normal multipath IO may switch to queueing mode (depending on
configuration). This isn't possible for SG_IO, as SG_IO requests can't
easily be queued like regular block I/O. Thus in the "no path" case, the
guest will still see an error.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c         |   5 +-
 drivers/md/Kconfig         |  11 ++++
 drivers/md/Makefile        |   4 ++
 drivers/md/dm-core.h       |   5 ++
 drivers/md/dm-rq.h         |  11 ++++
 drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
 drivers/md/dm.c            |  20 +++++-
 include/linux/blkdev.h     |   2 +
 8 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 drivers/md/dm-scsi_ioctl.c

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index b39e0835600f..38771f4bcf18 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
 	return ret;
 }
 
-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
-		struct sg_io_hdr *hdr, fmode_t mode)
+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
+	  struct sg_io_hdr *hdr, fmode_t mode)
 {
 	unsigned long start_time;
 	ssize_t ret = 0;
@@ -365,6 +365,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 	blk_put_request(rq);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(sg_io);
 
 /**
  * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f2014385d48b..f28f29e3bd11 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
 
 	  If unsure, say N.
 
+config DM_MULTIPATH_SG_IO
+       bool "Retry SCSI generic I/O on multipath devices"
+       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
+       help
+	 With this option, SCSI generic (SG) requests issued on multipath
+	 devices will behave similar to regular block I/O: upon failure,
+	 they are repeated on a different path, and the erroring device
+	 is marked as failed.
+
+	 If unsure, say N.
+
 config DM_DELAY
 	tristate "I/O delaying target"
 	depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index ef7ddc27685c..187ea469f64a 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -88,6 +88,10 @@ ifeq ($(CONFIG_DM_INIT),y)
 dm-mod-objs			+= dm-init.o
 endif
 
+ifeq ($(CONFIG_DM_MULTIPATH_SG_IO),y)
+dm-mod-objs			+= dm-scsi_ioctl.o
+endif
+
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
 endif
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..8bd8a8e3916e 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
 extern wait_queue_head_t dm_global_eventq;
 void dm_issue_global_event(void);
 
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target);
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
+
 #endif
diff --git a/drivers/md/dm-rq.h b/drivers/md/dm-rq.h
index 1eea0da641db..c6d2853e4d1d 100644
--- a/drivers/md/dm-rq.h
+++ b/drivers/md/dm-rq.h
@@ -44,4 +44,15 @@ ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, ch
 ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
 						     const char *buf, size_t count);
 
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
+		   unsigned int cmd, unsigned long uarg);
+#else
+static inline int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
+				 unsigned int cmd, unsigned long uarg)
+{
+	return -ENOTTY;
+}
+#endif
+
 #endif
diff --git a/drivers/md/dm-scsi_ioctl.c b/drivers/md/dm-scsi_ioctl.c
new file mode 100644
index 000000000000..a696e2a6557e
--- /dev/null
+++ b/drivers/md/dm-scsi_ioctl.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Martin Wilck, SUSE LLC
+ */
+
+#include "dm-core.h"
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/uaccess.h>
+#include <linux/blk_types.h>
+#include <linux/blkdev.h>
+#include <linux/device-mapper.h>
+#include <scsi/sg.h>
+#include <scsi/scsi_cmnd.h>
+
+#define DM_MSG_PREFIX "sg_io"
+
+int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
+		   unsigned int cmd, unsigned long uarg)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	struct sg_io_hdr hdr;
+	void __user *arg = (void __user *)uarg;
+	int rc, srcu_idx;
+	char path_name[BDEVNAME_SIZE];
+
+	if (cmd != SG_IO)
+		return -ENOTTY;
+
+	if (copy_from_user(&hdr, arg, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.interface_id != 'S')
+		return -EINVAL;
+
+	if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
+		return -EIO;
+
+	for (;;) {
+		struct dm_target *tgt;
+		struct sg_io_hdr rhdr;
+
+		rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
+		if (rc < 0) {
+			DMERR("%s: failed to get path: %d",
+			      __func__, rc);
+			goto out;
+		}
+
+		rhdr = hdr;
+
+		rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, &rhdr, mode);
+
+		DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
+			bdevname(bdev, path_name), rc,
+			rhdr.driver_status, rhdr.host_status,
+			rhdr.msg_status, rhdr.status);
+
+		/*
+		 * Errors resulting from invalid parameters shouldn't be retried
+		 * on another path.
+		 */
+		switch (rc) {
+		case -ENOIOCTLCMD:
+		case -EFAULT:
+		case -EINVAL:
+		case -EPERM:
+			goto out;
+		default:
+			break;
+		}
+
+		if (rhdr.info & SG_INFO_CHECK) {
+			int result;
+			blk_status_t sts;
+
+			result = rhdr.status |
+				(rhdr.msg_status << 8) |
+				(rhdr.host_status << 16) |
+				(rhdr.driver_status << 24);
+
+			sts = __scsi_result_to_blk_status(&result, result);
+			rhdr.host_status = host_byte(result);
+
+			/* See if this is a target or path error. */
+			if (sts == BLK_STS_OK)
+				rc = 0;
+			else if (blk_path_error(sts))
+				rc = -EIO;
+			else {
+				rc = blk_status_to_errno(sts);
+				goto out;
+			}
+		}
+
+		if (rc == 0) {
+			/* success */
+			if (copy_to_user(arg, &rhdr, sizeof(rhdr)))
+				rc = -EFAULT;
+			goto out;
+		}
+
+		/* Failure - fail path by sending a message to the target */
+		if (!tgt->type->message) {
+			DMWARN("invalid target!");
+			rc = -EIO;
+			goto out;
+		} else {
+			char bdbuf[BDEVT_SIZE];
+			char *argv[2] = { "fail_path", bdbuf };
+
+			scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
+				  MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+
+			DMDEBUG("sending \"%s %s\" to target", argv[0], argv[1]);
+			rc = tgt->type->message(tgt, 2, argv, NULL, 0);
+			if (rc < 0)
+				goto out;
+		}
+
+		dm_unprepare_ioctl(md, srcu_idx);
+	}
+out:
+	dm_unprepare_ioctl(md, srcu_idx);
+	return rc;
+}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..29b93fb3929e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -522,8 +522,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
 #define dm_blk_report_zones		NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 
-static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
-			    struct block_device **bdev)
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target)
 {
 	struct dm_target *tgt;
 	struct dm_table *map;
@@ -553,10 +554,19 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
 		goto retry;
 	}
 
+	if (r >= 0 && target)
+		*target = tgt;
+
 	return r;
 }
 
-static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
+static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+			    struct block_device **bdev)
+{
+	return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
+}
+
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
 {
 	dm_put_live_table(md, srcu_idx);
 }
@@ -567,6 +577,10 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	int r, srcu_idx;
 
+	if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
+	    ((r = dm_sg_io_ioctl(bdev, mode, cmd, arg)) != -ENOTTY))
+		return r;
+
 	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
 	if (r < 0)
 		goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 48497a77428d..b8f1d603cc7a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			  unsigned int, void __user *);
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
+extern int sg_io(struct request_queue *, struct gendisk *,
+		 struct sg_io_hdr *, fmode_t);
 extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
 extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2]  dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
  2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
  2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
  2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
@ 2021-06-14 15:15 ` Mike Snitzer
  2021-06-15 10:54   ` Martin Wilck
  2 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2021-06-14 15:15 UTC (permalink / raw)
  To: mwilck
  Cc: Alasdair G Kergon, Bart Van Assche, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, linux-scsi

On Fri, Jun 11 2021 at  4:25P -0400,
mwilck@suse.com <mwilck@suse.com> wrote:

> From: Martin Wilck <mwilck@suse.com>
> 
> Hello Mike,
> 
> here is v3 of my attempt to add retry logic to SG_IO on dm-multipath devices.
> Sorry that it took such a long time.
> 
> Regards
> Martin
> 
> Changes v2->v3:
> 
>  - un-inlined scsi_result_to_blk_status again, and move the helper
>    __scsi_result_to_blk_status to block/scsi_ioctl.c instead
>    (Bart v. Assche)
>  - open-coded the status/msg/host/driver-byte -> result conversion
>    where the standard SCSI helpers aren't usable (Bart v. Assche)

This work offers a proof-of-concept but it needs further refinement
for sure.

The proposed open-coded SCSI code (in patch 2's drivers/md/dm-scsi_ioctl.c) 
is well beyond what I'm willing to take in DM.  If this type of
functionality is still needed (for kvm's SCSI passthru snafu) then
more work is needed to negotiate proper interfaces with the SCSI
subsystem (added linux-scsi to cc, odd they weren't engaged on this).

Does it make sense to extend the SCSI device handler interface to add
the required enablement? (I think it'd have to if this line of work is
to ultimately get upstream).

Mike

  
> Changes v1->v2:
> 
>  - applied modifications from Mike Snitzer
>  - moved SG_IO dependent code to a separate file, no scsi includes in
>    dm.c any more
>  - made the new code depend on a configuration option 
>  - separated out scsi changes, made scsi_result_to_blk_status()
>    inline to avoid dependency of dm_mod from scsi_mod (Paolo Bonzini)
> 
> Martin Wilck (2):
>   scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c
>   dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
> 
>  block/scsi_ioctl.c         |  52 ++++++++++++++-
>  drivers/md/Kconfig         |  11 ++++
>  drivers/md/Makefile        |   4 ++
>  drivers/md/dm-core.h       |   5 ++
>  drivers/md/dm-rq.h         |  11 ++++
>  drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
>  drivers/md/dm.c            |  20 +++++-
>  drivers/scsi/scsi_lib.c    |  29 +--------
>  include/linux/blkdev.h     |   3 +
>  9 files changed, 229 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/md/dm-scsi_ioctl.c
> 
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2]  dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
  2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
@ 2021-06-15 10:54   ` Martin Wilck
  2021-06-15 17:11     ` Mike Snitzer
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2021-06-15 10:54 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Alasdair G Kergon, Bart Van Assche, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, linux-scsi

Hi Mike,

On Mo, 2021-06-14 at 11:15 -0400, Mike Snitzer wrote:
> 
> This work offers a proof-of-concept but it needs further refinement
> for sure.

Thanks for looking into it again. I need some more guidance from your
part in order to be able to resubmit the set in a form that you
consider ready for merging.

> The proposed open-coded SCSI code (in patch 2's drivers/md/dm-
> scsi_ioctl.c) 
> is well beyond what I'm willing to take in DM.

I'm not sure what code you're referring to. Is it the processing of the
bytes of the SCSI result code? If yes, please note that I had changed
that to open-coded form in response to Bart's review of my v2
submission. If it's something else, please point it out to me.

To minimize the special-casing for this code path, Hannes suggested to
use a target-specific unprepare_ioctl() callback to to tell the generic
dm code whether a given ioctl could be retried. The logic that I've put
into dm-scsi_ioctl.c could then be moved into the unprepare_ioctl()
callback of dm-mpath. dm_blk_ioctl() would need to check the callback's
return value and possibly retry the ioctl. Would hat appeal to you?

>   If this type of
> functionality is still needed (for kvm's SCSI passthru snafu) then
> more work is needed to negotiate proper interfaces with the SCSI
> subsystem (added linux-scsi to cc, odd they weren't engaged on this).

Not cc'ing linux-scsi was my oversight, sorry about that. 

But I don't quite understand what interfaces you have in mind. SCSI
needs to expose the SG_IO interface to dm, which it does; I just needed
to export sg_io() to get access to the sg_io_hdr fields. The question
whether a given IO can be retried is decided on the dm (-mpath) layer,
based on blk_status_t; no additional interface on the SCSI side is
necessary for that.

> Does it make sense to extend the SCSI device handler interface to add
> the required enablement? (I think it'd have to if this line of work
> is
> to ultimately get upstream).

My current code uses the device handler indirectly for activating paths
during priority group switching, via the dm-mpath prepare_ioctl()
method and __pg_init_all_paths(). This is what I intended - to use
exactly the same logic for SG_IO which is used for regular read/write
IO on the block device. What additional functionality for the device
handler do you have in mind?

Regards and thanks,
Martin

> 
> Mike
> 
>   
> > Changes v1->v2:
> > 
> >  - applied modifications from Mike Snitzer
> >  - moved SG_IO dependent code to a separate file, no scsi includes
> > in
> >    dm.c any more
> >  - made the new code depend on a configuration option 
> >  - separated out scsi changes, made scsi_result_to_blk_status()
> >    inline to avoid dependency of dm_mod from scsi_mod (Paolo
> > Bonzini)
> > 
> > Martin Wilck (2):
> >   scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c
> >   dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-
> > multipath
> > 
> >  block/scsi_ioctl.c         |  52 ++++++++++++++-
> >  drivers/md/Kconfig         |  11 ++++
> >  drivers/md/Makefile        |   4 ++
> >  drivers/md/dm-core.h       |   5 ++
> >  drivers/md/dm-rq.h         |  11 ++++
> >  drivers/md/dm-scsi_ioctl.c | 127
> > +++++++++++++++++++++++++++++++++++++
> >  drivers/md/dm.c            |  20 +++++-
> >  drivers/scsi/scsi_lib.c    |  29 +--------
> >  include/linux/blkdev.h     |   3 +
> >  9 files changed, 229 insertions(+), 33 deletions(-)
> >  create mode 100644 drivers/md/dm-scsi_ioctl.c
> > 
> > -- 
> > 2.31.1
> > 
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
  2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
@ 2021-06-15 17:10   ` Mike Snitzer
  2021-06-16  9:56     ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2021-06-15 17:10 UTC (permalink / raw)
  To: mwilck
  Cc: Alasdair G Kergon, Bart Van Assche, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, linux-scsi

On Fri, Jun 11 2021 at  4:25P -0400,
mwilck@suse.com <mwilck@suse.com> wrote:

> From: Martin Wilck <mwilck@suse.com>
> 
> In virtual deployments, SCSI passthrough over dm-multipath devices is a
> common setup. The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.
> 
> In this configuration, guests send SCSI IO to the hypervisor in the form of
> SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> switch paths in dm_blk_ioctl() code path"), no path switching was done at
> all. Worse though, if an SG_IO call fails because of a path error,
> dm-multipath doesn't retry the IO on a another path; rather, the failure is
> passed back to the guest, and paths are not marked as faulty.  This is in
> stark contrast with regular block IO of guests on dm-multipath devices, and
> certainly comes as a surprise to users who switch to SCSI passthrough in
> qemu. In general, users of dm-multipath devices would probably expect failover
> to work at least in a basic way.
> 
> This patch fixes this by taking a special code path for SG_IO on request-
> based device mapper targets if CONFIG_DM_MULTIPATH_SG_IO is set.  Rather then
> just choosing a single path, sending the IO to it, and failing to the caller
> if the IO on the path failed, it retries the same IO on another path for
> certain error codes, using blk_path_error() to determine if a retry would
> make sense for the given error code. Moreover, it sends a message to the
> multipath target to mark the path as failed.
> 
> One problem remains open: if all paths in a multipath map are failed,
> normal multipath IO may switch to queueing mode (depending on
> configuration). This isn't possible for SG_IO, as SG_IO requests can't
> easily be queued like regular block I/O. Thus in the "no path" case, the
> guest will still see an error.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  block/scsi_ioctl.c         |   5 +-
>  drivers/md/Kconfig         |  11 ++++
>  drivers/md/Makefile        |   4 ++
>  drivers/md/dm-core.h       |   5 ++
>  drivers/md/dm-rq.h         |  11 ++++
>  drivers/md/dm-scsi_ioctl.c | 127 +++++++++++++++++++++++++++++++++++++
>  drivers/md/dm.c            |  20 +++++-
>  include/linux/blkdev.h     |   2 +
>  8 files changed, 180 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/md/dm-scsi_ioctl.c
> 
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index b39e0835600f..38771f4bcf18 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
>  	return ret;
>  }
>  
> -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> -		struct sg_io_hdr *hdr, fmode_t mode)
> +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> +	  struct sg_io_hdr *hdr, fmode_t mode)
>  {
>  	unsigned long start_time;
>  	ssize_t ret = 0;
> @@ -365,6 +365,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>  	blk_put_request(rq);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(sg_io);
>  
>  /**
>   * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index f2014385d48b..f28f29e3bd11 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
>  
>  	  If unsure, say N.
>  
> +config DM_MULTIPATH_SG_IO
> +       bool "Retry SCSI generic I/O on multipath devices"
> +       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
> +       help
> +	 With this option, SCSI generic (SG) requests issued on multipath
> +	 devices will behave similar to regular block I/O: upon failure,
> +	 they are repeated on a different path, and the erroring device
> +	 is marked as failed.
> +
> +	 If unsure, say N.
> +

Given how this is all about multipath, there is no reason to bolt this
on to DM core and then play games to issuing multipath target specific
DM message ("fail_path") from generic code.

So the SG_IO ioctl handling code should be in dm-mpath.c and the DM
target interface extended as needed.

>  config DM_DELAY
>  	tristate "I/O delaying target"
>  	depends on BLK_DEV_DM
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index ef7ddc27685c..187ea469f64a 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -88,6 +88,10 @@ ifeq ($(CONFIG_DM_INIT),y)
>  dm-mod-objs			+= dm-init.o
>  endif
>  
> +ifeq ($(CONFIG_DM_MULTIPATH_SG_IO),y)
> +dm-mod-objs			+= dm-scsi_ioctl.o
> +endif
> +
>  ifeq ($(CONFIG_DM_UEVENT),y)
>  dm-mod-objs			+= dm-uevent.o
>  endif
> diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
> index 5953ff2bd260..8bd8a8e3916e 100644
> --- a/drivers/md/dm-core.h
> +++ b/drivers/md/dm-core.h
> @@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
>  extern wait_queue_head_t dm_global_eventq;
>  void dm_issue_global_event(void);
>  
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +		       struct block_device **bdev,
> +		       struct dm_target **target);
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
> +
>  #endif
> diff --git a/drivers/md/dm-rq.h b/drivers/md/dm-rq.h
> index 1eea0da641db..c6d2853e4d1d 100644
> --- a/drivers/md/dm-rq.h
> +++ b/drivers/md/dm-rq.h
> @@ -44,4 +44,15 @@ ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, ch
>  ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
>  						     const char *buf, size_t count);
>  
> +#ifdef CONFIG_DM_MULTIPATH_SG_IO
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +		   unsigned int cmd, unsigned long uarg);
> +#else
> +static inline int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +				 unsigned int cmd, unsigned long uarg)
> +{
> +	return -ENOTTY;
> +}
> +#endif
> +
>  #endif

There is no reason, that I can see, why bio-based dm-multipath
shouldn't handle SG_IO too.  Why did you constrain it to
request-based?


> diff --git a/drivers/md/dm-scsi_ioctl.c b/drivers/md/dm-scsi_ioctl.c
> new file mode 100644
> index 000000000000..a696e2a6557e
> --- /dev/null
> +++ b/drivers/md/dm-scsi_ioctl.c
> @@ -0,0 +1,127 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 Martin Wilck, SUSE LLC
> + */
> +
> +#include "dm-core.h"
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/uaccess.h>
> +#include <linux/blk_types.h>
> +#include <linux/blkdev.h>
> +#include <linux/device-mapper.h>
> +#include <scsi/sg.h>
> +#include <scsi/scsi_cmnd.h>
> +
> +#define DM_MSG_PREFIX "sg_io"
> +
> +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> +		   unsigned int cmd, unsigned long uarg)
> +{
> +	struct mapped_device *md = bdev->bd_disk->private_data;
> +	struct sg_io_hdr hdr;
> +	void __user *arg = (void __user *)uarg;
> +	int rc, srcu_idx;
> +	char path_name[BDEVNAME_SIZE];
> +
> +	if (cmd != SG_IO)
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&hdr, arg, sizeof(hdr)))
> +		return -EFAULT;
> +
> +	if (hdr.interface_id != 'S')
> +		return -EINVAL;
> +
> +	if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
> +		return -EIO;
> +
> +	for (;;) {
> +		struct dm_target *tgt;
> +		struct sg_io_hdr rhdr;
> +
> +		rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
> +		if (rc < 0) {
> +			DMERR("%s: failed to get path: %d",
> +			      __func__, rc);
> +			goto out;
> +		}
> +
> +		rhdr = hdr;
> +
> +		rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, &rhdr, mode);
> +
> +		DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
> +			bdevname(bdev, path_name), rc,
> +			rhdr.driver_status, rhdr.host_status,
> +			rhdr.msg_status, rhdr.status);
> +
> +		/*
> +		 * Errors resulting from invalid parameters shouldn't be retried
> +		 * on another path.
> +		 */
> +		switch (rc) {
> +		case -ENOIOCTLCMD:
> +		case -EFAULT:
> +		case -EINVAL:
> +		case -EPERM:
> +			goto out;
> +		default:
> +			break;
> +		}
> +
> +		if (rhdr.info & SG_INFO_CHECK) {
> +			int result;
> +			blk_status_t sts;
> +
> +			result = rhdr.status |
> +				(rhdr.msg_status << 8) |
> +				(rhdr.host_status << 16) |
> +				(rhdr.driver_status << 24);
> +
> +			sts = __scsi_result_to_blk_status(&result, result);
> +			rhdr.host_status = host_byte(result);

It is the open-coded SCSI specific sg_io_hdr and result work that
feels like it doesn't belong open-coded in DM.  So maybe the above
code from this SG_INFO_CHECK block could go into an
block/scsi_ioctl.c:sg_io_info_check() method?

> +
> +			/* See if this is a target or path error. */
> +			if (sts == BLK_STS_OK)
> +				rc = 0;
> +			else if (blk_path_error(sts))
> +				rc = -EIO;
> +			else {
> +				rc = blk_status_to_errno(sts);
> +				goto out;
> +			}
> +		}
> +
> +		if (rc == 0) {
> +			/* success */
> +			if (copy_to_user(arg, &rhdr, sizeof(rhdr)))
> +				rc = -EFAULT;
> +			goto out;
> +		}
> +
> +		/* Failure - fail path by sending a message to the target */
> +		if (!tgt->type->message) {
> +			DMWARN("invalid target!");
> +			rc = -EIO;
> +			goto out;
> +		} else {
> +			char bdbuf[BDEVT_SIZE];
> +			char *argv[2] = { "fail_path", bdbuf };
> +
> +			scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
> +				  MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
> +
> +			DMDEBUG("sending \"%s %s\" to target", argv[0], argv[1]);
> +			rc = tgt->type->message(tgt, 2, argv, NULL, 0);
> +			if (rc < 0)
> +				goto out;
> +		}

If you factor things how I suggest below (introducing
dm-mpath.c:dm_mpath_ioctl) then you'll have direct access to
dm-mpath.c:fail_path() so need need to mess around with constructing
DM messages from kernel code.

> +
> +		dm_unprepare_ioctl(md, srcu_idx);
> +	}
> +out:
> +	dm_unprepare_ioctl(md, srcu_idx);
> +	return rc;
> +}
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index ca2aedd8ee7d..29b93fb3929e 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -522,8 +522,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
>  #define dm_blk_report_zones		NULL
>  #endif /* CONFIG_BLK_DEV_ZONED */
>  
> -static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> -			    struct block_device **bdev)
> +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +		       struct block_device **bdev,
> +		       struct dm_target **target)
>  {
>  	struct dm_target *tgt;
>  	struct dm_table *map;
> @@ -553,10 +554,19 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
>  		goto retry;
>  	}
>  
> +	if (r >= 0 && target)
> +		*target = tgt;
> +
>  	return r;
>  }

For dm-multipath you can leverage md->immutable_target always being
multipath.

So after dm_blk_ioctl's dm_prepare_ioctl: 

if (cmd == SG_IO && md->immutable_target &&
    md->immutable_target->ioctl)
    r = md->immutable_target->ioctl(bdev, mode, cmd, arg);

(doing check for SG_IO here just avoids needless call into ->ioctl for
now, but it could be other ioctls will need specialization in future
so checking 'cmd' should be pushed down to md->immutable_target->ioctl
at that time?)

But I'm not following you use of a for (;;) in dm_sg_io_ioctl() --
other than to retry infinitely (you aren't checking for no paths!?,
etc).

Anyway, best to have md->immutable_target->ioctl return
-EAGAIN and goto top of dm_blk_ioctl as needed?

>  
> -static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
> +static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> +			    struct block_device **bdev)
> +{
> +	return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
> +}
> +
> +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
>  {
>  	dm_put_live_table(md, srcu_idx);
>  }
> @@ -567,6 +577,10 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
>  	struct mapped_device *md = bdev->bd_disk->private_data;
>  	int r, srcu_idx;
>  
> +	if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
> +	    ((r = dm_sg_io_ioctl(bdev, mode, cmd, arg)) != -ENOTTY))
> +		return r;
> +

Again, bio-based multipath should work fine with SG_IO too.

>  	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
>  	if (r < 0)
>  		goto out;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 48497a77428d..b8f1d603cc7a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  			  unsigned int, void __user *);
>  extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  			 struct scsi_ioctl_command __user *);
> +extern int sg_io(struct request_queue *, struct gendisk *,
> +		 struct sg_io_hdr *, fmode_t);
>  extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
>  extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
>  
> -- 
> 2.31.1
> 

Think adding block/scsi_ioctl.c:sg_io_info_check() and exporting it
and sg_io() via blkdev.h should be done as a separate patch 2.

Then patch 3 is purely DM changes to use sg_io() and
sg_io_info_check()

Mike

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/2]  dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
  2021-06-15 10:54   ` Martin Wilck
@ 2021-06-15 17:11     ` Mike Snitzer
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Snitzer @ 2021-06-15 17:11 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Alasdair G Kergon, Bart Van Assche, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, linux-scsi

On Tue, Jun 15 2021 at  6:54P -0400,
Martin Wilck <mwilck@suse.com> wrote:

> Hi Mike,
> 
> On Mo, 2021-06-14 at 11:15 -0400, Mike Snitzer wrote:
> > 
> > This work offers a proof-of-concept but it needs further refinement
> > for sure.
> 
> Thanks for looking into it again. I need some more guidance from your
> part in order to be able to resubmit the set in a form that you
> consider ready for merging.
> 
> > The proposed open-coded SCSI code (in patch 2's drivers/md/dm-
> > scsi_ioctl.c) 
> > is well beyond what I'm willing to take in DM.
> 
> I'm not sure what code you're referring to. Is it the processing of the
> bytes of the SCSI result code? If yes, please note that I had changed
> that to open-coded form in response to Bart's review of my v2
> submission. If it's something else, please point it out to me.
> 
> To minimize the special-casing for this code path, Hannes suggested to
> use a target-specific unprepare_ioctl() callback to to tell the generic
> dm code whether a given ioctl could be retried. The logic that I've put
> into dm-scsi_ioctl.c could then be moved into the unprepare_ioctl()
> callback of dm-mpath. dm_blk_ioctl() would need to check the callback's
> return value and possibly retry the ioctl. Would hat appeal to you?
> 
> >   If this type of
> > functionality is still needed (for kvm's SCSI passthru snafu) then
> > more work is needed to negotiate proper interfaces with the SCSI
> > subsystem (added linux-scsi to cc, odd they weren't engaged on this).
> 
> Not cc'ing linux-scsi was my oversight, sorry about that. 
> 
> But I don't quite understand what interfaces you have in mind. SCSI
> needs to expose the SG_IO interface to dm, which it does; I just needed
> to export sg_io() to get access to the sg_io_hdr fields. The question
> whether a given IO can be retried is decided on the dm (-mpath) layer,
> based on blk_status_t; no additional interface on the SCSI side is
> necessary for that.
> 
> > Does it make sense to extend the SCSI device handler interface to add
> > the required enablement? (I think it'd have to if this line of work
> > is
> > to ultimately get upstream).
> 
> My current code uses the device handler indirectly for activating paths
> during priority group switching, via the dm-mpath prepare_ioctl()
> method and __pg_init_all_paths(). This is what I intended - to use
> exactly the same logic for SG_IO which is used for regular read/write
> IO on the block device. What additional functionality for the device
> handler do you have in mind?
> 
> Regards and thanks,
> Martin

I just replied to patch 2 with detailed suggestions.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath
  2021-06-15 17:10   ` Mike Snitzer
@ 2021-06-16  9:56     ` Martin Wilck
  0 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2021-06-16  9:56 UTC (permalink / raw)
  To: Mike Snitzer, Bart Van Assche
  Cc: Alasdair G Kergon, dm-devel, Hannes Reinecke, Daniel Wagner,
	linux-block, Paolo Bonzini, Christoph Hellwig,
	Benjamin Marzinski, linux-scsi

On Di, 2021-06-15 at 13:10 -0400, Mike Snitzer wrote:
> On Fri, Jun 11 2021 at  4:25P -0400,
> mwilck@suse.com <mwilck@suse.com> wrote:
> 
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > In virtual deployments, SCSI passthrough over dm-multipath devices is
> > a
> > common setup. The qemu "pr-helper" was specifically invented for it.
> > I
> > believe that this is the most important real-world scenario for
> > sending
> > SG_IO ioctls to device-mapper devices.
> > 
> > In this configuration, guests send SCSI IO to the hypervisor in the
> > form of
> > SG_IO ioctls issued by qemu. But on the device-mapper level, these
> > SCSI
> > ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm
> > mpath:
> > switch paths in dm_blk_ioctl() code path"), no path switching was
> > done at
> > all. Worse though, if an SG_IO call fails because of a path error,
> > dm-multipath doesn't retry the IO on a another path; rather, the
> > failure is
> > passed back to the guest, and paths are not marked as faulty.  This
> > is in
> > stark contrast with regular block IO of guests on dm-multipath
> > devices, and
> > certainly comes as a surprise to users who switch to SCSI passthrough
> > in
> > qemu. In general, users of dm-multipath devices would probably expect
> > failover
> > to work at least in a basic way.
> > 
> > This patch fixes this by taking a special code path for SG_IO on
> > request-
> > based device mapper targets if CONFIG_DM_MULTIPATH_SG_IO is set. 
> > Rather then
> > just choosing a single path, sending the IO to it, and failing to the
> > caller
> > if the IO on the path failed, it retries the same IO on another path
> > for
> > certain error codes, using blk_path_error() to determine if a retry
> > would
> > make sense for the given error code. Moreover, it sends a message to
> > the
> > multipath target to mark the path as failed.
> > 
> > One problem remains open: if all paths in a multipath map are failed,
> > normal multipath IO may switch to queueing mode (depending on
> > configuration). This isn't possible for SG_IO, as SG_IO requests
> > can't
> > easily be queued like regular block I/O. Thus in the "no path" case,
> > the
> > guest will still see an error.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  block/scsi_ioctl.c         |   5 +-
> >  drivers/md/Kconfig         |  11 ++++
> >  drivers/md/Makefile        |   4 ++
> >  drivers/md/dm-core.h       |   5 ++
> >  drivers/md/dm-rq.h         |  11 ++++
> >  drivers/md/dm-scsi_ioctl.c | 127
> > +++++++++++++++++++++++++++++++++++++
> >  drivers/md/dm.c            |  20 +++++-
> >  include/linux/blkdev.h     |   2 +
> >  8 files changed, 180 insertions(+), 5 deletions(-)
> >  create mode 100644 drivers/md/dm-scsi_ioctl.c
> > 
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index b39e0835600f..38771f4bcf18 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request
> > *rq, struct sg_io_hdr *hdr,
> >         return ret;
> >  }
> >  
> > -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> > -               struct sg_io_hdr *hdr, fmode_t mode)
> > +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> > +         struct sg_io_hdr *hdr, fmode_t mode)
> >  {
> >         unsigned long start_time;
> >         ssize_t ret = 0;
> > @@ -365,6 +365,7 @@ static int sg_io(struct request_queue *q, struct
> > gendisk *bd_disk,
> >         blk_put_request(rq);
> >         return ret;
> >  }
> > +EXPORT_SYMBOL_GPL(sg_io);
> >  
> >  /**
> >   * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND
> > ioctl
> > diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> > index f2014385d48b..f28f29e3bd11 100644
> > --- a/drivers/md/Kconfig
> > +++ b/drivers/md/Kconfig
> > @@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
> >  
> >           If unsure, say N.
> >  
> > +config DM_MULTIPATH_SG_IO
> > +       bool "Retry SCSI generic I/O on multipath devices"
> > +       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
> > +       help
> > +        With this option, SCSI generic (SG) requests issued on
> > multipath
> > +        devices will behave similar to regular block I/O: upon
> > failure,
> > +        they are repeated on a different path, and the erroring
> > device
> > +        is marked as failed.
> > +
> > +        If unsure, say N.
> > +
> 
> Given how this is all about multipath, there is no reason to bolt this
> on to DM core and then play games to issuing multipath target specific
> DM message ("fail_path") from generic code.
> 
> So the SG_IO ioctl handling code should be in dm-mpath.c and the DM
> target interface extended as needed.

Ok.


> 
> >  config DM_DELAY
> >         tristate "I/O delaying target"
> >         depends on BLK_DEV_DM
> > diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> > index ef7ddc27685c..187ea469f64a 100644
> > --- a/drivers/md/Makefile
> > +++ b/drivers/md/Makefile
> > @@ -88,6 +88,10 @@ ifeq ($(CONFIG_DM_INIT),y)
> >  dm-mod-objs                    += dm-init.o
> >  endif
> >  
> > +ifeq ($(CONFIG_DM_MULTIPATH_SG_IO),y)
> > +dm-mod-objs                    += dm-scsi_ioctl.o
> > +endif
> > +
> >  ifeq ($(CONFIG_DM_UEVENT),y)
> >  dm-mod-objs                    += dm-uevent.o
> >  endif
> > diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
> > index 5953ff2bd260..8bd8a8e3916e 100644
> > --- a/drivers/md/dm-core.h
> > +++ b/drivers/md/dm-core.h
> > @@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
> >  extern wait_queue_head_t dm_global_eventq;
> >  void dm_issue_global_event(void);
> >  
> > +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> > +                      struct block_device **bdev,
> > +                      struct dm_target **target);
> > +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
> > +
> >  #endif
> > diff --git a/drivers/md/dm-rq.h b/drivers/md/dm-rq.h
> > index 1eea0da641db..c6d2853e4d1d 100644
> > --- a/drivers/md/dm-rq.h
> > +++ b/drivers/md/dm-rq.h
> > @@ -44,4 +44,15 @@ ssize_t
> > dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md,
> > ch
> >  ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct
> > mapped_device *md,
> >                                                      const char *buf,
> > size_t count);
> >  
> > +#ifdef CONFIG_DM_MULTIPATH_SG_IO
> > +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> > +                  unsigned int cmd, unsigned long uarg);
> > +#else
> > +static inline int dm_sg_io_ioctl(struct block_device *bdev, fmode_t
> > mode,
> > +                                unsigned int cmd, unsigned long
> > uarg)
> > +{
> > +       return -ENOTTY;
> > +}
> > +#endif
> > +
> >  #endif
> 
> There is no reason, that I can see, why bio-based dm-multipath
> shouldn't handle SG_IO too.  Why did you constrain it to
> request-based?

I couldn't figure out a better way to check if the target in question
was multipath (in my experience, multipath is practically always
request based). With your suggestions below, I'll be able to get rid of
this admittedly odd logic. Thanks.


I'm not sure about SG_IO on bio-based dm-multipath. My feeling was that
it wouldn't work, but I haven't tried or analyzed it in detail.

> 
> > diff --git a/drivers/md/dm-scsi_ioctl.c b/drivers/md/dm-scsi_ioctl.c
> > new file mode 100644
> > index 000000000000..a696e2a6557e
> > --- /dev/null
> > +++ b/drivers/md/dm-scsi_ioctl.c
> > @@ -0,0 +1,127 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2021 Martin Wilck, SUSE LLC
> > + */
> > +
> > +#include "dm-core.h"
> > +#include <linux/types.h>
> > +#include <linux/errno.h>
> > +#include <linux/kernel.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/blk_types.h>
> > +#include <linux/blkdev.h>
> > +#include <linux/device-mapper.h>
> > +#include <scsi/sg.h>
> > +#include <scsi/scsi_cmnd.h>
> > +
> > +#define DM_MSG_PREFIX "sg_io"
> > +
> > +int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
> > +                  unsigned int cmd, unsigned long uarg)
> > +{
> > +       struct mapped_device *md = bdev->bd_disk->private_data;
> > +       struct sg_io_hdr hdr;
> > +       void __user *arg = (void __user *)uarg;
> > +       int rc, srcu_idx;
> > +       char path_name[BDEVNAME_SIZE];
> > +
> > +       if (cmd != SG_IO)
> > +               return -ENOTTY;
> > +
> > +       if (copy_from_user(&hdr, arg, sizeof(hdr)))
> > +               return -EFAULT;
> > +
> > +       if (hdr.interface_id != 'S')
> > +               return -EINVAL;
> > +
> > +       if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk-
> > >queue) << 9))
> > +               return -EIO;
> > +
> > +       for (;;) {
> > +               struct dm_target *tgt;
> > +               struct sg_io_hdr rhdr;
> > +
> > +               rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
> > +               if (rc < 0) {
> > +                       DMERR("%s: failed to get path: %d",
> > +                             __func__, rc);
> > +                       goto out;
> > +               }
> > +
> > +               rhdr = hdr;
> > +
> > +               rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk,
> > &rhdr, mode);
> > +
> > +               DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
> > +                       bdevname(bdev, path_name), rc,
> > +                       rhdr.driver_status, rhdr.host_status,
> > +                       rhdr.msg_status, rhdr.status);
> > +
> > +               /*
> > +                * Errors resulting from invalid parameters shouldn't
> > be retried
> > +                * on another path.
> > +                */
> > +               switch (rc) {
> > +               case -ENOIOCTLCMD:
> > +               case -EFAULT:
> > +               case -EINVAL:
> > +               case -EPERM:
> > +                       goto out;
> > +               default:
> > +                       break;
> > +               }
> > +
> > +               if (rhdr.info & SG_INFO_CHECK) {
> > +                       int result;
> > +                       blk_status_t sts;
> > +
> > +                       result = rhdr.status |
> > +                               (rhdr.msg_status << 8) |
> > +                               (rhdr.host_status << 16) |
> > +                               (rhdr.driver_status << 24);
> > +
> > +                       sts = __scsi_result_to_blk_status(&result,
> > result);
> > +                       rhdr.host_status = host_byte(result);
> 
> It is the open-coded SCSI specific sg_io_hdr and result work that
> feels like it doesn't belong open-coded in DM.  So maybe the above
> code from this SG_INFO_CHECK block could go into an
> block/scsi_ioctl.c:sg_io_info_check() method?

Ok. I'll see if I can shape this code in a way that addresses both your
concerns and Bart's.

> > +
> > +                       /* See if this is a target or path error. */
> > +                       if (sts == BLK_STS_OK)
> > +                               rc = 0;
> > +                       else if (blk_path_error(sts))
> > +                               rc = -EIO;
> > +                       else {
> > +                               rc = blk_status_to_errno(sts);
> > +                               goto out;
> > +                       }
> > +               }
> > +
> > +               if (rc == 0) {
> > +                       /* success */
> > +                       if (copy_to_user(arg, &rhdr, sizeof(rhdr)))
> > +                               rc = -EFAULT;
> > +                       goto out;
> > +               }
> > +
> > +               /* Failure - fail path by sending a message to the
> > target */
> > +               if (!tgt->type->message) {
> > +                       DMWARN("invalid target!");
> > +                       rc = -EIO;
> > +                       goto out;
> > +               } else {
> > +                       char bdbuf[BDEVT_SIZE];
> > +                       char *argv[2] = { "fail_path", bdbuf };
> > +
> > +                       scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
> > +                                 MAJOR(bdev->bd_dev), MINOR(bdev-
> > >bd_dev));
> > +
> > +                       DMDEBUG("sending \"%s %s\" to target",
> > argv[0], argv[1]);
> > +                       rc = tgt->type->message(tgt, 2, argv, NULL,
> > 0);
> > +                       if (rc < 0)
> > +                               goto out;
> > +               }
> 
> If you factor things how I suggest below (introducing
> dm-mpath.c:dm_mpath_ioctl) then you'll have direct access to
> dm-mpath.c:fail_path() so need need to mess around with constructing
> DM messages from kernel code.

Sure.

> 
> > +
> > +               dm_unprepare_ioctl(md, srcu_idx);
> > +       }
> > +out:
> > +       dm_unprepare_ioctl(md, srcu_idx);
> > +       return rc;
> > +}
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index ca2aedd8ee7d..29b93fb3929e 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -522,8 +522,9 @@ static int dm_blk_report_zones(struct gendisk
> > *disk, sector_t sector,
> >  #define dm_blk_report_zones            NULL
> >  #endif /* CONFIG_BLK_DEV_ZONED */
> >  
> > -static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> > -                           struct block_device **bdev)
> > +int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> > +                      struct block_device **bdev,
> > +                      struct dm_target **target)
> >  {
> >         struct dm_target *tgt;
> >         struct dm_table *map;
> > @@ -553,10 +554,19 @@ static int dm_prepare_ioctl(struct
> > mapped_device *md, int *srcu_idx,
> >                 goto retry;
> >         }
> >  
> > +       if (r >= 0 && target)
> > +               *target = tgt;
> > +
> >         return r;
> >  }
> 
> For dm-multipath you can leverage md->immutable_target always being
> multipath.

Ok, this was what I'd been missing. Can I trust that this will hold in
the future?

> 
> So after dm_blk_ioctl's dm_prepare_ioctl: 
> 
> if (cmd == SG_IO && md->immutable_target &&
>     md->immutable_target->ioctl)
>     r = md->immutable_target->ioctl(bdev, mode, cmd, arg);
> 
> (doing check for SG_IO here just avoids needless call into ->ioctl for
> now, but it could be other ioctls will need specialization in future
> so checking 'cmd' should be pushed down to md->immutable_target->ioctl
> at that time?)
> 
> But I'm not following you use of a for (;;) in dm_sg_io_ioctl() --
> other than to retry infinitely (you aren't checking for no paths!?,
> etc).

multipath's prepare_ioctl() method returns an error if no path could be
found, in which case I break the loop.


> 
> Anyway, best to have md->immutable_target->ioctl return
> -EAGAIN and goto top of dm_blk_ioctl as needed?

That would be similar to Hannes' suggestion.

> 
> >  
> > -static void dm_unprepare_ioctl(struct mapped_device *md, int
> > srcu_idx)
> > +static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
> > +                           struct block_device **bdev)
> > +{
> > +       return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
> > +}
> > +
> > +void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
> >  {
> >         dm_put_live_table(md, srcu_idx);
> >  }
> > @@ -567,6 +577,10 @@ static int dm_blk_ioctl(struct block_device
> > *bdev, fmode_t mode,
> >         struct mapped_device *md = bdev->bd_disk->private_data;
> >         int r, srcu_idx;
> >  
> > +       if ((dm_get_md_type(md) == DM_TYPE_REQUEST_BASED) &&
> > +           ((r = dm_sg_io_ioctl(bdev, mode, cmd, arg)) != -ENOTTY))
> > +               return r;
> > +
> 
> Again, bio-based multipath should work fine with SG_IO too.
> 
> >         r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
> >         if (r < 0)
> >                 goto out;
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 48497a77428d..b8f1d603cc7a 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *,
> > struct gendisk *, fmode_t,
> >                           unsigned int, void __user *);
> >  extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *,
> > fmode_t,
> >                          struct scsi_ioctl_command __user *);
> > +extern int sg_io(struct request_queue *, struct gendisk *,
> > +                struct sg_io_hdr *, fmode_t);
> >  extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user
> > *argp);
> >  extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user
> > *argp);
> >  
> > -- 
> > 2.31.1
> > 
> 
> Think adding block/scsi_ioctl.c:sg_io_info_check() and exporting it
> and sg_io() via blkdev.h should be done as a separate patch 2.
> 
> Then patch 3 is purely DM changes to use sg_io() and
> sg_io_info_check()

Thanks a lot for the detailed review. I think this should get me
forward.

Regards
Martin



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-06-16  9:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-11 20:25 [PATCH v3 0/2] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-06-11 20:25 ` [PATCH v3 1/2] scsi: export __scsi_result_to_blk_status() in scsi_ioctl.c mwilck
2021-06-11 20:25 ` [PATCH v3 2/2] dm: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO on dm-multipath mwilck
2021-06-15 17:10   ` Mike Snitzer
2021-06-16  9:56     ` Martin Wilck
2021-06-14 15:15 ` [PATCH v3 0/2] dm: dm_blk_ioctl(): implement " Mike Snitzer
2021-06-15 10:54   ` Martin Wilck
2021-06-15 17:11     ` Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).