All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/3] scsi/dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
@ 2021-06-28 15:15 ` mwilck
  0 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Benjamin Marzinski,
	nkoenig, emilne, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Hello Mike, hello Martin,

here is v5 of my attempt to add retry logic to SG_IO on dm-multipath
devices.

Christoph's comment on v4 has been received, but as I'm still confused
how to handle the SCSI result / blk_status_t conversion properly, and
waiting for more guidance, I want to fix the nasty bug in v4 first.

Regards
Martin

Changes v4->v5:

 - (2/3) Fixed bug in logical / bitwise and expression
   Reported-by: kernel test robot <lkp@intel.com>

Changes v3->v4 (thanks to Mike Snitzer):

 - Added an additional helper function sg_io_to_blk_status() to
   scsi_ioctl.c, in order to avoid open-coding handling of the SCSI result
   code in device-mapper.

 - Added a new method dm_sg_io_ioctl_fn() in struct target_type, define
   only by the multipath target. This allows moving the bulk of the new
   code to dm-mpath.c, and avoids the wrong limitation of the code to
   request-based multipath.

Changes v2->v3:

 - un-inlined scsi_result_to_blk_status again, and move the helper
   __scsi_result_to_blk_status to block/scsi_ioctl.c instead
   (Bart v. Assche)
 - open-coded the status/msg/host/driver-byte -> result conversion
   where the standard SCSI helpers aren't usable (Bart v. Assche)
    
Changes v1->v2:

 - applied modifications from Mike Snitzer
 - moved SG_IO dependent code to a separate file, no scsi includes in
   dm.c any more
 - made the new code depend on a configuration option 
 - separated out scsi changes, made scsi_result_to_blk_status()
   inline to avoid dependency of dm_mod from scsi_mod (Paolo Bonzini)

Martin Wilck (3):
  scsi: scsi_ioctl: export __scsi_result_to_blk_status()
  scsi: scsi_ioctl: add sg_io_to_blk_status()
  dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO

Martin Wilck (3):
  scsi: scsi_ioctl: export __scsi_result_to_blk_status()
  scsi: scsi_ioctl: add sg_io_to_blk_status()
  dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO

 block/scsi_ioctl.c            |  72 ++++++++++++++++++++++-
 drivers/md/Kconfig            |  11 ++++
 drivers/md/dm-core.h          |   5 ++
 drivers/md/dm-mpath.c         | 105 ++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  26 ++++++++-
 drivers/scsi/scsi_lib.c       |  24 +-------
 include/linux/blkdev.h        |   4 ++
 include/linux/device-mapper.h |   8 ++-
 8 files changed, 226 insertions(+), 29 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dm-devel] [PATCH v5 0/3] scsi/dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
@ 2021-06-28 15:15 ` mwilck
  0 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, emilne, linux-block, nkoenig, Paolo Bonzini, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Hello Mike, hello Martin,

here is v5 of my attempt to add retry logic to SG_IO on dm-multipath
devices.

Christoph's comment on v4 has been received, but as I'm still confused
how to handle the SCSI result / blk_status_t conversion properly, and
waiting for more guidance, I want to fix the nasty bug in v4 first.

Regards
Martin

Changes v4->v5:

 - (2/3) Fixed bug in logical / bitwise and expression
   Reported-by: kernel test robot <lkp@intel.com>

Changes v3->v4 (thanks to Mike Snitzer):

 - Added an additional helper function sg_io_to_blk_status() to
   scsi_ioctl.c, in order to avoid open-coding handling of the SCSI result
   code in device-mapper.

 - Added a new method dm_sg_io_ioctl_fn() in struct target_type, define
   only by the multipath target. This allows moving the bulk of the new
   code to dm-mpath.c, and avoids the wrong limitation of the code to
   request-based multipath.

Changes v2->v3:

 - un-inlined scsi_result_to_blk_status again, and move the helper
   __scsi_result_to_blk_status to block/scsi_ioctl.c instead
   (Bart v. Assche)
 - open-coded the status/msg/host/driver-byte -> result conversion
   where the standard SCSI helpers aren't usable (Bart v. Assche)
    
Changes v1->v2:

 - applied modifications from Mike Snitzer
 - moved SG_IO dependent code to a separate file, no scsi includes in
   dm.c any more
 - made the new code depend on a configuration option 
 - separated out scsi changes, made scsi_result_to_blk_status()
   inline to avoid dependency of dm_mod from scsi_mod (Paolo Bonzini)

Martin Wilck (3):
  scsi: scsi_ioctl: export __scsi_result_to_blk_status()
  scsi: scsi_ioctl: add sg_io_to_blk_status()
  dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO

Martin Wilck (3):
  scsi: scsi_ioctl: export __scsi_result_to_blk_status()
  scsi: scsi_ioctl: add sg_io_to_blk_status()
  dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO

 block/scsi_ioctl.c            |  72 ++++++++++++++++++++++-
 drivers/md/Kconfig            |  11 ++++
 drivers/md/dm-core.h          |   5 ++
 drivers/md/dm-mpath.c         | 105 ++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  26 ++++++++-
 drivers/scsi/scsi_lib.c       |  24 +-------
 include/linux/blkdev.h        |   4 ++
 include/linux/device-mapper.h |   8 ++-
 8 files changed, 226 insertions(+), 29 deletions(-)

-- 
2.32.0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 1/3] scsi: scsi_ioctl: export __scsi_result_to_blk_status()
  2021-06-28 15:15 ` [dm-devel] " mwilck
@ 2021-06-28 15:15   ` mwilck
  -1 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Benjamin Marzinski,
	nkoenig, emilne, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This makes it possible to use scsi_result_to_blk_status() from
code that shouldn't depend on scsi_mod (e.g. device mapper).

scsi_ioctl.c is selected by CONFIG_BLK_SCSI_REQUEST, which is automatically
selected by CONFIG_SCSI.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c      | 47 +++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c | 24 +--------------------
 include/linux/blkdev.h  |  1 +
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index fa6df11b8bdd..19b63b64ecbc 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -882,6 +882,53 @@ void scsi_req_init(struct scsi_request *req)
 }
 EXPORT_SYMBOL(scsi_req_init);
 
+/* See set_host_byte() in include/scsi/scsi_cmnd.h */
+static void __set_host_byte(int *result, char status)
+{
+	*result = (*result & 0xff00ffff) | (status << 16);
+}
+
+/**
+ * __scsi_result_to_blk_status - translate a SCSI result code into blk_status_t
+ * @cmd:	SCSI command
+ * @cmd_result: pointer to scsi cmnd result code to be possibly changed
+ *
+ * Translate a SCSI result code into a blk_status_t value. May reset the host
+ * byte of @cmd_result.
+ */
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result)
+{
+	switch (host_byte(result)) {
+	case DID_OK:
+		/*
+		 * Also check the other bytes than the status byte in result
+		 * to handle the case when a SCSI LLD sets result to
+		 * DRIVER_SENSE << 24 without setting SAM_STAT_CHECK_CONDITION.
+		 */
+		if (scsi_status_is_good(result))
+			return BLK_STS_OK;
+		return BLK_STS_IOERR;
+	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
+		return BLK_STS_TRANSPORT;
+	case DID_TARGET_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_TARGET;
+	case DID_NEXUS_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NEXUS;
+	case DID_ALLOC_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NOSPC;
+	case DID_MEDIUM_ERROR:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_MEDIUM;
+	default:
+		return BLK_STS_IOERR;
+	}
+}
+EXPORT_SYMBOL(__scsi_result_to_blk_status);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6b994baf87c2..2c0eca3693af 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -589,29 +589,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
  */
 static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 {
-	switch (host_byte(result)) {
-	case DID_OK:
-		if (scsi_status_is_good(result))
-			return BLK_STS_OK;
-		return BLK_STS_IOERR;
-	case DID_TRANSPORT_FAILFAST:
-	case DID_TRANSPORT_MARGINAL:
-		return BLK_STS_TRANSPORT;
-	case DID_TARGET_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_TARGET;
-	case DID_NEXUS_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NEXUS;
-	case DID_ALLOC_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NOSPC;
-	case DID_MEDIUM_ERROR:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_MEDIUM;
-	default:
-		return BLK_STS_IOERR;
-	}
+	return __scsi_result_to_blk_status(&cmd->result, result);
 }
 
 /* Helper for scsi_io_completion() when "reprep" action required. */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1255823b2bc0..48497a77428d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2021,4 +2021,5 @@ int fsync_bdev(struct block_device *bdev);
 int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result);
 #endif /* _LINUX_BLKDEV_H */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [dm-devel] [PATCH v5 1/3] scsi: scsi_ioctl: export __scsi_result_to_blk_status()
@ 2021-06-28 15:15   ` mwilck
  0 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, emilne, linux-block, nkoenig, Paolo Bonzini, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This makes it possible to use scsi_result_to_blk_status() from
code that shouldn't depend on scsi_mod (e.g. device mapper).

scsi_ioctl.c is selected by CONFIG_BLK_SCSI_REQUEST, which is automatically
selected by CONFIG_SCSI.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c      | 47 +++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c | 24 +--------------------
 include/linux/blkdev.h  |  1 +
 3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index fa6df11b8bdd..19b63b64ecbc 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -882,6 +882,53 @@ void scsi_req_init(struct scsi_request *req)
 }
 EXPORT_SYMBOL(scsi_req_init);
 
+/* See set_host_byte() in include/scsi/scsi_cmnd.h */
+static void __set_host_byte(int *result, char status)
+{
+	*result = (*result & 0xff00ffff) | (status << 16);
+}
+
+/**
+ * __scsi_result_to_blk_status - translate a SCSI result code into blk_status_t
+ * @cmd:	SCSI command
+ * @cmd_result: pointer to scsi cmnd result code to be possibly changed
+ *
+ * Translate a SCSI result code into a blk_status_t value. May reset the host
+ * byte of @cmd_result.
+ */
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result)
+{
+	switch (host_byte(result)) {
+	case DID_OK:
+		/*
+		 * Also check the other bytes than the status byte in result
+		 * to handle the case when a SCSI LLD sets result to
+		 * DRIVER_SENSE << 24 without setting SAM_STAT_CHECK_CONDITION.
+		 */
+		if (scsi_status_is_good(result))
+			return BLK_STS_OK;
+		return BLK_STS_IOERR;
+	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
+		return BLK_STS_TRANSPORT;
+	case DID_TARGET_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_TARGET;
+	case DID_NEXUS_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NEXUS;
+	case DID_ALLOC_FAILURE:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_NOSPC;
+	case DID_MEDIUM_ERROR:
+		__set_host_byte(cmd_result, DID_OK);
+		return BLK_STS_MEDIUM;
+	default:
+		return BLK_STS_IOERR;
+	}
+}
+EXPORT_SYMBOL(__scsi_result_to_blk_status);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6b994baf87c2..2c0eca3693af 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -589,29 +589,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
  */
 static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 {
-	switch (host_byte(result)) {
-	case DID_OK:
-		if (scsi_status_is_good(result))
-			return BLK_STS_OK;
-		return BLK_STS_IOERR;
-	case DID_TRANSPORT_FAILFAST:
-	case DID_TRANSPORT_MARGINAL:
-		return BLK_STS_TRANSPORT;
-	case DID_TARGET_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_TARGET;
-	case DID_NEXUS_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NEXUS;
-	case DID_ALLOC_FAILURE:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_NOSPC;
-	case DID_MEDIUM_ERROR:
-		set_host_byte(cmd, DID_OK);
-		return BLK_STS_MEDIUM;
-	default:
-		return BLK_STS_IOERR;
-	}
+	return __scsi_result_to_blk_status(&cmd->result, result);
 }
 
 /* Helper for scsi_io_completion() when "reprep" action required. */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1255823b2bc0..48497a77428d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2021,4 +2021,5 @@ int fsync_bdev(struct block_device *bdev);
 int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
+blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result);
 #endif /* _LINUX_BLKDEV_H */
-- 
2.32.0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 2/3] scsi: scsi_ioctl: add sg_io_to_blk_status()
  2021-06-28 15:15 ` [dm-devel] " mwilck
@ 2021-06-28 15:15   ` mwilck
  -1 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Benjamin Marzinski,
	nkoenig, emilne, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This helper converts the SCSI result in a sg_io_hdr struct to a blk_status_t.
It will be used in the SG_IO code path for dm-multipath. Putting it into
scsi_ioctl.c avoids open-coding SCSI specific code in the dm layer.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c     | 20 ++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 19b63b64ecbc..e061398be90f 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -929,6 +929,26 @@ blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result)
 }
 EXPORT_SYMBOL(__scsi_result_to_blk_status);
 
+blk_status_t sg_io_to_blk_status(struct sg_io_hdr *hdr)
+{
+	int result;
+	blk_status_t sts;
+
+	if (!(hdr->info & SG_INFO_CHECK))
+		return BLK_STS_OK;
+
+	result = hdr->status |
+		(hdr->msg_status << 8) |
+		(hdr->host_status << 16) |
+		(hdr->driver_status << 24);
+
+	sts = __scsi_result_to_blk_status(&result, result);
+	hdr->host_status = host_byte(result);
+
+	return sts;
+}
+EXPORT_SYMBOL(sg_io_to_blk_status);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 48497a77428d..5da03edf125c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2022,4 +2022,5 @@ int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
 blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result);
+blk_status_t sg_io_to_blk_status(struct sg_io_hdr *hdr);
 #endif /* _LINUX_BLKDEV_H */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [dm-devel] [PATCH v5 2/3] scsi: scsi_ioctl: add sg_io_to_blk_status()
@ 2021-06-28 15:15   ` mwilck
  0 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, emilne, linux-block, nkoenig, Paolo Bonzini, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This helper converts the SCSI result in a sg_io_hdr struct to a blk_status_t.
It will be used in the SG_IO code path for dm-multipath. Putting it into
scsi_ioctl.c avoids open-coding SCSI specific code in the dm layer.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c     | 20 ++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 19b63b64ecbc..e061398be90f 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -929,6 +929,26 @@ blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result)
 }
 EXPORT_SYMBOL(__scsi_result_to_blk_status);
 
+blk_status_t sg_io_to_blk_status(struct sg_io_hdr *hdr)
+{
+	int result;
+	blk_status_t sts;
+
+	if (!(hdr->info & SG_INFO_CHECK))
+		return BLK_STS_OK;
+
+	result = hdr->status |
+		(hdr->msg_status << 8) |
+		(hdr->host_status << 16) |
+		(hdr->driver_status << 24);
+
+	sts = __scsi_result_to_blk_status(&result, result);
+	hdr->host_status = host_byte(result);
+
+	return sts;
+}
+EXPORT_SYMBOL(sg_io_to_blk_status);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 48497a77428d..5da03edf125c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2022,4 +2022,5 @@ int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
 blk_status_t __scsi_result_to_blk_status(int *cmd_result, int result);
+blk_status_t sg_io_to_blk_status(struct sg_io_hdr *hdr);
 #endif /* _LINUX_BLKDEV_H */
-- 
2.32.0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-06-28 15:15 ` [dm-devel] " mwilck
@ 2021-06-28 15:15   ` mwilck
  -1 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, linux-block, Paolo Bonzini, Benjamin Marzinski,
	nkoenig, emilne, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

In virtual deployments, SCSI passthrough over dm-multipath devices is a
common setup. The qemu "pr-helper" was specifically invented for it. I
believe that this is the most important real-world scenario for sending
SG_IO ioctls to device-mapper devices.

In this configuration, guests send SCSI IO to the hypervisor in the form of
SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
switch paths in dm_blk_ioctl() code path"), no path switching was done at
all. Worse though, if an SG_IO call fails because of a path error,
dm-multipath doesn't retry the IO on a another path; rather, the failure is
passed back to the guest, and paths are not marked as faulty.  This is in
stark contrast with regular block IO of guests on dm-multipath devices, and
certainly comes as a surprise to users who switch to SCSI passthrough in
qemu. In general, users of dm-multipath devices would probably expect failover
to work at least in a basic way.

This patch fixes this by taking a special code path for SG_IO on dm-multipath
targets if CONFIG_DM_MULTIPATH_SG_IO is set.  Rather then just choosing a
single path, sending the IO to it, and failing to the caller if the IO on the
path failed, it retries the same IO on another path for certain error codes,
using blk_path_error() to determine if a retry would make sense for the given
error code. Moreover, it fails the path on which the path error occurred,
like regular block IO would.

If all paths in a multipath map are failed, the behavior depends on the
queue_if_no_path setting of the map. If it is off, multipath_prepare_ioctl()
fails with -EIO, and the search for a valid paths is stopped. If it is on,
the caller will block until either queuing is disabled (in which case IO will
error out) or at least one path is reinstated (in which case IO will resume).
This is as close to regular READ/WRITE as it gets.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c            |   5 +-
 drivers/md/Kconfig            |  11 ++++
 drivers/md/dm-core.h          |   5 ++
 drivers/md/dm-mpath.c         | 105 ++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  26 ++++++++-
 include/linux/blkdev.h        |   2 +
 include/linux/device-mapper.h |   8 ++-
 7 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index e061398be90f..e6a62c1c5404 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -281,8 +281,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
 	return ret;
 }
 
-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
-		struct sg_io_hdr *hdr, fmode_t mode)
+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
+	  struct sg_io_hdr *hdr, fmode_t mode)
 {
 	unsigned long start_time;
 	ssize_t ret = 0;
@@ -367,6 +367,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 	blk_put_request(rq);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(sg_io);
 
 /**
  * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f2014385d48b..f28f29e3bd11 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
 
 	  If unsure, say N.
 
+config DM_MULTIPATH_SG_IO
+       bool "Retry SCSI generic I/O on multipath devices"
+       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
+       help
+	 With this option, SCSI generic (SG) requests issued on multipath
+	 devices will behave similar to regular block I/O: upon failure,
+	 they are repeated on a different path, and the erroring device
+	 is marked as failed.
+
+	 If unsure, say N.
+
 config DM_DELAY
 	tristate "I/O delaying target"
 	depends on BLK_DEV_DM
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..8bd8a8e3916e 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
 extern wait_queue_head_t dm_global_eventq;
 void dm_issue_global_event(void);
 
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target);
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
+
 #endif
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bced42f082b0..86518aec32b4 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -11,6 +11,7 @@
 #include "dm-bio-record.h"
 #include "dm-path-selector.h"
 #include "dm-uevent.h"
+#include "dm-core.h"
 
 #include <linux/blkdev.h>
 #include <linux/ctype.h>
@@ -26,6 +27,7 @@
 #include <scsi/scsi_dh.h>
 #include <linux/atomic.h>
 #include <linux/blk-mq.h>
+#include <scsi/sg.h>
 
 #define DM_MSG_PREFIX "multipath"
 #define DM_PG_INIT_DELAY_MSECS 2000
@@ -2129,6 +2131,106 @@ static int multipath_busy(struct dm_target *ti)
 	return busy;
 }
 
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+static int pgpath_sg_io_ioctl(struct block_device *bdev,
+			    struct sg_io_hdr *hdr, struct multipath *m,
+			    fmode_t mode)
+{
+	int rc;
+	blk_status_t sts;
+	struct priority_group *pg;
+	struct pgpath *pgpath;
+	char path_name[BDEVNAME_SIZE];
+
+	rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, hdr, mode);
+	DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
+		bdevname(bdev, path_name), rc,
+		hdr->driver_status, hdr->host_status,
+		hdr->msg_status, hdr->status);
+
+	/*
+	 * Errors resulting from invalid parameters shouldn't be retried
+	 * on another path.
+	 */
+	switch (rc) {
+	case -ENOIOCTLCMD:
+	case -EFAULT:
+	case -EINVAL:
+	case -EPERM:
+		return rc;
+	default:
+		break;
+	}
+
+	sts = sg_io_to_blk_status(hdr);
+	if (sts == BLK_STS_OK)
+		return 0;
+	else if (!blk_path_error(sts))
+		return blk_status_to_errno(sts);
+
+	/* path error - fail the path */
+	list_for_each_entry(pg, &m->priority_groups, list) {
+		list_for_each_entry(pgpath, &pg->pgpaths, list) {
+			if (pgpath->path.dev->bdev == bdev)
+				fail_path(pgpath);
+		}
+	}
+
+	return -EAGAIN;
+}
+
+static int multipath_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
+				 unsigned int cmd, unsigned long uarg)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	void __user *arg = (void __user *)uarg;
+	struct sg_io_hdr hdr;
+	int rc;
+	bool suspended;
+	int retries = 5;
+
+	if (copy_from_user(&hdr, arg, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.interface_id != 'S')
+		return -EINVAL;
+
+	if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
+		return -EIO;
+
+	do {
+		struct block_device *path_dev;
+		struct dm_target *tgt;
+		struct sg_io_hdr rhdr;
+		int srcu_idx;
+
+		suspended = false;
+		/* This will fail and break the loop if no valid paths found */
+		rc = __dm_prepare_ioctl(md, &srcu_idx, &path_dev, &tgt);
+		if (rc == -EAGAIN)
+			suspended = true;
+		else if (rc < 0)
+			DMERR("%s: failed to get path: %d", __func__, rc);
+		else {
+			/* Need to copy the sg_io_hdr, it may be modified */
+			rhdr = hdr;
+			rc = pgpath_sg_io_ioctl(path_dev, &rhdr,
+						tgt->private, mode);
+			if (rc == 0 && copy_to_user(arg, &rhdr, sizeof(rhdr)))
+				rc = -EFAULT;
+		}
+		dm_unprepare_ioctl(md, srcu_idx);
+		if (suspended) {
+			DMDEBUG("%s: suspended, retries = %d\n",
+				__func__, retries);
+			msleep(20);
+		}
+	} while (rc == -EAGAIN && (!suspended || retries-- > 0));
+
+	return rc;
+}
+#endif
+
 /*-----------------------------------------------------------------
  * Module setup
  *---------------------------------------------------------------*/
@@ -2153,6 +2255,9 @@ static struct target_type multipath_target = {
 	.prepare_ioctl = multipath_prepare_ioctl,
 	.iterate_devices = multipath_iterate_devices,
 	.busy = multipath_busy,
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	.sg_io_ioctl = multipath_sg_io_ioctl,
+#endif
 };
 
 static int __init dm_multipath_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..af15d2c132ce 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -29,6 +29,7 @@
 #include <linux/part_stat.h>
 #include <linux/blk-crypto.h>
 #include <linux/keyslot-manager.h>
+#include <scsi/sg.h>
 
 #define DM_MSG_PREFIX "core"
 
@@ -522,8 +523,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
 #define dm_blk_report_zones		NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 
-static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
-			    struct block_device **bdev)
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target)
 {
 	struct dm_target *tgt;
 	struct dm_table *map;
@@ -553,13 +555,24 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
 		goto retry;
 	}
 
+	if (r >= 0 && target)
+		*target = tgt;
+
 	return r;
 }
+EXPORT_SYMBOL_GPL(__dm_prepare_ioctl);
 
-static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
+static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+			    struct block_device **bdev)
+{
+	return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
+}
+
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
 {
 	dm_put_live_table(md, srcu_idx);
 }
+EXPORT_SYMBOL_GPL(dm_unprepare_ioctl);
 
 static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 			unsigned int cmd, unsigned long arg)
@@ -567,6 +580,13 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	int r, srcu_idx;
 
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	if (cmd == SG_IO && md->immutable_target &&
+	    md->immutable_target->type->sg_io_ioctl)
+		return md->immutable_target->type->sg_io_ioctl(bdev, mode,
+							       cmd, arg);
+#endif
+
 	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
 	if (r < 0)
 		goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5da03edf125c..96eaf1edd7b0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			  unsigned int, void __user *);
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
+extern int sg_io(struct request_queue *q, struct gendisk *gd,
+		 struct sg_io_hdr *hdr, fmode_t mode);
 extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
 extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ff700fb6ce1d..c94ae6ee81b8 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -151,6 +151,10 @@ typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i);
 typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages);
+
+typedef int (*dm_sg_io_ioctl_fn)(struct block_device *bdev, fmode_t mode,
+				 unsigned int cmd, unsigned long uarg);
+
 #define PAGE_SECTORS (PAGE_SIZE / 512)
 
 void dm_error(const char *message);
@@ -204,7 +208,9 @@ struct target_type {
 	dm_dax_copy_iter_fn dax_copy_from_iter;
 	dm_dax_copy_iter_fn dax_copy_to_iter;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
-
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	dm_sg_io_ioctl_fn sg_io_ioctl;
+#endif
 	/* For internal device-mapper use. */
 	struct list_head list;
 };
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-06-28 15:15   ` mwilck
  0 siblings, 0 replies; 26+ messages in thread
From: mwilck @ 2021-06-28 15:15 UTC (permalink / raw)
  To: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig
  Cc: Daniel Wagner, emilne, linux-block, nkoenig, Paolo Bonzini, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

In virtual deployments, SCSI passthrough over dm-multipath devices is a
common setup. The qemu "pr-helper" was specifically invented for it. I
believe that this is the most important real-world scenario for sending
SG_IO ioctls to device-mapper devices.

In this configuration, guests send SCSI IO to the hypervisor in the form of
SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
switch paths in dm_blk_ioctl() code path"), no path switching was done at
all. Worse though, if an SG_IO call fails because of a path error,
dm-multipath doesn't retry the IO on a another path; rather, the failure is
passed back to the guest, and paths are not marked as faulty.  This is in
stark contrast with regular block IO of guests on dm-multipath devices, and
certainly comes as a surprise to users who switch to SCSI passthrough in
qemu. In general, users of dm-multipath devices would probably expect failover
to work at least in a basic way.

This patch fixes this by taking a special code path for SG_IO on dm-multipath
targets if CONFIG_DM_MULTIPATH_SG_IO is set.  Rather then just choosing a
single path, sending the IO to it, and failing to the caller if the IO on the
path failed, it retries the same IO on another path for certain error codes,
using blk_path_error() to determine if a retry would make sense for the given
error code. Moreover, it fails the path on which the path error occurred,
like regular block IO would.

If all paths in a multipath map are failed, the behavior depends on the
queue_if_no_path setting of the map. If it is off, multipath_prepare_ioctl()
fails with -EIO, and the search for a valid paths is stopped. If it is on,
the caller will block until either queuing is disabled (in which case IO will
error out) or at least one path is reinstated (in which case IO will resume).
This is as close to regular READ/WRITE as it gets.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/scsi_ioctl.c            |   5 +-
 drivers/md/Kconfig            |  11 ++++
 drivers/md/dm-core.h          |   5 ++
 drivers/md/dm-mpath.c         | 105 ++++++++++++++++++++++++++++++++++
 drivers/md/dm.c               |  26 ++++++++-
 include/linux/blkdev.h        |   2 +
 include/linux/device-mapper.h |   8 ++-
 7 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index e061398be90f..e6a62c1c5404 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -281,8 +281,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
 	return ret;
 }
 
-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
-		struct sg_io_hdr *hdr, fmode_t mode)
+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
+	  struct sg_io_hdr *hdr, fmode_t mode)
 {
 	unsigned long start_time;
 	ssize_t ret = 0;
@@ -367,6 +367,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 	blk_put_request(rq);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(sg_io);
 
 /**
  * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f2014385d48b..f28f29e3bd11 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -473,6 +473,17 @@ config DM_MULTIPATH_IOA
 
 	  If unsure, say N.
 
+config DM_MULTIPATH_SG_IO
+       bool "Retry SCSI generic I/O on multipath devices"
+       depends on DM_MULTIPATH && BLK_SCSI_REQUEST
+       help
+	 With this option, SCSI generic (SG) requests issued on multipath
+	 devices will behave similar to regular block I/O: upon failure,
+	 they are repeated on a different path, and the erroring device
+	 is marked as failed.
+
+	 If unsure, say N.
+
 config DM_DELAY
 	tristate "I/O delaying target"
 	depends on BLK_DEV_DM
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..8bd8a8e3916e 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -189,4 +189,9 @@ extern atomic_t dm_global_event_nr;
 extern wait_queue_head_t dm_global_eventq;
 void dm_issue_global_event(void);
 
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target);
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx);
+
 #endif
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bced42f082b0..86518aec32b4 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -11,6 +11,7 @@
 #include "dm-bio-record.h"
 #include "dm-path-selector.h"
 #include "dm-uevent.h"
+#include "dm-core.h"
 
 #include <linux/blkdev.h>
 #include <linux/ctype.h>
@@ -26,6 +27,7 @@
 #include <scsi/scsi_dh.h>
 #include <linux/atomic.h>
 #include <linux/blk-mq.h>
+#include <scsi/sg.h>
 
 #define DM_MSG_PREFIX "multipath"
 #define DM_PG_INIT_DELAY_MSECS 2000
@@ -2129,6 +2131,106 @@ static int multipath_busy(struct dm_target *ti)
 	return busy;
 }
 
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+static int pgpath_sg_io_ioctl(struct block_device *bdev,
+			    struct sg_io_hdr *hdr, struct multipath *m,
+			    fmode_t mode)
+{
+	int rc;
+	blk_status_t sts;
+	struct priority_group *pg;
+	struct pgpath *pgpath;
+	char path_name[BDEVNAME_SIZE];
+
+	rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, hdr, mode);
+	DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
+		bdevname(bdev, path_name), rc,
+		hdr->driver_status, hdr->host_status,
+		hdr->msg_status, hdr->status);
+
+	/*
+	 * Errors resulting from invalid parameters shouldn't be retried
+	 * on another path.
+	 */
+	switch (rc) {
+	case -ENOIOCTLCMD:
+	case -EFAULT:
+	case -EINVAL:
+	case -EPERM:
+		return rc;
+	default:
+		break;
+	}
+
+	sts = sg_io_to_blk_status(hdr);
+	if (sts == BLK_STS_OK)
+		return 0;
+	else if (!blk_path_error(sts))
+		return blk_status_to_errno(sts);
+
+	/* path error - fail the path */
+	list_for_each_entry(pg, &m->priority_groups, list) {
+		list_for_each_entry(pgpath, &pg->pgpaths, list) {
+			if (pgpath->path.dev->bdev == bdev)
+				fail_path(pgpath);
+		}
+	}
+
+	return -EAGAIN;
+}
+
+static int multipath_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
+				 unsigned int cmd, unsigned long uarg)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	void __user *arg = (void __user *)uarg;
+	struct sg_io_hdr hdr;
+	int rc;
+	bool suspended;
+	int retries = 5;
+
+	if (copy_from_user(&hdr, arg, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.interface_id != 'S')
+		return -EINVAL;
+
+	if (hdr.dxfer_len > (queue_max_hw_sectors(bdev->bd_disk->queue) << 9))
+		return -EIO;
+
+	do {
+		struct block_device *path_dev;
+		struct dm_target *tgt;
+		struct sg_io_hdr rhdr;
+		int srcu_idx;
+
+		suspended = false;
+		/* This will fail and break the loop if no valid paths found */
+		rc = __dm_prepare_ioctl(md, &srcu_idx, &path_dev, &tgt);
+		if (rc == -EAGAIN)
+			suspended = true;
+		else if (rc < 0)
+			DMERR("%s: failed to get path: %d", __func__, rc);
+		else {
+			/* Need to copy the sg_io_hdr, it may be modified */
+			rhdr = hdr;
+			rc = pgpath_sg_io_ioctl(path_dev, &rhdr,
+						tgt->private, mode);
+			if (rc == 0 && copy_to_user(arg, &rhdr, sizeof(rhdr)))
+				rc = -EFAULT;
+		}
+		dm_unprepare_ioctl(md, srcu_idx);
+		if (suspended) {
+			DMDEBUG("%s: suspended, retries = %d\n",
+				__func__, retries);
+			msleep(20);
+		}
+	} while (rc == -EAGAIN && (!suspended || retries-- > 0));
+
+	return rc;
+}
+#endif
+
 /*-----------------------------------------------------------------
  * Module setup
  *---------------------------------------------------------------*/
@@ -2153,6 +2255,9 @@ static struct target_type multipath_target = {
 	.prepare_ioctl = multipath_prepare_ioctl,
 	.iterate_devices = multipath_iterate_devices,
 	.busy = multipath_busy,
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	.sg_io_ioctl = multipath_sg_io_ioctl,
+#endif
 };
 
 static int __init dm_multipath_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..af15d2c132ce 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -29,6 +29,7 @@
 #include <linux/part_stat.h>
 #include <linux/blk-crypto.h>
 #include <linux/keyslot-manager.h>
+#include <scsi/sg.h>
 
 #define DM_MSG_PREFIX "core"
 
@@ -522,8 +523,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
 #define dm_blk_report_zones		NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 
-static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
-			    struct block_device **bdev)
+int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+		       struct block_device **bdev,
+		       struct dm_target **target)
 {
 	struct dm_target *tgt;
 	struct dm_table *map;
@@ -553,13 +555,24 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
 		goto retry;
 	}
 
+	if (r >= 0 && target)
+		*target = tgt;
+
 	return r;
 }
+EXPORT_SYMBOL_GPL(__dm_prepare_ioctl);
 
-static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
+static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+			    struct block_device **bdev)
+{
+	return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
+}
+
+void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
 {
 	dm_put_live_table(md, srcu_idx);
 }
+EXPORT_SYMBOL_GPL(dm_unprepare_ioctl);
 
 static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 			unsigned int cmd, unsigned long arg)
@@ -567,6 +580,13 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	int r, srcu_idx;
 
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	if (cmd == SG_IO && md->immutable_target &&
+	    md->immutable_target->type->sg_io_ioctl)
+		return md->immutable_target->type->sg_io_ioctl(bdev, mode,
+							       cmd, arg);
+#endif
+
 	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
 	if (r < 0)
 		goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5da03edf125c..96eaf1edd7b0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -923,6 +923,8 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			  unsigned int, void __user *);
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
+extern int sg_io(struct request_queue *q, struct gendisk *gd,
+		 struct sg_io_hdr *hdr, fmode_t mode);
 extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp);
 extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp);
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ff700fb6ce1d..c94ae6ee81b8 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -151,6 +151,10 @@ typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i);
 typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages);
+
+typedef int (*dm_sg_io_ioctl_fn)(struct block_device *bdev, fmode_t mode,
+				 unsigned int cmd, unsigned long uarg);
+
 #define PAGE_SECTORS (PAGE_SIZE / 512)
 
 void dm_error(const char *message);
@@ -204,7 +208,9 @@ struct target_type {
 	dm_dax_copy_iter_fn dax_copy_from_iter;
 	dm_dax_copy_iter_fn dax_copy_to_iter;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
-
+#ifdef CONFIG_DM_MULTIPATH_SG_IO
+	dm_sg_io_ioctl_fn sg_io_ioctl;
+#endif
 	/* For internal device-mapper use. */
 	struct list_head list;
 };
-- 
2.32.0


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-06-28 15:15   ` [dm-devel] " mwilck
@ 2021-07-01  7:56     ` Christoph Hellwig
  -1 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-07-01  7:56 UTC (permalink / raw)
  To: mwilck
  Cc: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Christoph Hellwig, Daniel Wagner, linux-block, Paolo Bonzini,
	Benjamin Marzinski, nkoenig, emilne

On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.

pr-helper obviously does not SG_IO on dm-multipath as that simply does
not work.

More importantly - if you want to use persistent reservations use the
kernel ioctls for that.  These work on SCSI, NVMe and device mapper
without any extra magic.

Failing over SG_IO does not make sense.  It is an interface specically
designed to leave all error handling to the userspace program using it,
and we should not change that for one specific error case.  If you
want the kernel to handle errors for you, use the proper interfaces.
In this case this is the persistent reservation ioctls.  If they miss
some features that qemu needs we should add those.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-01  7:56     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-07-01  7:56 UTC (permalink / raw)
  To: mwilck
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, emilne, linux-block,
	dm-devel, Paolo Bonzini, Martin K. Petersen, nkoenig,
	Bart Van Assche, Christoph Hellwig, Alasdair G Kergon

On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.

pr-helper obviously does not SG_IO on dm-multipath as that simply does
not work.

More importantly - if you want to use persistent reservations use the
kernel ioctls for that.  These work on SCSI, NVMe and device mapper
without any extra magic.

Failing over SG_IO does not make sense.  It is an interface specically
designed to leave all error handling to the userspace program using it,
and we should not change that for one specific error case.  If you
want the kernel to handle errors for you, use the proper interfaces.
In this case this is the persistent reservation ioctls.  If they miss
some features that qemu needs we should add those.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-01  7:56     ` [dm-devel] " Christoph Hellwig
@ 2021-07-01 10:35       ` Martin Wilck
  -1 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-01 10:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Paolo Bonzini, Benjamin Marzinski,
	nkoenig, emilne

On Do, 2021-07-01 at 09:56 +0200, Christoph Hellwig wrote:
> On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> > The qemu "pr-helper" was specifically invented for it. I
> > believe that this is the most important real-world scenario for
> > sending
> > SG_IO ioctls to device-mapper devices.
> 
> pr-helper obviously does not SG_IO on dm-multipath as that simply
> does
> not work.
> 
> More importantly - if you want to use persistent reservations use the
> kernel ioctls for that.  These work on SCSI, NVMe and device mapper
> without any extra magic.

This set is not about persistent reservations. Sorry if my mentioning
pr-helper made this impression. It was only meant to emphasize the fact
that qemu SCSI passthrough using SG_IO is an important use case.

> Failing over SG_IO does not make sense.  It is an interface
> specically
> designed to leave all error handling to the userspace program using
> it,
> and we should not change that for one specific error case.  If you
> want the kernel to handle errors for you, use the proper interfaces.
> In this case this is the persistent reservation ioctls.  If they miss
> some features that qemu needs we should add those.

I respectfully disagree. Users of dm-multipath devices expect that IO
succeeds as long as there's at least one healthy path. This is a
fundamental property of multipath devices. Whether IO is sent
"normally" or via SG_IO doesn't make a difference wrt this expectation.

The fact that qemu implements SCSI passthrough this way shows that this
is not just a naïve user expectation, but is shared by experienced
developers as well. AFAICS, we can't simply move the path error
detection and retry logic into user space qemu, because user space
doesn't have information about the status of the multipath map; not to
mention that doing this would be highly inefficient.

I agree that in principle, SG_IO error handling should be left to user
space. But in this specific case, it makes sense to handle just the
"path error vs. target error" distinction in the kernel. All else is of
course still handled by user space.

Regards,
Martin





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-01 10:35       ` Martin Wilck
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-01 10:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, linux-scsi, emilne, linux-block, dm-devel,
	Paolo Bonzini, Martin K. Petersen, nkoenig, Bart Van Assche,
	Daniel Wagner, Alasdair G Kergon

On Do, 2021-07-01 at 09:56 +0200, Christoph Hellwig wrote:
> On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> > The qemu "pr-helper" was specifically invented for it. I
> > believe that this is the most important real-world scenario for
> > sending
> > SG_IO ioctls to device-mapper devices.
> 
> pr-helper obviously does not SG_IO on dm-multipath as that simply
> does
> not work.
> 
> More importantly - if you want to use persistent reservations use the
> kernel ioctls for that.  These work on SCSI, NVMe and device mapper
> without any extra magic.

This set is not about persistent reservations. Sorry if my mentioning
pr-helper made this impression. It was only meant to emphasize the fact
that qemu SCSI passthrough using SG_IO is an important use case.

> Failing over SG_IO does not make sense.  It is an interface
> specically
> designed to leave all error handling to the userspace program using
> it,
> and we should not change that for one specific error case.  If you
> want the kernel to handle errors for you, use the proper interfaces.
> In this case this is the persistent reservation ioctls.  If they miss
> some features that qemu needs we should add those.

I respectfully disagree. Users of dm-multipath devices expect that IO
succeeds as long as there's at least one healthy path. This is a
fundamental property of multipath devices. Whether IO is sent
"normally" or via SG_IO doesn't make a difference wrt this expectation.

The fact that qemu implements SCSI passthrough this way shows that this
is not just a naïve user expectation, but is shared by experienced
developers as well. AFAICS, we can't simply move the path error
detection and retry logic into user space qemu, because user space
doesn't have information about the status of the multipath map; not to
mention that doing this would be highly inefficient.

I agree that in principle, SG_IO error handling should be left to user
space. But in this specific case, it makes sense to handle just the
"path error vs. target error" distinction in the kernel. All else is of
course still handled by user space.

Regards,
Martin





--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-01  7:56     ` [dm-devel] " Christoph Hellwig
@ 2021-07-01 11:06       ` Paolo Bonzini
  -1 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-01 11:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin Wilck, Mike Snitzer, Alasdair G Kergon, Bart Van Assche,
	Martin K. Petersen, linux-scsi, dm-devel, Hannes Reinecke,
	Daniel Wagner, linux-block, Benjamin Marzinski, Nils Koenig,
	Ewan Milne

On Thu, Jul 1, 2021 at 9:56 AM Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> > The qemu "pr-helper" was specifically invented for it. I
> > believe that this is the most important real-world scenario for sending
> > SG_IO ioctls to device-mapper devices.
>
> pr-helper obviously does not SG_IO on dm-multipath as that simply does
> not work.

Right, for the specific case of persistent reservation ioctls, SG_IO is
sent manually to each path via libmpathpersist.

Failover for SG_IO is needed for general-purpose commands (ranging
from INQUIRY/READ CAPACITY to READ/WRITE). The reason to use
SG_IO instead of syscalls is mostly to preserve sense data; QEMU does
have code to convert errno to sense data, but it's fickle. If QEMU can use
sense data directly, it's easier to forward conditions that the guest can
resolve itself (for example unit attentions) and to let the guest operate
at a lower level (e.g. host-managed ZBC can be forwarded and they just
work).

Of course, all this works only for SCSI. As NVMe becomes more common,
and Linux exposes more functionality to userspace with a fabric-neutral
API, QEMU's SBC emulation can start using that functionality and provide
low-level passthrough functionality no matter if the host is using SCSI
or NVMe. Again, the main obstacle for this is sense data; for example,
the SCSI subsystem rightfully eats unit attentions and converts them to
uevents if you go through read/write requests instead of SG_IO.

> More importantly - if you want to use persistent reservations use the
> kernel ioctls for that.  These work on SCSI, NVMe and device mapper
> without any extra magic.

If they provide functionality equivalent to libmpathpersist without having
to do the DM_TABLE_STATUS, I will certainly consider switching! The
only possible issue could be the lost unit attentions.

Paolo

> Failing over SG_IO does not make sense.  It is an interface specically
> designed to leave all error handling to the userspace program using it,
> and we should not change that for one specific error case.  If you
> want the kernel to handle errors for you, use the proper interfaces.
> In this case this is the persistent reservation ioctls.  If they miss
> some features that qemu needs we should add those.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-01 11:06       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-01 11:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, Ewan Milne, linux-block,
	dm-devel, Martin K. Petersen, Nils Koenig, Bart Van Assche,
	Martin Wilck, Alasdair G Kergon

On Thu, Jul 1, 2021 at 9:56 AM Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@suse.com wrote:
> > The qemu "pr-helper" was specifically invented for it. I
> > believe that this is the most important real-world scenario for sending
> > SG_IO ioctls to device-mapper devices.
>
> pr-helper obviously does not SG_IO on dm-multipath as that simply does
> not work.

Right, for the specific case of persistent reservation ioctls, SG_IO is
sent manually to each path via libmpathpersist.

Failover for SG_IO is needed for general-purpose commands (ranging
from INQUIRY/READ CAPACITY to READ/WRITE). The reason to use
SG_IO instead of syscalls is mostly to preserve sense data; QEMU does
have code to convert errno to sense data, but it's fickle. If QEMU can use
sense data directly, it's easier to forward conditions that the guest can
resolve itself (for example unit attentions) and to let the guest operate
at a lower level (e.g. host-managed ZBC can be forwarded and they just
work).

Of course, all this works only for SCSI. As NVMe becomes more common,
and Linux exposes more functionality to userspace with a fabric-neutral
API, QEMU's SBC emulation can start using that functionality and provide
low-level passthrough functionality no matter if the host is using SCSI
or NVMe. Again, the main obstacle for this is sense data; for example,
the SCSI subsystem rightfully eats unit attentions and converts them to
uevents if you go through read/write requests instead of SG_IO.

> More importantly - if you want to use persistent reservations use the
> kernel ioctls for that.  These work on SCSI, NVMe and device mapper
> without any extra magic.

If they provide functionality equivalent to libmpathpersist without having
to do the DM_TABLE_STATUS, I will certainly consider switching! The
only possible issue could be the lost unit attentions.

Paolo

> Failing over SG_IO does not make sense.  It is an interface specically
> designed to leave all error handling to the userspace program using it,
> and we should not change that for one specific error case.  If you
> want the kernel to handle errors for you, use the proper interfaces.
> In this case this is the persistent reservation ioctls.  If they miss
> some features that qemu needs we should add those.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-01 10:35       ` [dm-devel] " Martin Wilck
@ 2021-07-01 11:34         ` Christoph Hellwig
  -1 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-07-01 11:34 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Christoph Hellwig, Mike Snitzer, Alasdair G Kergon,
	Bart Van Assche, Martin K. Petersen, linux-scsi, dm-devel,
	Hannes Reinecke, Daniel Wagner, linux-block, Paolo Bonzini,
	Benjamin Marzinski, nkoenig, emilne

On Thu, Jul 01, 2021 at 12:35:53PM +0200, Martin Wilck wrote:
> I respectfully disagree. Users of dm-multipath devices expect that IO
> succeeds as long as there's at least one healthy path. This is a
> fundamental property of multipath devices. Whether IO is sent
> "normally" or via SG_IO doesn't make a difference wrt this expectation.

If you have those (pretty reasonable) expections don't use SG_IO.
That is what the regular read/write path is for.  SG_IO gives you
raw access to the SCSI logic unit, and you get to keep the pieces
if anything goes wrong.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-01 11:34         ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2021-07-01 11:34 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, emilne, linux-block,
	dm-devel, Paolo Bonzini, Martin K. Petersen, nkoenig,
	Bart Van Assche, Christoph Hellwig, Alasdair G Kergon

On Thu, Jul 01, 2021 at 12:35:53PM +0200, Martin Wilck wrote:
> I respectfully disagree. Users of dm-multipath devices expect that IO
> succeeds as long as there's at least one healthy path. This is a
> fundamental property of multipath devices. Whether IO is sent
> "normally" or via SG_IO doesn't make a difference wrt this expectation.

If you have those (pretty reasonable) expections don't use SG_IO.
That is what the regular read/write path is for.  SG_IO gives you
raw access to the SCSI logic unit, and you get to keep the pieces
if anything goes wrong.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-01 11:34         ` [dm-devel] " Christoph Hellwig
@ 2021-07-02 14:21           ` Martin Wilck
  -1 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-02 14:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, emilne, linux-block,
	dm-devel, Paolo Bonzini, Martin K. Petersen, nkoenig,
	Bart Van Assche, Alasdair G Kergon

On Do, 2021-07-01 at 13:34 +0200, Christoph Hellwig wrote:
> On Thu, Jul 01, 2021 at 12:35:53PM +0200, Martin Wilck wrote:
> > I respectfully disagree. Users of dm-multipath devices expect that
> > IO
> > succeeds as long as there's at least one healthy path. This is a
> > fundamental property of multipath devices. Whether IO is sent
> > "normally" or via SG_IO doesn't make a difference wrt this
> > expectation.
> 
> If you have those (pretty reasonable) expections don't use SG_IO.
> That is what the regular read/write path is for.  SG_IO gives you
> raw access to the SCSI logic unit, and you get to keep the pieces
> if anything goes wrong.

With this logic, if some paths are down, SG_IO commands on multipath
devices yield random results. On one path a command would fail, and on
another it would succeed. User space has no way to control or even see
what path is being used. That's a very fragile user space API, on the
fringe of being useless IMO.

If user space is interested in the error handling semantics you
describe, it can run SG_IO on individual SCSI devices any time. With
the change I am proposing, nothing is lost for user space. If user
space decides to do SG_IO on a multipath device, it has a different
expectation, which my patch set implements. IMO we should strive to
match the semantics for ioctls on natively multipathed NVMe namespaces.

Regards
Martin



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-02 14:21           ` Martin Wilck
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-02 14:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Assche, linux-scsi, Wagner, Mike Snitzer, emilne, linux-block,
	dm-devel, Daniel, Martin K. Petersen, nkoenig, Paolo Bonzini,
	Alasdair G Kergon, Bart

On Do, 2021-07-01 at 13:34 +0200, Christoph Hellwig wrote:
> On Thu, Jul 01, 2021 at 12:35:53PM +0200, Martin Wilck wrote:
> > I respectfully disagree. Users of dm-multipath devices expect that
> > IO
> > succeeds as long as there's at least one healthy path. This is a
> > fundamental property of multipath devices. Whether IO is sent
> > "normally" or via SG_IO doesn't make a difference wrt this
> > expectation.
> 
> If you have those (pretty reasonable) expections don't use SG_IO.
> That is what the regular read/write path is for.  SG_IO gives you
> raw access to the SCSI logic unit, and you get to keep the pieces
> if anything goes wrong.

With this logic, if some paths are down, SG_IO commands on multipath
devices yield random results. On one path a command would fail, and on
another it would succeed. User space has no way to control or even see
what path is being used. That's a very fragile user space API, on the
fringe of being useless IMO.

If user space is interested in the error handling semantics you
describe, it can run SG_IO on individual SCSI devices any time. With
the change I am proposing, nothing is lost for user space. If user
space decides to do SG_IO on a multipath device, it has a different
expectation, which my patch set implements. IMO we should strive to
match the semantics for ioctls on natively multipathed NVMe namespaces.

Regards
Martin



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-02 14:21           ` Martin Wilck
@ 2021-07-05 13:02             ` Paolo Bonzini
  -1 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-05 13:02 UTC (permalink / raw)
  To: Martin Wilck, Christoph Hellwig
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 02/07/21 16:21, Martin Wilck wrote:
>> SG_IO gives you raw access to the SCSI logic unit, and you get to
>> keep the pieces if anything goes wrong.
> 
> That's a very fragile user space API, on the fringe of being useless
> IMO.

Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't be
supported at all by mpath devices.  If on the other hand SG_IO is for
raw access to a LUN, there is no reason for it to not support failover.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-05 13:02             ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-05 13:02 UTC (permalink / raw)
  To: Martin Wilck, Christoph Hellwig
  Cc: linux-scsi, Daniel Wagner, Mike Snitzer, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 02/07/21 16:21, Martin Wilck wrote:
>> SG_IO gives you raw access to the SCSI logic unit, and you get to
>> keep the pieces if anything goes wrong.
> 
> That's a very fragile user space API, on the fringe of being useless
> IMO.

Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't be
supported at all by mpath devices.  If on the other hand SG_IO is for
raw access to a LUN, there is no reason for it to not support failover.

Paolo

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-05 13:02             ` Paolo Bonzini
@ 2021-07-05 13:11               ` Hannes Reinecke
  -1 siblings, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2021-07-05 13:11 UTC (permalink / raw)
  To: Paolo Bonzini, Martin Wilck, Christoph Hellwig
  Cc: Mike Snitzer, linux-scsi, Daniel Wagner, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 7/5/21 3:02 PM, Paolo Bonzini wrote:
> On 02/07/21 16:21, Martin Wilck wrote:
>>> SG_IO gives you raw access to the SCSI logic unit, and you get to
>>> keep the pieces if anything goes wrong.
>>
>> That's a very fragile user space API, on the fringe of being useless
>> IMO.
> 
> Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't be
> supported at all by mpath devices.  If on the other hand SG_IO is for
> raw access to a LUN, there is no reason for it to not support failover.
> 
Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
And delegate SG_IO to raw access to an ITL nexus.
Doesn't help with existing issues, but should get a clean way forward.

... I think I've even tendered a topic at LSF for this ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-05 13:11               ` Hannes Reinecke
  0 siblings, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2021-07-05 13:11 UTC (permalink / raw)
  To: Paolo Bonzini, Martin Wilck, Christoph Hellwig
  Cc: linux-scsi, Daniel Wagner, Mike Snitzer, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 7/5/21 3:02 PM, Paolo Bonzini wrote:
> On 02/07/21 16:21, Martin Wilck wrote:
>>> SG_IO gives you raw access to the SCSI logic unit, and you get to
>>> keep the pieces if anything goes wrong.
>>
>> That's a very fragile user space API, on the fringe of being useless
>> IMO.
> 
> Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't be
> supported at all by mpath devices.  If on the other hand SG_IO is for
> raw access to a LUN, there is no reason for it to not support failover.
> 
Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
And delegate SG_IO to raw access to an ITL nexus.
Doesn't help with existing issues, but should get a clean way forward.

... I think I've even tendered a topic at LSF for this ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-05 13:11               ` Hannes Reinecke
@ 2021-07-05 13:48                 ` Martin Wilck
  -1 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-05 13:48 UTC (permalink / raw)
  To: Hannes Reinecke, Paolo Bonzini, Christoph Hellwig
  Cc: linux-scsi, Daniel Wagner, Mike Snitzer, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On Mo, 2021-07-05 at 15:11 +0200, Hannes Reinecke wrote:
> On 7/5/21 3:02 PM, Paolo Bonzini wrote:
> > On 02/07/21 16:21, Martin Wilck wrote:
> > > > SG_IO gives you raw access to the SCSI logic unit, and you get
> > > > to
> > > > keep the pieces if anything goes wrong.
> > > 
> > > That's a very fragile user space API, on the fringe of being
> > > useless
> > > IMO.
> > 
> > Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't
> > be
> > supported at all by mpath devices.  If on the other hand SG_IO is
> > for
> > raw access to a LUN, there is no reason for it to not support
> > failover.
> > 
> Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
> And delegate SG_IO to raw access to an ITL nexus.
> Doesn't help with existing issues, but should get a clean way
> forward.

I still have to understand how this would help with the retrying
semantics. Wouldn't we get the exact same problem if a path error
occurs?

Regards,
Martin



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-05 13:48                 ` Martin Wilck
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Wilck @ 2021-07-05 13:48 UTC (permalink / raw)
  To: Hannes Reinecke, Paolo Bonzini, Christoph Hellwig
  Cc: Mike, Daniel Wagner, Snitzer, emilne, Martin K. Petersen,
	linux-block, dm-devel, linux-scsi, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On Mo, 2021-07-05 at 15:11 +0200, Hannes Reinecke wrote:
> On 7/5/21 3:02 PM, Paolo Bonzini wrote:
> > On 02/07/21 16:21, Martin Wilck wrote:
> > > > SG_IO gives you raw access to the SCSI logic unit, and you get
> > > > to
> > > > keep the pieces if anything goes wrong.
> > > 
> > > That's a very fragile user space API, on the fringe of being
> > > useless
> > > IMO.
> > 
> > Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't
> > be
> > supported at all by mpath devices.  If on the other hand SG_IO is
> > for
> > raw access to a LUN, there is no reason for it to not support
> > failover.
> > 
> Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
> And delegate SG_IO to raw access to an ITL nexus.
> Doesn't help with existing issues, but should get a clean way
> forward.

I still have to understand how this would help with the retrying
semantics. Wouldn't we get the exact same problem if a path error
occurs?

Regards,
Martin



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
  2021-07-05 13:48                 ` Martin Wilck
@ 2021-07-06 10:13                   ` Paolo Bonzini
  -1 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-06 10:13 UTC (permalink / raw)
  To: Martin Wilck, Hannes Reinecke, Christoph Hellwig
  Cc: linux-scsi, Daniel Wagner, Mike Snitzer, emilne, linux-block,
	dm-devel, Martin K. Petersen, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 05/07/21 15:48, Martin Wilck wrote:
> On Mo, 2021-07-05 at 15:11 +0200, Hannes Reinecke wrote:
>> On 7/5/21 3:02 PM, Paolo Bonzini wrote:
>>> On 02/07/21 16:21, Martin Wilck wrote:
>>>>> SG_IO gives you raw access to the SCSI logic unit, and you get
>>>>> to
>>>>> keep the pieces if anything goes wrong.
>>>>
>>>> That's a very fragile user space API, on the fringe of being
>>>> useless
>>>> IMO.
>>>
>>> Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't
>>> be supported at all by mpath devices.  If on the other hand SG_IO is
>>> for raw access to a LUN, there is no reason for it to not support
>>> failover.
>>
>> Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
>> And delegate SG_IO to raw access to an ITL nexus.
>> Doesn't help with existing issues, but should get a clean way
>> forward.
> 
> I still have to understand how this would help with the retrying
> semantics. Wouldn't we get the exact same problem if a path error
> occurs?

Also, how would the URING_CMD API differ from SG_IO modulo one being a 
ioctl and one being io_uring-based?  In the end what you have to do is 
1) send a CDB and optionally some data 2) get back a status and 
optionally some data and sense.  Whether the intended use of the API is 
for an ITL nexus or a LUN doesn't really matter.  So, what is the 
rationale for "SG_IO is for a nexus" in the first place, if you think 
that "raw CDBs for a LUN" is a useful operation?  You can still use 
DM_TABLE_STATUS (iirc) to address a specific ITL nexus if desired.

Besides the virtualization case, think of udev rules that use SG_IO to 
retrieve the device identification page for an mpath device and create 
corresponding symlinks in /dev.  They would fail if the first path is 
not responding, which is not desirable.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dm-devel] [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO
@ 2021-07-06 10:13                   ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-07-06 10:13 UTC (permalink / raw)
  To: Martin Wilck, Hannes Reinecke, Christoph Hellwig
  Cc: Daniel Wagner, Mike Snitzer, emilne, Martin K. Petersen,
	linux-block, dm-devel, linux-scsi, nkoenig, Bart Van Assche,
	Alasdair G Kergon

On 05/07/21 15:48, Martin Wilck wrote:
> On Mo, 2021-07-05 at 15:11 +0200, Hannes Reinecke wrote:
>> On 7/5/21 3:02 PM, Paolo Bonzini wrote:
>>> On 02/07/21 16:21, Martin Wilck wrote:
>>>>> SG_IO gives you raw access to the SCSI logic unit, and you get
>>>>> to
>>>>> keep the pieces if anything goes wrong.
>>>>
>>>> That's a very fragile user space API, on the fringe of being
>>>> useless
>>>> IMO.
>>>
>>> Indeed.  If SG_IO is for raw access to an ITL nexus, it shouldn't
>>> be supported at all by mpath devices.  If on the other hand SG_IO is
>>> for raw access to a LUN, there is no reason for it to not support
>>> failover.
>>
>> Or we look at IO_URING_OP_URING_CMD, to send raw cdbs via io_uring.
>> And delegate SG_IO to raw access to an ITL nexus.
>> Doesn't help with existing issues, but should get a clean way
>> forward.
> 
> I still have to understand how this would help with the retrying
> semantics. Wouldn't we get the exact same problem if a path error
> occurs?

Also, how would the URING_CMD API differ from SG_IO modulo one being a 
ioctl and one being io_uring-based?  In the end what you have to do is 
1) send a CDB and optionally some data 2) get back a status and 
optionally some data and sense.  Whether the intended use of the API is 
for an ITL nexus or a LUN doesn't really matter.  So, what is the 
rationale for "SG_IO is for a nexus" in the first place, if you think 
that "raw CDBs for a LUN" is a useful operation?  You can still use 
DM_TABLE_STATUS (iirc) to address a specific ITL nexus if desired.

Besides the virtualization case, think of udev rules that use SG_IO to 
retrieve the device identification page for an mpath device and create 
corresponding symlinks in /dev.  They would fail if the first path is 
not responding, which is not desirable.

Paolo

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-07-06 10:13 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-28 15:15 [PATCH v5 0/3] scsi/dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-06-28 15:15 ` [dm-devel] " mwilck
2021-06-28 15:15 ` [PATCH v5 1/3] scsi: scsi_ioctl: export __scsi_result_to_blk_status() mwilck
2021-06-28 15:15   ` [dm-devel] " mwilck
2021-06-28 15:15 ` [PATCH v5 2/3] scsi: scsi_ioctl: add sg_io_to_blk_status() mwilck
2021-06-28 15:15   ` [dm-devel] " mwilck
2021-06-28 15:15 ` [PATCH v5 3/3] dm mpath: add CONFIG_DM_MULTIPATH_SG_IO - failover for SG_IO mwilck
2021-06-28 15:15   ` [dm-devel] " mwilck
2021-07-01  7:56   ` Christoph Hellwig
2021-07-01  7:56     ` [dm-devel] " Christoph Hellwig
2021-07-01 10:35     ` Martin Wilck
2021-07-01 10:35       ` [dm-devel] " Martin Wilck
2021-07-01 11:34       ` Christoph Hellwig
2021-07-01 11:34         ` [dm-devel] " Christoph Hellwig
2021-07-02 14:21         ` Martin Wilck
2021-07-02 14:21           ` Martin Wilck
2021-07-05 13:02           ` Paolo Bonzini
2021-07-05 13:02             ` Paolo Bonzini
2021-07-05 13:11             ` Hannes Reinecke
2021-07-05 13:11               ` Hannes Reinecke
2021-07-05 13:48               ` Martin Wilck
2021-07-05 13:48                 ` Martin Wilck
2021-07-06 10:13                 ` Paolo Bonzini
2021-07-06 10:13                   ` Paolo Bonzini
2021-07-01 11:06     ` Paolo Bonzini
2021-07-01 11:06       ` [dm-devel] " Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.