linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/19] Use block pr_ops in LIO
@ 2022-10-26 23:19 Mike Christie
  2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
                   ` (18 more replies)
  0 siblings, 19 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

The following patches were built over Linus's tree and this patchset

https://lore.kernel.org/all/20221023030403.33845-24-michael.christie@oracle.com/t/

which allows the SCSI layer passthrough users to control retries for
commands like PRs used in this patchset.

The patches in this thread allow us to use the block pr_ops with LIO's
target_core_iblock module to support cluster applications in VMs.
Currently, to use windows clustering or linux clustering (pacemaker +
cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you have
to use tcmu or pscsi or use a cluster aware FS/framework for the LIO pr
file. Setting up a cluster FS/framework is pain and waste when your real
backend device is already a distributed device, and pscsi and tcmu are
nice for specific use cases, but iblock gives you the best performance and
allows you to use stacked devices like dm-multipath. So these patches
allow iblock to work like pscsi/tcmu where they can pass a PR command to
the backend module. And then iblock will use the pr_ops to pass the PR
command to the real devices similar to what we do for unmap today.

The patches are separated in the following groups:

patches 1 - 11
- Add callouts to read a reservation and it's keys.

patches 12 - 15
- Have pr_ops return a blk_status_t.

patches 16 - 19
- Support for target_core_iblock to bypass the emulate PR code and call
the pr_ops.

This patchset has been tested with the libiscsi PGR ops and with window's
failover cluster verification test.

v3:
- Fix patch subject formatting.
- Fix coding style.
- Rearrange patches so helpers are added with users to avoid compilation
errors.
- Move pr type conversion to array and add nvme_pr_type.
- Add Extended Data Structure control flag enum and use in code for checks.
- Move nvme pr code to new file.
- Add more info to patch subjects about why we need to add blk_status
to pr_ops.
- Use generic SCSI passthrough error handling interface.
- Fix checkpatch --strict errors. Note that I kept the existing coding
style that it complained about because it looked like it was the preferred
style for the code and I didn't want a mix and match.

v2:
- Drop BLK_STS_NEXUS rename changes. Will do separately.
- Add NVMe support.
- Fixed bug in target_core_iblock where a variable was not initialized
mentioned by Christoph.
- Fixed sd pr_ops UA handling issue found when running libiscsi PGR tests.
- Added patches to allow pr_ops to pass up a BLK_STS so we could return
a RESERVATION_CONFLICT status when a pr_ops callout fails.





^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 01/19] block: Add PR callouts for read keys and reservation
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-11-02 22:50   ` Bart Van Assche
  2022-11-02 22:53   ` Bart Van Assche
  2022-10-26 23:19 ` [PATCH v3 02/19] scsi: Rename sd_pr_command Mike Christie
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

Add callouts for reading keys and reservations. This allows LIO to support
the READ_KEYS and READ_RESERVATION commands and will allow dm-multipath
to optimize it's error handling so it can check if it's getting an error
because there's an existing reservation or if we need to retry different
paths.

Note: This only initially adds the struct definitions in the kernel as I'm
not sure if we wanted to export the interface to userspace yet. read_keys
and read_reservation are exactly what dm-multipath and LIO need, but for a
userspace interface we may want something like SCSI's READ_FULL_STATUS and
NVMe's report reservation commands. Those are overkill for dm/LIO and
READ_FULL_STATUS is sometimes broken for SCSI devices.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 include/linux/pr.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/pr.h b/include/linux/pr.h
index 94ceec713afe..55c9ab7a202b 100644
--- a/include/linux/pr.h
+++ b/include/linux/pr.h
@@ -4,6 +4,18 @@
 
 #include <uapi/linux/pr.h>
 
+struct pr_keys {
+	u32	generation;
+	u32	num_keys;
+	u64	keys[];
+};
+
+struct pr_held_reservation {
+	u64		key;
+	u32		generation;
+	enum pr_type	type;
+};
+
 struct pr_ops {
 	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
 			u32 flags);
@@ -14,6 +26,18 @@ struct pr_ops {
 	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
 			enum pr_type type, bool abort);
 	int (*pr_clear)(struct block_device *bdev, u64 key);
+	/*
+	 * pr_read_keys - Read the registered keys and return them in the
+	 * pr_keys->keys array. The keys array will have been allocated at the
+	 * end of the pr_keys struct and is keys_len bytes. If there are more
+	 * keys than can fit in the array, success will still be returned and
+	 * pr_keys->num_keys will reflect the total number of keys the device
+	 * contains, so the caller can retry with a larger array.
+	 */
+	int (*pr_read_keys)(struct block_device *bdev,
+			struct pr_keys *keys_info, u32 keys_len);
+	int (*pr_read_reservation)(struct block_device *bdev,
+			struct pr_held_reservation *rsv);
 };
 
 #endif /* LINUX_PR_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 02/19] scsi: Rename sd_pr_command
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
  2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-11-01  5:33   ` Chaitanya Kulkarni
  2022-10-26 23:19 ` [PATCH v3 03/19] scsi: Move sd_pr_type to header to share Mike Christie
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

Rename sd_pr_command to sd_pr_out_command to match a
sd_pr_in_command helper added in the next patches.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 187b4fe2bc2b..4dc5c932fbd3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1714,7 +1714,7 @@ static char sd_pr_type(enum pr_type type)
 	}
 };
 
-static int sd_pr_command(struct block_device *bdev, u8 sa,
+static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 		u64 key, u64 sa_key, u8 type, u8 flags)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
@@ -1768,7 +1768,7 @@ static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 {
 	if (flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return sd_pr_command(bdev, (flags & PR_FL_IGNORE_KEY) ? 0x06 : 0x00,
+	return sd_pr_out_command(bdev, (flags & PR_FL_IGNORE_KEY) ? 0x06 : 0x00,
 			old_key, new_key, 0,
 			(1 << 0) /* APTPL */);
 }
@@ -1778,24 +1778,24 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 {
 	if (flags)
 		return -EOPNOTSUPP;
-	return sd_pr_command(bdev, 0x01, key, 0, sd_pr_type(type), 0);
+	return sd_pr_out_command(bdev, 0x01, key, 0, sd_pr_type(type), 0);
 }
 
 static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
 {
-	return sd_pr_command(bdev, 0x02, key, 0, sd_pr_type(type), 0);
+	return sd_pr_out_command(bdev, 0x02, key, 0, sd_pr_type(type), 0);
 }
 
 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
 		enum pr_type type, bool abort)
 {
-	return sd_pr_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
+	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
 			     sd_pr_type(type), 0);
 }
 
 static int sd_pr_clear(struct block_device *bdev, u64 key)
 {
-	return sd_pr_command(bdev, 0x03, key, 0, 0, 0);
+	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0);
 }
 
 static const struct pr_ops sd_pr_ops = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
  2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
  2022-10-26 23:19 ` [PATCH v3 02/19] scsi: Rename sd_pr_command Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-11-01  5:43   ` Chaitanya Kulkarni
  2022-11-02 22:47   ` Bart Van Assche
  2022-10-26 23:19 ` [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation Mike Christie
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

LIO is going to want to do the same block to/from SCSI pr types as sd.c
so this moves the sd_pr_type helper to a new file. The next patch will
then also add a helper to go from the SCSI value to the block one for use
with PERSISTENT_RESERVE_IN commands.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c            | 31 +++++++------------------------
 include/scsi/scsi_block_pr.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 24 deletions(-)
 create mode 100644 include/scsi/scsi_block_pr.h

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4dc5c932fbd3..ad9374b07585 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -67,6 +67,7 @@
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_ioctl.h>
 #include <scsi/scsicam.h>
+#include <scsi/scsi_block_pr.h>
 
 #include "sd.h"
 #include "scsi_priv.h"
@@ -1694,28 +1695,8 @@ static int sd_get_unique_id(struct gendisk *disk, u8 id[16],
 	return ret;
 }
 
-static char sd_pr_type(enum pr_type type)
-{
-	switch (type) {
-	case PR_WRITE_EXCLUSIVE:
-		return 0x01;
-	case PR_EXCLUSIVE_ACCESS:
-		return 0x03;
-	case PR_WRITE_EXCLUSIVE_REG_ONLY:
-		return 0x05;
-	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
-		return 0x06;
-	case PR_WRITE_EXCLUSIVE_ALL_REGS:
-		return 0x07;
-	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
-		return 0x08;
-	default:
-		return 0;
-	}
-};
-
 static int sd_pr_out_command(struct block_device *bdev, u8 sa,
-		u64 key, u64 sa_key, u8 type, u8 flags)
+		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1778,19 +1759,21 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 {
 	if (flags)
 		return -EOPNOTSUPP;
-	return sd_pr_out_command(bdev, 0x01, key, 0, sd_pr_type(type), 0);
+	return sd_pr_out_command(bdev, 0x01, key, 0,
+				 block_pr_type_to_scsi(type), 0);
 }
 
 static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
 {
-	return sd_pr_out_command(bdev, 0x02, key, 0, sd_pr_type(type), 0);
+	return sd_pr_out_command(bdev, 0x02, key, 0,
+				 block_pr_type_to_scsi(type), 0);
 }
 
 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
 		enum pr_type type, bool abort)
 {
 	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
-			     sd_pr_type(type), 0);
+				 block_pr_type_to_scsi(type), 0);
 }
 
 static int sd_pr_clear(struct block_device *bdev, u64 key)
diff --git a/include/scsi/scsi_block_pr.h b/include/scsi/scsi_block_pr.h
new file mode 100644
index 000000000000..6e99f844330d
--- /dev/null
+++ b/include/scsi/scsi_block_pr.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SCSI_BLOCK_PR_H
+#define _SCSI_BLOCK_PR_H
+
+#include <uapi/linux/pr.h>
+
+enum scsi_pr_type {
+	SCSI_PR_WRITE_EXCLUSIVE			= 0x01,
+	SCSI_PR_EXCLUSIVE_ACCESS		= 0x03,
+	SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY	= 0x05,
+	SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY	= 0x06,
+	SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS	= 0x07,
+	SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS	= 0x08,
+};
+
+static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
+{
+	switch (type) {
+	case PR_WRITE_EXCLUSIVE:
+		return SCSI_PR_WRITE_EXCLUSIVE;
+	case PR_EXCLUSIVE_ACCESS:
+		return SCSI_PR_EXCLUSIVE_ACCESS;
+	case PR_WRITE_EXCLUSIVE_REG_ONLY:
+		return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
+	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
+		return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
+	case PR_WRITE_EXCLUSIVE_ALL_REGS:
+		return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
+	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
+		return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+	default:
+		return 0;
+	}
+};
+
+#endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (2 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 03/19] scsi: Move sd_pr_type to header to share Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-11-01  5:45   ` Chaitanya Kulkarni
  2022-11-02 22:54   ` Bart Van Assche
  2022-10-26 23:19 ` [PATCH v3 05/19] dm: " Mike Christie
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This adds support in sd.c for the block PR read keys and read reservation
callouts.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c            | 104 +++++++++++++++++++++++++++++++++++
 include/scsi/scsi_block_pr.h |  20 +++++++
 include/scsi/scsi_proto.h    |   5 ++
 3 files changed, 129 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ad9374b07585..86b602399000 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1695,6 +1695,108 @@ static int sd_get_unique_id(struct gendisk *disk, u8 id[16],
 	return ret;
 }
 
+static int sd_pr_in_command(struct block_device *bdev, u8 sa,
+			    unsigned char *data, int data_len)
+{
+	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
+	struct scsi_device *sdev = sdkp->device;
+	struct scsi_sense_hdr sshdr;
+	u8 cmd[10] = { PERSISTENT_RESERVE_IN, sa };
+	struct scsi_failure failures[] = {
+		{
+			.sense = UNIT_ATTENTION,
+			.asc = SCMD_FAILURE_ASC_ANY,
+			.ascq = SCMD_FAILURE_ASCQ_ANY,
+			.allowed = 5,
+			.result = SAM_STAT_CHECK_CONDITION,
+		},
+		{},
+	};
+	int result;
+
+	put_unaligned_be16(data_len, &cmd[7]);
+
+	result = scsi_exec_req(((struct scsi_exec_args) {
+					.sdev = sdev,
+					.cmd = cmd,
+					.data_dir = DMA_FROM_DEVICE,
+					.buf = data,
+					.buf_len = data_len,
+					.sshdr = &sshdr,
+					.timeout = SD_TIMEOUT,
+					.retries = sdkp->max_retries,
+					.failures = failures }));
+	if (scsi_status_is_check_condition(result) &&
+	    scsi_sense_valid(&sshdr)) {
+		sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result);
+		scsi_print_sense_hdr(sdev, NULL, &sshdr);
+	}
+
+	return result;
+}
+
+static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
+			   u32 keys_len)
+{
+	int result, i, data_offset, num_copy_keys;
+	int data_len = keys_len + 8;
+	u8 *data;
+
+	data = kzalloc(data_len, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	result = sd_pr_in_command(bdev, READ_KEYS, data, data_len);
+	if (result)
+		goto free_data;
+
+	keys_info->generation = get_unaligned_be32(&data[0]);
+	keys_info->num_keys = get_unaligned_be32(&data[4]) / 8;
+
+	data_offset = 8;
+	num_copy_keys = min(keys_len / 8, keys_info->num_keys);
+
+	for (i = 0; i < num_copy_keys; i++) {
+		keys_info->keys[i] = get_unaligned_be64(&data[data_offset]);
+		data_offset += 8;
+	}
+
+free_data:
+	kfree(data);
+	return result;
+}
+
+static int sd_pr_read_reservation(struct block_device *bdev,
+				  struct pr_held_reservation *rsv)
+{
+	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
+	struct scsi_device *sdev = sdkp->device;
+	u8 data[24] = { 0, };
+	int result, len;
+
+	result = sd_pr_in_command(bdev, READ_RESERVATION, data, sizeof(data));
+	if (result)
+		return result;
+
+	memset(rsv, 0, sizeof(*rsv));
+	len = get_unaligned_be32(&data[4]);
+	if (!len)
+		return result;
+
+	/* Make sure we have at least the key and type */
+	if (len < 14) {
+		sdev_printk(KERN_INFO, sdev,
+			    "READ RESERVATION failed due to short return buffer of %d bytes\n",
+			    len);
+		return -EINVAL;
+	}
+
+	rsv->generation = get_unaligned_be32(&data[0]);
+	rsv->key = get_unaligned_be64(&data[8]);
+	rsv->type = scsi_pr_type_to_block(data[21] & 0x0f);
+	return 0;
+}
+
 static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags)
 {
@@ -1787,6 +1889,8 @@ static const struct pr_ops sd_pr_ops = {
 	.pr_release	= sd_pr_release,
 	.pr_preempt	= sd_pr_preempt,
 	.pr_clear	= sd_pr_clear,
+	.pr_read_keys	= sd_pr_read_keys,
+	.pr_read_reservation = sd_pr_read_reservation,
 };
 
 static void scsi_disk_free_disk(struct gendisk *disk)
diff --git a/include/scsi/scsi_block_pr.h b/include/scsi/scsi_block_pr.h
index 6e99f844330d..137bf2247273 100644
--- a/include/scsi/scsi_block_pr.h
+++ b/include/scsi/scsi_block_pr.h
@@ -33,4 +33,24 @@ static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
 	}
 };
 
+static inline enum pr_type scsi_pr_type_to_block(enum scsi_pr_type type)
+{
+	switch (type) {
+	case SCSI_PR_WRITE_EXCLUSIVE:
+		return PR_WRITE_EXCLUSIVE;
+	case SCSI_PR_EXCLUSIVE_ACCESS:
+		return PR_EXCLUSIVE_ACCESS;
+	case SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY:
+		return PR_WRITE_EXCLUSIVE_REG_ONLY;
+	case SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY:
+		return PR_EXCLUSIVE_ACCESS_REG_ONLY;
+	case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
+		return PR_WRITE_EXCLUSIVE_ALL_REGS;
+	case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+		return PR_EXCLUSIVE_ACCESS_ALL_REGS;
+	default:
+		return 0;
+	}
+}
+
 #endif
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index c03e35fc382c..0fd6e295375a 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -151,6 +151,11 @@
 #define ZO_FINISH_ZONE	      0x02
 #define ZO_OPEN_ZONE	      0x03
 #define ZO_RESET_WRITE_POINTER 0x04
+/* values for PR in service action */
+#define READ_KEYS             0x00
+#define READ_RESERVATION      0x01
+#define REPORT_CAPABILITES    0x02
+#define READ_FULL_STATUS      0x03
 /* values for variable length command */
 #define XDREAD_32	      0x03
 #define XDWRITE_32	      0x04
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 05/19] dm: Add support for block PR read keys/reservation
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (3 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 06/19] nvme: Fix reservation status related structs Mike Christie
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This adds support in dm for the block PR read keys and read reservation
callouts.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/md/dm.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 95a1ee3d314e..f7f806890c92 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3312,12 +3312,56 @@ static int dm_pr_clear(struct block_device *bdev, u64 key)
 	return r;
 }
 
+static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
+			   u32 keys_len)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	const struct pr_ops *ops;
+	int r, srcu_idx;
+
+	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
+	if (r < 0)
+		goto out;
+
+	ops = bdev->bd_disk->fops->pr_ops;
+	if (ops && ops->pr_read_keys)
+		r = ops->pr_read_keys(bdev, keys, keys_len);
+	else
+		r = -EOPNOTSUPP;
+out:
+	dm_unprepare_ioctl(md, srcu_idx);
+	return r;
+}
+
+static int dm_pr_read_reservation(struct block_device *bdev,
+				  struct pr_held_reservation *rsv)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	const struct pr_ops *ops;
+	int r, srcu_idx;
+
+	r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
+	if (r < 0)
+		goto out;
+
+	ops = bdev->bd_disk->fops->pr_ops;
+	if (ops && ops->pr_read_reservation)
+		r = ops->pr_read_reservation(bdev, rsv);
+	else
+		r = -EOPNOTSUPP;
+out:
+	dm_unprepare_ioctl(md, srcu_idx);
+	return r;
+}
+
 static const struct pr_ops dm_pr_ops = {
 	.pr_register	= dm_pr_register,
 	.pr_reserve	= dm_pr_reserve,
 	.pr_release	= dm_pr_release,
 	.pr_preempt	= dm_pr_preempt,
 	.pr_clear	= dm_pr_clear,
+	.pr_read_keys	= dm_pr_read_keys,
+	.pr_read_reservation = dm_pr_read_reservation,
 };
 
 static const struct block_device_operations dm_blk_dops = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 06/19] nvme: Fix reservation status related structs
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (4 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 05/19] dm: " Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-27 17:04   ` Keith Busch
  2022-10-26 23:19 ` [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands Mike Christie
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This fixes the following issues with the reservation status structs:

1. resv10 is bytes 23:10 so it should be 14 bytes.
2. regctl_ds only supports 64 bit host IDs.

These are not currently used, but will be in this patchset which adds
support for the reservation report command.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 include/linux/nvme.h | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 050d7d0cd81b..3ab141d982d1 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -757,20 +757,37 @@ enum {
 	NVME_LBART_ATTRIB_HIDE	= 1 << 1,
 };
 
+struct nvme_registered_ctrl {
+	__le16	cntlid;
+	__u8	rcsts;
+	__u8	rsvd3[5];
+	__le64	hostid;
+	__le64	rkey;
+};
+
+struct nvme_registered_ctrl_ext {
+	__le16	cntlid;
+	__u8	rcsts;
+	__u8	rsvd3[5];
+	__le64	rkey;
+	__u8	hostid[16];
+	__u8	rsvd32[32];
+};
+
 struct nvme_reservation_status {
 	__le32	gen;
 	__u8	rtype;
 	__u8	regctl[2];
 	__u8	resv5[2];
 	__u8	ptpls;
-	__u8	resv10[13];
-	struct {
-		__le16	cntlid;
-		__u8	rcsts;
-		__u8	resv3[5];
-		__le64	hostid;
-		__le64	rkey;
-	} regctl_ds[];
+	__u8	resv10[14];
+	union {
+		struct {
+			__u8	rsvd24[40];
+			struct nvme_registered_ctrl_ext regctl_eds[0];
+		};
+		struct nvme_registered_ctrl regctl_ds[0];
+	};
 };
 
 enum nvme_async_event_type {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (5 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 06/19] nvme: Fix reservation status related structs Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-27 17:05   ` Keith Busch
  2022-11-01  5:29   ` Chaitanya Kulkarni
  2022-10-26 23:19 ` [PATCH v3 08/19] nvme: Move pr code to it's own file Mike Christie
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

Reservation Report support needs to pass in a variable sized buffer, so
this patch has the pr command helpers take a data length argument.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dc4220600585..a79fa710d012 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2082,7 +2082,7 @@ static char nvme_pr_type(enum pr_type type)
 }
 
 static int nvme_send_ns_head_pr_command(struct block_device *bdev,
-		struct nvme_command *c, u8 data[16])
+		struct nvme_command *c, u8 *data, unsigned int data_len)
 {
 	struct nvme_ns_head *head = bdev->bd_disk->private_data;
 	int srcu_idx = srcu_read_lock(&head->srcu);
@@ -2091,17 +2091,17 @@ static int nvme_send_ns_head_pr_command(struct block_device *bdev,
 
 	if (ns) {
 		c->common.nsid = cpu_to_le32(ns->head->ns_id);
-		ret = nvme_submit_sync_cmd(ns->queue, c, data, 16);
+		ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
 	}
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
 }
 	
 static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
-		u8 data[16])
+		u8 *data, unsigned int data_len)
 {
 	c->common.nsid = cpu_to_le32(ns->head->ns_id);
-	return nvme_submit_sync_cmd(ns->queue, c, data, 16);
+	return nvme_submit_sync_cmd(ns->queue, c, data, data_len);
 }
 
 static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
@@ -2118,8 +2118,10 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
 	    bdev->bd_disk->fops == &nvme_ns_head_ops)
-		return nvme_send_ns_head_pr_command(bdev, &c, data);
-	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c, data);
+		return nvme_send_ns_head_pr_command(bdev, &c, data,
+						    sizeof(data));
+	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c, data,
+				       sizeof(data));
 }
 
 static int nvme_pr_register(struct block_device *bdev, u64 old,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (6 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-27 17:06   ` Keith Busch
  2022-11-01  5:25   ` Chaitanya Kulkarni
  2022-10-26 23:19 ` [PATCH v3 09/19] nvme: Add pr_ops read_keys support Mike Christie
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This patch moves the pr code to it's own file because I'm going to be
adding more functions and core.c is getting bigger.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/nvme/host/Makefile |   2 +-
 drivers/nvme/host/core.c   | 120 -----------------------------------
 drivers/nvme/host/nvme.h   |   2 +
 drivers/nvme/host/pr.c     | 127 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+), 121 deletions(-)
 create mode 100644 drivers/nvme/host/pr.c

diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index e27202d22c7d..06c18a65da99 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_FC)			+= nvme-fc.o
 obj-$(CONFIG_NVME_TCP)			+= nvme-tcp.o
 obj-$(CONFIG_NVME_APPLE)		+= nvme-apple.o
 
-nvme-core-y				+= core.o ioctl.o
+nvme-core-y				+= core.o ioctl.o pr.o
 nvme-core-$(CONFIG_NVME_VERBOSE_ERRORS)	+= constants.o
 nvme-core-$(CONFIG_TRACING)		+= trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH)	+= multipath.o
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a79fa710d012..2de9c42094a6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2061,126 +2061,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 	}
 }
 
-static char nvme_pr_type(enum pr_type type)
-{
-	switch (type) {
-	case PR_WRITE_EXCLUSIVE:
-		return 1;
-	case PR_EXCLUSIVE_ACCESS:
-		return 2;
-	case PR_WRITE_EXCLUSIVE_REG_ONLY:
-		return 3;
-	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
-		return 4;
-	case PR_WRITE_EXCLUSIVE_ALL_REGS:
-		return 5;
-	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
-		return 6;
-	default:
-		return 0;
-	}
-}
-
-static int nvme_send_ns_head_pr_command(struct block_device *bdev,
-		struct nvme_command *c, u8 *data, unsigned int data_len)
-{
-	struct nvme_ns_head *head = bdev->bd_disk->private_data;
-	int srcu_idx = srcu_read_lock(&head->srcu);
-	struct nvme_ns *ns = nvme_find_path(head);
-	int ret = -EWOULDBLOCK;
-
-	if (ns) {
-		c->common.nsid = cpu_to_le32(ns->head->ns_id);
-		ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
-	}
-	srcu_read_unlock(&head->srcu, srcu_idx);
-	return ret;
-}
-	
-static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
-		u8 *data, unsigned int data_len)
-{
-	c->common.nsid = cpu_to_le32(ns->head->ns_id);
-	return nvme_submit_sync_cmd(ns->queue, c, data, data_len);
-}
-
-static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
-				u64 key, u64 sa_key, u8 op)
-{
-	struct nvme_command c = { };
-	u8 data[16] = { 0, };
-
-	put_unaligned_le64(key, &data[0]);
-	put_unaligned_le64(sa_key, &data[8]);
-
-	c.common.opcode = op;
-	c.common.cdw10 = cpu_to_le32(cdw10);
-
-	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
-	    bdev->bd_disk->fops == &nvme_ns_head_ops)
-		return nvme_send_ns_head_pr_command(bdev, &c, data,
-						    sizeof(data));
-	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c, data,
-				       sizeof(data));
-}
-
-static int nvme_pr_register(struct block_device *bdev, u64 old,
-		u64 new, unsigned flags)
-{
-	u32 cdw10;
-
-	if (flags & ~PR_FL_IGNORE_KEY)
-		return -EOPNOTSUPP;
-
-	cdw10 = old ? 2 : 0;
-	cdw10 |= (flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0;
-	cdw10 |= (1 << 30) | (1 << 31); /* PTPL=1 */
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register);
-}
-
-static int nvme_pr_reserve(struct block_device *bdev, u64 key,
-		enum pr_type type, unsigned flags)
-{
-	u32 cdw10;
-
-	if (flags & ~PR_FL_IGNORE_KEY)
-		return -EOPNOTSUPP;
-
-	cdw10 = nvme_pr_type(type) << 8;
-	cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0);
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire);
-}
-
-static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
-		enum pr_type type, bool abort)
-{
-	u32 cdw10 = nvme_pr_type(type) << 8 | (abort ? 2 : 1);
-
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
-}
-
-static int nvme_pr_clear(struct block_device *bdev, u64 key)
-{
-	u32 cdw10 = 1 | (key ? 0 : 1 << 3);
-
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
-}
-
-static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
-{
-	u32 cdw10 = nvme_pr_type(type) << 8 | (key ? 0 : 1 << 3);
-
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
-}
-
-const struct pr_ops nvme_pr_ops = {
-	.pr_register	= nvme_pr_register,
-	.pr_reserve	= nvme_pr_reserve,
-	.pr_release	= nvme_pr_release,
-	.pr_preempt	= nvme_pr_preempt,
-	.pr_clear	= nvme_pr_clear,
-};
-
 #ifdef CONFIG_BLK_SED_OPAL
 int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 		bool send)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee6..e82be1f1373d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -19,6 +19,8 @@
 
 #include <trace/events/block.h>
 
+extern const struct pr_ops nvme_pr_ops;
+
 extern unsigned int nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
 
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
new file mode 100644
index 000000000000..aa88c55879b2
--- /dev/null
+++ b/drivers/nvme/host/pr.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/blkdev.h>
+#include <linux/pr.h>
+#include <asm/unaligned.h>
+
+#include "nvme.h"
+
+static char nvme_pr_type(enum pr_type type)
+{
+	switch (type) {
+	case PR_WRITE_EXCLUSIVE:
+		return 1;
+	case PR_EXCLUSIVE_ACCESS:
+		return 2;
+	case PR_WRITE_EXCLUSIVE_REG_ONLY:
+		return 3;
+	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
+		return 4;
+	case PR_WRITE_EXCLUSIVE_ALL_REGS:
+		return 5;
+	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
+		return 6;
+	default:
+		return 0;
+	}
+}
+
+static int nvme_send_ns_head_pr_command(struct block_device *bdev,
+		struct nvme_command *c, u8 *data, unsigned int data_len)
+{
+	struct nvme_ns_head *head = bdev->bd_disk->private_data;
+	int srcu_idx = srcu_read_lock(&head->srcu);
+	struct nvme_ns *ns = nvme_find_path(head);
+	int ret = -EWOULDBLOCK;
+
+	if (ns) {
+		c->common.nsid = cpu_to_le32(ns->head->ns_id);
+		ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
+	}
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
+
+static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
+		u8 *data, unsigned int data_len)
+{
+	c->common.nsid = cpu_to_le32(ns->head->ns_id);
+	return nvme_submit_sync_cmd(ns->queue, c, data, data_len);
+}
+
+static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
+				u64 key, u64 sa_key, u8 op)
+{
+	struct nvme_command c = { };
+	u8 data[16] = { 0, };
+
+	put_unaligned_le64(key, &data[0]);
+	put_unaligned_le64(sa_key, &data[8]);
+
+	c.common.opcode = op;
+	c.common.cdw10 = cpu_to_le32(cdw10);
+
+	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
+	    bdev->bd_disk->fops == &nvme_ns_head_ops)
+		return nvme_send_ns_head_pr_command(bdev, &c, data,
+						    sizeof(data));
+	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c, data,
+				       sizeof(data));
+}
+
+static int nvme_pr_register(struct block_device *bdev, u64 old,
+		u64 new, unsigned flags)
+{
+	u32 cdw10;
+
+	if (flags & ~PR_FL_IGNORE_KEY)
+		return -EOPNOTSUPP;
+
+	cdw10 = old ? 2 : 0;
+	cdw10 |= (flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0;
+	cdw10 |= (1 << 30) | (1 << 31); /* PTPL=1 */
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register);
+}
+
+static int nvme_pr_reserve(struct block_device *bdev, u64 key,
+		enum pr_type type, unsigned flags)
+{
+	u32 cdw10;
+
+	if (flags & ~PR_FL_IGNORE_KEY)
+		return -EOPNOTSUPP;
+
+	cdw10 = nvme_pr_type(type) << 8;
+	cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire);
+}
+
+static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
+		enum pr_type type, bool abort)
+{
+	u32 cdw10 = nvme_pr_type(type) << 8 | (abort ? 2 : 1);
+
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
+}
+
+static int nvme_pr_clear(struct block_device *bdev, u64 key)
+{
+	u32 cdw10 = 1 | (key ? 0 : 1 << 3);
+
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+}
+
+static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+{
+	u32 cdw10 = nvme_pr_type(type) << 8 | (key ? 0 : 1 << 3);
+
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+}
+
+const struct pr_ops nvme_pr_ops = {
+	.pr_register	= nvme_pr_register,
+	.pr_reserve	= nvme_pr_reserve,
+	.pr_release	= nvme_pr_release,
+	.pr_preempt	= nvme_pr_preempt,
+	.pr_clear	= nvme_pr_clear,
+};
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 09/19] nvme: Add pr_ops read_keys support
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (7 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 08/19] nvme: Move pr code to it's own file Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-30  8:17   ` Christoph Hellwig
  2022-10-26 23:19 ` [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array Mike Christie
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This patch adds support for the pr_ops read_keys callout by calling the
NVMe Reservation Report helper, then parsing that info to get the
controller's registered keys. Because the callout is only used in the
kernel where the callers do not know about controller/host IDs, the
callout just returns the registered keys which is required by the SCSI PR
in READ KEYS command.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/nvme/host/pr.c | 73 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/nvme.h   |  4 +++
 2 files changed, 77 insertions(+)

diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index aa88c55879b2..df7eb2440c67 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -118,10 +118,83 @@ static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type
 	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
 }
 
+static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
+		u32 data_len, bool *eds)
+{
+	struct nvme_command c = { };
+	int ret;
+
+	c.common.opcode = nvme_cmd_resv_report;
+	c.common.cdw10 = cpu_to_le32(nvme_bytes_to_numd(data_len));
+	c.common.cdw11 = NVME_EXTENDED_DATA_STRUCT;
+	*eds = true;
+
+retry:
+	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
+	    bdev->bd_disk->fops == &nvme_ns_head_ops)
+		ret = nvme_send_ns_head_pr_command(bdev, &c, data, data_len);
+	else
+		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
+					      data, data_len);
+	if (ret == NVME_SC_HOST_ID_INCONSIST &&
+	    c.common.cdw11 == NVME_EXTENDED_DATA_STRUCT) {
+		c.common.cdw11 = 0;
+		*eds = false;
+		goto retry;
+	}
+
+	return ret;
+}
+
+static int nvme_pr_read_keys(struct block_device *bdev,
+		struct pr_keys *keys_info, u32 keys_len)
+{
+	struct nvme_reservation_status *status;
+	u32 data_len, num_ret_keys;
+	int ret, i;
+	bool eds;
+	u8 *data;
+
+	/*
+	 * Assume we are using 128-bit host IDs and allocate a buffer large
+	 * enough to get enough keys to fill the return keys buffer.
+	 */
+	num_ret_keys = keys_len / 8;
+	data_len = sizeof(*status) +
+			num_ret_keys * sizeof(struct nvme_registered_ctrl_ext);
+	data = kzalloc(data_len, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	ret = nvme_pr_resv_report(bdev, data, data_len, &eds);
+	if (ret)
+		goto free_data;
+
+	status = (struct nvme_reservation_status *)data;
+	keys_info->generation = le32_to_cpu(status->gen);
+	keys_info->num_keys = get_unaligned_le16(&status->regctl);
+
+	num_ret_keys = min(num_ret_keys, keys_info->num_keys);
+	for (i = 0; i < num_ret_keys; i++) {
+		if (eds) {
+			keys_info->keys[i] =
+					le64_to_cpu(status->regctl_eds[i].rkey);
+		} else {
+			keys_info->keys[i] =
+					le64_to_cpu(status->regctl_ds[i].rkey);
+		}
+	}
+
+free_data:
+	kfree(data);
+	return ret;
+}
+
 const struct pr_ops nvme_pr_ops = {
 	.pr_register	= nvme_pr_register,
 	.pr_reserve	= nvme_pr_reserve,
 	.pr_release	= nvme_pr_release,
 	.pr_preempt	= nvme_pr_preempt,
 	.pr_clear	= nvme_pr_clear,
+	.pr_read_keys	= nvme_pr_read_keys,
 };
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 3ab141d982d1..5bc9c84dc216 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -757,6 +757,10 @@ enum {
 	NVME_LBART_ATTRIB_HIDE	= 1 << 1,
 };
 
+enum nvme_eds {
+	NVME_EXTENDED_DATA_STRUCT	= 0x1,
+};
+
 struct nvme_registered_ctrl {
 	__le16	cntlid;
 	__u8	rcsts;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (8 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 09/19] nvme: Add pr_ops read_keys support Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-27 15:18   ` Keith Busch
  2022-10-26 23:19 ` [PATCH v3 11/19] nvme: Add pr_ops read_reservation support Mike Christie
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

For Reservation Report support we need to also convert from the NVMe spec
PR type back to the block PR definition. This moves us to an array, so in
the next patch we can add another helper to do the conversion without
having to manage 2 switches.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/nvme/host/pr.c | 42 +++++++++++++++++++++++-------------------
 include/linux/nvme.h   |  9 +++++++++
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index df7eb2440c67..5c4611d15d9c 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -6,24 +6,28 @@
 
 #include "nvme.h"
 
-static char nvme_pr_type(enum pr_type type)
+static const struct {
+	enum nvme_pr_type	nvme_type;
+	enum pr_type		blk_type;
+} nvme_pr_types[] = {
+	{ NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
+	{ NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
+	{ NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
+	{ NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
+	{ NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
+	{ NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
+};
+
+static enum nvme_pr_type nvme_pr_type_from_blk(enum pr_type type)
 {
-	switch (type) {
-	case PR_WRITE_EXCLUSIVE:
-		return 1;
-	case PR_EXCLUSIVE_ACCESS:
-		return 2;
-	case PR_WRITE_EXCLUSIVE_REG_ONLY:
-		return 3;
-	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
-		return 4;
-	case PR_WRITE_EXCLUSIVE_ALL_REGS:
-		return 5;
-	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
-		return 6;
-	default:
-		return 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nvme_pr_types); i++) {
+		if (nvme_pr_types[i].blk_type == type)
+			return nvme_pr_types[i].nvme_type;
 	}
+
+	return 0;
 }
 
 static int nvme_send_ns_head_pr_command(struct block_device *bdev,
@@ -91,7 +95,7 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 	if (flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
 
-	cdw10 = nvme_pr_type(type) << 8;
+	cdw10 = nvme_pr_type_from_blk(type) << 8;
 	cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0);
 	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire);
 }
@@ -99,7 +103,7 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
 		enum pr_type type, bool abort)
 {
-	u32 cdw10 = nvme_pr_type(type) << 8 | (abort ? 2 : 1);
+	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1);
 
 	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
 }
@@ -113,7 +117,7 @@ static int nvme_pr_clear(struct block_device *bdev, u64 key)
 
 static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
 {
-	u32 cdw10 = nvme_pr_type(type) << 8 | (key ? 0 : 1 << 3);
+	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3);
 
 	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
 }
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 5bc9c84dc216..d0bd15f527fc 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -757,6 +757,15 @@ enum {
 	NVME_LBART_ATTRIB_HIDE	= 1 << 1,
 };
 
+enum nvme_pr_type {
+	NVME_PR_WRITE_EXCLUSIVE			= 1,
+	NVME_PR_EXCLUSIVE_ACCESS		= 2,
+	NVME_PR_WRITE_EXCLUSIVE_REG_ONLY	= 3,
+	NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY	= 4,
+	NVME_PR_WRITE_EXCLUSIVE_ALL_REGS	= 5,
+	NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS	= 6,
+};
+
 enum nvme_eds {
 	NVME_EXTENDED_DATA_STRUCT	= 0x1,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 11/19] nvme: Add pr_ops read_reservation support
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (9 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-30  8:18   ` Christoph Hellwig
  2022-10-26 23:19 ` [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts Mike Christie
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This patch adds support for the pr_ops read_reservation callout by
calling the NVMe Reservation Report helper. It then parses that info to
detect if there is a reservation and if there is then convert the
returned info to a pr_ops pr_held_reservation struct.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/nvme/host/pr.c | 79 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index 5c4611d15d9c..87a0aaf811f4 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -30,6 +30,18 @@ static enum nvme_pr_type nvme_pr_type_from_blk(enum pr_type type)
 	return 0;
 }
 
+static enum pr_type block_pr_type_from_nvme(enum nvme_pr_type type)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nvme_pr_types); i++) {
+		if (nvme_pr_types[i].nvme_type == type)
+			return nvme_pr_types[i].blk_type;
+	}
+
+	return 0;
+}
+
 static int nvme_send_ns_head_pr_command(struct block_device *bdev,
 		struct nvme_command *c, u8 *data, unsigned int data_len)
 {
@@ -194,6 +206,72 @@ static int nvme_pr_read_keys(struct block_device *bdev,
 	return ret;
 }
 
+static int nvme_pr_read_reservation(struct block_device *bdev,
+		struct pr_held_reservation *resv)
+{
+	struct nvme_reservation_status tmp_status, *status;
+	int ret, i, num_regs;
+	u32 data_len;
+	bool eds;
+	u8 *data;
+
+	memset(resv, 0, sizeof(*resv));
+retry:
+	/*
+	 * Get the number of registrations so we know how big to allocate
+	 * the response buffer.
+	 */
+	ret = nvme_pr_resv_report(bdev, (u8 *)&tmp_status, sizeof(tmp_status),
+				  &eds);
+	if (ret)
+		return 0;
+
+	num_regs = get_unaligned_le16(&tmp_status.regctl);
+	if (!num_regs) {
+		resv->generation = le32_to_cpu(tmp_status.gen);
+		return 0;
+	}
+
+	data_len = sizeof(*status) +
+			num_regs * sizeof(struct nvme_registered_ctrl_ext);
+	data = kzalloc(data_len, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	ret = nvme_pr_resv_report(bdev, data, data_len, &eds);
+	if (ret)
+		goto free_data;
+	status = (struct nvme_reservation_status *)data;
+
+	if (num_regs != get_unaligned_le16(&status->regctl)) {
+		kfree(data);
+		goto retry;
+	}
+
+	resv->generation = le32_to_cpu(status->gen);
+	resv->type = block_pr_type_from_nvme(status->rtype);
+
+	for (i = 0; i < num_regs; i++) {
+		if (eds) {
+			if (status->regctl_eds[i].rcsts) {
+				resv->key =
+					le64_to_cpu(status->regctl_eds[i].rkey);
+				break;
+			}
+		} else {
+			if (status->regctl_ds[i].rcsts) {
+				resv->key =
+					le64_to_cpu(status->regctl_ds[i].rkey);
+				break;
+			}
+		}
+	}
+
+free_data:
+	kfree(data);
+	return ret;
+}
+
 const struct pr_ops nvme_pr_ops = {
 	.pr_register	= nvme_pr_register,
 	.pr_reserve	= nvme_pr_reserve,
@@ -201,4 +279,5 @@ const struct pr_ops nvme_pr_ops = {
 	.pr_preempt	= nvme_pr_preempt,
 	.pr_clear	= nvme_pr_clear,
 	.pr_read_keys	= nvme_pr_read_keys,
+	.pr_read_reservation = nvme_pr_read_reservation,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (10 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 11/19] nvme: Add pr_ops read_reservation support Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-30  8:20   ` Christoph Hellwig
  2022-10-26 23:19 ` [PATCH v3 13/19] nvme: Have NVMe pr_ops return a blk_status_t Mike Christie
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

LIO needs to be able to know if a failure was the result of a reservation
conflict and then be able to convert from the lower level's definition of
that error to SCSI so it can be returned to the initiator. Windows
clustering and test tools like libiscsi require this.

dm-multipath would also like to be able to distiguish between path
failures and reservation conflict so they can optimize their error
handlers for their pr_ops.

To do this they currently have to know the lower level device type and how
to convert between that driver's error code and SCSI. Just knowing the
device type is difficult because we can have layers like dm-multipath
between us and dm-multipath only knows the layer below it is a block
device.

To handle both cases and keep userspace compatibility, this patch adds a
blk_status_t arg to the pr_ops callouts. The lower levels will convert
their device specific error to the blk_status_t then the upper levels
can easily check that code without knowing the device type. Adding the
extra return value will then allow us to not break userspace which expects
a negative -Exyz error code if the command fails before it's sent to the
device or a device/driver specific value if the error is > 0.

This patch just wires in the blk_status_t to the pr_ops callouts. The
next patches will then have the drivers pass up a blk_status_t.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 block/ioctl.c            | 11 ++++++-----
 drivers/md/dm.c          | 41 +++++++++++++++++++++++++---------------
 drivers/nvme/host/pr.c   | 16 +++++++++-------
 drivers/scsi/sd.c        | 21 +++++++++++---------
 fs/nfs/blocklayout/dev.c |  4 ++--
 fs/nfsd/blocklayout.c    |  6 +++---
 include/linux/pr.h       | 17 ++++++++++-------
 7 files changed, 68 insertions(+), 48 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..72338c56e235 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -269,7 +269,8 @@ static int blkdev_pr_register(struct block_device *bdev,
 
 	if (reg.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags,
+				NULL);
 }
 
 static int blkdev_pr_reserve(struct block_device *bdev,
@@ -287,7 +288,7 @@ static int blkdev_pr_reserve(struct block_device *bdev,
 
 	if (rsv.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags);
+	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags, NULL);
 }
 
 static int blkdev_pr_release(struct block_device *bdev,
@@ -305,7 +306,7 @@ static int blkdev_pr_release(struct block_device *bdev,
 
 	if (rsv.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_release(bdev, rsv.key, rsv.type);
+	return ops->pr_release(bdev, rsv.key, rsv.type, NULL);
 }
 
 static int blkdev_pr_preempt(struct block_device *bdev,
@@ -323,7 +324,7 @@ static int blkdev_pr_preempt(struct block_device *bdev,
 
 	if (p.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort);
+	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort, NULL);
 }
 
 static int blkdev_pr_clear(struct block_device *bdev,
@@ -341,7 +342,7 @@ static int blkdev_pr_clear(struct block_device *bdev,
 
 	if (c.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_clear(bdev, c.key);
+	return ops->pr_clear(bdev, c.key, NULL);
 }
 
 static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode,
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f7f806890c92..7e77c836a61a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3079,7 +3079,8 @@ struct dm_pr {
 	bool	abort;
 	bool	fail_early;
 	int	ret;
-	enum pr_type type;
+	enum pr_type	type;
+	blk_status_t	*blk_stat;
 };
 
 static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
@@ -3130,7 +3131,8 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}
 
-	ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
+	ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags,
+			       pr->blk_stat);
 	if (!ret)
 		return 0;
 
@@ -3144,7 +3146,7 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
 }
 
 static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
-			  u32 flags)
+			  u32 flags, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key	= old_key,
@@ -3152,6 +3154,7 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 		.flags		= flags,
 		.fail_early	= true,
 		.ret		= 0,
+		.blk_stat	= blk_stat,
 	};
 	int ret;
 
@@ -3189,7 +3192,8 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}
 
-	pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags);
+	pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags,
+				  pr->blk_stat);
 	if (!pr->ret)
 		return -1;
 
@@ -3197,7 +3201,7 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
 }
 
 static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
-			 u32 flags)
+			 u32 flags, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key	= key,
@@ -3205,6 +3209,7 @@ static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 		.type		= type,
 		.fail_early	= false,
 		.ret		= 0,
+		.blk_stat	= blk_stat,
 	};
 	int ret;
 
@@ -3232,19 +3237,22 @@ static int __dm_pr_release(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}
 
-	pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type);
+	pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type,
+				  pr->blk_stat);
 	if (pr->ret)
 		return -1;
 
 	return 0;
 }
 
-static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+			 blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key	= key,
 		.type		= type,
 		.fail_early	= false,
+		.blk_stat	= blk_stat,
 	};
 	int ret;
 
@@ -3267,7 +3275,7 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
 	}
 
 	pr->ret = ops->pr_preempt(dev->bdev, pr->old_key, pr->new_key, pr->type,
-				  pr->abort);
+				  pr->abort, pr->blk_stat);
 	if (!pr->ret)
 		return -1;
 
@@ -3275,13 +3283,14 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
 }
 
 static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
-			 enum pr_type type, bool abort)
+			 enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.new_key	= new_key,
 		.old_key	= old_key,
 		.type		= type,
 		.fail_early	= false,
+		.blk_stat	= blk_stat,
 	};
 	int ret;
 
@@ -3292,7 +3301,8 @@ static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
 	return pr.ret;
 }
 
-static int dm_pr_clear(struct block_device *bdev, u64 key)
+static int dm_pr_clear(struct block_device *bdev, u64 key,
+		       blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3304,7 +3314,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key)
 
 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_clear)
-		r = ops->pr_clear(bdev, key);
+		r = ops->pr_clear(bdev, key, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
@@ -3313,7 +3323,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key)
 }
 
 static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
-			   u32 keys_len)
+			   u32 keys_len, blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3325,7 +3335,7 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
 
 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_read_keys)
-		r = ops->pr_read_keys(bdev, keys, keys_len);
+		r = ops->pr_read_keys(bdev, keys, keys_len, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
@@ -3334,7 +3344,8 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
 }
 
 static int dm_pr_read_reservation(struct block_device *bdev,
-				  struct pr_held_reservation *rsv)
+				  struct pr_held_reservation *rsv,
+				  blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3346,7 +3357,7 @@ static int dm_pr_read_reservation(struct block_device *bdev,
 
 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_read_reservation)
-		r = ops->pr_read_reservation(bdev, rsv);
+		r = ops->pr_read_reservation(bdev, rsv, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index 87a0aaf811f4..854a26ce55fe 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -86,7 +86,7 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 }
 
 static int nvme_pr_register(struct block_device *bdev, u64 old,
-		u64 new, unsigned flags)
+		u64 new, unsigned flags, blk_status_t *blk_stat)
 {
 	u32 cdw10;
 
@@ -100,7 +100,7 @@ static int nvme_pr_register(struct block_device *bdev, u64 old,
 }
 
 static int nvme_pr_reserve(struct block_device *bdev, u64 key,
-		enum pr_type type, unsigned flags)
+		enum pr_type type, unsigned flags, blk_status_t *blk_stat)
 {
 	u32 cdw10;
 
@@ -113,21 +113,23 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 }
 
 static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1);
 
 	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
 }
 
-static int nvme_pr_clear(struct block_device *bdev, u64 key)
+static int nvme_pr_clear(struct block_device *bdev, u64 key,
+		blk_status_t *blk_stat)
 {
 	u32 cdw10 = 1 | (key ? 0 : 1 << 3);
 
 	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
 }
 
-static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+		blk_status_t *blk_stat)
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3);
 
@@ -163,7 +165,7 @@ static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
 }
 
 static int nvme_pr_read_keys(struct block_device *bdev,
-		struct pr_keys *keys_info, u32 keys_len)
+		struct pr_keys *keys_info, u32 keys_len, blk_status_t *blk_stat)
 {
 	struct nvme_reservation_status *status;
 	u32 data_len, num_ret_keys;
@@ -207,7 +209,7 @@ static int nvme_pr_read_keys(struct block_device *bdev,
 }
 
 static int nvme_pr_read_reservation(struct block_device *bdev,
-		struct pr_held_reservation *resv)
+		struct pr_held_reservation *resv, blk_status_t *blk_stat)
 {
 	struct nvme_reservation_status tmp_status, *status;
 	int ret, i, num_regs;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 86b602399000..8a39f25e4470 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1736,7 +1736,7 @@ static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 }
 
 static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
-			   u32 keys_len)
+			   u32 keys_len, blk_status_t *blk_stat)
 {
 	int result, i, data_offset, num_copy_keys;
 	int data_len = keys_len + 8;
@@ -1767,7 +1767,8 @@ static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
 }
 
 static int sd_pr_read_reservation(struct block_device *bdev,
-				  struct pr_held_reservation *rsv)
+				  struct pr_held_reservation *rsv,
+				  blk_status_t *blk_stat)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1797,8 +1798,8 @@ static int sd_pr_read_reservation(struct block_device *bdev,
 	return 0;
 }
 
-static int sd_pr_out_command(struct block_device *bdev, u8 sa,
-		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags)
+static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
+		u64 sa_key, enum scsi_pr_type type, u8 flags)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1847,7 +1848,7 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 }
 
 static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
-		u32 flags)
+		u32 flags, blk_status_t *blk_stat)
 {
 	if (flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
@@ -1857,7 +1858,7 @@ static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 }
 
 static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
-		u32 flags)
+		u32 flags, blk_status_t *blk_stat)
 {
 	if (flags)
 		return -EOPNOTSUPP;
@@ -1865,20 +1866,22 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 				 block_pr_type_to_scsi(type), 0);
 }
 
-static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+		blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, 0x02, key, 0,
 				 block_pr_type_to_scsi(type), 0);
 }
 
 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
 				 block_pr_type_to_scsi(type), 0);
 }
 
-static int sd_pr_clear(struct block_device *bdev, u64 key)
+static int sd_pr_clear(struct block_device *bdev, u64 key,
+		blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0);
 }
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index fea5f8821da5..516436446f5d 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -29,7 +29,7 @@ bl_free_device(struct pnfs_block_dev *dev)
 			int error;
 
 			error = ops->pr_register(dev->bdev, dev->pr_key, 0,
-				false);
+				false, NULL);
 			if (error)
 				pr_err("failed to unregister PR key.\n");
 		}
@@ -362,7 +362,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
 		goto out_blkdev_put;
 	}
 
-	error = ops->pr_register(d->bdev, 0, d->pr_key, true);
+	error = ops->pr_register(d->bdev, 0, d->pr_key, true, NULL);
 	if (error) {
 		pr_err("pNFS: failed to register key for block device %s.",
 				d->bdev->bd_disk->disk_name);
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index b6d01d51a746..a302ea026f72 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -277,7 +277,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 		goto out_free_dev;
 	}
 
-	ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true);
+	ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true, NULL);
 	if (ret) {
 		pr_err("pNFS: failed to register key for device %s.\n",
 			sb->s_id);
@@ -285,7 +285,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 	}
 
 	ret = ops->pr_reserve(sb->s_bdev, NFSD_MDS_PR_KEY,
-			PR_EXCLUSIVE_ACCESS_REG_ONLY, 0);
+			PR_EXCLUSIVE_ACCESS_REG_ONLY, 0, NULL);
 	if (ret) {
 		pr_err("pNFS: failed to reserve device %s.\n",
 			sb->s_id);
@@ -331,7 +331,7 @@ nfsd4_scsi_fence_client(struct nfs4_layout_stateid *ls)
 	struct block_device *bdev = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_bdev;
 
 	bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
-			nfsd4_scsi_pr_key(clp), 0, true);
+			nfsd4_scsi_pr_key(clp), 0, true, NULL);
 }
 
 const struct nfsd4_layout_ops scsi_layout_ops = {
diff --git a/include/linux/pr.h b/include/linux/pr.h
index 55c9ab7a202b..954544200a57 100644
--- a/include/linux/pr.h
+++ b/include/linux/pr.h
@@ -18,14 +18,15 @@ struct pr_held_reservation {
 
 struct pr_ops {
 	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
-			u32 flags);
+			u32 flags, blk_status_t *blk_stat);
 	int (*pr_reserve)(struct block_device *bdev, u64 key,
-			enum pr_type type, u32 flags);
+			enum pr_type type, u32 flags, blk_status_t *blk_stat);
 	int (*pr_release)(struct block_device *bdev, u64 key,
-			enum pr_type type);
+			enum pr_type type, blk_status_t *blk_stat);
 	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
-			enum pr_type type, bool abort);
-	int (*pr_clear)(struct block_device *bdev, u64 key);
+			enum pr_type type, bool abort, blk_status_t *blk_stat);
+	int (*pr_clear)(struct block_device *bdev, u64 key,
+			blk_status_t *blk_stat);
 	/*
 	 * pr_read_keys - Read the registered keys and return them in the
 	 * pr_keys->keys array. The keys array will have been allocated at the
@@ -35,9 +36,11 @@ struct pr_ops {
 	 * contains, so the caller can retry with a larger array.
 	 */
 	int (*pr_read_keys)(struct block_device *bdev,
-			struct pr_keys *keys_info, u32 keys_len);
+			struct pr_keys *keys_info, u32 keys_len,
+			blk_status_t *blk_stat);
 	int (*pr_read_reservation)(struct block_device *bdev,
-			struct pr_held_reservation *rsv);
+			struct pr_held_reservation *rsv,
+			blk_status_t *blk_stat);
 };
 
 #endif /* LINUX_PR_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 13/19] nvme: Have NVMe pr_ops return a blk_status_t
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (11 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 14/19] scsi: Export scsi_result_to_blk_status Mike Christie
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

If register or reserve hit a reservation conflict upper layers like LIO
need to pass that error to the initiator. To do this it has to know the
device/driver type so it can convert the return code because that's
currently a NVMe specific value. Instead of having the upper layers
figure out the device/driver type and call a NVMe conversion function this
has NVMe do the conversion and return a blk_status_t which the upper
layer knows how to handle.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/nvme/host/core.c |  2 +-
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pr.c   | 54 ++++++++++++++++++++++++++--------------
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2de9c42094a6..1fd152a90ed4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -248,7 +248,7 @@ static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 	nvme_put_ctrl(ctrl);
 }
 
-static blk_status_t nvme_error_status(u16 status)
+blk_status_t nvme_error_status(u16 status)
 {
 	switch (status & 0x7ff) {
 	case NVME_SC_SUCCESS:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index e82be1f1373d..d60da2b8c1c6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -855,6 +855,7 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo);
 int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
+blk_status_t nvme_error_status(u16 status);
 
 extern const struct attribute_group *nvme_ns_id_attr_groups[];
 extern const struct pr_ops nvme_pr_ops;
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index 854a26ce55fe..818955c570e3 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -43,7 +43,8 @@ static enum pr_type block_pr_type_from_nvme(enum nvme_pr_type type)
 }
 
 static int nvme_send_ns_head_pr_command(struct block_device *bdev,
-		struct nvme_command *c, u8 *data, unsigned int data_len)
+		struct nvme_command *c, u8 *data, unsigned int data_len,
+		blk_status_t *blk_stat)
 {
 	struct nvme_ns_head *head = bdev->bd_disk->private_data;
 	int srcu_idx = srcu_read_lock(&head->srcu);
@@ -53,20 +54,28 @@ static int nvme_send_ns_head_pr_command(struct block_device *bdev,
 	if (ns) {
 		c->common.nsid = cpu_to_le32(ns->head->ns_id);
 		ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
+		if (blk_stat && ret >= 0)
+			*blk_stat = nvme_error_status(ret);
 	}
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
 }
 
 static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
-		u8 *data, unsigned int data_len)
+		u8 *data, unsigned int data_len, blk_status_t *blk_stat)
 {
+	int ret;
+
 	c->common.nsid = cpu_to_le32(ns->head->ns_id);
-	return nvme_submit_sync_cmd(ns->queue, c, data, data_len);
+	ret = nvme_submit_sync_cmd(ns->queue, c, data, data_len);
+	if (blk_stat && ret >= 0)
+		*blk_stat = nvme_error_status(ret);
+	return ret;
 }
 
 static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
-				u64 key, u64 sa_key, u8 op)
+				u64 key, u64 sa_key, u8 op,
+				blk_status_t *blk_stat)
 {
 	struct nvme_command c = { };
 	u8 data[16] = { 0, };
@@ -80,9 +89,9 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
 	    bdev->bd_disk->fops == &nvme_ns_head_ops)
 		return nvme_send_ns_head_pr_command(bdev, &c, data,
-						    sizeof(data));
-	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c, data,
-				       sizeof(data));
+						    sizeof(data), blk_stat);
+	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
+				       data, sizeof(data), blk_stat);
 }
 
 static int nvme_pr_register(struct block_device *bdev, u64 old,
@@ -96,7 +105,8 @@ static int nvme_pr_register(struct block_device *bdev, u64 old,
 	cdw10 = old ? 2 : 0;
 	cdw10 |= (flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0;
 	cdw10 |= (1 << 30) | (1 << 31); /* PTPL=1 */
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register);
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register,
+			       blk_stat);
 }
 
 static int nvme_pr_reserve(struct block_device *bdev, u64 key,
@@ -109,7 +119,8 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 
 	cdw10 = nvme_pr_type_from_blk(type) << 8;
 	cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0);
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire,
+			       blk_stat);
 }
 
 static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
@@ -117,7 +128,8 @@ static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1);
 
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire,
+			       blk_stat);
 }
 
 static int nvme_pr_clear(struct block_device *bdev, u64 key,
@@ -125,7 +137,8 @@ static int nvme_pr_clear(struct block_device *bdev, u64 key,
 {
 	u32 cdw10 = 1 | (key ? 0 : 1 << 3);
 
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release,
+			       blk_stat);
 }
 
 static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
@@ -133,11 +146,12 @@ static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3);
 
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release,
+			       blk_stat);
 }
 
 static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
-		u32 data_len, bool *eds)
+		u32 data_len, bool *eds, blk_status_t *blk_stat)
 {
 	struct nvme_command c = { };
 	int ret;
@@ -148,12 +162,16 @@ static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
 	*eds = true;
 
 retry:
+	if (blk_stat)
+		*blk_stat = 0;
+
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
 	    bdev->bd_disk->fops == &nvme_ns_head_ops)
-		ret = nvme_send_ns_head_pr_command(bdev, &c, data, data_len);
+		ret = nvme_send_ns_head_pr_command(bdev, &c, data, data_len,
+						   blk_stat);
 	else
 		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
-					      data, data_len);
+					      data, data_len, blk_stat);
 	if (ret == NVME_SC_HOST_ID_INCONSIST &&
 	    c.common.cdw11 == NVME_EXTENDED_DATA_STRUCT) {
 		c.common.cdw11 = 0;
@@ -184,7 +202,7 @@ static int nvme_pr_read_keys(struct block_device *bdev,
 	if (!data)
 		return -ENOMEM;
 
-	ret = nvme_pr_resv_report(bdev, data, data_len, &eds);
+	ret = nvme_pr_resv_report(bdev, data, data_len, &eds, blk_stat);
 	if (ret)
 		goto free_data;
 
@@ -224,7 +242,7 @@ static int nvme_pr_read_reservation(struct block_device *bdev,
 	 * the response buffer.
 	 */
 	ret = nvme_pr_resv_report(bdev, (u8 *)&tmp_status, sizeof(tmp_status),
-				  &eds);
+				  &eds, blk_stat);
 	if (ret)
 		return 0;
 
@@ -240,7 +258,7 @@ static int nvme_pr_read_reservation(struct block_device *bdev,
 	if (!data)
 		return -ENOMEM;
 
-	ret = nvme_pr_resv_report(bdev, data, data_len, &eds);
+	ret = nvme_pr_resv_report(bdev, data, data_len, &eds, blk_stat);
 	if (ret)
 		goto free_data;
 	status = (struct nvme_reservation_status *)data;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 14/19] scsi: Export scsi_result_to_blk_status
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (12 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 13/19] nvme: Have NVMe pr_ops return a blk_status_t Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 15/19] scsi: Have sd pr_ops return a blk_status_t Mike Christie
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

Export scsi_result_to_blk_status so the sd pr_ops can get a BLK_STS error
that can be returned to other kernel pr ops users.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_lib.c  | 3 ++-
 include/scsi/scsi_cmnd.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f19bc3a7ef59..9367c900f093 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -578,7 +578,7 @@ static inline u8 get_scsi_ml_byte(int result)
  *
  * Translate a SCSI result code into a blk_status_t value.
  */
-static blk_status_t scsi_result_to_blk_status(int result)
+blk_status_t scsi_result_to_blk_status(int result)
 {
 	/*
 	 * Check the scsi-ml byte first in case we converted a host or status
@@ -609,6 +609,7 @@ static blk_status_t scsi_result_to_blk_status(int result)
 		return BLK_STS_IOERR;
 	}
 }
+EXPORT_SYMBOL_GPL(scsi_result_to_blk_status);
 
 /**
  * scsi_rq_err_bytes - determine number of bytes till the next failure boundary
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index ba629ac4525f..2ef3b92e901f 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -174,6 +174,7 @@ static inline void *scsi_cmd_priv(struct scsi_cmnd *cmd)
 void scsi_done(struct scsi_cmnd *cmd);
 void scsi_done_direct(struct scsi_cmnd *cmd);
 
+blk_status_t scsi_result_to_blk_status(int result);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 
 extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 15/19] scsi: Have sd pr_ops return a blk_status_t
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (13 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 14/19] scsi: Export scsi_result_to_blk_status Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 16/19] scsi: target: Rename sbc_ops to exec_cmd_ops Mike Christie
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

If a register or reserve hit a reservation conflict upper layers like LIO
need to pass that error to the initiator. To do this it has to know the
device/driver type so it can convert the return code because that's
currently a device/driver specific value. Instead of having the upper
layers figure out the device/driver type and call a driver specific
conversion function this has SCSI do the conversion and return a
blk_status_t which the upper layer knows how to handle. This will handle
the reservation conflict and in the future we can handle timeouts,
transport errors and sense errors if needed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8a39f25e4470..44d25194ff4e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1696,7 +1696,8 @@ static int sd_get_unique_id(struct gendisk *disk, u8 id[16],
 }
 
 static int sd_pr_in_command(struct block_device *bdev, u8 sa,
-			    unsigned char *data, int data_len)
+			    unsigned char *data, int data_len,
+			    blk_status_t *blk_stat)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1732,6 +1733,9 @@ static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
 	}
 
+	if (blk_stat && result >= 0)
+		*blk_stat = scsi_result_to_blk_status(result);
+
 	return result;
 }
 
@@ -1746,7 +1750,7 @@ static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
 	if (!data)
 		return -ENOMEM;
 
-	result = sd_pr_in_command(bdev, READ_KEYS, data, data_len);
+	result = sd_pr_in_command(bdev, READ_KEYS, data, data_len, blk_stat);
 	if (result)
 		goto free_data;
 
@@ -1775,7 +1779,8 @@ static int sd_pr_read_reservation(struct block_device *bdev,
 	u8 data[24] = { 0, };
 	int result, len;
 
-	result = sd_pr_in_command(bdev, READ_RESERVATION, data, sizeof(data));
+	result = sd_pr_in_command(bdev, READ_RESERVATION, data, sizeof(data),
+				  blk_stat);
 	if (result)
 		return result;
 
@@ -1799,7 +1804,8 @@ static int sd_pr_read_reservation(struct block_device *bdev,
 }
 
 static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
-		u64 sa_key, enum scsi_pr_type type, u8 flags)
+		u64 sa_key, enum scsi_pr_type type, u8 flags,
+		blk_status_t *blk_stat)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1844,6 +1850,9 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
 	}
 
+	if (blk_stat && result >= 0)
+		*blk_stat = scsi_result_to_blk_status(result);
+
 	return result;
 }
 
@@ -1854,7 +1863,8 @@ static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 		return -EOPNOTSUPP;
 	return sd_pr_out_command(bdev, (flags & PR_FL_IGNORE_KEY) ? 0x06 : 0x00,
 			old_key, new_key, 0,
-			(1 << 0) /* APTPL */);
+			(1 << 0) /* APTPL */,
+			blk_stat);
 }
 
 static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
@@ -1863,27 +1873,27 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 	if (flags)
 		return -EOPNOTSUPP;
 	return sd_pr_out_command(bdev, 0x01, key, 0,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, blk_stat);
 }
 
 static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
 		blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, 0x02, key, 0,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, blk_stat);
 }
 
 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
 		enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, blk_stat);
 }
 
 static int sd_pr_clear(struct block_device *bdev, u64 key,
 		blk_status_t *blk_stat)
 {
-	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0);
+	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0, blk_stat);
 }
 
 static const struct pr_ops sd_pr_ops = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 16/19] scsi: target: Rename sbc_ops to exec_cmd_ops
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (14 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 15/19] scsi: Have sd pr_ops return a blk_status_t Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 17/19] scsi: target: Allow backends to hook into PR handling Mike Christie
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

The next patches allow us to call the block layer's pr_ops from the
backends. This will require allowing the backends to hook into the cmd
processing for SPC commands, so this renames sbc_ops to a more generic
exec_cmd_ops.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/target/target_core_file.c    |  4 ++--
 drivers/target/target_core_iblock.c  |  4 ++--
 drivers/target/target_core_rd.c      |  4 ++--
 drivers/target/target_core_sbc.c     | 13 +++++++------
 include/target/target_core_backend.h |  4 ++--
 5 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 28aa643be5d5..8336c0e0b1db 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -903,7 +903,7 @@ static void fd_free_prot(struct se_device *dev)
 	fd_dev->fd_prot_file = NULL;
 }
 
-static struct sbc_ops fd_sbc_ops = {
+static struct exec_cmd_ops fd_exec_cmd_ops = {
 	.execute_rw		= fd_execute_rw,
 	.execute_sync_cache	= fd_execute_sync_cache,
 	.execute_write_same	= fd_execute_write_same,
@@ -913,7 +913,7 @@ static struct sbc_ops fd_sbc_ops = {
 static sense_reason_t
 fd_parse_cdb(struct se_cmd *cmd)
 {
-	return sbc_parse_cdb(cmd, &fd_sbc_ops);
+	return sbc_parse_cdb(cmd, &fd_exec_cmd_ops);
 }
 
 static const struct target_backend_ops fileio_ops = {
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 8351c974cee3..5db7318b4822 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -878,7 +878,7 @@ static unsigned int iblock_get_io_opt(struct se_device *dev)
 	return bdev_io_opt(bd);
 }
 
-static struct sbc_ops iblock_sbc_ops = {
+static struct exec_cmd_ops iblock_exec_cmd_ops = {
 	.execute_rw		= iblock_execute_rw,
 	.execute_sync_cache	= iblock_execute_sync_cache,
 	.execute_write_same	= iblock_execute_write_same,
@@ -888,7 +888,7 @@ static struct sbc_ops iblock_sbc_ops = {
 static sense_reason_t
 iblock_parse_cdb(struct se_cmd *cmd)
 {
-	return sbc_parse_cdb(cmd, &iblock_sbc_ops);
+	return sbc_parse_cdb(cmd, &iblock_exec_cmd_ops);
 }
 
 static bool iblock_get_write_cache(struct se_device *dev)
diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
index 6648c1c90e19..6f67cc09c2b5 100644
--- a/drivers/target/target_core_rd.c
+++ b/drivers/target/target_core_rd.c
@@ -643,14 +643,14 @@ static void rd_free_prot(struct se_device *dev)
 	rd_release_prot_space(rd_dev);
 }
 
-static struct sbc_ops rd_sbc_ops = {
+static struct exec_cmd_ops rd_exec_cmd_ops = {
 	.execute_rw		= rd_execute_rw,
 };
 
 static sense_reason_t
 rd_parse_cdb(struct se_cmd *cmd)
 {
-	return sbc_parse_cdb(cmd, &rd_sbc_ops);
+	return sbc_parse_cdb(cmd, &rd_exec_cmd_ops);
 }
 
 static const struct target_backend_ops rd_mcp_ops = {
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index 1e3216de1e04..74133efda529 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -192,7 +192,7 @@ EXPORT_SYMBOL(sbc_get_write_same_sectors);
 static sense_reason_t
 sbc_execute_write_same_unmap(struct se_cmd *cmd)
 {
-	struct sbc_ops *ops = cmd->protocol_data;
+	struct exec_cmd_ops *ops = cmd->protocol_data;
 	sector_t nolb = sbc_get_write_same_sectors(cmd);
 	sense_reason_t ret;
 
@@ -279,7 +279,8 @@ static inline unsigned long long transport_lba_64_ext(unsigned char *cdb)
 }
 
 static sense_reason_t
-sbc_setup_write_same(struct se_cmd *cmd, unsigned char flags, struct sbc_ops *ops)
+sbc_setup_write_same(struct se_cmd *cmd, unsigned char flags,
+		     struct exec_cmd_ops *ops)
 {
 	struct se_device *dev = cmd->se_dev;
 	sector_t end_lba = dev->transport->get_blocks(dev) + 1;
@@ -348,7 +349,7 @@ sbc_setup_write_same(struct se_cmd *cmd, unsigned char flags, struct sbc_ops *op
 static sense_reason_t
 sbc_execute_rw(struct se_cmd *cmd)
 {
-	struct sbc_ops *ops = cmd->protocol_data;
+	struct exec_cmd_ops *ops = cmd->protocol_data;
 
 	return ops->execute_rw(cmd, cmd->t_data_sg, cmd->t_data_nents,
 			       cmd->data_direction);
@@ -564,7 +565,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
 static sense_reason_t
 sbc_compare_and_write(struct se_cmd *cmd)
 {
-	struct sbc_ops *ops = cmd->protocol_data;
+	struct exec_cmd_ops *ops = cmd->protocol_data;
 	struct se_device *dev = cmd->se_dev;
 	sense_reason_t ret;
 	int rc;
@@ -762,7 +763,7 @@ sbc_check_dpofua(struct se_device *dev, struct se_cmd *cmd, unsigned char *cdb)
 }
 
 sense_reason_t
-sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)
+sbc_parse_cdb(struct se_cmd *cmd, struct exec_cmd_ops *ops)
 {
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *cdb = cmd->t_task_cdb;
@@ -1074,7 +1075,7 @@ EXPORT_SYMBOL(sbc_get_device_type);
 static sense_reason_t
 sbc_execute_unmap(struct se_cmd *cmd)
 {
-	struct sbc_ops *ops = cmd->protocol_data;
+	struct exec_cmd_ops *ops = cmd->protocol_data;
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *buf, *ptr = NULL;
 	sector_t lba;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index a3c193df25b3..c5df78959532 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -62,7 +62,7 @@ struct target_backend_ops {
 	struct configfs_attribute **tb_dev_action_attrs;
 };
 
-struct sbc_ops {
+struct exec_cmd_ops {
 	sense_reason_t (*execute_rw)(struct se_cmd *cmd, struct scatterlist *,
 				     u32, enum dma_data_direction);
 	sense_reason_t (*execute_sync_cache)(struct se_cmd *cmd);
@@ -86,7 +86,7 @@ sense_reason_t	spc_emulate_report_luns(struct se_cmd *cmd);
 sense_reason_t	spc_emulate_inquiry_std(struct se_cmd *, unsigned char *);
 sense_reason_t	spc_emulate_evpd_83(struct se_cmd *, unsigned char *);
 
-sense_reason_t	sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops);
+sense_reason_t	sbc_parse_cdb(struct se_cmd *cmd, struct exec_cmd_ops *ops);
 u32	sbc_get_device_rev(struct se_device *dev);
 u32	sbc_get_device_type(struct se_device *dev);
 sector_t	sbc_get_write_same_sectors(struct se_cmd *cmd);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 17/19] scsi: target: Allow backends to hook into PR handling
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (15 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 16/19] scsi: target: Rename sbc_ops to exec_cmd_ops Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 18/19] scsi: target: Don't support SCSI-2 RESERVE/RELEASE Mike Christie
  2022-10-26 23:19 ` [PATCH v3 19/19] scsi: target: Add block PR support to iblock Mike Christie
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

For the cases where you want to export a device to a VM via a single
I_T nexus and want to passthrough the PR handling to the physical/real
device you have to use pscsi or tcmu. Both are good for specific uses
however for the case where you want good performance, and are not using
SCSI devices directly (using DM/MD RAID or multipath devices) then we are
out of luck.

The following patches allow iblock to mimimally hook into the LIO PR code
and then pass the PR handling to the physical device. Note that like with
the tcmu an pscsi cases it's only supported when you export the device via
one I_T nexus.

This patch adds the initial LIO callouts. The next patch will modify
iblock.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/target/target_core_pr.c      | 60 ++++++++++++++++++++++++++++
 include/target/target_core_backend.h |  5 +++
 2 files changed, 65 insertions(+)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index a1d67554709f..ac85f2b87c4d 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -3519,6 +3519,26 @@ core_scsi3_emulate_pro_register_and_move(struct se_cmd *cmd, u64 res_key,
 	return ret;
 }
 
+static sense_reason_t
+target_try_pr_out_pt(struct se_cmd *cmd, u8 sa, u64 res_key, u64 sa_res_key,
+		     u8 type, bool aptpl, bool all_tg_pt, bool spec_i_pt)
+{
+	struct exec_cmd_ops *ops = cmd->protocol_data;
+
+	if (!cmd->se_sess || !cmd->se_lun) {
+		pr_err("SPC-3 PR: se_sess || struct se_lun is NULL!\n");
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+	}
+
+	if (!ops->execute_pr_out) {
+		pr_err("SPC-3 PR: Device has been configured for PR passthrough but it's not supported by the backend.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	return ops->execute_pr_out(cmd, sa, res_key, sa_res_key, type,
+				   aptpl, all_tg_pt, spec_i_pt);
+}
+
 /*
  * See spc4r17 section 6.14 Table 170
  */
@@ -3622,6 +3642,12 @@ target_scsi3_emulate_pr_out(struct se_cmd *cmd)
 		return TCM_PARAMETER_LIST_LENGTH_ERROR;
 	}
 
+	if (dev->transport_flags & TRANSPORT_FLAG_PASSTHROUGH_PGR) {
+		ret = target_try_pr_out_pt(cmd, sa, res_key, sa_res_key, type,
+					   aptpl, all_tg_pt, spec_i_pt);
+		goto done;
+	}
+
 	/*
 	 * (core_scsi3_emulate_pro_* function parameters
 	 * are defined by spc4r17 Table 174:
@@ -3663,6 +3689,7 @@ target_scsi3_emulate_pr_out(struct se_cmd *cmd)
 		return TCM_INVALID_CDB_FIELD;
 	}
 
+done:
 	if (!ret)
 		target_complete_cmd(cmd, SAM_STAT_GOOD);
 	return ret;
@@ -4020,6 +4047,33 @@ core_scsi3_pri_read_full_status(struct se_cmd *cmd)
 	return 0;
 }
 
+static sense_reason_t target_try_pr_in_pt(struct se_cmd *cmd)
+{
+	struct exec_cmd_ops *ops = cmd->protocol_data;
+	unsigned char *buf;
+	sense_reason_t ret;
+
+	if (cmd->data_length < 8) {
+		pr_err("PRIN SA SCSI Data Length: %u too small\n",
+		       cmd->data_length);
+		return TCM_INVALID_CDB_FIELD;
+	}
+
+	if (!ops->execute_pr_in) {
+		pr_err("SPC-3 PR: Device has been configured for PR passthrough but it's not supported by the backend.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	buf = transport_kmap_data_sg(cmd);
+	if (!buf)
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+
+	ret = ops->execute_pr_in(cmd, cmd->t_task_cdb[1] & 0x1f, buf);
+
+	transport_kunmap_data_sg(cmd);
+	return ret;
+}
+
 sense_reason_t
 target_scsi3_emulate_pr_in(struct se_cmd *cmd)
 {
@@ -4041,6 +4095,11 @@ target_scsi3_emulate_pr_in(struct se_cmd *cmd)
 		return TCM_RESERVATION_CONFLICT;
 	}
 
+	if (cmd->se_dev->transport_flags & TRANSPORT_FLAG_PASSTHROUGH_PGR) {
+		ret = target_try_pr_in_pt(cmd);
+		goto done;
+	}
+
 	switch (cmd->t_task_cdb[1] & 0x1f) {
 	case PRI_READ_KEYS:
 		ret = core_scsi3_pri_read_keys(cmd);
@@ -4060,6 +4119,7 @@ target_scsi3_emulate_pr_in(struct se_cmd *cmd)
 		return TCM_INVALID_CDB_FIELD;
 	}
 
+done:
 	if (!ret)
 		target_complete_cmd(cmd, SAM_STAT_GOOD);
 	return ret;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index c5df78959532..84bfdfb14997 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -69,6 +69,11 @@ struct exec_cmd_ops {
 	sense_reason_t (*execute_write_same)(struct se_cmd *cmd);
 	sense_reason_t (*execute_unmap)(struct se_cmd *cmd,
 				sector_t lba, sector_t nolb);
+	sense_reason_t (*execute_pr_out)(struct se_cmd *cmd, u8 sa, u64 key,
+					 u64 sa_key, u8 type, bool aptpl,
+					 bool all_tg_pt, bool spec_i_pt);
+	sense_reason_t (*execute_pr_in)(struct se_cmd *cmd, u8 sa,
+					unsigned char *param_data);
 };
 
 int	transport_backend_register(const struct target_backend_ops *);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 18/19] scsi: target: Don't support SCSI-2 RESERVE/RELEASE
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (16 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 17/19] scsi: target: Allow backends to hook into PR handling Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  2022-10-26 23:19 ` [PATCH v3 19/19] scsi: target: Add block PR support to iblock Mike Christie
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

The pr_ops don't support SCSI-2 RESERVE/RELEASE so fail them during
parsing.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_spc.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index 7cca3b15472b..fb5febc437a0 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -1320,12 +1320,25 @@ spc_parse_cdb(struct se_cmd *cmd, unsigned int *size)
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *cdb = cmd->t_task_cdb;
 
-	if (!dev->dev_attrib.emulate_pr &&
-	    ((cdb[0] == PERSISTENT_RESERVE_IN) ||
-	     (cdb[0] == PERSISTENT_RESERVE_OUT) ||
-	     (cdb[0] == RELEASE || cdb[0] == RELEASE_10) ||
-	     (cdb[0] == RESERVE || cdb[0] == RESERVE_10))) {
-		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	switch (cdb[0]) {
+	case RESERVE:
+	case RESERVE_10:
+	case RELEASE:
+	case RELEASE_10:
+		if (!dev->dev_attrib.emulate_pr)
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		/*
+		 * The block layer pr_ops don't support the old RESERVE/RELEASE
+		 * commands.
+		 */
+		if (dev->transport_flags & TRANSPORT_FLAG_PASSTHROUGH_PGR)
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		break;
+	case PERSISTENT_RESERVE_IN:
+	case PERSISTENT_RESERVE_OUT:
+		if (!dev->dev_attrib.emulate_pr)
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		break;
 	}
 
 	switch (cdb[0]) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v3 19/19] scsi: target: Add block PR support to iblock
  2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
                   ` (17 preceding siblings ...)
  2022-10-26 23:19 ` [PATCH v3 18/19] scsi: target: Don't support SCSI-2 RESERVE/RELEASE Mike Christie
@ 2022-10-26 23:19 ` Mike Christie
  18 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-26 23:19 UTC (permalink / raw)
  To: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel
  Cc: Mike Christie

This adds support for the block PR callouts to target_core_iblock. This
patch doesn't attempt to implement the entire spec because there's no way
support it all like SPEC_I_PT and ALL_TG_PT. This only supports
exporting the iblock device from one path on the local target.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_iblock.c | 292 +++++++++++++++++++++++++++-
 1 file changed, 287 insertions(+), 5 deletions(-)

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 5db7318b4822..caf6958dd75d 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -23,13 +23,16 @@
 #include <linux/file.h>
 #include <linux/module.h>
 #include <linux/scatterlist.h>
+#include <linux/pr.h>
 #include <scsi/scsi_proto.h>
+#include <scsi/scsi_block_pr.h>
 #include <asm/unaligned.h>
 
 #include <target/target_core_base.h>
 #include <target/target_core_backend.h>
 
 #include "target_core_iblock.h"
+#include "target_core_pr.h"
 
 #define IBLOCK_MAX_BIO_PER_TASK	 32	/* max # of bios to submit at a time */
 #define IBLOCK_BIO_POOL_SIZE	128
@@ -310,7 +313,7 @@ static unsigned long long iblock_emulate_read_cap_with_block_size(
 	return blocks_long;
 }
 
-static void iblock_complete_cmd(struct se_cmd *cmd)
+static void iblock_complete_cmd(struct se_cmd *cmd, blk_status_t blk_status)
 {
 	struct iblock_req *ibr = cmd->priv;
 	u8 status;
@@ -318,7 +321,9 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
 	if (!refcount_dec_and_test(&ibr->pending))
 		return;
 
-	if (atomic_read(&ibr->ib_bio_err_cnt))
+	if (blk_status == BLK_STS_NEXUS)
+		status = SAM_STAT_RESERVATION_CONFLICT;
+	else if (atomic_read(&ibr->ib_bio_err_cnt))
 		status = SAM_STAT_CHECK_CONDITION;
 	else
 		status = SAM_STAT_GOOD;
@@ -331,6 +336,7 @@ static void iblock_bio_done(struct bio *bio)
 {
 	struct se_cmd *cmd = bio->bi_private;
 	struct iblock_req *ibr = cmd->priv;
+	blk_status_t blk_status = bio->bi_status;
 
 	if (bio->bi_status) {
 		pr_err("bio error: %p,  err: %d\n", bio, bio->bi_status);
@@ -343,7 +349,7 @@ static void iblock_bio_done(struct bio *bio)
 
 	bio_put(bio);
 
-	iblock_complete_cmd(cmd);
+	iblock_complete_cmd(cmd, blk_status);
 }
 
 static struct bio *iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num,
@@ -759,7 +765,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 
 	if (!sgl_nents) {
 		refcount_set(&ibr->pending, 1);
-		iblock_complete_cmd(cmd);
+		iblock_complete_cmd(cmd, BLK_STS_OK);
 		return 0;
 	}
 
@@ -817,7 +823,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	}
 
 	iblock_submit_bios(&list);
-	iblock_complete_cmd(cmd);
+	iblock_complete_cmd(cmd, BLK_STS_OK);
 	return 0;
 
 fail_put_bios:
@@ -829,6 +835,279 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 }
 
+static sense_reason_t iblock_execute_pr_out(struct se_cmd *cmd, u8 sa, u64 key,
+					    u64 sa_key, u8 type, bool aptpl,
+					    bool all_tg_pt, bool spec_i_pt)
+{
+	struct se_device *dev = cmd->se_dev;
+	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
+	struct block_device *bdev = ib_dev->ibd_bd;
+	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+	blk_status_t blk_stat = 0;
+	int ret;
+
+	if (!ops) {
+		pr_err("Block device does not support pr_ops but iblock device has been configured for PR passthrough.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	switch (sa) {
+	case PRO_REGISTER_AND_MOVE:
+		pr_err("PRO_REGISTER_AND_MOVE is not supported by iblock PR passthrough.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	case PRO_REPLACE_LOST_RESERVATION:
+		pr_err("PRO_REPLACE_LOST_RESERVATION is not supported by iblock PR passthrough.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	case PRO_REGISTER:
+	case PRO_REGISTER_AND_IGNORE_EXISTING_KEY:
+		if (!ops->pr_register) {
+			pr_err("block device does not support pr_register.\n");
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		/*
+		 * We only support one target port. We don't know the target
+		 * port config at this level and the block layer has a
+		 * different view.
+		 */
+		if (spec_i_pt || all_tg_pt) {
+			pr_err("SPC-3 PR: SPEC_I_PT and ALL_TG_PT are not supported by PR passthrough.\n");
+
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		/* The block layer pr ops always enables aptpl */
+		if (!aptpl)
+			pr_info("APTPL not set by initiator, but will be used.\n");
+
+		ret = ops->pr_register(bdev, key, sa_key,
+				sa == PRO_REGISTER ? 0 : PR_FL_IGNORE_KEY,
+				&blk_stat);
+		break;
+	case PRO_RESERVE:
+		if (!ops->pr_reserve) {
+			pr_err("block_device does not support pr_reserve.\n");
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		ret = ops->pr_reserve(bdev, key, scsi_pr_type_to_block(type), 0,
+				      &blk_stat);
+		break;
+	case PRO_CLEAR:
+		if (!ops->pr_clear) {
+			pr_err("block_device does not support pr_clear.\n");
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		ret = ops->pr_clear(bdev, key, &blk_stat);
+		break;
+	case PRO_PREEMPT:
+	case PRO_PREEMPT_AND_ABORT:
+		if (!ops->pr_clear) {
+			pr_err("block_device does not support pr_preempt.\n");
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		ret = ops->pr_preempt(bdev, key, sa_key,
+				      scsi_pr_type_to_block(type),
+				      sa == PRO_PREEMPT ? false : true,
+				      &blk_stat);
+		break;
+	case PRO_RELEASE:
+		if (!ops->pr_clear) {
+			pr_err("block_device does not support pr_pclear.\n");
+			return TCM_UNSUPPORTED_SCSI_OPCODE;
+		}
+
+		ret = ops->pr_release(bdev, key, scsi_pr_type_to_block(type),
+				      &blk_stat);
+		break;
+	default:
+		pr_err("Unknown PERSISTENT_RESERVE_OUT SA: 0x%02x\n", sa);
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	if (!ret)
+		return TCM_NO_SENSE;
+	else if (blk_stat == BLK_STS_NEXUS)
+		return TCM_RESERVATION_CONFLICT;
+	else
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+}
+
+static void iblock_pr_report_caps(unsigned char *param_data)
+{
+	u16 len = 8;
+
+	put_unaligned_be16(len, &param_data[0]);
+	/*
+	 * When using the pr_ops passthrough method we only support exporting
+	 * the device through one target port because from the backend module
+	 * level we can't see the target port config. As a result we only
+	 * support registration directly from the I_T nexus the cmd is sent
+	 * through and do not set ATP_C here.
+	 *
+	 * The block layer pr_ops do not support passing in initiators so
+	 * we don't set SIP_C here.
+	 */
+	/* PTPL_C: Persistence across Target Power Loss bit */
+	param_data[2] |= 0x01;
+	/*
+	 * We are filling in the PERSISTENT RESERVATION TYPE MASK below, so
+	 * set the TMV: Task Mask Valid bit.
+	 */
+	param_data[3] |= 0x80;
+	/*
+	 * Change ALLOW COMMANDs to 0x20 or 0x40 later from Table 166
+	 */
+	param_data[3] |= 0x10; /* ALLOW COMMANDs field 001b */
+	/*
+	 * PTPL_A: Persistence across Target Power Loss Active bit. The block
+	 * layer pr ops always enables this so report it active.
+	 */
+	param_data[3] |= 0x01;
+	/*
+	 * Setup the PERSISTENT RESERVATION TYPE MASK from Table 212 spc4r37.
+	 */
+	param_data[4] |= 0x80; /* PR_TYPE_EXCLUSIVE_ACCESS_ALLREG */
+	param_data[4] |= 0x40; /* PR_TYPE_EXCLUSIVE_ACCESS_REGONLY */
+	param_data[4] |= 0x20; /* PR_TYPE_WRITE_EXCLUSIVE_REGONLY */
+	param_data[4] |= 0x08; /* PR_TYPE_EXCLUSIVE_ACCESS */
+	param_data[4] |= 0x02; /* PR_TYPE_WRITE_EXCLUSIVE */
+	param_data[5] |= 0x01; /* PR_TYPE_EXCLUSIVE_ACCESS_ALLREG */
+}
+
+static int iblock_pr_read_keys(struct se_cmd *cmd, unsigned char *param_data)
+{
+	struct se_device *dev = cmd->se_dev;
+	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
+	struct block_device *bdev = ib_dev->ibd_bd;
+	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+	int i, ret, len, paths, data_offset;
+	struct pr_keys *keys;
+
+	if (!ops) {
+		pr_err("Block device does not support pr_ops but iblock device has been configured for PR passthrough.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	if (!ops->pr_read_keys) {
+		pr_err("Block device does not support read_keys.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	/*
+	 * We don't know what's under us, but dm-multipath will register every
+	 * path with the same key, so start off with enough space for 16 paths.
+	 */
+	paths = 16;
+retry:
+	len = 8 * paths;
+	keys = kzalloc(sizeof(*keys) + len, GFP_KERNEL);
+	if (!keys)
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+
+	ret = ops->pr_read_keys(bdev, keys, len, NULL);
+	if (!ret) {
+		if (keys->num_keys > paths) {
+			kfree(keys);
+			paths *= 2;
+			goto retry;
+		}
+	} else if (ret) {
+		ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+		goto free_keys;
+	}
+
+	ret = TCM_NO_SENSE;
+
+	put_unaligned_be32(keys->generation, &param_data[0]);
+	if (!keys->num_keys) {
+		put_unaligned_be32(0, &param_data[4]);
+		goto free_keys;
+	}
+
+	put_unaligned_be32(8 * keys->num_keys, &param_data[4]);
+
+	data_offset = 8;
+	for (i = 0; i < keys->num_keys; i++) {
+		if (data_offset + 8 > cmd->data_length)
+			break;
+
+		put_unaligned_be64(keys->keys[i], &param_data[data_offset]);
+		data_offset += 8;
+	}
+
+free_keys:
+	kfree(keys);
+	return ret;
+}
+
+static int iblock_pr_read_reservation(struct se_cmd *cmd,
+				      unsigned char *param_data)
+{
+	struct se_device *dev = cmd->se_dev;
+	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
+	struct block_device *bdev = ib_dev->ibd_bd;
+	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
+	struct pr_held_reservation rsv;
+
+	if (!ops) {
+		pr_err("Block device does not support pr_ops but iblock device has been configured for PR passthrough.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	if (!ops->pr_read_reservation) {
+		pr_err("Block device does not support read_keys.\n");
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	if (ops->pr_read_reservation(bdev, &rsv, NULL))
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+
+	put_unaligned_be32(rsv.generation, &param_data[0]);
+	if (!block_pr_type_to_scsi(rsv.type)) {
+		put_unaligned_be32(0, &param_data[4]);
+		return TCM_NO_SENSE;
+	}
+
+	put_unaligned_be32(16, &param_data[4]);
+
+	if (cmd->data_length < 16)
+		return TCM_NO_SENSE;
+	put_unaligned_be64(rsv.key, &param_data[8]);
+
+	if (cmd->data_length < 22)
+		return TCM_NO_SENSE;
+	param_data[21] = block_pr_type_to_scsi(rsv.type);
+
+	return TCM_NO_SENSE;
+}
+
+static sense_reason_t iblock_execute_pr_in(struct se_cmd *cmd, u8 sa,
+					   unsigned char *param_data)
+{
+	sense_reason_t ret = TCM_NO_SENSE;
+
+	switch (sa) {
+	case PRI_REPORT_CAPABILITIES:
+		iblock_pr_report_caps(param_data);
+		break;
+	case PRI_READ_KEYS:
+		ret = iblock_pr_read_keys(cmd, param_data);
+		break;
+	case PRI_READ_RESERVATION:
+		ret = iblock_pr_read_reservation(cmd, param_data);
+		break;
+	case PRI_READ_FULL_STATUS:
+	default:
+		pr_err("Unknown PERSISTENT_RESERVE_IN SA: 0x%02x\n", sa);
+		return TCM_UNSUPPORTED_SCSI_OPCODE;
+	}
+
+	return ret;
+}
+
 static sector_t iblock_get_blocks(struct se_device *dev)
 {
 	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
@@ -883,6 +1162,8 @@ static struct exec_cmd_ops iblock_exec_cmd_ops = {
 	.execute_sync_cache	= iblock_execute_sync_cache,
 	.execute_write_same	= iblock_execute_write_same,
 	.execute_unmap		= iblock_execute_unmap,
+	.execute_pr_out		= iblock_execute_pr_out,
+	.execute_pr_in		= iblock_execute_pr_in,
 };
 
 static sense_reason_t
@@ -899,6 +1180,7 @@ static bool iblock_get_write_cache(struct se_device *dev)
 static const struct target_backend_ops iblock_ops = {
 	.name			= "iblock",
 	.inquiry_prod		= "IBLOCK",
+	.transport_flags_changeable = TRANSPORT_FLAG_PASSTHROUGH_PGR,
 	.inquiry_rev		= IBLOCK_VERSION,
 	.owner			= THIS_MODULE,
 	.attach_hba		= iblock_attach_hba,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-26 23:19 ` [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array Mike Christie
@ 2022-10-27 15:18   ` Keith Busch
  2022-10-27 17:06     ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Keith Busch @ 2022-10-27 15:18 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Wed, Oct 26, 2022 at 06:19:36PM -0500, Mike Christie wrote:
> For Reservation Report support we need to also convert from the NVMe spec
> PR type back to the block PR definition. This moves us to an array, so in
> the next patch we can add another helper to do the conversion without
> having to manage 2 switches.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/nvme/host/pr.c | 42 +++++++++++++++++++++++-------------------
>  include/linux/nvme.h   |  9 +++++++++
>  2 files changed, 32 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
> index df7eb2440c67..5c4611d15d9c 100644
> --- a/drivers/nvme/host/pr.c
> +++ b/drivers/nvme/host/pr.c
> @@ -6,24 +6,28 @@
>  
>  #include "nvme.h"
>  
> -static char nvme_pr_type(enum pr_type type)
> +static const struct {
> +	enum nvme_pr_type	nvme_type;
> +	enum pr_type		blk_type;
> +} nvme_pr_types[] = {
> +	{ NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
> +	{ NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
> +	{ NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
> +	{ NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
> +	{ NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
> +	{ NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
> +};

Wouldn't it be easier to use the block type as the array index to avoid
the whole looped lookup?

  enum nvme_pr_type types[] = {
	.PR_WRITE_EXCLUSIVE = NVME_PR_WRITE_EXCLUSIVE,
	.PR_EXCLUSIVE_ACCESS  = NVME_PR_EXCLUSIVE_ACCESS,
        ...
  };

?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 06/19] nvme: Fix reservation status related structs
  2022-10-26 23:19 ` [PATCH v3 06/19] nvme: Fix reservation status related structs Mike Christie
@ 2022-10-27 17:04   ` Keith Busch
  0 siblings, 0 replies; 54+ messages in thread
From: Keith Busch @ 2022-10-27 17:04 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Wed, Oct 26, 2022 at 06:19:32PM -0500, Mike Christie wrote:
> This fixes the following issues with the reservation status structs:
> 
> 1. resv10 is bytes 23:10 so it should be 14 bytes.
> 2. regctl_ds only supports 64 bit host IDs.
> 
> These are not currently used, but will be in this patchset which adds
> support for the reservation report command.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands
  2022-10-26 23:19 ` [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands Mike Christie
@ 2022-10-27 17:05   ` Keith Busch
  2022-11-01  5:29   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 54+ messages in thread
From: Keith Busch @ 2022-10-27 17:05 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Wed, Oct 26, 2022 at 06:19:33PM -0500, Mike Christie wrote:
> Reservation Report support needs to pass in a variable sized buffer, so
> this patch has the pr command helpers take a data length argument.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-26 23:19 ` [PATCH v3 08/19] nvme: Move pr code to it's own file Mike Christie
@ 2022-10-27 17:06   ` Keith Busch
  2022-10-28 16:06     ` Mike Christie
  2022-11-01  5:25   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 54+ messages in thread
From: Keith Busch @ 2022-10-27 17:06 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Wed, Oct 26, 2022 at 06:19:34PM -0500, Mike Christie wrote:
> This patch moves the pr code to it's own file because I'm going to be
> adding more functions and core.c is getting bigger.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Good idea. The nvme core file is getting a bit too big and too diverse
in its responsibilities.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-27 15:18   ` Keith Busch
@ 2022-10-27 17:06     ` Mike Christie
  2022-10-27 17:13       ` michael.christie
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-27 17:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On 10/27/22 10:18 AM, Keith Busch wrote:
> On Wed, Oct 26, 2022 at 06:19:36PM -0500, Mike Christie wrote:
>> For Reservation Report support we need to also convert from the NVMe spec
>> PR type back to the block PR definition. This moves us to an array, so in
>> the next patch we can add another helper to do the conversion without
>> having to manage 2 switches.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/nvme/host/pr.c | 42 +++++++++++++++++++++++-------------------
>>  include/linux/nvme.h   |  9 +++++++++
>>  2 files changed, 32 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
>> index df7eb2440c67..5c4611d15d9c 100644
>> --- a/drivers/nvme/host/pr.c
>> +++ b/drivers/nvme/host/pr.c
>> @@ -6,24 +6,28 @@
>>  
>>  #include "nvme.h"
>>  
>> -static char nvme_pr_type(enum pr_type type)
>> +static const struct {
>> +	enum nvme_pr_type	nvme_type;
>> +	enum pr_type		blk_type;
>> +} nvme_pr_types[] = {
>> +	{ NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
>> +	{ NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
>> +	{ NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
>> +	{ NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
>> +	{ NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
>> +	{ NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
>> +};
> 
> Wouldn't it be easier to use the block type as the array index to avoid
> the whole looped lookup?
> 
>   enum nvme_pr_type types[] = {
> 	.PR_WRITE_EXCLUSIVE = NVME_PR_WRITE_EXCLUSIVE,
> 	.PR_EXCLUSIVE_ACCESS  = NVME_PR_EXCLUSIVE_ACCESS,
>         ...
>   };

It would be. However,

1. I wasn't sure how future proof we wanted it and I might have
misinterpreted Chaitanya's original review comment. The part of
the comment about handling "every new nvme_type" made me think that
we were worried there would be new types in the future. So I thought
we wanted it to be really generic and be able to handle cases where
the values could be funky like -1 in the future.

2. I also need to go from NVME_PR type to PR type, so we need a
second array. So we can either have 2 arrays or 1 array and 2
loops (the next patch in this set added the second loop).
If we don't care about #1 then I can I see 2 arrays is nicer.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-27 17:06     ` Mike Christie
@ 2022-10-27 17:13       ` michael.christie
  2022-10-27 17:16         ` Keith Busch
  0 siblings, 1 reply; 54+ messages in thread
From: michael.christie @ 2022-10-27 17:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On 10/27/22 12:06 PM, Mike Christie wrote:
> On 10/27/22 10:18 AM, Keith Busch wrote:
>> On Wed, Oct 26, 2022 at 06:19:36PM -0500, Mike Christie wrote:
>>> For Reservation Report support we need to also convert from the NVMe spec
>>> PR type back to the block PR definition. This moves us to an array, so in
>>> the next patch we can add another helper to do the conversion without
>>> having to manage 2 switches.
>>>
>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>> ---
>>>  drivers/nvme/host/pr.c | 42 +++++++++++++++++++++++-------------------
>>>  include/linux/nvme.h   |  9 +++++++++
>>>  2 files changed, 32 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
>>> index df7eb2440c67..5c4611d15d9c 100644
>>> --- a/drivers/nvme/host/pr.c
>>> +++ b/drivers/nvme/host/pr.c
>>> @@ -6,24 +6,28 @@
>>>  
>>>  #include "nvme.h"
>>>  
>>> -static char nvme_pr_type(enum pr_type type)
>>> +static const struct {
>>> +	enum nvme_pr_type	nvme_type;
>>> +	enum pr_type		blk_type;
>>> +} nvme_pr_types[] = {
>>> +	{ NVME_PR_WRITE_EXCLUSIVE, PR_WRITE_EXCLUSIVE },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS, PR_EXCLUSIVE_ACCESS },
>>> +	{ NVME_PR_WRITE_EXCLUSIVE_REG_ONLY, PR_WRITE_EXCLUSIVE_REG_ONLY },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS_REG_ONLY, PR_EXCLUSIVE_ACCESS_REG_ONLY },
>>> +	{ NVME_PR_WRITE_EXCLUSIVE_ALL_REGS, PR_WRITE_EXCLUSIVE_ALL_REGS },
>>> +	{ NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS, PR_EXCLUSIVE_ACCESS_ALL_REGS },
>>> +};
>>
>> Wouldn't it be easier to use the block type as the array index to avoid
>> the whole looped lookup?
>>
>>   enum nvme_pr_type types[] = {
>> 	.PR_WRITE_EXCLUSIVE = NVME_PR_WRITE_EXCLUSIVE,
>> 	.PR_EXCLUSIVE_ACCESS  = NVME_PR_EXCLUSIVE_ACCESS,
>>         ...
>>   };
> 
> It would be. However,
> 
> 1. I wasn't sure how future proof we wanted it and I might have
> misinterpreted Chaitanya's original review comment. The part of
> the comment about handling "every new nvme_type" made me think that
> we were worried there would be new types in the future. So I thought
> we wanted it to be really generic and be able to handle cases where
> the values could be funky like -1 in the future.
> 
> 2. I also need to go from NVME_PR type to PR type, so we need a
> second array. So we can either have 2 arrays or 1 array and 2
> loops (the next patch in this set added the second loop).
> If we don't care about #1 then I can I see 2 arrays is nicer.

Oh wait there was also a

3. The pr_types come from userspace so if it passes us 10
and we just do:

types[pr_type]

then we would crash due an out of bounds error.

Similarly I thought there could be a bad target that does the
same thing.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-27 17:13       ` michael.christie
@ 2022-10-27 17:16         ` Keith Busch
  2022-10-28 16:05           ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Keith Busch @ 2022-10-27 17:16 UTC (permalink / raw)
  To: michael.christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Thu, Oct 27, 2022 at 12:13:06PM -0500, michael.christie@oracle.com wrote:
> Oh wait there was also a
> 
> 3. The pr_types come from userspace so if it passes us 10
> and we just do:
> 
> types[pr_type]
> 
> then we would crash due an out of bounds error.
> 
> Similarly I thought there could be a bad target that does the
> same thing.

Well, you'd of course have to check the boundaries before accessing if
you were to implement this scheme. :)

But considering this isn't a performance path, perhaps these kinds of
optimizations are not worth it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array
  2022-10-27 17:16         ` Keith Busch
@ 2022-10-28 16:05           ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-28 16:05 UTC (permalink / raw)
  To: Keith Busch
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On 10/27/22 12:16 PM, Keith Busch wrote:
> On Thu, Oct 27, 2022 at 12:13:06PM -0500, michael.christie@oracle.com wrote:
>> Oh wait there was also a
>>
>> 3. The pr_types come from userspace so if it passes us 10
>> and we just do:
>>
>> types[pr_type]
>>
>> then we would crash due an out of bounds error.
>>
>> Similarly I thought there could be a bad target that does the
>> same thing.
> 
> Well, you'd of course have to check the boundaries before accessing if
> you were to implement this scheme. :)

Yeah :) Sorry I didn't write that reply well. I was more trying to say
that we would still need a wrapper function to check the bounds, so either
we have 2 arrays and 2 helper functions or 1 array and 2 helper functions.

I'll see what other people's opinions are and just do what you guys end up
preferring.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-27 17:06   ` Keith Busch
@ 2022-10-28 16:06     ` Mike Christie
  2022-10-28 16:38       ` Keith Busch
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-28 16:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On 10/27/22 12:06 PM, Keith Busch wrote:
> On Wed, Oct 26, 2022 at 06:19:34PM -0500, Mike Christie wrote:
>> This patch moves the pr code to it's own file because I'm going to be
>> adding more functions and core.c is getting bigger.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> Good idea.

Credit goes to Chaitanya.

One question for you. I wasn't sure about the copyright. I saw
you added the pr_ops code in 2015 when you were at Intel. Do you
want me to add:

Copyright (c) 2015, Intel Corporation.

to the new pr.c file?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-28 16:06     ` Mike Christie
@ 2022-10-28 16:38       ` Keith Busch
  2022-10-30  8:06         ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Keith Busch @ 2022-10-28 16:38 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	target-devel

On Fri, Oct 28, 2022 at 11:06:29AM -0500, Mike Christie wrote:
> On 10/27/22 12:06 PM, Keith Busch wrote:
> > On Wed, Oct 26, 2022 at 06:19:34PM -0500, Mike Christie wrote:
> >> This patch moves the pr code to it's own file because I'm going to be
> >> adding more functions and core.c is getting bigger.
> >>
> >> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > 
> > Good idea.
> 
> Credit goes to Chaitanya.
> 
> One question for you. I wasn't sure about the copyright. I saw
> you added the pr_ops code in 2015 when you were at Intel. Do you
> want me to add:
> 
> Copyright (c) 2015, Intel Corporation.
> 
> to the new pr.c file?

I think I was just the last person to touch the code, but it was mostly
developed elsewhere. So "no", don't bother propagating the (c).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-28 16:38       ` Keith Busch
@ 2022-10-30  8:06         ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2022-10-30  8:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: Mike Christie, bvanassche, hch, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, target-devel

On Fri, Oct 28, 2022 at 10:38:36AM -0600, Keith Busch wrote:
> I think I was just the last person to touch the code, but it was mostly
> developed elsewhere. So "no", don't bother propagating the (c).

You actually did write the PR ops code.  I wrote the core and SCSI
side, and you added NVMe.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/19] nvme: Add pr_ops read_keys support
  2022-10-26 23:19 ` [PATCH v3 09/19] nvme: Add pr_ops read_keys support Mike Christie
@ 2022-10-30  8:17   ` Christoph Hellwig
  2022-10-30 20:47     ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2022-10-30  8:17 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On Wed, Oct 26, 2022 at 06:19:35PM -0500, Mike Christie wrote:
> This patch adds support for the pr_ops read_keys callout by calling the
> NVMe Reservation Report helper, then parsing that info to get the
> controller's registered keys. Because the callout is only used in the
> kernel where the callers do not know about controller/host IDs, the
> callout just returns the registered keys which is required by the SCSI PR
> in READ KEYS command.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/nvme/host/pr.c | 73 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/nvme.h   |  4 +++
>  2 files changed, 77 insertions(+)
> 
> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
> index aa88c55879b2..df7eb2440c67 100644
> --- a/drivers/nvme/host/pr.c
> +++ b/drivers/nvme/host/pr.c
> @@ -118,10 +118,83 @@ static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type
>  	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
>  }
>  
> +static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
> +		u32 data_len, bool *eds)

Is there any good reason this method seems to take a u8 instead of a void
pointer?  As that seems to make a few things a bit messy.

> +	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
> +	    bdev->bd_disk->fops == &nvme_ns_head_ops)
> +		ret = nvme_send_ns_head_pr_command(bdev, &c, data, data_len);
> +	else
> +		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
> +					      data, data_len);

Can you please add a hlper for this logic?

> +	for (i = 0; i < num_ret_keys; i++) {
> +		if (eds) {
> +			keys_info->keys[i] =
> +					le64_to_cpu(status->regctl_eds[i].rkey);
> +		} else {
> +			keys_info->keys[i] =
> +					le64_to_cpu(status->regctl_ds[i].rkey);
> +		}

If you shorten the status variable name to something like rs this becomes
much easier to follow :)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/19] nvme: Add pr_ops read_reservation support
  2022-10-26 23:19 ` [PATCH v3 11/19] nvme: Add pr_ops read_reservation support Mike Christie
@ 2022-10-30  8:18   ` Christoph Hellwig
  2022-10-30 20:54     ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2022-10-30  8:18 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

> +	memset(resv, 0, sizeof(*resv));

Is there any good reason this isn't done by the caller?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-10-26 23:19 ` [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts Mike Christie
@ 2022-10-30  8:20   ` Christoph Hellwig
  2022-10-30 23:05     ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2022-10-30  8:20 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On Wed, Oct 26, 2022 at 06:19:38PM -0500, Mike Christie wrote:
> To handle both cases and keep userspace compatibility, this patch adds a
> blk_status_t arg to the pr_ops callouts. The lower levels will convert
> their device specific error to the blk_status_t then the upper levels
> can easily check that code without knowing the device type. Adding the
> extra return value will then allow us to not break userspace which expects
> a negative -Exyz error code if the command fails before it's sent to the
> device or a device/driver specific value if the error is > 0.

I really hate this double error return.  What -E* statuses that matter
can be returned without a BLK_STS_* equivalent that we couldn't convert
to and from?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 09/19] nvme: Add pr_ops read_keys support
  2022-10-30  8:17   ` Christoph Hellwig
@ 2022-10-30 20:47     ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-30 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: bvanassche, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/30/22 3:17 AM, Christoph Hellwig wrote:
> On Wed, Oct 26, 2022 at 06:19:35PM -0500, Mike Christie wrote:
>> This patch adds support for the pr_ops read_keys callout by calling the
>> NVMe Reservation Report helper, then parsing that info to get the
>> controller's registered keys. Because the callout is only used in the
>> kernel where the callers do not know about controller/host IDs, the
>> callout just returns the registered keys which is required by the SCSI PR
>> in READ KEYS command.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>  drivers/nvme/host/pr.c | 73 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/nvme.h   |  4 +++
>>  2 files changed, 77 insertions(+)
>>
>> diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
>> index aa88c55879b2..df7eb2440c67 100644
>> --- a/drivers/nvme/host/pr.c
>> +++ b/drivers/nvme/host/pr.c
>> @@ -118,10 +118,83 @@ static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type
>>  	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
>>  }
>>  
>> +static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
>> +		u32 data_len, bool *eds)
> 
> Is there any good reason this method seems to take a u8 instead of a void
> pointer?  As that seems to make a few things a bit messy.

I did it because the helper functions nvme_send_ns_head_pr_command/
nvme_send_ns_pr_command took a u8.

Looking at it some more I see those functions use nvme_submit_sync_cmd
which then takes a avoid pointer.

To handle your comment about the helper below I'll fix all that up to
take a void pointer.


> 
>> +	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
>> +	    bdev->bd_disk->fops == &nvme_ns_head_ops)
>> +		ret = nvme_send_ns_head_pr_command(bdev, &c, data, data_len);
>> +	else
>> +		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, &c,
>> +					      data, data_len);
> 
> Can you please add a hlper for this logic?

Will do.

> 
>> +	for (i = 0; i < num_ret_keys; i++) {
>> +		if (eds) {
>> +			keys_info->keys[i] =
>> +					le64_to_cpu(status->regctl_eds[i].rkey);
>> +		} else {
>> +			keys_info->keys[i] =
>> +					le64_to_cpu(status->regctl_ds[i].rkey);
>> +		}
> 
> If you shorten the status variable name to something like rs this becomes
> much easier to follow :)


Will do.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 11/19] nvme: Add pr_ops read_reservation support
  2022-10-30  8:18   ` Christoph Hellwig
@ 2022-10-30 20:54     ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-10-30 20:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: bvanassche, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/30/22 3:18 AM, Christoph Hellwig wrote:
>> +	memset(resv, 0, sizeof(*resv));
> 
> Is there any good reason this isn't done by the caller?

I will change it.

I think it was leftover from when I was experimenting with
some mulitpath support. It's not needed anymore.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-10-30  8:20   ` Christoph Hellwig
@ 2022-10-30 23:05     ` Mike Christie
  2022-11-01 10:15       ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-10-30 23:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: bvanassche, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/30/22 3:20 AM, Christoph Hellwig wrote:
> On Wed, Oct 26, 2022 at 06:19:38PM -0500, Mike Christie wrote:
>> To handle both cases and keep userspace compatibility, this patch adds a
>> blk_status_t arg to the pr_ops callouts. The lower levels will convert
>> their device specific error to the blk_status_t then the upper levels
>> can easily check that code without knowing the device type. Adding the
>> extra return value will then allow us to not break userspace which expects
>> a negative -Exyz error code if the command fails before it's sent to the
>> device or a device/driver specific value if the error is > 0.
> 
> I really hate this double error return.  What -E* statuses that matter
> can be returned without a BLK_STS_* equivalent that we couldn't convert
> to and from?

Hey,

I didn't fully understand the question. The specific issue I'm hitting is
that if a scsi/nvme device returns SAM_STAT_RESERVATION_CONFLICT/
NVME_SC_RESERVATION_CONFLICT then in lio I need to be able to detect that
and return SAM_STAT_RESERVATION_CONFLICT. So I only care about
-EBADE/BLK_STS_NEXUS right now. So are you asking about -EBADE?

Do you mean I could have sd_pr_out_command return -EBADE when it gets
a SAM_STAT_RESERVATION_CONFLICT (nvme would do the equivalent). Then in
lio do:

ret = ops->pr_register()
if (ret == -EBADE)
	return SAM_STAT_RESERVATION_CONFLICT;

The problem I hit is that in the ioctl code I then have to do:

@@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev,
 
 	if (reg.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	if (ret == -EBADE) {
+		if (bdev_is_nvme(bdev))
+			ret = NVME_SC_RESERVATION_CONFLICT;
+		else if (bdev_is_scsi(bdev)
+			ret = SAM_STAT_RESERVATION_CONFLICT;
+	}
+	return ret;
 }
 

or I could convert the scsi/nvme or code to always use BLK_STS errors.
In LIO I can easily check for BLK_STS_NEXUS like with the -EBADE example. In
the ioctl code then for common errors I can go from BLK_STS using the
blk_status_to_errno helper. However, for some scsi/nvme specific errors we
would want to do:

@@ -269,7 +270,36 @@ static int blkdev_pr_register(struct block_device *bdev,
 
 	if (reg.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	switch (ret) {
+	/* there could be nvme/scsi helper functions for this which would
+	 * be the reverse of nvme_error_status/ */
+	case BLK_STS_NEXUS:
+		if (bdev_is_nvme(bdev))
+			ret = NVME_SC_RESERVATION_CONFLICT;
+		else if (bdev_is_scsi(bdev)
+			ret = SAM_STAT_RESERVATION_CONFLICT;
+		break;
+	case BLK_STS_TRANSPORT:
+		if (bdev_is_nvme(bdev))
+			ret = NVME_SC_HOST_PATH_ERROR;
+		else if (bdev_is_scsi(bdev)
+			ret = DID_TRANSPORT_FAILFAST or DID_TRANSPORT_MARGINAL;
+		break;
+	case BLK_STS_NOTSUPP:
+		if (bdev_is_nvme(bdev))
+			ret = NVME_SC_BAD_ATTRIBUTES or
+				NVME_SC_ONCS_NOT_SUPPORTED or
+				NVME_SC_INVALID_OPCODE or
+				NVME_SC_INVALID_FIELD or
+				NVME_SC_INVALID_NS
+		else if (bdev_is_scsi(bdev)
+			ret = We don't have a way to support this in SCSI yet
				because it would be in the sense/asc/ascq.
+		break;
+	default:
+		ret = blk_status_to_errno(ret);
+	}
+	return ret;
 }
 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 08/19] nvme: Move pr code to it's own file
  2022-10-26 23:19 ` [PATCH v3 08/19] nvme: Move pr code to it's own file Mike Christie
  2022-10-27 17:06   ` Keith Busch
@ 2022-11-01  5:25   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 54+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-01  5:25 UTC (permalink / raw)
  To: Mike Christie
  Cc: hch, linux-block, snitzer, martin.petersen, linux-scsi, dm-devel,
	axboe, kbusch, target-devel, linux-nvme, james.bottomley

On 10/26/22 16:19, Mike Christie wrote:
> This patch moves the pr code to it's own file because I'm going to be
> adding more functions and core.c is getting bigger.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>

Thanks a lot for doing this ...

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands
  2022-10-26 23:19 ` [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands Mike Christie
  2022-10-27 17:05   ` Keith Busch
@ 2022-11-01  5:29   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 54+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-01  5:29 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, linux-block,
	dm-devel, axboe, linux-nvme, kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> Reservation Report support needs to pass in a variable sized buffer, so
> this patch has the pr command helpers take a data length argument.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---


Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 02/19] scsi: Rename sd_pr_command
  2022-10-26 23:19 ` [PATCH v3 02/19] scsi: Rename sd_pr_command Mike Christie
@ 2022-11-01  5:33   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 54+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-01  5:33 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, linux-block,
	dm-devel, axboe, linux-nvme, kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> Rename sd_pr_command to sd_pr_out_command to match a
> sd_pr_in_command helper added in the next patches.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd.c | 12 ++++++------
>

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-10-26 23:19 ` [PATCH v3 03/19] scsi: Move sd_pr_type to header to share Mike Christie
@ 2022-11-01  5:43   ` Chaitanya Kulkarni
  2022-11-01 16:43     ` Mike Christie
  2022-11-02 22:47   ` Bart Van Assche
  1 sibling, 1 reply; 54+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-01  5:43 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, linux-block,
	dm-devel, axboe, linux-nvme, kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> LIO is going to want to do the same block to/from SCSI pr types as sd.c
> so this moves the sd_pr_type helper to a new file. The next patch will
> then also add a helper to go from the SCSI value to the block one for use
> with PERSISTENT_RESERVE_IN commands.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/sd.c            | 31 +++++++------------------------
>   include/scsi/scsi_block_pr.h | 36 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+), 24 deletions(-)
>   create mode 100644 include/scsi/scsi_block_pr.h
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4dc5c932fbd3..ad9374b07585 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -67,6 +67,7 @@
>   #include <scsi/scsi_host.h>
>   #include <scsi/scsi_ioctl.h>
>   #include <scsi/scsicam.h>
> +#include <scsi/scsi_block_pr.h>
>   
>   #include "sd.h"
>   #include "scsi_priv.h"
> @@ -1694,28 +1695,8 @@ static int sd_get_unique_id(struct gendisk *disk, u8 id[16],
>   	return ret;
>   }
>   
> -static char sd_pr_type(enum pr_type type)
> -{
> -	switch (type) {
> -	case PR_WRITE_EXCLUSIVE:
> -		return 0x01;
> -	case PR_EXCLUSIVE_ACCESS:
> -		return 0x03;
> -	case PR_WRITE_EXCLUSIVE_REG_ONLY:
> -		return 0x05;
> -	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
> -		return 0x06;
> -	case PR_WRITE_EXCLUSIVE_ALL_REGS:
> -		return 0x07;
> -	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
> -		return 0x08;
> -	default:
> -		return 0;
> -	}
> -};
> -
>   static int sd_pr_out_command(struct block_device *bdev, u8 sa,
> -		u64 key, u64 sa_key, u8 type, u8 flags)
> +		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags)
>   {
>   	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
>   	struct scsi_device *sdev = sdkp->device;
> @@ -1778,19 +1759,21 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
>   {
>   	if (flags)
>   		return -EOPNOTSUPP;
> -	return sd_pr_out_command(bdev, 0x01, key, 0, sd_pr_type(type), 0);
> +	return sd_pr_out_command(bdev, 0x01, key, 0,
> +				 block_pr_type_to_scsi(type), 0);
>   }
>   
>   static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
>   {
> -	return sd_pr_out_command(bdev, 0x02, key, 0, sd_pr_type(type), 0);
> +	return sd_pr_out_command(bdev, 0x02, key, 0,
> +				 block_pr_type_to_scsi(type), 0);
>   }
>   
>   static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
>   		enum pr_type type, bool abort)
>   {
>   	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
> -			     sd_pr_type(type), 0);
> +				 block_pr_type_to_scsi(type), 0);
>   }
>   
>   static int sd_pr_clear(struct block_device *bdev, u64 key)
> diff --git a/include/scsi/scsi_block_pr.h b/include/scsi/scsi_block_pr.h
> new file mode 100644
> index 000000000000..6e99f844330d
> --- /dev/null
> +++ b/include/scsi/scsi_block_pr.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _SCSI_BLOCK_PR_H
> +#define _SCSI_BLOCK_PR_H
> +
> +#include <uapi/linux/pr.h>
> +
> +enum scsi_pr_type {
> +	SCSI_PR_WRITE_EXCLUSIVE			= 0x01,
> +	SCSI_PR_EXCLUSIVE_ACCESS		= 0x03,
> +	SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY	= 0x05,
> +	SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY	= 0x06,
> +	SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS	= 0x07,
> +	SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS	= 0x08,
> +};
> +
> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
> +{
> +	switch (type) {
> +	case PR_WRITE_EXCLUSIVE:
> +		return SCSI_PR_WRITE_EXCLUSIVE;
> +	case PR_EXCLUSIVE_ACCESS:
> +		return SCSI_PR_EXCLUSIVE_ACCESS;
> +	case PR_WRITE_EXCLUSIVE_REG_ONLY:
> +		return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
> +	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
> +		return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
> +	case PR_WRITE_EXCLUSIVE_ALL_REGS:
> +		return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
> +	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
> +		return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
> +	default:
> +		return 0;
> +	}
> +};


do we need above semicolon ?

how about not using switch case pattern totally untested below ?

static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
{
         enum pr_type pr_to_scsi_pr[] = {
                 [PR_WRITE_EXCLUSIVE] = SCSI_PR_WRITE_EXCLUSIVE,
                 [PR_EXCLUSIVE_ACCESS] = SCSI_PR_EXCLUSIVE_ACCESS,
                 [PR_WRITE_EXCLUSIVE_REG_ONLY] = 
SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY,
                 [PR_EXCLUSIVE_ACCESS_REG_ONLY] = 
SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY,
                 [PR_WRITE_EXCLUSIVE_ALL_REGS] = 
SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS,
                 [PR_EXCLUSIVE_ACCESS_ALL_REGS] = 
SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS,
         };

         if (type > ARRAY_SIZE(pr_to_scsi_pr))
                 return 0;
         return pr_to_scsi_pr[type];
}


-ck


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation
  2022-10-26 23:19 ` [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation Mike Christie
@ 2022-11-01  5:45   ` Chaitanya Kulkarni
  2022-11-02 22:54   ` Bart Van Assche
  1 sibling, 0 replies; 54+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-01  5:45 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, linux-block,
	dm-devel, axboe, linux-nvme, kbusch, target-devel


> +static inline enum pr_type scsi_pr_type_to_block(enum scsi_pr_type type)
> +{
> +	switch (type) {
> +	case SCSI_PR_WRITE_EXCLUSIVE:
> +		return PR_WRITE_EXCLUSIVE;
> +	case SCSI_PR_EXCLUSIVE_ACCESS:
> +		return PR_EXCLUSIVE_ACCESS;
> +	case SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY:
> +		return PR_WRITE_EXCLUSIVE_REG_ONLY;
> +	case SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY:
> +		return PR_EXCLUSIVE_ACCESS_REG_ONLY;
> +	case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
> +		return PR_WRITE_EXCLUSIVE_ALL_REGS;
> +	case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
> +		return PR_EXCLUSIVE_ACCESS_ALL_REGS;
> +	default:
> +		return 0;
> +	}
> +}
> +

same here as previous comment, unless there is strong reason not to do
that ...

-ck


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-10-30 23:05     ` Mike Christie
@ 2022-11-01 10:15       ` Christoph Hellwig
  2022-11-05 18:36         ` Mike Christie
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2022-11-01 10:15 UTC (permalink / raw)
  To: Mike Christie
  Cc: Christoph Hellwig, bvanassche, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, kbusch, target-devel

On Sun, Oct 30, 2022 at 06:05:35PM -0500, Mike Christie wrote:
> The problem I hit is that in the ioctl code I then have to do:
> 
> @@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev,
>  
>  	if (reg.flags & ~PR_FL_IGNORE_KEY)
>  		return -EOPNOTSUPP;
> -	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
> +	ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
> +	if (ret == -EBADE) {
> +		if (bdev_is_nvme(bdev))
> +			ret = NVME_SC_RESERVATION_CONFLICT;
> +		else if (bdev_is_scsi(bdev)
> +			ret = SAM_STAT_RESERVATION_CONFLICT;
> +	}
> +	return ret;

Eww.  We should have never leaked protocol specific values out to
userspace.  This is an original bug I introduceѕ, so all blame is on me.

I suspect the right way to fix is is to keep the numeric value of
SAM_STAT_RESERVATION_CONFLICT and give it a new constant exposed in
the uapi header, assuming that SCSI is the thing people actually
used the PR API for, and nvme was just an nice little add-on.

Now if an errno value or blk_status_t is returned from the method
should not matter for this fix, but in the long run I think the
blk_status_t would be cleaner than the int used for errno, and
that will also prevent us from returning accidental non-intended
values.

Btw, I also thing we should rename BLK_STS_NEXUS to
BLK_STS_RESERVATION_CONFLICT (assuming s390 is ok with that), as that
has much better documentary value.

> +	case BLK_STS_TRANSPORT:
> +		if (bdev_is_nvme(bdev))
> +			ret = NVME_SC_HOST_PATH_ERROR;
> +		else if (bdev_is_scsi(bdev)
> +			ret = DID_TRANSPORT_FAILFAST or DID_TRANSPORT_MARGINAL;
> +		break;

And we'll need an uapi value for this as well.

> +	case BLK_STS_NOTSUPP:

and maybe this unless we can just get away with the negative errno
value.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-11-01  5:43   ` Chaitanya Kulkarni
@ 2022-11-01 16:43     ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-11-01 16:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni, hch, martin.petersen, linux-scsi,
	linux-block, dm-devel, axboe, linux-nvme, kbusch, target-devel

On 11/1/22 12:43 AM, Chaitanya Kulkarni wrote:
>> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
>> +{
>> +	switch (type) {
>> +	case PR_WRITE_EXCLUSIVE:
>> +		return SCSI_PR_WRITE_EXCLUSIVE;
>> +	case PR_EXCLUSIVE_ACCESS:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS;
>> +	case PR_WRITE_EXCLUSIVE_REG_ONLY:
>> +		return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
>> +	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
>> +	case PR_WRITE_EXCLUSIVE_ALL_REGS:
>> +		return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
>> +	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
>> +		return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
>> +	default:
>> +		return 0;
>> +	}
>> +};
> 
> 
> do we need above semicolon ?

No. It was a mistake.

> 
> how about not using switch case pattern totally untested below ?
> 
> static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
> {
>          enum pr_type pr_to_scsi_pr[] = {
>                  [PR_WRITE_EXCLUSIVE] = SCSI_PR_WRITE_EXCLUSIVE,
>                  [PR_EXCLUSIVE_ACCESS] = SCSI_PR_EXCLUSIVE_ACCESS,
>                  [PR_WRITE_EXCLUSIVE_REG_ONLY] = 
> SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY,
>                  [PR_EXCLUSIVE_ACCESS_REG_ONLY] = 
> SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY,
>                  [PR_WRITE_EXCLUSIVE_ALL_REGS] = 
> SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS,
>                  [PR_EXCLUSIVE_ACCESS_ALL_REGS] = 
> SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS,
>          };
> 
>          if (type > ARRAY_SIZE(pr_to_scsi_pr))
>                  return 0;
>          return pr_to_scsi_pr[type];
> }
> 

Keith also wanted something like this for nvme so will fix up
the scsi and nvme code.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-10-26 23:19 ` [PATCH v3 03/19] scsi: Move sd_pr_type to header to share Mike Christie
  2022-11-01  5:43   ` Chaitanya Kulkarni
@ 2022-11-02 22:47   ` Bart Van Assche
  2022-11-03  2:13     ` Mike Christie
  1 sibling, 1 reply; 54+ messages in thread
From: Bart Van Assche @ 2022-11-02 22:47 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
> +{
> +	switch (type) {
> +	case PR_WRITE_EXCLUSIVE:
> +		return SCSI_PR_WRITE_EXCLUSIVE;
> +	case PR_EXCLUSIVE_ACCESS:
> +		return SCSI_PR_EXCLUSIVE_ACCESS;
> +	case PR_WRITE_EXCLUSIVE_REG_ONLY:
> +		return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
> +	case PR_EXCLUSIVE_ACCESS_REG_ONLY:
> +		return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
> +	case PR_WRITE_EXCLUSIVE_ALL_REGS:
> +		return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
> +	case PR_EXCLUSIVE_ACCESS_ALL_REGS:
> +		return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
> +	default:
> +		return 0;
> +	}
> +};

Please leave out "default: return 0;" from the switch statement and add 
"return 0;" after the switch statement. That will make the compiler emit 
a warning if a value is added in enum pr_type but not in the above function.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/19] block: Add PR callouts for read keys and reservation
  2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
@ 2022-11-02 22:50   ` Bart Van Assche
  2022-11-03  1:54     ` Mike Christie
  2022-11-02 22:53   ` Bart Van Assche
  1 sibling, 1 reply; 54+ messages in thread
From: Bart Van Assche @ 2022-11-02 22:50 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> +struct pr_keys {
> +	u32	generation;
> +	u32	num_keys;
> +	u64	keys[];
> +};
> +
> +struct pr_held_reservation {
> +	u64		key;
> +	u32		generation;
> +	enum pr_type	type;
> +};
> +
>   struct pr_ops {
>   	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
>   			u32 flags);
> @@ -14,6 +26,18 @@ struct pr_ops {
>   	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
>   			enum pr_type type, bool abort);
>   	int (*pr_clear)(struct block_device *bdev, u64 key);
> +	/*
> +	 * pr_read_keys - Read the registered keys and return them in the
> +	 * pr_keys->keys array. The keys array will have been allocated at the
> +	 * end of the pr_keys struct and is keys_len bytes. If there are more
> +	 * keys than can fit in the array, success will still be returned and
> +	 * pr_keys->num_keys will reflect the total number of keys the device
> +	 * contains, so the caller can retry with a larger array.
> +	 */
> +	int (*pr_read_keys)(struct block_device *bdev,
> +			struct pr_keys *keys_info, u32 keys_len);
> +	int (*pr_read_reservation)(struct block_device *bdev,
> +			struct pr_held_reservation *rsv);
>   };

Is there any pr_read_keys() implementation that won't have to divide 
@keys_len by 8? How about leaving out that argument and making callers 
store the number of elements in the keys[] array in the num_keys member 
before calling pr_read_keys()?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/19] block: Add PR callouts for read keys and reservation
  2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
  2022-11-02 22:50   ` Bart Van Assche
@ 2022-11-02 22:53   ` Bart Van Assche
  2022-11-03  2:25     ` Mike Christie
  1 sibling, 1 reply; 54+ messages in thread
From: Bart Van Assche @ 2022-11-02 22:53 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> +struct pr_keys {
> +	u32	generation;
> +	u32	num_keys;
> +	u64	keys[];
> +};
Is my understanding correct that keys[] is treated as opaque data by the 
kernel? If so, is it necessary to convert the persistent reservation 
keys from big endian to CPU endianness? Some SCSI stacks keep 
reservation keys as __be64 format.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation
  2022-10-26 23:19 ` [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation Mike Christie
  2022-11-01  5:45   ` Chaitanya Kulkarni
@ 2022-11-02 22:54   ` Bart Van Assche
  1 sibling, 0 replies; 54+ messages in thread
From: Bart Van Assche @ 2022-11-02 22:54 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 10/26/22 16:19, Mike Christie wrote:
> +static inline enum pr_type scsi_pr_type_to_block(enum scsi_pr_type type)
> +{
> +	switch (type) {
> +	case SCSI_PR_WRITE_EXCLUSIVE:
> +		return PR_WRITE_EXCLUSIVE;
> +	case SCSI_PR_EXCLUSIVE_ACCESS:
> +		return PR_EXCLUSIVE_ACCESS;
> +	case SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY:
> +		return PR_WRITE_EXCLUSIVE_REG_ONLY;
> +	case SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY:
> +		return PR_EXCLUSIVE_ACCESS_REG_ONLY;
> +	case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
> +		return PR_WRITE_EXCLUSIVE_ALL_REGS;
> +	case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
> +		return PR_EXCLUSIVE_ACCESS_ALL_REGS;
> +	default:
> +		return 0;
> +	}
> +}

Also for this function, how about moving the "return 0" statement out of 
the switch statement?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/19] block: Add PR callouts for read keys and reservation
  2022-11-02 22:50   ` Bart Van Assche
@ 2022-11-03  1:54     ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-11-03  1:54 UTC (permalink / raw)
  To: Bart Van Assche, hch, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, kbusch, target-devel

On 11/2/22 5:50 PM, Bart Van Assche wrote:
> On 10/26/22 16:19, Mike Christie wrote:
>> +struct pr_keys {
>> +    u32    generation;
>> +    u32    num_keys;
>> +    u64    keys[];
>> +};
>> +
>> +struct pr_held_reservation {
>> +    u64        key;
>> +    u32        generation;
>> +    enum pr_type    type;
>> +};
>> +
>>   struct pr_ops {
>>       int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
>>               u32 flags);
>> @@ -14,6 +26,18 @@ struct pr_ops {
>>       int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
>>               enum pr_type type, bool abort);
>>       int (*pr_clear)(struct block_device *bdev, u64 key);
>> +    /*
>> +     * pr_read_keys - Read the registered keys and return them in the
>> +     * pr_keys->keys array. The keys array will have been allocated at the
>> +     * end of the pr_keys struct and is keys_len bytes. If there are more
>> +     * keys than can fit in the array, success will still be returned and
>> +     * pr_keys->num_keys will reflect the total number of keys the device
>> +     * contains, so the caller can retry with a larger array.
>> +     */
>> +    int (*pr_read_keys)(struct block_device *bdev,
>> +            struct pr_keys *keys_info, u32 keys_len);
>> +    int (*pr_read_reservation)(struct block_device *bdev,
>> +            struct pr_held_reservation *rsv);
>>   };
> 
> Is there any pr_read_keys() implementation that won't have to divide @keys_len by 8? How about leaving out that argument and making callers store the number of elements in the keys[] array in the num_keys member before calling pr_read_keys()?

That seems ok to me. 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-11-02 22:47   ` Bart Van Assche
@ 2022-11-03  2:13     ` Mike Christie
  2022-11-03 18:14       ` Bart Van Assche
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-11-03  2:13 UTC (permalink / raw)
  To: Bart Van Assche, hch, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, kbusch, target-devel

On 11/2/22 5:47 PM, Bart Van Assche wrote:
> On 10/26/22 16:19, Mike Christie wrote:
>> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
>> +{
>> +    switch (type) {
>> +    case PR_WRITE_EXCLUSIVE:
>> +        return SCSI_PR_WRITE_EXCLUSIVE;
>> +    case PR_EXCLUSIVE_ACCESS:
>> +        return SCSI_PR_EXCLUSIVE_ACCESS;
>> +    case PR_WRITE_EXCLUSIVE_REG_ONLY:
>> +        return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
>> +    case PR_EXCLUSIVE_ACCESS_REG_ONLY:
>> +        return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
>> +    case PR_WRITE_EXCLUSIVE_ALL_REGS:
>> +        return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
>> +    case PR_EXCLUSIVE_ACCESS_ALL_REGS:
>> +        return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
>> +    default:
>> +        return 0;
>> +    }
>> +};
> 
> Please leave out "default: return 0;" from the switch statement and add "return 0;" after the switch statement. That will make the compiler emit a warning if a value is added in enum pr_type but not in the above function.
Hey Bart,

Did you want that compiler warning functionality in the future code
or are you asking me to do the above only if we go the switch based
route?

Chaitanya requested me to make this array based in nvme/scsi. For this
approach, I can add a WARN or printk warning if the pr_type passed in
does not match a value in the array. I don't think I can do a compiler
warning though.

I didn't care, but I think the compiler warning route might be better
though.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 01/19] block: Add PR callouts for read keys and reservation
  2022-11-02 22:53   ` Bart Van Assche
@ 2022-11-03  2:25     ` Mike Christie
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Christie @ 2022-11-03  2:25 UTC (permalink / raw)
  To: Bart Van Assche, hch, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, kbusch, target-devel

On 11/2/22 5:53 PM, Bart Van Assche wrote:
> On 10/26/22 16:19, Mike Christie wrote:
>> +struct pr_keys {
>> +    u32    generation;
>> +    u32    num_keys;
>> +    u64    keys[];
>> +};
> Is my understanding correct that keys[] is treated as opaque data by the kernel? If so, is it necessary to convert the persistent reservation keys from big endian to CPU endianness? Some SCSI stacks keep reservation keys as __be64 format.

The pr_read_keys/reservation calls work like the pr_register/reserve/
release calls where the scsi and nvme layer convert to/from the cpu
endianness to the specs endiennness (big for scsi and little for nvme).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 03/19] scsi: Move sd_pr_type to header to share
  2022-11-03  2:13     ` Mike Christie
@ 2022-11-03 18:14       ` Bart Van Assche
  0 siblings, 0 replies; 54+ messages in thread
From: Bart Van Assche @ 2022-11-03 18:14 UTC (permalink / raw)
  To: Mike Christie, hch, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 11/2/22 19:13, Mike Christie wrote:
> On 11/2/22 5:47 PM, Bart Van Assche wrote:
>> On 10/26/22 16:19, Mike Christie wrote:
>>> +static inline enum scsi_pr_type block_pr_type_to_scsi(enum pr_type type)
>>> +{
>>> +    switch (type) {
>>> +    case PR_WRITE_EXCLUSIVE:
>>> +        return SCSI_PR_WRITE_EXCLUSIVE;
>>> +    case PR_EXCLUSIVE_ACCESS:
>>> +        return SCSI_PR_EXCLUSIVE_ACCESS;
>>> +    case PR_WRITE_EXCLUSIVE_REG_ONLY:
>>> +        return SCSI_PR_WRITE_EXCLUSIVE_REG_ONLY;
>>> +    case PR_EXCLUSIVE_ACCESS_REG_ONLY:
>>> +        return SCSI_PR_EXCLUSIVE_ACCESS_REG_ONLY;
>>> +    case PR_WRITE_EXCLUSIVE_ALL_REGS:
>>> +        return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
>>> +    case PR_EXCLUSIVE_ACCESS_ALL_REGS:
>>> +        return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
>>> +    default:
>>> +        return 0;
>>> +    }
>>> +};
>>
>> Please leave out "default: return 0;" from the switch statement and add "return 0;" after the switch statement. That will make the compiler emit a warning if a value is added in enum pr_type but not in the above function.
> 
> Did you want that compiler warning functionality in the future code
> or are you asking me to do the above only if we go the switch based
> route?
> 
> Chaitanya requested me to make this array based in nvme/scsi. For this
> approach, I can add a WARN or printk warning if the pr_type passed in
> does not match a value in the array. I don't think I can do a compiler
> warning though.
> 
> I didn't care, but I think the compiler warning route might be better
> though.

Hi Mike,

Although I do not have a strong opinion about this, keeping the switch 
statement and moving the return statement out of the switch statement 
has the advantage that the compiler will warn if the switch statement is 
incomplete. I don't think that a similar warning would be emitted if the 
switch statement would be converted into an array lookup.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-11-01 10:15       ` Christoph Hellwig
@ 2022-11-05 18:36         ` Mike Christie
  2022-11-07  9:16           ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Christie @ 2022-11-05 18:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: bvanassche, martin.petersen, linux-scsi, james.bottomley,
	linux-block, dm-devel, snitzer, axboe, linux-nvme, chaitanyak,
	kbusch, target-devel

On 11/1/22 5:15 AM, Christoph Hellwig wrote:
> On Sun, Oct 30, 2022 at 06:05:35PM -0500, Mike Christie wrote:
>> The problem I hit is that in the ioctl code I then have to do:
>>
>> @@ -269,7 +270,14 @@ static int blkdev_pr_register(struct block_device *bdev,
>>  
>>  	if (reg.flags & ~PR_FL_IGNORE_KEY)
>>  		return -EOPNOTSUPP;
>> -	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
>> +	ret = ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
>> +	if (ret == -EBADE) {
>> +		if (bdev_is_nvme(bdev))
>> +			ret = NVME_SC_RESERVATION_CONFLICT;
>> +		else if (bdev_is_scsi(bdev)
>> +			ret = SAM_STAT_RESERVATION_CONFLICT;
>> +	}
>> +	return ret;
> 
> Eww.  We should have never leaked protocol specific values out to
> userspace.  This is an original bug I introduceѕ, so all blame is on me.
> 
> I suspect the right way to fix is is to keep the numeric value of
> SAM_STAT_RESERVATION_CONFLICT and give it a new constant exposed in
> the uapi header, assuming that SCSI is the thing people actually
> used the PR API for, and nvme was just an nice little add-on.
> 

Do you mean just doing this:

diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index 727123c611e6..6ac70514170d 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -72,12 +72,17 @@ static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
 static int nvme_send_pr_command(struct block_device *bdev,
 		struct nvme_command *c, void *data, unsigned int data_len)
 {
+	int ret;
+
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
 	    bdev->bd_disk->fops == &nvme_ns_head_ops)
-		return nvme_send_ns_head_pr_command(bdev, c, data, data_len);
+		ret = nvme_send_ns_head_pr_command(bdev, c, data, data_len);
 	else
-		return nvme_send_ns_pr_command(bdev->bd_disk->private_data, c,
-					       data, data_len);
+		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, c,
+					      data, data_len);
+	if (ret == NVME_SC_RESERVATION_CONFLICT)
+		ret = PR_STS_RESERVATION_CONFLICT;
+	return ret;
 }
 
 static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c0a614efcfce..c7621d8ffa51 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1840,7 +1840,8 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 	    scsi_sense_valid(&sshdr)) {
 		sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result);
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
-	}
+	} else if (__get_status_byte(result) == SAM_STAT_RESERVATION_CONFLICT)
+		result = PR_STS_RESERVATION_CONFLICT;
 
 	return result;
 }
diff --git a/include/uapi/linux/pr.h b/include/uapi/linux/pr.h
index ccc78cbf1221..7a637f9e5b49 100644
--- a/include/uapi/linux/pr.h
+++ b/include/uapi/linux/pr.h
@@ -13,6 +13,15 @@ enum pr_type {
 	PR_EXCLUSIVE_ACCESS_ALL_REGS	= 6,
 };
 
+enum pr_status {
+	PR_STS_SUCCESS			= 0x0,
+	/*
+	 * The error codes are based on SCSI, because it was the primary user
+	 * and had userspace users.
+	 */
+	PR_STS_RESERVATION_CONFLICT	= 0x18,
+};
+
 struct pr_reservation {
 	__u64	key;
 	__u32	type;


---------------------------------------------------------------------------


Or we could add a new flag and make it nicer for the user in the future,
but also uglier for the drivers but a little less uglier than adding the
blk_sts field to the calls:



diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..cae58f57ea13 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -269,7 +269,8 @@ static int blkdev_pr_register(struct block_device *bdev,
 
 	if (reg.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags,
+				reg.flags & PR_FL_PR_STAT);
 }
 
 static int blkdev_pr_reserve(struct block_device *bdev,
@@ -287,7 +288,8 @@ static int blkdev_pr_reserve(struct block_device *bdev,
 
 	if (rsv.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags);
+	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags,
+			       rsv.flags & PR_FL_PR_STAT)
 }
 
 static int blkdev_pr_release(struct block_device *bdev,
@@ -305,7 +307,8 @@ static int blkdev_pr_release(struct block_device *bdev,
 
 	if (rsv.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_release(bdev, rsv.key, rsv.type);
+	return ops->pr_release(bdev, rsv.key, rsv.type,
+			       rsv.flags & PR_FL_PR_STAT);
 }
 
 static int blkdev_pr_preempt(struct block_device *bdev,
@@ -323,7 +326,8 @@ static int blkdev_pr_preempt(struct block_device *bdev,
 
 	if (p.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort);
+	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort,
+			       p.flags & PR_FL_PR_STAT)
 }
 
 static int blkdev_pr_clear(struct block_device *bdev,
@@ -341,7 +345,7 @@ static int blkdev_pr_clear(struct block_device *bdev,
 
 	if (c.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_clear(bdev, c.key);
+	return ops->pr_clear(bdev, c.key, c.flags & PR_FL_PR_STAT);
 }
 
 static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode,
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c
index 727123c611e6..7c317b646cde 100644
--- a/drivers/nvme/host/pr.c
+++ b/drivers/nvme/host/pr.c
@@ -70,18 +70,44 @@ static int nvme_send_ns_pr_command(struct nvme_ns *ns, struct nvme_command *c,
 }
 
 static int nvme_send_pr_command(struct block_device *bdev,
-		struct nvme_command *c, void *data, unsigned int data_len)
+		struct nvme_command *c, void *data, unsigned int data_len,
+		bool use_pr_sts)
 {
+	int ret;
+
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
 	    bdev->bd_disk->fops == &nvme_ns_head_ops)
-		return nvme_send_ns_head_pr_command(bdev, c, data, data_len);
+		ret = nvme_send_ns_head_pr_command(bdev, c, data, data_len);
 	else
-		return nvme_send_ns_pr_command(bdev->bd_disk->private_data, c,
-					       data, data_len);
+		ret = nvme_send_ns_pr_command(bdev->bd_disk->private_data, c,
+					      data, data_len);
+	if (!ret || ret < 0 || (ret > 0 && !use_pr_sts))
+		return ret;
+
+	switch (ret) {
+	case NVME_SC_RESERVATION_CONFLICT:
+		ret = PR_STS_RESERVATION_CONFLICT;
+		break;
+	case NVME_SC_HOST_PATH_ERROR:
+		ret = PR_STS_PATH_FAILURE;
+		break
+	case NVME_SC_BAD_ATTRIBUTES:
+	case NVME_SC_ONCS_NOT_SUPPORTED:
+	case NVME_SC_INVALID_OPCODE:
+	case NVME_SC_INVALID_FIELD:
+	case NVME_SC_INVALID_NS:
+		ret = PR_STS_INVALID_OP;
+		break;
+	default:
+		ret = PR_STS_IOERR;
+		break;
+	}
+
+	return ret;
 }
 
 static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
-				u64 key, u64 sa_key, u8 op)
+				u64 key, u64 sa_key, u8 op, bool use_pr_sts)
 {
 	struct nvme_command c = { };
 	u8 data[16] = { 0, };
@@ -92,11 +118,11 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 	c.common.opcode = op;
 	c.common.cdw10 = cpu_to_le32(cdw10);
 
-	return nvme_send_pr_command(bdev, &c, data, sizeof(data));
+	return nvme_send_pr_command(bdev, &c, data, sizeof(data), use_pr_sts);
 }
 
 static int nvme_pr_register(struct block_device *bdev, u64 old,
-		u64 new, unsigned flags)
+		u64 new, unsigned flags, bool use_pr_sts)
 {
 	u32 cdw10;
 
@@ -106,11 +132,12 @@ static int nvme_pr_register(struct block_device *bdev, u64 old,
 	cdw10 = old ? 2 : 0;
 	cdw10 |= (flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0;
 	cdw10 |= (1 << 30) | (1 << 31); /* PTPL=1 */
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register);
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_register,
+			       use_pr_sts);
 }
 
 static int nvme_pr_reserve(struct block_device *bdev, u64 key,
-		enum pr_type type, unsigned flags)
+		enum pr_type type, unsigned flags, bool use_pr_sts)
 {
 	u32 cdw10;
 
@@ -119,29 +146,34 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 
 	cdw10 = nvme_pr_type_from_blk(type) << 8;
 	cdw10 |= ((flags & PR_FL_IGNORE_KEY) ? 1 << 3 : 0);
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_acquire,
+			       use_pr_sts);
 }
 
 static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, bool use_pr_sts)
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (abort ? 2 : 1);
 
-	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
+	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire,
+			       use_pr_sts);
 }
 
-static int nvme_pr_clear(struct block_device *bdev, u64 key)
+static int nvme_pr_clear(struct block_device *bdev, u64 key, bool use_pr_sts)
 {
 	u32 cdw10 = 1 | (key ? 0 : 1 << 3);
 
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release,
+			       use_pr_sts);
 }
 
-static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int nvme_pr_release(struct block_device *bdev, u64 key,
+		enum pr_type type, bool use_pr_sts)
 {
 	u32 cdw10 = nvme_pr_type_from_blk(type) << 8 | (key ? 0 : 1 << 3);
 
-	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
+	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release,
+			       use_pr_sts);
 }
 
 static int nvme_pr_resv_report(struct block_device *bdev, void *data,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c0a614efcfce..46393bdd7427 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1797,7 +1797,8 @@ static int sd_pr_read_reservation(struct block_device *bdev,
 }
 
 static int sd_pr_out_command(struct block_device *bdev, u8 sa,
-		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags)
+		u64 key, u64 sa_key, enum scsi_pr_type type, u8 flags,
+		bool use_pr_sts)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1842,44 +1843,74 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
 	}
 
+	if (!result || result < 0 || (result > 0 && !use_pr_sts))
+		return result;
+
+	result = PR_STS_IOERR;
+
+	switch host_byte(result) {
+	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
+	case DID_TRANSPORT_DISRUPTED:
+		result = PR_STS_PATH_FAILURE;
+		goto done;
+	}
+
+	switch (__get_status_byte(result)) {
+	case SAM_STAT_RESERVATION_CONFLICT:
+		result = PR_STS_RESERVATION_CONFLICT;
+		goto done;
+	case SAM_STAT_CHECK_CONDITION:
+		if (!scsi_sense_valid(&sshdr))
+			goto done;
+
+		if (sshdr.sense_key == ILLEGAL_REQUEST &&
+		    (sshdr.asc == 0x26 || sshdr.asc == 0x24)) {
+			result = PR_STS_INVALID_OP;
+			goto done;
+		}
+	}
+
+done:
 	return result;
 }
 
 static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
-		u32 flags)
+		u32 flags, bool use_pr_sts)
 {
 	if (flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
 	return sd_pr_out_command(bdev, (flags & PR_FL_IGNORE_KEY) ? 0x06 : 0x00,
-			old_key, new_key, 0,
-			(1 << 0) /* APTPL */);
+				 old_key, new_key, 0, (1 << 0) /* APTPL */,
+				 use_pr_sts);
 }
 
 static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
-		u32 flags)
+		u32 flags, bool use_pr_sts)
 {
 	if (flags)
 		return -EOPNOTSUPP;
 	return sd_pr_out_command(bdev, 0x01, key, 0,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, use_pr_sts)
 }
 
-static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+			 bool use_pr_sts)
 {
 	return sd_pr_out_command(bdev, 0x02, key, 0,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, use_pr_sts);
 }
 
 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, bool use_pr_sts)
 {
 	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
-				 block_pr_type_to_scsi(type), 0);
+				 block_pr_type_to_scsi(type), 0, use_pr_sts);
 }
 
-static int sd_pr_clear(struct block_device *bdev, u64 key)
+static int sd_pr_clear(struct block_device *bdev, u64 key, bool use_pr_sts)
 {
-	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0);
+	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0, use_pr_sts);
 }
 
 static const struct pr_ops sd_pr_ops = {
diff --git a/include/linux/pr.h b/include/linux/pr.h
index 3003daec28a5..51e03e73a87f 100644
--- a/include/linux/pr.h
+++ b/include/linux/pr.h
@@ -18,14 +18,14 @@ struct pr_held_reservation {
 
 struct pr_ops {
 	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
-			u32 flags);
+			u32 flags, bool use_pr_sts);
 	int (*pr_reserve)(struct block_device *bdev, u64 key,
-			enum pr_type type, u32 flags);
+			enum pr_type type, u32 flags, bool use_pr_sts);
 	int (*pr_release)(struct block_device *bdev, u64 key,
-			enum pr_type type);
+			enum pr_type type, bool use_pr_sts);
 	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
-			enum pr_type type, bool abort);
-	int (*pr_clear)(struct block_device *bdev, u64 key);
+			enum pr_type type, bool abort, bool use_pr_sts);
+	int (*pr_clear)(struct block_device *bdev, u64 key, bool use_pr_sts);
 	/*
 	 * pr_read_keys - Read the registered keys and return them in the
 	 * pr_keys->keys array. The keys array will have been allocated at the
diff --git a/include/uapi/linux/pr.h b/include/uapi/linux/pr.h
index ccc78cbf1221..ac8f7fc404ff 100644
--- a/include/uapi/linux/pr.h
+++ b/include/uapi/linux/pr.h
@@ -13,6 +13,14 @@ enum pr_type {
 	PR_EXCLUSIVE_ACCESS_ALL_REGS	= 6,
 };
 
+enum pr_status {
+	PR_STS_SUCCESS,
+	PR_STS_IOERR,
+	PR_STS_RESERVATION_CONFLICT,
+	PR_STS_PATH_FAILURE,
+	PR_STS_INVALID_OP,
+};
+
 struct pr_reservation {
 	__u64	key;
 	__u32	type;
@@ -40,6 +48,7 @@ struct pr_clear {
 };
 
 #define PR_FL_IGNORE_KEY	(1 << 0)	/* ignore existing key */
+#define PR_FL_PR_STAT		(1 << 1)	/* Convert device errors to pr_status */
 
 #define IOC_PR_REGISTER		_IOW('p', 200, struct pr_registration)
 #define IOC_PR_RESERVE		_IOW('p', 201, struct pr_reservation)


Bart, if this is what you were suggesting I misunderstood you. I thought you wanted to
only pass the new code in the kernel so that's why I was saying we still have the problem
of converting the error back when passing it back to userspace.

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts
  2022-11-05 18:36         ` Mike Christie
@ 2022-11-07  9:16           ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2022-11-07  9:16 UTC (permalink / raw)
  To: Mike Christie
  Cc: Christoph Hellwig, bvanassche, martin.petersen, linux-scsi,
	james.bottomley, linux-block, dm-devel, snitzer, axboe,
	linux-nvme, chaitanyak, kbusch, target-devel

On Sat, Nov 05, 2022 at 01:36:18PM -0500, Mike Christie wrote:
> Do you mean just doing this:

That would be the minimal fix.  We'd then still need to enumerate
the allowed positive return values and check noting else is returned.

I don't like the opt in in the other version.  The SCSI return values are
the defactor API, and we need to switch NVMe to align with it ASAP
instead of keeping the broken old version around.

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2022-11-07  9:16 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-26 23:19 [PATCH v3 00/19] Use block pr_ops in LIO Mike Christie
2022-10-26 23:19 ` [PATCH v3 01/19] block: Add PR callouts for read keys and reservation Mike Christie
2022-11-02 22:50   ` Bart Van Assche
2022-11-03  1:54     ` Mike Christie
2022-11-02 22:53   ` Bart Van Assche
2022-11-03  2:25     ` Mike Christie
2022-10-26 23:19 ` [PATCH v3 02/19] scsi: Rename sd_pr_command Mike Christie
2022-11-01  5:33   ` Chaitanya Kulkarni
2022-10-26 23:19 ` [PATCH v3 03/19] scsi: Move sd_pr_type to header to share Mike Christie
2022-11-01  5:43   ` Chaitanya Kulkarni
2022-11-01 16:43     ` Mike Christie
2022-11-02 22:47   ` Bart Van Assche
2022-11-03  2:13     ` Mike Christie
2022-11-03 18:14       ` Bart Van Assche
2022-10-26 23:19 ` [PATCH v3 04/19] scsi: Add support for block PR read keys/reservation Mike Christie
2022-11-01  5:45   ` Chaitanya Kulkarni
2022-11-02 22:54   ` Bart Van Assche
2022-10-26 23:19 ` [PATCH v3 05/19] dm: " Mike Christie
2022-10-26 23:19 ` [PATCH v3 06/19] nvme: Fix reservation status related structs Mike Christie
2022-10-27 17:04   ` Keith Busch
2022-10-26 23:19 ` [PATCH v3 07/19] nvme: Don't hardcode the data len for pr commands Mike Christie
2022-10-27 17:05   ` Keith Busch
2022-11-01  5:29   ` Chaitanya Kulkarni
2022-10-26 23:19 ` [PATCH v3 08/19] nvme: Move pr code to it's own file Mike Christie
2022-10-27 17:06   ` Keith Busch
2022-10-28 16:06     ` Mike Christie
2022-10-28 16:38       ` Keith Busch
2022-10-30  8:06         ` Christoph Hellwig
2022-11-01  5:25   ` Chaitanya Kulkarni
2022-10-26 23:19 ` [PATCH v3 09/19] nvme: Add pr_ops read_keys support Mike Christie
2022-10-30  8:17   ` Christoph Hellwig
2022-10-30 20:47     ` Mike Christie
2022-10-26 23:19 ` [PATCH v3 10/19] nvme: Move NVMe and Block PR types to an array Mike Christie
2022-10-27 15:18   ` Keith Busch
2022-10-27 17:06     ` Mike Christie
2022-10-27 17:13       ` michael.christie
2022-10-27 17:16         ` Keith Busch
2022-10-28 16:05           ` Mike Christie
2022-10-26 23:19 ` [PATCH v3 11/19] nvme: Add pr_ops read_reservation support Mike Christie
2022-10-30  8:18   ` Christoph Hellwig
2022-10-30 20:54     ` Mike Christie
2022-10-26 23:19 ` [PATCH v3 12/19] block,nvme,scsi,dm: Add blk_status to pr_ops callouts Mike Christie
2022-10-30  8:20   ` Christoph Hellwig
2022-10-30 23:05     ` Mike Christie
2022-11-01 10:15       ` Christoph Hellwig
2022-11-05 18:36         ` Mike Christie
2022-11-07  9:16           ` Christoph Hellwig
2022-10-26 23:19 ` [PATCH v3 13/19] nvme: Have NVMe pr_ops return a blk_status_t Mike Christie
2022-10-26 23:19 ` [PATCH v3 14/19] scsi: Export scsi_result_to_blk_status Mike Christie
2022-10-26 23:19 ` [PATCH v3 15/19] scsi: Have sd pr_ops return a blk_status_t Mike Christie
2022-10-26 23:19 ` [PATCH v3 16/19] scsi: target: Rename sbc_ops to exec_cmd_ops Mike Christie
2022-10-26 23:19 ` [PATCH v3 17/19] scsi: target: Allow backends to hook into PR handling Mike Christie
2022-10-26 23:19 ` [PATCH v3 18/19] scsi: target: Don't support SCSI-2 RESERVE/RELEASE Mike Christie
2022-10-26 23:19 ` [PATCH v3 19/19] scsi: target: Add block PR support to iblock Mike Christie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).