linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/3] nvme : Add ioctl to query nvme device attributes
       [not found] <CGME20221027160102eucas1p1bb39a9a2b8ed6b54f854849532359332@eucas1p1.samsung.com>
@ 2022-10-27 15:57 ` Joel Granados
       [not found]   ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw)
  To: kbusch, sagi, hch
  Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados

What?
In this patch I add an ioctl that provides nvme controller attributes to
the user that can open the char device file. We also provide them for the
block devices for convenience.

Why?
Applications with write permissions should not need to be privileged to
write to the device. With Kanchan's latest patch
(https://lore.kernel.org/linux-nvme/20221020070205.57366-1-joshi.k@samsung.com/)
the nvme IO and identify commands in passthru now follow device
permissions; however there are still some controller attributes like
minimal data transfer size (MDTS) which need a privileged user to be
queried.

How?
Added an ioctl (NVME_IOCTL_GET_ATTR) that fetches attributes from the nvme
driver. It sends both nvme_identify_ctrl and nvme_identify_cs_ctrl commands
to query the attributes that are not contained in nvme_ctrl already.

I have rebased this on top of Kanchans "nvme: identify-namespace without
CAP_SYS_ADMIN" (https://lore.kernel.org/linux-nvme/20221020070205.57366-3-joshi.k@samsung.com)
because this only makes sense if we can also do IO as unprivileged.

I have made the call extensible by embedding the struct size as the first
member and using it as a version. As more members get added, the switch
statement gets populated with more checks. For this I followed in the steps
of the openat2 system call [1] and the extensible ioctl for vfio [2].
Another interesting reference for this is here https://lwn.net/Articles/830666/

The first patch in the series is a fix to a potential bug in a return
value. This was in a section of the code that was getting changed so I
decided to include it in this series.

Questions:
1. I started by providing members that have relevant attributes like mdts,
   and mpsmin for regular writes as well as wzsl, wusl for more specialized
   write commands like write zeroes and also dmsl, dmrsl, dmrl for dataset
   management commands.  Have I missed an attribute that should definitely
   be there? Or have a included one which should be removed?

2. Naming is always important. We called it "GET_ATTR" because it "gets"
   nvme specific attributes. This is fairly general for whatever we want to
   add in the future, but do we need to be more specific here? Alternatives
   are  "NVME_IOCTL_DEV_ATTR", "NVME_IOCTL_DRV_ATTR" or
   "NVME_IOCTL_CTRL_ATTR". Thoughts?

3. I have named the communication structure "nvme_get_attr" to be
   consistent with the ioctl name. Are there alternatives?

4. I have made the ioctl call extensible because I believe that this ioctl
   might grow in the future as ppl see that there might be more data needed
   to do an nvme write.

[1] https://github.com/torvalds/linux/commit/fddb5d430ad9
[2] https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h#L56

Joel Granados (3):
  nvme: Return -ENOMEM when kzalloc fails
  nvme: Move nvme identify CS to separate function
  nvme : Add ioctl to query nvme attributes

 drivers/nvme/host/core.c        | 35 ++++++++++------
 drivers/nvme/host/ioctl.c       | 58 ++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h        | 10 +++++
 include/uapi/linux/nvme_ioctl.h | 74 +++++++++++++++++++++++++++++++++
 4 files changed, 164 insertions(+), 13 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails
       [not found]   ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com>
@ 2022-10-27 15:57     ` Joel Granados
  2022-10-27 16:26       ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw)
  To: kbusch, sagi, hch
  Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados

In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
failed. This patch corrects this behavior and return -ENOMEM
Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 059737c1a2c1..7dbf221247f7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3050,7 +3050,7 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 
 	id = kzalloc(sizeof(*id), GFP_KERNEL);
 	if (!id)
-		return 0;
+		return -ENOMEM;
 
 	c.identify.opcode = nvme_admin_identify;
 	c.identify.cns = NVME_ID_CNS_CS_CTRL;
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 2/3] nvme: Move nvme identify CS to separate function
       [not found]   ` <CGME20221027160108eucas1p1cc1351fd8bf35ef6e5220e0cdc865ff4@eucas1p1.samsung.com>
@ 2022-10-27 15:57     ` Joel Granados
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw)
  To: kbusch, sagi, hch
  Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados

This is a preparation commit to be able to use the NVME identify command
space controller command from two places without duplication.

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/nvme/host/core.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7dbf221247f7..d76be125dc49 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3019,9 +3019,27 @@ static inline u32 nvme_mps_to_sectors(struct nvme_ctrl *ctrl, u32 units)
 	return val;
 }
 
+int nvme_identify_cs_ctrl(struct nvme_ctrl *ctrl, struct nvme_id_ctrl_nvm **id)
+{
+	struct nvme_command c = { };
+	int err;
+
+	c.identify.opcode = nvme_admin_identify;
+	c.identify.cns = NVME_ID_CNS_CS_CTRL;
+	c.identify.csi = NVME_CSI_NVM;
+
+	*id = kzalloc(sizeof(**id), GFP_KERNEL);
+	if (!(*id))
+		return -ENOMEM;
+
+	err = nvme_submit_sync_cmd(ctrl->admin_q, &c, *id, sizeof(**id));
+	if (err)
+		kfree(*id);
+	return err;
+}
+
 static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 {
-	struct nvme_command c = { };
 	struct nvme_id_ctrl_nvm *id;
 	int ret;
 
@@ -3048,17 +3066,9 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 	if (nvme_ctrl_limited_cns(ctrl))
 		return 0;
 
-	id = kzalloc(sizeof(*id), GFP_KERNEL);
-	if (!id)
-		return -ENOMEM;
-
-	c.identify.opcode = nvme_admin_identify;
-	c.identify.cns = NVME_ID_CNS_CS_CTRL;
-	c.identify.csi = NVME_CSI_NVM;
-
-	ret = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id));
+	ret = nvme_identify_cs_ctrl(ctrl, &id);
 	if (ret)
-		goto free_data;
+		return ret;
 
 	if (id->dmrl)
 		ctrl->max_discard_segments = id->dmrl;
@@ -3066,7 +3076,6 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 	if (id->wzsl)
 		ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl);
 
-free_data:
 	kfree(id);
 	return ret;
 }
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 3/3] nvme : Add ioctl to query nvme attributes
       [not found]   ` <CGME20221027160108eucas1p2722b30a1be27a855e2b0f2495fed15ab@eucas1p2.samsung.com>
@ 2022-10-27 15:57     ` Joel Granados
  2022-10-27 16:55       ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw)
  To: kbusch, sagi, hch
  Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados

An ioctl (NVME_IOCTL_GET_ATTR) is added to query nvme controller
attributes needed to effectively write to the char device without needing
to be a priviledged user.  We also provide them for block devices for
convenience. The attributes here are meant to complement what you can
already get with the allowed identify commands.

We add NVME_IOCTL_GET_ATTR_{V0SZ,CURZS} in order to make the ioctl
extensible. These serve as both size and version. The caller is expected to
pass the structure size as the first memeber of the struct. For example:

	{...
	struct nvme_get_attr nvme_get_attr = {0};
	nvme_get_attr.argsz = sizeof(struct nvme_get_attr);
	...}

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/nvme/host/core.c        |  2 +-
 drivers/nvme/host/ioctl.c       | 58 ++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h        | 10 +++++
 include/uapi/linux/nvme_ioctl.h | 74 +++++++++++++++++++++++++++++++++
 4 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d76be125dc49..1cc14bd94aa5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1276,7 +1276,7 @@ static bool nvme_ctrl_limited_cns(struct nvme_ctrl *ctrl)
 	return ctrl->vs < NVME_VS(1, 1, 0);
 }
 
-static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
+int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
 {
 	struct nvme_command c = { };
 	int error;
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 9273db147872..21291981584b 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -53,6 +53,58 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 	return (void __user *)ptrval;
 }
 
+static int nvme_ctrl_export_attrs(struct nvme_ctrl *ctrl,
+				  struct nvme_get_attr __user *arg)
+{
+	int ret;
+	struct nvme_id_ctrl *id_ctrl;
+	struct nvme_get_attr nvme_get_attr = {0};
+	struct nvme_id_ctrl_nvm *id_ctrl_nvm;
+	__u32 usize;
+
+	BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ);
+	BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ);
+
+	if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32)))
+		return -EFAULT;
+
+	if (nvme_get_attr.flags != 0)
+		return -EINVAL;
+
+	switch (nvme_get_attr.argsz) {
+	case NVME_IOCTL_GET_ATTR_V0SZ:
+		break;
+	default:
+		return -EINVAL;
+	}
+	usize = nvme_get_attr.argsz;
+
+	ret = nvme_identify_ctrl(ctrl, &id_ctrl);
+	if (ret)
+		return ret;
+
+	ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm);
+	if (ret)
+		return ret;
+
+	nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ;
+	nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap);
+	nvme_get_attr.vsl = id_ctrl_nvm->vsl;
+	nvme_get_attr.wzsl = id_ctrl_nvm->wzsl;
+	nvme_get_attr.wusl = id_ctrl_nvm->wusl;
+	nvme_get_attr.dmrl = id_ctrl_nvm->dmrl;
+	nvme_get_attr.dmsl = id_ctrl_nvm->dmsl;
+	nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl;
+	nvme_get_attr.oncs = id_ctrl->oncs;
+	nvme_get_attr.mdts = id_ctrl->mdts;
+
+	if (copy_to_user((struct nvme_get_attr __user *)arg, &nvme_get_attr,
+			  usize))
+		return -EFAULT;
+
+	return ret;
+}
+
 static void *nvme_add_user_metadata(struct request *req, void __user *ubuf,
 		unsigned len, u32 seed)
 {
@@ -613,6 +665,8 @@ static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd,
 		return nvme_user_cmd(ctrl, NULL, argp, mode);
 	case NVME_IOCTL_ADMIN64_CMD:
 		return nvme_user_cmd64(ctrl, NULL, argp, false, mode);
+	case NVME_IOCTL_GET_ATTR:
+		return nvme_ctrl_export_attrs(ctrl, argp);
 	default:
 		return sed_ioctl(ctrl->opal_dev, cmd, argp);
 	}
@@ -659,6 +713,8 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
 		return nvme_user_cmd64(ns->ctrl, ns, argp, false, mode);
 	case NVME_IOCTL_IO64_CMD_VEC:
 		return nvme_user_cmd64(ns->ctrl, ns, argp, true, mode);
+	case NVME_IOCTL_GET_ATTR:
+		return nvme_ctrl_export_attrs(ns->ctrl, argp);
 	default:
 		return -ENOTTY;
 	}
@@ -950,6 +1006,8 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 			return -EACCES;
 		nvme_queue_scan(ctrl);
 		return 0;
+	case NVME_IOCTL_GET_ATTR:
+		return nvme_ctrl_export_attrs(ctrl, argp);
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee6..77693f51975b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -815,6 +815,9 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
 		int qid, int at_head,
 		blk_mq_req_flags_t flags);
+
+int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id);
+int nvme_identify_cs_ctrl(struct nvme_ctrl *ctrl, struct nvme_id_ctrl_nvm **id);
 int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
 		      u32 *result);
@@ -1072,4 +1075,11 @@ static inline const unsigned char *nvme_get_admin_opcode_str(u8 opcode)
 }
 #endif /* CONFIG_NVME_VERBOSE_ERRORS */
 
+/*
+ * List of all nvme ioctl controller attribute calls. Use size of nvme_get_attr
+ * as the version which enables us to use copy_struct_from_user
+ */
+#define NVME_IOCTL_GET_ATTR_V0SZ		32
+#define NVME_IOCTL_GET_ATTR_CURSZ		NVME_IOCTL_GET_ATTR_V0SZ
+
 #endif /* _NVME_H */
diff --git a/include/uapi/linux/nvme_ioctl.h b/include/uapi/linux/nvme_ioctl.h
index 2f76cba67166..b44a53d81aa9 100644
--- a/include/uapi/linux/nvme_ioctl.h
+++ b/include/uapi/linux/nvme_ioctl.h
@@ -92,6 +92,79 @@ struct nvme_uring_cmd {
 	__u32   rsvd2;
 };
 
+
+/*
+ * This struct is populated from the following nvme commands:
+ * [1] Identify Controller Data Structure
+ * [2] I/O Command Set Specific Identify Controller Data Structure for the
+ *     NVME Command Set. Usually contained in struct nvme_id_ctrl_nvm
+ * [3] Controller capabilities on controller initialization
+ */
+struct nvme_get_attr {
+	__u32	argsz;
+	__u32	flags;
+
+	/*
+	 * Memory Page Size MINimum : The host should not configure a page size that
+	 * is larger than (2 ^ (12 + mpsmin)). Comes from [3]
+	 */
+	__u32	mpsmin;
+
+	/*
+	 * Verify Size Limit : Recommended or supported data size for a verify
+	 * command. From [2].
+	 */
+	__u8	vsl;
+
+	/*
+	 * Write Zeroes Size Limit : Recommended or supported data size for a
+	 * zeroes command. From [2].
+	 */
+	__u8	wzsl;
+
+	/*
+	 * Write Uncorrected Size Limit : Recommended or supported data size for
+	 * an uncorrected command. From [2].
+	 */
+	__u8	wusl;
+
+	/*
+	 * Dataset Management Ranges Limit : Recommended or supported maximum
+	 * number of logical block ranges for the Dataset Management Command.
+	 * From [2].
+	 */
+	__u8	dmrl;
+
+	/*
+	 * Dataset Management Size Limit : Recommended or supported maximum of
+	 * total number of logical blocks for a Dataset Management Command.
+	 * From [2].
+	 */
+	__le64	dmsl;
+
+	/*
+	 * Dataset Management Range Size Limit : Recommended or supported maximum
+	 * number of logical blocks in a range of a Dataset Management Command.
+	 * From [2].
+	 */
+	__le32	dmrsl;
+
+	/*
+	 * Optional NVM Command Support. Is needed to make sense of attributes
+	 * like vsl, wzsl, wusl... Comes from [1].
+	 */
+	__le16	oncs;
+
+	/*
+	 * Maximum data transfer size. Commands should not exceed this size.
+	 * Comes from [1].
+	 */
+	__u8	mdts;
+
+	__u8	rsvd0;
+
+};
+
 #define nvme_admin_cmd nvme_passthru_cmd
 
 #define NVME_IOCTL_ID		_IO('N', 0x40)
@@ -104,6 +177,7 @@ struct nvme_uring_cmd {
 #define NVME_IOCTL_ADMIN64_CMD	_IOWR('N', 0x47, struct nvme_passthru_cmd64)
 #define NVME_IOCTL_IO64_CMD	_IOWR('N', 0x48, struct nvme_passthru_cmd64)
 #define NVME_IOCTL_IO64_CMD_VEC	_IOWR('N', 0x49, struct nvme_passthru_cmd64)
+#define NVME_IOCTL_GET_ATTR	_IOWR('N', 0x50, struct nvme_get_attr)
 
 /* io_uring async commands: */
 #define NVME_URING_CMD_IO	_IOWR('N', 0x80, struct nvme_uring_cmd)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails
  2022-10-27 15:57     ` [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails Joel Granados
@ 2022-10-27 16:26       ` Keith Busch
  2022-10-28  9:07         ` Joel Granados
  2022-10-30  8:02         ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2022-10-27 16:26 UTC (permalink / raw)
  To: Joel Granados
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote:
> In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
> failed. This patch corrects this behavior and return -ENOMEM
> Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")

I'm pretty sure I had this returning 0 on purpose. We can proceed
without this optional structure.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-27 15:57     ` [RFC 3/3] nvme : Add ioctl to query nvme attributes Joel Granados
@ 2022-10-27 16:55       ` Keith Busch
  2022-10-28 10:38         ` Joel Granados
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2022-10-27 16:55 UTC (permalink / raw)
  To: Joel Granados
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

On Thu, Oct 27, 2022 at 05:57:24PM +0200, Joel Granados wrote:
> +{
> +	int ret;
> +	struct nvme_id_ctrl *id_ctrl;
> +	struct nvme_get_attr nvme_get_attr = {0};
> +	struct nvme_id_ctrl_nvm *id_ctrl_nvm;
> +	__u32 usize;
> +
> +	BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ);
> +	BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ);
> +
> +	if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32)))
> +		return -EFAULT;
> +
> +	if (nvme_get_attr.flags != 0)
> +		return -EINVAL;
> +
> +	switch (nvme_get_attr.argsz) {
> +	case NVME_IOCTL_GET_ATTR_V0SZ:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	usize = nvme_get_attr.argsz;
> +
> +	ret = nvme_identify_ctrl(ctrl, &id_ctrl);
> +	if (ret)
> +		return ret;
> +
> +	ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm);
> +	if (ret)
> +		return ret;
> +
> +	nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ;
> +	nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap);
> +	nvme_get_attr.vsl = id_ctrl_nvm->vsl;
> +	nvme_get_attr.wzsl = id_ctrl_nvm->wzsl;
> +	nvme_get_attr.wusl = id_ctrl_nvm->wusl;
> +	nvme_get_attr.dmrl = id_ctrl_nvm->dmrl;
> +	nvme_get_attr.dmsl = id_ctrl_nvm->dmsl;
> +	nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl;
> +	nvme_get_attr.oncs = id_ctrl->oncs;
> +	nvme_get_attr.mdts = id_ctrl->mdts;

You already have the 'struct nvme_ctrl' that saves nearly all these
values. We shouldn't need to send new IO when you can just reference the
cached values instead.

> +struct nvme_get_attr {
> +	__u32	argsz;
> +	__u32	flags;
> +
> +	/*
> +	 * Memory Page Size MINimum : The host should not configure a page size that
> +	 * is larger than (2 ^ (12 + mpsmin)). Comes from [3]
> +	 */
> +	__u32	mpsmin;
> +
> +	/*
> +	 * Verify Size Limit : Recommended or supported data size for a verify
> +	 * command. From [2].
> +	 */
> +	__u8	vsl;
> +
> +	/*
> +	 * Write Zeroes Size Limit : Recommended or supported data size for a
> +	 * zeroes command. From [2].
> +	 */
> +	__u8	wzsl;
> +
> +	/*
> +	 * Write Uncorrected Size Limit : Recommended or supported data size for
> +	 * an uncorrected command. From [2].
> +	 */
> +	__u8	wusl;
> +
> +	/*
> +	 * Dataset Management Ranges Limit : Recommended or supported maximum
> +	 * number of logical block ranges for the Dataset Management Command.
> +	 * From [2].
> +	 */
> +	__u8	dmrl;
> +
> +	/*
> +	 * Dataset Management Size Limit : Recommended or supported maximum of
> +	 * total number of logical blocks for a Dataset Management Command.
> +	 * From [2].
> +	 */
> +	__le64	dmsl;
> +
> +	/*
> +	 * Dataset Management Range Size Limit : Recommended or supported maximum
> +	 * number of logical blocks in a range of a Dataset Management Command.
> +	 * From [2].
> +	 */
> +	__le32	dmrsl;
> +
> +	/*
> +	 * Optional NVM Command Support. Is needed to make sense of attributes
> +	 * like vsl, wzsl, wusl... Comes from [1].
> +	 */
> +	__le16	oncs;

Don't use the little-endian format for values that are not going over
the wire.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails
  2022-10-27 16:26       ` Keith Busch
@ 2022-10-28  9:07         ` Joel Granados
  2022-10-30  8:02         ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-28  9:07 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote:
> On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote:
> > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
> > failed. This patch corrects this behavior and return -ENOMEM
> > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
> 
> I'm pretty sure I had this returning 0 on purpose. We can proceed
> without this optional structure.

Looking a bit more into this I see a couple of things:
1. If we fail in the kzalloc, ctrl->max_discard_segments and
   ctrl->max_zeroes_sectors do not get set but we return 0. On the other
   hand if nvme_submit_sync_cmd fails, ctrl->max_discard_segments and
   ctrl->max_zeroes_sectors still do not get set but we forward
   nvme_submit_sync_cmd's error code.
   * Shouldn't we just set ret to 0 before returning regardless of what
     happens in nvme_submit_sync_cmd?
   * If we are returning 0 no matter what, shouldn't this function be
     void?

2. The function nvme_scan_work will not rescan the controler if
   nvme_init_non_mdts_limits has a return value that is less than zero.
   * Should we remove the check in nvme_scan_work to let the scan move
     forward regardless of the return value from
     nvme_init_non_mdts_limits?

3. I think what makes more sense is to:
  a. Make nvme_init_non_mdts_limits a void function
  b. Move the warning of failing to read mdts limits inside
     nvme_init_non_mdts_limits
  c. Allow the scan work to move forward after executing
     nvme_init_non_mdts_limits

What do you think? I can spin up a patch for this if it makes sense.

Best

Joel


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-27 16:55       ` Keith Busch
@ 2022-10-28 10:38         ` Joel Granados
  2022-10-28 14:52           ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Granados @ 2022-10-28 10:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

[-- Attachment #1: Type: text/plain, Size: 4832 bytes --]

On Thu, Oct 27, 2022 at 10:55:38AM -0600, Keith Busch wrote:
> On Thu, Oct 27, 2022 at 05:57:24PM +0200, Joel Granados wrote:
> > +{
> > +	int ret;
> > +	struct nvme_id_ctrl *id_ctrl;
> > +	struct nvme_get_attr nvme_get_attr = {0};
> > +	struct nvme_id_ctrl_nvm *id_ctrl_nvm;
> > +	__u32 usize;
> > +
> > +	BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ);
> > +	BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ);
> > +
> > +	if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32)))
> > +		return -EFAULT;
> > +
> > +	if (nvme_get_attr.flags != 0)
> > +		return -EINVAL;
> > +
> > +	switch (nvme_get_attr.argsz) {
> > +	case NVME_IOCTL_GET_ATTR_V0SZ:
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +	usize = nvme_get_attr.argsz;
> > +
> > +	ret = nvme_identify_ctrl(ctrl, &id_ctrl);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm);
> > +	if (ret)
> > +		return ret;
> > +
> > +	nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ;
> > +	nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap);
> > +	nvme_get_attr.vsl = id_ctrl_nvm->vsl;
> > +	nvme_get_attr.wzsl = id_ctrl_nvm->wzsl;
> > +	nvme_get_attr.wusl = id_ctrl_nvm->wusl;
> > +	nvme_get_attr.dmrl = id_ctrl_nvm->dmrl;
> > +	nvme_get_attr.dmsl = id_ctrl_nvm->dmsl;
> > +	nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl;
> > +	nvme_get_attr.oncs = id_ctrl->oncs;
> > +	nvme_get_attr.mdts = id_ctrl->mdts;
> 
> You already have the 'struct nvme_ctrl' that saves nearly all these
> values. We shouldn't need to send new IO when you can just reference the
> cached values instead.

There are three types of variables here:
1. The variables that are adjusted from what the device actually
   returns. Like wzsl that is in max_zeroes_sectors but is already
   adjusted with MSPMIN. Here we can de-adjust them if we want to save
   an IO call
2. The variables that are not adjusted and are the same in struct
   nvme_ctrl in the device. Here we can just use the ctrl member.
3. And the variables that are not in struct nvme_ctrl. Here, we have not
   option but to make an IO.


> > +	nvme_get_attr.vsl = id_ctrl_nvm->vsl;
Not in ctrl (type 3)

> > +	nvme_get_attr.wzsl = id_ctrl_nvm->wzsl;
Exported as ctrl->max_zeroes_sector and adjusted with MPSMIN (type 1)

> > +	nvme_get_attr.wusl = id_ctrl_nvm->wusl;
Not in ctrl (type 3)

> > +	nvme_get_attr.dmrl = id_ctrl_nvm->dmrl;
In ctrl->max_discard_segments (type 2)

> > +	nvme_get_attr.dmsl = id_ctrl_nvm->dmsl;
Not in ctrl (type 3)

> > +	nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl;
In ctrl->dmrsl (type 2)

> > +	nvme_get_attr.oncs = id_ctrl->oncs;
In ctrl->oncs (type 2)

> > +	nvme_get_attr.mdts = id_ctrl->mdts;
Exported as ctrl->max_hw_sectors and adjusted with MPSMIN (type1)


So I see 3 members in struct id_ctrl_nvm that are not contained in
struct nvme_ctrl, therefore we need an IO to populate these.

To save an IO for the 2 last members (oncs and mdts), then I'll
de-adjust mdts and leave oncs as it is in nvme_ctrl.


>
> 
> > +struct nvme_get_attr {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +
> > +	/*
> > +	 * Memory Page Size MINimum : The host should not configure a page size that
> > +	 * is larger than (2 ^ (12 + mpsmin)). Comes from [3]
> > +	 */
> > +	__u32	mpsmin;
> > +
> > +	/*
> > +	 * Verify Size Limit : Recommended or supported data size for a verify
> > +	 * command. From [2].
> > +	 */
> > +	__u8	vsl;
> > +
> > +	/*
> > +	 * Write Zeroes Size Limit : Recommended or supported data size for a
> > +	 * zeroes command. From [2].
> > +	 */
> > +	__u8	wzsl;
> > +
> > +	/*
> > +	 * Write Uncorrected Size Limit : Recommended or supported data size for
> > +	 * an uncorrected command. From [2].
> > +	 */
> > +	__u8	wusl;
> > +
> > +	/*
> > +	 * Dataset Management Ranges Limit : Recommended or supported maximum
> > +	 * number of logical block ranges for the Dataset Management Command.
> > +	 * From [2].
> > +	 */
> > +	__u8	dmrl;
> > +
> > +	/*
> > +	 * Dataset Management Size Limit : Recommended or supported maximum of
> > +	 * total number of logical blocks for a Dataset Management Command.
> > +	 * From [2].
> > +	 */
> > +	__le64	dmsl;
> > +
> > +	/*
> > +	 * Dataset Management Range Size Limit : Recommended or supported maximum
> > +	 * number of logical blocks in a range of a Dataset Management Command.
> > +	 * From [2].
> > +	 */
> > +	__le32	dmrsl;
> > +
> > +	/*
> > +	 * Optional NVM Command Support. Is needed to make sense of attributes
> > +	 * like vsl, wzsl, wusl... Comes from [1].
> > +	 */
> > +	__le16	oncs;
> 
> Don't use the little-endian format for values that are not going over
> the wire.

oops. That is mistake. thx

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-28 10:38         ` Joel Granados
@ 2022-10-28 14:52           ` Keith Busch
  2022-10-28 15:52             ` Joel Granados
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2022-10-28 14:52 UTC (permalink / raw)
  To: Joel Granados
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote:
> 3. And the variables that are not in struct nvme_ctrl. Here, we have not
>    option but to make an IO.

You do have another option: we have historically added new fields to
nvme_ctrl to cache parameters that are repeatedly referenced in other
contexts, and this usage appears to qualify.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-28 14:52           ` Keith Busch
@ 2022-10-28 15:52             ` Joel Granados
  2022-10-28 16:22               ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Granados @ 2022-10-28 15:52 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote:
> On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote:
> > 3. And the variables that are not in struct nvme_ctrl. Here, we have not
> >    option but to make an IO.
> 
> You do have another option: we have historically added new fields to
> nvme_ctrl to cache parameters that are repeatedly referenced in other
> contexts, and this usage appears to qualify.

Agreed. That is yet another option. Will do that for my V1.
Thx for the feedback

Best

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-28 15:52             ` Joel Granados
@ 2022-10-28 16:22               ` Keith Busch
  2022-10-31 12:24                 ` Joel Granados
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2022-10-28 16:22 UTC (permalink / raw)
  To: Joel Granados
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

On Fri, Oct 28, 2022 at 05:52:30PM +0200, Joel Granados wrote:
> On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote:
> > On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote:
> > > 3. And the variables that are not in struct nvme_ctrl. Here, we have not
> > >    option but to make an IO.
> > 
> > You do have another option: we have historically added new fields to
> > nvme_ctrl to cache parameters that are repeatedly referenced in other
> > contexts, and this usage appears to qualify.
> 
> Agreed. That is yet another option. Will do that for my V1.
> Thx for the feedback

Just for the record, I don't really like this approach, but I don't have
a better suggestion right now other than opening up the admin queue or
exporting a ton of sysfs attributes (and each carry a different set of
concerns..).

My concern here is that drives implementing new specs may have new
constraints that applications need to know, but older kernels won't
accomodate. You are handling this by error'ing out the query, so it
relies on the application implementing a sane fallback (assuming such a
fallback for future attributes even exists). This just does not feel
very future-proof.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails
  2022-10-27 16:26       ` Keith Busch
  2022-10-28  9:07         ` Joel Granados
@ 2022-10-30  8:02         ` Christoph Hellwig
  2022-10-31 12:36           ` Joel Granados
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-10-30  8:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Joel Granados, sagi, hch, linux-nvme, gost.dev, joshi.k,
	javier.gonz, p.raghav

On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote:
> On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote:
> > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
> > failed. This patch corrects this behavior and return -ENOMEM
> > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
> 
> I'm pretty sure I had this returning 0 on purpose. We can proceed
> without this optional structure.

Well, we could.  But I don't think it really is a good idea.  Why I think
failing major resource allocation (e.g. HMB) and just continuing limited,
doing that for these tiny kmallocs that the kernel basically never fails
tends to create a lot of confusion and hard to test code pathes.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes
  2022-10-28 16:22               ` Keith Busch
@ 2022-10-31 12:24                 ` Joel Granados
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-31 12:24 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

On Fri, Oct 28, 2022 at 10:22:35AM -0600, Keith Busch wrote:
> On Fri, Oct 28, 2022 at 05:52:30PM +0200, Joel Granados wrote:
> > On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote:
> > > On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote:
> > > > 3. And the variables that are not in struct nvme_ctrl. Here, we have not
> > > >    option but to make an IO.
> > > 
> > > You do have another option: we have historically added new fields to
> > > nvme_ctrl to cache parameters that are repeatedly referenced in other
> > > contexts, and this usage appears to qualify.
> > 
> > Agreed. That is yet another option. Will do that for my V1.
> > Thx for the feedback
> 
> Just for the record, I don't really like this approach, but I don't have
> a better suggestion right now other than opening up the admin queue or
> exporting a ton of sysfs attributes (and each carry a different set of
> concerns..).
> 
> My concern here is that drives implementing new specs may have new
> constraints that applications need to know, but older kernels won't
> accomodate. You are handling this by error'ing out the query, so it
> relies on the application implementing a sane fallback (assuming such a
> fallback for future attributes even exists). This just does not feel
> very future-proof.

There might be an additional route to handle the case where the user has
new kernel headers and is running the ioctl on an older kernel: We can
use copy_struct_from_user f5a1a536fa14 ("lib: introduce
copy_struct_from_user() helper").
In summary it has a clear behavior/assumptions for the three possible
cases:
1. When kernel and user space have the same header version it just
   copies the struct.
2. When the kernel has a higher version of the struct, it only copies
   what the user space requested in argsz.
3. When user has an old kernel (The case that you are worried about) it
   will only fill up the members that the kernel knows about provided
   that members that are not contained in the kernel struct are zeroed
   out.

What do you think?

I did not implement this at the outset because I was wanting to avoid
the additional call to copy_struct_from_user. But if this provides a
suitable solution for you case, then it is trivial to include it in the
patch.

Best

Joel



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails
  2022-10-30  8:02         ` Christoph Hellwig
@ 2022-10-31 12:36           ` Joel Granados
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-31 12:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, sagi, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

On Sun, Oct 30, 2022 at 09:02:36AM +0100, Christoph Hellwig wrote:
> On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote:
> > On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote:
> > > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
> > > failed. This patch corrects this behavior and return -ENOMEM
> > > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
> > 
> > I'm pretty sure I had this returning 0 on purpose. We can proceed
> > without this optional structure.
> 
> Well, we could.  But I don't think it really is a good idea.  Why I think
> failing major resource allocation (e.g. HMB) and just continuing limited,
> doing that for these tiny kmallocs that the kernel basically never fails
> tends to create a lot of confusion and hard to test code pathes.

This seems like a more sound solution to me. Might just take this out to
a separate thread as a new nvme_identify_cs_ctrl function might not be
needed.

Best

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-10-31 12:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20221027160102eucas1p1bb39a9a2b8ed6b54f854849532359332@eucas1p1.samsung.com>
2022-10-27 15:57 ` [RFC 0/3] nvme : Add ioctl to query nvme device attributes Joel Granados
     [not found]   ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com>
2022-10-27 15:57     ` [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails Joel Granados
2022-10-27 16:26       ` Keith Busch
2022-10-28  9:07         ` Joel Granados
2022-10-30  8:02         ` Christoph Hellwig
2022-10-31 12:36           ` Joel Granados
     [not found]   ` <CGME20221027160108eucas1p1cc1351fd8bf35ef6e5220e0cdc865ff4@eucas1p1.samsung.com>
2022-10-27 15:57     ` [RFC 2/3] nvme: Move nvme identify CS to separate function Joel Granados
     [not found]   ` <CGME20221027160108eucas1p2722b30a1be27a855e2b0f2495fed15ab@eucas1p2.samsung.com>
2022-10-27 15:57     ` [RFC 3/3] nvme : Add ioctl to query nvme attributes Joel Granados
2022-10-27 16:55       ` Keith Busch
2022-10-28 10:38         ` Joel Granados
2022-10-28 14:52           ` Keith Busch
2022-10-28 15:52             ` Joel Granados
2022-10-28 16:22               ` Keith Busch
2022-10-31 12:24                 ` Joel Granados

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).