* [RFC 0/3] nvme : Add ioctl to query nvme device attributes [not found] <CGME20221027160102eucas1p1bb39a9a2b8ed6b54f854849532359332@eucas1p1.samsung.com> @ 2022-10-27 15:57 ` Joel Granados [not found] ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com> ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw) To: kbusch, sagi, hch Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados What? In this patch I add an ioctl that provides nvme controller attributes to the user that can open the char device file. We also provide them for the block devices for convenience. Why? Applications with write permissions should not need to be privileged to write to the device. With Kanchan's latest patch (https://lore.kernel.org/linux-nvme/20221020070205.57366-1-joshi.k@samsung.com/) the nvme IO and identify commands in passthru now follow device permissions; however there are still some controller attributes like minimal data transfer size (MDTS) which need a privileged user to be queried. How? Added an ioctl (NVME_IOCTL_GET_ATTR) that fetches attributes from the nvme driver. It sends both nvme_identify_ctrl and nvme_identify_cs_ctrl commands to query the attributes that are not contained in nvme_ctrl already. I have rebased this on top of Kanchans "nvme: identify-namespace without CAP_SYS_ADMIN" (https://lore.kernel.org/linux-nvme/20221020070205.57366-3-joshi.k@samsung.com) because this only makes sense if we can also do IO as unprivileged. I have made the call extensible by embedding the struct size as the first member and using it as a version. As more members get added, the switch statement gets populated with more checks. For this I followed in the steps of the openat2 system call [1] and the extensible ioctl for vfio [2]. Another interesting reference for this is here https://lwn.net/Articles/830666/ The first patch in the series is a fix to a potential bug in a return value. This was in a section of the code that was getting changed so I decided to include it in this series. Questions: 1. I started by providing members that have relevant attributes like mdts, and mpsmin for regular writes as well as wzsl, wusl for more specialized write commands like write zeroes and also dmsl, dmrsl, dmrl for dataset management commands. Have I missed an attribute that should definitely be there? Or have a included one which should be removed? 2. Naming is always important. We called it "GET_ATTR" because it "gets" nvme specific attributes. This is fairly general for whatever we want to add in the future, but do we need to be more specific here? Alternatives are "NVME_IOCTL_DEV_ATTR", "NVME_IOCTL_DRV_ATTR" or "NVME_IOCTL_CTRL_ATTR". Thoughts? 3. I have named the communication structure "nvme_get_attr" to be consistent with the ioctl name. Are there alternatives? 4. I have made the ioctl call extensible because I believe that this ioctl might grow in the future as ppl see that there might be more data needed to do an nvme write. [1] https://github.com/torvalds/linux/commit/fddb5d430ad9 [2] https://github.com/torvalds/linux/blob/master/include/uapi/linux/vfio.h#L56 Joel Granados (3): nvme: Return -ENOMEM when kzalloc fails nvme: Move nvme identify CS to separate function nvme : Add ioctl to query nvme attributes drivers/nvme/host/core.c | 35 ++++++++++------ drivers/nvme/host/ioctl.c | 58 ++++++++++++++++++++++++++ drivers/nvme/host/nvme.h | 10 +++++ include/uapi/linux/nvme_ioctl.h | 74 +++++++++++++++++++++++++++++++++ 4 files changed, 164 insertions(+), 13 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com>]
* [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails [not found] ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com> @ 2022-10-27 15:57 ` Joel Granados 2022-10-27 16:26 ` Keith Busch 0 siblings, 1 reply; 14+ messages in thread From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw) To: kbusch, sagi, hch Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados In nvme_init_non_mdts_limits function we were returning 0 when kzalloc failed. This patch corrects this behavior and return -ENOMEM Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") Signed-off-by: Joel Granados <j.granados@samsung.com> --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 059737c1a2c1..7dbf221247f7 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3050,7 +3050,7 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl) id = kzalloc(sizeof(*id), GFP_KERNEL); if (!id) - return 0; + return -ENOMEM; c.identify.opcode = nvme_admin_identify; c.identify.cns = NVME_ID_CNS_CS_CTRL; -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails 2022-10-27 15:57 ` [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails Joel Granados @ 2022-10-27 16:26 ` Keith Busch 2022-10-28 9:07 ` Joel Granados 2022-10-30 8:02 ` Christoph Hellwig 0 siblings, 2 replies; 14+ messages in thread From: Keith Busch @ 2022-10-27 16:26 UTC (permalink / raw) To: Joel Granados Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote: > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc > failed. This patch corrects this behavior and return -ENOMEM > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") I'm pretty sure I had this returning 0 on purpose. We can proceed without this optional structure. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails 2022-10-27 16:26 ` Keith Busch @ 2022-10-28 9:07 ` Joel Granados 2022-10-30 8:02 ` Christoph Hellwig 1 sibling, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-10-28 9:07 UTC (permalink / raw) To: Keith Busch Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav [-- Attachment #1: Type: text/plain, Size: 1676 bytes --] On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote: > On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote: > > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc > > failed. This patch corrects this behavior and return -ENOMEM > > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") > > I'm pretty sure I had this returning 0 on purpose. We can proceed > without this optional structure. Looking a bit more into this I see a couple of things: 1. If we fail in the kzalloc, ctrl->max_discard_segments and ctrl->max_zeroes_sectors do not get set but we return 0. On the other hand if nvme_submit_sync_cmd fails, ctrl->max_discard_segments and ctrl->max_zeroes_sectors still do not get set but we forward nvme_submit_sync_cmd's error code. * Shouldn't we just set ret to 0 before returning regardless of what happens in nvme_submit_sync_cmd? * If we are returning 0 no matter what, shouldn't this function be void? 2. The function nvme_scan_work will not rescan the controler if nvme_init_non_mdts_limits has a return value that is less than zero. * Should we remove the check in nvme_scan_work to let the scan move forward regardless of the return value from nvme_init_non_mdts_limits? 3. I think what makes more sense is to: a. Make nvme_init_non_mdts_limits a void function b. Move the warning of failing to read mdts limits inside nvme_init_non_mdts_limits c. Allow the scan work to move forward after executing nvme_init_non_mdts_limits What do you think? I can spin up a patch for this if it makes sense. Best Joel [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails 2022-10-27 16:26 ` Keith Busch 2022-10-28 9:07 ` Joel Granados @ 2022-10-30 8:02 ` Christoph Hellwig 2022-10-31 12:36 ` Joel Granados 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2022-10-30 8:02 UTC (permalink / raw) To: Keith Busch Cc: Joel Granados, sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote: > On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote: > > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc > > failed. This patch corrects this behavior and return -ENOMEM > > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") > > I'm pretty sure I had this returning 0 on purpose. We can proceed > without this optional structure. Well, we could. But I don't think it really is a good idea. Why I think failing major resource allocation (e.g. HMB) and just continuing limited, doing that for these tiny kmallocs that the kernel basically never fails tends to create a lot of confusion and hard to test code pathes. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails 2022-10-30 8:02 ` Christoph Hellwig @ 2022-10-31 12:36 ` Joel Granados 0 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-10-31 12:36 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, sagi, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav [-- Attachment #1: Type: text/plain, Size: 999 bytes --] On Sun, Oct 30, 2022 at 09:02:36AM +0100, Christoph Hellwig wrote: > On Thu, Oct 27, 2022 at 10:26:57AM -0600, Keith Busch wrote: > > On Thu, Oct 27, 2022 at 05:57:22PM +0200, Joel Granados wrote: > > > In nvme_init_non_mdts_limits function we were returning 0 when kzalloc > > > failed. This patch corrects this behavior and return -ENOMEM > > > Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") > > > > I'm pretty sure I had this returning 0 on purpose. We can proceed > > without this optional structure. > > Well, we could. But I don't think it really is a good idea. Why I think > failing major resource allocation (e.g. HMB) and just continuing limited, > doing that for these tiny kmallocs that the kernel basically never fails > tends to create a lot of confusion and hard to test code pathes. This seems like a more sound solution to me. Might just take this out to a separate thread as a new nvme_identify_cs_ctrl function might not be needed. Best [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CGME20221027160108eucas1p1cc1351fd8bf35ef6e5220e0cdc865ff4@eucas1p1.samsung.com>]
* [RFC 2/3] nvme: Move nvme identify CS to separate function [not found] ` <CGME20221027160108eucas1p1cc1351fd8bf35ef6e5220e0cdc865ff4@eucas1p1.samsung.com> @ 2022-10-27 15:57 ` Joel Granados 0 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw) To: kbusch, sagi, hch Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados This is a preparation commit to be able to use the NVME identify command space controller command from two places without duplication. Signed-off-by: Joel Granados <j.granados@samsung.com> --- drivers/nvme/host/core.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 7dbf221247f7..d76be125dc49 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3019,9 +3019,27 @@ static inline u32 nvme_mps_to_sectors(struct nvme_ctrl *ctrl, u32 units) return val; } +int nvme_identify_cs_ctrl(struct nvme_ctrl *ctrl, struct nvme_id_ctrl_nvm **id) +{ + struct nvme_command c = { }; + int err; + + c.identify.opcode = nvme_admin_identify; + c.identify.cns = NVME_ID_CNS_CS_CTRL; + c.identify.csi = NVME_CSI_NVM; + + *id = kzalloc(sizeof(**id), GFP_KERNEL); + if (!(*id)) + return -ENOMEM; + + err = nvme_submit_sync_cmd(ctrl->admin_q, &c, *id, sizeof(**id)); + if (err) + kfree(*id); + return err; +} + static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl) { - struct nvme_command c = { }; struct nvme_id_ctrl_nvm *id; int ret; @@ -3048,17 +3066,9 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl) if (nvme_ctrl_limited_cns(ctrl)) return 0; - id = kzalloc(sizeof(*id), GFP_KERNEL); - if (!id) - return -ENOMEM; - - c.identify.opcode = nvme_admin_identify; - c.identify.cns = NVME_ID_CNS_CS_CTRL; - c.identify.csi = NVME_CSI_NVM; - - ret = nvme_submit_sync_cmd(ctrl->admin_q, &c, id, sizeof(*id)); + ret = nvme_identify_cs_ctrl(ctrl, &id); if (ret) - goto free_data; + return ret; if (id->dmrl) ctrl->max_discard_segments = id->dmrl; @@ -3066,7 +3076,6 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl) if (id->wzsl) ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl); -free_data: kfree(id); return ret; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <CGME20221027160108eucas1p2722b30a1be27a855e2b0f2495fed15ab@eucas1p2.samsung.com>]
* [RFC 3/3] nvme : Add ioctl to query nvme attributes [not found] ` <CGME20221027160108eucas1p2722b30a1be27a855e2b0f2495fed15ab@eucas1p2.samsung.com> @ 2022-10-27 15:57 ` Joel Granados 2022-10-27 16:55 ` Keith Busch 0 siblings, 1 reply; 14+ messages in thread From: Joel Granados @ 2022-10-27 15:57 UTC (permalink / raw) To: kbusch, sagi, hch Cc: linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav, Joel Granados An ioctl (NVME_IOCTL_GET_ATTR) is added to query nvme controller attributes needed to effectively write to the char device without needing to be a priviledged user. We also provide them for block devices for convenience. The attributes here are meant to complement what you can already get with the allowed identify commands. We add NVME_IOCTL_GET_ATTR_{V0SZ,CURZS} in order to make the ioctl extensible. These serve as both size and version. The caller is expected to pass the structure size as the first memeber of the struct. For example: {... struct nvme_get_attr nvme_get_attr = {0}; nvme_get_attr.argsz = sizeof(struct nvme_get_attr); ...} Signed-off-by: Joel Granados <j.granados@samsung.com> --- drivers/nvme/host/core.c | 2 +- drivers/nvme/host/ioctl.c | 58 ++++++++++++++++++++++++++ drivers/nvme/host/nvme.h | 10 +++++ include/uapi/linux/nvme_ioctl.h | 74 +++++++++++++++++++++++++++++++++ 4 files changed, 143 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d76be125dc49..1cc14bd94aa5 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1276,7 +1276,7 @@ static bool nvme_ctrl_limited_cns(struct nvme_ctrl *ctrl) return ctrl->vs < NVME_VS(1, 1, 0); } -static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id) +int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id) { struct nvme_command c = { }; int error; diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 9273db147872..21291981584b 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -53,6 +53,58 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) return (void __user *)ptrval; } +static int nvme_ctrl_export_attrs(struct nvme_ctrl *ctrl, + struct nvme_get_attr __user *arg) +{ + int ret; + struct nvme_id_ctrl *id_ctrl; + struct nvme_get_attr nvme_get_attr = {0}; + struct nvme_id_ctrl_nvm *id_ctrl_nvm; + __u32 usize; + + BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ); + BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ); + + if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32))) + return -EFAULT; + + if (nvme_get_attr.flags != 0) + return -EINVAL; + + switch (nvme_get_attr.argsz) { + case NVME_IOCTL_GET_ATTR_V0SZ: + break; + default: + return -EINVAL; + } + usize = nvme_get_attr.argsz; + + ret = nvme_identify_ctrl(ctrl, &id_ctrl); + if (ret) + return ret; + + ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm); + if (ret) + return ret; + + nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ; + nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap); + nvme_get_attr.vsl = id_ctrl_nvm->vsl; + nvme_get_attr.wzsl = id_ctrl_nvm->wzsl; + nvme_get_attr.wusl = id_ctrl_nvm->wusl; + nvme_get_attr.dmrl = id_ctrl_nvm->dmrl; + nvme_get_attr.dmsl = id_ctrl_nvm->dmsl; + nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl; + nvme_get_attr.oncs = id_ctrl->oncs; + nvme_get_attr.mdts = id_ctrl->mdts; + + if (copy_to_user((struct nvme_get_attr __user *)arg, &nvme_get_attr, + usize)) + return -EFAULT; + + return ret; +} + static void *nvme_add_user_metadata(struct request *req, void __user *ubuf, unsigned len, u32 seed) { @@ -613,6 +665,8 @@ static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd, return nvme_user_cmd(ctrl, NULL, argp, mode); case NVME_IOCTL_ADMIN64_CMD: return nvme_user_cmd64(ctrl, NULL, argp, false, mode); + case NVME_IOCTL_GET_ATTR: + return nvme_ctrl_export_attrs(ctrl, argp); default: return sed_ioctl(ctrl->opal_dev, cmd, argp); } @@ -659,6 +713,8 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd, return nvme_user_cmd64(ns->ctrl, ns, argp, false, mode); case NVME_IOCTL_IO64_CMD_VEC: return nvme_user_cmd64(ns->ctrl, ns, argp, true, mode); + case NVME_IOCTL_GET_ATTR: + return nvme_ctrl_export_attrs(ns->ctrl, argp); default: return -ENOTTY; } @@ -950,6 +1006,8 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd, return -EACCES; nvme_queue_scan(ctrl); return 0; + case NVME_IOCTL_GET_ATTR: + return nvme_ctrl_export_attrs(ctrl, argp); default: return -ENOTTY; } diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a29877217ee6..77693f51975b 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -815,6 +815,9 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, union nvme_result *result, void *buffer, unsigned bufflen, int qid, int at_head, blk_mq_req_flags_t flags); + +int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id); +int nvme_identify_cs_ctrl(struct nvme_ctrl *ctrl, struct nvme_id_ctrl_nvm **id); int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid, unsigned int dword11, void *buffer, size_t buflen, u32 *result); @@ -1072,4 +1075,11 @@ static inline const unsigned char *nvme_get_admin_opcode_str(u8 opcode) } #endif /* CONFIG_NVME_VERBOSE_ERRORS */ +/* + * List of all nvme ioctl controller attribute calls. Use size of nvme_get_attr + * as the version which enables us to use copy_struct_from_user + */ +#define NVME_IOCTL_GET_ATTR_V0SZ 32 +#define NVME_IOCTL_GET_ATTR_CURSZ NVME_IOCTL_GET_ATTR_V0SZ + #endif /* _NVME_H */ diff --git a/include/uapi/linux/nvme_ioctl.h b/include/uapi/linux/nvme_ioctl.h index 2f76cba67166..b44a53d81aa9 100644 --- a/include/uapi/linux/nvme_ioctl.h +++ b/include/uapi/linux/nvme_ioctl.h @@ -92,6 +92,79 @@ struct nvme_uring_cmd { __u32 rsvd2; }; + +/* + * This struct is populated from the following nvme commands: + * [1] Identify Controller Data Structure + * [2] I/O Command Set Specific Identify Controller Data Structure for the + * NVME Command Set. Usually contained in struct nvme_id_ctrl_nvm + * [3] Controller capabilities on controller initialization + */ +struct nvme_get_attr { + __u32 argsz; + __u32 flags; + + /* + * Memory Page Size MINimum : The host should not configure a page size that + * is larger than (2 ^ (12 + mpsmin)). Comes from [3] + */ + __u32 mpsmin; + + /* + * Verify Size Limit : Recommended or supported data size for a verify + * command. From [2]. + */ + __u8 vsl; + + /* + * Write Zeroes Size Limit : Recommended or supported data size for a + * zeroes command. From [2]. + */ + __u8 wzsl; + + /* + * Write Uncorrected Size Limit : Recommended or supported data size for + * an uncorrected command. From [2]. + */ + __u8 wusl; + + /* + * Dataset Management Ranges Limit : Recommended or supported maximum + * number of logical block ranges for the Dataset Management Command. + * From [2]. + */ + __u8 dmrl; + + /* + * Dataset Management Size Limit : Recommended or supported maximum of + * total number of logical blocks for a Dataset Management Command. + * From [2]. + */ + __le64 dmsl; + + /* + * Dataset Management Range Size Limit : Recommended or supported maximum + * number of logical blocks in a range of a Dataset Management Command. + * From [2]. + */ + __le32 dmrsl; + + /* + * Optional NVM Command Support. Is needed to make sense of attributes + * like vsl, wzsl, wusl... Comes from [1]. + */ + __le16 oncs; + + /* + * Maximum data transfer size. Commands should not exceed this size. + * Comes from [1]. + */ + __u8 mdts; + + __u8 rsvd0; + +}; + #define nvme_admin_cmd nvme_passthru_cmd #define NVME_IOCTL_ID _IO('N', 0x40) @@ -104,6 +177,7 @@ struct nvme_uring_cmd { #define NVME_IOCTL_ADMIN64_CMD _IOWR('N', 0x47, struct nvme_passthru_cmd64) #define NVME_IOCTL_IO64_CMD _IOWR('N', 0x48, struct nvme_passthru_cmd64) #define NVME_IOCTL_IO64_CMD_VEC _IOWR('N', 0x49, struct nvme_passthru_cmd64) +#define NVME_IOCTL_GET_ATTR _IOWR('N', 0x50, struct nvme_get_attr) /* io_uring async commands: */ #define NVME_URING_CMD_IO _IOWR('N', 0x80, struct nvme_uring_cmd) -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-27 15:57 ` [RFC 3/3] nvme : Add ioctl to query nvme attributes Joel Granados @ 2022-10-27 16:55 ` Keith Busch 2022-10-28 10:38 ` Joel Granados 0 siblings, 1 reply; 14+ messages in thread From: Keith Busch @ 2022-10-27 16:55 UTC (permalink / raw) To: Joel Granados Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav On Thu, Oct 27, 2022 at 05:57:24PM +0200, Joel Granados wrote: > +{ > + int ret; > + struct nvme_id_ctrl *id_ctrl; > + struct nvme_get_attr nvme_get_attr = {0}; > + struct nvme_id_ctrl_nvm *id_ctrl_nvm; > + __u32 usize; > + > + BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ); > + BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ); > + > + if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32))) > + return -EFAULT; > + > + if (nvme_get_attr.flags != 0) > + return -EINVAL; > + > + switch (nvme_get_attr.argsz) { > + case NVME_IOCTL_GET_ATTR_V0SZ: > + break; > + default: > + return -EINVAL; > + } > + usize = nvme_get_attr.argsz; > + > + ret = nvme_identify_ctrl(ctrl, &id_ctrl); > + if (ret) > + return ret; > + > + ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm); > + if (ret) > + return ret; > + > + nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ; > + nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap); > + nvme_get_attr.vsl = id_ctrl_nvm->vsl; > + nvme_get_attr.wzsl = id_ctrl_nvm->wzsl; > + nvme_get_attr.wusl = id_ctrl_nvm->wusl; > + nvme_get_attr.dmrl = id_ctrl_nvm->dmrl; > + nvme_get_attr.dmsl = id_ctrl_nvm->dmsl; > + nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl; > + nvme_get_attr.oncs = id_ctrl->oncs; > + nvme_get_attr.mdts = id_ctrl->mdts; You already have the 'struct nvme_ctrl' that saves nearly all these values. We shouldn't need to send new IO when you can just reference the cached values instead. > +struct nvme_get_attr { > + __u32 argsz; > + __u32 flags; > + > + /* > + * Memory Page Size MINimum : The host should not configure a page size that > + * is larger than (2 ^ (12 + mpsmin)). Comes from [3] > + */ > + __u32 mpsmin; > + > + /* > + * Verify Size Limit : Recommended or supported data size for a verify > + * command. From [2]. > + */ > + __u8 vsl; > + > + /* > + * Write Zeroes Size Limit : Recommended or supported data size for a > + * zeroes command. From [2]. > + */ > + __u8 wzsl; > + > + /* > + * Write Uncorrected Size Limit : Recommended or supported data size for > + * an uncorrected command. From [2]. > + */ > + __u8 wusl; > + > + /* > + * Dataset Management Ranges Limit : Recommended or supported maximum > + * number of logical block ranges for the Dataset Management Command. > + * From [2]. > + */ > + __u8 dmrl; > + > + /* > + * Dataset Management Size Limit : Recommended or supported maximum of > + * total number of logical blocks for a Dataset Management Command. > + * From [2]. > + */ > + __le64 dmsl; > + > + /* > + * Dataset Management Range Size Limit : Recommended or supported maximum > + * number of logical blocks in a range of a Dataset Management Command. > + * From [2]. > + */ > + __le32 dmrsl; > + > + /* > + * Optional NVM Command Support. Is needed to make sense of attributes > + * like vsl, wzsl, wusl... Comes from [1]. > + */ > + __le16 oncs; Don't use the little-endian format for values that are not going over the wire. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-27 16:55 ` Keith Busch @ 2022-10-28 10:38 ` Joel Granados 2022-10-28 14:52 ` Keith Busch 0 siblings, 1 reply; 14+ messages in thread From: Joel Granados @ 2022-10-28 10:38 UTC (permalink / raw) To: Keith Busch Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav [-- Attachment #1: Type: text/plain, Size: 4832 bytes --] On Thu, Oct 27, 2022 at 10:55:38AM -0600, Keith Busch wrote: > On Thu, Oct 27, 2022 at 05:57:24PM +0200, Joel Granados wrote: > > +{ > > + int ret; > > + struct nvme_id_ctrl *id_ctrl; > > + struct nvme_get_attr nvme_get_attr = {0}; > > + struct nvme_id_ctrl_nvm *id_ctrl_nvm; > > + __u32 usize; > > + > > + BUILD_BUG_ON(sizeof(struct nvme_get_attr) < NVME_IOCTL_GET_ATTR_V0SZ); > > + BUILD_BUG_ON(sizeof(struct nvme_get_attr) != NVME_IOCTL_GET_ATTR_CURSZ); > > + > > + if (copy_from_user(&nvme_get_attr, arg, 2 * sizeof(__u32))) > > + return -EFAULT; > > + > > + if (nvme_get_attr.flags != 0) > > + return -EINVAL; > > + > > + switch (nvme_get_attr.argsz) { > > + case NVME_IOCTL_GET_ATTR_V0SZ: > > + break; > > + default: > > + return -EINVAL; > > + } > > + usize = nvme_get_attr.argsz; > > + > > + ret = nvme_identify_ctrl(ctrl, &id_ctrl); > > + if (ret) > > + return ret; > > + > > + ret = nvme_identify_cs_ctrl(ctrl, &id_ctrl_nvm); > > + if (ret) > > + return ret; > > + > > + nvme_get_attr.argsz = NVME_IOCTL_GET_ATTR_CURSZ; > > + nvme_get_attr.mpsmin = NVME_CAP_MPSMIN(ctrl->cap); > > + nvme_get_attr.vsl = id_ctrl_nvm->vsl; > > + nvme_get_attr.wzsl = id_ctrl_nvm->wzsl; > > + nvme_get_attr.wusl = id_ctrl_nvm->wusl; > > + nvme_get_attr.dmrl = id_ctrl_nvm->dmrl; > > + nvme_get_attr.dmsl = id_ctrl_nvm->dmsl; > > + nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl; > > + nvme_get_attr.oncs = id_ctrl->oncs; > > + nvme_get_attr.mdts = id_ctrl->mdts; > > You already have the 'struct nvme_ctrl' that saves nearly all these > values. We shouldn't need to send new IO when you can just reference the > cached values instead. There are three types of variables here: 1. The variables that are adjusted from what the device actually returns. Like wzsl that is in max_zeroes_sectors but is already adjusted with MSPMIN. Here we can de-adjust them if we want to save an IO call 2. The variables that are not adjusted and are the same in struct nvme_ctrl in the device. Here we can just use the ctrl member. 3. And the variables that are not in struct nvme_ctrl. Here, we have not option but to make an IO. > > + nvme_get_attr.vsl = id_ctrl_nvm->vsl; Not in ctrl (type 3) > > + nvme_get_attr.wzsl = id_ctrl_nvm->wzsl; Exported as ctrl->max_zeroes_sector and adjusted with MPSMIN (type 1) > > + nvme_get_attr.wusl = id_ctrl_nvm->wusl; Not in ctrl (type 3) > > + nvme_get_attr.dmrl = id_ctrl_nvm->dmrl; In ctrl->max_discard_segments (type 2) > > + nvme_get_attr.dmsl = id_ctrl_nvm->dmsl; Not in ctrl (type 3) > > + nvme_get_attr.dmrsl = id_ctrl_nvm->dmrsl; In ctrl->dmrsl (type 2) > > + nvme_get_attr.oncs = id_ctrl->oncs; In ctrl->oncs (type 2) > > + nvme_get_attr.mdts = id_ctrl->mdts; Exported as ctrl->max_hw_sectors and adjusted with MPSMIN (type1) So I see 3 members in struct id_ctrl_nvm that are not contained in struct nvme_ctrl, therefore we need an IO to populate these. To save an IO for the 2 last members (oncs and mdts), then I'll de-adjust mdts and leave oncs as it is in nvme_ctrl. > > > > +struct nvme_get_attr { > > + __u32 argsz; > > + __u32 flags; > > + > > + /* > > + * Memory Page Size MINimum : The host should not configure a page size that > > + * is larger than (2 ^ (12 + mpsmin)). Comes from [3] > > + */ > > + __u32 mpsmin; > > + > > + /* > > + * Verify Size Limit : Recommended or supported data size for a verify > > + * command. From [2]. > > + */ > > + __u8 vsl; > > + > > + /* > > + * Write Zeroes Size Limit : Recommended or supported data size for a > > + * zeroes command. From [2]. > > + */ > > + __u8 wzsl; > > + > > + /* > > + * Write Uncorrected Size Limit : Recommended or supported data size for > > + * an uncorrected command. From [2]. > > + */ > > + __u8 wusl; > > + > > + /* > > + * Dataset Management Ranges Limit : Recommended or supported maximum > > + * number of logical block ranges for the Dataset Management Command. > > + * From [2]. > > + */ > > + __u8 dmrl; > > + > > + /* > > + * Dataset Management Size Limit : Recommended or supported maximum of > > + * total number of logical blocks for a Dataset Management Command. > > + * From [2]. > > + */ > > + __le64 dmsl; > > + > > + /* > > + * Dataset Management Range Size Limit : Recommended or supported maximum > > + * number of logical blocks in a range of a Dataset Management Command. > > + * From [2]. > > + */ > > + __le32 dmrsl; > > + > > + /* > > + * Optional NVM Command Support. Is needed to make sense of attributes > > + * like vsl, wzsl, wusl... Comes from [1]. > > + */ > > + __le16 oncs; > > Don't use the little-endian format for values that are not going over > the wire. oops. That is mistake. thx [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-28 10:38 ` Joel Granados @ 2022-10-28 14:52 ` Keith Busch 2022-10-28 15:52 ` Joel Granados 0 siblings, 1 reply; 14+ messages in thread From: Keith Busch @ 2022-10-28 14:52 UTC (permalink / raw) To: Joel Granados Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote: > 3. And the variables that are not in struct nvme_ctrl. Here, we have not > option but to make an IO. You do have another option: we have historically added new fields to nvme_ctrl to cache parameters that are repeatedly referenced in other contexts, and this usage appears to qualify. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-28 14:52 ` Keith Busch @ 2022-10-28 15:52 ` Joel Granados 2022-10-28 16:22 ` Keith Busch 0 siblings, 1 reply; 14+ messages in thread From: Joel Granados @ 2022-10-28 15:52 UTC (permalink / raw) To: Keith Busch Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav [-- Attachment #1: Type: text/plain, Size: 530 bytes --] On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote: > On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote: > > 3. And the variables that are not in struct nvme_ctrl. Here, we have not > > option but to make an IO. > > You do have another option: we have historically added new fields to > nvme_ctrl to cache parameters that are repeatedly referenced in other > contexts, and this usage appears to qualify. Agreed. That is yet another option. Will do that for my V1. Thx for the feedback Best [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-28 15:52 ` Joel Granados @ 2022-10-28 16:22 ` Keith Busch 2022-10-31 12:24 ` Joel Granados 0 siblings, 1 reply; 14+ messages in thread From: Keith Busch @ 2022-10-28 16:22 UTC (permalink / raw) To: Joel Granados Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav On Fri, Oct 28, 2022 at 05:52:30PM +0200, Joel Granados wrote: > On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote: > > On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote: > > > 3. And the variables that are not in struct nvme_ctrl. Here, we have not > > > option but to make an IO. > > > > You do have another option: we have historically added new fields to > > nvme_ctrl to cache parameters that are repeatedly referenced in other > > contexts, and this usage appears to qualify. > > Agreed. That is yet another option. Will do that for my V1. > Thx for the feedback Just for the record, I don't really like this approach, but I don't have a better suggestion right now other than opening up the admin queue or exporting a ton of sysfs attributes (and each carry a different set of concerns..). My concern here is that drives implementing new specs may have new constraints that applications need to know, but older kernels won't accomodate. You are handling this by error'ing out the query, so it relies on the application implementing a sane fallback (assuming such a fallback for future attributes even exists). This just does not feel very future-proof. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 3/3] nvme : Add ioctl to query nvme attributes 2022-10-28 16:22 ` Keith Busch @ 2022-10-31 12:24 ` Joel Granados 0 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-10-31 12:24 UTC (permalink / raw) To: Keith Busch Cc: sagi, hch, linux-nvme, gost.dev, joshi.k, javier.gonz, p.raghav [-- Attachment #1: Type: text/plain, Size: 2339 bytes --] On Fri, Oct 28, 2022 at 10:22:35AM -0600, Keith Busch wrote: > On Fri, Oct 28, 2022 at 05:52:30PM +0200, Joel Granados wrote: > > On Fri, Oct 28, 2022 at 08:52:01AM -0600, Keith Busch wrote: > > > On Fri, Oct 28, 2022 at 12:38:27PM +0200, Joel Granados wrote: > > > > 3. And the variables that are not in struct nvme_ctrl. Here, we have not > > > > option but to make an IO. > > > > > > You do have another option: we have historically added new fields to > > > nvme_ctrl to cache parameters that are repeatedly referenced in other > > > contexts, and this usage appears to qualify. > > > > Agreed. That is yet another option. Will do that for my V1. > > Thx for the feedback > > Just for the record, I don't really like this approach, but I don't have > a better suggestion right now other than opening up the admin queue or > exporting a ton of sysfs attributes (and each carry a different set of > concerns..). > > My concern here is that drives implementing new specs may have new > constraints that applications need to know, but older kernels won't > accomodate. You are handling this by error'ing out the query, so it > relies on the application implementing a sane fallback (assuming such a > fallback for future attributes even exists). This just does not feel > very future-proof. There might be an additional route to handle the case where the user has new kernel headers and is running the ioctl on an older kernel: We can use copy_struct_from_user f5a1a536fa14 ("lib: introduce copy_struct_from_user() helper"). In summary it has a clear behavior/assumptions for the three possible cases: 1. When kernel and user space have the same header version it just copies the struct. 2. When the kernel has a higher version of the struct, it only copies what the user space requested in argsz. 3. When user has an old kernel (The case that you are worried about) it will only fill up the members that the kernel knows about provided that members that are not contained in the kernel struct are zeroed out. What do you think? I did not implement this at the outset because I was wanting to avoid the additional call to copy_struct_from_user. But if this provides a suitable solution for you case, then it is trivial to include it in the patch. Best Joel [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-10-31 12:36 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20221027160102eucas1p1bb39a9a2b8ed6b54f854849532359332@eucas1p1.samsung.com> 2022-10-27 15:57 ` [RFC 0/3] nvme : Add ioctl to query nvme device attributes Joel Granados [not found] ` <CGME20221027160108eucas1p223383fc27ba0a27a38f7b768c562a1ed@eucas1p2.samsung.com> 2022-10-27 15:57 ` [RFC 1/3] nvme: Return -ENOMEM when kzalloc fails Joel Granados 2022-10-27 16:26 ` Keith Busch 2022-10-28 9:07 ` Joel Granados 2022-10-30 8:02 ` Christoph Hellwig 2022-10-31 12:36 ` Joel Granados [not found] ` <CGME20221027160108eucas1p1cc1351fd8bf35ef6e5220e0cdc865ff4@eucas1p1.samsung.com> 2022-10-27 15:57 ` [RFC 2/3] nvme: Move nvme identify CS to separate function Joel Granados [not found] ` <CGME20221027160108eucas1p2722b30a1be27a855e2b0f2495fed15ab@eucas1p2.samsung.com> 2022-10-27 15:57 ` [RFC 3/3] nvme : Add ioctl to query nvme attributes Joel Granados 2022-10-27 16:55 ` Keith Busch 2022-10-28 10:38 ` Joel Granados 2022-10-28 14:52 ` Keith Busch 2022-10-28 15:52 ` Joel Granados 2022-10-28 16:22 ` Keith Busch 2022-10-31 12:24 ` Joel Granados
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).