* [PATCH] nvmet: fix nvme status code when namespace is disabled
@ 2024-04-28 9:25 Sagi Grimberg
2024-04-29 5:14 ` Christoph Hellwig
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Sagi Grimberg @ 2024-04-28 9:25 UTC (permalink / raw)
To: linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng
If the user disabled a nvmet namesapce, it is removed from
the subsystem namespaces list. When nvmet processes a command
directed to an nsid that was disabled, it cannot differentiate
between a nsid that is disabled vs. a non-existent namespace,
and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
This translates to a non-retryable status for the host, which
translates to a user error. We should expect disabled namespaces
to not cause an I/O error in a multipath environment.
Address this by searching a configfs item for the namespace nvmet
failed to find, and if we found one, conclude that the namespace
is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
in this case and keep DNR bit cleared.
Reported-by: Jirong Feng <jirong.feng@easystack.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
Jirong Feng, can you please re-run the test as requested and verify
that the I/O error goes away. If so, please reply with your "Tested-by"
tag.
drivers/nvme/target/configfs.c | 13 +++++++++++++
drivers/nvme/target/core.c | 8 ++++++--
drivers/nvme/target/nvmet.h | 1 +
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 77a6e817b315..e744bd0a9cc2 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -754,6 +754,19 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
NULL,
};
+bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
+{
+ struct config_item *ns_item;
+ char name[4] = {};
+
+ if (sprintf(name, "%u", nsid) <= 0)
+ return false;
+ mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
+ ns_item = config_group_find_item(&subsys->namespaces_group, name);
+ mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
+ return ns_item != NULL;
+}
+
static void nvmet_ns_release(struct config_item *item)
{
struct nvmet_ns *ns = to_nvmet_ns(item);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 78cba026610a..ed0b8f3707b4 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
u16 nvmet_req_find_ns(struct nvmet_req *req)
{
u32 nsid = le32_to_cpu(req->cmd->common.nsid);
+ struct nvmet_subsys *subsys = nvmet_req_subsys(req);
- req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
+ req->ns = xa_load(&subsys->namespaces, nsid);
if (unlikely(!req->ns)) {
req->error_loc = offsetof(struct nvme_common_command, nsid);
- return NVME_SC_INVALID_NS | NVME_SC_DNR;
+ if (nvmet_subsys_nsid_exists(subsys, nsid))
+ return NVME_SC_INTERNAL_PATH_ERROR;
+ else
+ return NVME_SC_INVALID_NS | NVME_SC_DNR;
}
percpu_ref_get(&req->ns->ref);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f460728e1df1..c1306de1f4dd 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct nvmet_subsys *subsys,
struct nvmet_host *host);
void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
u8 event_info, u8 log_page);
+bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
#define NVMET_MIN_QUEUE_SIZE 16
#define NVMET_MAX_QUEUE_SIZE 1024
--
2.40.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
@ 2024-04-29 5:14 ` Christoph Hellwig
2024-04-29 11:34 ` Jirong Feng
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2024-04-29 5:14 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
jirong.feng
On Sun, Apr 28, 2024 at 12:25:40PM +0300, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
s/namesapce/namespace/
(lines also look a little short here, maybe use up all 73 chars
when refreshing it).
> + if (nvmet_subsys_nsid_exists(subsys, nsid))
> + return NVME_SC_INTERNAL_PATH_ERROR;
> + else
> + return NVME_SC_INVALID_NS | NVME_SC_DNR;
No need for an else after a return.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
2024-04-29 5:14 ` Christoph Hellwig
@ 2024-04-29 11:34 ` Jirong Feng
2024-04-29 11:52 ` Keith Busch
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Jirong Feng @ 2024-04-29 11:34 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni
> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.
Having re-runned the test basing on kernel 6.9.0-rc5, the error does go
away.
Tested-by: Jirong Feng <jirong.feng@easystack.cn>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
2024-04-29 5:14 ` Christoph Hellwig
2024-04-29 11:34 ` Jirong Feng
@ 2024-04-29 11:52 ` Keith Busch
2024-04-30 3:38 ` Chaitanya Kulkarni
2024-05-23 9:49 ` Hannes Reinecke
4 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2024-04-29 11:52 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Chaitanya Kulkarni, jirong.feng
On Sun, Apr 28, 2024 at 12:25:40PM +0300, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
>
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
>
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
>
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Applied with Jirong's "Tested-by" and Christoph's suggestions (spelling,
word wrap, and unnecessary 'else') to nvme-6.9.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
` (2 preceding siblings ...)
2024-04-29 11:52 ` Keith Busch
@ 2024-04-30 3:38 ` Chaitanya Kulkarni
2024-05-23 9:49 ` Hannes Reinecke
4 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-30 3:38 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng
On 4/28/2024 2:25 AM, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
>
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
>
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
>
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.
>
FWIW,
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
` (3 preceding siblings ...)
2024-04-30 3:38 ` Chaitanya Kulkarni
@ 2024-05-23 9:49 ` Hannes Reinecke
2024-05-23 10:10 ` Sagi Grimberg
4 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2024-05-23 9:49 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng
On 4/28/24 11:25, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
>
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
>
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
>
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.
>
> drivers/nvme/target/configfs.c | 13 +++++++++++++
> drivers/nvme/target/core.c | 8 ++++++--
> drivers/nvme/target/nvmet.h | 1 +
> 3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
> index 77a6e817b315..e744bd0a9cc2 100644
> --- a/drivers/nvme/target/configfs.c
> +++ b/drivers/nvme/target/configfs.c
> @@ -754,6 +754,19 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
> NULL,
> };
>
> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
> +{
> + struct config_item *ns_item;
> + char name[4] = {};
> +
> + if (sprintf(name, "%u", nsid) <= 0)
> + return false;
> + mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
> + ns_item = config_group_find_item(&subsys->namespaces_group, name);
> + mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
> + return ns_item != NULL;
> +}
> +
> static void nvmet_ns_release(struct config_item *item)
> {
> struct nvmet_ns *ns = to_nvmet_ns(item);
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 78cba026610a..ed0b8f3707b4 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
> u16 nvmet_req_find_ns(struct nvmet_req *req)
> {
> u32 nsid = le32_to_cpu(req->cmd->common.nsid);
> + struct nvmet_subsys *subsys = nvmet_req_subsys(req);
>
> - req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
> + req->ns = xa_load(&subsys->namespaces, nsid);
> if (unlikely(!req->ns)) {
> req->error_loc = offsetof(struct nvme_common_command, nsid);
> - return NVME_SC_INVALID_NS | NVME_SC_DNR;
> + if (nvmet_subsys_nsid_exists(subsys, nsid))
> + return NVME_SC_INTERNAL_PATH_ERROR;
> + else
> + return NVME_SC_INVALID_NS | NVME_SC_DNR;
> }
>
> percpu_ref_get(&req->ns->ref);
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index f460728e1df1..c1306de1f4dd 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct nvmet_subsys *subsys,
> struct nvmet_host *host);
> void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
> u8 event_info, u8 log_page);
> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
>
> #define NVMET_MIN_QUEUE_SIZE 16
> #define NVMET_MAX_QUEUE_SIZE 1024
Shouldn't we rather use 'ANA inaccessible' status here?
Just to make it clear that the namespace is disabled, and not some other
internal error?
Cheers,
Hannes
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
2024-05-23 9:49 ` Hannes Reinecke
@ 2024-05-23 10:10 ` Sagi Grimberg
0 siblings, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2024-05-23 10:10 UTC (permalink / raw)
To: Hannes Reinecke, linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng
On 23/05/2024 12:49, Hannes Reinecke wrote:
> On 4/28/24 11:25, Sagi Grimberg wrote:
>> If the user disabled a nvmet namesapce, it is removed from
>> the subsystem namespaces list. When nvmet processes a command
>> directed to an nsid that was disabled, it cannot differentiate
>> between a nsid that is disabled vs. a non-existent namespace,
>> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
>>
>> This translates to a non-retryable status for the host, which
>> translates to a user error. We should expect disabled namespaces
>> to not cause an I/O error in a multipath environment.
>>
>> Address this by searching a configfs item for the namespace nvmet
>> failed to find, and if we found one, conclude that the namespace
>> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
>> in this case and keep DNR bit cleared.
>>
>> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>> Jirong Feng, can you please re-run the test as requested and verify
>> that the I/O error goes away. If so, please reply with your "Tested-by"
>> tag.
>>
>> drivers/nvme/target/configfs.c | 13 +++++++++++++
>> drivers/nvme/target/core.c | 8 ++++++--
>> drivers/nvme/target/nvmet.h | 1 +
>> 3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/configfs.c
>> b/drivers/nvme/target/configfs.c
>> index 77a6e817b315..e744bd0a9cc2 100644
>> --- a/drivers/nvme/target/configfs.c
>> +++ b/drivers/nvme/target/configfs.c
>> @@ -754,6 +754,19 @@ static struct configfs_attribute
>> *nvmet_ns_attrs[] = {
>> NULL,
>> };
>> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
>> +{
>> + struct config_item *ns_item;
>> + char name[4] = {};
>> +
>> + if (sprintf(name, "%u", nsid) <= 0)
>> + return false;
>> + mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
>> + ns_item = config_group_find_item(&subsys->namespaces_group, name);
>> + mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
>> + return ns_item != NULL;
>> +}
>> +
>> static void nvmet_ns_release(struct config_item *item)
>> {
>> struct nvmet_ns *ns = to_nvmet_ns(item);
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 78cba026610a..ed0b8f3707b4 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct
>> nvmet_ctrl *ctrl)
>> u16 nvmet_req_find_ns(struct nvmet_req *req)
>> {
>> u32 nsid = le32_to_cpu(req->cmd->common.nsid);
>> + struct nvmet_subsys *subsys = nvmet_req_subsys(req);
>> - req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
>> + req->ns = xa_load(&subsys->namespaces, nsid);
>> if (unlikely(!req->ns)) {
>> req->error_loc = offsetof(struct nvme_common_command, nsid);
>> - return NVME_SC_INVALID_NS | NVME_SC_DNR;
>> + if (nvmet_subsys_nsid_exists(subsys, nsid))
>> + return NVME_SC_INTERNAL_PATH_ERROR;
>> + else
>> + return NVME_SC_INVALID_NS | NVME_SC_DNR;
>> }
>> percpu_ref_get(&req->ns->ref);
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index f460728e1df1..c1306de1f4dd 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct
>> nvmet_subsys *subsys,
>> struct nvmet_host *host);
>> void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>> u8 event_info, u8 log_page);
>> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
>> #define NVMET_MIN_QUEUE_SIZE 16
>> #define NVMET_MAX_QUEUE_SIZE 1024
>
> Shouldn't we rather use 'ANA inaccessible' status here?
> Just to make it clear that the namespace is disabled, and not some
> other internal error?
No, because this is not related to the ANA group, but a namespace error.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-05-23 10:10 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-28 9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
2024-04-29 5:14 ` Christoph Hellwig
2024-04-29 11:34 ` Jirong Feng
2024-04-29 11:52 ` Keith Busch
2024-04-30 3:38 ` Chaitanya Kulkarni
2024-05-23 9:49 ` Hannes Reinecke
2024-05-23 10:10 ` Sagi Grimberg
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.