All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvmet: fix nvme status code when namespace is disabled
@ 2024-04-28  9:25 Sagi Grimberg
  2024-04-29  5:14 ` Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Sagi Grimberg @ 2024-04-28  9:25 UTC (permalink / raw)
  To: linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng

If the user disabled a nvmet namesapce, it is removed from
the subsystem namespaces list. When nvmet processes a command
directed to an nsid that was disabled, it cannot differentiate
between a nsid that is disabled vs. a non-existent namespace,
and resorts to return NVME_SC_INVALID_NS with the dnr bit set.

This translates to a non-retryable status for the host, which
translates to a user error. We should expect disabled namespaces
to not cause an I/O error in a multipath environment.

Address this by searching a configfs item for the namespace nvmet
failed to find, and if we found one, conclude that the namespace
is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
in this case and keep DNR bit cleared.

Reported-by: Jirong Feng <jirong.feng@easystack.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
Jirong Feng, can you please re-run the test as requested and verify
that the I/O error goes away. If so, please reply with your "Tested-by"
tag.

 drivers/nvme/target/configfs.c | 13 +++++++++++++
 drivers/nvme/target/core.c     |  8 ++++++--
 drivers/nvme/target/nvmet.h    |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 77a6e817b315..e744bd0a9cc2 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -754,6 +754,19 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
 	NULL,
 };
 
+bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
+{
+	struct config_item *ns_item;
+	char name[4] = {};
+
+	if (sprintf(name, "%u", nsid) <= 0)
+		return false;
+	mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
+	ns_item = config_group_find_item(&subsys->namespaces_group, name);
+	mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
+	return ns_item != NULL;
+}
+
 static void nvmet_ns_release(struct config_item *item)
 {
 	struct nvmet_ns *ns = to_nvmet_ns(item);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 78cba026610a..ed0b8f3707b4 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
 u16 nvmet_req_find_ns(struct nvmet_req *req)
 {
 	u32 nsid = le32_to_cpu(req->cmd->common.nsid);
+	struct nvmet_subsys *subsys = nvmet_req_subsys(req);
 
-	req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
+	req->ns = xa_load(&subsys->namespaces, nsid);
 	if (unlikely(!req->ns)) {
 		req->error_loc = offsetof(struct nvme_common_command, nsid);
-		return NVME_SC_INVALID_NS | NVME_SC_DNR;
+		if (nvmet_subsys_nsid_exists(subsys, nsid))
+			return NVME_SC_INTERNAL_PATH_ERROR;
+		else
+			return NVME_SC_INVALID_NS | NVME_SC_DNR;
 	}
 
 	percpu_ref_get(&req->ns->ref);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f460728e1df1..c1306de1f4dd 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct nvmet_subsys *subsys,
 		struct nvmet_host *host);
 void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 		u8 event_info, u8 log_page);
+bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
 
 #define NVMET_MIN_QUEUE_SIZE	16
 #define NVMET_MAX_QUEUE_SIZE	1024
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
@ 2024-04-29  5:14 ` Christoph Hellwig
  2024-04-29 11:34 ` Jirong Feng
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2024-04-29  5:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	jirong.feng

On Sun, Apr 28, 2024 at 12:25:40PM +0300, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from

s/namesapce/namespace/

(lines also look a little short here, maybe use up all 73 chars
when refreshing it).

> +		if (nvmet_subsys_nsid_exists(subsys, nsid))
> +			return NVME_SC_INTERNAL_PATH_ERROR;
> +		else
> +			return NVME_SC_INVALID_NS | NVME_SC_DNR;

No need for an else after a return.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
  2024-04-29  5:14 ` Christoph Hellwig
@ 2024-04-29 11:34 ` Jirong Feng
  2024-04-29 11:52 ` Keith Busch
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Jirong Feng @ 2024-04-29 11:34 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.

Having re-runned the test basing on kernel 6.9.0-rc5, the error does go 
away.

Tested-by: Jirong Feng <jirong.feng@easystack.cn>




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
  2024-04-29  5:14 ` Christoph Hellwig
  2024-04-29 11:34 ` Jirong Feng
@ 2024-04-29 11:52 ` Keith Busch
  2024-04-30  3:38 ` Chaitanya Kulkarni
  2024-05-23  9:49 ` Hannes Reinecke
  4 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2024-04-29 11:52 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Chaitanya Kulkarni, jirong.feng

On Sun, Apr 28, 2024 at 12:25:40PM +0300, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
> 
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
> 
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
> 
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Applied with Jirong's "Tested-by" and Christoph's suggestions (spelling,
word wrap, and unnecessary 'else') to nvme-6.9.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
                   ` (2 preceding siblings ...)
  2024-04-29 11:52 ` Keith Busch
@ 2024-04-30  3:38 ` Chaitanya Kulkarni
  2024-05-23  9:49 ` Hannes Reinecke
  4 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-30  3:38 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng

On 4/28/2024 2:25 AM, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
> 
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
> 
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
> 
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.
> 

FWIW,

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
                   ` (3 preceding siblings ...)
  2024-04-30  3:38 ` Chaitanya Kulkarni
@ 2024-05-23  9:49 ` Hannes Reinecke
  2024-05-23 10:10   ` Sagi Grimberg
  4 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2024-05-23  9:49 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng

On 4/28/24 11:25, Sagi Grimberg wrote:
> If the user disabled a nvmet namesapce, it is removed from
> the subsystem namespaces list. When nvmet processes a command
> directed to an nsid that was disabled, it cannot differentiate
> between a nsid that is disabled vs. a non-existent namespace,
> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
> 
> This translates to a non-retryable status for the host, which
> translates to a user error. We should expect disabled namespaces
> to not cause an I/O error in a multipath environment.
> 
> Address this by searching a configfs item for the namespace nvmet
> failed to find, and if we found one, conclude that the namespace
> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
> in this case and keep DNR bit cleared.
> 
> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Jirong Feng, can you please re-run the test as requested and verify
> that the I/O error goes away. If so, please reply with your "Tested-by"
> tag.
> 
>   drivers/nvme/target/configfs.c | 13 +++++++++++++
>   drivers/nvme/target/core.c     |  8 ++++++--
>   drivers/nvme/target/nvmet.h    |  1 +
>   3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
> index 77a6e817b315..e744bd0a9cc2 100644
> --- a/drivers/nvme/target/configfs.c
> +++ b/drivers/nvme/target/configfs.c
> @@ -754,6 +754,19 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
>   	NULL,
>   };
>   
> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
> +{
> +	struct config_item *ns_item;
> +	char name[4] = {};
> +
> +	if (sprintf(name, "%u", nsid) <= 0)
> +		return false;
> +	mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
> +	ns_item = config_group_find_item(&subsys->namespaces_group, name);
> +	mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
> +	return ns_item != NULL;
> +}
> +
>   static void nvmet_ns_release(struct config_item *item)
>   {
>   	struct nvmet_ns *ns = to_nvmet_ns(item);
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 78cba026610a..ed0b8f3707b4 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
>   u16 nvmet_req_find_ns(struct nvmet_req *req)
>   {
>   	u32 nsid = le32_to_cpu(req->cmd->common.nsid);
> +	struct nvmet_subsys *subsys = nvmet_req_subsys(req);
>   
> -	req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
> +	req->ns = xa_load(&subsys->namespaces, nsid);
>   	if (unlikely(!req->ns)) {
>   		req->error_loc = offsetof(struct nvme_common_command, nsid);
> -		return NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		if (nvmet_subsys_nsid_exists(subsys, nsid))
> +			return NVME_SC_INTERNAL_PATH_ERROR;
> +		else
> +			return NVME_SC_INVALID_NS | NVME_SC_DNR;
>   	}
>   
>   	percpu_ref_get(&req->ns->ref);
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index f460728e1df1..c1306de1f4dd 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct nvmet_subsys *subsys,
>   		struct nvmet_host *host);
>   void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>   		u8 event_info, u8 log_page);
> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
>   
>   #define NVMET_MIN_QUEUE_SIZE	16
>   #define NVMET_MAX_QUEUE_SIZE	1024

Shouldn't we rather use 'ANA inaccessible' status here?
Just to make it clear that the namespace is disabled, and not some other 
internal error?

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvmet: fix nvme status code when namespace is disabled
  2024-05-23  9:49 ` Hannes Reinecke
@ 2024-05-23 10:10   ` Sagi Grimberg
  0 siblings, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2024-05-23 10:10 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, jirong.feng



On 23/05/2024 12:49, Hannes Reinecke wrote:
> On 4/28/24 11:25, Sagi Grimberg wrote:
>> If the user disabled a nvmet namesapce, it is removed from
>> the subsystem namespaces list. When nvmet processes a command
>> directed to an nsid that was disabled, it cannot differentiate
>> between a nsid that is disabled vs. a non-existent namespace,
>> and resorts to return NVME_SC_INVALID_NS with the dnr bit set.
>>
>> This translates to a non-retryable status for the host, which
>> translates to a user error. We should expect disabled namespaces
>> to not cause an I/O error in a multipath environment.
>>
>> Address this by searching a configfs item for the namespace nvmet
>> failed to find, and if we found one, conclude that the namespace
>> is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR
>> in this case and keep DNR bit cleared.
>>
>> Reported-by: Jirong Feng <jirong.feng@easystack.cn>
>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>> ---
>> Jirong Feng, can you please re-run the test as requested and verify
>> that the I/O error goes away. If so, please reply with your "Tested-by"
>> tag.
>>
>>   drivers/nvme/target/configfs.c | 13 +++++++++++++
>>   drivers/nvme/target/core.c     |  8 ++++++--
>>   drivers/nvme/target/nvmet.h    |  1 +
>>   3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/configfs.c 
>> b/drivers/nvme/target/configfs.c
>> index 77a6e817b315..e744bd0a9cc2 100644
>> --- a/drivers/nvme/target/configfs.c
>> +++ b/drivers/nvme/target/configfs.c
>> @@ -754,6 +754,19 @@ static struct configfs_attribute 
>> *nvmet_ns_attrs[] = {
>>       NULL,
>>   };
>>   +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid)
>> +{
>> +    struct config_item *ns_item;
>> +    char name[4] = {};
>> +
>> +    if (sprintf(name, "%u", nsid) <= 0)
>> +        return false;
>> + mutex_lock(&subsys->namespaces_group.cg_subsys->su_mutex);
>> +    ns_item = config_group_find_item(&subsys->namespaces_group, name);
>> + mutex_unlock(&subsys->namespaces_group.cg_subsys->su_mutex);
>> +    return ns_item != NULL;
>> +}
>> +
>>   static void nvmet_ns_release(struct config_item *item)
>>   {
>>       struct nvmet_ns *ns = to_nvmet_ns(item);
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 78cba026610a..ed0b8f3707b4 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -437,11 +437,15 @@ void nvmet_stop_keep_alive_timer(struct 
>> nvmet_ctrl *ctrl)
>>   u16 nvmet_req_find_ns(struct nvmet_req *req)
>>   {
>>       u32 nsid = le32_to_cpu(req->cmd->common.nsid);
>> +    struct nvmet_subsys *subsys = nvmet_req_subsys(req);
>>   -    req->ns = xa_load(&nvmet_req_subsys(req)->namespaces, nsid);
>> +    req->ns = xa_load(&subsys->namespaces, nsid);
>>       if (unlikely(!req->ns)) {
>>           req->error_loc = offsetof(struct nvme_common_command, nsid);
>> -        return NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +        if (nvmet_subsys_nsid_exists(subsys, nsid))
>> +            return NVME_SC_INTERNAL_PATH_ERROR;
>> +        else
>> +            return NVME_SC_INVALID_NS | NVME_SC_DNR;
>>       }
>>         percpu_ref_get(&req->ns->ref);
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index f460728e1df1..c1306de1f4dd 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -543,6 +543,7 @@ void nvmet_subsys_disc_changed(struct 
>> nvmet_subsys *subsys,
>>           struct nvmet_host *host);
>>   void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>>           u8 event_info, u8 log_page);
>> +bool nvmet_subsys_nsid_exists(struct nvmet_subsys *subsys, u32 nsid);
>>     #define NVMET_MIN_QUEUE_SIZE    16
>>   #define NVMET_MAX_QUEUE_SIZE    1024
>
> Shouldn't we rather use 'ANA inaccessible' status here?
> Just to make it clear that the namespace is disabled, and not some 
> other internal error?

No, because this is not related to the ANA group, but a namespace error.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-05-23 10:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-28  9:25 [PATCH] nvmet: fix nvme status code when namespace is disabled Sagi Grimberg
2024-04-29  5:14 ` Christoph Hellwig
2024-04-29 11:34 ` Jirong Feng
2024-04-29 11:52 ` Keith Busch
2024-04-30  3:38 ` Chaitanya Kulkarni
2024-05-23  9:49 ` Hannes Reinecke
2024-05-23 10:10   ` Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.