* [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
@ 2024-03-01 11:28 ` Hannes Reinecke
2024-03-01 13:12 ` Christoph Hellwig
2024-03-01 11:28 ` [PATCH 2/4] nvmet: lock config semaphore when accessing DH-HMAC-CHAP key Hannes Reinecke
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-01 11:28 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke
From: Hannes Reinecke <hare@suse.de>
Any authentication errors which are generated internally are always
non-retryable, so set the DNR bit to ensure they are not retried.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/core.c | 6 +++---
drivers/nvme/host/fabrics.c | 29 +++++++++++++++++------------
drivers/nvme/host/nvme.h | 2 +-
3 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5e9a0cf43636..0a8a595070c0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -374,14 +374,14 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
if (likely(nvme_req(req)->status == 0))
return COMPLETE;
- if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
- return AUTHENTICATE;
-
if (blk_noretry_request(req) ||
(nvme_req(req)->status & NVME_SC_DNR) ||
nvme_req(req)->retries >= nvme_max_retries)
return COMPLETE;
+ if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
+ return AUTHENTICATE;
+
if (req->cmd_flags & REQ_NVME_MPATH) {
if (nvme_is_path_error(nvme_req(req)->status) ||
blk_queue_dying(req->q))
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ab5ac219b70a..32e993781236 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
if (result & NVME_CONNECT_AUTHREQ_ASCR) {
dev_warn(ctrl->device,
"qid 0: secure concatenation is not supported\n");
- ret = NVME_SC_AUTH_REQUIRED;
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
goto out_free_data;
}
/* Authentication required */
@@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
if (ret) {
dev_warn(ctrl->device,
"qid 0: authentication setup failed\n");
- ret = NVME_SC_AUTH_REQUIRED;
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
goto out_free_data;
}
ret = nvme_auth_wait(ctrl, 0);
- if (ret)
+ if (ret) {
dev_warn(ctrl->device,
- "qid 0: authentication failed\n");
- else
+ "qid 0: authentication failed, error %d\n",
+ ret);
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
+ } else
dev_info(ctrl->device,
"qid 0: authenticated\n");
}
@@ -541,7 +543,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
if (result & NVME_CONNECT_AUTHREQ_ASCR) {
dev_warn(ctrl->device,
"qid 0: secure concatenation is not supported\n");
- ret = NVME_SC_AUTH_REQUIRED;
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
goto out_free_data;
}
/* Authentication required */
@@ -549,12 +551,15 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
if (ret) {
dev_warn(ctrl->device,
"qid %d: authentication setup failed\n", qid);
- ret = NVME_SC_AUTH_REQUIRED;
- } else {
- ret = nvme_auth_wait(ctrl, qid);
- if (ret)
- dev_warn(ctrl->device,
- "qid %u: authentication failed\n", qid);
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
+ goto out_free_data;
+ }
+ ret = nvme_auth_wait(ctrl, qid);
+ if (ret) {
+ dev_warn(ctrl->device,
+ "qid %u: authentication failed, error %d\n",
+ qid, ret);
+ ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
}
}
out_free_data:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6ed2ca6b35e4..c08c220a68cf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -1123,7 +1123,7 @@ static inline int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid)
}
static inline int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
{
- return NVME_SC_AUTH_REQUIRED;
+ return NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
}
static inline void nvme_auth_free(struct nvme_ctrl *ctrl) {};
#endif
--
2.35.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-01 11:28 ` [PATCH 1/4] nvme: authentication error are always non-retryable Hannes Reinecke
@ 2024-03-01 13:12 ` Christoph Hellwig
2024-03-01 15:26 ` Hannes Reinecke
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-01 13:12 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-nvme,
Hannes Reinecke
> - if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
> - return AUTHENTICATE;
> -
> if (blk_noretry_request(req) ||
> (nvme_req(req)->status & NVME_SC_DNR) ||
> nvme_req(req)->retries >= nvme_max_retries)
> return COMPLETE;
>
> + if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
> + return AUTHENTICATE;
This part looks fine (although I'd word the commit message
differently for it).
> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
> if (result & NVME_CONNECT_AUTHREQ_ASCR) {
> dev_warn(ctrl->device,
> "qid 0: secure concatenation is not supported\n");
> - ret = NVME_SC_AUTH_REQUIRED;
> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
> goto out_free_data;
> }
> /* Authentication required */
> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
> if (ret) {
> dev_warn(ctrl->device,
> "qid 0: authentication setup failed\n");
> - ret = NVME_SC_AUTH_REQUIRED;
> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
.. but the others should never use nvme status codes as they never
go out onto the wire.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-01 13:12 ` Christoph Hellwig
@ 2024-03-01 15:26 ` Hannes Reinecke
2024-03-07 8:51 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-01 15:26 UTC (permalink / raw)
To: Christoph Hellwig, Hannes Reinecke; +Cc: Keith Busch, Sagi Grimberg, linux-nvme
On 3/1/24 14:12, Christoph Hellwig wrote:
>> - if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>> - return AUTHENTICATE;
>> -
>> if (blk_noretry_request(req) ||
>> (nvme_req(req)->status & NVME_SC_DNR) ||
>> nvme_req(req)->retries >= nvme_max_retries)
>> return COMPLETE;
>>
>> + if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>> + return AUTHENTICATE;
>
> This part looks fine (although I'd word the commit message
> differently for it).
>
>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>> if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>> dev_warn(ctrl->device,
>> "qid 0: secure concatenation is not supported\n");
>> - ret = NVME_SC_AUTH_REQUIRED;
>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>> goto out_free_data;
>> }
>> /* Authentication required */
>> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>> if (ret) {
>> dev_warn(ctrl->device,
>> "qid 0: authentication setup failed\n");
>> - ret = NVME_SC_AUTH_REQUIRED;
>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>
>
> .. but the others should never use nvme status codes as they never
> go out onto the wire.
>
Would love to, but to my knowledge we have the convention that NVMe
status codes less than 0 should _not_ be retried. NVMe transport errors,
OTOH, _should_ be retried. 'connect_admin_queue' and 'connect_io_queue'
now straddles the boundary: technically it's an NVMe command, but in
practice it's a transport command as there are quite a lot of steps
before we even can send the 'connect' command.
So, how do we handle the return codes from 'connect_admin_queue' ?
The NVMe core style (with not retrying negative errors) or the NVMe
transport style (with always retrying until we run out of retries)?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-01 15:26 ` Hannes Reinecke
@ 2024-03-07 8:51 ` Sagi Grimberg
2024-03-07 10:32 ` Hannes Reinecke
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-03-07 8:51 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig, Hannes Reinecke
Cc: Keith Busch, linux-nvme
On 01/03/2024 17:26, Hannes Reinecke wrote:
> On 3/1/24 14:12, Christoph Hellwig wrote:
>>> - if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>> - return AUTHENTICATE;
>>> -
>>> if (blk_noretry_request(req) ||
>>> (nvme_req(req)->status & NVME_SC_DNR) ||
>>> nvme_req(req)->retries >= nvme_max_retries)
>>> return COMPLETE;
>>> + if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>> + return AUTHENTICATE;
>>
>> This part looks fine (although I'd word the commit message
>> differently for it).
>>
>>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl
>>> *ctrl)
>>> if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>> dev_warn(ctrl->device,
>>> "qid 0: secure concatenation is not supported\n");
>>> - ret = NVME_SC_AUTH_REQUIRED;
>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>> goto out_free_data;
>>> }
>>> /* Authentication required */
>>> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct nvme_ctrl
>>> *ctrl)
>>> if (ret) {
>>> dev_warn(ctrl->device,
>>> "qid 0: authentication setup failed\n");
>>> - ret = NVME_SC_AUTH_REQUIRED;
>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>
>>
>> .. but the others should never use nvme status codes as they never
>> go out onto the wire.
>>
> Would love to, but to my knowledge we have the convention that NVMe
> status codes less than 0 should _not_ be retried.
What nvme status code is less than zero?
> NVMe transport errors,
> OTOH, _should_ be retried.
Yes.
> 'connect_admin_queue' and 'connect_io_queue' now straddles the
> boundary: technically it's an NVMe command, but in
> practice it's a transport command as there are quite a lot of steps
> before we even can send the 'connect' command.
>
> So, how do we handle the return codes from 'connect_admin_queue' ?
> The NVMe core style (with not retrying negative errors) or the NVMe
> transport style (with always retrying until we run out of retries)?
Not exactly sure what is the question here. I am still going over emails
so there
may be some discussion around this, but it is unclear from the patch
description
what is the issue this is solving.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-07 8:51 ` Sagi Grimberg
@ 2024-03-07 10:32 ` Hannes Reinecke
2024-03-07 11:37 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-07 10:32 UTC (permalink / raw)
To: Sagi Grimberg, Christoph Hellwig, Hannes Reinecke; +Cc: Keith Busch, linux-nvme
On 3/7/24 09:51, Sagi Grimberg wrote:
>
>
> On 01/03/2024 17:26, Hannes Reinecke wrote:
>> On 3/1/24 14:12, Christoph Hellwig wrote:
>>>> - if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>> - return AUTHENTICATE;
>>>> -
>>>> if (blk_noretry_request(req) ||
>>>> (nvme_req(req)->status & NVME_SC_DNR) ||
>>>> nvme_req(req)->retries >= nvme_max_retries)
>>>> return COMPLETE;
>>>> + if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>> + return AUTHENTICATE;
>>>
>>> This part looks fine (although I'd word the commit message
>>> differently for it).
>>>
>>>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl
>>>> *ctrl)
>>>> if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>>> dev_warn(ctrl->device,
>>>> "qid 0: secure concatenation is not supported\n");
>>>> - ret = NVME_SC_AUTH_REQUIRED;
>>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>> goto out_free_data;
>>>> }
>>>> /* Authentication required */
>>>> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct nvme_ctrl
>>>> *ctrl)
>>>> if (ret) {
>>>> dev_warn(ctrl->device,
>>>> "qid 0: authentication setup failed\n");
>>>> - ret = NVME_SC_AUTH_REQUIRED;
>>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>
>>>
>>> .. but the others should never use nvme status codes as they never
>>> go out onto the wire.
>>>
>> Would love to, but to my knowledge we have the convention that NVMe
>> status codes less than 0 should _not_ be retried.
>
> What nvme status code is less than zero?
>
Not the NVMe status, but the 'ret' value containing either a (positive)
NVMe status or a (negative) errno.
>> NVMe transport errors,
>> OTOH, _should_ be retried.
> Yes.
>> 'connect_admin_queue' and 'connect_io_queue' now straddles the
>> boundary: technically it's an NVMe command, but in
>> practice it's a transport command as there are quite a lot of steps
>> before we even can send the 'connect' command.
>>
>> So, how do we handle the return codes from 'connect_admin_queue' ?
>> The NVMe core style (with not retrying negative errors) or the NVMe
>> transport style (with always retrying until we run out of retries)?
>
> Not exactly sure what is the question here. I am still going over emails
> so there
> may be some discussion around this, but it is unclear from the patch
> description what is the issue this is solving.
The question is what to do if nvmf_connect_admin_queue() returns with a
status with the DNR status set.
In TCP the status is carried back to the return value of
nvme_tcp_setup_ctrl(), which is called during reset, reconnect, and
initialisation.
For the first two the status is simply ignored, and always retried
(ie every error is treated as retryable). For the latter, we will always
abort the initialisation (ie every error is non-retryable).
Either way, we're completely ignoring the DNR bit in the NVMe status.
Daniel has sent two patches on my behalf, trying to evaluate the DNR
status during reconnect and reset. With them it suddenly matters how
the return value from nvmf_connect_admin_queue() is to be interpreted.
With my original idea of retrying authentication errors we will keep
on retrying, even though the authentication settings won't change,
and consequently each retry will fail. So in effect we go through
pointless retries, and only abort the test case once the retries
are exhausted. By making authentication errors non-retryable we
can terminate the testcase directly.
The only way I could think of to signal a non-retryable error
is to return an NVMe status with the DNR bit set; every other error
will cause a retry. But this will require us to fabricate an NVMe
status code, which Christoph objects to.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] nvme: authentication error are always non-retryable
2024-03-07 10:32 ` Hannes Reinecke
@ 2024-03-07 11:37 ` Sagi Grimberg
0 siblings, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2024-03-07 11:37 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig, Hannes Reinecke
Cc: Keith Busch, linux-nvme
On 07/03/2024 12:32, Hannes Reinecke wrote:
> On 3/7/24 09:51, Sagi Grimberg wrote:
>>
>>
>> On 01/03/2024 17:26, Hannes Reinecke wrote:
>>> On 3/1/24 14:12, Christoph Hellwig wrote:
>>>>> - if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>>> - return AUTHENTICATE;
>>>>> -
>>>>> if (blk_noretry_request(req) ||
>>>>> (nvme_req(req)->status & NVME_SC_DNR) ||
>>>>> nvme_req(req)->retries >= nvme_max_retries)
>>>>> return COMPLETE;
>>>>> + if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>>> + return AUTHENTICATE;
>>>>
>>>> This part looks fine (although I'd word the commit message
>>>> differently for it).
>>>>
>>>>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl
>>>>> *ctrl)
>>>>> if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>>>> dev_warn(ctrl->device,
>>>>> "qid 0: secure concatenation is not supported\n");
>>>>> - ret = NVME_SC_AUTH_REQUIRED;
>>>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>>> goto out_free_data;
>>>>> }
>>>>> /* Authentication required */
>>>>> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct
>>>>> nvme_ctrl *ctrl)
>>>>> if (ret) {
>>>>> dev_warn(ctrl->device,
>>>>> "qid 0: authentication setup failed\n");
>>>>> - ret = NVME_SC_AUTH_REQUIRED;
>>>>> + ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>>
>>>>
>>>> .. but the others should never use nvme status codes as they never
>>>> go out onto the wire.
>>>>
>>> Would love to, but to my knowledge we have the convention that NVMe
>>> status codes less than 0 should _not_ be retried.
>>
>> What nvme status code is less than zero?
>>
> Not the NVMe status, but the 'ret' value containing either a
> (positive) NVMe status or a (negative) errno.
>
>>> NVMe transport errors,
>>> OTOH, _should_ be retried.
>> Yes.
>>> 'connect_admin_queue' and 'connect_io_queue' now straddles the
>>> boundary: technically it's an NVMe command, but in
>>> practice it's a transport command as there are quite a lot of steps
>>> before we even can send the 'connect' command.
>>>
>>> So, how do we handle the return codes from 'connect_admin_queue' ?
>>> The NVMe core style (with not retrying negative errors) or the NVMe
>>> transport style (with always retrying until we run out of retries)?
>>
>> Not exactly sure what is the question here. I am still going over
>> emails so there
>> may be some discussion around this, but it is unclear from the patch
>> description what is the issue this is solving.
>
> The question is what to do if nvmf_connect_admin_queue() returns with
> a status with the DNR status set.
> In TCP the status is carried back to the return value of
> nvme_tcp_setup_ctrl(), which is called during reset, reconnect, and
> initialisation.
> For the first two the status is simply ignored, and always retried
> (ie every error is treated as retryable). For the latter, we will
> always abort the initialisation (ie every error is non-retryable).
> Either way, we're completely ignoring the DNR bit in the NVMe status.
IMO We should always fail initial connect, for whatever reason.
userspace can take care
of it if it wants to reconnect. If we managed to ever connect to a
controller, we will retry
once we lose the connection.
>
> Daniel has sent two patches on my behalf, trying to evaluate the DNR
> status during reconnect and reset.
Didn't get to those yet.
> With them it suddenly matters how
> the return value from nvmf_connect_admin_queue() is to be interpreted.
OK, The spec should probably be clear about it.
> With my original idea of retrying authentication errors we will keep
> on retrying, even though the authentication settings won't change,
> and consequently each retry will fail. So in effect we go through
> pointless retries, and only abort the test case once the retries
> are exhausted. By making authentication errors non-retryable we
> can terminate the testcase directly.
> The only way I could think of to signal a non-retryable error
> is to return an NVMe status with the DNR bit set; every other error
> will cause a retry. But this will require us to fabricate an NVMe
> status code, which Christoph objects to.
Can we start with defining what is a retry-able authentication error and
what isn't?
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] nvmet: lock config semaphore when accessing DH-HMAC-CHAP key
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
2024-03-01 11:28 ` [PATCH 1/4] nvme: authentication error are always non-retryable Hannes Reinecke
@ 2024-03-01 11:28 ` Hannes Reinecke
2024-03-01 13:13 ` Christoph Hellwig
2024-03-07 8:53 ` Sagi Grimberg
2024-03-01 11:28 ` [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth() Hannes Reinecke
` (3 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-01 11:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke, Hannes Reinecke
When the DH-HMAC-CHAP key is accessed via configfs we need to
take the config semaphore as a reconnect might be running at
the same time.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/target/auth.c | 2 ++
drivers/nvme/target/configfs.c | 22 +++++++++++++++++-----
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 3ddbc3880cac..9afc28f1ffac 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -44,6 +44,7 @@ int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
dhchap_secret = kstrdup(secret, GFP_KERNEL);
if (!dhchap_secret)
return -ENOMEM;
+ down_write(&nvmet_config_sem);
if (set_ctrl) {
kfree(host->dhchap_ctrl_secret);
host->dhchap_ctrl_secret = strim(dhchap_secret);
@@ -53,6 +54,7 @@ int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
host->dhchap_secret = strim(dhchap_secret);
host->dhchap_key_hash = key_hash;
}
+ up_write(&nvmet_config_sem);
return 0;
}
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 2482a0db2504..92756fca0005 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1962,11 +1962,17 @@ static struct config_group nvmet_ports_group;
static ssize_t nvmet_host_dhchap_key_show(struct config_item *item,
char *page)
{
- u8 *dhchap_secret = to_host(item)->dhchap_secret;
+ u8 *dhchap_secret;
+ ssize_t ret;
+ down_read(&nvmet_config_sem);
+ dhchap_secret = to_host(item)->dhchap_secret;
if (!dhchap_secret)
- return sprintf(page, "\n");
- return sprintf(page, "%s\n", dhchap_secret);
+ ret = sprintf(page, "\n");
+ else
+ ret = sprintf(page, "%s\n", dhchap_secret);
+ up_read(&nvmet_config_sem);
+ return ret;
}
static ssize_t nvmet_host_dhchap_key_store(struct config_item *item,
@@ -1990,10 +1996,16 @@ static ssize_t nvmet_host_dhchap_ctrl_key_show(struct config_item *item,
char *page)
{
u8 *dhchap_secret = to_host(item)->dhchap_ctrl_secret;
+ ssize_t ret;
+ down_read(&nvmet_config_sem);
+ dhchap_secret = to_host(item)->dhchap_ctrl_secret;
if (!dhchap_secret)
- return sprintf(page, "\n");
- return sprintf(page, "%s\n", dhchap_secret);
+ ret = sprintf(page, "\n");
+ else
+ ret = sprintf(page, "%s\n", dhchap_secret);
+ up_read(&nvmet_config_sem);
+ return ret;
}
static ssize_t nvmet_host_dhchap_ctrl_key_store(struct config_item *item,
--
2.35.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth()
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
2024-03-01 11:28 ` [PATCH 1/4] nvme: authentication error are always non-retryable Hannes Reinecke
2024-03-01 11:28 ` [PATCH 2/4] nvmet: lock config semaphore when accessing DH-HMAC-CHAP key Hannes Reinecke
@ 2024-03-01 11:28 ` Hannes Reinecke
2024-03-01 13:13 ` Christoph Hellwig
2024-03-07 8:56 ` Sagi Grimberg
2024-03-01 11:28 ` [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit() Hannes Reinecke
` (2 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-01 11:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke, Hannes Reinecke
A failure in nvmet_setup_auth() does not mean that the NVMe
authentication command failed, so we should rather return a
protocol error with a 'failure1' response than an NVMe status.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/target/auth.c | 17 +++++++----------
drivers/nvme/target/fabrics-cmd-auth.c | 11 ++++++-----
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 9afc28f1ffac..63dea7cd7cd1 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -149,13 +149,16 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
}
if (!host) {
pr_debug("host %s not found\n", ctrl->hostnqn);
- ret = -EPERM;
+ ret = NVME_AUTH_DHCHAP_FAILURE_FAILED;
goto out_unlock;
}
ret = nvmet_setup_dhgroup(ctrl, host->dhchap_dhgroup_id);
- if (ret < 0)
+ if (ret < 0) {
pr_warn("Failed to setup DH group");
+ ret = NVME_AUTH_DHCHAP_FAILURE_DHGROUP_UNUSABLE;
+ goto out_unlock;
+ }
if (!host->dhchap_secret) {
pr_debug("No authentication provided\n");
@@ -166,12 +169,6 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
pr_debug("Re-use existing hash ID %d\n",
ctrl->shash_id);
} else {
- hash_name = nvme_auth_hmac_name(host->dhchap_hash_id);
- if (!hash_name) {
- pr_warn("Hash ID %d invalid\n", host->dhchap_hash_id);
- ret = -EINVAL;
- goto out_unlock;
- }
ctrl->shash_id = host->dhchap_hash_id;
}
@@ -180,7 +177,7 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
ctrl->host_key = nvme_auth_extract_key(host->dhchap_secret + 10,
host->dhchap_key_hash);
if (IS_ERR(ctrl->host_key)) {
- ret = PTR_ERR(ctrl->host_key);
+ ret = NVME_AUTH_DHCHAP_FAILURE_NOT_USABLE;
ctrl->host_key = NULL;
goto out_free_hash;
}
@@ -198,7 +195,7 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
ctrl->ctrl_key = nvme_auth_extract_key(host->dhchap_ctrl_secret + 10,
host->dhchap_ctrl_key_hash);
if (IS_ERR(ctrl->ctrl_key)) {
- ret = PTR_ERR(ctrl->ctrl_key);
+ ret = NVME_AUTH_DHCHAP_FAILURE_NOT_USABLE;
ctrl->ctrl_key = NULL;
goto out_free_hash;
}
diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
index eb7785be0ca7..a95dc6606396 100644
--- a/drivers/nvme/target/fabrics-cmd-auth.c
+++ b/drivers/nvme/target/fabrics-cmd-auth.c
@@ -240,12 +240,13 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
pr_debug("%s: ctrl %d qid %d reset negotiation\n", __func__,
ctrl->cntlid, req->sq->qid);
if (!req->sq->qid) {
- if (nvmet_setup_auth(ctrl) < 0) {
- status = NVME_SC_INTERNAL;
- pr_err("ctrl %d qid 0 failed to setup"
- "re-authentication",
+ status = nvmet_setup_auth(ctrl);
+ if (status) {
+ pr_err("ctrl %d qid 0 failed to setup re-authentication\n",
ctrl->cntlid);
- goto done_failure1;
+ req->sq->dhchap_status = status;
+ req->sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_FAILURE1;
+ goto done_kfree;
}
}
req->sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE;
--
2.35.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth()
2024-03-01 11:28 ` [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth() Hannes Reinecke
@ 2024-03-01 13:13 ` Christoph Hellwig
2024-03-07 8:56 ` Sagi Grimberg
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-01 13:13 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-nvme,
Hannes Reinecke
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth()
2024-03-01 11:28 ` [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth() Hannes Reinecke
2024-03-01 13:13 ` Christoph Hellwig
@ 2024-03-07 8:56 ` Sagi Grimberg
2024-03-07 11:19 ` Hannes Reinecke
1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2024-03-07 8:56 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig
Cc: Keith Busch, linux-nvme, Hannes Reinecke
On 01/03/2024 13:28, Hannes Reinecke wrote:
> A failure in nvmet_setup_auth() does not mean that the NVMe
> authentication command failed, so we should rather return a
> protocol error with a 'failure1' response than an NVMe status.
Shouldn't nvmet reply NVME_SC_INTERNAL like in other similar
cases?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth()
2024-03-07 8:56 ` Sagi Grimberg
@ 2024-03-07 11:19 ` Hannes Reinecke
2024-03-07 12:03 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-07 11:19 UTC (permalink / raw)
To: Sagi Grimberg, Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme
On 3/7/24 09:56, Sagi Grimberg wrote:
>
>
> On 01/03/2024 13:28, Hannes Reinecke wrote:
>> A failure in nvmet_setup_auth() does not mean that the NVMe
>> authentication command failed, so we should rather return a
>> protocol error with a 'failure1' response than an NVMe status.
>
> Shouldn't nvmet reply NVME_SC_INTERNAL like in other similar
> cases?
No. NVME_SC_INTERNAL will be converted into a status code for the
authentication _command_, indicating that the authentication
_command_ encountered an error.
But authentication _command_ is fine (ie it got send and received
okay). It's the authentication itself which failed, so we need to
return an authentication protocol failure state.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth()
2024-03-07 11:19 ` Hannes Reinecke
@ 2024-03-07 12:03 ` Sagi Grimberg
0 siblings, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2024-03-07 12:03 UTC (permalink / raw)
To: Hannes Reinecke, Hannes Reinecke, Christoph Hellwig
Cc: Keith Busch, linux-nvme
On 07/03/2024 13:19, Hannes Reinecke wrote:
> On 3/7/24 09:56, Sagi Grimberg wrote:
>>
>>
>> On 01/03/2024 13:28, Hannes Reinecke wrote:
>>> A failure in nvmet_setup_auth() does not mean that the NVMe
>>> authentication command failed, so we should rather return a
>>> protocol error with a 'failure1' response than an NVMe status.
>>
>> Shouldn't nvmet reply NVME_SC_INTERNAL like in other similar
>> cases?
>
> No. NVME_SC_INTERNAL will be converted into a status code for the
> authentication _command_, indicating that the authentication
> _command_ encountered an error.
> But authentication _command_ is fine (ie it got send and received
> okay). It's the authentication itself which failed, so we need to
> return an authentication protocol failure state.
Yes, I guess I was specifically asking about the failure to setup_auth
in the re-auth case...
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit()
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
` (2 preceding siblings ...)
2024-03-01 11:28 ` [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth() Hannes Reinecke
@ 2024-03-01 11:28 ` Hannes Reinecke
2024-03-01 13:14 ` Christoph Hellwig
2024-03-07 8:58 ` Sagi Grimberg
2024-03-01 12:24 ` [PATCH 0/4] nvme: fixes for authentication errors Daniel Wagner
2024-03-03 2:58 ` Chaitanya Kulkarni
5 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2024-03-01 11:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke,
Daniel Wagner, Hannes Reinecke
nvme_ctrl_uninit() already calls nvme_ctrl_put(), doing it twice
will cause an UAF.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/target/loop.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index f1d5eee3daec..82db181f0168 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -615,7 +615,6 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
kfree(ctrl->queues);
out_uninit_ctrl:
nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
out:
if (ret > 0)
ret = -EIO;
--
2.35.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit()
2024-03-01 11:28 ` [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit() Hannes Reinecke
@ 2024-03-01 13:14 ` Christoph Hellwig
2024-03-07 8:58 ` Sagi Grimberg
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-01 13:14 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-nvme,
Daniel Wagner, Hannes Reinecke
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit()
2024-03-01 11:28 ` [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit() Hannes Reinecke
2024-03-01 13:14 ` Christoph Hellwig
@ 2024-03-07 8:58 ` Sagi Grimberg
1 sibling, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2024-03-07 8:58 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig
Cc: Keith Busch, linux-nvme, Daniel Wagner, Hannes Reinecke
Looks fine,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
But I think it would help if loop would behave like other transports
that do have the extra put.
On 01/03/2024 13:28, Hannes Reinecke wrote:
> nvme_ctrl_uninit() already calls nvme_ctrl_put(), doing it twice
> will cause an UAF.
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> drivers/nvme/target/loop.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> index f1d5eee3daec..82db181f0168 100644
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -615,7 +615,6 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
> kfree(ctrl->queues);
> out_uninit_ctrl:
> nvme_uninit_ctrl(&ctrl->ctrl);
> - nvme_put_ctrl(&ctrl->ctrl);
> out:
> if (ret > 0)
> ret = -EIO;
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] nvme: fixes for authentication errors
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
` (3 preceding siblings ...)
2024-03-01 11:28 ` [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit() Hannes Reinecke
@ 2024-03-01 12:24 ` Daniel Wagner
2024-03-03 2:58 ` Chaitanya Kulkarni
5 siblings, 0 replies; 20+ messages in thread
From: Daniel Wagner @ 2024-03-01 12:24 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-nvme
On Fri, Mar 01, 2024 at 12:28:19PM +0100, Hannes Reinecke wrote:
> Hi all,
>
> here's a set of fixes we've encountered during blktest runs.
> Turns out that any authentication error we're generating
> internally a in fact non-retryable, so we should map them
> onto the appropriate NVMe status.
> That caused some fallout on the target side, which gets
> fixed with the remainder of this patchset.
>
> As usual, comments and reviews are welcome.
I've tested this with all transports and the results are pretty good.
Only rdma has some random fallouts (UAFs and lockdep warnings) but this
has nothing to do with this series.
With this series applied and my nvme/048 fixes from today, we finally
have fully working blktests for all transports.
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] nvme: fixes for authentication errors
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
` (4 preceding siblings ...)
2024-03-01 12:24 ` [PATCH 0/4] nvme: fixes for authentication errors Daniel Wagner
@ 2024-03-03 2:58 ` Chaitanya Kulkarni
5 siblings, 0 replies; 20+ messages in thread
From: Chaitanya Kulkarni @ 2024-03-03 2:58 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, linux-nvme
On 3/1/2024 3:28 AM, Hannes Reinecke wrote:
> Hi all,
>
> here's a set of fixes we've encountered during blktest runs.
> Turns out that any authentication error we're generating
> internally a in fact non-retryable, so we should map them
> onto the appropriate NVMe status.
> That caused some fallout on the target side, which gets
> fixed with the remainder of this patchset.
>
> As usual, comments and reviews are welcome.
>
>
This whole series looks good to me.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 20+ messages in thread