Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] nvme-core: fix io interrupt when work with dm-multipah
@ 2020-07-27  5:58 Chao Leng
  2020-07-28 11:19 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Chao Leng @ 2020-07-27  5:58 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, lengchao, sagi

The protocol NVM-Express-1.4 define:
Command Interrupted: Command processing was interrupted and the
controller is unable to successfully complete the command. The host
should retry the command. If this status code is returned, then
the controller shall clear the Do Not Retry bit to ‘0’ in the Status
field of the CQE (refer to Figure 124). The controller shall not return
this status code unless the host has set the Advanced Command Retry
Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
section 5.21.1.22).

According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
The error code NVME_SC_CMD_INTERRUPTED should not translate to
BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
dm-multipah will return error to application. So if target return error
code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
should translate to BLK_STS_IOERR by default, dm-multipath will fail
over to other path retry the io.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2c5bc4fb702..1a1ad5e5212c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -211,7 +211,6 @@ static blk_status_t nvme_error_status(u16 status)
 	case NVME_SC_CAP_EXCEEDED:
 		return BLK_STS_NOSPC;
 	case NVME_SC_LBA_RANGE:
-	case NVME_SC_CMD_INTERRUPTED:
 	case NVME_SC_NS_NOT_READY:
 		return BLK_STS_TARGET;
 	case NVME_SC_BAD_ATTRIBUTES:
-- 
2.16.4


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-core: fix io interrupt when work with dm-multipah
  2020-07-27  5:58 [PATCH] nvme-core: fix io interrupt when work with dm-multipah Chao Leng
@ 2020-07-28 11:19 ` Christoph Hellwig
  2020-07-29  2:54   ` Chao Leng
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2020-07-28 11:19 UTC (permalink / raw)
  To: Chao Leng; +Cc: kbusch, axboe, hch, linux-nvme, sagi

On Mon, Jul 27, 2020 at 01:58:18PM +0800, Chao Leng wrote:
> The protocol NVM-Express-1.4 define:
> Command Interrupted: Command processing was interrupted and the
> controller is unable to successfully complete the command. The host
> should retry the command. If this status code is returned, then
> the controller shall clear the Do Not Retry bit to ‘0’ in the Status
> field of the CQE (refer to Figure 124). The controller shall not return
> this status code unless the host has set the Advanced Command Retry
> Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
> section 5.21.1.22).
> 
> According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
> The error code NVME_SC_CMD_INTERRUPTED should not translate to
> BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
> dm-multipah will return error to application. So if target return error
> code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
> should translate to BLK_STS_IOERR by default, dm-multipath will fail
> over to other path retry the io.

IOERR still seems wrong, though.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-core: fix io interrupt when work with dm-multipah
  2020-07-28 11:19 ` Christoph Hellwig
@ 2020-07-29  2:54   ` Chao Leng
  2020-07-29  5:59     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Chao Leng @ 2020-07-29  2:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, axboe, sagi, linux-nvme



On 2020/7/28 19:19, Christoph Hellwig wrote:
> On Mon, Jul 27, 2020 at 01:58:18PM +0800, Chao Leng wrote:
>> The protocol NVM-Express-1.4 define:
>> Command Interrupted: Command processing was interrupted and the
>> controller is unable to successfully complete the command. The host
>> should retry the command. If this status code is returned, then
>> the controller shall clear the Do Not Retry bit to ‘0’ in the Status
>> field of the CQE (refer to Figure 124). The controller shall not return
>> this status code unless the host has set the Advanced Command Retry
>> Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
>> section 5.21.1.22).
>>
>> According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
>> The error code NVME_SC_CMD_INTERRUPTED should not translate to
>> BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
>> dm-multipah will return error to application. So if target return error
>> code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
>> should translate to BLK_STS_IOERR by default, dm-multipath will fail
>> over to other path retry the io.
> 
> IOERR still seems wrong, though.
> .

BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
just means target need retry io. It is not suitable to translate
NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
BLK_STS_IOERR is also not suitable, we should translate
NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
We can do like this:
---
  drivers/nvme/host/core.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2c5bc4fb702..359ce471df1d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -211,7 +211,6 @@ static blk_status_t nvme_error_status(u16 status)
  	case NVME_SC_CAP_EXCEEDED:
  		return BLK_STS_NOSPC;
  	case NVME_SC_LBA_RANGE:
-	case NVME_SC_CMD_INTERRUPTED:
  	case NVME_SC_NS_NOT_READY:
  		return BLK_STS_TARGET;
  	case NVME_SC_BAD_ATTRIBUTES:
@@ -236,6 +235,8 @@ static blk_status_t nvme_error_status(u16 status)
  		return BLK_STS_NEXUS;
  	case NVME_SC_HOST_PATH_ERROR:
  		return BLK_STS_TRANSPORT;
+	case NVME_SC_CMD_INTERRUPTED:
+		return BLK_STS_AGAIN;
  	default:
  		return BLK_STS_IOERR;
  	}
-- 
2.16.4

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-core: fix io interrupt when work with dm-multipah
  2020-07-29  2:54   ` Chao Leng
@ 2020-07-29  5:59     ` Christoph Hellwig
  2020-07-30  1:49       ` Chao Leng
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2020-07-29  5:59 UTC (permalink / raw)
  To: Chao Leng; +Cc: kbusch, axboe, Christoph Hellwig, linux-nvme, sagi

On Wed, Jul 29, 2020 at 10:54:29AM +0800, Chao Leng wrote:
>
>
> On 2020/7/28 19:19, Christoph Hellwig wrote:
>> On Mon, Jul 27, 2020 at 01:58:18PM +0800, Chao Leng wrote:
>>> The protocol NVM-Express-1.4 define:
>>> Command Interrupted: Command processing was interrupted and the
>>> controller is unable to successfully complete the command. The host
>>> should retry the command. If this status code is returned, then
>>> the controller shall clear the Do Not Retry bit to ‘0’ in the Status
>>> field of the CQE (refer to Figure 124). The controller shall not return
>>> this status code unless the host has set the Advanced Command Retry
>>> Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
>>> section 5.21.1.22).
>>>
>>> According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
>>> The error code NVME_SC_CMD_INTERRUPTED should not translate to
>>> BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
>>> dm-multipah will return error to application. So if target return error
>>> code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
>>> should translate to BLK_STS_IOERR by default, dm-multipath will fail
>>> over to other path retry the io.
>>
>> IOERR still seems wrong, though.
>> .
>
> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
> just means target need retry io. It is not suitable to translate
> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
> BLK_STS_IOERR is also not suitable, we should translate
> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
> We can do like this:

BLK_STS_AGAIN is a bad choice as we use it for calls that block when
the callers asked for non-blocking submission.  I'm really not sure
we want to change anything here - the error definition clearly states
it is not a failure but a request to retry later.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-core: fix io interrupt when work with dm-multipah
  2020-07-29  5:59     ` Christoph Hellwig
@ 2020-07-30  1:49       ` Chao Leng
  2020-08-05  6:40         ` Chao Leng
  0 siblings, 1 reply; 6+ messages in thread
From: Chao Leng @ 2020-07-30  1:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, axboe, sagi, linux-nvme, John.Meneghini



On 2020/7/29 13:59, Christoph Hellwig wrote:
> On Wed, Jul 29, 2020 at 10:54:29AM +0800, Chao Leng wrote:
>>
>>
>> On 2020/7/28 19:19, Christoph Hellwig wrote:
>>> On Mon, Jul 27, 2020 at 01:58:18PM +0800, Chao Leng wrote:
>>>> The protocol NVM-Express-1.4 define:
>>>> Command Interrupted: Command processing was interrupted and the
>>>> controller is unable to successfully complete the command. The host
>>>> should retry the command. If this status code is returned, then
>>>> the controller shall clear the Do Not Retry bit to ‘0’ in the Status
>>>> field of the CQE (refer to Figure 124). The controller shall not return
>>>> this status code unless the host has set the Advanced Command Retry
>>>> Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
>>>> section 5.21.1.22).
>>>>
>>>> According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
>>>> The error code NVME_SC_CMD_INTERRUPTED should not translate to
>>>> BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
>>>> dm-multipah will return error to application. So if target return error
>>>> code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
>>>> should translate to BLK_STS_IOERR by default, dm-multipath will fail
>>>> over to other path retry the io.
>>>
>>> IOERR still seems wrong, though.
>>> .
>>
>> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
>> just means target need retry io. It is not suitable to translate
>> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
>> BLK_STS_IOERR is also not suitable, we should translate
>> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
>> We can do like this:
> 
> BLK_STS_AGAIN is a bad choice as we use it for calls that block when
> the callers asked for non-blocking submission.  I'm really not sure
> we want to change anything here - the error definition clearly states
> it is not a failure but a request to retry later.
> .
> 
BLK_STS_AGAIN is not a good choice, but BLK_STS_TARGET is also not
a good choice. I find the patch that translate NVME_SC_CMD_INTERRUPTED
to BLK_STS_TARGET.

commit:35038bffa87da282010b91108cadd13238bb5bbd
nvme: Translate more status codes to blk_status_t
Decode interrupted command and not ready namespace nvme status codes to
BLK_STS_TARGET. These are not generic IO errors and should use a non-path
specific error so that it can use the non-failover retry path.
Reported-by: John Meneghini <John.Meneghini@netapp.com>

In the old version, need translate NVME_SC_CMD_INTERRUPTED to
BLK_STS_TARGET, because nvme multipath check the blk_path_error,
we expect retry IO in the current path, so have to translate
NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Although converting to
BLK_STS_TARGET is not a good choice, there seems to be no better choice
for the old version code. But now we do not need translate
NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET, nvme multipath already
improve and does not depend on blk_path_error.

According to the description of BLK_STS_TARGET:
[BLK_STS_TARGET]	= { -EREMOTEIO,	"critical target" }
BLK_STS_TARGET may easily mistaken as a fatal error on the storage.
I already check all the error code of BLK_STS_xx, there is no good
choice for NVME_SC_CMD_INTERRUPTED now. In the absence of a suitable
error code, it is a good choice to retain the default. So I strongly
suggest delete translate NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET.

John, what do you think?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-core: fix io interrupt when work with dm-multipah
  2020-07-30  1:49       ` Chao Leng
@ 2020-08-05  6:40         ` Chao Leng
  0 siblings, 0 replies; 6+ messages in thread
From: Chao Leng @ 2020-08-05  6:40 UTC (permalink / raw)
  Cc: linux-nvme, John.Meneghini

John, Wat do you think about if delete translate NVME_SC_CMD_INTERRUPTED
to BLK_STS_TARGET? Thank you.

On 2020/7/30 9:49, Chao Leng wrote:
> 
> 
> On 2020/7/29 13:59, Christoph Hellwig wrote:
>> On Wed, Jul 29, 2020 at 10:54:29AM +0800, Chao Leng wrote:
>>>
>>>
>>> On 2020/7/28 19:19, Christoph Hellwig wrote:
>>>> On Mon, Jul 27, 2020 at 01:58:18PM +0800, Chao Leng wrote:
>>>>> The protocol NVM-Express-1.4 define:
>>>>> Command Interrupted: Command processing was interrupted and the
>>>>> controller is unable to successfully complete the command. The host
>>>>> should retry the command. If this status code is returned, then
>>>>> the controller shall clear the Do Not Retry bit to ‘0’ in the Status
>>>>> field of the CQE (refer to Figure 124). The controller shall not return
>>>>> this status code unless the host has set the Advanced Command Retry
>>>>> Enable (ACRE) field to 1h in the Host Behavior Support feature(refer to
>>>>> section 5.21.1.22).
>>>>>
>>>>> According the protocol define, NVME_SC_CMD_INTERRUPTED need retry.
>>>>> The error code NVME_SC_CMD_INTERRUPTED should not translate to
>>>>> BLK_STS_TARGET, because if the error code translate to BLK_STS_TARGET,
>>>>> dm-multipah will return error to application. So if target return error
>>>>> code NVME_SC_CMD_INTERRUPTED, io will interrupt. NVME_SC_CMD_INTERRUPTED
>>>>> should translate to BLK_STS_IOERR by default, dm-multipath will fail
>>>>> over to other path retry the io.
>>>>
>>>> IOERR still seems wrong, though.
>>>> .
>>>
>>> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
>>> just means target need retry io. It is not suitable to translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
>>> BLK_STS_IOERR is also not suitable, we should translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
>>> We can do like this:
>>
>> BLK_STS_AGAIN is a bad choice as we use it for calls that block when
>> the callers asked for non-blocking submission.  I'm really not sure
>> we want to change anything here - the error definition clearly states
>> it is not a failure but a request to retry later.
>> .
>>
> BLK_STS_AGAIN is not a good choice, but BLK_STS_TARGET is also not
> a good choice. I find the patch that translate NVME_SC_CMD_INTERRUPTED
> to BLK_STS_TARGET.
> 
> commit:35038bffa87da282010b91108cadd13238bb5bbd
> nvme: Translate more status codes to blk_status_t
> Decode interrupted command and not ready namespace nvme status codes to
> BLK_STS_TARGET. These are not generic IO errors and should use a non-path
> specific error so that it can use the non-failover retry path.
> Reported-by: John Meneghini <John.Meneghini@netapp.com>
> 
> In the old version, need translate NVME_SC_CMD_INTERRUPTED to
> BLK_STS_TARGET, because nvme multipath check the blk_path_error,
> we expect retry IO in the current path, so have to translate
> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Although converting to
> BLK_STS_TARGET is not a good choice, there seems to be no better choice
> for the old version code. But now we do not need translate
> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET, nvme multipath already
> improve and does not depend on blk_path_error.
> 
> According to the description of BLK_STS_TARGET:
> [BLK_STS_TARGET]    = { -EREMOTEIO,    "critical target" }
> BLK_STS_TARGET may easily mistaken as a fatal error on the storage.
> I already check all the error code of BLK_STS_xx, there is no good
> choice for NVME_SC_CMD_INTERRUPTED now. In the absence of a suitable
> error code, it is a good choice to retain the default. So I strongly
> suggest delete translate NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET.
> 
> John, what do you think?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-27  5:58 [PATCH] nvme-core: fix io interrupt when work with dm-multipah Chao Leng
2020-07-28 11:19 ` Christoph Hellwig
2020-07-29  2:54   ` Chao Leng
2020-07-29  5:59     ` Christoph Hellwig
2020-07-30  1:49       ` Chao Leng
2020-08-05  6:40         ` Chao Leng

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git