linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme-tcp: Do not reset transport on data digest errors
@ 2021-08-05 12:15 Daniel Wagner
  2021-08-06 19:42 ` Sagi Grimberg
  2021-08-20 10:20 ` Hannes Reinecke
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Wagner @ 2021-08-05 12:15 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Sagi Grimberg, Hannes Reinecke, yi.he, Daniel Wagner

The spec says

  7.4.6.1 Digest Error handling

  When a host detects a data digest error in a C2HData PDU, that host
  shall continue processing C2HData PDUs associated with the command and
  when the command processing has completed, if a successful status was
  returned by the controller, the host shall fail the command with a
  non-fatal transport error.

Currently the transport is reseted when a data digest error is
detected. To fix this, keep track of the final status in the queue
object and use it when completing the request.

The new member can be placed adjacent to the receive related members and
fits in the cacheline as there is a 4 byte hole.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

Hi,

I've tested this by modifying the receive path. Via the fault_inject
interface I injecting wrong hash values. The request would then be
completed with status != 0 and nvme_decide_disposition decices to
retry the request. So this seems be in more sync with what the spec
says on this topic.

Daniel

 drivers/nvme/host/tcp.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 097f7dd10ed3..5253147df4c7 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -89,6 +89,7 @@ struct nvme_tcp_queue {
 	size_t			data_remaining;
 	size_t			ddgst_remaining;
 	unsigned int		nr_cqe;
+	u16			status;
 
 	/* send state */
 	struct nvme_tcp_request *request;
@@ -496,7 +497,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 		return -EINVAL;
 	}
 
-	if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
+	if (!nvme_try_complete_req(rq, queue->status ?
+			queue->status : cqe->status, cqe->result))
 		nvme_complete_rq(rq);
 	queue->nr_cqe++;
 
@@ -676,6 +678,7 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 
 	switch (hdr->type) {
 	case nvme_tcp_c2h_data:
+		queue->status = NVME_SC_SUCCESS;
 		return nvme_tcp_handle_c2h_data(queue, (void *)queue->pdu);
 	case nvme_tcp_rsp:
 		nvme_tcp_init_recv_ctx(queue);
@@ -758,7 +761,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 			queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
 		} else {
 			if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
-				nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
+				nvme_tcp_end_request(rq, queue->status);
 				queue->nr_cqe++;
 			}
 			nvme_tcp_init_recv_ctx(queue);
@@ -792,14 +795,14 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
 			"data digest error: recv %#x expected %#x\n",
 			le32_to_cpu(queue->recv_ddgst),
 			le32_to_cpu(queue->exp_ddgst));
-		return -EIO;
+		queue->status = NVME_SC_DATA_XFER_ERROR;
 	}
 
 	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
 		struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
 					pdu->command_id);
 
-		nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
+		nvme_tcp_end_request(rq, queue->status);
 		queue->nr_cqe++;
 	}
 
-- 
2.29.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-05 12:15 [PATCH] nvme-tcp: Do not reset transport on data digest errors Daniel Wagner
@ 2021-08-06 19:42 ` Sagi Grimberg
  2021-08-09  9:09   ` Daniel Wagner
  2021-08-20 10:20 ` Hannes Reinecke
  1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-08-06 19:42 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme; +Cc: linux-kernel, Hannes Reinecke, yi.he


> The spec says
> 
>    7.4.6.1 Digest Error handling
> 
>    When a host detects a data digest error in a C2HData PDU, that host
>    shall continue processing C2HData PDUs associated with the command and
>    when the command processing has completed, if a successful status was
>    returned by the controller, the host shall fail the command with a
>    non-fatal transport error.
> 
> Currently the transport is reseted when a data digest error is
> detected. To fix this, keep track of the final status in the queue
> object and use it when completing the request.
> 
> The new member can be placed adjacent to the receive related members and
> fits in the cacheline as there is a 4 byte hole.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> 
> Hi,
> 
> I've tested this by modifying the receive path. Via the fault_inject
> interface I injecting wrong hash values. The request would then be
> completed with status != 0 and nvme_decide_disposition decices to
> retry the request. So this seems be in more sync with what the spec
> says on this topic.
> 
> Daniel
> 
>   drivers/nvme/host/tcp.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 097f7dd10ed3..5253147df4c7 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -89,6 +89,7 @@ struct nvme_tcp_queue {
>   	size_t			data_remaining;
>   	size_t			ddgst_remaining;
>   	unsigned int		nr_cqe;
> +	u16			status;

Why is this a queue member and not a request member?

>   
>   	/* send state */
>   	struct nvme_tcp_request *request;
> @@ -496,7 +497,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
>   		return -EINVAL;
>   	}
>   
> -	if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
> +	if (!nvme_try_complete_req(rq, queue->status ?
> +			queue->status : cqe->status, cqe->result))
>   		nvme_complete_rq(rq);
>   	queue->nr_cqe++;
>   
> @@ -676,6 +678,7 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>   
>   	switch (hdr->type) {
>   	case nvme_tcp_c2h_data:
> +		queue->status = NVME_SC_SUCCESS;
>   		return nvme_tcp_handle_c2h_data(queue, (void *)queue->pdu);
>   	case nvme_tcp_rsp:
>   		nvme_tcp_init_recv_ctx(queue);
> @@ -758,7 +761,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>   			queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
>   		} else {
>   			if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
> -				nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
> +				nvme_tcp_end_request(rq, queue->status);
>   				queue->nr_cqe++;
>   			}
>   			nvme_tcp_init_recv_ctx(queue);
> @@ -792,14 +795,14 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
>   			"data digest error: recv %#x expected %#x\n",
>   			le32_to_cpu(queue->recv_ddgst),
>   			le32_to_cpu(queue->exp_ddgst));
> -		return -EIO;
> +		queue->status = NVME_SC_DATA_XFER_ERROR;
>   	}
>   
>   	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
>   		struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
>   					pdu->command_id);
>   
> -		nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
> +		nvme_tcp_end_request(rq, queue->status);
>   		queue->nr_cqe++;
>   	}
>   
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-06 19:42 ` Sagi Grimberg
@ 2021-08-09  9:09   ` Daniel Wagner
  2021-08-11  1:02     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2021-08-09  9:09 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, yi.he

Hi Sagi,

On Fri, Aug 06, 2021 at 12:42:00PM -0700, Sagi Grimberg wrote:
> > @@ -89,6 +89,7 @@ struct nvme_tcp_queue {
> >   	size_t			data_remaining;
> >   	size_t			ddgst_remaining;
> >   	unsigned int		nr_cqe;
> > +	u16			status;
> 
> Why is this a queue member and not a request member?

I was not sure if the TCP transport specific error handling should
impact all other transport (size of struct request). Also I tried to
avoid accessing cachelines which are not already in use. Except this I
don't see there should be no problem to put this to struct request.

Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-09  9:09   ` Daniel Wagner
@ 2021-08-11  1:02     ` Sagi Grimberg
  2021-08-11 10:31       ` Daniel Wagner
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-08-11  1:02 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, yi.he


> Hi Sagi,
> 
> On Fri, Aug 06, 2021 at 12:42:00PM -0700, Sagi Grimberg wrote:
>>> @@ -89,6 +89,7 @@ struct nvme_tcp_queue {
>>>    	size_t			data_remaining;
>>>    	size_t			ddgst_remaining;
>>>    	unsigned int		nr_cqe;
>>> +	u16			status;
>>
>> Why is this a queue member and not a request member?
> 
> I was not sure if the TCP transport specific error handling should
> impact all other transport (size of struct request). Also I tried to
> avoid accessing cachelines which are not already in use. Except this I
> don't see there should be no problem to put this to struct request.

It is the correct place, lets see that it doesn't increase the struct.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-11  1:02     ` Sagi Grimberg
@ 2021-08-11 10:31       ` Daniel Wagner
  2021-08-18 12:44         ` Daniel Wagner
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2021-08-11 10:31 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, yi.he

On Tue, Aug 10, 2021 at 06:02:36PM -0700, Sagi Grimberg wrote:
> 
> > Hi Sagi,
> > 
> > On Fri, Aug 06, 2021 at 12:42:00PM -0700, Sagi Grimberg wrote:
> > > > @@ -89,6 +89,7 @@ struct nvme_tcp_queue {
> > > >    	size_t			data_remaining;
> > > >    	size_t			ddgst_remaining;
> > > >    	unsigned int		nr_cqe;
> > > > +	u16			status;
> > > 
> > > Why is this a queue member and not a request member?
> > 
> > I was not sure if the TCP transport specific error handling should
> > impact all other transport (size of struct request). Also I tried to
> > avoid accessing cachelines which are not already in use. Except this I
> > don't see there should be no problem to put this to struct request.
> 
> It is the correct place, lets see that it doesn't increase the struct.

It should not according pahole:

        /* XXX 7 bytes hole, try to pack */

        void *                     pdu;                  /*   144     8 */
        int                        pdu_remaining;        /*   152     4 */
        int                        pdu_offset;           /*   156     4 */
        size_t                     data_remaining;       /*   160     8 */
        size_t                     ddgst_remaining;      /*   168     8 */
        unsigned int               nr_cqe;               /*   176     4 */
        u16                        status;               /*   180     2 */

        /* XXX 2 bytes hole, try to pack */

Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-11 10:31       ` Daniel Wagner
@ 2021-08-18 12:44         ` Daniel Wagner
  2021-08-23 17:13           ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2021-08-18 12:44 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, yi.he

On Wed, Aug 11, 2021 at 12:31:46PM +0200, Daniel Wagner wrote:
> > It is the correct place, lets see that it doesn't increase the struct.
> 
> It should not according pahole:

Anything I need to do to get this patch accepted?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-05 12:15 [PATCH] nvme-tcp: Do not reset transport on data digest errors Daniel Wagner
  2021-08-06 19:42 ` Sagi Grimberg
@ 2021-08-20 10:20 ` Hannes Reinecke
  1 sibling, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2021-08-20 10:20 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme; +Cc: linux-kernel, Sagi Grimberg, yi.he

On 8/5/21 2:15 PM, Daniel Wagner wrote:
> The spec says
> 
>    7.4.6.1 Digest Error handling
> 
>    When a host detects a data digest error in a C2HData PDU, that host
>    shall continue processing C2HData PDUs associated with the command and
>    when the command processing has completed, if a successful status was
>    returned by the controller, the host shall fail the command with a
>    non-fatal transport error.
> 
> Currently the transport is reseted when a data digest error is
> detected. To fix this, keep track of the final status in the queue
> object and use it when completing the request.
> 
> The new member can be placed adjacent to the receive related members and
> fits in the cacheline as there is a 4 byte hole.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
  2021-08-18 12:44         ` Daniel Wagner
@ 2021-08-23 17:13           ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2021-08-23 17:13 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, yi.he


>>> It is the correct place, lets see that it doesn't increase the struct.
>>
>> It should not according pahole:
> 
> Anything I need to do to get this patch accepted?

Yes, move the status to nvme_tcp_request, I think its
a more appropriate place for it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-08-23 17:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-05 12:15 [PATCH] nvme-tcp: Do not reset transport on data digest errors Daniel Wagner
2021-08-06 19:42 ` Sagi Grimberg
2021-08-09  9:09   ` Daniel Wagner
2021-08-11  1:02     ` Sagi Grimberg
2021-08-11 10:31       ` Daniel Wagner
2021-08-18 12:44         ` Daniel Wagner
2021-08-23 17:13           ` Sagi Grimberg
2021-08-20 10:20 ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).