From: Howell, Seth <seth.howell@intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Wed, 06 Feb 2019 23:03:06 +0000
Message-ID: <EA913ED399BBA34AA4EAC2EDC24CDD009C25E29E@FMSMSX105.amr.corp.intel.com>
In-Reply-To: <EA913ED399BBA34AA4EAC2EDC24CDD009C25E13B@FMSMSX105.amr.corp.intel.com>


Hi Valeriy,

Thanks for sending the full log. At the surface, everything looks how I would expect it to. I am going to try to replicate your I/O patterns here locally to see if I can repro this with another initiator. In the meantime, can you add debug statements to all of the places where we increment and decrement the current_send_depth value and run the test again?
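Something along these lines is what I have in mind (a rough sketch only; the helper names are hypothetical, and the SPDK_DEBUGLOG flag and rqpair fields are my assumptions about the 19.01 rdma.c, so adjust to whatever the surrounding code actually uses):

/* Rough sketch only: helper names are hypothetical, and the log flag and
 * struct fields are assumptions about the 19.01 code, not an exact patch. */
static void
nvmf_rdma_send_depth_incr(struct spdk_nvmf_rdma_qpair *rqpair, uint32_t count)
{
        rqpair->current_send_depth += count;
        SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: send_depth +%u -> %u\n",
                      rqpair, count, rqpair->current_send_depth);
}

static void
nvmf_rdma_send_depth_decr(struct spdk_nvmf_rdma_qpair *rqpair, uint32_t count)
{
        assert(rqpair->current_send_depth >= count);
        rqpair->current_send_depth -= count;
        SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: send_depth -%u -> %u\n",
                      rqpair, count, rqpair->current_send_depth);
}

Routing every increment and decrement through a pair of helpers like this also makes it much harder to miss a site.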

Also, does it always fail on a get_log_page command?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
Sent: Wednesday, February 6, 2019 1:28 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Thanks Valeriy,

I agree with you that there is something going wrong with the current_send_depth value.
My first thought with something like this is usually that we are doing our math wrong and underflowing. But the value being reported is 62, which implies that we have simply submitted 62 send WRs without reaping any completions from the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe we are blocking the decrement in some way.

In order for a blockage like this to occur, one of two things has to be happening:
1. Completions for the SENDs are not being posted to the completion queue. (This is unlikely, since we add the IBV_SEND_SIGNALED flag to all of the work requests; the spec says they will definitely be completed back to the completion queue in that case.)
2. We aren't properly decrementing current_send_depth when we get a completion. (Also very unlikely. You can inspect the code in spdk_nvmf_rdma_poller_poll and see that for every completion classified as a send we decrement current_send_depth in both the standard and error paths.)
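To make those two paths concrete, here is the bare ibverbs pattern involved (a standalone illustration, not SPDK code; the global counter is just a stand-in for rqpair->current_send_depth):

/* Standalone illustration of the mechanism, not SPDK code. */
#include <infiniband/verbs.h>
#include <stdint.h>

static uint32_t current_send_depth; /* stand-in for the per-qpair counter */

static int
post_signaled_send(struct ibv_qp *qp, struct ibv_sge *sge)
{
        struct ibv_send_wr wr = {
                .wr_id      = 1,                 /* real code points this at the request */
                .sg_list    = sge,
                .num_sge    = 1,
                .opcode     = IBV_WR_SEND,
                .send_flags = IBV_SEND_SIGNALED, /* guarantees a CQE for this WR */
        };
        struct ibv_send_wr *bad_wr = NULL;
        int rc = ibv_post_send(qp, &wr, &bad_wr);

        if (rc == 0) {
                current_send_depth++;            /* incremented when the WR is posted */
        }
        return rc;
}

static void
drain_send_completions(struct ibv_cq *cq)
{
        struct ibv_wc wc[32];
        int i, n = ibv_poll_cq(cq, 32, wc);

        for (i = 0; i < n; i++) {
                /* Only sends are posted in this sketch, so every CQE is a send.
                 * Real code classifies by wr_id (wc.opcode is undefined on error
                 * completions) and decrements on success and error alike. */
                current_send_depth--;
        }
}

Because every WR carries IBV_SEND_SIGNALED, the only way the counter can stay pinned at 62 is if those CQEs never arrive or the poller never sees them.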

Can you please send me the full debug log? I want to look at all of the commands previous to the failed one to see if they are returning to state 0, which would imply that we are getting the send completions back.
Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.

Just a few more questions to establish a baseline:
1. What kind of NIC are you using?
2. Have you made any changes to the target code on top of 19.01, i.e. applied any custom patches?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Wednesday, February 6, 2019 11:15 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

Thank you for the great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth counter, as this request was the only one in the queue at that moment.
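(For context: the "Waiting until some finish" message above fires from a capacity check of roughly this shape in the 19.01 rdma.c. This is a paraphrase built from the field names quoted in this thread, not the verbatim source:)

/* Paraphrased sketch of the send-depth gate in state 8
 * (RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING); not verbatim source. */
if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
    rqpair->max_send_depth) {
        /* With 62 WRs counted as outstanding, one data WR plus the +1 for the
         * response SEND would exceed the limit of 63, so the request stays in
         * state 8 until send completions bring current_send_depth back down. */
        break;
}

If current_send_depth never decreases, this branch is taken on every poller pass, which matches the repeated state-8 entries in the log above.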

I can send you the complete debug log if you need it.

Thank you for your help.

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote on Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.  
> Here is a little background on those changes.
> Those changes were part of an effort to more accurately track the
> limits on work requests that can be posted to the send queue at once.
> Previously we were holding both RDMA READs and RDMA WRITEs to the same
> limit, the one indicated by device_attr.max_qp_rd_atom. That value only
> governs RDMA READ operations; WRITE operations should instead be
> governed by the size of the send queue. So we separated the state
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
> one for controller-to-host transfers and the other for host-to-controller
> transfers.
> This was useful in two ways. First, it allowed us to loosen the
> restrictions on RDMA sends, and second, it allowed us to make the states
> of the RDMA request a linear chain that can be followed from start to
> finish.
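(An illustration of the two independent limits being separated here: outstanding RDMA READs are bounded by the device's read/atomic limit, while SENDs and RDMA WRITEs are bounded only by the send-queue size chosen at QP creation. Plain librdmacm/ibverbs, not the SPDK code; names and numbers are illustrative:)

/* Illustration of the two independent limits; not SPDK code. */
#include <infiniband/verbs.h>
#include <rdma/rdma_cma.h>

static int
create_sized_qp(struct rdma_cm_id *cm_id, struct ibv_pd *pd, struct ibv_cq *cq)
{
        struct ibv_device_attr dev_attr;
        struct ibv_qp_init_attr qp_attr = {0};
        uint32_t max_outstanding_reads;

        if (ibv_query_device(cm_id->verbs, &dev_attr)) {
                return -1;
        }

        /* Outstanding RDMA READs are capped by the device's read/atomic limit
         * (and whatever is negotiated with the peer)... */
        max_outstanding_reads = dev_attr.max_qp_rd_atom;
        (void)max_outstanding_reads; /* tracked separately from send depth */

        /* ...while SENDs and RDMA WRITEs are bounded only by the send queue
         * size we ask for here. */
        qp_attr.qp_type          = IBV_QPT_RC;
        qp_attr.send_cq          = cq;
        qp_attr.recv_cq          = cq;
        qp_attr.cap.max_send_wr  = 128; /* illustrative */
        qp_attr.cap.max_recv_wr  = 128;
        qp_attr.cap.max_send_sge = 1;
        qp_attr.cap.max_recv_sge = 1;

        return rdma_create_qp(cm_id, pd, &qp_attr);
}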
> It looks like your I/O is getting stuck in state 8, i.e.
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING. I see from your
> trace that the I/O enters that state at least twice, which is normal.
> If there is more than one I/O queued in that state, we operate on them
> in FIFO order. Also, if there are currently too many outstanding send
> WRs, we wait until we receive some completions on the send queue before
> continuing. Every time we get a completion, we poll through this list
> to try to submit more SENDs.
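(The waiting behavior described above, sketched in illustrative C; the queue and field names are invented and this is not the actual SPDK source:)

/* Illustrative only: how requests waiting in state 8 get retried. */
#include <sys/queue.h>
#include <stdint.h>

struct pending_req {
        uint32_t num_outstanding_data_wr;
        STAILQ_ENTRY(pending_req) link;
};

struct qpair_sketch {
        uint32_t current_send_depth;
        uint32_t max_send_depth;
        STAILQ_HEAD(, pending_req) pending_transfer; /* FIFO of requests in state 8 */
};

/* Called from the poller every time a send completion is reaped. */
static void
retry_pending_transfers(struct qpair_sketch *qp)
{
        struct pending_req *req;

        while ((req = STAILQ_FIRST(&qp->pending_transfer)) != NULL) {
                /* Same capacity check as before: data WRs plus one response SEND. */
                if (qp->current_send_depth + req->num_outstanding_data_wr + 1 >
                    qp->max_send_depth) {
                        break; /* head of the FIFO still doesn't fit; try again later */
                }
                STAILQ_REMOVE_HEAD(&qp->pending_transfer, link);
                qp->current_send_depth += req->num_outstanding_data_wr + 1;
                /* ...post the RDMA WRITE(s) and the response SEND here... */
        }
}

The important property is that the head of the FIFO is retried first each time a send completion frees capacity.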
>
> I have a few follow up questions about your configuration and 
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed 
> read/write workload?
> 3. Your trace shows the I/O entering state 8 twice; does this trend
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if 
> statement inside of state 
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we 
> are staying in that state. In those debug prints, it would be useful 
> if you could include the following information:
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices 
> and connections do you have when this hits?
>
> This information will help us better understand why your I/O isn't
> making forward progress. I am really interested to see what is going
> on here, since we don't see this issue with either the kernel or SPDK
> initiators, and the changes we made should have loosened the
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, 
> James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
>
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>
> wrote:
>
>     Hi Jim,
>     Our module is an implementation of the NVMeoF/RDMA host.
>     It works well with SPDK 18.10.1, so the problem seems to be related
>     to the SPDK 19.01 code.
>     I can see that the RDMA request's state engine has been changed in
>     the recent SPDK release.
>     So it would be great if the author of the modifications could take
>     a look at the issue...
>     Thank you for your help!
>
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is 
> doing to induce this behavior?  Our tests with the Linux kernel host 
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim
>
>

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

