From: Valeriy Glushkov <valeriy.glushkov at starwind.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Wed, 06 Feb 2019 18:15:03 +0000
Message-ID: <op.zwsya7ahns8zkg@sam8>
In-Reply-To: EA913ED399BBA34AA4EAC2EDC24CDD009C25DFB8@FMSMSX105.amr.corp.intel.com


Hi Seth,

Thank you for the great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is part of the target's debug log showing the last Admin queue requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems there is an issue with the rqpair->current_send_depth counter, as this  
request was the only one in the queue at that moment.
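
For reference, here is roughly the shape of the check I instrumented in the  
RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING case (state 8 in the log  
above). This is only a sketch using the field names from your mail and the  
usual SPDK_DEBUGLOG for the "rdma" component; the helper name is mine and the  
exact 19.01 code differs in details:
=======
/* Sketch: does this request have to keep waiting for send-queue space?
 * The data WRs plus the one completion SEND must fit under max_send_depth.
 * current_send_depth should come back down as send completions are polled,
 * but in the log above it stays at 62 even though this request is the only
 * one pending. */
static bool
nvmf_rdma_send_queue_is_full(struct spdk_nvmf_rdma_qpair *rqpair,
			     struct spdk_nvmf_rdma_request *rdma_req)
{
	if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
	    rqpair->max_send_depth) {
		SPDK_DEBUGLOG(SPDK_LOG_RDMA,
			      "%u+%u WRs outstanding (max_send_depth %u). Waiting until some finish.\n",
			      rqpair->current_send_depth,
			      rdma_req->num_outstanding_data_wr,
			      rqpair->max_send_depth);
		return true;
	}
	return false;
}
=======
With the Admin queue depth at 32 and only this request in flight, I would  
expect current_send_depth to be close to zero here, so it looks like the  
counter is not being decremented on send completions (or is over-incremented  
when WRs are posted).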

I can send you the complete debug log if you need it.

Thank you for your help.

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote on Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.  
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits on work  
> requests that can be posted to the send queue at once. Previously we were  
> limiting both RDMA reads and RDMA writes to the same limit, the one indicated  
> by device_attr.max_qp_rd_atom. That value only governs RDMA READ operations;  
> write operations should instead be governed by the size of the send queue. So  
> we separated the state RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two  
> different states, one for controller-to-host transfers and the other for  
> host-to-controller transfers.
> This was useful in two ways. Firstly, it allowed us to loosen the  
> restrictions on RDMA sends, and secondly it allowed us to make the states of  
> the RDMA request a linear chain that can be followed from start to finish.
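>
> Roughly, the shape of that split looks like this (just a sketch to illustrate  
> the idea, not the exact 19.01 enum or names):
>
> =======
> /* One pending state per transfer direction, each gated by its own limit,
>  * instead of the old single RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING. */
> enum rdma_data_transfer_pending_state {
> 	DATA_TRANSFER_TO_CONTROLLER_PENDING,	/* RDMA READs, capped by max_qp_rd_atom */
> 	DATA_TRANSFER_TO_HOST_PENDING,		/* RDMA WRITEs + completion SEND, capped by send-queue depth */
> };
> =======
>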
> It looks like your I/O is getting stuck in state 8, which is  
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING. I see from your trace that  
> the I/O enters that state at least twice, which is normal. If there is more  
> than one I/O queued in that state, we operate on them in FIFO order. Also, if  
> there are currently too many outstanding send WRs, we wait until we receive  
> some completions on the send queue before continuing. Every time we get a  
> completion, we poll through this list and try to submit more SENDs.
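>
> In other words, every send completion frees a WR slot and re-drives the  
> queued requests, roughly like this (a sketch only; the queue and field names  
> here are made up, not the exact 19.01 code):
>
> =======
> /* Called for each completion polled from the send CQ. */
> static void
> nvmf_rdma_on_send_completion(struct spdk_nvmf_rdma_qpair *rqpair)
> {
> 	struct spdk_nvmf_rdma_request *rdma_req, *tmp;
>
> 	rqpair->current_send_depth--;		/* one WR slot freed */
>
> 	/* Requests waiting in DATA_TRANSFER_TO_HOST_PENDING retry in FIFO order. */
> 	TAILQ_FOREACH_SAFE(rdma_req, &rqpair->pending_to_host_queue, state_link, tmp) {
> 		if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
> 		    rqpair->max_send_depth) {
> 			break;			/* still no room; later requests wait too */
> 		}
> 		TAILQ_REMOVE(&rqpair->pending_to_host_queue, rdma_req, state_link);
> 		/* ...post the RDMA WRITE WRs and the completion SEND here... */
> 		rqpair->current_send_depth += rdma_req->num_outstanding_data_wr + 1;
> 	}
> }
> =======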
>
> I have a few follow-up questions about your configuration and some  
> information we will need in order to help:
>
> 1. What is the queue depth of the I/O you are submitting to the target?
> 2. Are you running a 100% NVMe-oF read workload, or a mixed read/write  
> workload?
> 3. Your trace shows the I/O entering state 8 twice; does this continue  
> forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if  
> statement inside the RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING state?  
> I want to see why we are staying in that state. In those debug prints, it  
> would be useful if you could include the following information:  
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr,  
> rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices and  
> connections do you have when this hits?
>
> This information will help us better understand why your I/Os aren't making  
> forward progress. I am really interested to see what is going on here, since  
> we don't see this issue with either the kernel or the SPDK initiator, and the  
> changes we made should have loosened the requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
>
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com> wrote:
>
>     Hi Jim,
>     Our module is an implementation of the NVMeoF/RDMA host.
>     It works well with SPDK 18.10.1, so the problem seems to be related to the  
>     SPDK 19.01 code.
>     I can see that the RDMA request's state engine has been changed in the  
>     recent SPDK release.
>     So it would be great if the author of the modifications could take a look  
>     at the issue...
>     Thank you for your help!
>
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is  
> doing to induce this behavior?  Our tests with the Linux kernel host  
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim
>
>


	


