From: Valeriy Glushkov
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Thu, 07 Feb 2019 04:50:17 +0000
In-Reply-To: EA913ED399BBA34AA4EAC2EDC24CDD009C25E29E@FMSMSX105.amr.corp.intel.com
To: spdk@lists.01.org

Hi Seth,

It would be great if you could send me the modified source with the changes you need, or a patch. I will add it to the SPDK build, reproduce the case and send you the resulting debug log.

The Get log page request itself seems to be fine, as there are many such requests earlier in the log...

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth wrote in his message of Thu, 07 Feb 2019 01:03:06 +0200:

> Hi Valeriy,
>
> Thanks for sending the full log. At the surface, everything looks how I
> would expect it to. I am going to try to replicate your I/O patterns
> here locally to see if I can repro this with another initiator. In the
> meantime, can you add debug statements to all of the places where we
> increment and decrement the current_send_depth value and run the test
> again?
>
> Also, does it always fail on a get_log_page command?
>
> Thanks,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
> Sent: Wednesday, February 6, 2019 1:28 PM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Thanks Valeriy,
>
> I agree with you that there is something going wrong with the
> current_send_depth value.
> My first thought with something like this is usually that we are doing
> our math wrong and we are underflowing. But the value being reported is
> 62, which implies that we have simply submitted 62 send_wrs without
> reaping any on the completion queue. Underflow would probably result in
> a number closer to 2^32-1. This leads me to believe we are blocking the
> decrement in some way.
>
> In order for a blockage like this to occur, one of two things has to be
> happening:
> 1. The SENDs are not properly being posted to the completion queue.
> (This is unlikely, since we add the flag IBV_SEND_SIGNALED to all of the
> work requests. The spec says they will definitely get completed back to
> the completion queue in that case.)
> 2. We aren't properly decrementing the current_send_depth when we get a
> completion. (Also very unlikely. You can inspect the code in
> spdk_nvmf_rdma_poller_poll to see that for every completion that is
> classified as a send we decrement current_send_depth in both the
> standard and error paths.)
>
> Can you please send me the full debug log? I want to look at all of the
> commands previous to the failed one to see if they are returning to
> state 0, which would imply that we are getting the send completions back.
> Something else that would be really helpful is if you printed out the
> value of current_send_depth each time we increment or decrement it. That
> would help me see exactly what's missing here.
>
> Just a few more questions to establish a baseline:
> 1. What kind of a NIC are you using?
> 2. Have you made any changes to the target code on top of 19.01, i.e.
> applied any custom patches?
>
> Thanks,
>
> Seth
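
To make the two failure modes above concrete, a minimal self-contained model of the send accounting might look like the sketch below. The names (fake_qpair, post_send, send_completed) are invented for illustration; this is not the SPDK source. It shows why an underflowed unsigned counter would read near 2^32-1, while a counter whose completions are never reaped simply climbs toward the send queue limit and stalls, which matches the stuck-at-62 symptom.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the per-connection send accounting (not SPDK code). */
struct fake_qpair {
	uint32_t current_send_depth; /* send WRs posted but not yet completed */
	uint32_t max_send_depth;     /* capacity of the send queue */
};

/* Mirrors the places where current_send_depth is incremented. */
static void post_send(struct fake_qpair *qp)
{
	qp->current_send_depth++;
	printf("post_send:  current_send_depth=%u\n", qp->current_send_depth);
}

/* Mirrors the places where a signaled send completion decrements it. */
static void send_completed(struct fake_qpair *qp)
{
	qp->current_send_depth--;
	printf("completion: current_send_depth=%u\n", qp->current_send_depth);
}

int main(void)
{
	struct fake_qpair qp = { .current_send_depth = 0, .max_send_depth = 63 };

	/* Case 1: one extra decrement underflows the unsigned counter, so it
	 * would be reported as a value near 2^32-1 (4294967295), not 62. */
	post_send(&qp);
	send_completed(&qp);
	send_completed(&qp);

	/* Case 2: sends are posted but their completions are never reaped, so
	 * the counter climbs until it reaches the send queue limit and every
	 * further send has to wait. */
	qp.current_send_depth = 0;
	for (int i = 0; i < 62; i++) {
		post_send(&qp);
	}
	printf("blocked: %u of %u send WRs counted outstanding\n",
	       qp.current_send_depth, qp.max_send_depth);
	return 0;
}
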
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
> Sent: Wednesday, February 6, 2019 11:15 AM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi Seth,
>
> thank you for your great explanation of the changes you have done!
>
> I added some debug prints to the target code as you requested.
>
> Here is a part of the target's debug log with the last Admin queue requests:
> =======
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
> rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
> rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
> ...
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
> rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
> ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
> =======
>
> The Admin queue depth = 32.
>
> It seems that there is an issue with the rqpair->current_send_depth
> counter, as the request was the only one in the queue at the moment.
>
> I could send you the complete debug log, if you need.
>
> Thank you for your help.
>
> --
> Best regards,
> Valeriy Glushkov
> www.starwind.com
> valeriy.glushkov(a)starwind.com
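
For context on the "62+1 WRs outstanding (max_send_depth 63)" line, below is a rough sketch of the kind of threshold check that message points at. The field names (current_send_depth, num_outstanding_data_wr, max_send_depth) follow the ones named in this thread, but the structs and the exact condition are approximations for illustration, not the actual rdma.c code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Approximate stand-ins for the structures named in the thread; not rdma.c. */
struct fake_rdma_qpair {
	uint32_t current_send_depth;      /* send WRs posted, completions not yet reaped */
	uint32_t max_send_depth;          /* send queue capacity for this connection */
};

struct fake_rdma_request {
	uint32_t num_outstanding_data_wr; /* data WRs this request still needs to post */
};

/* The request may post its data WRs plus one completion SEND only if they all
 * fit under the send queue limit; the exact condition in 19.01 may differ. */
static bool can_post_sends(const struct fake_rdma_qpair *qp,
			   const struct fake_rdma_request *req)
{
	return qp->current_send_depth + req->num_outstanding_data_wr + 1 <=
	       qp->max_send_depth;
}

int main(void)
{
	/* Numbers from the log above: 62 sends already counted as outstanding on
	 * a 63-entry send queue, and a small Get Log Page response that needs
	 * one data WR plus its completion SEND. */
	struct fake_rdma_qpair qp = { .current_send_depth = 62, .max_send_depth = 63 };
	struct fake_rdma_request req = { .num_outstanding_data_wr = 1 };

	if (!can_post_sends(&qp, &req)) {
		printf("send queue full: %u of %u WRs counted outstanding, "
		       "request needs %u more; waiting for completions\n",
		       qp.current_send_depth, qp.max_send_depth,
		       req.num_outstanding_data_wr + 1);
	}
	return 0;
}

With an otherwise idle admin queue, a counter stuck at 62 keeps this condition false forever, so the request keeps re-entering the same state, as the log shows.
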
> Howell, Seth wrote in his message of Wed, 06 Feb 2019 18:03:49 +0200:
>
>> Hi Valeriy,
>>
>> I made the changes that you mentioned to the RDMA request state machine.
>> Here is a little background on those changes.
>> They were part of an effort to better approach the limits on how many
>> work requests can be posted at once to the send queue. Previously we
>> were limiting both RDMA reads and RDMA writes to the same limit, the
>> one indicated by device_attr.qp_rd_atom_max. This value only controls
>> RDMA READ operations, while write operations should be governed by the
>> size of the send queue. So we separated the state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
>> one for controller-to-host transfers and the other for host-to-controller
>> transfers.
>> This was useful in two ways. Firstly, it allowed us to loosen
>> restrictions on RDMA sends, and secondly it allowed us to make the
>> states of the RDMA request a linear chain that can be followed through
>> from start to finish.
>> It looks like your I/O is getting stuck in state 8, or
>> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from
>> your stacktrace that the I/O enters that state at least twice, which
>> is normal. If there is more than one I/O queued in that state, we
>> operate on them in a FIFO manner. Also, if there are currently too
>> many outstanding send WRs, we wait until we receive some completions
>> on the send queue to continue. Every time we get a completion, we will
>> poll through this list to try to submit more SENDs.
>>
>> I have a few follow-up questions about your configuration and the
>> information we will need to be able to help:
>>
>> 1. What is the queue depth of I/O you are submitting to the target?
>> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed
>> read/write workload?
>> 3. Your stacktrace shows the I/O enter state 8 twice; does this trend
>> continue forever until you get a timeout on the I/O?
>> 4. Can you add two extra debug prints to the target, one to each if
>> statement inside of state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we
>> are staying in that state. In those debug prints, it would be useful
>> if you could include the following information:
>> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr,
>> rqpair->max_send_depth.
>>
>> Some other information that may be useful:
>> How many cores are you running the SPDK target on? How many devices
>> and connections do you have when this hits?
>>
>> This information will help us understand better why your I/O aren't
>> making forward progress. I am really interested to see what is going
>> on here, since we don't see this issue with either the kernel or SPDK
>> initiators, and the changes we made should have loosened up the
>> requirements for sending I/O.
>>
>> Thanks,
>>
>> Seth Howell
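
As a rough illustration of the split Seth describes, the sketch below separates the two pending directions and gates them on different resources. The state names echo the ones mentioned above, but the enum, structs and checks are simplified stand-ins, not the SPDK 19.01 implementation (for instance, the real code also has to account RDMA READ WRs against the send queue).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the two pending states discussed above; the real
 * request state enum in rdma.c contains many more states. */
enum fake_request_state {
	FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, /* host -> controller: RDMA READ */
	FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING,       /* controller -> host: RDMA WRITE */
};

struct fake_qpair {
	uint32_t current_read_depth; /* outstanding RDMA READs */
	uint32_t max_read_depth;     /* the device read/atomic limit referenced above */
	uint32_t current_send_depth; /* outstanding send-queue WRs */
	uint32_t max_send_depth;     /* send queue size */
};

/* After the split, each direction is gated on its own resource. */
static bool can_start_transfer(const struct fake_qpair *qp,
			       enum fake_request_state state,
			       uint32_t num_data_wr)
{
	switch (state) {
	case FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING:
		return qp->current_read_depth + num_data_wr <= qp->max_read_depth;
	case FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING:
		/* data WRs plus the completion SEND must fit in the send queue */
		return qp->current_send_depth + num_data_wr + 1 <= qp->max_send_depth;
	}
	return false;
}

int main(void)
{
	struct fake_qpair qp = {
		.current_read_depth = 0,  .max_read_depth = 16,
		.current_send_depth = 62, .max_send_depth = 63,
	};

	/* A host-to-controller transfer still has read-depth headroom here... */
	printf("to-controller allowed: %d\n",
	       can_start_transfer(&qp, FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, 1));
	/* ...while a controller-to-host transfer waits on the (apparently stuck)
	 * send-depth accounting. */
	printf("to-host allowed:       %d\n",
	       can_start_transfer(&qp, FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING, 1));
	return 0;
}
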
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
>> Sent: Wednesday, February 6, 2019 7:55 AM
>> To: Storage Performance Development Kit
>> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>>
>> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" wrote:
>>
>>     Hi Jim,
>>
>>     Our module is an implementation of the NVMeoF/RDMA host.
>>     It works well with SPDK 18.10.1, so the problem seems to be related
>>     to the SPDK 19.01 code.
>>     I can see that the RDMA request's state engine has been changed in
>>     the recent SPDK release.
>>     So it would be great if the author of the modifications could take
>>     a look at the issue...
>>     Thank you for your help!
>>
>> Hi Valeriy,
>>
>> Can you provide detailed information about what your host module is
>> doing to induce this behavior? Our tests with the Linux kernel host
>> driver and the SPDK host driver do not seem to be hitting this problem.
>>
>> Thanks,
>>
>> -Jim