From: Harris, James R
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Wed, 06 Feb 2019 23:06:48 +0000
To: spdk@lists.01.org

Could you also (as an experiment) try reading the full 512 bytes of the SMART/Health Information log page from your host driver? Currently it looks like it's trying to read just the first 124 bytes.

On 2/6/19, 4:03 PM, "SPDK on behalf of Howell, Seth" wrote:

Hi Valeriy,

Thanks for sending the full log. At the surface, everything looks how I would expect it to. I am going to try to replicate your I/O patterns here locally to see if I can repro this with another initiator. In the meantime, can you add debug statements to all of the places where we increment and decrement the current_send_depth value and run the test again?

Also, does it always fail on a get_log_page command?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
Sent: Wednesday, February 6, 2019 1:28 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Thanks Valeriy,

I agree with you that there is something going wrong with the current_send_depth value.

My first thought with something like this is usually that we are doing our math wrong and underflowing. But the value being reported is 62, which implies that we have simply submitted 62 send WRs without reaping any on the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe we are blocking the decrement in some way.

In order for a blockage like this to occur, one of two things has to be happening:
1. The SENDs are not properly being posted to the completion queue. (This is unlikely, since we add the IBV_SEND_SIGNALED flag to all of the work requests. The spec says they will definitely get completed back to the completion queue in that case.)
2. We aren't properly decrementing current_send_depth when we get a completion. (Also very unlikely. You can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every completion classified as a send we decrement current_send_depth in both the standard and error paths.)

Can you please send me the full debug log? I want to look at all of the commands previous to the failed one to see if they are returning to state 0, which would imply that we are getting the send completions back.

Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.

Just a few more questions to establish a baseline:
1. What kind of NIC are you using?
2. Have you made any changes to the target code on top of 19.01, i.e. applied any custom patches?

Thanks,

Seth
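As a rough illustration of the instrumentation being asked for here, the debug prints could look something like the lines below. This is only a sketch against the 19.01 lib/nvmf/rdma.c code: the SPDK_DEBUGLOG calls are the additions, the surrounding statements stand in for the existing places that modify rqpair->current_send_depth, and num_wrs_posted is a placeholder name for however many work requests the call site just posted.

/* Sketch only: put one print next to each existing spot in
 * lib/nvmf/rdma.c that changes rqpair->current_send_depth.
 * num_wrs_posted is a placeholder for the count posted at that site. */

/* increment path (after posting SEND/RDMA work requests): */
rqpair->current_send_depth += num_wrs_posted;
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: current_send_depth +%u -> %u (max %u)\n",
              rqpair, num_wrs_posted, rqpair->current_send_depth,
              rqpair->max_send_depth);

/* decrement path (in spdk_nvmf_rdma_poller_poll, per send completion): */
rqpair->current_send_depth--;
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: current_send_depth -1 -> %u\n",
              rqpair, rqpair->current_send_depth);

With prints on both paths, a stuck counter shows up immediately as increments that are never matched by decrements on the same qpair.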
-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Wednesday, February 6, 2019 11:15 AM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

Thank you for your great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth counter, as the request was the only one in the queue at the moment.

I could send you the complete debug log if you need it.

Thank you for your help.

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com
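A side note on the log above: cdw10=0x001e0002 decodes to log page 0x02 (SMART/Health Information) with a zero-based dword count of 0x1e, i.e. (0x1e + 1) * 4 = 124 bytes, which is exactly the len=0x7c the target reports and the 124-byte read Jim mentions at the top of the thread. The small standalone snippet below shows only that CDW10 arithmetic; the helper name is made up for illustration, and the bit layout is the one from the NVMe specification (NUMDL in bits 27:16, LID in bits 07:00).

#include <stdint.h>
#include <stdio.h>

/* Build CDW10 for Get Log Page: bits 07:00 = log page ID,
 * bits 27:16 = NUMDL, the zero-based number of dwords to return. */
static uint32_t
get_log_page_cdw10(uint8_t lid, uint32_t payload_bytes)
{
        uint32_t numdl = payload_bytes / 4 - 1;

        return ((numdl & 0xFFFu) << 16) | lid;
}

int
main(void)
{
        /* 124-byte read seen in the debug log above */
        printf("0x%08x\n", get_log_page_cdw10(0x02, 124));   /* prints 0x001e0002 */
        /* full 512-byte SMART/Health Information page */
        printf("0x%08x\n", get_log_page_cdw10(0x02, 512));   /* prints 0x007f0002 */
        return 0;
}

So a host that reads the full 512-byte page, as suggested above, would issue cdw10 0x007f0002 instead of 0x001e0002.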
Howell, Seth wrote in his message of Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits of
> work requests that can be posted at once to the send queue. Previously
> we were limiting both RDMA reads and RDMA writes to the same limit.
> That is the one indicated by device_attr.qp_rd_atom_max. This value
> only controls RDMA READ operations, and write operations should be
> governed by the size of the send queue. So we separated the state
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
> one for controller-to-host transfers, and the other for host-to-controller
> transfers.
> This was useful in two ways. Firstly, it allowed us to loosen
> restrictions on RDMA sends, and secondly it allowed us to make the
> states of the RDMA request a linear chain that can be followed through
> from start to finish.
> It looks like your I/O is getting stuck in state 8, or
> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from
> your stacktrace that the I/O enters that state at least twice, which
> is normal. If there is more than one I/O queued in that state, we
> operate on them in a FIFO manner. Also, if there are currently too
> many outstanding send WRs, we wait until we receive some completions
> on the send queue to continue. Every time we get a completion, we will
> poll through this list to try to submit more SENDs.
>
> I have a few follow-up questions about your configuration and
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed
> read/write workload?
> 3. Your stacktrace shows the I/O enter state 8 twice; does this trend
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if
> statement inside of state
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we
> are staying in that state. In those debug prints, it would be useful
> if you could include the following information:
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices
> and connections do you have when this hits?
>
> This information will help us understand better why your I/O aren't
> making forward progress. I am really interested to see what is going
> on here, since we don't have this issue on either the kernel or SPDK
> initiators, and the changes we made should have loosened up the
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
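For context on question 4: the "Waiting until some finish" messages in the log correspond to a send-depth gate of roughly the following shape inside spdk_nvmf_rdma_request_process(). The condition is paraphrased from the 19.01 sources and may not match the code line for line; the SPDK_DEBUGLOG call is the kind of extra print being requested, carrying the three fields listed above.

/* Paraphrased sketch of the gate in the DATA_TRANSFER_TO_HOST_PENDING
 * state; only the debug print is an addition. The +1 accounts for the
 * response SEND that must also fit on the send queue. */
if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
    rqpair->max_send_depth) {
        SPDK_DEBUGLOG(SPDK_LOG_RDMA,
                      "Request %p waiting: current_send_depth %u + data WRs %u + 1 > max_send_depth %u\n",
                      rdma_req, rqpair->current_send_depth,
                      rdma_req->num_outstanding_data_wr, rqpair->max_send_depth);
        /* Stay in this state; a later send completion lets the poller retry. */
        break;
}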
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" wrote:
>
>     Hi Jim,
>     Our module is an implementation of the NVMeoF/RDMA host.
>     It works with SPDK 18.10.1 well, so the problem seems to be related to the SPDK 19.01 code.
>     I can see that the RDMA request's state engine has been changed in the recent SPDK release.
>     So it would be great if the author of the modifications could take a look at the issue...
>     Thank you for your help!
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is
> doing to induce this behavior? Our tests with the Linux kernel host
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk