From: Harris, James R
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Wed, 06 Feb 2019 23:06:48 +0000
To: spdk@lists.01.org

Could you also (as an experiment) try reading the full 512 bytes of the SMART/Health Information log page from your host driver? Currently it looks like it's trying to read just the first 124 bytes.

On 2/6/19, 4:03 PM, "SPDK on behalf of Howell, Seth" wrote:

Hi Valeriy,

Thanks for sending the full log. At the surface, everything looks how I would expect it to. I am going to try to replicate your I/O patterns here locally to see if I can repro this with another initiator. In the meantime, can you add debug statements to all of the places where we increment and decrement the current_send_depth value and run the test again?

Also, does it always fail on a get_log_page command?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
Sent: Wednesday, February 6, 2019 1:28 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Thanks Valeriy,

I agree with you that there is something going wrong with the current_send_depth value.

My first thought with something like this is usually that we are doing our math wrong and underflowing. But the value being reported is 62, which implies that we have simply submitted 62 send WRs without reaping any on the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe we are blocking the decrement in some way.

In order for a blockage like this to occur, one of two things has to be happening:
1. The SENDs are not properly being posted to the completion queue. (This is unlikely, since we add the IBV_SEND_SIGNALED flag to all of the work requests. The spec says they will definitely get completed back to the completion queue in that case.)
2. We aren't properly decrementing current_send_depth when we get a completion. (Also very unlikely. You can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every completion classified as a send we decrement current_send_depth in both the standard and error paths.)

Can you please send me the full debug log? I want to look at all of the commands previous to the failed one to see if they are returning to state 0, which would imply that we are getting the send completions back.

Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.

Just a few more questions to establish a baseline:
1. What kind of NIC are you using?
2. Have you made any changes to the target code on top of 19.01, i.e. applied any custom patches?

Thanks,

Seth
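As a rough illustration of the instrumentation being asked for here, the debug prints could look something like the lines below. This is only a sketch against the 19.01 lib/nvmf/rdma.c code: the SPDK_DEBUGLOG calls are the additions, the surrounding statements stand in for the existing places that modify rqpair->current_send_depth, and num_wrs_posted is a placeholder name for however many work requests the call site just posted.

/* Sketch only: put one print next to each existing spot in
 * lib/nvmf/rdma.c that changes rqpair->current_send_depth.
 * num_wrs_posted is a placeholder for the count posted at that site. */

/* increment path (after posting SEND/RDMA work requests): */
rqpair->current_send_depth += num_wrs_posted;
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: current_send_depth +%u -> %u (max %u)\n",
              rqpair, num_wrs_posted, rqpair->current_send_depth,
              rqpair->max_send_depth);

/* decrement path (in spdk_nvmf_rdma_poller_poll, per send completion): */
rqpair->current_send_depth--;
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "qpair %p: current_send_depth -1 -> %u\n",
              rqpair, rqpair->current_send_depth);

With prints on both paths, a stuck counter shows up immediately as increments that are never matched by decrements on the same qpair.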
-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Wednesday, February 6, 2019 11:15 AM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

Thank you for your great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth counter, as the request was the only one in the queue at the moment.

I could send you the complete debug log if you need it.

Thank you for your help.

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com
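A side note on the log above: cdw10=0x001e0002 decodes to log page 0x02 (SMART/Health Information) with a zero-based dword count of 0x1e, i.e. (0x1e + 1) * 4 = 124 bytes, which is exactly the len=0x7c the target reports and the 124-byte read Jim mentions at the top of the thread. The small standalone snippet below shows only that CDW10 arithmetic; the helper name is made up for illustration, and the bit layout is the one from the NVMe specification (NUMDL in bits 27:16, LID in bits 07:00).

#include <stdint.h>
#include <stdio.h>

/* Build CDW10 for Get Log Page: bits 07:00 = log page ID,
 * bits 27:16 = NUMDL, the zero-based number of dwords to return. */
static uint32_t
get_log_page_cdw10(uint8_t lid, uint32_t payload_bytes)
{
        uint32_t numdl = payload_bytes / 4 - 1;

        return ((numdl & 0xFFFu) << 16) | lid;
}

int
main(void)
{
        /* 124-byte read seen in the debug log above */
        printf("0x%08x\n", get_log_page_cdw10(0x02, 124));   /* prints 0x001e0002 */
        /* full 512-byte SMART/Health Information page */
        printf("0x%08x\n", get_log_page_cdw10(0x02, 512));   /* prints 0x007f0002 */
        return 0;
}

So a host that reads the full 512-byte page, as suggested above, would issue cdw10 0x007f0002 instead of 0x001e0002.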
Howell, Seth wrote in his message of Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits of
> work requests that can be posted at once to the send queue. Previously
> we were limiting both RDMA reads and RDMA writes to the same limit.
> That is the one indicated by device_attr.qp_rd_atom_max. This value
> only controls RDMA READ operations, and write operations should be
> governed by the size of the send queue. So we separated the state
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
> one for controller-to-host transfers, and the other for host-to-controller
> transfers.
> This was useful in two ways. Firstly, it allowed us to loosen
> restrictions on RDMA sends, and secondly it allowed us to make the
> states of the RDMA request a linear chain that can be followed through
> from start to finish.
> It looks like your I/O is getting stuck in state 8, or
> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from
> your stacktrace that the I/O enters that state at least twice, which
> is normal. If there is more than one I/O queued in that state, we
> operate on them in a FIFO manner. Also, if there are currently too
> many outstanding send WRs, we wait until we receive some completions
> on the send queue to continue. Every time we get a completion, we will
> poll through this list to try to submit more SENDs.
>
> I have a few follow-up questions about your configuration and
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed
> read/write workload?
> 3. Your stacktrace shows the I/O enter state 8 twice; does this trend
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if
> statement inside of state
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we
> are staying in that state. In those debug prints, it would be useful
> if you could include the following information:
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices
> and connections do you have when this hits?
>
> This information will help us understand better why your I/O aren't
> making forward progress. I am really interested to see what is going
> on here, since we don't have this issue on either the kernel or SPDK
> initiators, and the changes we made should have loosened up the
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
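For context on question 4: the "Waiting until some finish" messages in the log correspond to a send-depth gate of roughly the following shape inside spdk_nvmf_rdma_request_process(). The condition is paraphrased from the 19.01 sources and may not match the code line for line; the SPDK_DEBUGLOG call is the kind of extra print being requested, carrying the three fields listed above.

/* Paraphrased sketch of the gate in the DATA_TRANSFER_TO_HOST_PENDING
 * state; only the debug print is an addition. The +1 accounts for the
 * response SEND that must also fit on the send queue. */
if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
    rqpair->max_send_depth) {
        SPDK_DEBUGLOG(SPDK_LOG_RDMA,
                      "Request %p waiting: current_send_depth %u + data WRs %u + 1 > max_send_depth %u\n",
                      rdma_req, rqpair->current_send_depth,
                      rdma_req->num_outstanding_data_wr, rqpair->max_send_depth);
        /* Stay in this state; a later send completion lets the poller retry. */
        break;
}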
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" wrote:
>
>     Hi Jim,
>     Our module is an implementation of the NVMeoF/RDMA host.
>     It works with SPDK 18.10.1 well, so the problem seems to be related to the SPDK 19.01 code.
>     I can see that the RDMA request's state engine has been changed in the recent SPDK release.
>     So it would be great if the author of the modifications could take a look at the issue...
>     Thank you for your help!
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is
> doing to induce this behavior? Our tests with the Linux kernel host
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk