From: Valeriy Glushkov
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
Date: Thu, 07 Feb 2019 04:50:17 +0000
In-Reply-To: EA913ED399BBA34AA4EAC2EDC24CDD009C25E29E@FMSMSX105.amr.corp.intel.com
To: spdk@lists.01.org

Hi Seth,

It would be great if you could send me the modified source with the changes you need, or a patch. I will add it to the SPDK build, reproduce the case and send you the resulting debug log.

The Get log page request itself seems to be fine, as there are many such requests earlier in the log...

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth wrote in his message of Thu, 07 Feb 2019 01:03:06 +0200:

> Hi Valeriy,
>
> Thanks for sending the full log. At the surface, everything looks how I
> would expect it to. I am going to try to replicate your I/O patterns
> here locally to see if I can repro this with another initiator. In the
> meantime, can you add debug statements to all of the places where we
> increment and decrement the current_send_depth value and run the test
> again?
>
> Also, does it always fail on a get_log_page command?
>
> Thanks,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
> Sent: Wednesday, February 6, 2019 1:28 PM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Thanks Valeriy,
>
> I agree with you that there is something going wrong with the
> current_send_depth value.
> My first thought with something like this is usually that we are doing
> our math wrong and we are underflowing. But the value being reported is
> 62, which implies that we have simply submitted 62 send_wrs without
> reaping any on the completion queue. Underflow would probably result in
> a number closer to 2^32-1. This leads me to believe we are blocking the
> decrement in some way.
>
> In order for a blockage like this to occur, one of two things has to be
> happening:
> 1. The SENDs are not properly being posted to the completion queue.
> (This is unlikely, since we add the flag IBV_SEND_SIGNALED to all of the
> work requests. The spec says they will definitely get completed back to
> the completion queue in that case.)
> 2. We aren't properly decrementing the current_send_depth when we get a
> completion. (Also very unlikely. You can inspect the code in
> spdk_nvmf_rdma_poller_poll to see that for every completion that is
> classified as a send we decrement current_send_depth in both the
> standard and error paths.)
>
> Can you please send me the full debug log? I want to look at all of the
> commands previous to the failed one to see if they are returning to
> state 0, which would imply that we are getting the send completions back.
> Something else that would be really helpful is if you printed out the
> value of current_send_depth each time we increment or decrement it. That
> would help me see exactly what's missing here.
>
> Just a few more questions to establish a baseline:
> 1. What kind of a NIC are you using?
> 2. Have you made any changes to the target code on top of 19.01, i.e.
> applied any custom patches?
>
> Thanks,
>
> Seth
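
To make the two failure modes above concrete, a minimal self-contained model of the send accounting might look like the sketch below. The names (fake_qpair, post_send, send_completed) are invented for illustration; this is not the SPDK source. It shows why an underflowed unsigned counter would read near 2^32-1, while a counter whose completions are never reaped simply climbs toward the send queue limit and stalls, which matches the stuck-at-62 symptom.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the per-connection send accounting (not SPDK code). */
struct fake_qpair {
	uint32_t current_send_depth; /* send WRs posted but not yet completed */
	uint32_t max_send_depth;     /* capacity of the send queue */
};

/* Mirrors the places where current_send_depth is incremented. */
static void post_send(struct fake_qpair *qp)
{
	qp->current_send_depth++;
	printf("post_send:  current_send_depth=%u\n", qp->current_send_depth);
}

/* Mirrors the places where a signaled send completion decrements it. */
static void send_completed(struct fake_qpair *qp)
{
	qp->current_send_depth--;
	printf("completion: current_send_depth=%u\n", qp->current_send_depth);
}

int main(void)
{
	struct fake_qpair qp = { .current_send_depth = 0, .max_send_depth = 63 };

	/* Case 1: one extra decrement underflows the unsigned counter, so it
	 * would be reported as a value near 2^32-1 (4294967295), not 62. */
	post_send(&qp);
	send_completed(&qp);
	send_completed(&qp);

	/* Case 2: sends are posted but their completions are never reaped, so
	 * the counter climbs until it reaches the send queue limit and every
	 * further send has to wait. */
	qp.current_send_depth = 0;
	for (int i = 0; i < 62; i++) {
		post_send(&qp);
	}
	printf("blocked: %u of %u send WRs counted outstanding\n",
	       qp.current_send_depth, qp.max_send_depth);
	return 0;
}
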
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
> Sent: Wednesday, February 6, 2019 11:15 AM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi Seth,
>
> thank you for your great explanation of the changes you have done!
>
> I added some debug prints to the target code as you requested.
>
> Here is a part of the target's debug log with the last Admin queue requests:
> =======
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid 2815 nsid 0 cdw10 0x00000006
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815 cdw0=0x00000001 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
> rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv: 0xfdd2e8 Connection: 0xf04970
> rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request: 0xff14e0 Connection: 0xf04970
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0
> ...
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
> rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2847 nsid 4294967295 cdw10 0x001e0002
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffe483a781fdd0 key 0x82929 len 0x200
> ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847 cdw0=0x00000000 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
> =======
>
> The Admin queue depth = 32.
>
> It seems that there is an issue with the rqpair->current_send_depth
> counter, as the request was the only one in the queue at the moment.
>
> I could send you the complete debug log, if you need.
>
> Thank you for your help.
>
> --
> Best regards,
> Valeriy Glushkov
> www.starwind.com
> valeriy.glushkov(a)starwind.com
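
For context on the "62+1 WRs outstanding (max_send_depth 63)" line, below is a rough sketch of the kind of threshold check that message points at. The field names (current_send_depth, num_outstanding_data_wr, max_send_depth) follow the ones named in this thread, but the structs and the exact condition are approximations for illustration, not the actual rdma.c code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Approximate stand-ins for the structures named in the thread; not rdma.c. */
struct fake_rdma_qpair {
	uint32_t current_send_depth;      /* send WRs posted, completions not yet reaped */
	uint32_t max_send_depth;          /* send queue capacity for this connection */
};

struct fake_rdma_request {
	uint32_t num_outstanding_data_wr; /* data WRs this request still needs to post */
};

/* The request may post its data WRs plus one completion SEND only if they all
 * fit under the send queue limit; the exact condition in 19.01 may differ. */
static bool can_post_sends(const struct fake_rdma_qpair *qp,
			   const struct fake_rdma_request *req)
{
	return qp->current_send_depth + req->num_outstanding_data_wr + 1 <=
	       qp->max_send_depth;
}

int main(void)
{
	/* Numbers from the log above: 62 sends already counted as outstanding on
	 * a 63-entry send queue, and a small Get Log Page response that needs
	 * one data WR plus its completion SEND. */
	struct fake_rdma_qpair qp = { .current_send_depth = 62, .max_send_depth = 63 };
	struct fake_rdma_request req = { .num_outstanding_data_wr = 1 };

	if (!can_post_sends(&qp, &req)) {
		printf("send queue full: %u of %u WRs counted outstanding, "
		       "request needs %u more; waiting for completions\n",
		       qp.current_send_depth, qp.max_send_depth,
		       req.num_outstanding_data_wr + 1);
	}
	return 0;
}

With an otherwise idle admin queue, a counter stuck at 62 keeps this condition false forever, so the request keeps re-entering the same state, as the log shows.
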
> Howell, Seth wrote in his message of Wed, 06 Feb 2019 18:03:49 +0200:
>
>> Hi Valeriy,
>>
>> I made the changes that you mentioned to the RDMA request state machine.
>> Here is a little background on those changes.
>> They were part of an effort to better approach the limits on how many
>> work requests can be posted at once to the send queue. Previously we
>> were limiting both RDMA reads and RDMA writes to the same limit, the
>> one indicated by device_attr.qp_rd_atom_max. This value only controls
>> RDMA READ operations, while write operations should be governed by the
>> size of the send queue. So we separated the state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
>> one for controller-to-host transfers and the other for host-to-controller
>> transfers.
>> This was useful in two ways. Firstly, it allowed us to loosen
>> restrictions on RDMA sends, and secondly it allowed us to make the
>> states of the RDMA request a linear chain that can be followed through
>> from start to finish.
>> It looks like your I/O is getting stuck in state 8, or
>> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from
>> your stacktrace that the I/O enters that state at least twice, which
>> is normal. If there is more than one I/O queued in that state, we
>> operate on them in a FIFO manner. Also, if there are currently too
>> many outstanding send WRs, we wait until we receive some completions
>> on the send queue to continue. Every time we get a completion, we will
>> poll through this list to try to submit more SENDs.
>>
>> I have a few follow-up questions about your configuration and the
>> information we will need to be able to help:
>>
>> 1. What is the queue depth of I/O you are submitting to the target?
>> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed
>> read/write workload?
>> 3. Your stacktrace shows the I/O enter state 8 twice; does this trend
>> continue forever until you get a timeout on the I/O?
>> 4. Can you add two extra debug prints to the target, one to each if
>> statement inside of state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we
>> are staying in that state. In those debug prints, it would be useful
>> if you could include the following information:
>> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr,
>> rqpair->max_send_depth.
>>
>> Some other information that may be useful:
>> How many cores are you running the SPDK target on? How many devices
>> and connections do you have when this hits?
>>
>> This information will help us understand better why your I/O aren't
>> making forward progress. I am really interested to see what is going
>> on here, since we don't see this issue with either the kernel or SPDK
>> initiators, and the changes we made should have loosened up the
>> requirements for sending I/O.
>>
>> Thanks,
>>
>> Seth Howell
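
As a rough illustration of the split Seth describes, the sketch below separates the two pending directions and gates them on different resources. The state names echo the ones mentioned above, but the enum, structs and checks are simplified stand-ins, not the SPDK 19.01 implementation (for instance, the real code also has to account RDMA READ WRs against the send queue).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the two pending states discussed above; the real
 * request state enum in rdma.c contains many more states. */
enum fake_request_state {
	FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, /* host -> controller: RDMA READ */
	FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING,       /* controller -> host: RDMA WRITE */
};

struct fake_qpair {
	uint32_t current_read_depth; /* outstanding RDMA READs */
	uint32_t max_read_depth;     /* the device read/atomic limit referenced above */
	uint32_t current_send_depth; /* outstanding send-queue WRs */
	uint32_t max_send_depth;     /* send queue size */
};

/* After the split, each direction is gated on its own resource. */
static bool can_start_transfer(const struct fake_qpair *qp,
			       enum fake_request_state state,
			       uint32_t num_data_wr)
{
	switch (state) {
	case FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING:
		return qp->current_read_depth + num_data_wr <= qp->max_read_depth;
	case FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING:
		/* data WRs plus the completion SEND must fit in the send queue */
		return qp->current_send_depth + num_data_wr + 1 <= qp->max_send_depth;
	}
	return false;
}

int main(void)
{
	struct fake_qpair qp = {
		.current_read_depth = 0,  .max_read_depth = 16,
		.current_send_depth = 62, .max_send_depth = 63,
	};

	/* A host-to-controller transfer still has read-depth headroom here... */
	printf("to-controller allowed: %d\n",
	       can_start_transfer(&qp, FAKE_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, 1));
	/* ...while a controller-to-host transfer waits on the (apparently stuck)
	 * send-depth accounting. */
	printf("to-host allowed:       %d\n",
	       can_start_transfer(&qp, FAKE_STATE_DATA_TRANSFER_TO_HOST_PENDING, 1));
	return 0;
}
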
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
>> Sent: Wednesday, February 6, 2019 7:55 AM
>> To: Storage Performance Development Kit
>> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>>
>> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" wrote:
>>
>>     Hi Jim,
>>
>>     Our module is an implementation of the NVMeoF/RDMA host.
>>     It works well with SPDK 18.10.1, so the problem seems to be related
>>     to the SPDK 19.01 code.
>>     I can see that the RDMA request's state engine has been changed in
>>     the recent SPDK release.
>>     So it would be great if the author of the modifications could take
>>     a look at the issue...
>>     Thank you for your help!
>>
>> Hi Valeriy,
>>
>> Can you provide detailed information about what your host module is
>> doing to induce this behavior? Our tests with the Linux kernel host
>> driver and the SPDK host driver do not seem to be hitting this problem.
>>
>> Thanks,
>>
>> -Jim