Hi Valeriy,

This was a really stupid mistake on my part. In my original fix I moved a
decrement up without moving the assert that accompanied it. Please see this
one-liner, which should fix it. I will also backport it to 19.01.1. The
silver lining here is that there is nothing functionally wrong with the code;
the assert was simply erroneous and will only fire when SPDK is built in
debug mode.

https://review.gerrithub.io/c/spdk/spdk/+/446440/1/lib/nvmf/rdma.c

Thank you for replying and pointing this out,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Tuesday, February 26, 2019 3:21 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

It seems that some problem is still present in the nvmf target built from
the recent SPDK trunk. It crashes with a core dump under our performance
test.

===========
# Starting SPDK v19.04-pre / DPDK 18.11.0 initialization...
[ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --log-level=lib.eal:6 --base-virtaddr=0x200000000000 --file-prefix=spdk_pid131804 ]
EAL: No free hugepages reported in hugepages-2048kB
EAL: 2 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
app.c: 624:spdk_app_start: *NOTICE*: Total cores available: 1
reactor.c: 233:_spdk_reactor_run: *NOTICE*: Reactor started on core 0
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002458d000 length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002464b000 length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245ae000 length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000 length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000 length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245a8000 length=4096
nvmf_tgt: rdma.c:2789: spdk_nvmf_rdma_poller_poll: Assertion `rdma_req->num_outstanding_data_wr > 0' failed.
^C
[1]+  Aborted                 (core dumped) ./nvmf_tgt -c ./nvmf.conf
=========

Do you need the dump file or some additional info?

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth wrote in his message of Thu, 07 Feb 2019 20:18:28 +0200:

> Hi Sasha, Valeriy,
>
> With the help of Valeriy's logs I was able to get to the bottom of this.
> The root cause is that for NVMe-oF requests that don't transfer any
> data, such as keep_alive, we were not properly resetting the value of
> rdma_req->num_outstanding_data_wr between uses of that structure. All
> data-carrying operations properly reset this value in
> spdk_nvmf_rdma_req_parse_sgl.
>
> My local repro steps look like this for anyone interested:
>
> Start the SPDK target.
> Submit a full queue depth's worth of Smart log requests (sequentially is
> fine). A smaller number also works, but takes much longer.
> Wait for a while (this assumes you have keep alive enabled). Keep alive
> requests will reuse the rdma_req objects, slowly incrementing the
> curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
>
> I was able to fix the issue locally with the following patch:
> https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please
> let me know if applying this patch also fixes it for you (I am pretty
> sure that it will).
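To make this concrete for anyone skimming the archive, here is a minimal
sketch of the two points being discussed. It is not the actual SPDK code
(fake_rdma_req, fake_req_reset and fake_data_wr_completed are made-up names),
but it shows the pattern: the per-request data-WR counter has to be reset
every time the request object is reused, including for commands that carry no
data such as keep_alive, and the debug assert has to run before the decrement
it guards, not after it.

/* Illustrative only; these are NOT the real SPDK structures or functions. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_rdma_req {
	int num_outstanding_data_wr;   /* data WRs still in flight for this request */
	bool has_data;                 /* does this command transfer data at all?   */
};

/* Called when a request object is taken for a new command.  Resetting the
 * counter here covers commands that never go through the SGL-parsing path
 * (keep_alive and other zero-data commands). */
static void fake_req_reset(struct fake_rdma_req *req, bool has_data)
{
	req->num_outstanding_data_wr = 0;
	req->has_data = has_data;
	if (has_data) {
		/* a data-carrying command sets this while building its WRs */
		req->num_outstanding_data_wr = 1;
	}
}

/* Completion path: check first, then decrement.  If the decrement is hoisted
 * above the assert, the assert checks the already-decremented value and fires
 * on the last completion even though nothing is functionally wrong. */
static void fake_data_wr_completed(struct fake_rdma_req *req)
{
	assert(req->num_outstanding_data_wr > 0);
	req->num_outstanding_data_wr--;
}

int main(void)
{
	struct fake_rdma_req req;

	fake_req_reset(&req, true);     /* data-carrying command */
	fake_data_wr_completed(&req);   /* counter goes 1 -> 0, assert holds */

	fake_req_reset(&req, false);    /* reuse the same object for keep_alive */
	printf("outstanding data WRs after reuse: %d\n",
	       req.num_outstanding_data_wr);
	return 0;
}

Built with a plain C compiler this runs cleanly in a debug build; moving the
decrement above the assert in fake_data_wr_completed() reproduces the kind of
spurious assertion failure shown in Valeriy's log.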
>
> Thank you for the bug report and for all of your help,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
> Sent: Thursday, February 7, 2019 11:06 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi,
>
> The RNR value shouldn't affect NVMF. I just want to check whether NVMF
> pre-posts enough receive requests. 19.01 introduced a new flow control
> mechanism that counts the number of send and receive work requests.
> Probably NVMF doesn't pre-post enough requests.
>
> Which network do you use: IB or RoCE? What is your HW and SW stack on
> the host and target sides (OS, OFED/MOFED version, NIC type)?
>
> I'd suggest configuring NVMF with a big max queue depth and, in your
> test, actually using half of that value.
>
> On 2/7/2019 5:37 PM, Valeriy Glushkov wrote:
>> Hi Sasha,
>>
>> There is no IBV on the host side, it's Windows.
>> So we have no control over the RNR field.
>>
>> From an RDMA session's dump I can see that the initiator sets
>> infiniband.cm.req.rnrretrcount to 0x6.
>>
>> Could the RNR value be related to the problem we have with the
>> SPDK 19.01 NVMeoF target?

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
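For readers less familiar with the flow control change Sasha mentions, here
is a rough, purely illustrative sketch of the per-qpair send-depth accounting
being discussed. The names (fake_qpair, fake_post_send, fake_send_completed)
are invented and are not SPDK APIs. The idea is that submissions are gated on
the current depth and completions give the credit back; any command path that
forgets to do so slowly starves the queue, which matches the curr_send_depth
symptom Seth describes earlier in the thread.

/* Illustrative only; not the actual SPDK flow-control code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_qpair {
	uint32_t curr_send_depth;  /* send WRs currently outstanding       */
	uint32_t max_send_depth;   /* negotiated limit for this queue pair */
};

static bool fake_post_send(struct fake_qpair *qp, uint32_t num_wrs)
{
	/* Flow control gate: refuse the submission if it would exceed the limit. */
	if (qp->curr_send_depth + num_wrs > qp->max_send_depth) {
		return false;  /* caller has to queue the request and retry later */
	}
	qp->curr_send_depth += num_wrs;
	return true;
}

static void fake_send_completed(struct fake_qpair *qp, uint32_t num_wrs)
{
	/* Every completed send must give its credit back; if any command type
	 * skips this, curr_send_depth creeps upward until fake_post_send()
	 * starts failing for good, i.e. the queue pair is starved. */
	assert(qp->curr_send_depth >= num_wrs);
	qp->curr_send_depth -= num_wrs;
}

int main(void)
{
	struct fake_qpair qp = { .curr_send_depth = 0, .max_send_depth = 4 };
	uint32_t i;

	/* Post and complete a few sends; the depth returns to zero each time. */
	for (i = 0; i < 8; i++) {
		if (fake_post_send(&qp, 1)) {
			fake_send_completed(&qp, 1);
		}
	}
	printf("final send depth: %u of %u\n", qp.curr_send_depth, qp.max_send_depth);
	return 0;
}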