Hi Seth,

> On Feb 7, 2019, at 12:18 PM, Howell, Seth wrote:
>
> Hi Sasha, Valeriy,
>
> With the help of Valeriy's logs I was able to get to the bottom of this. The root cause is that for NVMe-oF requests that don't transfer any data, such as keep_alive, we were not properly resetting the value of rdma_req->num_outstanding_data_wr between uses of that structure. All data carrying operations properly reset this value in spdk_nvmf_rdma_req_parse_sgl.
>
> My local repro steps look like this for anyone interested.
>
> Start the SPDK target,
> Submit a full queue depth worth of Smart log requests (sequentially is fine). A smaller number also works, but takes much longer.
> Wait for a while (This assumes you have keep alive enabled). Keep alive requests will reuse the rdma_req objects slowly incrementing the curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
>
> I was able to fix the issue locally with the following patch. https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please let me know if applying this patch also fixes it for you (I am pretty sure that it will).

Does this issue present only in 19.01, or can it also occur in 18.10.1?

--
Lance Hartmann
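
For anyone following the thread, the accounting problem Seth describes can be sketched as a small standalone C model. This is not SPDK code and does not reproduce the patch: the `model_*` names, the queue depth of 8, and the exact charge/release arithmetic are assumptions made for illustration. Only the field names `num_outstanding_data_wr` and `curr_send_depth` and the role of `spdk_nvmf_rdma_req_parse_sgl` come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the SPDK structures. Only the two
 * counters named in the thread are modeled. */
struct model_qpair {
    uint32_t curr_send_depth;   /* send-queue slots believed to be in use */
    uint32_t max_send_depth;    /* assumed capacity of the send queue */
};

struct model_req {
    uint32_t num_outstanding_data_wr;   /* survives reuse of the request */
};

/* Run one request through an assumed submit/complete cycle.
 * Returns false when the qpair thinks it has no room left. */
static bool model_run_request(struct model_qpair *qp, struct model_req *req,
                              uint32_t data_wrs, bool reset_when_no_data)
{
    if (data_wrs > 0) {
        /* Data-carrying path: the count is set fresh, as the thread says
         * happens in spdk_nvmf_rdma_req_parse_sgl. */
        req->num_outstanding_data_wr = data_wrs;
    } else if (reset_when_no_data) {
        /* The kind of reset the thread describes as missing. */
        req->num_outstanding_data_wr = 0;
    }

    /* Assumption: one slot for the response plus one per data WR. */
    uint32_t charged = 1 + req->num_outstanding_data_wr;
    if (qp->curr_send_depth + charged > qp->max_send_depth) {
        return false;
    }
    qp->curr_send_depth += charged;

    /* Only work requests that were really posted ever complete. */
    uint32_t posted = 1 + data_wrs;
    assert(qp->curr_send_depth >= posted);
    qp->curr_send_depth -= posted;
    return true;
}

/* Count how many no-data (keep-alive style) requests succeed, up to limit,
 * after one data-carrying request has left a stale count behind. */
static uint32_t model_keep_alives_until_stall(bool reset_when_no_data,
                                              uint32_t limit)
{
    struct model_qpair qp = { .curr_send_depth = 0, .max_send_depth = 8 };
    struct model_req req = { .num_outstanding_data_wr = 0 };
    uint32_t ok = 0;

    /* One data-carrying admin command, e.g. a log page read. */
    model_run_request(&qp, &req, 1, reset_when_no_data);

    while (ok < limit && model_run_request(&qp, &req, 0, reset_when_no_data)) {
        ok++;
    }
    return ok;
}
```

In this model, calling `model_keep_alives_until_stall(false, 100)` stalls after a handful of keep-alives because each one is charged for a data WR that is never posted or completed, while `model_keep_alives_until_stall(true, 100)` runs to the limit. The real target's accounting may differ in detail; the sketch only shows why a stale per-request count can slowly exhaust the send depth.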