Hi Valeriy,

I made the changes you mentioned to the RDMA request state machine. Here is a little background on them.

Those changes were part of an effort to get closer to the limits on how many work requests can be posted to the send queue at once. Previously we were limiting both RDMA reads and RDMA writes to the same limit, namely the one indicated by device_attr.max_qp_rd_atom. That value only governs RDMA READ operations; RDMA WRITE operations should instead be governed by the size of the send queue. So we split the state RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, one for controller-to-host transfers and the other for host-to-controller transfers. This was useful in two ways: first, it allowed us to loosen the restrictions on RDMA sends, and second, it allowed us to make the states of the RDMA request a linear chain that can be followed from start to finish.

It looks like your I/O is getting stuck in state 8, RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING. I see from your stack trace that the I/O enters that state at least twice, which is normal. If there is more than one I/O queued in that state, we operate on them in FIFO order. Also, if there are currently too many outstanding send WRs, we wait until we receive some completions on the send queue before continuing. Every time we get a completion, we poll through this list to try to submit more SENDs.

I have a few follow-up questions about your configuration, plus some information we will need to be able to help:

1. What is the queue depth of the I/O you are submitting to the target?

2. Are you running a 100% NVMe-oF read workload, or is it a mixed read/write workload?

3. Your stack trace shows the I/O entering state 8 twice. Does this trend continue until you eventually get a timeout on the I/O?

4. Can you add two extra debug prints to the target, one to each if statement inside of state RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we are staying in that state. In those debug prints, it would be useful if you could include the following information: rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth. There is a sketch of what I have in mind right after this list.
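To show what I mean, here is a rough sketch of that state with the two prints added. I am paraphrasing the checks from lib/nvmf/rdma.c from memory, so the queue name and the exact form of the depth comparison may differ slightly in your tree; treat it as a guide for where the prints go rather than a copy-paste patch. I used SPDK_ERRLOG instead of SPDK_DEBUGLOG so you don't have to enable the rdma debug flag to see the output.

    case RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING:
            if (rdma_req != STAILQ_FIRST(&rqpair->pending_rdma_write_queue)) {
                    /* Another request is ahead of this one in the FIFO; it has to wait its turn. */
                    SPDK_ERRLOG("req %p waiting in line: cur_send_depth=%u num_outstanding_data_wr=%u max_send_depth=%u\n",
                                rdma_req, rqpair->current_send_depth,
                                rdma_req->num_outstanding_data_wr, rqpair->max_send_depth);
                    break;
            }
            if (rqpair->current_send_depth + rdma_req->num_outstanding_data_wr + 1 >
                rqpair->max_send_depth) {
                    /* Too many WRs are outstanding on the send queue; wait for send completions. */
                    SPDK_ERRLOG("req %p throttled by send depth: cur_send_depth=%u num_outstanding_data_wr=%u max_send_depth=%u\n",
                                rdma_req, rqpair->current_send_depth,
                                rdma_req->num_outstanding_data_wr, rqpair->max_send_depth);
                    break;
            }
            /* Otherwise the request moves on to RDMA_REQUEST_STATE_TRANSFERRING_CONTROLLER_TO_HOST. */

If it is the second check that keeps firing for the same request, that would mean the send queue is staying full, i.e. we are not getting send completions back, which is exactly what I want to confirm or rule out.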
Some other information that may be useful: how many cores are you running the SPDK target on, and how many devices and connections do you have when this hits? This will help us better understand why your I/O isn't making forward progress.

I am really interested to see what is going on here, since we don't hit this issue with either the kernel or SPDK initiators, and the changes we made should have loosened the requirements for sending I/O.

Thanks,

Seth Howell

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
Sent: Wednesday, February 6, 2019 7:55 AM
To: Storage Performance Development Kit
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" wrote:

    Hi Jim,

    Our module is an implementation of an NVMe-oF/RDMA host. It works well with SPDK 18.10.1, so the problem seems to be related to the SPDK 19.01 code.

    I can see that the RDMA request state engine has been changed in the recent SPDK release, so it would be great if the author of the modifications could take a look at the issue...

    Thank you for your help!

Hi Valeriy,

Can you provide detailed information about what your host module is doing to induce this behavior? Our tests with the Linux kernel host driver and the SPDK host driver do not seem to be hitting this problem.

Thanks,

-Jim

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk