linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>
To: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: RDMA mailing list <linux-rdma@vger.kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v2] providers/rxe: Set the correct value of resid for inline data
Date: Mon, 23 Aug 2021 03:58:21 +0000	[thread overview]
Message-ID: <61231CDB.4010602@fujitsu.com> (raw)
In-Reply-To: <CAD=hENeOka=Fm5WqyFK3mY27t8wRqt4uZ0gQ5GJ027KhZcQVuQ@mail.gmail.com>

于 2021/8/23 10:44, Zhu Yanjun 写道:
> On Tue, Aug 17, 2021 at 8:54 AM yangx.jy@fujitsu.com
> <yangx.jy@fujitsu.com>  wrote:
>> On 2021/8/16 15:47, Zhu Yanjun wrote:
>>> On Wed, Aug 11, 2021 at 12:33 AM yangx.jy@fujitsu.com
>>> <yangx.jy@fujitsu.com>   wrote:
>>>> On 2021/8/10 11:40, Zhu Yanjun wrote:
>>>>> On Mon, Aug 9, 2021 at 10:43 PM Xiao Yang<yangx.jy@fujitsu.com>   wrote:
>>>>>> Resid indicates the residual length of transmitted bytes but current
>>>>>> rxe sets it to zero for inline data at the beginning.  In this case,
>>>>>> request will transmit zero byte to responder wrongly.
>>>>> What are the symptoms if resid is set to zero?
>>>> Hi Yanjun,
>>>>
>>>> You can construct code by the attached patch and then run
>>>> rdma_client/server to reproduce the issue.
>>>>
>>>> If resid is set to zero, running rdma_client/server:
>>>> --------------------------------
>>>> # ./rdma_client -s 10.167.220.99 -p 8765
>>>> rdma_client: start
>>>> rdma_client: end 0
>>>>
>>>> # ./rdma_server -s 10.167.220.99 -p 8765
>>>> rdma_server: start
>>>> wc.byte_len 0 recv_msg bbbbbbbbbbbbbbbb
>>>> rdma_server: end 0
>>>> --------------------------------
>>>>
>>>> rdma_client sends zero byte instead of 16 btyes to rdma_server.
>>>> rdma_server receives zero byte instead of 16 btyes from rdma_client.
>>>>
>>>> Please also see the logic about resid in kernel, for example:
>>>> --------------------------------
>>>> int rxe_requester(void *arg)
>>>> {
>>>> ...
>>>> payload = (mask&   RXE_WRITE_OR_SEND) ? wqe->dma.resid : 0;
>>>> ...
>>>> skb = init_req_packet(qp, wqe, opcode, payload,&pkt);
>>>> ...
>>>> }
>>>> --------------------------------
>>>>
>>>> Best Regards,
>>>> Xiao Yang
>>> Follow your steps on the latest rdma-core and linux upstream,
>>> make tests with the followings:
>>> "
>>> diff --git a/librdmacm/examples/rdma_client.c b/librdmacm/examples/rdma_client.c
>>> index c27047c5..6734757b 100644
>>> --- a/librdmacm/examples/rdma_client.c
>>> +++ b/librdmacm/examples/rdma_client.c
>>> @@ -52,6 +52,7 @@ static int run(void)
>>>           struct ibv_wc wc;
>>>           int ret;
>>>
>>> +       memset(send_msg, 'a', 16);
>>>           memset(&hints, 0, sizeof hints);
>>>           hints.ai_port_space = RDMA_PS_TCP;
>>>           ret = rdma_getaddrinfo(server, port,&hints,&res);
>>> diff --git a/librdmacm/examples/rdma_server.c b/librdmacm/examples/rdma_server.c
>>> index f9c766b2..afa20996 100644
>>> --- a/librdmacm/examples/rdma_server.c
>>> +++ b/librdmacm/examples/rdma_server.c
>>> @@ -132,6 +132,7 @@ static int run(void)
>>>                   goto out_disconnect;
>>>           }
>>>
>>> +       printf("wc.byte_len %u recv_msg %s\n", wc.byte_len, recv_msg);
>>>           ret = rdma_post_send(id, NULL, send_msg, 16, send_mr, send_flags);
>>>           if (ret) {
>>>                   perror("rdma_post_send");
>>> "
>>> The followings are results:
>>>
>>> # ./build/bin/rdma_server -s 10.238.154.61 -p 5486&
>>> [1] 10812
>>> # rdma_server: start
>>>
>>> # ./build/bin/rdma_client -s 10.238.154.61 -p 5486&
>>> [2] 10815
>>> # rdma_client: start
>>> wc.byte_len 16 recv_msg aaaaaaaaaaaaaaaa<--------------it seems
>>> that inline is 16?
>>> rdma_server: end 0
>>> rdma_client: end 0
>>> [1]-  Done                    ./build/bin/rdma_server -s 10.238.154.61 -p 5486
>>> [2]+  Done                    ./build/bin/rdma_client -s 10.238.154.61 -p 5486
>>>
>>> What does your commit fix?
>> Hi Yanjun,
>>
>> You missed the change that sets resid to zero on purpose:
>> -------------------------------------------------------------
>>
>> diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c
>> index 3c3ea8bb..ed5479fe 100644
>> --- a/providers/rxe/rxe.c
>> +++ b/providers/rxe/rxe.c
>> @@ -1492,7 +1492,7 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq,
>>                  wqe->iova       = ibwr->wr.rdma.remote_addr;
>>
>>          wqe->dma.length         = length;
>> -       wqe->dma.resid          = length;
>> +       wqe->dma.resid          = 0;
>>          wqe->dma.num_sge        = num_sge;
>>          wqe->dma.cur_sge        = 0;
>>          wqe->dma.sge_offset     = 0;
>>
>> -------------------------------------------------------------
>> Note:
>> It is also ok to remove "wqe->dma.resid = length" here because resid has
>> been set to zero before.
>>
>> With the change, running rdma_client/server will show the impact.
> With the original source code, the rdma_server/rdma_client can work well.
>
> Why "wqe->dma.resid          = length;" is replaced by "wqe->dma.resid
>           = 0;", then this problem appears?
Hi Yanjun,

The original source code has the following logic:
-----------------------------
static int init_send_wqe(...)
{
     ...
     if (ibwr->send_flags & IBV_SEND_INLINE) {
         ...
          wqe->dma.resid = 0;
         ...
     }
     ...
     wqe->dma.resid = length;
     ...
}
-----------------------------
For inline data flag, resid is set to zero first and then replaced with 
length at subsequent steps. In this case, the issue doesn't appear.
My patch removes the useless "wqe->dma.resid = 0" and sets the correct 
"wqe->dma.resid = length" for new inline_data APIs.

Best Regards,
Xiao Yang
> Thanks
> Zhu Yanjun
>
>
>> I didn't use new API(e.g. ibv_wr_set_inline_data, ibv_wr_send) to create
>> a new test
>> but it is enough to show the impact of resid == 0 by doing some changes
>> on rxe and
>> rdma_client/server.
>>
>> Best Regards,
>> Xiao Yang
>>> Zhu Yanjun
>>>
>>>>> Thanks
>>>>> Zhu Yanjun
>>>>>
>>>>>> Resid should be set to the total length of transmitted bytes at the
>>>>>> beginning.
>>>>>>
>>>>>> Note:
>>>>>> Just remove the useless setting of resid in init_send_wqe().
>>>>>>
>>>>>> Fixes: 1a894ca10105 ("Providers/rxe: Implement ibv_create_qp_ex verb")
>>>>>> Fixes: 8337db5df125 ("Providers/rxe: Implement memory windows")
>>>>>> Signed-off-by: Xiao Yang<yangx.jy@fujitsu.com>
>>>>>> ---
>>>>>>    providers/rxe/rxe.c | 5 ++---
>>>>>>    1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c
>>>>>> index 3c3ea8bb..3533a325 100644
>>>>>> --- a/providers/rxe/rxe.c
>>>>>> +++ b/providers/rxe/rxe.c
>>>>>> @@ -1004,7 +1004,7 @@ static void wr_set_inline_data(struct ibv_qp_ex *ibqp, void *addr,
>>>>>>
>>>>>>           memcpy(wqe->dma.inline_data, addr, length);
>>>>>>           wqe->dma.length = length;
>>>>>> -       wqe->dma.resid = 0;
>>>>>> +       wqe->dma.resid = length;
>>>>>>    }
>>>>>>
>>>>>>    static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf,
>>>>>> @@ -1035,6 +1035,7 @@ static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf,
>>>>>>           }
>>>>>>
>>>>>>           wqe->dma.length = tot_length;
>>>>>> +       wqe->dma.resid = tot_length;
>>>>>>    }
>>>>>>
>>>>>>    static void wr_set_sge(struct ibv_qp_ex *ibqp, uint32_t lkey, uint64_t addr,
>>>>>> @@ -1473,8 +1474,6 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq,
>>>>>>           if (ibwr->send_flags&   IBV_SEND_INLINE) {
>>>>>>                   uint8_t *inline_data = wqe->dma.inline_data;
>>>>>>
>>>>>> -               wqe->dma.resid = 0;
>>>>>> -
>>>>>>                   for (i = 0; i<   num_sge; i++) {
>>>>>>                           memcpy(inline_data,
>>>>>>                                  (uint8_t *)(long)ibwr->sg_list[i].addr,
>>>>>> --
>>>>>> 2.25.1
>>>>>>
>>>>>>
>>>>>>

  reply	other threads:[~2021-08-23  3:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-09 15:07 [PATCH v2] providers/rxe: Set the correct value of resid for inline data Xiao Yang
2021-08-10  3:40 ` Zhu Yanjun
2021-08-10 16:33   ` yangx.jy
2021-08-16  7:47     ` Zhu Yanjun
2021-08-17  0:54       ` yangx.jy
2021-08-23  2:44         ` Zhu Yanjun
2021-08-23  3:58           ` yangx.jy [this message]
2021-08-13 22:11 ` Bob Pearson
2021-08-16  3:55   ` Zhu Yanjun
2021-08-16 18:52     ` Bob Pearson
2021-08-23  1:33       ` yangx.jy
2021-08-25  9:44         ` Leon Romanovsky
2021-09-02  8:27           ` yangx.jy
2021-09-02 19:48             ` Bob Pearson
2021-09-03  1:39               ` yangx.jy
2021-09-03  3:34             ` Zhu Yanjun
2021-09-26 10:51 ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61231CDB.4010602@fujitsu.com \
    --to=yangx.jy@fujitsu.com \
    --cc=jgg@nvidia.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=rpearsonhpe@gmail.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).