All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
@ 2022-05-02  5:39 Chengguang Xu
  2022-05-02 17:15 ` Pearson, Robert B
  2022-05-06 16:15 ` Jason Gunthorpe
  0 siblings, 2 replies; 5+ messages in thread
From: Chengguang Xu @ 2022-05-02  5:39 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon; +Cc: linux-rdma, Chengguang Xu

For write request the remote addr will be sent only with first packet
so we don't have to adjust wqe->iova in retry operation.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index ae5fbc79dd5c..f08010651ef7 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -33,8 +33,6 @@ static inline void retry_first_write_send(struct rxe_qp *qp,
 		} else {
 			advance_dma_data(&wqe->dma, to_send);
 		}
-		if (mask & WR_WRITE_MASK)
-			wqe->iova += qp->mtu;
 	}
 }
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
  2022-05-02  5:39 [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation Chengguang Xu
@ 2022-05-02 17:15 ` Pearson, Robert B
  2022-05-03 15:04   ` Chengguang Xu
  2022-05-06 16:15 ` Jason Gunthorpe
  1 sibling, 1 reply; 5+ messages in thread
From: Pearson, Robert B @ 2022-05-02 17:15 UTC (permalink / raw)
  To: Chengguang Xu, zyjzyj2000, jgg, leon; +Cc: linux-rdma

> -----Original Message-----
> From: Chengguang Xu <cgxu519@mykernel.net> 
> Sent: Monday, May 2, 2022 12:39 AM
> To: zyjzyj2000@gmail.com; jgg@ziepe.ca; leon@kernel.org
> Cc: linux-rdma@vger.kernel.org; Chengguang Xu <cgxu519@mykernel.net>
> Subject: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
> 
> For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation.

This is problematic for lossy networks. A very large read request, say 8MiB, sends 2048 packets in response without any acknowledgement
from the requester. If the packet loss rate was 1% the read request would never finish as the probability of sending 2048 packets without
loss is very small. The way the code works today is that the iova is adjusted, and if you are lucky the responder has already finished the
previous read operation and starts over with a new read reply starting with a first packet at iova. If you are less fortunate the previous
read reply has not finished and the responder will continue to work on it until it is finished before looking at the new read request wqe.
the completer will respond to each out of order packet by checking to see if it should start a retry but since it has already done so
it drops the packet. It's messy but one can make forward progress ~100 packets at a time. It would be faster if the responder saw that
the new request replaced the old one and stopped sending packets on the old read. I have no idea how CX NICs do this but just restarting
from scratch seems bad.

Bob


Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index ae5fbc79dd5c..f08010651ef7 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -33,8 +33,6 @@ static inline void retry_first_write_send(struct rxe_qp *qp,
 		} else {
 			advance_dma_data(&wqe->dma, to_send);
 		}
-		if (mask & WR_WRITE_MASK)
-			wqe->iova += qp->mtu;
 	}
 }
 
--
2.35.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
  2022-05-02 17:15 ` Pearson, Robert B
@ 2022-05-03 15:04   ` Chengguang Xu
  2022-05-04 15:25     ` Robert Pearson
  0 siblings, 1 reply; 5+ messages in thread
From: Chengguang Xu @ 2022-05-03 15:04 UTC (permalink / raw)
  To: Pearson, Robert B, zyjzyj2000, jgg, leon; +Cc: linux-rdma

在 2022/5/3 1:15, Pearson, Robert B 写道:
>> -----Original Message-----
>> From: Chengguang Xu <cgxu519@mykernel.net>
>> Sent: Monday, May 2, 2022 12:39 AM
>> To: zyjzyj2000@gmail.com; jgg@ziepe.ca; leon@kernel.org
>> Cc: linux-rdma@vger.kernel.org; Chengguang Xu <cgxu519@mykernel.net>
>> Subject: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
>>
>> For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation.
> This is problematic for lossy networks. A very large read request, say 8MiB, sends 2048 packets in response without any acknowledgement
> from the requester. If the packet loss rate was 1% the read request would never finish as the probability of sending 2048 packets without
> loss is very small. The way the code works today is that the iova is adjusted, and if you are lucky the responder has already finished the
> previous read operation and starts over with a new read reply starting with a first packet at iova. If you are less fortunate the previous
> read reply has not finished and the responder will continue to work on it until it is finished before looking at the new read request wqe.
> the completer will respond to each out of order packet by checking to see if it should start a retry but since it has already done so
> it drops the packet. It's messy but one can make forward progress ~100 packets at a time. It would be faster if the responder saw that
> the new request replaced the old one and stopped sending packets on the old read. I have no idea how CX NICs do this but just restarting
> from scratch seems bad.

I agree that read request indeed needs to adjust iova during retry and  
the adjustment(for read) has already done in below logic in req_retry().


if (mask & WR_READ_MASK) {
         npsn = (wqe->dma.length - wqe->dma.resid) /
                      qp->mtu;
         wqe->iova += npsn * qp->mtu;
}

For write request, retry will not send new iova because only first write 
packet has RXE_RETH_MASK regardless iova adjustment.
Am I missing something?


Thanks,
Chengguang






^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
  2022-05-03 15:04   ` Chengguang Xu
@ 2022-05-04 15:25     ` Robert Pearson
  0 siblings, 0 replies; 5+ messages in thread
From: Robert Pearson @ 2022-05-04 15:25 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: Pearson, Robert B, zyjzyj2000, jgg, leon, linux-rdma

Sorry I misread your original post. You are correct. The wqe->iova is
only used to fill in the RETH header so it is not needed after
the first packet.
This commit looks OK.

On Tue, May 3, 2022 at 2:59 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> 在 2022/5/3 1:15, Pearson, Robert B 写道:
> >> -----Original Message-----
> >> From: Chengguang Xu <cgxu519@mykernel.net>
> >> Sent: Monday, May 2, 2022 12:39 AM
> >> To: zyjzyj2000@gmail.com; jgg@ziepe.ca; leon@kernel.org
> >> Cc: linux-rdma@vger.kernel.org; Chengguang Xu <cgxu519@mykernel.net>
> >> Subject: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
> >>
> >> For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation.
> > This is problematic for lossy networks. A very large read request, say 8MiB, sends 2048 packets in response without any acknowledgement
> > from the requester. If the packet loss rate was 1% the read request would never finish as the probability of sending 2048 packets without
> > loss is very small. The way the code works today is that the iova is adjusted, and if you are lucky the responder has already finished the
> > previous read operation and starts over with a new read reply starting with a first packet at iova. If you are less fortunate the previous
> > read reply has not finished and the responder will continue to work on it until it is finished before looking at the new read request wqe.
> > the completer will respond to each out of order packet by checking to see if it should start a retry but since it has already done so
> > it drops the packet. It's messy but one can make forward progress ~100 packets at a time. It would be faster if the responder saw that
> > the new request replaced the old one and stopped sending packets on the old read. I have no idea how CX NICs do this but just restarting
> > from scratch seems bad.
>
> I agree that read request indeed needs to adjust iova during retry and
> the adjustment(for read) has already done in below logic in req_retry().
>
>
> if (mask & WR_READ_MASK) {
>          npsn = (wqe->dma.length - wqe->dma.resid) /
>                       qp->mtu;
>          wqe->iova += npsn * qp->mtu;
> }
>
> For write request, retry will not send new iova because only first write
> packet has RXE_RETH_MASK regardless iova adjustment.
> Am I missing something?
>
>
> Thanks,
> Chengguang
>
>
>
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation
  2022-05-02  5:39 [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation Chengguang Xu
  2022-05-02 17:15 ` Pearson, Robert B
@ 2022-05-06 16:15 ` Jason Gunthorpe
  1 sibling, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2022-05-06 16:15 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: zyjzyj2000, leon, linux-rdma

On Mon, May 02, 2022 at 01:39:07AM -0400, Chengguang Xu wrote:
> For write request the remote addr will be sent only with first packet
> so we don't have to adjust wqe->iova in retry operation.
> 
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 2 --
>  1 file changed, 2 deletions(-)

Applied to for-next, thanks

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-05-06 16:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-02  5:39 [RFC PATCH] RDMA/rxe: skip adjusting remote addr for write in retry operation Chengguang Xu
2022-05-02 17:15 ` Pearson, Robert B
2022-05-03 15:04   ` Chengguang Xu
2022-05-04 15:25     ` Robert Pearson
2022-05-06 16:15 ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.