All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-next 0/6] Fix extra/redundant copies
@ 2021-06-18  4:57 Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 1/6] RDMA/rxe: Fix useless copy in send_atomic_ack Bob Pearson
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

This series of patches removes or shortens several unneeded passes over
packet data.

Applies cleanly to for-next after the memory windows commits.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---

Bob Pearson (6):
  RDMA/rxe: Fix useless copy in send_atomic_ack
  RDMA/rxe: Fix redundant call to ip_send_check
  RDMA/rxe: Fix extra copies in build_rdma_network_hdr
  RDMA/rxe: Fix over copying in get_srq_wqe
  RDMA/rxe: Fix extra copy in prepare_ack_packet
  RDMA/rxe: Fix redundant skb_put_zero

 drivers/infiniband/sw/rxe/rxe_net.c  |  4 +-
 drivers/infiniband/sw/rxe/rxe_resp.c | 56 ++++++++++++----------------
 2 files changed, 24 insertions(+), 36 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH for-next 1/6] RDMA/rxe: Fix useless copy in send_atomic_ack
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 2/6] RDMA/rxe: Fix redundant call to ip_send_check Bob Pearson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

From: Bob Pearson <rpearson@hpe.com>

In send_atomic_ack() in rxe_resp.c there is code copying ack_pkt
into the skb->cb[]. This doesn't do anything useful because the
cb[] is not used in the transmit path by the rxe driver.

Remove this code.

Fixes: 4c93496f18ce ("IB/rxe: do not copy extra stack memory to skb")
Signed-off-by: Bob Pearson <rpearson@hpe.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f8a7ccd4d8b7..5565d88e0261 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1023,10 +1023,6 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	free_rd_atomic_resource(qp, res);
 	rxe_advance_resp_resource(qp);
 
-	memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(ack_pkt));
-	memset((unsigned char *)SKB_TO_PKT(skb) + sizeof(ack_pkt), 0,
-	       sizeof(skb->cb) - sizeof(ack_pkt));
-
 	skb_get(skb);
 	res->type = RXE_ATOMIC_MASK;
 	res->atomic.skb = skb;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-next 2/6] RDMA/rxe: Fix redundant call to ip_send_check
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 1/6] RDMA/rxe: Fix useless copy in send_atomic_ack Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 3/6] RDMA/rxe: Fix extra copies in build_rdma_network_hdr Bob Pearson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

For IPV4 packets sent on the wire the rxe driver calls ip_local_out()
which immediately calls __ip_local_out() which sets iph->tot_len and
calls ip_send_check(). This code is duplicated in prepare4(). On the
loopback path the IP header checksum and tot_len fields are not used so
they do not need to be set.

Remove this redundant code.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-of-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_net.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 01662727dca0..178a66a45312 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -269,8 +269,6 @@ static void prepare_ipv4_hdr(struct dst_entry *dst, struct sk_buff *skb,
 	iph->ttl	=	ttl;
 	__ip_select_ident(dev_net(dst->dev), iph,
 			  skb_shinfo(skb)->gso_segs ?: 1);
-	iph->tot_len = htons(skb->len);
-	ip_send_check(iph);
 }
 
 static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-next 3/6] RDMA/rxe: Fix extra copies in build_rdma_network_hdr
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 1/6] RDMA/rxe: Fix useless copy in send_atomic_ack Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 2/6] RDMA/rxe: Fix redundant call to ip_send_check Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 4/6] RDMA/rxe: Fix over copying in get_srq_wqe Bob Pearson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

build_rdma_network_hdr() in rxe_resp.c does more copying than is
needed. Remove this subroutine and eliminate the extra copies for
IPV6 and reduce the extra copying for IPV4.

Fixes: e404f945a610 ("IB/rxe: improved debug prints & code cleanup")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 29 ++++++++++++----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 5565d88e0261..5718c8bb28ac 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -785,18 +785,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	return state;
 }
 
-static void build_rdma_network_hdr(union rdma_network_hdr *hdr,
-				   struct rxe_pkt_info *pkt)
-{
-	struct sk_buff *skb = PKT_TO_SKB(pkt);
-
-	memset(hdr, 0, sizeof(*hdr));
-	if (skb->protocol == htons(ETH_P_IP))
-		memcpy(&hdr->roce4grh, ip_hdr(skb), sizeof(hdr->roce4grh));
-	else if (skb->protocol == htons(ETH_P_IPV6))
-		memcpy(&hdr->ibgrh, ipv6_hdr(skb), sizeof(hdr->ibgrh));
-}
-
 static int invalidate_rkey(struct rxe_qp *qp, u32 rkey)
 {
 	if (rkey_is_mw(rkey))
@@ -811,16 +799,23 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey)
 static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 {
 	enum resp_states err;
+	struct sk_buff *skb = PKT_TO_SKB(pkt);
+	union rdma_network_hdr hdr;
 
 	if (pkt->mask & RXE_SEND_MASK) {
 		if (qp_type(qp) == IB_QPT_UD ||
 		    qp_type(qp) == IB_QPT_SMI ||
 		    qp_type(qp) == IB_QPT_GSI) {
-			union rdma_network_hdr hdr;
-
-			build_rdma_network_hdr(&hdr, pkt);
-
-			err = send_data_in(qp, &hdr, sizeof(hdr));
+			if (skb->protocol == htons(ETH_P_IP)) {
+				memset(&hdr.reserved, 0,
+						sizeof(hdr.reserved));
+				memcpy(&hdr.roce4grh, ip_hdr(skb),
+						sizeof(hdr.roce4grh));
+				err = send_data_in(qp, &hdr, sizeof(hdr));
+			} else {
+				err = send_data_in(qp, ipv6_hdr(skb),
+						sizeof(hdr));
+			}
 			if (err)
 				return err;
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-next 4/6] RDMA/rxe: Fix over copying in get_srq_wqe
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
                   ` (2 preceding siblings ...)
  2021-06-18  4:57 ` [PATCH for-next 3/6] RDMA/rxe: Fix extra copies in build_rdma_network_hdr Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 5/6] RDMA/rxe: Fix extra copy in prepare_ack_packet Bob Pearson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently get_srq_wqe() in rxe_resp.c copies the maximum possible number
of bytes from the wqe into the QPs copy of the SRQ wqe. This is usually
extra work and risks reading past the end of the SRQ circular buffer if
the SRQ is configured with less than the maximum possible number of SGEs.

Check the number of SGEs is not too large.
Compute the actual number of bytes in the WR and copy only those.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 5718c8bb28ac..93322d20c0ab 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -296,6 +296,7 @@ static enum resp_states get_srq_wqe(struct rxe_qp *qp)
 	struct rxe_recv_wqe *wqe;
 	struct ib_event ev;
 	unsigned int count;
+	size_t size;
 
 	if (srq->error)
 		return RESPST_ERR_RNR;
@@ -311,8 +312,13 @@ static enum resp_states get_srq_wqe(struct rxe_qp *qp)
 		return RESPST_ERR_RNR;
 	}
 
-	/* note kernel and user space recv wqes have same size */
-	memcpy(&qp->resp.srq_wqe, wqe, sizeof(qp->resp.srq_wqe));
+	/* don't trust user space data */
+	if (unlikely(wqe->dma.num_sge > srq->rq.max_sge)) {
+		pr_warn("%s: invalid num_sge in SRQ entry\n", __func__);
+		return RESPST_ERR_MALFORMED_WQE;
+	}
+	size = sizeof(wqe) + wqe->dma.num_sge*sizeof(struct rxe_sge);
+	memcpy(&qp->resp.srq_wqe, wqe, size);
 
 	qp->resp.wqe = &qp->resp.srq_wqe.wqe;
 	if (qp->is_user) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-next 5/6] RDMA/rxe: Fix extra copy in prepare_ack_packet
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
                   ` (3 preceding siblings ...)
  2021-06-18  4:57 ` [PATCH for-next 4/6] RDMA/rxe: Fix over copying in get_srq_wqe Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  4:57 ` [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero Bob Pearson
  2021-06-22 18:40 ` [PATCH for-next 0/6] Fix extra/redundant copies Jason Gunthorpe
  6 siblings, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently prepare_ack_packet writes almost all the fields of the
BTH in the ack packet twice. Replace code with the subroutine
init_bth().

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 93322d20c0ab..72cdb170b67b 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -637,18 +637,11 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	ack->opcode = opcode;
 	ack->mask = rxe_opcode[opcode].mask;
 	ack->paylen = paylen;
-
-	/* fill in bth using the request packet headers */
-	memcpy(ack->hdr, pkt->hdr, RXE_BTH_BYTES);
-
-	bth_set_opcode(ack, opcode);
-	bth_set_qpn(ack, qp->attr.dest_qp_num);
-	bth_set_pad(ack, pad);
-	bth_set_se(ack, 0);
-	bth_set_psn(ack, psn);
-	bth_set_ack(ack, 0);
 	ack->psn = psn;
 
+	bth_init(ack, opcode, 0, 0, pad, IB_DEFAULT_PKEY_FULL,
+		 qp->attr.dest_qp_num, 0, psn);
+
 	if (ack->mask & RXE_AETH_MASK) {
 		aeth_set_syn(ack, syndrome);
 		aeth_set_msn(ack, qp->resp.msn);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
                   ` (4 preceding siblings ...)
  2021-06-18  4:57 ` [PATCH for-next 5/6] RDMA/rxe: Fix extra copy in prepare_ack_packet Bob Pearson
@ 2021-06-18  4:57 ` Bob Pearson
  2021-06-18  8:02   ` Zhu Yanjun
  2021-06-22 18:40 ` [PATCH for-next 0/6] Fix extra/redundant copies Jason Gunthorpe
  6 siblings, 1 reply; 13+ messages in thread
From: Bob Pearson @ 2021-06-18  4:57 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
for the payload and zero it out. All these bytes are then re-written
with RoCE headers and payload. Remove this useless extra copy.

Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 178a66a45312..6605ee777667 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 
 	pkt->rxe	= rxe;
 	pkt->port_num	= port_num;
-	pkt->hdr	= skb_put_zero(skb, paylen);
+	pkt->hdr	= skb_put(skb, paylen);
 	pkt->mask	|= RXE_GRH_MASK;
 
 out:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-18  4:57 ` [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero Bob Pearson
@ 2021-06-18  8:02   ` Zhu Yanjun
  2021-06-18 15:32     ` Bob Pearson
  0 siblings, 1 reply; 13+ messages in thread
From: Zhu Yanjun @ 2021-06-18  8:02 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list

On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
> for the payload and zero it out. All these bytes are then re-written
> with RoCE headers and payload. Remove this useless extra copy.

The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.

Can you confirm that all the pkt->hdr are re-writtenwith RoCE headers
and payload?

Zhu Yanjun

>
> Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 178a66a45312..6605ee777667 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
>
>         pkt->rxe        = rxe;
>         pkt->port_num   = port_num;
> -       pkt->hdr        = skb_put_zero(skb, paylen);
> +       pkt->hdr        = skb_put(skb, paylen);
>         pkt->mask       |= RXE_GRH_MASK;
>
>  out:
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-18  8:02   ` Zhu Yanjun
@ 2021-06-18 15:32     ` Bob Pearson
  2021-06-20 14:07       ` Zhu Yanjun
  0 siblings, 1 reply; 13+ messages in thread
From: Bob Pearson @ 2021-06-18 15:32 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: Jason Gunthorpe, RDMA mailing list

On 6/18/21 3:02 AM, Zhu Yanjun wrote:
> On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
>> for the payload and zero it out. All these bytes are then re-written
>> with RoCE headers and payload. Remove this useless extra copy.
> 
> The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.
> 
> Can you confirm that all the pkt->hdr are re-writtenwith RoCE headers
> and payload?

Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
In rxe_req.c in init_req_packet() paylen is set to

    paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE

which is the correct size of the packet from the UDP header to the frame FCS i.e. the UDP payload. rxe_opcode[opcode] is a table that includes the length of the all the RoCE headers for a given opcode which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking but all the headers are fully written, the payload is fully copied from the client and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.
 
There are two potential issues here 1) Is the intended packet sent to the destination, and 2) is there a possibility that information can leak from the kernel to the outside. The above addresses 1). 2) requires the assumption that the NIC is not examining data outside of the proper data area in the skb and doing something with it. But you have a worse problem there since the NIC has DMA access to all of kernel memory and can send any packet it likes.

Bob Pearson

> Zhu Yanjun
> 
>>
>> Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>> index 178a66a45312..6605ee777667 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
>>
>>         pkt->rxe        = rxe;
>>         pkt->port_num   = port_num;
>> -       pkt->hdr        = skb_put_zero(skb, paylen);
>> +       pkt->hdr        = skb_put(skb, paylen);
>>         pkt->mask       |= RXE_GRH_MASK;
>>
>>  out:
>> --
>> 2.30.2
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-18 15:32     ` Bob Pearson
@ 2021-06-20 14:07       ` Zhu Yanjun
  2021-06-20 20:21         ` Bob Pearson
  0 siblings, 1 reply; 13+ messages in thread
From: Zhu Yanjun @ 2021-06-20 14:07 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list

On Fri, Jun 18, 2021 at 11:32 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> On 6/18/21 3:02 AM, Zhu Yanjun wrote:
> > On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >>
> >> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
> >> for the payload and zero it out. All these bytes are then re-written
> >> with RoCE headers and payload. Remove this useless extra copy.
> >
> > The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.
> >
> > Can you confirm that all the pkt->hdr are re-writtenwith RoCE headers
> > and payload?
>
> Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
> In rxe_req.c in init_req_packet() paylen is set to
>
>     paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE
>
> which is the correct size of the packet from the UDP header to the frame FCS i.e. the UDP payload. rxe_opcode[opcode] is a table that includes the length of the all the RoCE headers for a given opcode which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking but all the headers are fully written, the payload is fully copied from the client and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.

Too complicated assignment.
So I prefer to skb_put_zero.

Zhu Yanjun
>
> There are two potential issues here 1) Is the intended packet sent to the destination, and 2) is there a possibility that information can leak from the kernel to the outside. The above addresses 1). 2) requires the assumption that the NIC is not examining data outside of the proper data area in the skb and doing something with it. But you have a worse problem there since the NIC has DMA access to all of kernel memory and can send any packet it likes.
>
> Bob Pearson
>
> > Zhu Yanjun
> >
> >>
> >> Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> >> index 178a66a45312..6605ee777667 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> >> @@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
> >>
> >>         pkt->rxe        = rxe;
> >>         pkt->port_num   = port_num;
> >> -       pkt->hdr        = skb_put_zero(skb, paylen);
> >> +       pkt->hdr        = skb_put(skb, paylen);
> >>         pkt->mask       |= RXE_GRH_MASK;
> >>
> >>  out:
> >> --
> >> 2.30.2
> >>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-20 14:07       ` Zhu Yanjun
@ 2021-06-20 20:21         ` Bob Pearson
  2021-06-21  2:58           ` Zhu Yanjun
  0 siblings, 1 reply; 13+ messages in thread
From: Bob Pearson @ 2021-06-20 20:21 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: Jason Gunthorpe, RDMA mailing list

On 6/20/21 9:07 AM, Zhu Yanjun wrote:
> On Fri, Jun 18, 2021 at 11:32 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> On 6/18/21 3:02 AM, Zhu Yanjun wrote:
>>> On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>>>
>>>> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
>>>> for the payload and zero it out. All these bytes are then re-written
>>>> with RoCE headers and payload. Remove this useless extra copy.
>>>
>>> The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.
>>>
>>> Can you confirm that all the pkt->hdr are re-writtenwith RoCE headers
>>> and payload?
>>
>> Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
>> In rxe_req.c in init_req_packet() paylen is set to
>>
>>     paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE
>>
>> which is the correct size of the packet from the UDP header to the frame FCS i.e. the UDP payload. rxe_opcode[opcode] is a table that includes the length of the all the RoCE headers for a given opcode which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking but all the headers are fully written, the payload is fully copied from the client and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.
> 
> Too complicated assignment.
> So I prefer to skb_put_zero.

My goal here is to improve the performance of rxe. This one line adds an extra memory copy on every sent message. Without the skb_put_zero it passes all the tests and works correctly. What are you worried about exactly?

Bob
> 
> Zhu Yanjun
>>
>> There are two potential issues here 1) Is the intended packet sent to the destination, and 2) is there a possibility that information can leak from the kernel to the outside. The above addresses 1). 2) requires the assumption that the NIC is not examining data outside of the proper data area in the skb and doing something with it. But you have a worse problem there since the NIC has DMA access to all of kernel memory and can send any packet it likes.
>>
>> Bob Pearson
>>
>>> Zhu Yanjun
>>>
>>>>
>>>> Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>  drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> index 178a66a45312..6605ee777667 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>>> @@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
>>>>
>>>>         pkt->rxe        = rxe;
>>>>         pkt->port_num   = port_num;
>>>> -       pkt->hdr        = skb_put_zero(skb, paylen);
>>>> +       pkt->hdr        = skb_put(skb, paylen);
>>>>         pkt->mask       |= RXE_GRH_MASK;
>>>>
>>>>  out:
>>>> --
>>>> 2.30.2
>>>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero
  2021-06-20 20:21         ` Bob Pearson
@ 2021-06-21  2:58           ` Zhu Yanjun
  0 siblings, 0 replies; 13+ messages in thread
From: Zhu Yanjun @ 2021-06-21  2:58 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list

On Mon, Jun 21, 2021 at 4:21 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> On 6/20/21 9:07 AM, Zhu Yanjun wrote:
> > On Fri, Jun 18, 2021 at 11:32 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >>
> >> On 6/18/21 3:02 AM, Zhu Yanjun wrote:
> >>> On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >>>>
> >>>> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
> >>>> for the payload and zero it out. All these bytes are then re-written
> >>>> with RoCE headers and payload. Remove this useless extra copy.
> >>>
> >>> The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.
> >>>
> >>> Can you confirm that all the pkt->hdr are re-writtenwith RoCE headers
> >>> and payload?
> >>
> >> Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
> >> In rxe_req.c in init_req_packet() paylen is set to
> >>
> >>     paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE
> >>
> >> which is the correct size of the packet from the UDP header to the frame FCS i.e. the UDP payload. rxe_opcode[opcode] is a table that includes the length of the all the RoCE headers for a given opcode which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking but all the headers are fully written, the payload is fully copied from the client and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.
> >
> > Too complicated assignment.
> > So I prefer to skb_put_zero.
>
> My goal here is to improve the performance of rxe. This one line adds an extra memory copy on every sent message. Without the skb_put_zero it passes all the tests and works correctly. What are you worried about exactly?

Please show us the performance data.

Thanks
Zhu Yanjun

>
> Bob
> >
> > Zhu Yanjun
> >>
> >> There are two potential issues here 1) Is the intended packet sent to the destination, and 2) is there a possibility that information can leak from the kernel to the outside. The above addresses 1). 2) requires the assumption that the NIC is not examining data outside of the proper data area in the skb and doing something with it. But you have a worse problem there since the NIC has DMA access to all of kernel memory and can send any packet it likes.
> >>
> >> Bob Pearson
> >>
> >>> Zhu Yanjun
> >>>
> >>>>
> >>>> Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
> >>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >>>> ---
> >>>>  drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> >>>> index 178a66a45312..6605ee777667 100644
> >>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> >>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> >>>> @@ -470,7 +470,7 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
> >>>>
> >>>>         pkt->rxe        = rxe;
> >>>>         pkt->port_num   = port_num;
> >>>> -       pkt->hdr        = skb_put_zero(skb, paylen);
> >>>> +       pkt->hdr        = skb_put(skb, paylen);
> >>>>         pkt->mask       |= RXE_GRH_MASK;
> >>>>
> >>>>  out:
> >>>> --
> >>>> 2.30.2
> >>>>
> >>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-next 0/6] Fix extra/redundant copies
  2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
                   ` (5 preceding siblings ...)
  2021-06-18  4:57 ` [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero Bob Pearson
@ 2021-06-22 18:40 ` Jason Gunthorpe
  6 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2021-06-22 18:40 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jun 17, 2021 at 11:57:37PM -0500, Bob Pearson wrote:
> This series of patches removes or shortens several unneeded passes over
> packet data.
> 
> Applies cleanly to for-next after the memory windows commits.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> 
> Bob Pearson (6):
>   RDMA/rxe: Fix useless copy in send_atomic_ack
>   RDMA/rxe: Fix redundant call to ip_send_check
>   RDMA/rxe: Fix extra copies in build_rdma_network_hdr
>   RDMA/rxe: Fix over copying in get_srq_wqe
>   RDMA/rxe: Fix extra copy in prepare_ack_packet
>   RDMA/rxe: Fix redundant skb_put_zero

I'm inclined to agree with Bob that the network fast path should not
rely on pre-zeroing but each written header should zero or write valid
data as it goes, for performance

So applied to for-next, thanks

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-06-22 18:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18  4:57 [PATCH for-next 0/6] Fix extra/redundant copies Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 1/6] RDMA/rxe: Fix useless copy in send_atomic_ack Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 2/6] RDMA/rxe: Fix redundant call to ip_send_check Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 3/6] RDMA/rxe: Fix extra copies in build_rdma_network_hdr Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 4/6] RDMA/rxe: Fix over copying in get_srq_wqe Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 5/6] RDMA/rxe: Fix extra copy in prepare_ack_packet Bob Pearson
2021-06-18  4:57 ` [PATCH for-next 6/6] RDMA/rxe: Fix redundant skb_put_zero Bob Pearson
2021-06-18  8:02   ` Zhu Yanjun
2021-06-18 15:32     ` Bob Pearson
2021-06-20 14:07       ` Zhu Yanjun
2021-06-20 20:21         ` Bob Pearson
2021-06-21  2:58           ` Zhu Yanjun
2021-06-22 18:40 ` [PATCH for-next 0/6] Fix extra/redundant copies Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.