* [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again)
@ 2021-03-03 22:56 Bob Pearson
  2021-03-04  8:25 ` Leon Romanovsky
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Bob Pearson @ 2021-03-03 22:56 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Three errors occurred in the fix referenced below.

1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
no error occured causing an underflow on the reference counter.
This code is cleaned up to be clearer and easier to read.

2) Extending the reference taken by rxe_get_dev_from_net() in
rxe_udp_encap_recv() until each skb is freed was not matched by
a reference in the loopback path resulting in underflows.

3) In rxe_comp.c the function free_pkt() did not clear skb which
triggered a warning at done: and could possibly at exit: in
rxe_completer(). The WARN_ONCE() calls are not actually needed.

This patch fixes these errors.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Signed-off-by: Bob Pearson <rpearson@hpe.com>
---
Version 2:
v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
where it could be triggered for normal traffic. This version
replaced that with a pr_warn located correctly.

v1 of this patch placed a call to kfree_skb in an if statement
that could trigger style warnings. This version cleans that up.

 drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
 drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
 drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
 3 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index a8ac791a1bb9..96e5a73579f8 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -672,8 +672,10 @@ int rxe_completer(void *arg)
 			 */
 
 			/* there is nothing to retry in this case */
-			if (!wqe || (wqe->state == wqe_state_posted))
+			if (!wqe || (wqe->state == wqe_state_posted)) {
+				pr_warn("Retry attempted without a valid wqe\n");
 				goto exit;
+			}
 
 			/* if we've started a retry, don't start another
 			 * retry sequence, unless this is a timeout.
@@ -750,7 +752,6 @@ int rxe_completer(void *arg)
 	/* we come here if we are done with processing and want the task to
 	 * exit from the loop calling us
 	 */
-	WARN_ON_ONCE(skb);
 	rxe_drop_ref(qp);
 	return -EAGAIN;
 
@@ -758,7 +759,6 @@ int rxe_completer(void *arg)
 	/* we come here if we have processed a packet we want the task to call
 	 * us again to see if there is anything else to do
 	 */
-	WARN_ON_ONCE(skb);
 	rxe_drop_ref(qp);
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 0701bd1ffd1a..01662727dca0 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -407,14 +407,22 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
+/* fix up a send packet to match the packets
+ * received from UDP before looping them back
+ */
 void rxe_loopback(struct sk_buff *skb)
 {
+	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+
 	if (skb->protocol == htons(ETH_P_IP))
 		skb_pull(skb, sizeof(struct iphdr));
 	else
 		skb_pull(skb, sizeof(struct ipv6hdr));
 
-	rxe_rcv(skb);
+	if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev)))
+		kfree_skb(skb);
+	else
+		rxe_rcv(skb);
 }
 
 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 45d2f711bce2..2b2465744896 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -237,8 +237,6 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	struct rxe_mc_elem *mce;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
-	struct sk_buff *per_qp_skb;
-	struct rxe_pkt_info *per_qp_pkt;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -250,10 +248,15 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	/* lookup mcast group corresponding to mgid, takes a ref */
 	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
 	if (!mcg)
-		goto err1;	/* mcast group not registered */
+		goto drop;	/* mcast group not registered */
 
 	spin_lock_bh(&mcg->mcg_lock);
 
+	/* this is unreliable datagram service so we let
+	 * failures to deliver a multicast packet to a
+	 * single QP happen and just move on and try
+	 * the rest of them on the list
+	 */
 	list_for_each_entry(mce, &mcg->qp_list, qp_list) {
 		qp = mce->qp;
 
@@ -266,39 +269,48 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		if (err)
 			continue;
 
-		/* for all but the last qp create a new clone of the
-		 * skb and pass to the qp. If an error occurs in the
-		 * checks for the last qp in the list we need to
-		 * free the skb since it hasn't been passed on to
-		 * rxe_rcv_pkt() which would free it later.
+		/* for all but the last QP create a new clone of the
+		 * skb and pass to the QP. Pass the original skb to
+		 * the last QP in the list.
 		 */
 		if (mce->qp_list.next != &mcg->qp_list) {
-			per_qp_skb = skb_clone(skb, GFP_ATOMIC);
-			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
-				kfree_skb(per_qp_skb);
+			struct sk_buff *cskb;
+			struct rxe_pkt_info *cpkt;
+
+			cskb = skb_clone(skb, GFP_ATOMIC);
+			if (unlikely(!cskb))
 				continue;
+
+			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
+				kfree_skb(cskb);
+				break;
 			}
+
+			cpkt = SKB_TO_PKT(cskb);
+			cpkt->qp = qp;
+			rxe_add_ref(qp);
+			rxe_rcv_pkt(cpkt, cskb);
 		} else {
-			per_qp_skb = skb;
-			/* show we have consumed the skb */
-			skb = NULL;
+			pkt->qp = qp;
+			rxe_add_ref(qp);
+			rxe_rcv_pkt(pkt, skb);
+			skb = NULL;	/* mark consumed */
 		}
-
-		if (unlikely(!per_qp_skb))
-			continue;
-
-		per_qp_pkt = SKB_TO_PKT(per_qp_skb);
-		per_qp_pkt->qp = qp;
-		rxe_add_ref(qp);
-		rxe_rcv_pkt(per_qp_pkt, per_qp_skb);
 	}
 
 	spin_unlock_bh(&mcg->mcg_lock);
 
 	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
 
-err1:
-	/* free skb if not consumed */
+	if (likely(!skb))
+		return;
+
+	/* Fall through to drop packet
+	 * This only occurs if one of the checks fails on the last
+	 * QP in the list above
+	 */
+
+drop:
 	kfree_skb(skb);
 	ib_device_put(&rxe->ib_dev);
 }
-- 
2.27.0



* Re: [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again)
  2021-03-03 22:56 [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again) Bob Pearson
@ 2021-03-04  8:25 ` Leon Romanovsky
  2021-03-04  8:58 ` Zhu Yanjun
  2021-03-04  9:14 ` Zhu Yanjun
  2 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-03-04  8:25 UTC (permalink / raw)
  To: Bob Pearson; +Cc: jgg, zyjzyj2000, linux-rdma, Bob Pearson

On Wed, Mar 03, 2021 at 04:56:29PM -0600, Bob Pearson wrote:
> Three errors occurred in the fix referenced below.
>
> 1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
> no error occured causing an underflow on the reference counter.
> This code is cleaned up to be clearer and easier to read.
>
> 2) Extending the reference taken by rxe_get_dev_from_net() in
> rxe_udp_encap_recv() until each skb is freed was not matched by
> a reference in the loopback path resulting in underflows.
>
> 3) In rxe_comp.c the function free_pkt() did not clear skb which
> triggered a warning at done: and could possibly at exit: in
> rxe_completer(). The WARN_ONCE() calls are not actually needed.
>
> This patch fixes these errors.
>
> Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
> Signed-off-by: Bob Pearson <rpearson@hpe.com>
> ---
> Version 2:
> v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
> where it could be triggered for normal traffic. This version
> replaced that with a pr_warn located correctly.
>
> v1 of this patch placed a call to kfree_skb in an if statement
> that could trigger style warnings. This version cleans that up.
>
>  drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
>  drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
>  drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
>  3 files changed, 48 insertions(+), 28 deletions(-)

➜  kernel git:(rdma-next) mkt ci
ff130e1dc197 (HEAD -> build) RDMA/rxe: Fix ib_device reference counting (again)
WARNING: 'occured' may be misspelled - perhaps 'occurred'?
#9:
no error occured causing an underflow on the reference counter.
         ^^^^^^^

WARNING: Prefer 'fallthrough;' over fallthrough comment
#182: FILE: drivers/infiniband/sw/rxe/rxe_recv.c:308:
+	/* Fall through to drop packet

WARNING: From:/Signed-off-by: email address mismatch: 'From: Bob Pearson <rpearsonhpe@gmail.com>' != 'Signed-off-by: Bob Pearson <rpearson@hpe.com>'


* Re: [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again)
  2021-03-03 22:56 [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again) Bob Pearson
  2021-03-04  8:25 ` Leon Romanovsky
@ 2021-03-04  8:58 ` Zhu Yanjun
  2021-03-04 18:36   ` Bob Pearson
  2021-03-04  9:14 ` Zhu Yanjun
  2 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2021-03-04  8:58 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list, Bob Pearson

On Thu, Mar 4, 2021 at 7:02 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Three errors occurred in the fix referenced below.
>
> 1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
> no error occured causing an underflow on the reference counter.
> This code is cleaned up to be clearer and easier to read.
>
> 2) Extending the reference taken by rxe_get_dev_from_net() in
> rxe_udp_encap_recv() until each skb is freed was not matched by
> a reference in the loopback path resulting in underflows.
>
> 3) In rxe_comp.c the function free_pkt() did not clear skb which
> triggered a warning at done: and could possibly at exit: in
> rxe_completer(). The WARN_ONCE() calls are not actually needed.
>
> This patch fixes these errors.
>
> Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
> Signed-off-by: Bob Pearson <rpearson@hpe.com>
> ---
> Version 2:
> v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
> where it could be triggered for normal traffic. This version
> replaced that with a pr_warn located correctly.
>
> v1 of this patch placed a call to kfree_skb in an if statement
> that could trigger style warnings. This version cleans that up.
>
>  drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
>  drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
>  drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
>  3 files changed, 48 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index a8ac791a1bb9..96e5a73579f8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -672,8 +672,10 @@ int rxe_completer(void *arg)
>                          */
>
>                         /* there is nothing to retry in this case */
> -                       if (!wqe || (wqe->state == wqe_state_posted))
> +                       if (!wqe || (wqe->state == wqe_state_posted)) {
> +                               pr_warn("Retry attempted without a valid wqe\n");
>                                 goto exit;
> +                       }
>
>                         /* if we've started a retry, don't start another
>                          * retry sequence, unless this is a timeout.
> @@ -750,7 +752,6 @@ int rxe_completer(void *arg)
>         /* we come here if we are done with processing and want the task to
>          * exit from the loop calling us
>          */
> -       WARN_ON_ONCE(skb);
>         rxe_drop_ref(qp);
>         return -EAGAIN;
>
> @@ -758,7 +759,6 @@ int rxe_completer(void *arg)
>         /* we come here if we have processed a packet we want the task to call
>          * us again to see if there is anything else to do
>          */
> -       WARN_ON_ONCE(skb);

With the above line kept, I tested this commit:
1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
2. cd rdma && git pull
3. apply this commit with "WARN_ON_ONCE(skb);" kept
4. run tests with "rping ...."
A similar problem still occurs.

Zhu Yanjun
"
[  129.220472] ------------[ cut here ]------------
[  129.220475] WARNING: CPU: 11 PID: 1669 at
drivers/infiniband/sw/rxe/rxe_comp.c:763 rxe_completer+0x97f/0xd60
[rdma_rxe]
[  129.220485] Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm
rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel ib_core dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common
isst_if_common nfit rapl snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
snd_timer snd soundcore joydev input_leds mac_hid serio_raw
sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel vmwgfx ttm drm_kms_helper syscopyarea aesni_intel
sysfillrect sysimgblt fb_sys_fops crypto_simd cec cryptd psmouse drm
video ahci i2c_piix4 e1000 libahci pata_acpi
[  129.220537] CPU: 11 PID: 1669 Comm: rping Kdump: loaded Not tainted
5.12.0-rc1+ #2
[  129.220539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[  129.220541] RIP: 0010:rxe_completer+0x97f/0xd60 [rdma_rxe]
[  129.220545] Code: 8d 7f 08 e8 e3 e8 7d cd e9 f7 f6 ff ff 39 ca b8
0e 00 00 00 ba 03 00 00 00 0f 44 c2 89 c3 e9 47 f7 ff ff 0f 0b e9 b2
f9 ff ff <0f> 0b e9 26 f9 ff ff 49 8d 9f 30 08 00 00 48 89 df e8 5b a1
e0 cd
[  129.220547] RSP: 0018:ffffb4e3c0def7f0 EFLAGS: 00010286
[  129.220549] RAX: 0000000000000008 RBX: 000000000000000e RCX: ffffffff8f07feb8
[  129.220551] RDX: 0000000000000c40 RSI: ffffffff8e56db0e RDI: ffff99b1c2834000
[  129.220552] RBP: ffffb4e3c0def850 R08: 0000000000000000 R09: 0000000000000001
[  129.220553] R10: ffffb4e3c0def888 R11: 0000000000000000 R12: ffffb4e3c0bb1180
[  129.220554] R13: ffff99b1d15f4228 R14: 0000000000000000 R15: ffff99b1c8b66000
[  129.220555] FS:  00007f306d63b740(0000) GS:ffff99b4cfcc0000(0000)
knlGS:0000000000000000
[  129.220557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  129.220558] CR2: 00007f306cd5bfb8 CR3: 000000011169a002 CR4: 00000000000706e0
[  129.220561] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  129.220562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  129.220563] Call Trace:
[  129.220568]  rxe_do_task+0xa7/0xf0 [rdma_rxe]
[  129.220574]  rxe_run_task+0x2a/0x40 [rdma_rxe]
[  129.220578]  rxe_comp_queue_pkt+0x48/0x50 [rdma_rxe]
[  129.220582]  rxe_rcv+0x339/0x8e0 [rdma_rxe]
[  129.220586]  ? prepare_ack_packet+0x1b6/0x250 [rdma_rxe]
[  129.220590]  rxe_loopback+0x53/0x80 [rdma_rxe]
[  129.220595]  send_ack+0xac/0x170 [rdma_rxe]
[  129.220599]  rxe_responder+0x15bf/0x2220 [rdma_rxe]
[  129.220678]  rxe_do_task+0xa7/0xf0 [rdma_rxe]
[  129.220685]  rxe_run_task+0x2a/0x40 [rdma_rxe]
[  129.220689]  rxe_resp_queue_pkt+0x44/0x50 [rdma_rxe]
[  129.220693]  rxe_rcv+0x286/0x8e0 [rdma_rxe]
[  129.220697]  ? copy_data+0xc4/0x2a0 [rdma_rxe]
[  129.220701]  rxe_loopback+0x53/0x80 [rdma_rxe]
[  129.220705]  rxe_requester+0x6ec/0x10a0 [rdma_rxe]
[  129.220709]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[  129.220739]  rxe_do_task+0xa7/0xf0 [rdma_rxe]
[  129.220744]  rxe_run_task+0x2a/0x40 [rdma_rxe]
[  129.220747]  rxe_post_send+0x320/0x530 [rdma_rxe]
[  129.220752]  ? lookup_get_idr_uobject+0x19/0x30 [ib_uverbs]
[  129.220761]  ? rdma_lookup_get_uobject+0x47/0x180 [ib_uverbs]
[  129.220767]  ib_uverbs_post_send+0x64d/0x6e0 [ib_uverbs]
[  129.220773]  ? __wake_up+0x13/0x20
[  129.220779]  ib_uverbs_write+0x44f/0x580 [ib_uverbs]
[  129.220785]  vfs_write+0xb9/0x250
[  129.220790]  ksys_write+0xb1/0xe0
[  129.220792]  __x64_sys_write+0x1a/0x20
[  129.220794]  do_syscall_64+0x38/0x90
[  129.220799]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  129.220801] RIP: 0033:0x7f306d9062cf
[  129.220803] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd
ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff
ff 48
[  129.220805] RSP: 002b:00007fff81960ae0 EFLAGS: 00000293 ORIG_RAX:
0000000000000001
[  129.220807] RAX: ffffffffffffffda RBX: 00007f306cd5d180 RCX: 00007f306d9062cf
[  129.220808] RDX: 0000000000000020 RSI: 00007fff81960b40 RDI: 0000000000000004
[  129.220809] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000027
[  129.220810] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
[  129.220811] R13: 000055cfe2fb64a0 R14: 0000000000000000 R15: 0000000000000000
[  129.220813] ---[ end trace e6143931678defa3 ]---
"
>         rxe_drop_ref(qp);
>         return 0;
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 0701bd1ffd1a..01662727dca0 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -407,14 +407,22 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
>         return 0;
>  }
>
> +/* fix up a send packet to match the packets
> + * received from UDP before looping them back
> + */
>  void rxe_loopback(struct sk_buff *skb)
>  {
> +       struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
> +
>         if (skb->protocol == htons(ETH_P_IP))
>                 skb_pull(skb, sizeof(struct iphdr));
>         else
>                 skb_pull(skb, sizeof(struct ipv6hdr));
>
> -       rxe_rcv(skb);
> +       if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev)))
> +               kfree_skb(skb);
> +       else
> +               rxe_rcv(skb);
>  }
>
>  struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 45d2f711bce2..2b2465744896 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -237,8 +237,6 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>         struct rxe_mc_elem *mce;
>         struct rxe_qp *qp;
>         union ib_gid dgid;
> -       struct sk_buff *per_qp_skb;
> -       struct rxe_pkt_info *per_qp_pkt;
>         int err;
>
>         if (skb->protocol == htons(ETH_P_IP))
> @@ -250,10 +248,15 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>         /* lookup mcast group corresponding to mgid, takes a ref */
>         mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
>         if (!mcg)
> -               goto err1;      /* mcast group not registered */
> +               goto drop;      /* mcast group not registered */
>
>         spin_lock_bh(&mcg->mcg_lock);
>
> +       /* this is unreliable datagram service so we let
> +        * failures to deliver a multicast packet to a
> +        * single QP happen and just move on and try
> +        * the rest of them on the list
> +        */
>         list_for_each_entry(mce, &mcg->qp_list, qp_list) {
>                 qp = mce->qp;
>
> @@ -266,39 +269,48 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>                 if (err)
>                         continue;
>
> -               /* for all but the last qp create a new clone of the
> -                * skb and pass to the qp. If an error occurs in the
> -                * checks for the last qp in the list we need to
> -                * free the skb since it hasn't been passed on to
> -                * rxe_rcv_pkt() which would free it later.
> +               /* for all but the last QP create a new clone of the
> +                * skb and pass to the QP. Pass the original skb to
> +                * the last QP in the list.
>                  */
>                 if (mce->qp_list.next != &mcg->qp_list) {
> -                       per_qp_skb = skb_clone(skb, GFP_ATOMIC);
> -                       if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
> -                               kfree_skb(per_qp_skb);
> +                       struct sk_buff *cskb;
> +                       struct rxe_pkt_info *cpkt;
> +
> +                       cskb = skb_clone(skb, GFP_ATOMIC);
> +                       if (unlikely(!cskb))
>                                 continue;
> +
> +                       if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
> +                               kfree_skb(cskb);
> +                               break;
>                         }
> +
> +                       cpkt = SKB_TO_PKT(cskb);
> +                       cpkt->qp = qp;
> +                       rxe_add_ref(qp);
> +                       rxe_rcv_pkt(cpkt, cskb);
>                 } else {
> -                       per_qp_skb = skb;
> -                       /* show we have consumed the skb */
> -                       skb = NULL;
> +                       pkt->qp = qp;
> +                       rxe_add_ref(qp);
> +                       rxe_rcv_pkt(pkt, skb);
> +                       skb = NULL;     /* mark consumed */
>                 }
> -
> -               if (unlikely(!per_qp_skb))
> -                       continue;
> -
> -               per_qp_pkt = SKB_TO_PKT(per_qp_skb);
> -               per_qp_pkt->qp = qp;
> -               rxe_add_ref(qp);
> -               rxe_rcv_pkt(per_qp_pkt, per_qp_skb);
>         }
>
>         spin_unlock_bh(&mcg->mcg_lock);
>
>         rxe_drop_ref(mcg);      /* drop ref from rxe_pool_get_key. */
>
> -err1:
> -       /* free skb if not consumed */
> +       if (likely(!skb))
> +               return;
> +
> +       /* Fall through to drop packet
> +        * This only occurs if one of the checks fails on the last
> +        * QP in the list above
> +        */
> +
> +drop:
>         kfree_skb(skb);
>         ib_device_put(&rxe->ib_dev);
>  }
> --
> 2.27.0
>


* Re: [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again)
  2021-03-03 22:56 [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again) Bob Pearson
  2021-03-04  8:25 ` Leon Romanovsky
  2021-03-04  8:58 ` Zhu Yanjun
@ 2021-03-04  9:14 ` Zhu Yanjun
  2 siblings, 0 replies; 5+ messages in thread
From: Zhu Yanjun @ 2021-03-04  9:14 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list, Bob Pearson

On Thu, Mar 4, 2021 at 7:02 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Three errors occurred in the fix referenced below.
>
> 1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
> no error occured causing an underflow on the reference counter.
> This code is cleaned up to be clearer and easier to read.
>
> 2) Extending the reference taken by rxe_get_dev_from_net() in
> rxe_udp_encap_recv() until each skb is freed was not matched by
> a reference in the loopback path resulting in underflows.
>
> 3) In rxe_comp.c the function free_pkt() did not clear skb which
> triggered a warning at done: and could possibly at exit: in
> rxe_completer(). The WARN_ONCE() calls are not actually needed.
>
> This patch fixes these errors.
>
> Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
> Signed-off-by: Bob Pearson <rpearson@hpe.com>
> ---
> Version 2:
> v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
> where it could be triggered for normal traffic. This version
> replaced that with a pr_warn located correctly.
>
> v1 of this patch placed a call to kfree_skb in an if statement
> that could trigger style warnings. This version cleans that up.
>
>  drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
>  drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
>  drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
>  3 files changed, 48 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index a8ac791a1bb9..96e5a73579f8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -672,8 +672,10 @@ int rxe_completer(void *arg)
>                          */
>
>                         /* there is nothing to retry in this case */
> -                       if (!wqe || (wqe->state == wqe_state_posted))
> +                       if (!wqe || (wqe->state == wqe_state_posted)) {
> +                               pr_warn("Retry attempted without a valid wqe\n");
>                                 goto exit;
> +                       }
>
>                         /* if we've started a retry, don't start another
>                          * retry sequence, unless this is a timeout.
> @@ -750,7 +752,6 @@ int rxe_completer(void *arg)
>         /* we come here if we are done with processing and want the task to
>          * exit from the loop calling us
>          */
> -       WARN_ON_ONCE(skb);

IMHO, the above "WARN_ON_ONCE(skb);" should be kept.

>         rxe_drop_ref(qp);
>         return -EAGAIN;
>
> @@ -758,7 +759,6 @@ int rxe_completer(void *arg)
>         /* we come here if we have processed a packet we want the task to call
>          * us again to see if there is anything else to do
>          */
> -       WARN_ON_ONCE(skb);

The above "WARN_ON_ONCE(skb);" should also be kept.

Zhu Yanjun

>         rxe_drop_ref(qp);
>         return 0;
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 0701bd1ffd1a..01662727dca0 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -407,14 +407,22 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
>         return 0;
>  }
>
> +/* fix up a send packet to match the packets
> + * received from UDP before looping them back
> + */
>  void rxe_loopback(struct sk_buff *skb)
>  {
> +       struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
> +
>         if (skb->protocol == htons(ETH_P_IP))
>                 skb_pull(skb, sizeof(struct iphdr));
>         else
>                 skb_pull(skb, sizeof(struct ipv6hdr));
>
> -       rxe_rcv(skb);
> +       if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev)))
> +               kfree_skb(skb);
> +       else
> +               rxe_rcv(skb);
>  }
>
>  struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 45d2f711bce2..2b2465744896 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -237,8 +237,6 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>         struct rxe_mc_elem *mce;
>         struct rxe_qp *qp;
>         union ib_gid dgid;
> -       struct sk_buff *per_qp_skb;
> -       struct rxe_pkt_info *per_qp_pkt;
>         int err;
>
>         if (skb->protocol == htons(ETH_P_IP))
> @@ -250,10 +248,15 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>         /* lookup mcast group corresponding to mgid, takes a ref */
>         mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
>         if (!mcg)
> -               goto err1;      /* mcast group not registered */
> +               goto drop;      /* mcast group not registered */
>
>         spin_lock_bh(&mcg->mcg_lock);
>
> +       /* this is unreliable datagram service so we let
> +        * failures to deliver a multicast packet to a
> +        * single QP happen and just move on and try
> +        * the rest of them on the list
> +        */
>         list_for_each_entry(mce, &mcg->qp_list, qp_list) {
>                 qp = mce->qp;
>
> @@ -266,39 +269,48 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
>                 if (err)
>                         continue;
>
> -               /* for all but the last qp create a new clone of the
> -                * skb and pass to the qp. If an error occurs in the
> -                * checks for the last qp in the list we need to
> -                * free the skb since it hasn't been passed on to
> -                * rxe_rcv_pkt() which would free it later.
> +               /* for all but the last QP create a new clone of the
> +                * skb and pass to the QP. Pass the original skb to
> +                * the last QP in the list.
>                  */
>                 if (mce->qp_list.next != &mcg->qp_list) {
> -                       per_qp_skb = skb_clone(skb, GFP_ATOMIC);
> -                       if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
> -                               kfree_skb(per_qp_skb);
> +                       struct sk_buff *cskb;
> +                       struct rxe_pkt_info *cpkt;
> +
> +                       cskb = skb_clone(skb, GFP_ATOMIC);
> +                       if (unlikely(!cskb))
>                                 continue;
> +
> +                       if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
> +                               kfree_skb(cskb);
> +                               break;
>                         }
> +
> +                       cpkt = SKB_TO_PKT(cskb);
> +                       cpkt->qp = qp;
> +                       rxe_add_ref(qp);
> +                       rxe_rcv_pkt(cpkt, cskb);
>                 } else {
> -                       per_qp_skb = skb;
> -                       /* show we have consumed the skb */
> -                       skb = NULL;
> +                       pkt->qp = qp;
> +                       rxe_add_ref(qp);
> +                       rxe_rcv_pkt(pkt, skb);
> +                       skb = NULL;     /* mark consumed */
>                 }
> -
> -               if (unlikely(!per_qp_skb))
> -                       continue;
> -
> -               per_qp_pkt = SKB_TO_PKT(per_qp_skb);
> -               per_qp_pkt->qp = qp;
> -               rxe_add_ref(qp);
> -               rxe_rcv_pkt(per_qp_pkt, per_qp_skb);
>         }
>
>         spin_unlock_bh(&mcg->mcg_lock);
>
>         rxe_drop_ref(mcg);      /* drop ref from rxe_pool_get_key. */
>
> -err1:
> -       /* free skb if not consumed */
> +       if (likely(!skb))
> +               return;
> +
> +       /* Fall through to drop packet
> +        * This only occurs if one of the checks fails on the last
> +        * QP in the list above
> +        */
> +
> +drop:
>         kfree_skb(skb);
>         ib_device_put(&rxe->ib_dev);
>  }
> --
> 2.27.0
>


* Re: [PATCH for-next v2] RDMA/rxe: Fix ib_device reference counting (again)
  2021-03-04  8:58 ` Zhu Yanjun
@ 2021-03-04 18:36   ` Bob Pearson
  0 siblings, 0 replies; 5+ messages in thread
From: Bob Pearson @ 2021-03-04 18:36 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: Jason Gunthorpe, RDMA mailing list, Bob Pearson

On 3/4/21 2:58 AM, Zhu Yanjun wrote:
> On Thu, Mar 4, 2021 at 7:02 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> Three errors occurred in the fix referenced below.
>>
>> 1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
>> no error occured causing an underflow on the reference counter.
>> This code is cleaned up to be clearer and easier to read.
>>
>> 2) Extending the reference taken by rxe_get_dev_from_net() in
>> rxe_udp_encap_recv() until each skb is freed was not matched by
>> a reference in the loopback path resulting in underflows.
>>
>> 3) In rxe_comp.c the function free_pkt() did not clear skb which
>> triggered a warning at done: and could possibly at exit: in
>> rxe_completer(). The WARN_ONCE() calls are not actually needed.
>>
>> This patch fixes these errors.
>>
>> Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
>> Signed-off-by: Bob Pearson <rpearson@hpe.com>
>> ---
>> Version 2:
>> v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
>> where it could be triggered for normal traffic. This version
>> replaced that with a pr_warn located correctly.
>>
>> v1 of this patch placed a call to kfree_skb in an if statement
>> that could trigger style warnings. This version cleans that up.
>>
>>  drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
>>  drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
>>  drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
>>  3 files changed, 48 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index a8ac791a1bb9..96e5a73579f8 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -672,8 +672,10 @@ int rxe_completer(void *arg)
>>                          */
>>
>>                         /* there is nothing to retry in this case */
>> -                       if (!wqe || (wqe->state == wqe_state_posted))
>> +                       if (!wqe || (wqe->state == wqe_state_posted)) {
>> +                               pr_warn("Retry attempted without a valid wqe\n");
>>                                 goto exit;
>> +                       }
>>
>>                         /* if we've started a retry, don't start another
>>                          * retry sequence, unless this is a timeout.
>> @@ -750,7 +752,6 @@ int rxe_completer(void *arg)
>>         /* we come here if we are done with processing and want the task to
>>          * exit from the loop calling us
>>          */
>> -       WARN_ON_ONCE(skb);
>>         rxe_drop_ref(qp);
>>         return -EAGAIN;
>>
>> @@ -758,7 +759,6 @@ int rxe_completer(void *arg)
>>         /* we come here if we have processed a packet we want the task to call
>>          * us again to see if there is anything else to do
>>          */
>> -       WARN_ON_ONCE(skb);
> 
> With the above line kept, I tested this commit:
> 1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> 2. cd rdma && git pull
> 3. apply this commit with "WARN_ON_ONCE(skb);" kept
> 4. run tests with "rping ...."
> A similar problem still occurs.
> 
> Zhu Yanjun

The WARNs occur because skb is not getting cleared, not because packets are not
being freed.

The issue is whether the skbs are freed, not whether a local variable still holds
the (stale) address of an skb. The following would trigger the warning but doesn't
mean anything:

	skb = alloc_skb(...);
	kfree_skb(skb);
	WARN_ON_ONCE(skb);

Every path out of the subroutine calls free_pkt() except one. That is because I was
trying not to change the behavior of the original code. The exception is the
ERROR_RETRY state when no wqe is available. All other paths call free_pkt(), which
in turn calls kfree_skb(). On that one path we leak an skb, which is not good, so we
should probably go ahead and free it there as well and just drop the packet. In that
case we can move the free_pkt() call to the end and make it explicit that all the
packets are actually freed. I will modify the code to do that, roughly as sketched
below. The WARN is still not required.
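
Something like the following -- a sketch only, not the actual patch. It assumes
free_pkt() (the existing helper in rxe_comp.c) just frees the skb and drops the
references taken for it; the ret and out: names here are purely illustrative:

	/* any path that hands the skb off clears the local pointers
	 * (pkt = NULL; skb = NULL;) so nothing gets freed twice
	 */
	done:
		ret = 0;
		goto out;
	exit:
		ret = -EAGAIN;
	out:
		if (pkt)
			free_pkt(pkt);	/* every remaining path drops its packet here */
		rxe_drop_ref(qp);
		return ret;

That makes it explicit at a glance that every packet is consumed, without relying
on a WARN against a stale local variable.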

Bob


