Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
@ 2020-01-24 13:26 Oleksij Rempel
  2020-02-03 13:43 ` Naresh Kamboju
  2020-02-14 12:09 ` Oleksij Rempel
  0 siblings, 2 replies; 11+ messages in thread
From: Oleksij Rempel @ 2020-01-24 13:26 UTC (permalink / raw)
  To: dev.kurt, mkl, wg; +Cc: Oleksij Rempel, kernel, linux-can, netdev

All user space generated SKBs are owned by a socket (unless injected
into the key via AF_PACKET). If a socket is closed, all associated skbs
will be cleaned up.

This leads to a problem when a CAN driver calls can_put_echo_skb() on a
unshared SKB. If the socket is closed prior to the TX complete handler,
can_get_echo_skb() and the subsequent delivering of the echo SKB to
all registered callbacks, a SKB with a refcount of 0 is delivered.

To avoid the problem, in can_get_echo_skb() the original SKB is now
always cloned, regardless of shared SKB or not. If the process exists it
can now safely discard its SKBs, without disturbing the delivery of the
echo SKB.

The problem shows up in the j1939 stack, when it clones the
incoming skb, which detects the already 0 refcount.

We can easily reproduce this with following example:

testj1939 -B -r can0: &
cansend can0 1823ff40#0123

WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
refcount_t: addition on 0; use-after-free.
Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
[<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
[<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
[<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
[<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
[<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
[<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
[<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
[<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
[<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
[<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
[<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
[<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
[<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 include/linux/can/skb.h | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
index a954def26c0d..0783b0c6d9e2 100644
--- a/include/linux/can/skb.h
+++ b/include/linux/can/skb.h
@@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
  */
 static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb)
 {
-	if (skb_shared(skb)) {
-		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+	struct sk_buff *nskb;
 
-		if (likely(nskb)) {
-			can_skb_set_owner(nskb, skb->sk);
-			consume_skb(skb);
-			return nskb;
-		} else {
-			kfree_skb(skb);
-			return NULL;
-		}
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (unlikely(!nskb)) {
+		kfree_skb(skb);
+		return NULL;
 	}
 
-	/* we can assume to have an unshared skb with proper owner */
-	return skb;
+	can_skb_set_owner(nskb, skb->sk);
+	consume_skb(skb);
+	return nskb;
 }
 
 #endif /* !_CAN_SKB_H */
-- 
2.25.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-01-24 13:26 [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Oleksij Rempel
@ 2020-02-03 13:43 ` Naresh Kamboju
  2020-02-03 13:48   ` Marc Kleine-Budde
  2020-02-14 12:09 ` Oleksij Rempel
  1 sibling, 1 reply; 11+ messages in thread
From: Naresh Kamboju @ 2020-02-03 13:43 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: dev.kurt, mkl, wg, Pengutronix Kernel Team, linux-can, Netdev,
	lkft-triage

On Fri, 24 Jan 2020 at 18:57, Oleksij Rempel <o.rempel@pengutronix.de> wrote:
>
> All user space generated SKBs are owned by a socket (unless injected
> into the key via AF_PACKET). If a socket is closed, all associated skbs
> will be cleaned up.
>
> This leads to a problem when a CAN driver calls can_put_echo_skb() on a
> unshared SKB. If the socket is closed prior to the TX complete handler,
> can_get_echo_skb() and the subsequent delivering of the echo SKB to
> all registered callbacks, a SKB with a refcount of 0 is delivered.
>
> To avoid the problem, in can_get_echo_skb() the original SKB is now
> always cloned, regardless of shared SKB or not. If the process exists it
> can now safely discard its SKBs, without disturbing the delivery of the
> echo SKB.
>
> The problem shows up in the j1939 stack, when it clones the
> incoming skb, which detects the already 0 refcount.
>
> We can easily reproduce this with following example:
>
> testj1939 -B -r can0: &
> cansend can0 1823ff40#0123
>
> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
> refcount_t: addition on 0; use-after-free.

FYI,
This issue noticed in our Linaro test farm
On linux next version 5.5.0-next-20200203 running on beagleboard x15 arm device.

Thanks for providing fix for this case.

Warning log.
[    0.013414] ------------[ cut here ]------------
[    0.013420] WARNING: CPU: 0 PID: 0 at
/usr/src/kernel/lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
[    0.013424] refcount_t: addition on 0; use-after-free.
[    0.013427] Modules linked in:
[    0.013435] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-next-20200203 #1
[    0.013439] Hardware name: Generic DRA74X (Flattened Device Tree)
[    0.013442] Backtrace:
[    0.013448] [<c040fac4>] (dump_backtrace) from [<c040fdf8>]
(show_stack+0x20/0x24)
[    0.013452]  r7:c23f2e68 r6:00000000 r5:600000d3 r4:c23f2e68
[    0.013456] [<c040fdd8>] (show_stack) from [<c14144d0>]
(dump_stack+0xe8/0x114)
[    0.013459] [<c14143e8>] (dump_stack) from [<c04595cc>] (__warn+0x100/0x118)
[    0.013463]  r10:efca9a50 r9:c0957770 r8:00000019 r7:c1c2343c
r6:00000009 r5:00000000
[    0.013467]  r4:c2201b7c r3:be2d277f
[    0.013470] [<c04594cc>] (__warn) from [<c0459668>]
(warn_slowpath_fmt+0x84/0xc0)
[    0.013474]  r9:00000009 r8:c0957770 r7:00000019 r6:c1c2343c
r5:c1c2345c r4:c2208708
[    0.013478] [<c04595e8>] (warn_slowpath_fmt) from [<c0957770>]
(refcount_warn_saturate+0x108/0x174)
[    0.013481]  r9:c2a36014 r8:c2a35c56 r7:c2a35c56 r6:00000007
r5:efca9a50 r4:efca9a70
[    0.013485] [<c0957668>] (refcount_warn_saturate) from [<c1419a30>]
(kobject_get+0xa8/0xac)
[    0.013489] [<c1419988>] (kobject_get) from [<c112aa6c>]
(of_node_get+0x24/0x2c)
[    0.013492]  r4:efca9a44
[    0.013495] [<c112aa48>] (of_node_get) from [<c11298fc>]
(of_fwnode_get+0x44/0x50)
[    0.013499]  r5:efca9a50 r4:00000007
[    0.013502] [<c11298b8>] (of_fwnode_get) from [<c0cbbdc8>]
(fwnode_get_nth_parent+0x3c/0x6c)
[    0.013507] [<c0cbbd8c>] (fwnode_get_nth_parent) from [<c1428624>]
(fwnode_full_name_string+0x3c/0xa8)
[    0.013510]  r7:c2a35c56 r6:c1c54319 r5:c189c7d0 r4:00000007
[    0.013514] [<c14285e8>] (fwnode_full_name_string) from
[<c142a04c>] (device_node_string+0x48c/0x4ec)
[    0.013518]  r10:ffffffff r9:c1bde730 r8:efca9a44 r7:c2a35c56
r6:c1c54319 r5:c2a36014
[    0.013521]  r4:c2208708
[    0.013525] [<c1429bc4>] (device_node_string) from [<c142bc1c>]
(pointer+0x43c/0x4e0)
[    0.013529]  r10:c2a36014 r9:c2201d3c r8:c2201e90 r7:00000002
r6:00000000 r5:c2a36014
[    0.013532]  r4:c2a35c56
[    0.013535] [<c142b7e0>] (pointer) from [<c142be88>] (vsnprintf+0x1c8/0x414)
[    0.013539]  r7:00000002 r6:c1d5b4e8 r5:c1d5b4e6 r4:c2a35c56
[    0.013542] [<c142bcc0>] (vsnprintf) from [<c142c0e8>] (vscnprintf+0x14/0x2c)
[    0.013546]  r10:00000000 r9:00000000 r8:ffffffff r7:c2a352e8
r6:00000028 r5:600000d3
[    0.013549]  r4:000003e0
[    0.013553] [<c142c0d4>] (vscnprintf) from [<c04db300>]
(vprintk_store+0x44/0x220)
[    0.013556]  r5:600000d3 r4:c2a352e8
[    0.013560] [<c04db2bc>] (vprintk_store) from [<c04db8a0>]
(vprintk_emit+0xa0/0x2fc)
[    0.013564]  r10:00000001 r9:ffffffff r8:00000000 r7:00000000
r6:00000028 r5:600000d3
[    0.013567]  r4:c2a352e8
[    0.013571] [<c04db800>] (vprintk_emit) from [<c04dbb2c>]
(vprintk_default+0x30/0x38)
[    0.013575]  r10:efca9a44 r9:00000001 r8:00000000 r7:ffffe000
r6:c2201e8c r5:c1d5b4c4
[    0.013578]  r4:c21aa590
[    0.013582] [<c04dbafc>] (vprintk_default) from [<c04dc9d4>]
(vprintk_func+0xe0/0x168)
[    0.013585] [<c04dc8f4>] (vprintk_func) from [<c04dc1ec>] (printk+0x40/0x5c)
[    0.013589]  r7:00000000 r6:c23d2350 r5:efca9a44 r4:c2208708
[    0.013592] [<c04dc1ac>] (printk) from [<c112b7c8>]
(of_node_release+0xb0/0xcc)
[    0.013596]  r3:00000008 r2:00000000 r1:efca9a44 r0:c1d5b4c4
[    0.013599]  r4:efca9a70
[    0.013602] [<c112b718>] (of_node_release) from [<c1419c28>]
(kobject_put+0x11c/0x23c)
[    0.013606]  r5:c2422cb8 r4:efca9a70
[    0.013609] [<c1419b0c>] (kobject_put) from [<c112aa98>]
(of_node_put+0x24/0x28)
[    0.013613]  r7:e98f7980 r6:c2201ef4 r5:00000000 r4:e98f7940
[    0.013616] [<c112aa74>] (of_node_put) from [<c20474a0>]
(of_clk_init+0x1a4/0x248)
[    0.013620] [<c20472fc>] (of_clk_init) from [<c20140dc>]
(omap_clk_init+0x4c/0x68)
[    0.013624]  r10:efc8b8c0 r9:c2433054 r8:00000000 r7:c2208700
r6:00000066 r5:c20dab64
[    0.013627]  r4:c2434500
[    0.013631] [<c2014090>] (omap_clk_init) from [<c2014afc>]
(omap4_sync32k_timer_init+0x18/0x3c)
[    0.013634]  r5:c20dab64 r4:c2433000
[    0.013638] [<c2014ae4>] (omap4_sync32k_timer_init) from
[<c2014de8>] (omap5_realtime_timer_init+0x1c/0x258)
[    0.013642] [<c2014dcc>] (omap5_realtime_timer_init) from
[<c2005954>] (time_init+0x30/0x44)
[    0.013645]  r9:c2433054 r8:00000000 r7:c2208700 r6:00000066
r5:c20dab64 r4:c2433000
[    0.013649] [<c2005924>] (time_init) from [<c20012dc>]
(start_kernel+0x590/0x720)
[    0.013652] [<c2000d4c>] (start_kernel) from [<00000000>] (0x0)
[    0.013656]  r10:30c5387d r9:412fc0f2 r8:8ffdc000 r7:00000000
r6:30c0387d r5:00000000
[    0.013659]  r4:c2000330
[    0.013662] irq event stamp: 0
[    0.013665] hardirqs last  enabled at (0): [<00000000>] 0x0
[    0.013669] hardirqs last disabled at (0): [<00000000>] 0x0
[    0.013672] softirqs last  enabled at (0): [<00000000>] 0x0
[    0.013676] softirqs last disabled at (0): [<00000000>] 0x0
[    0.013679] ---[ end trace ec9a61ce578d03f8 ]---
[    0.013683] ------------[ cut here ]------------a

full test log link,
https://lkft.validation.linaro.org/scheduler/job/1158386#L3711


-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-02-03 13:43 ` Naresh Kamboju
@ 2020-02-03 13:48   ` Marc Kleine-Budde
  2020-02-03 13:52     ` Marc Kleine-Budde
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Kleine-Budde @ 2020-02-03 13:48 UTC (permalink / raw)
  To: Naresh Kamboju, Oleksij Rempel
  Cc: dev.kurt, wg, Pengutronix Kernel Team, linux-can, Netdev, lkft-triage

On 2/3/20 2:43 PM, Naresh Kamboju wrote:
> On Fri, 24 Jan 2020 at 18:57, Oleksij Rempel <o.rempel@pengutronix.de> wrote:
>>
>> All user space generated SKBs are owned by a socket (unless injected
>> into the key via AF_PACKET). If a socket is closed, all associated skbs
>> will be cleaned up.
>>
>> This leads to a problem when a CAN driver calls can_put_echo_skb() on a
>> unshared SKB. If the socket is closed prior to the TX complete handler,
>> can_get_echo_skb() and the subsequent delivering of the echo SKB to
>> all registered callbacks, a SKB with a refcount of 0 is delivered.
>>
>> To avoid the problem, in can_get_echo_skb() the original SKB is now
>> always cloned, regardless of shared SKB or not. If the process exists it
>> can now safely discard its SKBs, without disturbing the delivery of the
>> echo SKB.
>>
>> The problem shows up in the j1939 stack, when it clones the
>> incoming skb, which detects the already 0 refcount.
>>
>> We can easily reproduce this with following example:
>>
>> testj1939 -B -r can0: &
>> cansend can0 1823ff40#0123
>>
>> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
>> refcount_t: addition on 0; use-after-free.
> 
> FYI,
> This issue noticed in our Linaro test farm
> On linux next version 5.5.0-next-20200203 running on beagleboard x15 arm device.
> 
> Thanks for providing fix for this case.

Can we add your Tested-by to the patch?

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-02-03 13:48   ` Marc Kleine-Budde
@ 2020-02-03 13:52     ` Marc Kleine-Budde
  2020-02-03 14:36       ` Naresh Kamboju
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Kleine-Budde @ 2020-02-03 13:52 UTC (permalink / raw)
  To: Naresh Kamboju, Oleksij Rempel
  Cc: dev.kurt, Netdev, lkft-triage, linux-can, Pengutronix Kernel Team, wg

On 2/3/20 2:48 PM, Marc Kleine-Budde wrote:
>>> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
>>> refcount_t: addition on 0; use-after-free.
>>
>> FYI,
>> This issue noticed in our Linaro test farm
>> On linux next version 5.5.0-next-20200203 running on beagleboard x15 arm device.
>>
>> Thanks for providing fix for this case.

Please look closely at your backtraces, they are totally unrelated. It
seems the whole culprit in your testcase fails with:

> [    0.000000] OF: ERROR: Bad of_node_put() on /ocp/interconnect@4a000000/segment@0/target-module@8000/cm_core@0/l4per-cm@1700/l4per-clkctrl@28

...the rest are subsequent faults.

> Can we add your Tested-by to the patch?

Answering myself: NO!

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-02-03 13:52     ` Marc Kleine-Budde
@ 2020-02-03 14:36       ` Naresh Kamboju
  0 siblings, 0 replies; 11+ messages in thread
From: Naresh Kamboju @ 2020-02-03 14:36 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Oleksij Rempel, dev.kurt, Netdev, lkft-triage, linux-can,
	Pengutronix Kernel Team, wg

On Mon, 3 Feb 2020 at 19:22, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> On 2/3/20 2:48 PM, Marc Kleine-Budde wrote:
> >>> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
> >>> refcount_t: addition on 0; use-after-free.
> >>
> >> FYI,
> >> This issue noticed in our Linaro test farm
> >> On linux next version 5.5.0-next-20200203 running on beagleboard x15 arm device.
> >>
> >> Thanks for providing fix for this case.
>
> Please look closely at your backtraces, they are totally unrelated. It
> seems the whole culprit in your testcase fails with:
>
> > [    0.000000] OF: ERROR: Bad of_node_put() on /ocp/interconnect@4a000000/segment@0/target-module@8000/cm_core@0/l4per-cm@1700/l4per-clkctrl@28

I have reported on another email regarding this dtb issue.

>
> ...the rest are subsequent faults.
>
> > Can we add your Tested-by to the patch?

We generally test linux mainline, linux-next, LTS 5.4, 4.19, 4.14, 4.9
and 4.4 branches.
When issue like this found and with a proposed fix patch.
I do schedule it for apply patch and test and report it back on email thread.

- Naresh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-01-24 13:26 [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Oleksij Rempel
  2020-02-03 13:43 ` Naresh Kamboju
@ 2020-02-14 12:09 ` Oleksij Rempel
  2020-10-16 19:36   ` Marc Kleine-Budde
  1 sibling, 1 reply; 11+ messages in thread
From: Oleksij Rempel @ 2020-02-14 12:09 UTC (permalink / raw)
  To: dev.kurt, mkl, wg; +Cc: netdev, kernel, linux-can


[-- Attachment #1: Type: text/plain, Size: 4073 bytes --]

Hi all,

any comments on this patch?

On Fri, Jan 24, 2020 at 02:26:56PM +0100, Oleksij Rempel wrote:
> All user space generated SKBs are owned by a socket (unless injected
> into the key via AF_PACKET). If a socket is closed, all associated skbs
> will be cleaned up.
> 
> This leads to a problem when a CAN driver calls can_put_echo_skb() on a
> unshared SKB. If the socket is closed prior to the TX complete handler,
> can_get_echo_skb() and the subsequent delivering of the echo SKB to
> all registered callbacks, a SKB with a refcount of 0 is delivered.
> 
> To avoid the problem, in can_get_echo_skb() the original SKB is now
> always cloned, regardless of shared SKB or not. If the process exists it
> can now safely discard its SKBs, without disturbing the delivery of the
> echo SKB.
> 
> The problem shows up in the j1939 stack, when it clones the
> incoming skb, which detects the already 0 refcount.
> 
> We can easily reproduce this with following example:
> 
> testj1939 -B -r can0: &
> cansend can0 1823ff40#0123
> 
> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
> refcount_t: addition on 0; use-after-free.
> Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
> CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
> [<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
> [<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
> [<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
> [<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
> [<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
> [<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
> [<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
> [<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
> [<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
> [<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
> [<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
> [<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
> [<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)
> 
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---
>  include/linux/can/skb.h | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
> index a954def26c0d..0783b0c6d9e2 100644
> --- a/include/linux/can/skb.h
> +++ b/include/linux/can/skb.h
> @@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
>   */
>  static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb)
>  {
> -	if (skb_shared(skb)) {
> -		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
> +	struct sk_buff *nskb;
>  
> -		if (likely(nskb)) {
> -			can_skb_set_owner(nskb, skb->sk);
> -			consume_skb(skb);
> -			return nskb;
> -		} else {
> -			kfree_skb(skb);
> -			return NULL;
> -		}
> +	nskb = skb_clone(skb, GFP_ATOMIC);
> +	if (unlikely(!nskb)) {
> +		kfree_skb(skb);
> +		return NULL;
>  	}
>  
> -	/* we can assume to have an unshared skb with proper owner */
> -	return skb;
> +	can_skb_set_owner(nskb, skb->sk);
> +	consume_skb(skb);
> +	return nskb;
>  }
>  
>  #endif /* !_CAN_SKB_H */
> -- 
> 2.25.0
> 
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-02-14 12:09 ` Oleksij Rempel
@ 2020-10-16 19:36   ` Marc Kleine-Budde
  2020-10-17 19:13     ` Oliver Hartkopp
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Kleine-Budde @ 2020-10-16 19:36 UTC (permalink / raw)
  To: Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can

[-- Attachment #1.1: Type: text/plain, Size: 4238 bytes --]

On 2/14/20 1:09 PM, Oleksij Rempel wrote:
> Hi all,
> 
> any comments on this patch?

I'm going to take this patch now for 5.10....Comments?

Marc

> 
> On Fri, Jan 24, 2020 at 02:26:56PM +0100, Oleksij Rempel wrote:
>> All user space generated SKBs are owned by a socket (unless injected
>> into the key via AF_PACKET). If a socket is closed, all associated skbs
>> will be cleaned up.
>>
>> This leads to a problem when a CAN driver calls can_put_echo_skb() on a
>> unshared SKB. If the socket is closed prior to the TX complete handler,
>> can_get_echo_skb() and the subsequent delivering of the echo SKB to
>> all registered callbacks, a SKB with a refcount of 0 is delivered.
>>
>> To avoid the problem, in can_get_echo_skb() the original SKB is now
>> always cloned, regardless of shared SKB or not. If the process exists it
>> can now safely discard its SKBs, without disturbing the delivery of the
>> echo SKB.
>>
>> The problem shows up in the j1939 stack, when it clones the
>> incoming skb, which detects the already 0 refcount.
>>
>> We can easily reproduce this with following example:
>>
>> testj1939 -B -r can0: &
>> cansend can0 1823ff40#0123
>>
>> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
>> refcount_t: addition on 0; use-after-free.
>> Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
>> CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
>> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> Backtrace:
>> [<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
>> [<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
>> [<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
>> [<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
>> [<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
>> [<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
>> [<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
>> [<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
>> [<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
>> [<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
>> [<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
>> [<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
>> [<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
>> [<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)
>>
>> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
>> ---
>>  include/linux/can/skb.h | 20 ++++++++------------
>>  1 file changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
>> index a954def26c0d..0783b0c6d9e2 100644
>> --- a/include/linux/can/skb.h
>> +++ b/include/linux/can/skb.h
>> @@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
>>   */
>>  static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb)
>>  {
>> -	if (skb_shared(skb)) {
>> -		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>> +	struct sk_buff *nskb;
>>  
>> -		if (likely(nskb)) {
>> -			can_skb_set_owner(nskb, skb->sk);
>> -			consume_skb(skb);
>> -			return nskb;
>> -		} else {
>> -			kfree_skb(skb);
>> -			return NULL;
>> -		}
>> +	nskb = skb_clone(skb, GFP_ATOMIC);
>> +	if (unlikely(!nskb)) {
>> +		kfree_skb(skb);
>> +		return NULL;
>>  	}
>>  
>> -	/* we can assume to have an unshared skb with proper owner */
>> -	return skb;
>> +	can_skb_set_owner(nskb, skb->sk);
>> +	consume_skb(skb);
>> +	return nskb;
>>  }
>>  
>>  #endif /* !_CAN_SKB_H */
>> -- 
>> 2.25.0
>>
>>
>>
> 


-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-10-16 19:36   ` Marc Kleine-Budde
@ 2020-10-17 19:13     ` Oliver Hartkopp
  2020-10-18  8:46       ` Oliver Hartkopp
  0 siblings, 1 reply; 11+ messages in thread
From: Oliver Hartkopp @ 2020-10-17 19:13 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can



On 16.10.20 21:36, Marc Kleine-Budde wrote:
> On 2/14/20 1:09 PM, Oleksij Rempel wrote:
>> Hi all,
>>
>> any comments on this patch?
> 
> I'm going to take this patch now for 5.10....Comments?

Yes.

Removing the sk reference will lead to the effect, that you will receive 
the CAN frames you have sent on that socket - which is disabled by default:

https://elixir.bootlin.com/linux/latest/source/net/can/raw.c#L124

See concept here:

https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.rst#L560

How can we maintain the CAN_RAW_RECV_OWN_MSGS to be disabled by default 
and fix the described problem?

Regards,
Oliver

> 
> Marc
> 
>>
>> On Fri, Jan 24, 2020 at 02:26:56PM +0100, Oleksij Rempel wrote:
>>> All user space generated SKBs are owned by a socket (unless injected
>>> into the key via AF_PACKET). If a socket is closed, all associated skbs
>>> will be cleaned up.
>>>
>>> This leads to a problem when a CAN driver calls can_put_echo_skb() on a
>>> unshared SKB. If the socket is closed prior to the TX complete handler,
>>> can_get_echo_skb() and the subsequent delivering of the echo SKB to
>>> all registered callbacks, a SKB with a refcount of 0 is delivered.
>>>
>>> To avoid the problem, in can_get_echo_skb() the original SKB is now
>>> always cloned, regardless of shared SKB or not. If the process exists it
>>> can now safely discard its SKBs, without disturbing the delivery of the
>>> echo SKB.
>>>
>>> The problem shows up in the j1939 stack, when it clones the
>>> incoming skb, which detects the already 0 refcount.
>>>
>>> We can easily reproduce this with following example:
>>>
>>> testj1939 -B -r can0: &
>>> cansend can0 1823ff40#0123
>>>
>>> WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
>>> refcount_t: addition on 0; use-after-free.
>>> Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
>>> CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
>>> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>>> Backtrace:
>>> [<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
>>> [<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
>>> [<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
>>> [<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
>>> [<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
>>> [<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
>>> [<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
>>> [<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
>>> [<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
>>> [<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
>>> [<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
>>> [<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
>>> [<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
>>> [<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)
>>>
>>> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
>>> ---
>>>   include/linux/can/skb.h | 20 ++++++++------------
>>>   1 file changed, 8 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
>>> index a954def26c0d..0783b0c6d9e2 100644
>>> --- a/include/linux/can/skb.h
>>> +++ b/include/linux/can/skb.h
>>> @@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
>>>    */
>>>   static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb)
>>>   {
>>> -	if (skb_shared(skb)) {
>>> -		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>>> +	struct sk_buff *nskb;
>>>   
>>> -		if (likely(nskb)) {
>>> -			can_skb_set_owner(nskb, skb->sk);
>>> -			consume_skb(skb);
>>> -			return nskb;
>>> -		} else {
>>> -			kfree_skb(skb);
>>> -			return NULL;
>>> -		}
>>> +	nskb = skb_clone(skb, GFP_ATOMIC);
>>> +	if (unlikely(!nskb)) {
>>> +		kfree_skb(skb);
>>> +		return NULL;
>>>   	}
>>>   
>>> -	/* we can assume to have an unshared skb with proper owner */
>>> -	return skb;
>>> +	can_skb_set_owner(nskb, skb->sk);
>>> +	consume_skb(skb);
>>> +	return nskb;
>>>   }
>>>   
>>>   #endif /* !_CAN_SKB_H */
>>> -- 
>>> 2.25.0
>>>
>>>
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-10-17 19:13     ` Oliver Hartkopp
@ 2020-10-18  8:46       ` Oliver Hartkopp
  2020-10-19  6:28         ` Marc Kleine-Budde
  0 siblings, 1 reply; 11+ messages in thread
From: Oliver Hartkopp @ 2020-10-18  8:46 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can

Oh, answering myself ...

On 17.10.20 21:13, Oliver Hartkopp wrote:
> 
> 
> On 16.10.20 21:36, Marc Kleine-Budde wrote:
>> On 2/14/20 1:09 PM, Oleksij Rempel wrote:
>>> Hi all,
>>>
>>> any comments on this patch?
>>
>> I'm going to take this patch now for 5.10....Comments?
> 
> Yes.
> 
> Removing the sk reference will lead to the effect, that you will receive 
> the CAN frames you have sent on that socket - which is disabled by default:
> 
> https://elixir.bootlin.com/linux/latest/source/net/can/raw.c#L124
> 
> See concept here:
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.rst#L560 
> 
> 
> How can we maintain the CAN_RAW_RECV_OWN_MSGS to be disabled by default 
> and fix the described problem?

>>>> +    nskb = skb_clone(skb, GFP_ATOMIC);
>>>> +    if (unlikely(!nskb)) {
>>>> +        kfree_skb(skb);
>>>> +        return NULL;
>>>>       }
>>>> -    /* we can assume to have an unshared skb with proper owner */
>>>> -    return skb;
>>>> +    can_skb_set_owner(nskb, skb->sk);

skb-> sk is still set here - so everything should be fine.

Sorry for the noise.

Regards,
Oliver


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-10-18  8:46       ` Oliver Hartkopp
@ 2020-10-19  6:28         ` Marc Kleine-Budde
  2020-10-19  6:53           ` Oliver Hartkopp
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19  6:28 UTC (permalink / raw)
  To: Oliver Hartkopp, Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can

[-- Attachment #1.1: Type: text/plain, Size: 1595 bytes --]

On 10/18/20 10:46 AM, Oliver Hartkopp wrote:
> Oh, answering myself ...
> 
> On 17.10.20 21:13, Oliver Hartkopp wrote:
>>
>>
>> On 16.10.20 21:36, Marc Kleine-Budde wrote:
>>> On 2/14/20 1:09 PM, Oleksij Rempel wrote:
>>>> Hi all,
>>>>
>>>> any comments on this patch?
>>>
>>> I'm going to take this patch now for 5.10....Comments?
>>
>> Yes.
>>
>> Removing the sk reference will lead to the effect, that you will receive 
>> the CAN frames you have sent on that socket - which is disabled by default:
>>
>> https://elixir.bootlin.com/linux/latest/source/net/can/raw.c#L124
>>
>> See concept here:
>>
>> https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.rst#L560 
>>
>>
>> How can we maintain the CAN_RAW_RECV_OWN_MSGS to be disabled by default 
>> and fix the described problem?
> 
>>>>> +    nskb = skb_clone(skb, GFP_ATOMIC);
>>>>> +    if (unlikely(!nskb)) {
>>>>> +        kfree_skb(skb);
>>>>> +        return NULL;
>>>>>       }
>>>>> -    /* we can assume to have an unshared skb with proper owner */
>>>>> -    return skb;
>>>>> +    can_skb_set_owner(nskb, skb->sk);
> 
> skb-> sk is still set here - so everything should be fine.
> 
> Sorry for the noise.

Is this a Acked-by/Reviewed-by?

regrards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-10-19  6:28         ` Marc Kleine-Budde
@ 2020-10-19  6:53           ` Oliver Hartkopp
  0 siblings, 0 replies; 11+ messages in thread
From: Oliver Hartkopp @ 2020-10-19  6:53 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oleksij Rempel, dev.kurt, wg; +Cc: netdev, kernel, linux-can



On 19.10.20 08:28, Marc Kleine-Budde wrote:
> On 10/18/20 10:46 AM, Oliver Hartkopp wrote:
>> Oh, answering myself ...
>>
>> On 17.10.20 21:13, Oliver Hartkopp wrote:
>>>
>>>
>>> On 16.10.20 21:36, Marc Kleine-Budde wrote:
>>>> On 2/14/20 1:09 PM, Oleksij Rempel wrote:
>>>>> Hi all,
>>>>>
>>>>> any comments on this patch?
>>>>
>>>> I'm going to take this patch now for 5.10....Comments?
>>>
>>> Yes.
>>>
>>> Removing the sk reference will lead to the effect, that you will receive
>>> the CAN frames you have sent on that socket - which is disabled by default:
>>>
>>> https://elixir.bootlin.com/linux/latest/source/net/can/raw.c#L124
>>>
>>> See concept here:
>>>
>>> https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.rst#L560
>>>
>>>
>>> How can we maintain the CAN_RAW_RECV_OWN_MSGS to be disabled by default
>>> and fix the described problem?
>>
>>>>>> +    nskb = skb_clone(skb, GFP_ATOMIC);
>>>>>> +    if (unlikely(!nskb)) {
>>>>>> +        kfree_skb(skb);
>>>>>> +        return NULL;
>>>>>>        }
>>>>>> -    /* we can assume to have an unshared skb with proper owner */
>>>>>> -    return skb;
>>>>>> +    can_skb_set_owner(nskb, skb->sk);
>>
>> skb-> sk is still set here - so everything should be fine.
>>
>> Sorry for the noise.
> 
> Is this a Acked-by/Reviewed-by?

Yes.

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>

Thanks Marc!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-24 13:26 [RFC] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Oleksij Rempel
2020-02-03 13:43 ` Naresh Kamboju
2020-02-03 13:48   ` Marc Kleine-Budde
2020-02-03 13:52     ` Marc Kleine-Budde
2020-02-03 14:36       ` Naresh Kamboju
2020-02-14 12:09 ` Oleksij Rempel
2020-10-16 19:36   ` Marc Kleine-Budde
2020-10-17 19:13     ` Oliver Hartkopp
2020-10-18  8:46       ` Oliver Hartkopp
2020-10-19  6:28         ` Marc Kleine-Budde
2020-10-19  6:53           ` Oliver Hartkopp

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git