All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joakim Zhang <qiangqing.zhang@nxp.com>
To: Joakim Zhang <qiangqing.zhang@nxp.com>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	Oleksij Rempel <o.rempel@pengutronix.de>,
	Oliver Hartkopp <socketcan@hartkopp.net>
Cc: "kernel@pengutronix.de" <kernel@pengutronix.de>,
	"linux-can@vger.kernel.org" <linux-can@vger.kernel.org>
Subject: RE: [5.10 CAN BUG report] kernel dump about echo skb
Date: Fri, 22 Jan 2021 07:53:39 +0000	[thread overview]
Message-ID: <DB8PR04MB679559428DCCA341EE53A6E4E6A00@DB8PR04MB6795.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <DB8PR04MB6795E0155F4523CA9D4AAC1CE6A10@DB8PR04MB6795.eurprd04.prod.outlook.com>


@Marc Kleine-Budde, @Oleksij Rempel, any comments? Thanks a lot.

Best Regards,
Joakim Zhang

> -----Original Message-----
> From: Joakim Zhang <qiangqing.zhang@nxp.com>
> Sent: 2021年1月21日 16:38
> To: Marc Kleine-Budde <mkl@pengutronix.de>; Oleksij Rempel
> <o.rempel@pengutronix.de>; Oliver Hartkopp <socketcan@hartkopp.net>
> Cc: kernel@pengutronix.de; linux-can@vger.kernel.org
> Subject: [5.10 CAN BUG report] kernel dump about echo skb
> 
> 
> Hi Marc, Oleksij, Oliver,
> 
> Our auto test scripts reports below kernel dump, after revert Oleksij Rempel
> patch "286228d382ba can: can_create_echo_skb(): fix echo skb generation:
> always use skb_clone()", this issue disappeared.
> What strange is that, commit message intends to fix this issue. Report it here
> to see if anyone meets the same issue or can give some suggestions. I will
> continue to dig how to reproduce it manually, now I don't know how to do it.
> Thanks.
> 
> 1. Kernel dump log:
> [  101.688327] ------------[ cut here ]------------ [  101.692968] refcount_t:
> addition on 0; use-after-free.
> [  101.698210] WARNING: CPU: 0 PID: 0 at lib/refcount.c:25
> refcount_warn_saturat
> e+0x98/0x140
> [  101.706386] Modules linked in:
> [  101.709446] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 5.10.4-100715-gc60abf9c 2110-dirty #376 [  101.717881] Hardware name:
> NXP i.MX8MPlus EVK board (DT) [  101.723193] pstate: 40000005 (nZcv daif
> -PAN -UAO -TCO BTYPE=--) [  101.729200] pc :
> refcount_warn_saturate+0x98/0x140 [  101.733990] lr :
> refcount_warn_saturate+0x98/0x140 [  101.738778] sp : ffff800011bcbcd0
> [  101.742090] x29: ffff800011bcbcd0 x28: ffff0000c16b7000 [  101.747405]
> x27: 0000000000000000 x26: ffff0000c1718000 [  101.752718] x25:
> 00000000048c0000 x24: ffff0000c1e54c00 [  101.758031] x23:
> 0000000000000000 x22: ffff0000c1718900 [  101.763345] x21:
> ffff0000c1718000 x20: ffff0000c1f00800 [  101.768661] x19: ffff0000c1985e00
> x18: 0000000000000010 [  101.773974] x17: 0000000000000000 x16:
> 0000000000000000 [  101.779287] x15: 0000000000000229 x14:
> ffff8000119d2cb8 [  101.784600] x13: 00000000ffffffea x12: ffff800011a51b60
> [  101.789914] x11: 0000000000000003 x10: ffff800011a39b78 [  101.795227]
> x9 : 0000000000017fe8 x8 : ffff800011a39b20 [  101.800543] x7 :
> c0000000ffffefff x6 : 0000000000000001 [  101.805858] x5 :
> 0000000000000000 x4 : ffff00017f36f900 [  101.811174] x3 : ffff00017f376830
> x2 : ffff00017f36f900 [  101.816487] x1 : 5ead5e088841e800 x0 :
> 0000000000000000 [  101.821800] Call trace:
> [  101.824248]  refcount_warn_saturate+0x98/0x140 [  101.828694]
> can_put_echo_skb+0x154/0x190 [  101.832704]
> flexcan_start_xmit+0x188/0x240 [  101.836888]
> dev_hard_start_xmit+0xe0/0x1a8 [  101.841072]
> sch_direct_xmit+0x11c/0x338 [  101.844995]  __qdisc_run+0x118/0x5d8
> [  101.848570]  net_tx_action+0x110/0x1a8 [  101.852319]
> efi_header_end+0x120/0x264 [  101.856156]  irq_exit+0xb8/0xd8
> [  101.859298]  __handle_domain_irq+0x64/0xb8 [  101.863396]
> gic_handle_irq+0x54/0x118 [  101.867145]  el1_irq+0xc0/0x180
> [  101.870288]  cpuidle_enter_state+0x130/0x2e8 [  101.874560]
> cpuidle_enter+0x34/0x48 [  101.878136]  call_cpuidle+0x18/0x38
> [  101.881627]  do_idle+0x1f4/0x258 [  101.884854]
> cpu_startup_entry+0x20/0x68 [  101.888778]  rest_init+0xd4/0xe0
> [  101.892008]  arch_call_rest_init+0xc/0x14 [  101.896019]
> start_kernel+0x470/0x4a4 [  101.899681] ---[ end trace
> 8de9661669c150a6 ]--- [  101.908725] ------------[ cut here ]------------
> [  101.913379] refcount_t: underflow; use-after-free.
> [  101.918360] WARNING: CPU: 0 PID: 9 at lib/refcount.c:28
> refcount_warn_saturat
> e+0xec/0x140
> [  101.926544] Modules linked in:
> [  101.929605] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W
> 5.10.
> 4-100715-gc60abf9c2110-dirty #376
> [  101.939603] Hardware name: NXP i.MX8MPlus EVK board (DT)
> [  101.944914] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> [  101.950921] pc : refcount_warn_saturate+0xec/0x140 [  101.955709] lr :
> refcount_warn_saturate+0xec/0x140 [  101.960497] sp : ffff800011c73b00
> [  101.963809] x29: ffff800011c73b00 x28: 000000000000003d [  101.969123]
> x27: ffff00017f37a880 x26: ffff800011c73d48 [  101.974435] x25:
> 000000000000012c x24: ffff800011c73d58 [  101.979748] x23:
> ffff0000c1718000 x22: 0000000000000000 [  101.985061] x21:
> ffff800011affcc0 x20: ffff0000c13eeb80 [  101.990374] x19: ffff0000c1f00800
> x18: 0000000000000010 [  101.995686] x17: 0000000000000000 x16:
> 0000000000000000 [  102.000999] x15: 0000000000000258 x14:
> ffff0000c007f278 [  102.006311] x13: 00000000ffffffea x12: ffff800011a51b60
> [  102.011624] x11: 0000000000000003 x10: ffff800011a39b78 [  102.016936]
> x9 : 0000000000017fe8 x8 : ffff800011a39b20 [  102.022249] x7 :
> c0000000ffffefff x6 : 0000000000000001 [  102.027562] x5 :
> 0000000000000000 x4 : ffff00017f36f900 [  102.032874] x3 : ffff00017f376830
> x2 : ffff00017f36f900 [  102.038187] x1 : b6fa97510f7af300 x0 :
> 0000000000000000 [  102.043500] Call trace:
> [  102.045947]  refcount_warn_saturate+0xec/0x140 [  102.050391]
> sock_efree+0x6c/0x78 [  102.053707]  skb_release_head_state+0x40/0xb0
> [  102.058065]  skb_release_all+0x14/0x30 [  102.061813]
> kfree_skb+0x2c/0x88 [  102.065041]  can_receive+0xb4/0xf8 [  102.068442]
> can_rcv+0x3c/0xc0 [  102.071496]
> __netif_receive_skb_one_core+0x50/0x78
> [  102.076373]  __netif_receive_skb+0x14/0x60 [  102.080468]
> netif_receive_skb+0x34/0xc0 [  102.084392]
> can_rx_offload_napi_poll+0x60/0xd8
> [  102.088921]  net_rx_action+0x114/0x378 [  102.092670]
> efi_header_end+0x120/0x264 [  102.096506]  run_ksoftirqd+0x44/0x58
> [  102.100081]  smpboot_thread_fn+0x1b4/0x270 [  102.104177]
> kthread+0x138/0x158 [  102.107405]  ret_from_fork+0x10/0x18
> [  102.110979] ---[ end trace 8de9661669c150a7 ]---
> 
> 2. Patch introduces issue:
> commit 286228d382ba6320f04fa2e7c6fc8d4d92e428f4
> Author: Oleksij Rempel <o.rempel@pengutronix.de>
> Date:   Wed Dec 18 09:39:02 2019 +0100
> 
>     can: can_create_echo_skb(): fix echo skb generation: always use
> skb_clone()
> 
>     All user space generated SKBs are owned by a socket (unless injected into
> the
>     key via AF_PACKET). If a socket is closed, all associated skbs will be
> cleaned
>     up.
> 
>     This leads to a problem when a CAN driver calls can_put_echo_skb() on a
>     unshared SKB. If the socket is closed prior to the TX complete handler,
>     can_get_echo_skb() and the subsequent delivering of the echo SKB to all
>     registered callbacks, a SKB with a refcount of 0 is delivered.
> 
>     To avoid the problem, in can_get_echo_skb() the original SKB is now
> always
>     cloned, regardless of shared SKB or not. If the process exists it can now
>     safely discard its SKBs, without disturbing the delivery of the echo SKB.
> 
>     The problem shows up in the j1939 stack, when it clones the incoming skb,
> which
>     detects the already 0 refcount.
> 
>     We can easily reproduce this with following example:
> 
>     testj1939 -B -r can0: &
>     cansend can0 1823ff40#0123
> 
>     WARNING: CPU: 0 PID: 293 at lib/refcount.c:25
> refcount_warn_saturate+0x108/0x174
>     refcount_t: addition on 0; use-after-free.
>     Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc
> dw_hdmi_ahb_audio vcan
>     CPU: 0 PID: 293 Comm: cansend Not tainted
> 5.5.0-rc6-00376-g9e20dcb7040d #1
>     Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>     Backtrace:
>     [<c010f570>] (dump_backtrace) from [<c010f90c>]
> (show_stack+0x20/0x24)
>     [<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
>     [<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
>     [<c0127f0c>] (__warn) from [<c01283c8>]
> (warn_slowpath_fmt+0xa8/0xcc)
>     [<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>]
> (refcount_warn_saturate+0x108/0x174)
>     [<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>]
> (j1939_can_recv+0x20c/0x210)
>     [<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>]
> (can_rcv_filter+0xb4/0x268)
>     [<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
>     [<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
>     [<c0aca300>] (can_rcv) from [<c09b1fdc>]
> (__netif_receive_skb_one_core+0x64/0x88)
>     [<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>]
> (__netif_receive_skb+0x38/0x94)
>     [<c09b2038>] (__netif_receive_skb) from [<c09b2130>]
> (netif_receive_skb_internal+0x64/0xf8)
>     [<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>]
> (netif_receive_skb+0x34/0x19c)
>     [<c09b21c4>] (netif_receive_skb) from [<c0791278>]
> (can_rx_offload_napi_poll+0x58/0xb4)
> 
>     Fixes: 0ae89beb283a ("can: add destructor for self generated skbs")
>     Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
>     Link:
> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flore.kern
> el.org%2Fr%2F20200124132656.22156-1-o.rempel%40pengutronix.de&amp;d
> ata=04%7C01%7Cqiangqing.zhang%40nxp.com%7C2d128ac66b3f4e93db9a08
> d8bde80d18%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6374681
> 51639635283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj
> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=QQtX%2
> FCpYRmeynsy18qnLFUMyFgwUSgzbQeSqYGwD3UM%3D&amp;reserved=0
>     Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
>     Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> 
> Best Regards,
> Joakim Zhang


  reply	other threads:[~2021-01-22  7:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21  8:37 [5.10 CAN BUG report] kernel dump about echo skb Joakim Zhang
2021-01-22  7:53 ` Joakim Zhang [this message]
2021-01-22  9:08   ` Vincent MAILHOL
2021-01-22  9:23     ` Marc Kleine-Budde
2021-01-22  9:40       ` Oleksij Rempel
2021-01-22  9:51         ` Vincent MAILHOL
2021-01-23  7:40           ` Joakim Zhang
2021-01-22  8:58 ` Marc Kleine-Budde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DB8PR04MB679559428DCCA341EE53A6E4E6A00@DB8PR04MB6795.eurprd04.prod.outlook.com \
    --to=qiangqing.zhang@nxp.com \
    --cc=kernel@pengutronix.de \
    --cc=linux-can@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=o.rempel@pengutronix.de \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.