linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] can: slcan: fix freed work crash
@ 2022-12-01  7:34 Jiri Slaby (SUSE)
  2022-12-01 10:13 ` Petr Vorel
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jiri Slaby (SUSE) @ 2022-12-01  7:34 UTC (permalink / raw)
  To: dario.binacchi
  Cc: linux-serial, linux-kernel, Jiri Slaby (SUSE),
	Richard Palethorpe, Petr Vorel, Wolfgang Grandegger,
	Marc Kleine-Budde, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-can, netdev, stable, Max Staudt

The LTP test pty03 is causing a crash in slcan:
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 348 Comm: kworker/0:3 Not tainted 6.0.8-1-default #1 openSUSE Tumbleweed 9d20364b934f5aab0a9bdf84e8f45cfdfae39dab
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
  Workqueue:  0x0 (events)
  RIP: 0010:process_one_work (/home/rich/kernel/linux/kernel/workqueue.c:706 /home/rich/kernel/linux/kernel/workqueue.c:2185)
  Code: 49 89 ff 41 56 41 55 41 54 55 53 48 89 f3 48 83 ec 10 48 8b 06 48 8b 6f 48 49 89 c4 45 30 e4 a8 04 b8 00 00 00 00 4c 0f 44 e0 <49> 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e
  RSP: 0018:ffffaf7b40f47e98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff9d644e1b8b48 RCX: ffff9d649e439968
  RDX: 00000000ffff8455 RSI: ffff9d644e1b8b48 RDI: ffff9d64764aa6c0
  RBP: ffff9d649e4335c0 R08: 0000000000000c00 R09: ffff9d64764aa734
  R10: 0000000000000007 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff9d649e4335e8 R14: ffff9d64490da780 R15: ffff9d64764aa6c0
  FS:  0000000000000000(0000) GS:ffff9d649e400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 0000000036424000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
  worker_thread (/home/rich/kernel/linux/kernel/workqueue.c:2436)
  kthread (/home/rich/kernel/linux/kernel/kthread.c:376)
  ret_from_fork (/home/rich/kernel/linux/arch/x86/entry/entry_64.S:312)

Apparently, the slcan's tx_work is freed while being scheduled. While
slcan_netdev_close() (netdev side) calls flush_work(&sl->tx_work),
slcan_close() (tty side) does not. So when the netdev is never set UP,
but the tty is stuffed with bytes and forced to wakeup write, the work
is scheduled, but never flushed.

So add an additional flush_work() to slcan_close() to be sure the work
is flushed under all circumstances.

The Fixes commit below moved flush_work() from slcan_close() to
slcan_netdev_close(). What was the rationale behind it? Maybe we can
drop the one in slcan_netdev_close()?

I see the same pattern in can327. So it perhaps needs the very same fix.

Fixes: cfcb4465e992 ("can: slcan: remove legacy infrastructure")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1205597
Reported-by: Richard Palethorpe <richard.palethorpe@suse.com>
Tested-by: Petr Vorel <petr.vorel@suse.com>
Cc: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-can@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: Max Staudt <max@enpas.org>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
---
 drivers/net/can/slcan/slcan-core.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/slcan/slcan-core.c b/drivers/net/can/slcan/slcan-core.c
index fbb34139daa1..f4db77007c13 100644
--- a/drivers/net/can/slcan/slcan-core.c
+++ b/drivers/net/can/slcan/slcan-core.c
@@ -864,12 +864,14 @@ static void slcan_close(struct tty_struct *tty)
 {
 	struct slcan *sl = (struct slcan *)tty->disc_data;
 
-	/* unregister_netdev() calls .ndo_stop() so we don't have to.
-	 * Our .ndo_stop() also flushes the TTY write wakeup handler,
-	 * so we can safely set sl->tty = NULL after this.
-	 */
 	unregister_candev(sl->dev);
 
+	/*
+	 * The netdev needn't be UP (so .ndo_stop() is not called). Hence make
+	 * sure this is not running before freeing it up.
+	 */
+	flush_work(&sl->tx_work);
+
 	/* Mark channel as dead */
 	spin_lock_bh(&sl->lock);
 	tty->disc_data = NULL;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: slcan: fix freed work crash
  2022-12-01  7:34 [PATCH] can: slcan: fix freed work crash Jiri Slaby (SUSE)
@ 2022-12-01 10:13 ` Petr Vorel
  2022-12-01 18:52 ` Max Staudt
  2022-12-02 15:27 ` Marc Kleine-Budde
  2 siblings, 0 replies; 6+ messages in thread
From: Petr Vorel @ 2022-12-01 10:13 UTC (permalink / raw)
  To: Jiri Slaby (SUSE)
  Cc: dario.binacchi, linux-serial, linux-kernel, Richard Palethorpe,
	Petr Vorel, Wolfgang Grandegger, Marc Kleine-Budde,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-can, netdev, stable, Max Staudt, ltp

Hi all,

Cc LTP mailing list.
Jiri, thanks a lot for quick fix!

Kind regards,
Petr

> The LTP test pty03 is causing a crash in slcan:
>   BUG: kernel NULL pointer dereference, address: 0000000000000008
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 0 PID: 348 Comm: kworker/0:3 Not tainted 6.0.8-1-default #1 openSUSE Tumbleweed 9d20364b934f5aab0a9bdf84e8f45cfdfae39dab
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
>   Workqueue:  0x0 (events)
>   RIP: 0010:process_one_work (/home/rich/kernel/linux/kernel/workqueue.c:706 /home/rich/kernel/linux/kernel/workqueue.c:2185)
>   Code: 49 89 ff 41 56 41 55 41 54 55 53 48 89 f3 48 83 ec 10 48 8b 06 48 8b 6f 48 49 89 c4 45 30 e4 a8 04 b8 00 00 00 00 4c 0f 44 e0 <49> 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e
>   RSP: 0018:ffffaf7b40f47e98 EFLAGS: 00010046
>   RAX: 0000000000000000 RBX: ffff9d644e1b8b48 RCX: ffff9d649e439968
>   RDX: 00000000ffff8455 RSI: ffff9d644e1b8b48 RDI: ffff9d64764aa6c0
>   RBP: ffff9d649e4335c0 R08: 0000000000000c00 R09: ffff9d64764aa734
>   R10: 0000000000000007 R11: 0000000000000001 R12: 0000000000000000
>   R13: ffff9d649e4335e8 R14: ffff9d64490da780 R15: ffff9d64764aa6c0
>   FS:  0000000000000000(0000) GS:ffff9d649e400000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000008 CR3: 0000000036424000 CR4: 00000000000006f0
>   Call Trace:
>    <TASK>
>   worker_thread (/home/rich/kernel/linux/kernel/workqueue.c:2436)
>   kthread (/home/rich/kernel/linux/kernel/kthread.c:376)
>   ret_from_fork (/home/rich/kernel/linux/arch/x86/entry/entry_64.S:312)

> Apparently, the slcan's tx_work is freed while being scheduled. While
> slcan_netdev_close() (netdev side) calls flush_work(&sl->tx_work),
> slcan_close() (tty side) does not. So when the netdev is never set UP,
> but the tty is stuffed with bytes and forced to wakeup write, the work
> is scheduled, but never flushed.

> So add an additional flush_work() to slcan_close() to be sure the work
> is flushed under all circumstances.

> The Fixes commit below moved flush_work() from slcan_close() to
> slcan_netdev_close(). What was the rationale behind it? Maybe we can
> drop the one in slcan_netdev_close()?

> I see the same pattern in can327. So it perhaps needs the very same fix.

> Fixes: cfcb4465e992 ("can: slcan: remove legacy infrastructure")
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1205597
> Reported-by: Richard Palethorpe <richard.palethorpe@suse.com>
> Tested-by: Petr Vorel <petr.vorel@suse.com>
> Cc: Dario Binacchi <dario.binacchi@amarulasolutions.com>
> Cc: Wolfgang Grandegger <wg@grandegger.com>
> Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: linux-can@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: stable@vger.kernel.org
> Cc: Max Staudt <max@enpas.org>
> Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
> ---
>  drivers/net/can/slcan/slcan-core.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

> diff --git a/drivers/net/can/slcan/slcan-core.c b/drivers/net/can/slcan/slcan-core.c
> index fbb34139daa1..f4db77007c13 100644
> --- a/drivers/net/can/slcan/slcan-core.c
> +++ b/drivers/net/can/slcan/slcan-core.c
> @@ -864,12 +864,14 @@ static void slcan_close(struct tty_struct *tty)
>  {
>  	struct slcan *sl = (struct slcan *)tty->disc_data;

> -	/* unregister_netdev() calls .ndo_stop() so we don't have to.
> -	 * Our .ndo_stop() also flushes the TTY write wakeup handler,
> -	 * so we can safely set sl->tty = NULL after this.
> -	 */
>  	unregister_candev(sl->dev);

> +	/*
> +	 * The netdev needn't be UP (so .ndo_stop() is not called). Hence make
> +	 * sure this is not running before freeing it up.
> +	 */
> +	flush_work(&sl->tx_work);
> +
>  	/* Mark channel as dead */
>  	spin_lock_bh(&sl->lock);
>  	tty->disc_data = NULL;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: slcan: fix freed work crash
  2022-12-01  7:34 [PATCH] can: slcan: fix freed work crash Jiri Slaby (SUSE)
  2022-12-01 10:13 ` Petr Vorel
@ 2022-12-01 18:52 ` Max Staudt
  2022-12-02 12:14   ` Marc Kleine-Budde
  2022-12-02 15:27 ` Marc Kleine-Budde
  2 siblings, 1 reply; 6+ messages in thread
From: Max Staudt @ 2022-12-01 18:52 UTC (permalink / raw)
  To: Jiri Slaby (SUSE)
  Cc: dario.binacchi, linux-serial, linux-kernel, Richard Palethorpe,
	Petr Vorel, Wolfgang Grandegger, Marc Kleine-Budde,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-can, netdev, stable, ltp

(CC: ltp@lists.linux.it because Petr did so.)

Hi Jiry,

Thanks for finding this!


Your patch looks correct to me, so please have a

  Reviewed-by: Max Staudt <max@enpas.org>

for both this patch to slcan, as well as an 1:1 patch to can327.



Some history:

This is actually my code from can327, which was backported to slcan as
part of Dario's larger modernisation effort.

The rationale for moving it was to flush the UART TX buffer in case of
ndo_close(), in order to bring the device into a more predictable state
between ndo_close() and ndo_open(). I guess that's actually
counterproductive - whatever is in the TX buffer at that time should
likely be fully sent. For example, can327 sends one last byte to abort
any running chatty monitoring mode from the adapter. So your patch also
fixes this as well ;)

Of course, this resulted in calling flush_worker() in both ndo_stop()
and ldisc_close(), so I wanted to avoid code duplication, and relied on
ndo_stop(). Oops.



Thanks,

Max

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: slcan: fix freed work crash
  2022-12-01 18:52 ` Max Staudt
@ 2022-12-02 12:14   ` Marc Kleine-Budde
  0 siblings, 0 replies; 6+ messages in thread
From: Marc Kleine-Budde @ 2022-12-02 12:14 UTC (permalink / raw)
  To: Max Staudt
  Cc: Jiri Slaby (SUSE),
	dario.binacchi, linux-serial, linux-kernel, Richard Palethorpe,
	Petr Vorel, Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-can, netdev, stable, ltp

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

On 02.12.2022 03:52:42, Max Staudt wrote:
> (CC: ltp@lists.linux.it because Petr did so.)
> 
> Hi Jiry,
> 
> Thanks for finding this!
> 
> 
> Your patch looks correct to me, so please have a
> 
>   Reviewed-by: Max Staudt <max@enpas.org>
> 
> for both this patch to slcan, as well as an 1:1 patch to can327.

Max, can you create a patch for the can327 driver?

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: slcan: fix freed work crash
  2022-12-01  7:34 [PATCH] can: slcan: fix freed work crash Jiri Slaby (SUSE)
  2022-12-01 10:13 ` Petr Vorel
  2022-12-01 18:52 ` Max Staudt
@ 2022-12-02 15:27 ` Marc Kleine-Budde
  2022-12-02 15:32   ` Max Staudt
  2 siblings, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2022-12-02 15:27 UTC (permalink / raw)
  To: Jiri Slaby (SUSE)
  Cc: dario.binacchi, linux-serial, linux-kernel, Richard Palethorpe,
	Petr Vorel, Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-can, netdev, stable,
	Max Staudt

[-- Attachment #1: Type: text/plain, Size: 3555 bytes --]

On 01.12.2022 08:34:26, Jiri Slaby (SUSE) wrote:
> The LTP test pty03 is causing a crash in slcan:
>   BUG: kernel NULL pointer dereference, address: 0000000000000008
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 0 PID: 348 Comm: kworker/0:3 Not tainted 6.0.8-1-default #1 openSUSE Tumbleweed 9d20364b934f5aab0a9bdf84e8f45cfdfae39dab
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
>   Workqueue:  0x0 (events)
>   RIP: 0010:process_one_work (/home/rich/kernel/linux/kernel/workqueue.c:706 /home/rich/kernel/linux/kernel/workqueue.c:2185)
>   Code: 49 89 ff 41 56 41 55 41 54 55 53 48 89 f3 48 83 ec 10 48 8b 06 48 8b 6f 48 49 89 c4 45 30 e4 a8 04 b8 00 00 00 00 4c 0f 44 e0 <49> 8b 44 24 08 44 8b a8 00 01 00 00 41 83 e5 20 f6 45 10 04 75 0e
>   RSP: 0018:ffffaf7b40f47e98 EFLAGS: 00010046
>   RAX: 0000000000000000 RBX: ffff9d644e1b8b48 RCX: ffff9d649e439968
>   RDX: 00000000ffff8455 RSI: ffff9d644e1b8b48 RDI: ffff9d64764aa6c0
>   RBP: ffff9d649e4335c0 R08: 0000000000000c00 R09: ffff9d64764aa734
>   R10: 0000000000000007 R11: 0000000000000001 R12: 0000000000000000
>   R13: ffff9d649e4335e8 R14: ffff9d64490da780 R15: ffff9d64764aa6c0
>   FS:  0000000000000000(0000) GS:ffff9d649e400000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000008 CR3: 0000000036424000 CR4: 00000000000006f0
>   Call Trace:
>    <TASK>
>   worker_thread (/home/rich/kernel/linux/kernel/workqueue.c:2436)
>   kthread (/home/rich/kernel/linux/kernel/kthread.c:376)
>   ret_from_fork (/home/rich/kernel/linux/arch/x86/entry/entry_64.S:312)
> 
> Apparently, the slcan's tx_work is freed while being scheduled. While
> slcan_netdev_close() (netdev side) calls flush_work(&sl->tx_work),
> slcan_close() (tty side) does not. So when the netdev is never set UP,
> but the tty is stuffed with bytes and forced to wakeup write, the work
> is scheduled, but never flushed.
> 
> So add an additional flush_work() to slcan_close() to be sure the work
> is flushed under all circumstances.
> 
> The Fixes commit below moved flush_work() from slcan_close() to
> slcan_netdev_close(). What was the rationale behind it? Maybe we can
> drop the one in slcan_netdev_close()?
> 
> I see the same pattern in can327. So it perhaps needs the very same fix.
> 
> Fixes: cfcb4465e992 ("can: slcan: remove legacy infrastructure")
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1205597
> Reported-by: Richard Palethorpe <richard.palethorpe@suse.com>
> Tested-by: Petr Vorel <petr.vorel@suse.com>
> Cc: Dario Binacchi <dario.binacchi@amarulasolutions.com>
> Cc: Wolfgang Grandegger <wg@grandegger.com>
> Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: linux-can@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: stable@vger.kernel.org
> Cc: Max Staudt <max@enpas.org>
> Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>

Added to linux-can,

Thanks,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] can: slcan: fix freed work crash
  2022-12-02 15:27 ` Marc Kleine-Budde
@ 2022-12-02 15:32   ` Max Staudt
  0 siblings, 0 replies; 6+ messages in thread
From: Max Staudt @ 2022-12-02 15:32 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Jiri Slaby (SUSE),
	dario.binacchi, linux-serial, linux-kernel, Richard Palethorpe,
	Petr Vorel, Wolfgang Grandegger, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-can, netdev, stable

This patch fixes the crash, but is IMHO incomplete: The flush_work() in
.ndo_stop() should also be removed, since its existence implies
unexpected behaviour.

In other words, my moving it there in can327 was a double mistake, and
slcan just happened to copy my mistake over.

I'm preparing a patch for can327, and it will remove the flush from
.ndo_stop(). What shall we do about slcan?


Max

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-12-02 15:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-01  7:34 [PATCH] can: slcan: fix freed work crash Jiri Slaby (SUSE)
2022-12-01 10:13 ` Petr Vorel
2022-12-01 18:52 ` Max Staudt
2022-12-02 12:14   ` Marc Kleine-Budde
2022-12-02 15:27 ` Marc Kleine-Budde
2022-12-02 15:32   ` Max Staudt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).