linux-can.vger.kernel.org archive mirror
* [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
@ 2023-05-26 17:19 Fedor Pchelkin
  2023-05-26 17:19 ` [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex Fedor Pchelkin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Fedor Pchelkin @ 2023-05-26 17:19 UTC
  To: Oleksij Rempel
  Cc: Fedor Pchelkin, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

This patch series fixes a possible use-after-free race described in 2/2:
if j1939_can_rx_register() fails, a concurrent thread may already have
obtained a pointer to the priv structure that is about to be freed.

Patch 1/2 turns j1939_netdev_lock into a mutex so that calls to
j1939_can_rx_register() can be serialized without changing GFP_KERNEL to
GFP_ATOMIC inside can_rx_register(). This seems to be safe, since
j1939_netdev_lock appears to be used only in contexts where sleeping is
allowed.

Note that the patch series has been tested only via Syzkaller and not with
a real device.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex
  2023-05-26 17:19 [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
@ 2023-05-26 17:19 ` Fedor Pchelkin
  2023-06-02 12:33   ` Oleksij Rempel
  2023-05-26 17:19 ` [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
  2023-06-05  6:37 ` [PATCH 0/2] " Marc Kleine-Budde
  2 siblings, 1 reply; 11+ messages in thread
From: Fedor Pchelkin @ 2023-05-26 17:19 UTC
  To: Oleksij Rempel
  Cc: Fedor Pchelkin, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

It turns out access to j1939_can_rx_register() needs to be serialized,
otherwise j1939_priv can be corrupted when parallel threads call
j1939_netdev_start() and j1939_can_rx_register() fails. This issue is
thoroughly covered in the other commit which serializes access to
j1939_can_rx_register().

Change j1939_netdev_lock type to mutex so that we do not need to remove
GFP_KERNEL from can_rx_register().

j1939_netdev_lock seems to be used in normal contexts where mutex usage
is not prohibited.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
Note that it has only been tested via Syzkaller and not with real
hardware.

 net/can/j1939/main.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index 821d4ff303b3..6ed79afe19a5 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -126,7 +126,7 @@ static void j1939_can_recv(struct sk_buff *iskb, void *data)
 #define J1939_CAN_ID CAN_EFF_FLAG
 #define J1939_CAN_MASK (CAN_EFF_FLAG | CAN_RTR_FLAG)
 
-static DEFINE_SPINLOCK(j1939_netdev_lock);
+static DEFINE_MUTEX(j1939_netdev_lock);
 
 static struct j1939_priv *j1939_priv_create(struct net_device *ndev)
 {
@@ -220,7 +220,7 @@ static void __j1939_rx_release(struct kref *kref)
 	j1939_can_rx_unregister(priv);
 	j1939_ecu_unmap_all(priv);
 	j1939_priv_set(priv->ndev, NULL);
-	spin_unlock(&j1939_netdev_lock);
+	mutex_unlock(&j1939_netdev_lock);
 }
 
 /* get pointer to priv without increasing ref counter */
@@ -248,9 +248,9 @@ static struct j1939_priv *j1939_priv_get_by_ndev(struct net_device *ndev)
 {
 	struct j1939_priv *priv;
 
-	spin_lock(&j1939_netdev_lock);
+	mutex_lock(&j1939_netdev_lock);
 	priv = j1939_priv_get_by_ndev_locked(ndev);
-	spin_unlock(&j1939_netdev_lock);
+	mutex_unlock(&j1939_netdev_lock);
 
 	return priv;
 }
@@ -260,14 +260,14 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 	struct j1939_priv *priv, *priv_new;
 	int ret;
 
-	spin_lock(&j1939_netdev_lock);
+	mutex_lock(&j1939_netdev_lock);
 	priv = j1939_priv_get_by_ndev_locked(ndev);
 	if (priv) {
 		kref_get(&priv->rx_kref);
-		spin_unlock(&j1939_netdev_lock);
+		mutex_unlock(&j1939_netdev_lock);
 		return priv;
 	}
-	spin_unlock(&j1939_netdev_lock);
+	mutex_unlock(&j1939_netdev_lock);
 
 	priv = j1939_priv_create(ndev);
 	if (!priv)
@@ -277,20 +277,20 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 	spin_lock_init(&priv->j1939_socks_lock);
 	INIT_LIST_HEAD(&priv->j1939_socks);
 
-	spin_lock(&j1939_netdev_lock);
+	mutex_lock(&j1939_netdev_lock);
 	priv_new = j1939_priv_get_by_ndev_locked(ndev);
 	if (priv_new) {
 		/* Someone was faster than us, use their priv and roll
 		 * back our's.
 		 */
 		kref_get(&priv_new->rx_kref);
-		spin_unlock(&j1939_netdev_lock);
+		mutex_unlock(&j1939_netdev_lock);
 		dev_put(ndev);
 		kfree(priv);
 		return priv_new;
 	}
 	j1939_priv_set(ndev, priv);
-	spin_unlock(&j1939_netdev_lock);
+	mutex_unlock(&j1939_netdev_lock);
 
 	ret = j1939_can_rx_register(priv);
 	if (ret < 0)
@@ -308,7 +308,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 
 void j1939_netdev_stop(struct j1939_priv *priv)
 {
-	kref_put_lock(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
+	kref_put_mutex(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
 	j1939_priv_put(priv);
 }
 
-- 
2.34.1
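
For readers less familiar with kref_put_mutex(), here is a minimal userspace
sketch of the contract this patch relies on: the mutex is taken only when the
reference count is about to drop to zero, and the release callback runs with
the mutex held and must unlock it itself, as __j1939_rx_release() does in the
hunk above. The names obj, obj_release, obj_put_mutex and table_lock are
hypothetical, and pthreads plus C11 atomics stand in for the kernel
primitives; this is an illustration under those assumptions, not the kernel
implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for j1939_netdev_lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical refcounted object; 'released' only records that release ran. */
struct obj {
	atomic_int refs;
	bool released;
};

/* Decrement the count unless it is exactly 1; true if it was decremented. */
static bool dec_not_one(atomic_int *refs)
{
	int old = atomic_load(refs);

	while (old != 1) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(refs, &old, old - 1))
			return true;
	}
	return false;
}

/* Release callback: entered with table_lock held, must drop it. */
static void obj_release(struct obj *o)
{
	o->released = true;
	pthread_mutex_unlock(&table_lock);
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int obj_put_mutex(struct obj *o, void (*release)(struct obj *))
{
	/* Fast path: not the last reference, no lock needed. */
	if (dec_not_one(&o->refs))
		return 0;

	/* Slow path: possibly the last reference, decide under the lock. */
	pthread_mutex_lock(&table_lock);
	if (atomic_fetch_sub(&o->refs, 1) != 1) {
		pthread_mutex_unlock(&table_lock);
		return 0;
	}
	release(o);	/* the callback unlocks table_lock */
	return 1;
}
```

With a count of 2, the first obj_put_mutex() call returns 0 without touching
the lock; the second returns 1 after obj_release() has run and dropped the
lock.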



* [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 17:19 [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
  2023-05-26 17:19 ` [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex Fedor Pchelkin
@ 2023-05-26 17:19 ` Fedor Pchelkin
  2023-05-26 18:15   ` Oleksij Rempel
  2023-06-02 12:35   ` Oleksij Rempel
  2023-06-05  6:37 ` [PATCH 0/2] " Marc Kleine-Budde
  2 siblings, 2 replies; 11+ messages in thread
From: Fedor Pchelkin @ 2023-05-26 17:19 UTC
  To: Oleksij Rempel
  Cc: Fedor Pchelkin, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

Syzkaller reports the following failure:

BUG: KASAN: use-after-free in kref_put include/linux/kref.h:64 [inline]
BUG: KASAN: use-after-free in j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
Write of size 4 at addr ffff888141c15058 by task swapper/3/0

CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.144-syzkaller #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x107/0x167 lib/dump_stack.c:118
 print_address_description.constprop.0+0x1c/0x220 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
 check_memory_region_inline mm/kasan/generic.c:186 [inline]
 check_memory_region+0x145/0x190 mm/kasan/generic.c:192
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_fetch_sub_release include/asm-generic/atomic-instrumented.h:220 [inline]
 __refcount_sub_and_test include/linux/refcount.h:272 [inline]
 __refcount_dec_and_test include/linux/refcount.h:315 [inline]
 refcount_dec_and_test include/linux/refcount.h:333 [inline]
 kref_put include/linux/kref.h:64 [inline]
 j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
 j1939_sk_sock_destruct+0x44/0x90 net/can/j1939/socket.c:374
 __sk_destruct+0x4e/0x820 net/core/sock.c:1784
 rcu_do_batch kernel/rcu/tree.c:2485 [inline]
 rcu_core+0xb35/0x1a30 kernel/rcu/tree.c:2726
 __do_softirq+0x289/0x9a3 kernel/softirq.c:298
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xe0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:393 [inline]
 __irq_exit_rcu kernel/softirq.c:423 [inline]
 irq_exit_rcu+0x136/0x200 kernel/softirq.c:435
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1095
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:635

Allocated by task 1141:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xc9/0xd0 mm/kasan/common.c:461
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:664 [inline]
 j1939_priv_create net/can/j1939/main.c:131 [inline]
 j1939_netdev_start+0x111/0x860 net/can/j1939/main.c:268
 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
 __sys_bind+0x1f2/0x260 net/socket.c:1645
 __do_sys_bind net/socket.c:1656 [inline]
 __se_sys_bind net/socket.c:1654 [inline]
 __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

Freed by task 1141:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0x112/0x170 mm/kasan/common.c:422
 slab_free_hook mm/slub.c:1542 [inline]
 slab_free_freelist_hook+0xad/0x190 mm/slub.c:1576
 slab_free mm/slub.c:3149 [inline]
 kfree+0xd9/0x3b0 mm/slub.c:4125
 j1939_netdev_start+0x5ee/0x860 net/can/j1939/main.c:300
 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
 __sys_bind+0x1f2/0x260 net/socket.c:1645
 __do_sys_bind net/socket.c:1656 [inline]
 __se_sys_bind net/socket.c:1654 [inline]
 __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

It can be caused by this scenario:

CPU0					CPU1
j1939_sk_bind(socket0, ndev0, ...)
  j1939_netdev_start()
					j1939_sk_bind(socket1, ndev0, ...)
                                          j1939_netdev_start()
  mutex_lock(&j1939_netdev_lock)
  j1939_priv_set(ndev0, priv)
  mutex_unlock(&j1939_netdev_lock)
					  if (priv_new)
					    kref_get(&priv_new->rx_kref)
					    return priv_new;
					  /* inside j1939_sk_bind() */
					  jsk->priv = priv
  j1939_can_rx_register(priv) // fails
  j1939_priv_set(ndev0, NULL)
  kfree(priv)
					j1939_sk_sock_destruct()
					j1939_priv_put() // <- uaf

To avoid this, call j1939_can_rx_register() under j1939_netdev_lock so
that a concurrent thread cannot process j1939_priv before
j1939_can_rx_register() returns.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
 net/can/j1939/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index 6ed79afe19a5..ecff1c947d68 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -290,16 +290,18 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 		return priv_new;
 	}
 	j1939_priv_set(ndev, priv);
-	mutex_unlock(&j1939_netdev_lock);
 
 	ret = j1939_can_rx_register(priv);
 	if (ret < 0)
 		goto out_priv_put;
 
+	mutex_unlock(&j1939_netdev_lock);
 	return priv;
 
  out_priv_put:
 	j1939_priv_set(ndev, NULL);
+	mutex_unlock(&j1939_netdev_lock);
+
 	dev_put(ndev);
 	kfree(priv);
 
-- 
2.34.1
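
The locking change can be sketched in userspace as follows: the new priv is
published and registered inside one critical section, so on the error path no
other thread can have picked it up before it is unpublished and freed. All
names here (dev_lock, dev_slot, rx_register, dev_start) are hypothetical
stand-ins, and the flow is simplified compared with j1939_netdev_start(): the
initial lookup before allocation is left out, and a plain counter protected
by the lock replaces rx_kref. Treat it as a sketch under those assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct priv {
	int refs;	/* protected by dev_lock in this sketch */
};

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static struct priv *dev_slot;	/* plays the role of the per-ndev priv pointer */

/* Stand-in for j1939_can_rx_register(); 'fail' forces the error path. */
static int rx_register(struct priv *p, int fail)
{
	(void)p;
	return fail ? -1 : 0;
}

static struct priv *dev_start(int fail)
{
	struct priv *p, *cur;

	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->refs = 1;

	pthread_mutex_lock(&dev_lock);
	cur = dev_slot;
	if (cur) {
		/* Someone already published a priv: share it, drop ours. */
		cur->refs++;
		pthread_mutex_unlock(&dev_lock);
		free(p);
		return cur;
	}
	dev_slot = p;

	/* Registration runs before the lock is dropped ... */
	if (rx_register(p, fail) < 0) {
		/* ... so on failure nobody can have looked p up yet. */
		dev_slot = NULL;
		pthread_mutex_unlock(&dev_lock);
		free(p);
		return NULL;
	}
	pthread_mutex_unlock(&dev_lock);
	return p;
}
```

A failing dev_start(1) leaves dev_slot empty and returns NULL; a later
dev_start(0) publishes a priv, and any further caller shares it with the
count incremented.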



* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 17:19 ` [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
@ 2023-05-26 18:15   ` Oleksij Rempel
  2023-05-26 18:50     ` Fedor Pchelkin
  2023-06-02 12:35   ` Oleksij Rempel
  1 sibling, 1 reply; 11+ messages in thread
From: Oleksij Rempel @ 2023-05-26 18:15 UTC
  To: Fedor Pchelkin
  Cc: Oleksij Rempel, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

Hi Fedor,

On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> Syzkaller reports the following failure:
> 
> BUG: KASAN: use-after-free in kref_put include/linux/kref.h:64 [inline]
> BUG: KASAN: use-after-free in j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
> Write of size 4 at addr ffff888141c15058 by task swapper/3/0
> 
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.144-syzkaller #0
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x107/0x167 lib/dump_stack.c:118
>  print_address_description.constprop.0+0x1c/0x220 mm/kasan/report.c:385
>  __kasan_report mm/kasan/report.c:545 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
>  check_memory_region_inline mm/kasan/generic.c:186 [inline]
>  check_memory_region+0x145/0x190 mm/kasan/generic.c:192
>  instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>  atomic_fetch_sub_release include/asm-generic/atomic-instrumented.h:220 [inline]
>  __refcount_sub_and_test include/linux/refcount.h:272 [inline]
>  __refcount_dec_and_test include/linux/refcount.h:315 [inline]
>  refcount_dec_and_test include/linux/refcount.h:333 [inline]
>  kref_put include/linux/kref.h:64 [inline]
>  j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
>  j1939_sk_sock_destruct+0x44/0x90 net/can/j1939/socket.c:374
>  __sk_destruct+0x4e/0x820 net/core/sock.c:1784
>  rcu_do_batch kernel/rcu/tree.c:2485 [inline]
>  rcu_core+0xb35/0x1a30 kernel/rcu/tree.c:2726
>  __do_softirq+0x289/0x9a3 kernel/softirq.c:298
>  asm_call_irq_on_stack+0x12/0x20
>  </IRQ>
>  __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
>  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
>  do_softirq_own_stack+0xaa/0xe0 arch/x86/kernel/irq_64.c:77
>  invoke_softirq kernel/softirq.c:393 [inline]
>  __irq_exit_rcu kernel/softirq.c:423 [inline]
>  irq_exit_rcu+0x136/0x200 kernel/softirq.c:435
>  sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1095
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:635
> 
> Allocated by task 1141:
>  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
>  kasan_set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc.constprop.0+0xc9/0xd0 mm/kasan/common.c:461
>  kmalloc include/linux/slab.h:552 [inline]
>  kzalloc include/linux/slab.h:664 [inline]
>  j1939_priv_create net/can/j1939/main.c:131 [inline]
>  j1939_netdev_start+0x111/0x860 net/can/j1939/main.c:268
>  j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
>  __sys_bind+0x1f2/0x260 net/socket.c:1645
>  __do_sys_bind net/socket.c:1656 [inline]
>  __se_sys_bind net/socket.c:1654 [inline]
>  __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> Freed by task 1141:
>  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
>  kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
>  kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
>  __kasan_slab_free+0x112/0x170 mm/kasan/common.c:422
>  slab_free_hook mm/slub.c:1542 [inline]
>  slab_free_freelist_hook+0xad/0x190 mm/slub.c:1576
>  slab_free mm/slub.c:3149 [inline]
>  kfree+0xd9/0x3b0 mm/slub.c:4125
>  j1939_netdev_start+0x5ee/0x860 net/can/j1939/main.c:300
>  j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
>  __sys_bind+0x1f2/0x260 net/socket.c:1645
>  __do_sys_bind net/socket.c:1656 [inline]
>  __se_sys_bind net/socket.c:1654 [inline]
>  __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> It can be caused by this scenario:
> 
> CPU0					CPU1
> j1939_sk_bind(socket0, ndev0, ...)
>   j1939_netdev_start()
> 					j1939_sk_bind(socket1, ndev0, ...)
>                                           j1939_netdev_start()
>   mutex_lock(&j1939_netdev_lock)
>   j1939_priv_set(ndev0, priv)
>   mutex_unlock(&j1939_netdev_lock)
> 					  if (priv_new)
> 					    kref_get(&priv_new->rx_kref)
> 					    return priv_new;
> 					  /* inside j1939_sk_bind() */
> 					  jsk->priv = priv
>   j1939_can_rx_register(priv) // fails
>   j1939_priv_set(ndev0, NULL)
>   kfree(priv)
> 					j1939_sk_sock_destruct()
> 					j1939_priv_put() // <- uaf
> 
> To avoid this, call j1939_can_rx_register() under j1939_netdev_lock so
> that a concurrent thread cannot process j1939_priv before
> j1939_can_rx_register() returns.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> ---
>  net/can/j1939/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
> index 6ed79afe19a5..ecff1c947d68 100644
> --- a/net/can/j1939/main.c
> +++ b/net/can/j1939/main.c
> @@ -290,16 +290,18 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  		return priv_new;
>  	}
>  	j1939_priv_set(ndev, priv);
> -	mutex_unlock(&j1939_netdev_lock);
>  
>  	ret = j1939_can_rx_register(priv);
>  	if (ret < 0)
>  		goto out_priv_put;
>  
> +	mutex_unlock(&j1939_netdev_lock);
>  	return priv;
>  
>   out_priv_put:
>  	j1939_priv_set(ndev, NULL);
> +	mutex_unlock(&j1939_netdev_lock);
> +
>  	dev_put(ndev);
>  	kfree(priv);
>  
> -- 
> 2.34.1
> 
> 


Thank you for your investigation. How about this change?
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -285,8 +285,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
                 */
                kref_get(&priv_new->rx_kref);
                spin_unlock(&j1939_netdev_lock);
-               dev_put(ndev);
-               kfree(priv);
+               j1939_priv_put(priv);
                return priv_new;
        }
        j1939_priv_set(ndev, priv);
@@ -300,8 +299,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
 
  out_priv_put:
        j1939_priv_set(ndev, NULL);
-       dev_put(ndev);
-       kfree(priv);
+       j1939_priv_put(priv);
 
        return ERR_PTR(ret);
 }

If I see it correctly, the problem is kfree() which is called without respecting
the ref counting. If CPU1 has priv_new, refcounting is increased. The priv will
not be freed at this point.

Can you please test it?

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 18:15   ` Oleksij Rempel
@ 2023-05-26 18:50     ` Fedor Pchelkin
  2023-05-27  5:57       ` Oleksij Rempel
  0 siblings, 1 reply; 11+ messages in thread
From: Fedor Pchelkin @ 2023-05-26 18:50 UTC
  To: Oleksij Rempel
  Cc: Oleksij Rempel, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

Hi Oleksij,

thanks for the reply!

On Fri, May 26, 2023 at 08:15:00PM +0200, Oleksij Rempel wrote:
> Hi Fedor,
> 
> On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> 
> 
> Thank you for your investigation. How about this change?
> --- a/net/can/j1939/main.c
> +++ b/net/can/j1939/main.c
> @@ -285,8 +285,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>                  */
>                 kref_get(&priv_new->rx_kref);
>                 spin_unlock(&j1939_netdev_lock);
> -               dev_put(ndev);
> -               kfree(priv);
> +               j1939_priv_put(priv);

I don't think that's good because the priv which is directly freed here is
still local to the thread, and parallel threads don't have any access to
it. j1939_priv_create() has allocated a fresh priv and called dev_hold()
so dev_put() and kfree() here are okay.

>                 return priv_new;
>         }
>         j1939_priv_set(ndev, priv);
> @@ -300,8 +299,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  
>   out_priv_put:
>         j1939_priv_set(ndev, NULL);
> -       dev_put(ndev);
> -       kfree(priv);
> +       j1939_priv_put(priv);
>  
>         return ERR_PTR(ret);
>  }
> 
> If I see it correctly, the problem is kfree() which is called without respecting
> the ref counting. If CPU1 has priv_new, refcounting is increased. The priv will
> not be freed at this point.

With your suggestion, I think it doesn't work correctly if
j1939_can_rx_register() fails and we go to out_priv_put. The priv is kept
but the parallel thread which may have already grabbed it thinks that
j1939_can_rx_register() has succeeded when actually it hasn't succeeded.
Moreover, j1939_priv_set() makes it NULL on error path so that priv cannot
be accessed from ndev.
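
The concern above can be replayed sequentially in a small userspace sketch:
with a refcounted put on the error path the priv survives, but the second
holder is left with an object that was never registered and is no longer
reachable from the device. The names (priv_put, creator_publish,
second_binder_lookup, creator_register_failed, dev_slot, rx_registered) are
hypothetical, and the two kernel reference counts (kref and rx_kref) are
collapsed into a single counter for brevity, which is an assumption of this
sketch rather than how the driver is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct priv {
	int refs;
	bool rx_registered;	/* true only if registration succeeded */
};

static struct priv *dev_slot;	/* plays the role of the per-ndev priv pointer */

static void priv_put(struct priv *p)
{
	if (--p->refs == 0)
		free(p);
}

/* CPU0: allocate a priv and publish it before registering. */
static struct priv *creator_publish(void)
{
	struct priv *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refs = 1;
	dev_slot = p;
	return p;
}

/* CPU1: finds the published priv and takes a reference. */
static struct priv *second_binder_lookup(void)
{
	struct priv *p = dev_slot;

	if (p)
		p->refs++;
	return p;
}

/* CPU0: registration failed; unpublish and drop only its own reference. */
static void creator_register_failed(struct priv *p)
{
	p->rx_registered = false;
	dev_slot = NULL;
	priv_put(p);	/* refcounted put instead of dev_put() + kfree() */
}
```

After creator_register_failed(), the pointer held by the second binder is
still valid memory, so the use-after-free is gone, but rx_registered is false
and dev_slot is NULL, which is the inconsistent state described above.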

I also considered alternatives where we don't have to serialize access
to j1939_can_rx_register() and consequently introduce a mutex. But with
the current j1939_netdev_start() implementation I can't see how to fix
the race without it.

> 
> Can you please test it?
> 
> Regards,
> Oleksij
> -- 
> Pengutronix e.K.                           |                             |
> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 18:50     ` Fedor Pchelkin
@ 2023-05-27  5:57       ` Oleksij Rempel
  2023-05-27 10:05         ` Fedor Pchelkin
  0 siblings, 1 reply; 11+ messages in thread
From: Oleksij Rempel @ 2023-05-27  5:57 UTC
  To: Fedor Pchelkin
  Cc: Oleksij Rempel, Marc Kleine-Budde, kernel, Robin van der Gracht,
	Oliver Hartkopp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

Hi Fedor,

On Fri, May 26, 2023 at 09:50:26PM +0300, Fedor Pchelkin wrote:
> Hi Oleksij,
> 
> thanks for the reply!
> 
> On Fri, May 26, 2023 at 08:15:00PM +0200, Oleksij Rempel wrote:
> > Hi Fedor,
> > 
> > On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> > 
> > 
> > Thank you for your investigation. How about this change?
> > --- a/net/can/j1939/main.c
> > +++ b/net/can/j1939/main.c
> > @@ -285,8 +285,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> >                  */
> >                 kref_get(&priv_new->rx_kref);
> >                 spin_unlock(&j1939_netdev_lock);
> > -               dev_put(ndev);
> > -               kfree(priv);
> > +               j1939_priv_put(priv);
> 
> I don't think that's good because the priv which is directly freed here is
> still local to the thread, and parallel threads don't have any access to
> it. j1939_priv_create() has allocated a fresh priv and called dev_hold()
> so dev_put() and kfree() here are okay.
> 
> >                 return priv_new;
> >         }
> >         j1939_priv_set(ndev, priv);
> > @@ -300,8 +299,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> >  
> >   out_priv_put:
> >         j1939_priv_set(ndev, NULL);
> > -       dev_put(ndev);
> > -       kfree(priv);
> > +       j1939_priv_put(priv);
> >  
> >         return ERR_PTR(ret);
> >  }
> > 
> > If I see it correctly, the problem is kfree() which is called without respecting
> > the ref counting. If CPU1 has priv_new, refcounting is increased. The priv will
> > not be freed at this point.
> 
> With your suggestion, I think it doesn't work correctly if
> j1939_can_rx_register() fails and we go to out_priv_put. The priv is kept
> but the parallel thread which may have already grabbed it thinks that
> j1939_can_rx_register() has succeeded when actually it hasn't succeeded.
> Moreover, j1939_priv_set() makes it NULL on error path so that priv cannot
> be accessed from ndev.
> 
> I also considered the alternatives where we don't have to serialize access
> to j1939_can_rx_register() and subsequently introduce mutex. But with
> current j1939_netdev_start() implementation I can't see how to fix the
> racy bug without it.
 
OK, that makes sense.

I'll try to do some testing next week. If I forget, please feel
free to ping me.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-27  5:57       ` Oleksij Rempel
@ 2023-05-27 10:05         ` Fedor Pchelkin
  0 siblings, 0 replies; 11+ messages in thread
From: Fedor Pchelkin @ 2023-05-27 10:05 UTC
  To: Oleksij Rempel
  Cc: Marc Kleine-Budde, kernel, Robin van der Gracht, Oliver Hartkopp,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

On Sat, May 27, 2023 at 07:57:04AM +0200, Oleksij Rempel wrote:
> Hi Fedor,
> 
> On Fri, May 26, 2023 at 09:50:26PM +0300, Fedor Pchelkin wrote:
> > Hi Oleksij,
> > 
> > thanks for the reply!
> > 
> > On Fri, May 26, 2023 at 08:15:00PM +0200, Oleksij Rempel wrote:
> > > Hi Fedor,
> > > 
> > > On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> > > 
> > > 
> > > Thank you for your investigation. How about this change?
> > > --- a/net/can/j1939/main.c
> > > +++ b/net/can/j1939/main.c
> > > @@ -285,8 +285,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> > >                  */
> > >                 kref_get(&priv_new->rx_kref);
> > >                 spin_unlock(&j1939_netdev_lock);
> > > -               dev_put(ndev);
> > > -               kfree(priv);
> > > +               j1939_priv_put(priv);
> > 
> > I don't think that's good because the priv which is directly freed here is
> > still local to the thread, and parallel threads don't have any access to
> > it. j1939_priv_create() has allocated a fresh priv and called dev_hold()
> > so dev_put() and kfree() here are okay.
> > 
> > >                 return priv_new;
> > >         }
> > >         j1939_priv_set(ndev, priv);
> > > @@ -300,8 +299,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> > >  
> > >   out_priv_put:
> > >         j1939_priv_set(ndev, NULL);
> > > -       dev_put(ndev);
> > > -       kfree(priv);
> > > +       j1939_priv_put(priv);
> > >  
> > >         return ERR_PTR(ret);
> > >  }
> > > 
> > > If I see it correctly, the problem is kfree() which is called without respecting
> > > the ref counting. If CPU1 has priv_new, refcounting is increased. The priv will
> > > not be freed at this point.
> > 
> > With your suggestion, I think it doesn't work correctly if
> > j1939_can_rx_register() fails and we go to out_priv_put. The priv is kept
> > but the parallel thread which may have already grabbed it thinks that
> > j1939_can_rx_register() has succeeded when actually it hasn't succeeded.
> > Moreover, j1939_priv_set() makes it NULL on error path so that priv cannot
> > be accessed from ndev.
> > 
> > I also considered the alternatives where we don't have to serialize access
> > to j1939_can_rx_register() and subsequently introduce mutex. But with
> > current j1939_netdev_start() implementation I can't see how to fix the
> > racy bug without it.
>  
> OK, that makes sense.
> 
> I'll try to do some testing next week. If I forget, please feel
> free to ping me.

Got it, thank you.

> 
> Regards,
> Oleksij
> -- 
> Pengutronix e.K.                           |                             |
> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex
  2023-05-26 17:19 ` [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex Fedor Pchelkin
@ 2023-06-02 12:33   ` Oleksij Rempel
  0 siblings, 0 replies; 11+ messages in thread
From: Oleksij Rempel @ 2023-06-02 12:33 UTC
  To: Fedor Pchelkin
  Cc: Oleksij Rempel, Kurt Van Dijck, lvc-project,
	Robin van der Gracht, linux-can, Eric Dumazet, netdev,
	Marc Kleine-Budde, Alexey Khoroshilov, kernel, Oliver Hartkopp,
	Jakub Kicinski, Paolo Abeni, David S. Miller, linux-kernel

On Fri, May 26, 2023 at 08:19:09PM +0300, Fedor Pchelkin wrote:
> It turns out access to j1939_can_rx_register() needs to be serialized,
> otherwise j1939_priv can be corrupted when parallel threads call
> j1939_netdev_start() and j1939_can_rx_register() fails. This issue is
> thoroughly covered in the other commit which serializes access to
> j1939_can_rx_register().
> 
> Change j1939_netdev_lock type to mutex so that we do not need to remove
> GFP_KERNEL from can_rx_register().
> 
> j1939_netdev_lock seems to be used in normal contexts where mutex usage
> is not prohibited.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>

Thank you!

> ---
> Note that it has only been tested via Syzkaller and not with real
> hardware.
> 
>  net/can/j1939/main.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
> index 821d4ff303b3..6ed79afe19a5 100644
> --- a/net/can/j1939/main.c
> +++ b/net/can/j1939/main.c
> @@ -126,7 +126,7 @@ static void j1939_can_recv(struct sk_buff *iskb, void *data)
>  #define J1939_CAN_ID CAN_EFF_FLAG
>  #define J1939_CAN_MASK (CAN_EFF_FLAG | CAN_RTR_FLAG)
>  
> -static DEFINE_SPINLOCK(j1939_netdev_lock);
> +static DEFINE_MUTEX(j1939_netdev_lock);
>  
>  static struct j1939_priv *j1939_priv_create(struct net_device *ndev)
>  {
> @@ -220,7 +220,7 @@ static void __j1939_rx_release(struct kref *kref)
>  	j1939_can_rx_unregister(priv);
>  	j1939_ecu_unmap_all(priv);
>  	j1939_priv_set(priv->ndev, NULL);
> -	spin_unlock(&j1939_netdev_lock);
> +	mutex_unlock(&j1939_netdev_lock);
>  }
>  
>  /* get pointer to priv without increasing ref counter */
> @@ -248,9 +248,9 @@ static struct j1939_priv *j1939_priv_get_by_ndev(struct net_device *ndev)
>  {
>  	struct j1939_priv *priv;
>  
> -	spin_lock(&j1939_netdev_lock);
> +	mutex_lock(&j1939_netdev_lock);
>  	priv = j1939_priv_get_by_ndev_locked(ndev);
> -	spin_unlock(&j1939_netdev_lock);
> +	mutex_unlock(&j1939_netdev_lock);
>  
>  	return priv;
>  }
> @@ -260,14 +260,14 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  	struct j1939_priv *priv, *priv_new;
>  	int ret;
>  
> -	spin_lock(&j1939_netdev_lock);
> +	mutex_lock(&j1939_netdev_lock);
>  	priv = j1939_priv_get_by_ndev_locked(ndev);
>  	if (priv) {
>  		kref_get(&priv->rx_kref);
> -		spin_unlock(&j1939_netdev_lock);
> +		mutex_unlock(&j1939_netdev_lock);
>  		return priv;
>  	}
> -	spin_unlock(&j1939_netdev_lock);
> +	mutex_unlock(&j1939_netdev_lock);
>  
>  	priv = j1939_priv_create(ndev);
>  	if (!priv)
> @@ -277,20 +277,20 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  	spin_lock_init(&priv->j1939_socks_lock);
>  	INIT_LIST_HEAD(&priv->j1939_socks);
>  
> -	spin_lock(&j1939_netdev_lock);
> +	mutex_lock(&j1939_netdev_lock);
>  	priv_new = j1939_priv_get_by_ndev_locked(ndev);
>  	if (priv_new) {
>  		/* Someone was faster than us, use their priv and roll
>  		 * back our's.
>  		 */
>  		kref_get(&priv_new->rx_kref);
> -		spin_unlock(&j1939_netdev_lock);
> +		mutex_unlock(&j1939_netdev_lock);
>  		dev_put(ndev);
>  		kfree(priv);
>  		return priv_new;
>  	}
>  	j1939_priv_set(ndev, priv);
> -	spin_unlock(&j1939_netdev_lock);
> +	mutex_unlock(&j1939_netdev_lock);
>  
>  	ret = j1939_can_rx_register(priv);
>  	if (ret < 0)
> @@ -308,7 +308,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  
>  void j1939_netdev_stop(struct j1939_priv *priv)
>  {
> -	kref_put_lock(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
> +	kref_put_mutex(&priv->rx_kref, __j1939_rx_release, &j1939_netdev_lock);
>  	j1939_priv_put(priv);
>  }
>  
> -- 
> 2.34.1
> 
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 17:19 ` [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
  2023-05-26 18:15   ` Oleksij Rempel
@ 2023-06-02 12:35   ` Oleksij Rempel
  2023-06-02 16:06     ` Fedor Pchelkin
  1 sibling, 1 reply; 11+ messages in thread
From: Oleksij Rempel @ 2023-06-02 12:35 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Oleksij Rempel, Kurt Van Dijck, lvc-project,
	Robin van der Gracht, linux-can, Eric Dumazet, netdev,
	Marc Kleine-Budde, Alexey Khoroshilov, kernel, Oliver Hartkopp,
	Jakub Kicinski, Paolo Abeni, David S. Miller, linux-kernel

On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> Syzkaller reports the following failure:
> 
> BUG: KASAN: use-after-free in kref_put include/linux/kref.h:64 [inline]
> BUG: KASAN: use-after-free in j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
> Write of size 4 at addr ffff888141c15058 by task swapper/3/0
> 
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.144-syzkaller #0
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x107/0x167 lib/dump_stack.c:118
>  print_address_description.constprop.0+0x1c/0x220 mm/kasan/report.c:385
>  __kasan_report mm/kasan/report.c:545 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
>  check_memory_region_inline mm/kasan/generic.c:186 [inline]
>  check_memory_region+0x145/0x190 mm/kasan/generic.c:192
>  instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>  atomic_fetch_sub_release include/asm-generic/atomic-instrumented.h:220 [inline]
>  __refcount_sub_and_test include/linux/refcount.h:272 [inline]
>  __refcount_dec_and_test include/linux/refcount.h:315 [inline]
>  refcount_dec_and_test include/linux/refcount.h:333 [inline]
>  kref_put include/linux/kref.h:64 [inline]
>  j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172
>  j1939_sk_sock_destruct+0x44/0x90 net/can/j1939/socket.c:374
>  __sk_destruct+0x4e/0x820 net/core/sock.c:1784
>  rcu_do_batch kernel/rcu/tree.c:2485 [inline]
>  rcu_core+0xb35/0x1a30 kernel/rcu/tree.c:2726
>  __do_softirq+0x289/0x9a3 kernel/softirq.c:298
>  asm_call_irq_on_stack+0x12/0x20
>  </IRQ>
>  __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
>  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
>  do_softirq_own_stack+0xaa/0xe0 arch/x86/kernel/irq_64.c:77
>  invoke_softirq kernel/softirq.c:393 [inline]
>  __irq_exit_rcu kernel/softirq.c:423 [inline]
>  irq_exit_rcu+0x136/0x200 kernel/softirq.c:435
>  sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1095
>  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:635
> 
> Allocated by task 1141:
>  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
>  kasan_set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc.constprop.0+0xc9/0xd0 mm/kasan/common.c:461
>  kmalloc include/linux/slab.h:552 [inline]
>  kzalloc include/linux/slab.h:664 [inline]
>  j1939_priv_create net/can/j1939/main.c:131 [inline]
>  j1939_netdev_start+0x111/0x860 net/can/j1939/main.c:268
>  j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
>  __sys_bind+0x1f2/0x260 net/socket.c:1645
>  __do_sys_bind net/socket.c:1656 [inline]
>  __se_sys_bind net/socket.c:1654 [inline]
>  __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> Freed by task 1141:
>  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
>  kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
>  kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
>  __kasan_slab_free+0x112/0x170 mm/kasan/common.c:422
>  slab_free_hook mm/slub.c:1542 [inline]
>  slab_free_freelist_hook+0xad/0x190 mm/slub.c:1576
>  slab_free mm/slub.c:3149 [inline]
>  kfree+0xd9/0x3b0 mm/slub.c:4125
>  j1939_netdev_start+0x5ee/0x860 net/can/j1939/main.c:300
>  j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485
>  __sys_bind+0x1f2/0x260 net/socket.c:1645
>  __do_sys_bind net/socket.c:1656 [inline]
>  __se_sys_bind net/socket.c:1654 [inline]
>  __x64_sys_bind+0x6f/0xb0 net/socket.c:1654
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> It can be caused by this scenario:
> 
> CPU0					CPU1
> j1939_sk_bind(socket0, ndev0, ...)
>   j1939_netdev_start()
> 					j1939_sk_bind(socket1, ndev0, ...)
> 					  j1939_netdev_start()
>   mutex_lock(&j1939_netdev_lock)
>   j1939_priv_set(ndev0, priv)
>   mutex_unlock(&j1939_netdev_lock)
> 					  if (priv_new)
> 					    kref_get(&priv_new->rx_kref)
> 					    return priv_new;
> 					  /* inside j1939_sk_bind() */
> 					  jsk->priv = priv
>   j1939_can_rx_register(priv) // fails
>   j1939_priv_set(ndev, NULL)
>   kfree(priv)
> 					j1939_sk_sock_destruct()
> 					j1939_priv_put() // <- uaf
> 
> To avoid this, call j1939_can_rx_register() under j1939_netdev_lock so
> that a concurrent thread cannot process j1939_priv before
> j1939_can_rx_register() returns.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>

Thank you!

> ---
>  net/can/j1939/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
> index 6ed79afe19a5..ecff1c947d68 100644
> --- a/net/can/j1939/main.c
> +++ b/net/can/j1939/main.c
> @@ -290,16 +290,18 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
>  		return priv_new;
>  	}
>  	j1939_priv_set(ndev, priv);
> -	mutex_unlock(&j1939_netdev_lock);
>  
>  	ret = j1939_can_rx_register(priv);
>  	if (ret < 0)
>  		goto out_priv_put;
>  
> +	mutex_unlock(&j1939_netdev_lock);
>  	return priv;
>  
>   out_priv_put:
>  	j1939_priv_set(ndev, NULL);
> +	mutex_unlock(&j1939_netdev_lock);
> +
>  	dev_put(ndev);
>  	kfree(priv);
>  
> -- 
> 2.34.1
> 
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-06-02 12:35   ` Oleksij Rempel
@ 2023-06-02 16:06     ` Fedor Pchelkin
  0 siblings, 0 replies; 11+ messages in thread
From: Fedor Pchelkin @ 2023-06-02 16:06 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Oleksij Rempel, Kurt Van Dijck, lvc-project,
	Robin van der Gracht, linux-can, Eric Dumazet, netdev,
	Marc Kleine-Budde, Alexey Khoroshilov, kernel, Oliver Hartkopp,
	Jakub Kicinski, Paolo Abeni, David S. Miller, linux-kernel

On Fri, Jun 02, 2023 at 02:35:19PM +0200, Oleksij Rempel wrote:
> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
> 
> Thank you!
> 

Great!

Thanks for testing the patches!
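
For anyone following along, the ordering that 2/2 enforces can be
sketched in userspace roughly as below: the new priv is published and
registered under one lock, so a second caller can never pick up a priv
whose registration is about to fail. All demo_* names are hypothetical
stand-ins using pthreads, and demo_register_fails is only a knob to
simulate a failing j1939_can_rx_register(); this is a simplified model,
not the kernel code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct demo_priv {
	int refs;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_priv *demo_slot;	/* models the per-ndev priv pointer */
static int demo_register_fails;		/* 1 simulates a failing rx register */

/* Models j1939_can_rx_register(): may fail. */
static int demo_rx_register(struct demo_priv *priv)
{
	(void)priv;
	return demo_register_fails ? -1 : 0;
}

/* Models the patched j1939_netdev_start(). */
static struct demo_priv *demo_start(void)
{
	struct demo_priv *priv, *priv_new;

	pthread_mutex_lock(&demo_lock);
	priv = demo_slot;
	if (priv) {
		priv->refs++;
		pthread_mutex_unlock(&demo_lock);
		return priv;
	}
	pthread_mutex_unlock(&demo_lock);

	priv = calloc(1, sizeof(*priv));
	if (!priv)
		return NULL;
	priv->refs = 1;

	pthread_mutex_lock(&demo_lock);
	priv_new = demo_slot;
	if (priv_new) {
		/* Someone was faster than us, use their priv. */
		priv_new->refs++;
		pthread_mutex_unlock(&demo_lock);
		free(priv);
		return priv_new;
	}
	demo_slot = priv;

	/* Still under demo_lock: nobody can observe priv before this returns. */
	if (demo_rx_register(priv) < 0) {
		demo_slot = NULL;
		pthread_mutex_unlock(&demo_lock);
		free(priv);
		return NULL;
	}
	pthread_mutex_unlock(&demo_lock);
	return priv;
}
```

With demo_register_fails set, demo_start() returns NULL and leaves
demo_slot empty; once it is cleared, the first call creates the priv
and later calls only take another reference on it.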

* Re: [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
  2023-05-26 17:19 [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
  2023-05-26 17:19 ` [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex Fedor Pchelkin
  2023-05-26 17:19 ` [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
@ 2023-06-05  6:37 ` Marc Kleine-Budde
  2 siblings, 0 replies; 11+ messages in thread
From: Marc Kleine-Budde @ 2023-06-05  6:37 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Oleksij Rempel, kernel, Robin van der Gracht, Oliver Hartkopp,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kurt Van Dijck, linux-can, netdev, linux-kernel,
	Alexey Khoroshilov, lvc-project

On 26.05.2023 20:19:08, Fedor Pchelkin wrote:
> The patch series fixes a possible racy use-after-free scenario described
> in 2/2: if j1939_can_rx_register() fails then the concurrent thread may
> have already read the invalid priv structure.
>
> The 1/2 makes j1939_netdev_lock a mutex so that access to
> j1939_can_rx_register() can be serialized without changing GFP_KERNEL to
> GFP_ATOMIC inside can_rx_register(). This seems to be safe.
>
> Note that the patch series has been tested only via Syzkaller and not with
> a real device.

Applied to linux-can + adding stable on Cc.

Thanks,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Thread overview: 11+ messages
2023-05-26 17:19 [PATCH 0/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
2023-05-26 17:19 ` [PATCH 1/2] can: j1939: change j1939_netdev_lock type to mutex Fedor Pchelkin
2023-06-02 12:33   ` Oleksij Rempel
2023-05-26 17:19 ` [PATCH 2/2] can: j1939: avoid possible use-after-free when j1939_can_rx_register fails Fedor Pchelkin
2023-05-26 18:15   ` Oleksij Rempel
2023-05-26 18:50     ` Fedor Pchelkin
2023-05-27  5:57       ` Oleksij Rempel
2023-05-27 10:05         ` Fedor Pchelkin
2023-06-02 12:35   ` Oleksij Rempel
2023-06-02 16:06     ` Fedor Pchelkin
2023-06-05  6:37 ` [PATCH 0/2] " Marc Kleine-Budde