linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
@ 2017-11-29 10:48 Tommi Rantala
  2017-11-30 10:57 ` Ying Xue
  2017-12-01 20:14 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Tommi Rantala @ 2017-11-29 10:48 UTC (permalink / raw)
  To: Jon Maloy, Ying Xue, David S. Miller, netdev, tipc-discussion,
	linux-kernel
  Cc: Tommi Rantala

Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
checked that the bearer is not up, and calling tipc_rcv() with a bearer
that is not up leads to a TIPC div-by-zero crash in
tipc_node_calculate_timer(). The crash is rare in practice, but can
happen like this:

  We're enabling a bearer, but it's not yet up and fully initialized.
  At the same time we receive a discovery packet, and in tipc_udp_recv()
  we end up calling tipc_rcv() with the not-yet-initialized bearer,
  causing later the div-by-zero crash in tipc_node_calculate_timer().

Jon Maloy explains the impact of removing the second tipc_rcv() call:
  "link setup in the worst case will be delayed until the next arriving
   discovery messages, 1 sec later, and this is an acceptable delay."

As the tipc_rcv() call is removed, just leave the function via the
rcu_out label, so that we will kfree_skb().

[   12.590450] Own node address <1.1.1>, network identity 1
[   12.668088] divide error: 0000 [#1] SMP
[   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
[   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
[   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
[   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
[   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
[   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
[   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
[   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
[   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
[   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
[   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
[   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   12.712627] Call Trace:
[   12.713390]  <IRQ>
[   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
[   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
[   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
[   12.717396]  ? minmax_running_min+0x2f/0x60
[   12.718248]  ? dst_alloc+0x4c/0xa0
[   12.718964]  ? tcp_ack+0xaf1/0x10b0
[   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
[   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
[   12.721459]  ? dst_alloc+0x4c/0xa0
[   12.722130]  udp_queue_rcv_skb+0x264/0x490
[   12.722924]  __udp4_lib_rcv+0x21e/0x990
[   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
[   12.724442]  ? tcp_v4_rcv+0x958/0xa40
[   12.725039]  udp_rcv+0x1a/0x20
[   12.725587]  ip_local_deliver_finish+0x97/0x1d0
[   12.726323]  ip_local_deliver+0xaf/0xc0
[   12.726959]  ? ip_route_input_noref+0x19/0x20
[   12.727689]  ip_rcv_finish+0xdd/0x3b0
[   12.728307]  ip_rcv+0x2ac/0x360
[   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
[   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
[   12.730274]  __netif_receive_skb+0x1d/0x60
[   12.730953]  ? __netif_receive_skb+0x1d/0x60
[   12.731637]  netif_receive_skb_internal+0x37/0xd0
[   12.732371]  napi_gro_receive+0xc7/0xf0
[   12.732920]  receive_buf+0x3c3/0xd40
[   12.733441]  virtnet_poll+0xb1/0x250
[   12.733944]  net_rx_action+0x23e/0x370
[   12.734476]  __do_softirq+0xc5/0x2f8
[   12.734922]  irq_exit+0xfa/0x100
[   12.735315]  do_IRQ+0x4f/0xd0
[   12.735680]  common_interrupt+0xa2/0xa2
[   12.736126]  </IRQ>
[   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
[   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
[   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
[   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
[   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
[   12.741831]  default_idle+0x2a/0x100
[   12.742323]  arch_cpu_idle+0xf/0x20
[   12.742796]  default_idle_call+0x28/0x40
[   12.743312]  do_idle+0x179/0x1f0
[   12.743761]  cpu_startup_entry+0x1d/0x20
[   12.744291]  start_secondary+0x112/0x120
[   12.744816]  secondary_startup_64+0xa5/0xa5
[   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
[   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
[   12.748555] ---[ end trace 1399ab83390650fd ]---
[   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
[   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   12.751215] Rebooting in 60 seconds..

Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
---

v2: Resorted to a minimal fix, as per feedback from David M.

 net/tipc/udp_media.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index ecca64fc6a6f..3deabcab4882 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -371,10 +371,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
 			goto rcu_out;
 	}
 
-	tipc_rcv(sock_net(sk), skb, b);
-	rcu_read_unlock();
-	return 0;
-
 rcu_out:
 	rcu_read_unlock();
 out:
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-11-29 10:48 [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() Tommi Rantala
@ 2017-11-30 10:57 ` Ying Xue
  2017-11-30 12:32   ` Tommi Rantala
  2017-12-01 20:14 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Ying Xue @ 2017-11-30 10:57 UTC (permalink / raw)
  To: Tommi Rantala, Jon Maloy, David S. Miller, netdev,
	tipc-discussion, linux-kernel

On 11/29/2017 06:48 PM, Tommi Rantala wrote:
> Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
> checked that the bearer is not up, and calling tipc_rcv() with a bearer
> that is not up leads to a TIPC div-by-zero crash in
> tipc_node_calculate_timer(). The crash is rare in practice, but can
> happen like this:

In my opinion, the real root cause of the issue is because we too early
set a not-yet-initialized bearer instance to ub->bearer through
rcu_assign_pointer(ub->bearer, b) in tipc_udp_enable(). Instead if we
assign the bearer pointer at the end of tipc_udp_enable() where the
bearer has been completed the initialization, the issue would be avoided.

Thanks,
Ying

> 
>   We're enabling a bearer, but it's not yet up and fully initialized.
>   At the same time we receive a discovery packet, and in tipc_udp_recv()
>   we end up calling tipc_rcv() with the not-yet-initialized bearer,
>   causing later the div-by-zero crash in tipc_node_calculate_timer().
> 
> Jon Maloy explains the impact of removing the second tipc_rcv() call:
>   "link setup in the worst case will be delayed until the next arriving
>    discovery messages, 1 sec later, and this is an acceptable delay."
> 
> As the tipc_rcv() call is removed, just leave the function via the
> rcu_out label, so that we will kfree_skb().
> 
> [   12.590450] Own node address <1.1.1>, network identity 1
> [   12.668088] divide error: 0000 [#1] SMP
> [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
> [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
> [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
> [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
> [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
> [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
> [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
> [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
> [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
> [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
> [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
> [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
> [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   12.712627] Call Trace:
> [   12.713390]  <IRQ>
> [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
> [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
> [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
> [   12.717396]  ? minmax_running_min+0x2f/0x60
> [   12.718248]  ? dst_alloc+0x4c/0xa0
> [   12.718964]  ? tcp_ack+0xaf1/0x10b0
> [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
> [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
> [   12.721459]  ? dst_alloc+0x4c/0xa0
> [   12.722130]  udp_queue_rcv_skb+0x264/0x490
> [   12.722924]  __udp4_lib_rcv+0x21e/0x990
> [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
> [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
> [   12.725039]  udp_rcv+0x1a/0x20
> [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
> [   12.726323]  ip_local_deliver+0xaf/0xc0
> [   12.726959]  ? ip_route_input_noref+0x19/0x20
> [   12.727689]  ip_rcv_finish+0xdd/0x3b0
> [   12.728307]  ip_rcv+0x2ac/0x360
> [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
> [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
> [   12.730274]  __netif_receive_skb+0x1d/0x60
> [   12.730953]  ? __netif_receive_skb+0x1d/0x60
> [   12.731637]  netif_receive_skb_internal+0x37/0xd0
> [   12.732371]  napi_gro_receive+0xc7/0xf0
> [   12.732920]  receive_buf+0x3c3/0xd40
> [   12.733441]  virtnet_poll+0xb1/0x250
> [   12.733944]  net_rx_action+0x23e/0x370
> [   12.734476]  __do_softirq+0xc5/0x2f8
> [   12.734922]  irq_exit+0xfa/0x100
> [   12.735315]  do_IRQ+0x4f/0xd0
> [   12.735680]  common_interrupt+0xa2/0xa2
> [   12.736126]  </IRQ>
> [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
> [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
> [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
> [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
> [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
> [   12.741831]  default_idle+0x2a/0x100
> [   12.742323]  arch_cpu_idle+0xf/0x20
> [   12.742796]  default_idle_call+0x28/0x40
> [   12.743312]  do_idle+0x179/0x1f0
> [   12.743761]  cpu_startup_entry+0x1d/0x20
> [   12.744291]  start_secondary+0x112/0x120
> [   12.744816]  secondary_startup_64+0xa5/0xa5
> [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
> 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
> 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
> [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
> [   12.748555] ---[ end trace 1399ab83390650fd ]---
> [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
> [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [   12.751215] Rebooting in 60 seconds..
> 
> Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> ---
> 
> v2: Resorted to a minimal fix, as per feedback from David M.
> 
>  net/tipc/udp_media.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index ecca64fc6a6f..3deabcab4882 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -371,10 +371,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
>  			goto rcu_out;
>  	}
>  
> -	tipc_rcv(sock_net(sk), skb, b);
> -	rcu_read_unlock();
> -	return 0;
> -
>  rcu_out:
>  	rcu_read_unlock();
>  out:
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-11-30 10:57 ` Ying Xue
@ 2017-11-30 12:32   ` Tommi Rantala
  2017-12-01 13:18     ` Ying Xue
  0 siblings, 1 reply; 7+ messages in thread
From: Tommi Rantala @ 2017-11-30 12:32 UTC (permalink / raw)
  To: Ying Xue, Jon Maloy, David S. Miller, netdev, tipc-discussion,
	linux-kernel

On 30.11.2017 12:57, Ying Xue wrote:
> On 11/29/2017 06:48 PM, Tommi Rantala wrote:
>> Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
>> checked that the bearer is not up, and calling tipc_rcv() with a bearer
>> that is not up leads to a TIPC div-by-zero crash in
>> tipc_node_calculate_timer(). The crash is rare in practice, but can
>> happen like this:
> 
> In my opinion, the real root cause of the issue is because we too early
> set a not-yet-initialized bearer instance to ub->bearer through
> rcu_assign_pointer(ub->bearer, b) in tipc_udp_enable(). Instead if we
> assign the bearer pointer at the end of tipc_udp_enable() where the
> bearer has been completed the initialization, the issue would be avoided.

Hi, sorry, I fail to see how that helps.

bearer->tolerance is only initialized in tipc_enable_bearer() after 
calling m->enable_media() ie. tipc_udp_enable().

So even if we do "rcu_assign_pointer(ub->bearer, b)" later in 
tipc_udp_enable(), bearer->tolerance will still be uninitialized, and 
the crash can happen.

BR,
Tommi

> Thanks,
> Ying
> 
>>
>>    We're enabling a bearer, but it's not yet up and fully initialized.
>>    At the same time we receive a discovery packet, and in tipc_udp_recv()
>>    we end up calling tipc_rcv() with the not-yet-initialized bearer,
>>    causing later the div-by-zero crash in tipc_node_calculate_timer().
>>
>> Jon Maloy explains the impact of removing the second tipc_rcv() call:
>>    "link setup in the worst case will be delayed until the next arriving
>>     discovery messages, 1 sec later, and this is an acceptable delay."
>>
>> As the tipc_rcv() call is removed, just leave the function via the
>> rcu_out label, so that we will kfree_skb().
>>
>> [   12.590450] Own node address <1.1.1>, network identity 1
>> [   12.668088] divide error: 0000 [#1] SMP
>> [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
>> [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
>> [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
>> [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
>> [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
>> [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
>> [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
>> [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
>> [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
>> [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
>> [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
>> [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
>> [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [   12.712627] Call Trace:
>> [   12.713390]  <IRQ>
>> [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
>> [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
>> [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
>> [   12.717396]  ? minmax_running_min+0x2f/0x60
>> [   12.718248]  ? dst_alloc+0x4c/0xa0
>> [   12.718964]  ? tcp_ack+0xaf1/0x10b0
>> [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
>> [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
>> [   12.721459]  ? dst_alloc+0x4c/0xa0
>> [   12.722130]  udp_queue_rcv_skb+0x264/0x490
>> [   12.722924]  __udp4_lib_rcv+0x21e/0x990
>> [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
>> [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
>> [   12.725039]  udp_rcv+0x1a/0x20
>> [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
>> [   12.726323]  ip_local_deliver+0xaf/0xc0
>> [   12.726959]  ? ip_route_input_noref+0x19/0x20
>> [   12.727689]  ip_rcv_finish+0xdd/0x3b0
>> [   12.728307]  ip_rcv+0x2ac/0x360
>> [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
>> [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
>> [   12.730274]  __netif_receive_skb+0x1d/0x60
>> [   12.730953]  ? __netif_receive_skb+0x1d/0x60
>> [   12.731637]  netif_receive_skb_internal+0x37/0xd0
>> [   12.732371]  napi_gro_receive+0xc7/0xf0
>> [   12.732920]  receive_buf+0x3c3/0xd40
>> [   12.733441]  virtnet_poll+0xb1/0x250
>> [   12.733944]  net_rx_action+0x23e/0x370
>> [   12.734476]  __do_softirq+0xc5/0x2f8
>> [   12.734922]  irq_exit+0xfa/0x100
>> [   12.735315]  do_IRQ+0x4f/0xd0
>> [   12.735680]  common_interrupt+0xa2/0xa2
>> [   12.736126]  </IRQ>
>> [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
>> [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
>> [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
>> [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
>> [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>> [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
>> [   12.741831]  default_idle+0x2a/0x100
>> [   12.742323]  arch_cpu_idle+0xf/0x20
>> [   12.742796]  default_idle_call+0x28/0x40
>> [   12.743312]  do_idle+0x179/0x1f0
>> [   12.743761]  cpu_startup_entry+0x1d/0x20
>> [   12.744291]  start_secondary+0x112/0x120
>> [   12.744816]  secondary_startup_64+0xa5/0xa5
>> [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
>> 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
>> 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
>> [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
>> [   12.748555] ---[ end trace 1399ab83390650fd ]---
>> [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
>> [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [   12.751215] Rebooting in 60 seconds..
>>
>> Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
>> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
>> Cc: Jon Maloy <jon.maloy@ericsson.com>
>> ---
>>
>> v2: Resorted to a minimal fix, as per feedback from David M.
>>
>>   net/tipc/udp_media.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>> index ecca64fc6a6f..3deabcab4882 100644
>> --- a/net/tipc/udp_media.c
>> +++ b/net/tipc/udp_media.c
>> @@ -371,10 +371,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
>>   			goto rcu_out;
>>   	}
>>   
>> -	tipc_rcv(sock_net(sk), skb, b);
>> -	rcu_read_unlock();
>> -	return 0;
>> -
>>   rcu_out:
>>   	rcu_read_unlock();
>>   out:
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-11-30 12:32   ` Tommi Rantala
@ 2017-12-01 13:18     ` Ying Xue
  2017-12-01 14:02       ` Tommi Rantala
  0 siblings, 1 reply; 7+ messages in thread
From: Ying Xue @ 2017-12-01 13:18 UTC (permalink / raw)
  To: Tommi Rantala, Jon Maloy, David S. Miller, netdev,
	tipc-discussion, linux-kernel

On 11/30/2017 08:32 PM, Tommi Rantala wrote:
>> In my opinion, the real root cause of the issue is because we too early
>> set a not-yet-initialized bearer instance to ub->bearer through
>> rcu_assign_pointer(ub->bearer, b) in tipc_udp_enable(). Instead if we
>> assign the bearer pointer at the end of tipc_udp_enable() where the
>> bearer has been completed the initialization, the issue would be avoided.
> Hi, sorry, I fail to see how that helps.
> 
> bearer->tolerance is only initialized in tipc_enable_bearer() after 
> calling m->enable_media() ie. tipc_udp_enable().
> 
> So even if we do "rcu_assign_pointer(ub->bearer, b)" later in 
> tipc_udp_enable(), bearer->tolerance will still be uninitialized, and 
> the crash can happen.

Sorry, I missed the point that b->tolerance is not uninitialized when we
assign bearer pointer to ub->bearer later.

But in my view the issue happened is because we enable media too early.
So it's better to change the code as belows:

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 47ec121..ec6f02a 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -320,19 +320,18 @@ static int tipc_enable_bearer(struct net *net,
const char *name,

        strcpy(b->name, name);
        b->media = m;
-       res = m->enable_media(net, b, attr);
-       if (res) {
-               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
-                       name, -res);
-               return -EINVAL;
-       }
-
        b->identity = bearer_id;
        b->tolerance = m->tolerance;
        b->window = m->window;
        b->domain = disc_domain;
        b->net_plane = bearer_id + 'A';
        b->priority = priority;
+       res = m->enable_media(net, b, attr);
+       if (res) {
+               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
+                       name, -res);
+               return -EINVAL;
+       }
        test_and_set_bit_lock(0, &b->up);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-12-01 13:18     ` Ying Xue
@ 2017-12-01 14:02       ` Tommi Rantala
  2017-12-01 14:20         ` Jon Maloy
  0 siblings, 1 reply; 7+ messages in thread
From: Tommi Rantala @ 2017-12-01 14:02 UTC (permalink / raw)
  To: Ying Xue, Jon Maloy, David S. Miller, netdev, tipc-discussion,
	linux-kernel

On 01.12.2017 15:18, Ying Xue wrote:
> On 11/30/2017 08:32 PM, Tommi Rantala wrote:
>>> In my opinion, the real root cause of the issue is because we too early
>>> set a not-yet-initialized bearer instance to ub->bearer through
>>> rcu_assign_pointer(ub->bearer, b) in tipc_udp_enable(). Instead if we
>>> assign the bearer pointer at the end of tipc_udp_enable() where the
>>> bearer has been completed the initialization, the issue would be avoided.
>> Hi, sorry, I fail to see how that helps.
>>
>> bearer->tolerance is only initialized in tipc_enable_bearer() after
>> calling m->enable_media() ie. tipc_udp_enable().
>>
>> So even if we do "rcu_assign_pointer(ub->bearer, b)" later in
>> tipc_udp_enable(), bearer->tolerance will still be uninitialized, and
>> the crash can happen.
> 
> Sorry, I missed the point that b->tolerance is not uninitialized when we
> assign bearer pointer to ub->bearer later.
> 
> But in my view the issue happened is because we enable media too early.
> So it's better to change the code as belows:

Thanks, looks good to me!

Tested in 4.4 (which does not have b->up), and this fixes the oops also 
there.

-Tommi


> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 47ec121..ec6f02a 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -320,19 +320,18 @@ static int tipc_enable_bearer(struct net *net,
> const char *name,
> 
>          strcpy(b->name, name);
>          b->media = m;
> -       res = m->enable_media(net, b, attr);
> -       if (res) {
> -               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
> -                       name, -res);
> -               return -EINVAL;
> -       }
> -
>          b->identity = bearer_id;
>          b->tolerance = m->tolerance;
>          b->window = m->window;
>          b->domain = disc_domain;
>          b->net_plane = bearer_id + 'A';
>          b->priority = priority;
> +       res = m->enable_media(net, b, attr);
> +       if (res) {
> +               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
> +                       name, -res);
> +               return -EINVAL;
> +       }
>          test_and_set_bit_lock(0, &b->up);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-12-01 14:02       ` Tommi Rantala
@ 2017-12-01 14:20         ` Jon Maloy
  0 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2017-12-01 14:20 UTC (permalink / raw)
  To: Tommi Rantala, Ying Xue, David S. Miller, netdev,
	tipc-discussion, linux-kernel

Looks ok to me too. But I suggest we let David apply the pending patch as is to net, as that code is still wrong, and then post this improvement to net-next later.

///jon


> -----Original Message-----
> From: Tommi Rantala [mailto:tommi.t.rantala@nokia.com]
> Sent: Friday, December 01, 2017 09:03
> To: Ying Xue <ying.xue@windriver.com>; Jon Maloy
> <jon.maloy@ericsson.com>; David S. Miller <davem@davemloft.net>;
> netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in
> tipc_udp_recv()
> 
> On 01.12.2017 15:18, Ying Xue wrote:
> > On 11/30/2017 08:32 PM, Tommi Rantala wrote:
> >>> In my opinion, the real root cause of the issue is because we too
> >>> early set a not-yet-initialized bearer instance to ub->bearer
> >>> through rcu_assign_pointer(ub->bearer, b) in tipc_udp_enable().
> >>> Instead if we assign the bearer pointer at the end of
> >>> tipc_udp_enable() where the bearer has been completed the
> initialization, the issue would be avoided.
> >> Hi, sorry, I fail to see how that helps.
> >>
> >> bearer->tolerance is only initialized in tipc_enable_bearer() after
> >> calling m->enable_media() ie. tipc_udp_enable().
> >>
> >> So even if we do "rcu_assign_pointer(ub->bearer, b)" later in
> >> tipc_udp_enable(), bearer->tolerance will still be uninitialized, and
> >> the crash can happen.
> >
> > Sorry, I missed the point that b->tolerance is not uninitialized when
> > we assign bearer pointer to ub->bearer later.
> >
> > But in my view the issue happened is because we enable media too early.
> > So it's better to change the code as belows:
> 
> Thanks, looks good to me!
> 
> Tested in 4.4 (which does not have b->up), and this fixes the oops also there.
> 
> -Tommi
> 
> 
> > diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index
> > 47ec121..ec6f02a 100644
> > --- a/net/tipc/bearer.c
> > +++ b/net/tipc/bearer.c
> > @@ -320,19 +320,18 @@ static int tipc_enable_bearer(struct net *net,
> > const char *name,
> >
> >          strcpy(b->name, name);
> >          b->media = m;
> > -       res = m->enable_media(net, b, attr);
> > -       if (res) {
> > -               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
> > -                       name, -res);
> > -               return -EINVAL;
> > -       }
> > -
> >          b->identity = bearer_id;
> >          b->tolerance = m->tolerance;
> >          b->window = m->window;
> >          b->domain = disc_domain;
> >          b->net_plane = bearer_id + 'A';
> >          b->priority = priority;
> > +       res = m->enable_media(net, b, attr);
> > +       if (res) {
> > +               pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
> > +                       name, -res);
> > +               return -EINVAL;
> > +       }
> >          test_and_set_bit_lock(0, &b->up);
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  2017-11-29 10:48 [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() Tommi Rantala
  2017-11-30 10:57 ` Ying Xue
@ 2017-12-01 20:14 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2017-12-01 20:14 UTC (permalink / raw)
  To: tommi.t.rantala
  Cc: jon.maloy, ying.xue, netdev, tipc-discussion, linux-kernel

From: Tommi Rantala <tommi.t.rantala@nokia.com>
Date: Wed, 29 Nov 2017 12:48:42 +0200

> Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
> checked that the bearer is not up, and calling tipc_rcv() with a bearer
> that is not up leads to a TIPC div-by-zero crash in
> tipc_node_calculate_timer(). The crash is rare in practice, but can
> happen like this:
> 
>   We're enabling a bearer, but it's not yet up and fully initialized.
>   At the same time we receive a discovery packet, and in tipc_udp_recv()
>   we end up calling tipc_rcv() with the not-yet-initialized bearer,
>   causing later the div-by-zero crash in tipc_node_calculate_timer().
> 
> Jon Maloy explains the impact of removing the second tipc_rcv() call:
>   "link setup in the worst case will be delayed until the next arriving
>    discovery messages, 1 sec later, and this is an acceptable delay."
> 
> As the tipc_rcv() call is removed, just leave the function via the
> rcu_out label, so that we will kfree_skb().
 ...
> Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
> Cc: Jon Maloy <jon.maloy@ericsson.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-12-01 20:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-29 10:48 [PATCH net v2] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() Tommi Rantala
2017-11-30 10:57 ` Ying Xue
2017-11-30 12:32   ` Tommi Rantala
2017-12-01 13:18     ` Ying Xue
2017-12-01 14:02       ` Tommi Rantala
2017-12-01 14:20         ` Jon Maloy
2017-12-01 20:14 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).