* RCU callback crashes
@ 2017-12-20  1:59 Jakub Kicinski
  2017-12-20  6:11 ` Jiri Pirko
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-20  1:59 UTC (permalink / raw)
  To: netdev, Jiri Pirko, Cong Wang

Hi!

If I run the netdevsim test long enough on a kernel with no debugging 
I get this:

[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
[ 1400.458005] IP: 0x46474e552
[ 1400.461231] PGD 0 P4D 0 
[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 1400.535150] RIP: 0010:0x46474e552
[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
[ 1400.585197] FS:  0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
[ 1400.594349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1400.625048] Call Trace:
[ 1400.627868]  <IRQ>
[ 1400.630207]  ? rcu_process_callbacks+0x1a0/0x4d0
[ 1400.635458]  ? __do_softirq+0xd1/0x30a
[ 1400.639739]  ? irq_exit+0xae/0xb0
[ 1400.643532]  ? smp_apic_timer_interrupt+0x60/0x140
[ 1400.648977]  ? apic_timer_interrupt+0x8c/0xa0
[ 1400.653934]  </IRQ>
[ 1400.656370]  ? cpuidle_enter_state+0xb0/0x2f0
[ 1400.661328]  ? cpuidle_enter_state+0x8d/0x2f0
[ 1400.666287]  ? do_idle+0x17b/0x1d0
[ 1400.670167]  ? cpu_startup_entry+0x5f/0x70
[ 1400.674836]  ? start_secondary+0x169/0x190
[ 1400.679504]  ? secondary_startup_64+0xa5/0xb0
[ 1400.684466] Code:  Bad RIP value.
[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
[ 1400.693703] CR2: 000000046474e552
[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Unfortunately reproducing the crash on an instrumented kernel seems to
be difficult..

I managed to gather this:

[   26.157415] ------------[ cut here ]------------
[   26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint:           (null)
[   26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
[   26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[   26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
[   26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[   26.248453] RIP: 0010:debug_print_object+0x64/0x80
[   26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
[   26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
[   26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
[   26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
[   26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
[   26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
[   26.300194] FS:  00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
[   26.309346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
[   26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.331994] Call Trace:
[   26.331998]  debug_check_no_obj_freed+0x1e6/0x220
[   26.332020]  ? qdisc_graft+0x14f/0x450
[   26.332025]  kfree+0x14d/0x1b0
[   26.332027]  qdisc_graft+0x14f/0x450
[   26.332029]  tc_get_qdisc+0x12f/0x200
[   26.332035]  rtnetlink_rcv_msg+0x122/0x310
[   26.332039]  ? __skb_try_recv_datagram+0xef/0x150
[   26.332040]  ? __kmalloc_node_track_caller+0x205/0x2b0
[   26.332042]  ? rtnl_calcit.isra.12+0x100/0x100
[   26.332044]  netlink_rcv_skb+0x8d/0x130
[   26.332046]  netlink_unicast+0x16a/0x210
[   26.332048]  netlink_sendmsg+0x32a/0x370
[   26.332054]  sock_sendmsg+0x2d/0x40
[   26.332056]  ___sys_sendmsg+0x298/0x2e0
[   26.332061]  ? mem_cgroup_commit_charge+0x7a/0x540
[   26.332062]  ? mem_cgroup_try_charge+0x8e/0x1d0
[   26.332066]  ? __handle_mm_fault+0x3a1/0x1190
[   26.332068]  ? __sys_sendmsg+0x41/0x70
[   26.332069]  __sys_sendmsg+0x41/0x70
[   26.332074]  entry_SYSCALL_64_fastpath+0x1e/0x81
[   26.332076] RIP: 0033:0x7f64d3b53450
[   26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
[   26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
[   26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
[   26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
[   26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
[   26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b 
[   26.332097] ---[ end trace bd33b199ae76ad43 ]---

* Re: RCU callback crashes
  2017-12-20  1:59 RCU callback crashes Jakub Kicinski
@ 2017-12-20  6:11 ` Jiri Pirko
  2017-12-20  6:22   ` Jakub Kicinski
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Pirko @ 2017-12-20  6:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Cong Wang

Wed, Dec 20, 2017 at 02:59:21AM CET, kubakici@wp.pl wrote:
>Hi!
>
>If I run the netdevsim test long enough on a kernel with no debugging 

Just running tools/testing/selftests/bpf/test_offload.py?

>I get this:

Could you try to run it with kasan on?
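
For example, something like this in the config (KASAN_INLINE is
optional, it just makes the instrumented kernel faster):

  CONFIG_KASAN=y
  CONFIG_KASAN_INLINE=y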

>
>[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
>[ 1400.458005] IP: 0x46474e552
>[ 1400.461231] PGD 0 P4D 0 
>[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
>[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
>[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
>[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>[ 1400.535150] RIP: 0010:0x46474e552
>[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
>[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
>[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
>[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
>[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
>[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
>[ 1400.585197] FS:  0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
>[ 1400.594349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
>[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[ 1400.625048] Call Trace:
>[ 1400.627868]  <IRQ>
>[ 1400.630207]  ? rcu_process_callbacks+0x1a0/0x4d0
>[ 1400.635458]  ? __do_softirq+0xd1/0x30a
>[ 1400.639739]  ? irq_exit+0xae/0xb0
>[ 1400.643532]  ? smp_apic_timer_interrupt+0x60/0x140
>[ 1400.648977]  ? apic_timer_interrupt+0x8c/0xa0
>[ 1400.653934]  </IRQ>
>[ 1400.656370]  ? cpuidle_enter_state+0xb0/0x2f0
>[ 1400.661328]  ? cpuidle_enter_state+0x8d/0x2f0
>[ 1400.666287]  ? do_idle+0x17b/0x1d0
>[ 1400.670167]  ? cpu_startup_entry+0x5f/0x70
>[ 1400.674836]  ? start_secondary+0x169/0x190
>[ 1400.679504]  ? secondary_startup_64+0xa5/0xb0
>[ 1400.684466] Code:  Bad RIP value.
>[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
>[ 1400.693703] CR2: 000000046474e552
>[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
>[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
>[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
>Unfortunately reproducing the crash on an instrumented kernel seems to
>be difficult..
>
>I managed to gather this:
>
>[   26.157415] ------------[ cut here ]------------
>[   26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint:           (null)
>[   26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
>[   26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
>[   26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
>[   26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>[   26.248453] RIP: 0010:debug_print_object+0x64/0x80
>[   26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
>[   26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
>[   26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
>[   26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
>[   26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
>[   26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
>[   26.300194] FS:  00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
>[   26.309346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[   26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
>[   26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[   26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[   26.331994] Call Trace:
>[   26.331998]  debug_check_no_obj_freed+0x1e6/0x220
>[   26.332020]  ? qdisc_graft+0x14f/0x450
>[   26.332025]  kfree+0x14d/0x1b0
>[   26.332027]  qdisc_graft+0x14f/0x450
>[   26.332029]  tc_get_qdisc+0x12f/0x200
>[   26.332035]  rtnetlink_rcv_msg+0x122/0x310
>[   26.332039]  ? __skb_try_recv_datagram+0xef/0x150
>[   26.332040]  ? __kmalloc_node_track_caller+0x205/0x2b0
>[   26.332042]  ? rtnl_calcit.isra.12+0x100/0x100
>[   26.332044]  netlink_rcv_skb+0x8d/0x130
>[   26.332046]  netlink_unicast+0x16a/0x210
>[   26.332048]  netlink_sendmsg+0x32a/0x370
>[   26.332054]  sock_sendmsg+0x2d/0x40
>[   26.332056]  ___sys_sendmsg+0x298/0x2e0
>[   26.332061]  ? mem_cgroup_commit_charge+0x7a/0x540
>[   26.332062]  ? mem_cgroup_try_charge+0x8e/0x1d0
>[   26.332066]  ? __handle_mm_fault+0x3a1/0x1190
>[   26.332068]  ? __sys_sendmsg+0x41/0x70
>[   26.332069]  __sys_sendmsg+0x41/0x70
>[   26.332074]  entry_SYSCALL_64_fastpath+0x1e/0x81
>[   26.332076] RIP: 0033:0x7f64d3b53450
>[   26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>[   26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
>[   26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
>[   26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
>[   26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
>[   26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
>[   26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b 
>[   26.332097] ---[ end trace bd33b199ae76ad43 ]---

* Re: RCU callback crashes
  2017-12-20  6:11 ` Jiri Pirko
@ 2017-12-20  6:22   ` Jakub Kicinski
  2017-12-20  6:34     ` Jakub Kicinski
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-20  6:22 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, Cong Wang

On Wed, 20 Dec 2017 07:11:18 +0100, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 02:59:21AM CET, kubakici@wp.pl wrote:
> >Hi!
> >
> >If I run the netdevsim test long enough on a kernel with no debugging   
> 
> Just running tools/testing/selftests/bpf/test_offload.py?

Yes, like this:

while ./linux/tools/testing/selftests/bpf/test_offload.py --log /tmp/log; do echo; done

It usually crashes after ~10 minutes on my machine.

> >I get this:  
> 
> Could you try to run it with kasan on?

I didn't manage to reproduce it with KASAN on so far :(  Even enabling
object debugging to get the second splat in my email (which is more
useful) actually makes the crash go away, I only see the warning...

* Re: RCU callback crashes
  2017-12-20  6:22   ` Jakub Kicinski
@ 2017-12-20  6:34     ` Jakub Kicinski
  2017-12-20 18:04       ` John Fastabend
  2017-12-20 18:17       ` Cong Wang
  0 siblings, 2 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-20  6:34 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, Cong Wang

On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
> > >I get this:    
> > 
> > Could you try to run it with kasan on?  
> 
> I didn't manage to reproduce it with KASAN on so far :(  Even enabling
> object debugging to get the second splat in my email (which is more
> useful) actually makes the crash go away, I only see the warning...

Ah, no object debug but KASAN on produces this:

[   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
[   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
[   39.283524] 
[   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
[   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[   39.303969] Call Trace:
[   39.306769]  <IRQ>
[   39.309088]  dump_stack+0xa6/0x118
[   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
[   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
[   39.323030]  print_address_description+0x6a/0x270
[   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
[   39.333510]  kasan_report+0x23f/0x350
[   39.337672]  cpu_needs_another_gp+0x246/0x2b0
...
[   39.383026]  rcu_process_callbacks+0x1a0/0x620
...
[   39.426713]  __do_softirq+0x17f/0x4de
...
[   39.463841]  irq_exit+0xe1/0xf0
[   39.467437]  smp_apic_timer_interrupt+0xd9/0x290
[   39.472685]  ? smp_call_function_single_interrupt+0x230/0x230
[   39.479195]  ? smp_reschedule_interrupt+0x240/0x240
[   39.484736]  apic_timer_interrupt+0x8c/0xa0
[   39.489497]  </IRQ>
[   39.491929] RIP: 0010:cpuidle_enter_state+0x12a/0x510
[   39.497660] RSP: 0018:ffff88086bf9fd08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[   39.506228] RAX: 0000000000000000 RBX: ffffe8ffffb060e0 RCX: ffffffff921329f5
[   39.514291] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88086f3246e8
[   39.522354] RBP: 1ffff1010d7f3fa6 R08: fffffbfff2742768 R09: fffffbfff2742768
[   39.530418] R10: ffff88086bf9fcc8 R11: fffffbfff2742767 R12: 0000000924148b4b
[   39.538480] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffff9383eb80
[   39.546545]  ? sched_idle_set_state+0x25/0x30
[   39.551502]  ? cpuidle_enter_state+0x106/0x510
[   39.556556]  ? cpuidle_enter_s2idle+0x130/0x130
[   39.561706]  ? rcu_eqs_enter_common.constprop.62+0xd1/0x1e0
[   39.568037]  ? rcu_gp_init+0xf70/0xf70
[   39.572331]  ? sched_set_stop_task+0x160/0x160
[   39.577384]  do_idle+0x1af/0x200
[   39.581076]  cpu_startup_entry+0xd2/0xe0
[   39.585545]  ? cpu_in_idle+0x20/0x20
[   39.589626]  ? _raw_spin_trylock+0xe0/0xe0
[   39.594292]  ? memcpy+0x34/0x50
[   39.597890]  start_secondary+0x271/0x2b0
[   39.602361]  ? set_cpu_sibling_map+0x840/0x840
[   39.607416]  secondary_startup_64+0xa5/0xb0
[   39.612180] 
[   39.613929] Allocated by task 1358:
[   39.617914]  __kmalloc_node+0x183/0x2c0
[   39.622290]  qdisc_alloc+0xbd/0x3f0
[   39.626274]  qdisc_create+0xd8/0x720
[   39.630355]  tc_modify_qdisc+0x657/0x910
[   39.634826]  rtnetlink_rcv_msg+0x37c/0x7e0
[   39.639491]  netlink_rcv_skb+0x122/0x230
[   39.643960]  netlink_unicast+0x2ae/0x360
[   39.648443]  netlink_sendmsg+0x5d5/0x620
[   39.652915]  sock_sendmsg+0x64/0x80
[   39.656900]  ___sys_sendmsg+0x4a8/0x500
[   39.661272]  __sys_sendmsg+0xa9/0x140
[   39.665450]  entry_SYSCALL_64_fastpath+0x1e/0x81
[   39.670695] 
[   39.672441] Freed by task 1370:
[   39.676052]  kfree+0x8d/0x1c0
[   39.679454]  qdisc_graft+0x208/0x670
[   39.683535]  tc_get_qdisc+0x229/0x350
[   39.687713]  rtnetlink_rcv_msg+0x37c/0x7e0
[   39.692411]  netlink_rcv_skb+0x122/0x230
[   39.696881]  netlink_unicast+0x2ae/0x360
[   39.701350]  netlink_sendmsg+0x5d5/0x620
[   39.705819]  sock_sendmsg+0x64/0x80
[   39.709801]  ___sys_sendmsg+0x4a8/0x500
[   39.714172]  __sys_sendmsg+0xa9/0x140
[   39.718351]  entry_SYSCALL_64_fastpath+0x1e/0x81
[   39.723597] 
[   39.725347] The buggy address belongs to the object at ffff8803aa64ef80
[   39.725347]  which belongs to the cache kmalloc-512 of size 512
[   39.739453] The buggy address is located 440 bytes inside of
[   39.739453]  512-byte region [ffff8803aa64ef80, ffff8803aa64f180)
[   39.752684] The buggy address belongs to the page:
[   39.758127] page:0000000042b3124b count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   39.769222] flags: 0x2ffff0000008100(slab|head)
[   39.774365] raw: 02ffff0000008100 0000000000000000 0000000000000000 0000000180190019
[   39.783129] raw: dead000000000100 dead000000000200 ffff8803afc0ed80 0000000000000000
[   39.791986] page dumped because: kasan: bad access detected
[   39.798300] 
[   39.800063] Memory state around the buggy address:
[   39.805503]  ffff8803aa64f000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   39.813684]  ffff8803aa64f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   39.821866] >ffff8803aa64f100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   39.830045]                                         ^
[   39.835778]  ffff8803aa64f180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   39.843958]  ffff8803aa64f200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

* Re: RCU callback crashes
  2017-12-20  6:34     ` Jakub Kicinski
@ 2017-12-20 18:04       ` John Fastabend
  2017-12-20 20:17         ` Jakub Kicinski
  2017-12-20 18:17       ` Cong Wang
  1 sibling, 1 reply; 28+ messages in thread
From: John Fastabend @ 2017-12-20 18:04 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: netdev, Cong Wang

On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
> On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
>>>> I get this:    
>>>
>>> Could you try to run it with kasan on?  
>>
>> I didn't manage to reproduce it with KASAN on so far :(  Even enabling
>> object debugging to get the second splat in my email (which is more
>> useful) actually makes the crash go away, I only see the warning...
> 
> Ah, no object debug but KASAN on produces this:
> 

@Jakub, This is with mq and pfifo_fast I guess?

> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
> [   39.283524] 
> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [   39.303969] Call Trace:
> [   39.306769]  <IRQ>
> [   39.309088]  dump_stack+0xa6/0x118
> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
> [   39.323030]  print_address_description+0x6a/0x270
> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
> [   39.333510]  kasan_report+0x23f/0x350
> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
> ...
> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
> ...
> [   39.426713]  __do_softirq+0x17f/0x4de
> ...
> [   39.463841]  irq_exit+0xe1/0xf0
> [   39.467437]  smp_apic_timer_interrupt+0xd9/0x290
> [   39.472685]  ? smp_call_function_single_interrupt+0x230/0x230
> [   39.479195]  ? smp_reschedule_interrupt+0x240/0x240
> [   39.484736]  apic_timer_interrupt+0x8c/0xa0
> [   39.489497]  </IRQ>
> [   39.491929] RIP: 0010:cpuidle_enter_state+0x12a/0x510
> [   39.497660] RSP: 0018:ffff88086bf9fd08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
> [   39.506228] RAX: 0000000000000000 RBX: ffffe8ffffb060e0 RCX: ffffffff921329f5
> [   39.514291] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88086f3246e8
> [   39.522354] RBP: 1ffff1010d7f3fa6 R08: fffffbfff2742768 R09: fffffbfff2742768
> [   39.530418] R10: ffff88086bf9fcc8 R11: fffffbfff2742767 R12: 0000000924148b4b
> [   39.538480] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffff9383eb80
> [   39.546545]  ? sched_idle_set_state+0x25/0x30
> [   39.551502]  ? cpuidle_enter_state+0x106/0x510
> [   39.556556]  ? cpuidle_enter_s2idle+0x130/0x130
> [   39.561706]  ? rcu_eqs_enter_common.constprop.62+0xd1/0x1e0
> [   39.568037]  ? rcu_gp_init+0xf70/0xf70
> [   39.572331]  ? sched_set_stop_task+0x160/0x160
> [   39.577384]  do_idle+0x1af/0x200
> [   39.581076]  cpu_startup_entry+0xd2/0xe0
> [   39.585545]  ? cpu_in_idle+0x20/0x20
> [   39.589626]  ? _raw_spin_trylock+0xe0/0xe0
> [   39.594292]  ? memcpy+0x34/0x50
> [   39.597890]  start_secondary+0x271/0x2b0
> [   39.602361]  ? set_cpu_sibling_map+0x840/0x840
> [   39.607416]  secondary_startup_64+0xa5/0xb0
> [   39.612180] 
> [   39.613929] Allocated by task 1358:
> [   39.617914]  __kmalloc_node+0x183/0x2c0
> [   39.622290]  qdisc_alloc+0xbd/0x3f0
> [   39.626274]  qdisc_create+0xd8/0x720
> [   39.630355]  tc_modify_qdisc+0x657/0x910
> [   39.634826]  rtnetlink_rcv_msg+0x37c/0x7e0
> [   39.639491]  netlink_rcv_skb+0x122/0x230
> [   39.643960]  netlink_unicast+0x2ae/0x360
> [   39.648443]  netlink_sendmsg+0x5d5/0x620
> [   39.652915]  sock_sendmsg+0x64/0x80
> [   39.656900]  ___sys_sendmsg+0x4a8/0x500
> [   39.661272]  __sys_sendmsg+0xa9/0x140
> [   39.665450]  entry_SYSCALL_64_fastpath+0x1e/0x81
> [   39.670695] 
> [   39.672441] Freed by task 1370:
> [   39.676052]  kfree+0x8d/0x1c0
> [   39.679454]  qdisc_graft+0x208/0x670
> [   39.683535]  tc_get_qdisc+0x229/0x350
> [   39.687713]  rtnetlink_rcv_msg+0x37c/0x7e0
> [   39.692411]  netlink_rcv_skb+0x122/0x230
> [   39.696881]  netlink_unicast+0x2ae/0x360
> [   39.701350]  netlink_sendmsg+0x5d5/0x620
> [   39.705819]  sock_sendmsg+0x64/0x80
> [   39.709801]  ___sys_sendmsg+0x4a8/0x500
> [   39.714172]  __sys_sendmsg+0xa9/0x140
> [   39.718351]  entry_SYSCALL_64_fastpath+0x1e/0x81
> [   39.723597] 
> [   39.725347] The buggy address belongs to the object at ffff8803aa64ef80
> [   39.725347]  which belongs to the cache kmalloc-512 of size 512
> [   39.739453] The buggy address is located 440 bytes inside of
> [   39.739453]  512-byte region [ffff8803aa64ef80, ffff8803aa64f180)
> [   39.752684] The buggy address belongs to the page:
> [   39.758127] page:0000000042b3124b count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
> [   39.769222] flags: 0x2ffff0000008100(slab|head)
> [   39.774365] raw: 02ffff0000008100 0000000000000000 0000000000000000 0000000180190019
> [   39.783129] raw: dead000000000100 dead000000000200 ffff8803afc0ed80 0000000000000000
> [   39.791986] page dumped because: kasan: bad access detected
> [   39.798300] 
> [   39.800063] Memory state around the buggy address:
> [   39.805503]  ffff8803aa64f000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.813684]  ffff8803aa64f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.821866] >ffff8803aa64f100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.830045]                                         ^
> [   39.835778]  ffff8803aa64f180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   39.843958]  ffff8803aa64f200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 

So with lockless qdisc support we really do need to wait an
RCU grace period before freeing the qdisc. I missed this initially
in the lockless qdisc set, but we need to revert this:

commit 752fbcc33405d6f8249465e4b2c4e420091bb825
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue Sep 19 13:15:42 2017 -0700

    net_sched: no need to free qdisc in RCU callback
    
    gen estimator has been rewritten in commit 1c0d32fde5bd
    ("net_sched: gen_estimator: complete rewrite of rate estimators"),
    the caller no longer needs to wait for a grace period. So this
    patch gets rid of it.
    
    Cc: Jamal Hadi Salim <jhs@mojatatu.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
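
For reference, the revert would restore roughly this shape in
net/sched/sch_generic.c (an untested sketch from memory rather than the
exact pre-752fbcc3 code; it assumes struct Qdisc still carries the
rcu_head and padded members the old callback used):

static void qdisc_rcu_free(struct rcu_head *head)
{
	struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);

	/* qdisc_alloc() returns an aligned pointer inside a larger
	 * allocation; 'padded' is the offset back to its start.
	 */
	kfree((char *)qdisc - qdisc->padded);
}

and qdisc_destroy() would then end with, instead of freeing directly:

	/* Defer the free for one RCU grace period so softirq readers
	 * still walking this qdisc cannot touch freed memory.
	 */
	call_rcu(&qdisc->rcu_head, qdisc_rcu_free);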

Thanks,
John

* Re: RCU callback crashes
  2017-12-20  6:34     ` Jakub Kicinski
  2017-12-20 18:04       ` John Fastabend
@ 2017-12-20 18:17       ` Cong Wang
  2017-12-20 18:31         ` Cong Wang
  2017-12-20 19:59         ` Jiri Pirko
  1 sibling, 2 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-20 18:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev

On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> Ah, no object debug but KASAN on produces this:
>


I bet it is an ingress qdisc which is being freed?
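
(That is what the selftest keeps exercising; roughly, something like

  tc qdisc add dev $DEV ingress
  ...
  tc qdisc del dev $DEV ingress

with cls_bpf filters attached in between; the exact invocations are in
test_offload.py.)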



> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
> [   39.283524]
> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [   39.303969] Call Trace:
> [   39.306769]  <IRQ>
> [   39.309088]  dump_stack+0xa6/0x118
> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
> [   39.323030]  print_address_description+0x6a/0x270
> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
> [   39.333510]  kasan_report+0x23f/0x350
> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
> ...
> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
> ...


This is confusing.

I guess it is q->miniqp which is freed in qdisc_graft() without properly
waiting for rcu readers?


> [   39.426713]  __do_softirq+0x17f/0x4de
> ...
> [   39.463841]  irq_exit+0xe1/0xf0
> [   39.467437]  smp_apic_timer_interrupt+0xd9/0x290
> [   39.472685]  ? smp_call_function_single_interrupt+0x230/0x230
> [   39.479195]  ? smp_reschedule_interrupt+0x240/0x240
> [   39.484736]  apic_timer_interrupt+0x8c/0xa0
> [   39.489497]  </IRQ>
> [   39.491929] RIP: 0010:cpuidle_enter_state+0x12a/0x510
> [   39.497660] RSP: 0018:ffff88086bf9fd08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
> [   39.506228] RAX: 0000000000000000 RBX: ffffe8ffffb060e0 RCX: ffffffff921329f5
> [   39.514291] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88086f3246e8
> [   39.522354] RBP: 1ffff1010d7f3fa6 R08: fffffbfff2742768 R09: fffffbfff2742768
> [   39.530418] R10: ffff88086bf9fcc8 R11: fffffbfff2742767 R12: 0000000924148b4b
> [   39.538480] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffff9383eb80
> [   39.546545]  ? sched_idle_set_state+0x25/0x30
> [   39.551502]  ? cpuidle_enter_state+0x106/0x510
> [   39.556556]  ? cpuidle_enter_s2idle+0x130/0x130
> [   39.561706]  ? rcu_eqs_enter_common.constprop.62+0xd1/0x1e0
> [   39.568037]  ? rcu_gp_init+0xf70/0xf70
> [   39.572331]  ? sched_set_stop_task+0x160/0x160
> [   39.577384]  do_idle+0x1af/0x200
> [   39.581076]  cpu_startup_entry+0xd2/0xe0
> [   39.585545]  ? cpu_in_idle+0x20/0x20
> [   39.589626]  ? _raw_spin_trylock+0xe0/0xe0
> [   39.594292]  ? memcpy+0x34/0x50
> [   39.597890]  start_secondary+0x271/0x2b0
> [   39.602361]  ? set_cpu_sibling_map+0x840/0x840
> [   39.607416]  secondary_startup_64+0xa5/0xb0
> [   39.612180]
> [   39.613929] Allocated by task 1358:
> [   39.617914]  __kmalloc_node+0x183/0x2c0
> [   39.622290]  qdisc_alloc+0xbd/0x3f0
> [   39.626274]  qdisc_create+0xd8/0x720
> [   39.630355]  tc_modify_qdisc+0x657/0x910
> [   39.634826]  rtnetlink_rcv_msg+0x37c/0x7e0
> [   39.639491]  netlink_rcv_skb+0x122/0x230
> [   39.643960]  netlink_unicast+0x2ae/0x360
> [   39.648443]  netlink_sendmsg+0x5d5/0x620
> [   39.652915]  sock_sendmsg+0x64/0x80
> [   39.656900]  ___sys_sendmsg+0x4a8/0x500
> [   39.661272]  __sys_sendmsg+0xa9/0x140
> [   39.665450]  entry_SYSCALL_64_fastpath+0x1e/0x81
> [   39.670695]
> [   39.672441] Freed by task 1370:
> [   39.676052]  kfree+0x8d/0x1c0
> [   39.679454]  qdisc_graft+0x208/0x670
> [   39.683535]  tc_get_qdisc+0x229/0x350
> [   39.687713]  rtnetlink_rcv_msg+0x37c/0x7e0
> [   39.692411]  netlink_rcv_skb+0x122/0x230
> [   39.696881]  netlink_unicast+0x2ae/0x360
> [   39.701350]  netlink_sendmsg+0x5d5/0x620
> [   39.705819]  sock_sendmsg+0x64/0x80
> [   39.709801]  ___sys_sendmsg+0x4a8/0x500
> [   39.714172]  __sys_sendmsg+0xa9/0x140
> [   39.718351]  entry_SYSCALL_64_fastpath+0x1e/0x81
> [   39.723597]
> [   39.725347] The buggy address belongs to the object at ffff8803aa64ef80
> [   39.725347]  which belongs to the cache kmalloc-512 of size 512
> [   39.739453] The buggy address is located 440 bytes inside of
> [   39.739453]  512-byte region [ffff8803aa64ef80, ffff8803aa64f180)
> [   39.752684] The buggy address belongs to the page:
> [   39.758127] page:0000000042b3124b count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
> [   39.769222] flags: 0x2ffff0000008100(slab|head)
> [   39.774365] raw: 02ffff0000008100 0000000000000000 0000000000000000 0000000180190019
> [   39.783129] raw: dead000000000100 dead000000000200 ffff8803afc0ed80 0000000000000000
> [   39.791986] page dumped because: kasan: bad access detected
> [   39.798300]
> [   39.800063] Memory state around the buggy address:
> [   39.805503]  ffff8803aa64f000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.813684]  ffff8803aa64f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.821866] >ffff8803aa64f100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   39.830045]                                         ^
> [   39.835778]  ffff8803aa64f180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   39.843958]  ffff8803aa64f200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>

* Re: RCU callback crashes
  2017-12-20 18:17       ` Cong Wang
@ 2017-12-20 18:31         ` Cong Wang
  2017-12-21  0:03           ` Cong Wang
  2017-12-20 19:59         ` Jiri Pirko
  1 sibling, 1 reply; 28+ messages in thread
From: Cong Wang @ 2017-12-20 18:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> I guess it is q->miniqp which is freed in qdisc_graft() without properly
> waiting for rcu readers?

It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
but miniq is being freed and no rcu barrier waits for it...

You can try to add a rcu_barrier_bh() at the end to see if this crash
is gone, but I don't think people like adding yet another rcu barrier...
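
To spell out the suspected ordering (a sketch, pieced together from the
mini_qdisc_pair_swap() fragments quoted elsewhere in this thread):

  CPU0 (control path)                     CPU1 (RCU_SOFTIRQ)
  -------------------                     ------------------
  mini_qdisc_pair_swap(miniqp, tp)
    call_rcu_bh(&miniq_old->rcu,
                mini_qdisc_rcu_func)
  mini_qdisc_pair_swap(miniqp, NULL)
    only clears *miniqp->p_miniq
  qdisc_destroy() -> qdisc_free()
    frees the qdisc, including the
    embedded miniqs and their rcu_heads
                                          rcu_process_callbacks()
                                            walks the rcu_bh callback list
                                            and hits the freed rcu_head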

* Re: RCU callback crashes
  2017-12-20 18:17       ` Cong Wang
  2017-12-20 18:31         ` Cong Wang
@ 2017-12-20 19:59         ` Jiri Pirko
  2017-12-20 20:14           ` John Fastabend
  2017-12-20 20:15           ` Jiri Pirko
  1 sibling, 2 replies; 28+ messages in thread
From: Jiri Pirko @ 2017-12-20 19:59 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jakub Kicinski, netdev

Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> Ah, no object debug but KASAN on produces this:
>>
>
>
>I bet it is an ingress qdisc which is being freed?
>
>
>
>> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>> [   39.283524]
>> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [   39.303969] Call Trace:
>> [   39.306769]  <IRQ>
>> [   39.309088]  dump_stack+0xa6/0x118
>> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
>> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
>> [   39.323030]  print_address_description+0x6a/0x270
>> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
>> [   39.333510]  kasan_report+0x23f/0x350
>> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
>> ...
>> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
>> ...
>
>
>This is confusing.
>
>I guess it is q->miniqp which is freed in qdisc_graft() without properly
>waiting for rcu readers?

miniqp is inside qdisc private data:
struct ingress_sched_data {
        struct tcf_block *block;
        struct tcf_block_ext_info block_info;
        struct mini_Qdisc_pair miniqp;
};

That is freed along with the qdisc itself in:
qdisc_destroy->qdisc_free

Before miniq, tp was checked in the rcu reader path. In case it was not
null, q was processed. In the slow path, tp is freed after an RCU grace period:
tcf_proto_destroy->kfree_rcu

I assumed that since q is processed in rcu reader, it is also freed after
a grace period, but now looking at the code I don't see it happening
like that.

So I think that change to miniq made the existing race window
a bit wider and easier to hit.

I believe that calling kfree_rcu by call_rcu should resolve this.
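
For context, the reader side only holds the RCU read lock and
dereferences memory that lives inside the qdisc. Roughly, from
sch_handle_ingress() (quoted from memory, so details may be off):

	struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);

	if (!miniq)
		return skb;
	...
	mini_qdisc_bstats_cpu_update(miniq, skb);
	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
	...

So once qdisc_free() runs without a grace period, both miniq and
miniq->filter_list become use-after-free reads.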

* Re: RCU callback crashes
  2017-12-20 19:59         ` Jiri Pirko
@ 2017-12-20 20:14           ` John Fastabend
  2017-12-20 20:18             ` Jiri Pirko
  2017-12-20 22:25             ` Cong Wang
  2017-12-20 20:15           ` Jiri Pirko
  1 sibling, 2 replies; 28+ messages in thread
From: John Fastabend @ 2017-12-20 20:14 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang; +Cc: Jakub Kicinski, netdev

On 12/20/2017 11:59 AM, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>>
>> I bet it is an ingress qdisc which is being freed?
>>
>>
>>
>>> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>> [   39.283524]
>>> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [   39.303969] Call Trace:
>>> [   39.306769]  <IRQ>
>>> [   39.309088]  dump_stack+0xa6/0x118
>>> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
>>> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
>>> [   39.323030]  print_address_description+0x6a/0x270
>>> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
>>> [   39.333510]  kasan_report+0x23f/0x350
>>> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
>>> ...
>>> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
>>> ...
>>
>>
>> This is confusing.
>>
>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>> waiting for rcu readers?
> 
> miniqp is inside qdisc private data:
> struct ingress_sched_data {
>         struct tcf_block *block;
>         struct tcf_block_ext_info block_info;
>         struct mini_Qdisc_pair miniqp;
> };
> 
> That is freed along with the qdisc itself in:
> qdisc_destroy->qdisc_free
> 
> Before miniq, tp was checked in the rcu reader path. In case it was not
> null, q was processed. In the slow path, tp is freed after an RCU grace period:
> tcf_proto_destroy->kfree_rcu
> 
> I assumed that since q is processed in rcu reader, it is also freed after
> a grace period, but now looking at the code I don't see it happening
> like that.
> 
> So I think that change to miniq made the existing race window
> a bit wider and easier to hit.
> 
> I believe that calling kfree_rcu by call_rcu should resolve this.
> 

Hi,

Just sent a patch to complete qdisc_destroy from rcu callback. This
is needed to resolve a race with the lockless qdisc patches.

But I guess it should fix the miniq issue as well?

Thanks,
John

* Re: RCU callback crashes
  2017-12-20 19:59         ` Jiri Pirko
  2017-12-20 20:14           ` John Fastabend
@ 2017-12-20 20:15           ` Jiri Pirko
  2017-12-20 20:18             ` John Fastabend
  1 sibling, 1 reply; 28+ messages in thread
From: Jiri Pirko @ 2017-12-20 20:15 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jakub Kicinski, netdev, john.fastabend

Wed, Dec 20, 2017 at 08:59:22PM CET, jiri@resnulli.us wrote:
>Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>>
>>I bet it is an ingress qdisc which is being freed?
>>
>>
>>
>>> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>> [   39.283524]
>>> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [   39.303969] Call Trace:
>>> [   39.306769]  <IRQ>
>>> [   39.309088]  dump_stack+0xa6/0x118
>>> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
>>> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
>>> [   39.323030]  print_address_description+0x6a/0x270
>>> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
>>> [   39.333510]  kasan_report+0x23f/0x350
>>> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
>>> ...
>>> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
>>> ...
>>
>>
>>This is confusing.
>>
>>I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>waiting for rcu readers?
>
>miniqp is inside qdisc private data:
>struct ingress_sched_data {
>        struct tcf_block *block;
>        struct tcf_block_ext_info block_info;
>        struct mini_Qdisc_pair miniqp;
>};
>
>That is freed along with the qdisc itself in:
>qdisc_destroy->qdisc_free
>
>Before miniq, tp was checked in the rcu reader path. In case it was not
>null, q was processed. In the slow path, tp is freed after an RCU grace period:
>tcf_proto_destroy->kfree_rcu
>
>I assumed that since q is processed in rcu reader, it is also freed after
>a grace period, but now looking at the code I don't see it happening
>like that.

Aha! It was removed by:
commit c5ad119fb6c09b0297446be05bd66602fa564758
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Thu Dec 7 09:58:19 2017 -0800

    net: sched: pfifo_fast use skb_array


>
>So I think that change to miniq made the existing race window
>a bit wider and easier to hit.
>
>I believe that calling kfree_rcu by call_rcu should resolve this.

* Re: RCU callback crashes
  2017-12-20 18:04       ` John Fastabend
@ 2017-12-20 20:17         ` Jakub Kicinski
  2017-12-20 20:23           ` John Fastabend
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-20 20:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jiri Pirko, netdev, Cong Wang

On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
> > On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:  
> >>>> I get this:      
> >>>
> >>> Could you try to run it with kasan on?    
> >>
> >> I didn't manage to reproduce it with KASAN on so far :(  Even enabling
> >> object debugging to get the second splat in my email (which is more
> >> useful) actually makes the crash go away, I only see the warning...  
> > 
> > Ah, no object debug but KASAN on produces this:
> >   
> 
> @Jakub, This is with mq and pfifo_fast I guess?

Sorry for falling silent, I was convinced I saw this before your code
went in, it just takes a lot longer to trigger... I've been running
net-next from Dec 1st for an hour now and it didn't crash :/

Trying KASAN now..

* Re: RCU callback crashes
  2017-12-20 20:14           ` John Fastabend
@ 2017-12-20 20:18             ` Jiri Pirko
  2017-12-20 22:25             ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2017-12-20 20:18 UTC (permalink / raw)
  To: John Fastabend; +Cc: Cong Wang, Jakub Kicinski, netdev

Wed, Dec 20, 2017 at 09:14:49PM CET, john.fastabend@gmail.com wrote:
>On 12/20/2017 11:59 AM, Jiri Pirko wrote:
>> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> Ah, no object debug but KASAN on produces this:
>>>>
>>>
>>>
>>> I bet it is an ingress qdisc which is being freed?
>>>
>>>
>>>
>>>> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>>> [   39.283524]
>>>> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>>> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>>> [   39.303969] Call Trace:
>>>> [   39.306769]  <IRQ>
>>>> [   39.309088]  dump_stack+0xa6/0x118
>>>> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
>>>> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.323030]  print_address_description+0x6a/0x270
>>>> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.333510]  kasan_report+0x23f/0x350
>>>> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
>>>> ...
>>>> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
>>>> ...
>>>
>>>
>>> This is confusing.
>>>
>>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>> waiting for rcu readers?
>> 
>> miniqp is inside qdisc private data:
>> struct ingress_sched_data {
>>         struct tcf_block *block;
>>         struct tcf_block_ext_info block_info;
>>         struct mini_Qdisc_pair miniqp;
>> };
>> 
>> That is freed along with the qdisc itself in:
>> qdisc_destroy->qdisc_free
>> 
>> Before miniq, tp was checked in the rcu reader path. In case it was not
>> null, q was processed. In the slow path, tp is freed after an RCU grace period:
>> tcf_proto_destroy->kfree_rcu
>> 
>> I assumed that since q is processed in rcu reader, it is also freed after
>> a grace period, but now looking at the code I don't see it happening
>> like that.
>> 
>> So I think that change to miniq made the existing race window
>> a bit wider and easier to hit.
>> 
>> I believe that calling kfree_rcu by call_rcu should resolve this.
>> 
>
>Hi,
>
>Just sent a patch to complete qdisc_destroy from rcu callback. This
>is needed to resolve a race with the lockless qdisc patches.
>
>But I guess it should fix the miniq issue as well?

Yes, it should.

* Re: RCU callback crashes
  2017-12-20 20:15           ` Jiri Pirko
@ 2017-12-20 20:18             ` John Fastabend
  0 siblings, 0 replies; 28+ messages in thread
From: John Fastabend @ 2017-12-20 20:18 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang; +Cc: Jakub Kicinski, netdev

On 12/20/2017 12:15 PM, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 08:59:22PM CET, jiri@resnulli.us wrote:
>> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> Ah, no object debug but KASAN on produces this:
>>>>
>>>
>>>
>>> I bet it is an ingress qdisc which is being freed?
>>>
>>>
>>>
>>>> [   39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>>> [   39.283524]
>>>> [   39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>>> [   39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>>> [   39.303969] Call Trace:
>>>> [   39.306769]  <IRQ>
>>>> [   39.309088]  dump_stack+0xa6/0x118
>>>> [   39.312957]  ? _atomic_dec_and_lock+0xe8/0xe8
>>>> [   39.317895]  ? cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.323030]  print_address_description+0x6a/0x270
>>>> [   39.328380]  ? cpu_needs_another_gp+0x246/0x2b0
>>>> [   39.333510]  kasan_report+0x23f/0x350
>>>> [   39.337672]  cpu_needs_another_gp+0x246/0x2b0
>>>> ...
>>>> [   39.383026]  rcu_process_callbacks+0x1a0/0x620
>>>> ...
>>>
>>>
>>> This is confusing.
>>>
>>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>> waiting for rcu readers?
>>
>> miniqp is inside qdisc private data:
>> struct ingress_sched_data {
>>        struct tcf_block *block;
>>        struct tcf_block_ext_info block_info;
>>        struct mini_Qdisc_pair miniqp;
>> };
>>
>> That is freed along with the qdisc itself in:
>> qdisc_destroy->qdisc_free
>>
>> Before miniq, tp was checked in the rcu reader path. In case it was not
>> null, q was processed. In the slow path, tp is freed after an RCU grace period:
>> tcf_proto_destroy->kfree_rcu
>>
>> I assumed that since q is processed in rcu reader, it is also freed after
>> a grace period, but now looking at the code I don't see it happening
>> like that.
> 
> Aha! It was removed by:
> commit c5ad119fb6c09b0297446be05bd66602fa564758
> Author: John Fastabend <john.fastabend@gmail.com>
> Date:   Thu Dec 7 09:58:19 2017 -0800
> 
>     net: sched: pfifo_fast use skb_array
> 

Even farther back, right here:

commit 752fbcc33405d6f8249465e4b2c4e420091bb825
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue Sep 19 13:15:42 2017 -0700

    net_sched: no need to free qdisc in RCU callback


> 
>>
>> So I think that change to miniq made the existing race window
>> a bit wider and easier to hit.
>>
>> I believe that calling kfree_rcu by call_rcu should resolve this.

* Re: RCU callback crashes
  2017-12-20 20:17         ` Jakub Kicinski
@ 2017-12-20 20:23           ` John Fastabend
  2017-12-20 22:38             ` Cong Wang
  0 siblings, 1 reply; 28+ messages in thread
From: John Fastabend @ 2017-12-20 20:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev, Cong Wang

On 12/20/2017 12:17 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
>> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
>>> On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:  
>>>>>> I get this:      
>>>>>
>>>>> Could you try to run it with kasan on?    
>>>>
>>>> I didn't manage to reproduce it with KASAN on so far :(  Even enabling
>>>> object debugging to get the second splat in my email (which is more
>>>> useful) actually makes the crash go away, I only see the warning...  
>>>
>>> Ah, no object debug but KASAN on produces this:
>>>   
>>
>> @Jakub, This is with mq and pfifo_fast I guess?
> 
> Sorry for falling silent, I was convinced I saw this before your code
> went in, it just takes a lot longer to trigger... I've been running
> net-next from Dec 1st for an hour now and it didn't crash :/
> 
> Trying KASAN now..
> 

It's possible my patches just made it worse because the kfree on the skb
lists was exposed as well.

I'm trying to see how removing that rcu grace period was safe in the
first place. The datapath uses an RCU read-side critical section to
protect the qdisc, but the control path (a) doesn't wait for an RCU
grace period and (b) doesn't take the qdisc lock. Going to go get a
coffee and I'll think about it a bit more. Any ideas, Cong?
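
(Illustrative only; the pattern I mean, not the exact code paths:

	/* datapath, softirq context -- the RCU read side */
	rcu_read_lock_bh();
	q = rcu_dereference_bh(txq->qdisc);
	skb = q->dequeue(q);	/* may run concurrently with a graft */
	rcu_read_unlock_bh();

	/* control path -- swaps the pointer but, with the grace period
	 * removed, frees the old qdisc without waiting for those readers
	 */
	rcu_assign_pointer(txq->qdisc, new_qdisc);
	kfree(old_qdisc);	/* no call_rcu()/synchronize_net() here */
)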

Perhaps we need a patch for net (mine was against net-next), and
probably one for stable as well.

Thanks,
John

* Re: RCU callback crashes
  2017-12-20 20:14           ` John Fastabend
  2017-12-20 20:18             ` Jiri Pirko
@ 2017-12-20 22:25             ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-20 22:25 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jiri Pirko, Jakub Kicinski, netdev

On Wed, Dec 20, 2017 at 12:14 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Hi,
>
> Just sent a patch to complete qdisc_destroy from rcu callback. This
> is needed to resolve a race with the lockless qdisc patches.
>
> But I guess it should fix the miniq issue as well?


If you ever look into tools/testing/selftests/bpf/test_offload.py, it is
an ingress qdisc. I don't know why you keep believing it is related
to your patches.

* Re: RCU callback crashes
  2017-12-20 20:23           ` John Fastabend
@ 2017-12-20 22:38             ` Cong Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-20 22:38 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jakub Kicinski, Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 12:23 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> I'm trying to see how removing that rcu grace period was safe in the
> first place. The datapath uses an RCU read-side critical section to
> protect the qdisc, but the control path (a) doesn't wait for an RCU
> grace period and (b) doesn't take the qdisc lock. Going to go get a
> coffee and I'll think about it a bit more. Any ideas, Cong?

Is qdisc_graft() -> dev_deactivate() -> synchronize_net() the reason
you are looking for?

Also, you can try `git log` to see why it was introduced in the first
place; here it is:

commit 5d944c640b4ae5f37c537acf491c2f0eb89fa0d6
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Mar 31 07:06:04 2010 +0000

    gen_estimator: deadlock fix

* Re: RCU callback crashes
  2017-12-20 18:31         ` Cong Wang
@ 2017-12-21  0:03           ` Cong Wang
  2017-12-21  0:08             ` Jakub Kicinski
  2017-12-21  0:37             ` Jakub Kicinski
  0 siblings, 2 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-21  0:03 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>> waiting for rcu readers?
>
> It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
> at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
> but miniq is being freed and no rcu barrier waits for it...
>
> You can try to add a rcu_barrier_bh() at the end to see if this crash
> is gone, but I don't think people like adding yet another rcu barrier...

Hi, Jakub

Can you test the following fix? I am not a fan of rcu barrier but we
already have one so...

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 876fab2604b8..1b68fedea124 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,

        if (!tp_head) {
                RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
+               /* Wait for any in-flight RCU callbacks before the miniqs are freed. */
+               rcu_barrier_bh();
                return;
        }

* Re: RCU callback crashes
  2017-12-21  0:03           ` Cong Wang
@ 2017-12-21  0:08             ` Jakub Kicinski
  2017-12-21  0:37             ` Jakub Kicinski
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21  0:08 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jiri Pirko, netdev

On Wed, 20 Dec 2017 16:03:49 -0800, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:  
> >>
> >> I guess it is q->miniqp which is freed in qdisc_graft() without properly
> >> waiting for rcu readers?  
> >
> > It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
> > at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
> > but miniq is being freed and no rcu barrier waits for it...
> >
> > You can try to add a rcu_barrier_bh() at the end to see if this crash
> > is gone, but I don't think people like adding yet another rcu barrier...  
> 
> Hi, Jakub
> 
> Can you test the following fix? I am not a fan of rcu barrier but we
> already have one so...
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 876fab2604b8..1b68fedea124 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> 
>         if (!tp_head) {
>                 RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> +               /* Wait for any in-flight RCU callbacks before the miniqs are freed. */
> +               rcu_barrier_bh();
>                 return;
>         }

Mm.. I was running with this hack for the last two hours and it was OK:

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 876fab2604b8..d7e0c3ad0a1c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1260,6 +1260,7 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
                 * are not seeing it.
                 */
                call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func);
+       rcu_barrier_bh();
 }
 EXPORT_SYMBOL(mini_qdisc_pair_swap);

Let me try to move the barrier...

* Re: RCU callback crashes
  2017-12-21  0:03           ` Cong Wang
  2017-12-21  0:08             ` Jakub Kicinski
@ 2017-12-21  0:37             ` Jakub Kicinski
  2017-12-21  0:41               ` Jakub Kicinski
  2017-12-21  7:24               ` Cong Wang
  1 sibling, 2 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21  0:37 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jiri Pirko, netdev

On Wed, 20 Dec 2017 16:03:49 -0800, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:  
> >>
> >> I guess it is q->miniqp which is freed in qdisc_graft() without properly
> >> waiting for rcu readers?  
> >
> > It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
> > at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
> > but miniq is being freed and no rcu barrier waits for it...
> >
> > You can try to add a rcu_barrier_bh() at the end to see if this crash
> > is gone, but I don't think people like adding yet another rcu barrier...  
> 
> Hi, Jakub
> 
> Can you test the following fix? I am not a fan of rcu barrier but we
> already have one so...
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 876fab2604b8..1b68fedea124 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> 
>         if (!tp_head) {
>                 RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> +               /* Wait for any in-flight RCU callbacks before the miniqs are freed. */
> +               rcu_barrier_bh();
>                 return;
>         }

Looks good after 30 minutes, feel free to add if you post officially:

Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>

* Re: RCU callback crashes
  2017-12-21  0:37             ` Jakub Kicinski
@ 2017-12-21  0:41               ` Jakub Kicinski
  2017-12-21  0:50                 ` Jakub Kicinski
  2017-12-21  7:24               ` Cong Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21  0:41 UTC (permalink / raw)
  To: Cong Wang, John Fastabend; +Cc: Jiri Pirko, netdev

On Wed, 20 Dec 2017 16:37:10 -0800, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 16:03:49 -0800, Cong Wang wrote:
> > On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:  
> > > On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:    
> > >>
> > >> I guess it is q->miniqp which is freed in qdisc_graft() without properly
> > >> waiting for rcu readers?    
> > >
> > > It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
> > > at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
> > > but miniq itself is being freed, and no RCU barrier waits for it...
> > >
> > > You can try to add a rcu_barrier_bh() at the end to see if this crash
> > > is gone, but I don't think people like adding yet another rcu barrier...    
> > 
> > Hi, Jakub
> > 
> > Can you test the following fix? I am not a fan of rcu barrier but we
> > already have one so...
> > 
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 876fab2604b8..1b68fedea124 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> > 
> >         if (!tp_head) {
> >                 RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> > +               /* Wait for any in-flight RCU callback before miniq is freed. */
> > +               rcu_barrier_bh();
> >                 return;
> >         }  
> 
> Looks good after 30 minutes, feel free to add if you post officially:
> 
> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
so probably from the management interface.

[  154.604041] ==================================================================
[  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
[  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
[  154.627403] 
[  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
[  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[  154.647665] Call Trace:
[  154.650494]  dump_stack+0xa6/0x118
[  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
[  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
[  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
[  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
[  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
[  154.681251]  print_address_description+0x6a/0x270
[  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
[  154.691565]  kasan_report+0x23f/0x350
[  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
[  154.700523]  __qdisc_run+0x264/0xa20
[  154.704613]  ? sch_direct_xmit+0x3d0/0x3d0
[  154.709287]  ? _raw_spin_unlock+0x73/0xc0
[  154.713860]  ? is_bpf_text_address+0x1e/0x30
[  154.718724]  ? kernel_text_address+0xec/0x100
[  154.723687]  ? __kernel_text_address+0xe/0x30
[  154.728650]  ? unwind_get_return_address+0x2f/0x50
[  154.734099]  ? pfifo_fast_enqueue+0x154/0x180
[  154.739065]  __dev_queue_xmit+0x5ae/0x1110
[  154.743738]  ? dst_alloc+0x8c/0xd0
[  154.747633]  ? netdev_pick_tx+0x150/0x150
[  154.752206]  ? ip_route_output_key_hash+0xee/0x130
[  154.757654]  ? ip_queue_xmit+0x7d0/0x830
[  154.762131]  ? tcp_transmit_skb+0xc52/0x15b0
[  154.766994]  ? tcp_write_xmit+0x425/0x2060
[  154.771665]  ? __tcp_push_pending_frames+0x56/0x110
[  154.777209]  ? tcp_push+0x2cf/0x360
[  154.781200]  ? tcp_sendmsg_locked+0xdb3/0x1cb0
[  154.786259]  ? tcp_sendmsg+0x27/0x40
[  154.790347]  ? inet_sendmsg+0xb3/0x1f0
[  154.794629]  ? sock_sendmsg+0x64/0x80
[  154.798814]  ? sock_write_iter+0x148/0x1f0
[  154.803486]  ? __vfs_write+0x26e/0x370
[  154.807767]  ? vfs_write+0xe9/0x240
[  154.811747]  ? SyS_write+0xa7/0x130
[  154.815739]  ? entry_SYSCALL_64_fastpath+0x1e/0x81
[  154.821190]  ? __alias_free_mem+0x20/0x20
[  154.825766]  ? rt_cache_route+0x143/0x170
[  154.830342]  ? find_busiest_group+0x12eb/0x1630
[  154.835500]  ? inet_lookup_ifaddr_rcu+0x126/0x170
[  154.840852]  ? percpu_counter_add_batch+0x24/0xa0
[  154.846207]  ? rt_cpu_seq_stop+0x10/0x10
[  154.850684]  ? dst_alloc+0xac/0xd0
[  154.854579]  ? rt_dst_alloc+0x1f0/0x250
[  154.858958]  ? ipv4_neigh_lookup+0x3a0/0x3a0
[  154.863824]  ? __rcu_read_unlock+0x6e/0x120
[  154.868594]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
[  154.875502]  ? ip_finish_output2+0x68d/0x7c0
[  154.880366]  ip_finish_output2+0x68d/0x7c0
[  154.885040]  ? ip_send_check+0x60/0x60
[  154.889322]  ? ip_route_input_noref+0xd0/0xd0
[  154.894287]  ? xfrm_lookup+0x888/0x10f0
[  154.898668]  ? ipv4_mtu+0x163/0x200
[  154.902662]  ? load_balance+0x108d/0x14a0
[  154.907238]  ? ip_finish_output+0x39a/0x4c0
[  154.912004]  ip_finish_output+0x39a/0x4c0
[  154.916578]  ? ip_fragment.constprop.5+0xf0/0xf0
[  154.921832]  ? find_busiest_group+0x1630/0x1630
[  154.926991]  ? check_cfs_rq_runtime+0x70/0x70
[  154.931954]  ? __rcu_read_unlock+0x6e/0x120
[  154.936723]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
[  154.943630]  ? unwind_get_return_address+0x2f/0x50
[  154.949077]  ? ip_send_check+0x20/0x60
[  154.953360]  ip_output+0x106/0x280
[  154.957253]  ? ip_mc_output+0x750/0x750
[  154.961631]  ? ip_route_output_key_hash_rcu+0x1240/0x1240
[  154.967757]  ? sk_setup_caps+0x180/0x180
[  154.972236]  ? __skb_clone+0x2f8/0x370
[  154.976520]  ip_queue_xmit+0x381/0x830
[  154.980805]  ? ip_build_and_send_pkt+0x420/0x420
[  154.986060]  ? trace_event_raw_event_bpf_obj_map+0x200/0x200
[  154.992481]  ? tcp_options_write+0xc3/0x360
[  154.997248]  ? tcp_established_options+0x122/0x190
[  155.002697]  tcp_transmit_skb+0xc52/0x15b0
[  155.007374]  ? __tcp_select_window+0x3c0/0x3c0
[  155.012433]  ? is_bpf_text_address+0x1e/0x30
[  155.017296]  ? kernel_text_address+0xec/0x100
[  155.022259]  ? __kernel_text_address+0xe/0x30
[  155.027221]  ? unwind_get_return_address+0x2f/0x50
[  155.032670]  ? __save_stack_trace+0x83/0xd0
[  155.037437]  ? memcmp+0x45/0x70
[  155.041041]  ? depot_save_stack+0x12d/0x470
[  155.045811]  ? tcp_small_queue_check.isra.4+0x10a/0x1f0
[  155.051745]  ? tcp_tso_segs+0xe0/0xe0
[  155.055932]  ? native_sched_clock+0xcc/0x130
[  155.060799]  ? cyc2ns_read_end+0x20/0x20
[  155.065275]  ? sock_sendmsg+0x64/0x80
[  155.069460]  ? vfs_write+0xe9/0x240
[  155.073483]  ? entry_SYSCALL_64_fastpath+0x1e/0x81
[  155.078931]  ? sock_sendmsg+0x64/0x80
[  155.083116]  ? sock_write_iter+0x148/0x1f0
[  155.087790]  ? sched_clock+0x5/0x10
[  155.091780]  ? deref_stack_reg+0x98/0xd0
[  155.096257]  ? sched_clock+0x5/0x10
[  155.100248]  ? sched_clock_cpu+0x14/0xf0
[  155.104726]  tcp_write_xmit+0x425/0x2060
[  155.109209]  ? memcg_kmem_get_cache+0x4e0/0x4e0
[  155.114356]  ? tcp_transmit_skb+0x15b0/0x15b0
[  155.119318]  ? memcg_kmem_put_cache+0x63/0x120
[  155.124376]  ? memcg_kmem_get_cache+0x4e0/0x4e0
[  155.129536]  ? __kmalloc_node_track_caller+0x1fe/0x2a0
[  155.135371]  ? __alloc_skb+0xed/0x390
[  155.139558]  ? __kmalloc_reserve.isra.7+0x43/0x80
[  155.144908]  ? memset+0x1f/0x40
[  155.148510]  ? __alloc_skb+0x302/0x390
[  155.152792]  ? __kmalloc_reserve.isra.7+0x80/0x80
[  155.158142]  ? ipv4_mtu+0x90/0x200
[  155.162036]  ? tcp_mtu_to_mss+0x155/0x1a0
[  155.166610]  ? ipv4_negative_advice+0x60/0x60
[  155.171572]  ? tcp_trim_head+0x260/0x260
[  155.176048]  ? SyS_read+0xa7/0x130
[  155.179941]  ? iov_iter_advance+0x16a/0x780
[  155.184709]  ? copyout+0x4f/0x60
[  155.188410]  ? tcp_established_options+0x122/0x190
[  155.193858]  ? import_single_range+0x110/0x110
[  155.198918]  __tcp_push_pending_frames+0x56/0x110
[  155.204269]  tcp_push+0x2cf/0x360
[  155.208068]  ? tcp_splice_data_recv+0xb0/0xb0
[  155.213032]  ? skb_entail+0x2e5/0x300
[  155.217217]  ? _copy_from_iter+0x680/0x680
[  155.221890]  ? _raw_spin_unlock_bh+0x91/0xc0
[  155.226757]  tcp_sendmsg_locked+0xdb3/0x1cb0
[  155.231628]  ? tcp_recvmsg+0x790/0x1420
[  155.236001]  ? tcp_sendpage+0x60/0x60
[  155.240190]  ? tcp_recv_timestamp+0x240/0x240
[  155.245158]  ? compat_poll_select_copy_remaining+0x310/0x310
[  155.251582]  ? compat_poll_select_copy_remaining+0x310/0x310
[  155.258004]  ? compat_poll_select_copy_remaining+0x310/0x310
[  155.264427]  ? __rcu_read_unlock+0xf8/0x120
[  155.269197]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
[  155.276104]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
[  155.283000]  ? _raw_spin_unlock+0x73/0xc0
[  155.287573]  ? _raw_spin_trylock+0xe0/0xe0
[  155.292246]  ? __release_sock+0xc0/0x140
[  155.296727]  tcp_sendmsg+0x27/0x40
[  155.300621]  inet_sendmsg+0xb3/0x1f0
[  155.304709]  ? aa_path_link+0x260/0x260
[  155.309088]  ? inet_recvmsg+0x210/0x210
[  155.313469]  ? fsnotify+0xae8/0xb30
[  155.317462]  ? inet_recvmsg+0x210/0x210
[  155.332037]  sock_sendmsg+0x64/0x80
[  155.336029]  sock_write_iter+0x148/0x1f0
[  155.340506]  ? sock_sendmsg+0x80/0x80
[  155.344691]  ? sock_recvmsg+0x90/0x90
[  155.348881]  ? tty_ldisc_deref+0x12/0x20
[  155.353356]  ? iov_iter_init+0x77/0xb0
[  155.357639]  __vfs_write+0x26e/0x370
[  155.361727]  ? kernel_read+0xa0/0xa0
[  155.365816]  ? _raw_spin_unlock_irq+0x73/0xc0
[  155.370783]  ? __fsnotify_update_child_dentry_flags.part.0+0x150/0x150
[  155.378174]  ? __fsnotify_parent+0x84/0x220
[  155.382944]  ? __fsnotify_update_child_dentry_flags.part.0+0x150/0x150
[  155.390340]  vfs_write+0xe9/0x240
[  155.394137]  SyS_write+0xa7/0x130
[  155.397934]  ? SyS_read+0x130/0x130
[  155.401925]  ? SyS_clock_settime+0x110/0x110
[  155.406791]  ? SyS_fcntl+0x82/0xb0
[  155.410685]  entry_SYSCALL_64_fastpath+0x1e/0x81
[  155.415939] RIP: 0033:0x7fed3bbf4290
[  155.420024] RSP: 002b:00007ffcffb69468 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  155.428600] RAX: ffffffffffffffda RBX: 00007fed3bec1b20 RCX: 00007fed3bbf4290
[  155.436668] RDX: 0000000000000024 RSI: 0000559b52ddd308 RDI: 0000000000000003
[  155.444735] RBP: 0000000000000021 R08: 0000559b52ddaf60 R09: 0000000000000014
[  155.452803] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fed3bec1b78
[  155.460870] R13: 0000559b52ddb030 R14: 0000559b52ddaf50 R15: 0000559b52ddaf50
[  155.468942] 
[  155.470697] Allocated by task 780:
[  155.474589]  __kmalloc+0xfa/0x230
[  155.478377]  pfifo_fast_init+0x69/0x160
[  155.482757]  qdisc_create_dflt+0x69/0xb0
[  155.487232]  mq_init+0x195/0x1e0
[  155.490931]  qdisc_create_dflt+0x69/0xb0
[  155.495407]  dev_activate+0x48a/0x4e0
[  155.499593]  __dev_open+0x19e/0x210
[  155.503583]  __dev_change_flags+0x3b5/0x3f0
[  155.508351]  dev_change_flags+0x50/0xa0
[  155.512729]  do_setlink+0x5eb/0x1cf0
[  155.516817]  rtnl_newlink+0x9d5/0xe40
[  155.521002]  rtnetlink_rcv_msg+0x37c/0x7e0
[  155.525673]  netlink_rcv_skb+0x122/0x230
[  155.530149]  netlink_unicast+0x2ae/0x360
[  155.534624]  netlink_sendmsg+0x5d5/0x620
[  155.539100]  sock_sendmsg+0x64/0x80
[  155.543090]  ___sys_sendmsg+0x4a8/0x500
[  155.547467]  __sys_sendmsg+0xa9/0x140
[  155.551643]  entry_SYSCALL_64_fastpath+0x1e/0x81
[  155.556893] 
[  155.558646] Freed by task 0:
[  155.561953] (stack is not available)
[  155.566035] 
[  155.567791] The buggy address belongs to the object at ffff88086bb62100
[  155.567791]  which belongs to the cache kmalloc-8192 of size 8192
[  155.582099] The buggy address is located 8000 bytes inside of
[  155.582099]  8192-byte region [ffff88086bb62100, ffff88086bb64100)
[  155.595529] The buggy address belongs to the page:
[  155.600977] page:00000000a9a82c52 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[  155.612081] flags: 0x6ffff0000008100(slab|head)
[  155.617240] raw: 06ffff0000008100 0000000000000000 0000000000000000 0000000100030003
[  155.626010] raw: dead000000000100 dead000000000200 ffff8803afc0e680 0000000000000000
[  155.634776] page dumped because: kasan: bad access detected
[  155.641094] 
[  155.642935] Memory state around the buggy address:
[  155.648382]  ffff88086bb63f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  155.656568]  ffff88086bb63f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  155.664756] >ffff88086bb64000: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  155.672943]                                            ^
[  155.678972]  ffff88086bb64080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  155.687160]  ffff88086bb64100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  155.695346] ==================================================================

* Re: RCU callback crashes
  2017-12-21  0:41               ` Jakub Kicinski
@ 2017-12-21  0:50                 ` Jakub Kicinski
  2017-12-21  7:27                   ` Cong Wang
  2017-12-21 21:31                   ` Cong Wang
  0 siblings, 2 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21  0:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: Cong Wang, Jiri Pirko, netdev

On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
> so probably from the management interface.
> 
> [  154.604041] ==================================================================
> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
> [  154.627403] 
> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [  154.647665] Call Trace:
> [  154.650494]  dump_stack+0xa6/0x118
> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
> [  154.681251]  print_address_description+0x6a/0x270
> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
> [  154.691565]  kasan_report+0x23f/0x350
> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0

If we trust stack decode it's:

   615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
   616  {
   617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
   618          struct sk_buff *skb = NULL;
   619          int band;
   620  
   621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
   622                  struct skb_array *q = band2list(priv, band);
   623  
>> 624                  if (__skb_array_empty(q))
   625                          continue;
   626  
   627                  skb = skb_array_consume_bh(q);
   628          }
   629          if (likely(skb)) {
   630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
   631                  qdisc_bstats_cpu_update(qdisc, skb);
   632                  qdisc_qstats_cpu_qlen_dec(qdisc);
   633          }
   634  
   635          return skb;
   636  }

* Re: RCU callback crashes
  2017-12-21  0:37             ` Jakub Kicinski
  2017-12-21  0:41               ` Jakub Kicinski
@ 2017-12-21  7:24               ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-21  7:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 4:37 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 20 Dec 2017 16:03:49 -0800, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >>
>> >> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>> >> waiting for rcu readers?
>> >
>> > It is probably so: the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
>> > at the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
>> > but miniq itself is being freed, and no RCU barrier waits for it...
>> >
>> > You can try to add a rcu_barrier_bh() at the end to see if this crash
>> > is gone, but I don't think people like adding yet another rcu barrier...
>>
>> Hi, Jakub
>>
>> Can you test the following fix? I am not a fan of rcu barrier but we
>> already have one so...
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 876fab2604b8..1b68fedea124 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>>
>>         if (!tp_head) {
>>                 RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
>> +               /* Wait for any in-flight RCU callback before miniq is freed. */
>> +               rcu_barrier_bh();
>>                 return;
>>         }
>
> Looks good after 30 minutes, feel free to add if you post officially:
>
> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Thanks for testing! I just sent a formal patch out.

* Re: RCU callback crashes
  2017-12-21  0:50                 ` Jakub Kicinski
@ 2017-12-21  7:27                   ` Cong Wang
  2017-12-21 16:26                     ` John Fastabend
  2017-12-21 21:31                   ` Cong Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Cong Wang @ 2017-12-21  7:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: John Fastabend, Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>> so probably from the management interface.
>>
>> [  154.604041] ==================================================================
>> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
>> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
>> [  154.627403]
>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [  154.647665] Call Trace:
>> [  154.650494]  dump_stack+0xa6/0x118
>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>> [  154.681251]  print_address_description+0x6a/0x270
>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>> [  154.691565]  kasan_report+0x23f/0x350
>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>
> If we trust stack decode it's:
>
>    615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>    616  {
>    617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>    618          struct sk_buff *skb = NULL;
>    619          int band;
>    620
>    621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>    622                  struct skb_array *q = band2list(priv, band);
>    623
>>> 624                  if (__skb_array_empty(q))
>    625                          continue;
>    626
>    627                  skb = skb_array_consume_bh(q);
>    628          }
>    629          if (likely(skb)) {
>    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>    631                  qdisc_bstats_cpu_update(qdisc, skb);
>    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
>    633          }
>    634
>    635          return skb;
>    636  }

Yeah, this one is clearly different, and it was introduced by John's
"lockless" patchset.

I will take a look tomorrow if John doesn't.

* Re: RCU callback crashes
  2017-12-21  7:27                   ` Cong Wang
@ 2017-12-21 16:26                     ` John Fastabend
  2017-12-21 16:56                       ` Jakub Kicinski
  2017-12-21 20:17                       ` Cong Wang
  0 siblings, 2 replies; 28+ messages in thread
From: John Fastabend @ 2017-12-21 16:26 UTC (permalink / raw)
  To: Cong Wang, Jakub Kicinski; +Cc: Jiri Pirko, netdev

On 12/20/2017 11:27 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>>> so probably from the management interface.
>>>
>>> [  154.604041] ==================================================================
>>> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
>>> [  154.627403]
>>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [  154.647665] Call Trace:
>>> [  154.650494]  dump_stack+0xa6/0x118
>>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.681251]  print_address_description+0x6a/0x270
>>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>>> [  154.691565]  kasan_report+0x23f/0x350
>>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>>
>> If we trust stack decode it's:
>>
>>    615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>>    616  {
>>    617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>>    618          struct sk_buff *skb = NULL;
>>    619          int band;
>>    620
>>    621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>    622                  struct skb_array *q = band2list(priv, band);
>>    623
>>>> 624                  if (__skb_array_empty(q))
>>    625                          continue;
>>    626
>>    627                  skb = skb_array_consume_bh(q);
>>    628          }
>>    629          if (likely(skb)) {
>>    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>>    631                  qdisc_bstats_cpu_update(qdisc, skb);
>>    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
>>    633          }
>>    634
>>    635          return skb;
>>    636  }
> 
> Yeah, this one is clearly a different one and it is introduced by John's
> "lockless" patchset.
> 
> I will take a look tomorrow if John doesn't.
> 

I guess this path

  dev_deactivate_many
    dev_deactivate_queue
      qdisc_reset

here we have the qdisc lock, but there is no RCU call or sync before
the reset does kfree_skb() and cleans up the lists. So it's possible
for the xmit path to still be pushing skbs onto the array/lists. I
don't think this is the issue triggered above, but it needs to be
fixed.

Also, synchronize_net() uses synchronize_rcu(), and we have _bh
variants involved here...

Finally, it looks like net_tx_action() calls into qdisc_run() without
rcu_read_lock(). We either need to check the is_running bit (I wanted
to avoid this) or put the call in an RCU critical section. Maybe this
is what you hit.
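
A minimal sketch of the second option (hypothetical, not the actual
net_tx_action() body):

    /* If qdisc teardown relies on call_rcu()/synchronize_rcu() to
     * keep the qdisc alive for in-flight readers, the softirq TX
     * path has to actually be a reader: */
    rcu_read_lock();
    qdisc_run(q);           /* hypothetical call site */
    rcu_read_unlock();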

@Jakub, does your test have a traffic generator running, or just the
control path? My theory would be a bit odd if you didn't have traffic,
but something is kicking the dequeue, so there must be some traffic.

I'll come up with some fixes today.

Thanks,
John

* Re: RCU callback crashes
  2017-12-21 16:26                     ` John Fastabend
@ 2017-12-21 16:56                       ` Jakub Kicinski
  2017-12-21 20:17                       ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21 16:56 UTC (permalink / raw)
  To: John Fastabend; +Cc: Cong Wang, Jiri Pirko, netdev

On Thu, 21 Dec 2017 08:26:56 -0800, John Fastabend wrote:
> @Jakub, does your test have a traffic generator running, or just the
> control path? My theory would be a bit odd if you didn't have traffic,
> but something is kicking the dequeue, so there must be some traffic.

It was just control traffic, but it's the first time I've seen it, so
it may be very unlikely to trigger...

* Re: RCU callback crashes
  2017-12-21 16:26                     ` John Fastabend
  2017-12-21 16:56                       ` Jakub Kicinski
@ 2017-12-21 20:17                       ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Cong Wang @ 2017-12-21 20:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jakub Kicinski, Jiri Pirko, netdev

On Thu, Dec 21, 2017 at 8:26 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 12/20/2017 11:27 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>>>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>>>> so probably from the management interface.
>>>>
>>>> [  154.604041] ==================================================================
>>>> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
>>>> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
>>>> [  154.627403]
>>>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>>>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>>> [  154.647665] Call Trace:
>>>> [  154.650494]  dump_stack+0xa6/0x118
>>>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>>>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>>>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>>>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>>>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>>>> [  154.681251]  print_address_description+0x6a/0x270
>>>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>>>> [  154.691565]  kasan_report+0x23f/0x350
>>>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>>>
>>> If we trust stack decode it's:
>>>
>>>    615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>>>    616  {
>>>    617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>>>    618          struct sk_buff *skb = NULL;
>>>    619          int band;
>>>    620
>>>    621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>>>    622                  struct skb_array *q = band2list(priv, band);
>>>    623
>>>>> 624                  if (__skb_array_empty(q))
>>>    625                          continue;
>>>    626
>>>    627                  skb = skb_array_consume_bh(q);
>>>    628          }
>>>    629          if (likely(skb)) {
>>>    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>>>    631                  qdisc_bstats_cpu_update(qdisc, skb);
>>>    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
>>>    633          }
>>>    634
>>>    635          return skb;
>>>    636  }
>>
>> Yeah, this one is clearly a different one and it is introduced by John's
>> "lockless" patchset.
>>
>> I will take a look tomorrow if John doesn't.
>>
>
> I guess this path
>
>   dev_deactivate_many
>     dev_deactivate_queue
>       qdisc_reset
>
> here we have the qdisc lock, but there is no RCU call or sync before
> the reset does kfree_skb() and cleans up the lists. So it's possible
> for the xmit path to still be pushing skbs onto the array/lists. I
> don't think this is the issue triggered above, but it needs to be
> fixed.

No, we already have a synchronize_net() there. It is probably just
a race with BH.

I think the spinlock is missing in dev_qdisc_reset(); at least the
comment right above qdisc_reset() says it must be held.


diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 10aaa3b615ce..00ddb5f8f430 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1097,8 +1097,11 @@ static void dev_qdisc_reset(struct net_device *dev,
 {
        struct Qdisc *qdisc = dev_queue->qdisc_sleeping;

-       if (qdisc)
+       if (qdisc) {
+               spin_lock_bh(qdisc_lock(qdisc));
                qdisc_reset(qdisc);
+               spin_unlock_bh(qdisc_lock(qdisc));
+       }
 }

 /**
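
The comment in question is the locking annotation above qdisc_reset()
in net/sched/sch_generic.c of that era (quoted from memory, so treat
as approximate):

    /* Under qdisc_lock(q) and BH! */
    void qdisc_reset(struct Qdisc *qdisc)

i.e. callers are expected to hold the per-qdisc spinlock with bottom
halves disabled, which is what the spin_lock_bh()/spin_unlock_bh()
pair above restores for dev_qdisc_reset().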

* Re: RCU callback crashes
  2017-12-21  0:50                 ` Jakub Kicinski
  2017-12-21  7:27                   ` Cong Wang
@ 2017-12-21 21:31                   ` Cong Wang
  2017-12-21 21:45                     ` Jakub Kicinski
  1 sibling, 1 reply; 28+ messages in thread
From: Cong Wang @ 2017-12-21 21:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: John Fastabend, Jiri Pirko, netdev

On Wed, Dec 20, 2017 at 4:50 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 20 Dec 2017 16:41:14 -0800, Jakub Kicinski wrote:
>> Just as I hit send... :)  but this looks unrelated, "Comm: sshd" -
>> so probably from the management interface.
>>
>> [  154.604041] ==================================================================
>> [  154.612245] BUG: KASAN: slab-out-of-bounds in pfifo_fast_dequeue+0x140/0x2d0
>> [  154.620219] Read of size 8 at addr ffff88086bb64040 by task sshd/983
>> [  154.627403]
>> [  154.629161] CPU: 10 PID: 983 Comm: sshd Not tainted 4.15.0-rc3-perf-00984-g82d3fc87a4aa-dirty #13
>> [  154.639190] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [  154.647665] Call Trace:
>> [  154.650494]  dump_stack+0xa6/0x118
>> [  154.654387]  ? _atomic_dec_and_lock+0xe8/0xe8
>> [  154.659355]  ? trace_event_raw_event_rcu_torture_read+0x190/0x190
>> [  154.666263]  ? rcu_segcblist_enqueue+0xe9/0x120
>> [  154.671422]  ? _raw_spin_unlock_bh+0x91/0xc0
>> [  154.676286]  ? pfifo_fast_dequeue+0x140/0x2d0
>> [  154.681251]  print_address_description+0x6a/0x270
>> [  154.686601]  ? pfifo_fast_dequeue+0x140/0x2d0
>> [  154.691565]  kasan_report+0x23f/0x350
>> [  154.695752]  pfifo_fast_dequeue+0x140/0x2d0
>
> If we trust stack decode it's:
>
>    615  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>    616  {
>    617          struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>    618          struct sk_buff *skb = NULL;
>    619          int band;
>    620
>    621          for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>    622                  struct skb_array *q = band2list(priv, band);
>    623
>>> 624                  if (__skb_array_empty(q))
>    625                          continue;
>    626
>    627                  skb = skb_array_consume_bh(q);
>    628          }
>    629          if (likely(skb)) {
>    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>    631                  qdisc_bstats_cpu_update(qdisc, skb);
>    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
>    633          }
>    634
>    635          return skb;
>    636  }

Hi, Jakub

Could you test the attached patch? It looks like the __skb_array_empty()
use is unsafe.

Thanks!

[-- Attachment #2: pfifo_fast_dequeue.diff --]

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 10aaa3b615ce..8d47fb4aadb4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -621,10 +621,6 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 
 	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
 		struct skb_array *q = band2list(priv, band);
-
-		if (__skb_array_empty(q))
-			continue;
-
 		skb = skb_array_consume_bh(q);
 	}
 	if (likely(skb)) {
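
Roughly, the distinction (assuming the skb_array/ptr_ring semantics of
that era) is that the removed peek is lockless while the consume is
not. The resulting loop, with comments on the assumptions:

    for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
            struct skb_array *q = band2list(priv, band);

            /* skb_array_consume_bh() takes the ring's consumer lock
             * with BH disabled, so it is serialized against other
             * consumers.  The removed __skb_array_empty() peek read
             * the consumer index with no lock at all, which is only
             * safe for a strictly single-consumer ring -- and would
             * explain the KASAN out-of-bounds read above if that
             * assumption is violated. */
            skb = skb_array_consume_bh(q);
    }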

* Re: RCU callback crashes
  2017-12-21 21:31                   ` Cong Wang
@ 2017-12-21 21:45                     ` Jakub Kicinski
  0 siblings, 0 replies; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-21 21:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: John Fastabend, Jiri Pirko, netdev

On Thu, 21 Dec 2017 13:31:01 -0800, Cong Wang wrote:
> >    629          if (likely(skb)) {
> >    630                  qdisc_qstats_cpu_backlog_dec(qdisc, skb);
> >    631                  qdisc_bstats_cpu_update(qdisc, skb);
> >    632                  qdisc_qstats_cpu_qlen_dec(qdisc);
> >    633          }
> >    634
> >    635          return skb;
> >    636  }  
> 
> Hi, Jakub
> 
> Could you test the attached patch? It looks like the __skb_array_empty()
> use is unsafe.

I don't have a reproducer, unfortunately; I haven't seen the splat
since :(  FWIW, the kernel config had all debug/checks disabled, only
KASAN on.

Thread overview: 28+ messages
2017-12-20  1:59 RCU callback crashes Jakub Kicinski
2017-12-20  6:11 ` Jiri Pirko
2017-12-20  6:22   ` Jakub Kicinski
2017-12-20  6:34     ` Jakub Kicinski
2017-12-20 18:04       ` John Fastabend
2017-12-20 20:17         ` Jakub Kicinski
2017-12-20 20:23           ` John Fastabend
2017-12-20 22:38             ` Cong Wang
2017-12-20 18:17       ` Cong Wang
2017-12-20 18:31         ` Cong Wang
2017-12-21  0:03           ` Cong Wang
2017-12-21  0:08             ` Jakub Kicinski
2017-12-21  0:37             ` Jakub Kicinski
2017-12-21  0:41               ` Jakub Kicinski
2017-12-21  0:50                 ` Jakub Kicinski
2017-12-21  7:27                   ` Cong Wang
2017-12-21 16:26                     ` John Fastabend
2017-12-21 16:56                       ` Jakub Kicinski
2017-12-21 20:17                       ` Cong Wang
2017-12-21 21:31                   ` Cong Wang
2017-12-21 21:45                     ` Jakub Kicinski
2017-12-21  7:24               ` Cong Wang
2017-12-20 19:59         ` Jiri Pirko
2017-12-20 20:14           ` John Fastabend
2017-12-20 20:18             ` Jiri Pirko
2017-12-20 22:25             ` Cong Wang
2017-12-20 20:15           ` Jiri Pirko
2017-12-20 20:18             ` John Fastabend
