RCU callback crashes

* RCU callback crashes
@ 2017-12-20  1:59 Jakub Kicinski
  2017-12-20  6:11 ` Jiri Pirko
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Kicinski @ 2017-12-20  1:59 UTC (permalink / raw)
  To: netdev, Jiri Pirko, Cong Wang

Hi!

If I run the netdevsim test long enough on a kernel with no debugging 
I get this:

[ 1400.450124] BUG: unable to handle kernel paging request at 000000046474e552
[ 1400.458005] IP: 0x46474e552
[ 1400.461231] PGD 0 P4D 0 
[ 1400.464150] Oops: 0010 [#1] PREEMPT SMP
[ 1400.468525] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[ 1400.516951] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #918
[ 1400.526678] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 1400.535150] RIP: 0010:0x46474e552
[ 1400.538941] RSP: 0018:ffff9f736f083f08 EFLAGS: 00010216
[ 1400.544870] RAX: ffff9f736b4771b8 RBX: ffff9f736f09b880 RCX: ffff9f736b4771b8
[ 1400.552935] RDX: 000000046474e552 RSI: ffff9f736f083f18 RDI: ffff9f736b4771b8
[ 1400.561001] RBP: ffffffff8bc4a740 R08: ffff9f736b4771b8 R09: 0000000000000000
[ 1400.569066] R10: ffff9f736f083d90 R11: 0000000000000000 R12: ffff9f736f09b8b8
[ 1400.577132] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
[ 1400.585197] FS:  0000000000000000(0000) GS:ffff9f736f080000(0000) knlGS:0000000000000000
[ 1400.594349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1400.600859] CR2: 000000046474e552 CR3: 0000000839c09001 CR4: 00000000003606e0
[ 1400.608917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1400.616982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1400.625048] Call Trace:
[ 1400.627868]  <IRQ>
[ 1400.630207]  ? rcu_process_callbacks+0x1a0/0x4d0
[ 1400.635458]  ? __do_softirq+0xd1/0x30a
[ 1400.639739]  ? irq_exit+0xae/0xb0
[ 1400.643532]  ? smp_apic_timer_interrupt+0x60/0x140
[ 1400.648977]  ? apic_timer_interrupt+0x8c/0xa0
[ 1400.653934]  </IRQ>
[ 1400.656370]  ? cpuidle_enter_state+0xb0/0x2f0
[ 1400.661328]  ? cpuidle_enter_state+0x8d/0x2f0
[ 1400.666287]  ? do_idle+0x17b/0x1d0
[ 1400.670167]  ? cpu_startup_entry+0x5f/0x70
[ 1400.674836]  ? start_secondary+0x169/0x190
[ 1400.679504]  ? secondary_startup_64+0xa5/0xb0
[ 1400.684466] Code:  Bad RIP value.
[ 1400.688259] RIP: 0x46474e552 RSP: ffff9f736f083f08
[ 1400.693703] CR2: 000000046474e552
[ 1400.697501] ---[ end trace fab2c0fb826644df ]---
[ 1400.708442] Kernel panic - not syncing: Fatal exception in interrupt
[ 1400.715693] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1400.732994] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Unfortunately reproducing the crash on an instrumented kernel seems to
be difficult..

I managed to gather this:

[   26.157415] ------------[ cut here ]------------
[   26.162670] ODEBUG: free active (active state 1) object type: rcu_head hint:           (null)
[   26.172361] WARNING: CPU: 19 PID: 1352 at ../lib/debugobjects.c:291 debug_print_object+0x64/0x80
[   26.182288] Modules linked in: cls_bpf sch_ingress algif_hash af_alg netdevsim rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace f3
[   26.230728] CPU: 19 PID: 1352 Comm: tc Not tainted 4.15.0-rc3-perf-00918-g129c9981a55f #4
[   26.239977] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[   26.248453] RIP: 0010:debug_print_object+0x64/0x80
[   26.253896] RSP: 0018:ffffb7340410fa00 EFLAGS: 00010086
[   26.259825] RAX: 0000000000000051 RBX: ffff8f1f6b7cc5a0 RCX: 0000000000000006
[   26.267892] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8f1f6f48cdd0
[   26.275959] RBP: ffffffffb3c48600 R08: 0000000000000000 R09: 00000000000005f2
[   26.284042] R10: 000000000000001e R11: ffffffffb41c35ad R12: ffffffffb3a1d101
[   26.292125] R13: ffff8f1f6b7cc5a0 R14: ffffffffb423a8b8 R15: 0000000000000001
[   26.300194] FS:  00007f64d4956700(0000) GS:ffff8f1f6f480000(0000) knlGS:0000000000000000
[   26.309346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.315859] CR2: 0000000001cbc498 CR3: 000000086a8a2004 CR4: 00000000003606e0
[   26.323925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.331994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.331994] Call Trace:
[   26.331998]  debug_check_no_obj_freed+0x1e6/0x220
[   26.332020]  ? qdisc_graft+0x14f/0x450
[   26.332025]  kfree+0x14d/0x1b0
[   26.332027]  qdisc_graft+0x14f/0x450
[   26.332029]  tc_get_qdisc+0x12f/0x200
[   26.332035]  rtnetlink_rcv_msg+0x122/0x310
[   26.332039]  ? __skb_try_recv_datagram+0xef/0x150
[   26.332040]  ? __kmalloc_node_track_caller+0x205/0x2b0
[   26.332042]  ? rtnl_calcit.isra.12+0x100/0x100
[   26.332044]  netlink_rcv_skb+0x8d/0x130
[   26.332046]  netlink_unicast+0x16a/0x210
[   26.332048]  netlink_sendmsg+0x32a/0x370
[   26.332054]  sock_sendmsg+0x2d/0x40
[   26.332056]  ___sys_sendmsg+0x298/0x2e0
[   26.332061]  ? mem_cgroup_commit_charge+0x7a/0x540
[   26.332062]  ? mem_cgroup_try_charge+0x8e/0x1d0
[   26.332066]  ? __handle_mm_fault+0x3a1/0x1190
[   26.332068]  ? __sys_sendmsg+0x41/0x70
[   26.332069]  __sys_sendmsg+0x41/0x70
[   26.332074]  entry_SYSCALL_64_fastpath+0x1e/0x81
[   26.332076] RIP: 0033:0x7f64d3b53450
[   26.332076] RSP: 002b:00007fffb5ea4388 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   26.332077] RAX: ffffffffffffffda RBX: 00007f64d3e0fb20 RCX: 00007f64d3b53450
[   26.332078] RDX: 0000000000000000 RSI: 00007fffb5ea43e0 RDI: 0000000000000003
[   26.332078] RBP: 0000000000000a11 R08: 0000000000000000 R09: 000000000000000f
[   26.332079] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007f64d3e0fb78
[   26.332079] R13: 00007f64d3e0fb78 R14: 000000000000270f R15: 00007f64d3e0fb78
[   26.332081] Code: c1 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 f6 d0 e5 00 8b 53 10 4c 89 e6 48 c7 c7 38 7c a3 b3 48 8b 14 d5 80 3d 85 b 
[   26.332097] ---[ end trace bd33b199ae76ad43 ]---

^ permalink raw reply	[flat|nested] 28+ messages in thread