* Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ @ 2017-06-06 21:06 Ben Greear 2017-06-07 0:00 ` David Ahern 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2017-06-06 21:06 UTC (permalink / raw) To: netdev Hello, This bug has been around forever, and we recently got an intern and stuck him with trying to reproduce it on the latest kernel. It is still here. I'm not super excited about trying to fix this, but we can easily test patches if someone has a patch to try. Test case is to create 1000 mac-vlans and bring them up, with user-space tools running lots of 'dump' related commands as part of bringing up the interfaces and configuring some special source-based routing tables. (gdb) l *(inet6_dump_fib+0x109) 0x192f9 is in inet6_dump_fib (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392). 387 } else 388 w->skip = 0; 389 390 read_lock_bh(&table->tb6_lock); 391 res = fib6_walk_continue(w); 392 read_unlock_bh(&table->tb6_lock); 393 if (res <= 0) { 394 fib6_walker_unlink(net, w); 395 cb->args[4] = 0; 396 } (gdb) l *(fib6_walk_continue+0x76) 0x188c6 is in fib6_walk_continue (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). 1588 if (fn == w->root) 1589 return 0; 1590 pn = fn->parent; 1591 w->node = pn; 1592 #ifdef CONFIG_IPV6_SUBTREES 1593 if (FIB6_SUBTREE(pn) == fn) { 1594 WARN_ON(!(fn->fn_flags & RTN_ROOT)); 1595 w->state = FWS_L; 1596 continue; 1597 } [root@ct524-ffb0 ~]# BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: fib6_walk_continue+0x76/0x180 [ipv6] PGD 3d9226067 P4D 3d9226067 PUD 3d9020067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c bnep fuse macvlan pktgen cfg80211 ipmi_ssif iTCO_wdt iTCO_vendor_support coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass joydev i2c_i801 ie31200_edac intel_pch_thermal shpchp hci_uart ipmi_si btbcm btqca ipmi_devintf btintel ipmi_msghandler bluetooth pinctrl_sunrisepoint acpi_als pinctrl_intel video tpm_tis intel_lpss_acpi kfifo_buf tpm_tis_core intel_lpss industrialio tpm acpi_pad acpi_power_meter sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_hid i2c_core ipv6 crc_ccitt [last unloaded: nf_conntrack] CPU: 1 PID: 996 Comm: ip Not tainted 4.12.0-rc4+ #32 Hardware name: Supermicro Super Server/X11SSM-F, BIOS 1.0b 12/29/2015 task: ffff8803d4d61dc0 task.stack: ffffc9000970c000 RIP: 0010:fib6_walk_continue+0x76/0x180 [ipv6] RSP: 0018:ffffc9000970fbb8 EFLAGS: 00010283 RAX: ffff8803de84b020 RBX: ffff8803e0756f00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc9000970fc00 RDI: ffffffff81eee280 RBP: ffffc9000970fbc0 R08: 0000000000000008 R09: ffff8803d4fbbf31 R10: ffffc9000970fb68 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000001 R14: ffff8803e0756f00 R15: ffff8803d9345b18 FS: 00007f32ca4ec700(0000) GS:ffff880477840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 00000003ddacc000 CR4: 00000000003406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: inet6_dump_fib+0x109/0x290 [ipv6] netlink_dump+0x11d/0x290 netlink_recvmsg+0x260/0x3f0 sock_recvmsg+0x38/0x40 ___sys_recvmsg+0xe9/0x230 ? alloc_pages_vma+0x9d/0x260 ? page_add_new_anon_rmap+0x88/0xc0 ? lru_cache_add_active_or_unevictable+0x31/0xb0 ? __handle_mm_fault+0xce3/0xf70 __sys_recvmsg+0x3d/0x70 ? __sys_recvmsg+0x3d/0x70 SyS_recvmsg+0xd/0x20 do_syscall_64+0x56/0xc0 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f32c9e21050 RSP: 002b:00007fff96401de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32c9e21050 RDX: 0000000000000000 RSI: 00007fff96401e50 RDI: 0000000000000004 RBP: 00007fff96405e74 R08: 0000000000003fe4 R09: 0000000000000000 R10: 00007fff96401e90 R11: 0000000000000246 R12: 000000000064f3a0 R13: 00007fff96405ee0 R14: 0000000000003fe4 R15: 0000000000000000 Code: f6 40 2a 04 74 11 8b 53 30 85 d2 0f 84 02 01 00 00 83 ea 01 89 53 30 c7 43 28 04 00 00 00 48 39 43 10 74 33 48 8b 10 48 89 53 18 <48> 39 42 18 0f 84 a3 00 00 00 48 39 42 08 0f 84 ae 00 00 00 48 RIP: fib6_walk_continue+0x76/0x180 [ipv6] RSP: ffffc9000970fbb8 CR2: 0000000000000018 ---[ end trace 5ebbc4ee97bea64e ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Rebooting in 10 seconds.. ACPI MEMORY or I/O RESET_REG. -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-06 21:06 Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ Ben Greear @ 2017-06-07 0:00 ` David Ahern 2017-06-07 0:27 ` Eric Dumazet 0 siblings, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-07 0:00 UTC (permalink / raw) To: Ben Greear, netdev On 6/6/17 3:06 PM, Ben Greear wrote: > This bug has been around forever, and we recently got an intern and > stuck him with > trying to reproduce it on the latest kernel. It is still here. I'm not > super excited > about trying to fix this, but we can easily test patches if someone has a > patch to try. Can you try this (whitespace damaged on paste, but it is moving the lock ahead of the fn_sernum check): diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index deea901746c8..7a44c49055c0 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, cb->args[5] = w->root->fn_sernum; } } else { + read_lock_bh(&table->tb6_lock); if (cb->args[5] != w->root->fn_sernum) { /* Begin at the root if the tree changed */ cb->args[5] = w->root->fn_sernum; @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, } else w->skip = 0; - read_lock_bh(&table->tb6_lock); res = fib6_walk_continue(w); read_unlock_bh(&table->tb6_lock); if (res <= 0) { ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-07 0:00 ` David Ahern @ 2017-06-07 0:27 ` Eric Dumazet 2017-06-07 0:34 ` David Ahern 2017-06-07 0:48 ` Ben Greear 0 siblings, 2 replies; 25+ messages in thread From: Eric Dumazet @ 2017-06-07 0:27 UTC (permalink / raw) To: David Ahern; +Cc: Ben Greear, netdev On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote: > On 6/6/17 3:06 PM, Ben Greear wrote: > > This bug has been around forever, and we recently got an intern and > > stuck him with > > trying to reproduce it on the latest kernel. It is still here. I'm not > > super excited > > about trying to fix this, but we can easily test patches if someone has a > > patch to try. > > Can you try this (whitespace damaged on paste, but it is moving the lock > ahead of the fn_sernum check): > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index deea901746c8..7a44c49055c0 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table, > struct sk_buff *skb, > cb->args[5] = w->root->fn_sernum; > } > } else { > + read_lock_bh(&table->tb6_lock); > if (cb->args[5] != w->root->fn_sernum) { > /* Begin at the root if the tree changed */ > cb->args[5] = w->root->fn_sernum; > @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, > struct sk_buff *skb, > } else > w->skip = 0; > > - read_lock_bh(&table->tb6_lock); > res = fib6_walk_continue(w); > read_unlock_bh(&table->tb6_lock); > if (res <= 0) { Good catch, but it looks like similar fix is needed a few lines before. diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, read_lock_bh(&table->tb6_lock); res = fib6_walk(net, w); - read_unlock_bh(&table->tb6_lock); if (res > 0) { cb->args[4] = 1; cb->args[5] = w->root->fn_sernum; } + read_unlock_bh(&table->tb6_lock); } else { + read_lock_bh(&table->tb6_lock); if (cb->args[5] != w->root->fn_sernum) { /* Begin at the root if the tree changed */ cb->args[5] = w->root->fn_sernum; @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, } else w->skip = 0; - read_lock_bh(&table->tb6_lock); res = fib6_walk_continue(w); read_unlock_bh(&table->tb6_lock); if (res <= 0) { ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-07 0:27 ` Eric Dumazet @ 2017-06-07 0:34 ` David Ahern 2017-06-07 4:19 ` Eric Dumazet 2017-06-07 0:48 ` Ben Greear 1 sibling, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-07 0:34 UTC (permalink / raw) To: Eric Dumazet; +Cc: Ben Greear, netdev On 6/6/17 6:27 PM, Eric Dumazet wrote: > Good catch, but it looks like similar fix is needed a few lines before. > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, > > read_lock_bh(&table->tb6_lock); > res = fib6_walk(net, w); > - read_unlock_bh(&table->tb6_lock); > if (res > 0) { > cb->args[4] = 1; > cb->args[5] = w->root->fn_sernum; > } > + read_unlock_bh(&table->tb6_lock); indeed. tunnel vision on Ben's problem ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-07 0:34 ` David Ahern @ 2017-06-07 4:19 ` Eric Dumazet 2017-06-08 21:27 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: Eric Dumazet @ 2017-06-07 4:19 UTC (permalink / raw) To: David Ahern; +Cc: Ben Greear, netdev On Tue, 2017-06-06 at 18:34 -0600, David Ahern wrote: > On 6/6/17 6:27 PM, Eric Dumazet wrote: > > Good catch, but it looks like similar fix is needed a few lines before. > > > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > > index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644 > > --- a/net/ipv6/ip6_fib.c > > +++ b/net/ipv6/ip6_fib.c > > @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, > > > > read_lock_bh(&table->tb6_lock); > > res = fib6_walk(net, w); > > - read_unlock_bh(&table->tb6_lock); > > if (res > 0) { > > cb->args[4] = 1; > > cb->args[5] = w->root->fn_sernum; > > } > > + read_unlock_bh(&table->tb6_lock); > > indeed. tunnel vision on Ben's problem BTW, bug was already Ben's problem when Patrick tried to fix it in commit 2bec5a369ee79 ("ipv6: fib: fix crash when changing large fib while dumping it") seven years ago ;) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-07 4:19 ` Eric Dumazet @ 2017-06-08 21:27 ` Ben Greear 2017-06-09 5:55 ` Cong Wang 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2017-06-08 21:27 UTC (permalink / raw) To: Eric Dumazet, David Ahern; +Cc: netdev On 06/06/2017 09:19 PM, Eric Dumazet wrote: > On Tue, 2017-06-06 at 18:34 -0600, David Ahern wrote: >> On 6/6/17 6:27 PM, Eric Dumazet wrote: >>> Good catch, but it looks like similar fix is needed a few lines before. >>> >>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c >>> index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644 >>> --- a/net/ipv6/ip6_fib.c >>> +++ b/net/ipv6/ip6_fib.c >>> @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, >>> >>> read_lock_bh(&table->tb6_lock); >>> res = fib6_walk(net, w); >>> - read_unlock_bh(&table->tb6_lock); >>> if (res > 0) { >>> cb->args[4] = 1; >>> cb->args[5] = w->root->fn_sernum; >>> } >>> + read_unlock_bh(&table->tb6_lock); >> >> indeed. tunnel vision on Ben's problem > > BTW, bug was already Ben's problem when Patrick tried to fix it > in commit 2bec5a369ee79 ("ipv6: fib: fix crash when changing large fib > while dumping it") seven years ago ;) As far as I can tell, the patch did not help, or at least we still reproduce the crash easily. ct524-ffb0 login: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: fib6_walk_continue+0x76/0x180 [ipv6] PGD 3ec59a067 P4D 3ec59a067 PUD 3eb939067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c bridge stp llc veth bnep fuse macvlan pktgen cfg80211 ipmi_ssif iTCO_wdt iTCO_vendor_support coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass joydev i2c_i801 ie31200_edac intel_pch_thermal shpchp hci_uart ipmi_si btbcm btqca ipmi_devintf btintel ipmi_msghandler pinctrl_sunrisepoint bluetooth intel_lpss_acpi acpi_als video pinctrl_intel intel_lpss kfifo_buf tpm_tis tpm_tis_core industrialio acpi_power_meter tpm acpi_pad sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_hid i2c_core ipv6 crc_ccitt [last unloaded: nfnetlink] CPU: 3 PID: 2185 Comm: ip Not tainted 4.12.0-rc4+ #32 Hardware name: Supermicro Super Server/X11SSM-F, BIOS 1.0b 12/29/2015 task: ffff8803e87fd940 task.stack: ffffc90009ae8000 RIP: 0010:fib6_walk_continue+0x76/0x180 [ipv6] RSP: 0018:ffffc90009aebbc0 EFLAGS: 00010287 RAX: ffff880460df8ca0 RBX: ffff8803f20a2c60 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc90009aebc00 RDI: ffffffff81eee280 RBP: ffffc90009aebbc8 R08: 0000000000000008 R09: ffff8803e87b47cd R10: ffffc90009aebb70 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000001 R14: ffff8803f20a2c60 R15: ffff8803ec601f80 FS: 00007f43520ee700(0000) GS:ffff8804778c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 00000003ebb46000 CR4: 00000000003406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: inet6_dump_fib+0x1ab/0x2a0 [ipv6] netlink_dump+0x11d/0x290 netlink_recvmsg+0x260/0x3f0 sock_recvmsg+0x38/0x40 ___sys_recvmsg+0xe9/0x230 ? alloc_pages_vma+0x9d/0x260 ? page_add_new_anon_rmap+0x88/0xc0 ? lru_cache_add_active_or_unevictable+0x31/0xb0 ? __handle_mm_fault+0xce3/0xf70 __sys_recvmsg+0x3d/0x70 ? __sys_recvmsg+0x3d/0x70 SyS_recvmsg+0xd/0x20 do_syscall_64+0x56/0xc0 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f4351a23050 RSP: 002b:00007ffdb1bfafb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4351a23050 RDX: 0000000000000000 RSI: 00007ffdb1bfb020 RDI: 0000000000000004 RBP: 00007ffdb1bff044 R08: 0000000000003fe4 R09: 0000000000000000 R10: 00007ffdb1bfb060 R11: 0000000000000246 R12: 000000000064f360 R13: 00007ffdb1bff0b0 R14: 0000000000003fe4 R15: 0000000000000000 Code: f6 40 2a 04 74 11 8b 53 30 85 d2 0f 84 02 01 00 00 83 ea 01 89 53 30 c7 43 28 04 00 00 00 48 39 43 10 74 33 48 8b 10 48 89 53 18 <48> 39 42 18 0f 84 a3 00 00 00 48 39 42 08 0f 84 ae 00 00 00 48 RIP: fib6_walk_continue+0x76/0x180 [ipv6] RSP: ffffc90009aebbc0 CR2: 0000000000000018 ---[ end trace 06ac9dee8b14db6b ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled (gdb) l *(fib6_walk_continue+0x76) 0x188c6 is in fib6_walk_continue (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). 1588 if (fn == w->root) 1589 return 0; 1590 pn = fn->parent; 1591 w->node = pn; 1592 #ifdef CONFIG_IPV6_SUBTREES 1593 if (FIB6_SUBTREE(pn) == fn) { 1594 WARN_ON(!(fn->fn_flags & RTN_ROOT)); 1595 w->state = FWS_L; 1596 continue; 1597 } (gdb) l *(inet6_dump_fib+0x1ab) 0x1939b is in inet6_dump_fib (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392). 387 w->skip = w->count; 388 } else 389 w->skip = 0; 390 391 res = fib6_walk_continue(w); 392 read_unlock_bh(&table->tb6_lock); 393 if (res <= 0) { 394 fib6_walker_unlink(net, w); 395 cb->args[4] = 0; 396 } (gdb) [greearb@ben-dt3 linux-2.6]$ git diff diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index d4bf2c6..4e32a16 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, read_lock_bh(&table->tb6_lock); res = fib6_walk(net, w); - read_unlock_bh(&table->tb6_lock); if (res > 0) { cb->args[4] = 1; cb->args[5] = w->root->fn_sernum; } + read_unlock_bh(&table->tb6_lock); } else { + read_lock_bh(&table->tb6_lock); if (cb->args[5] != w->root->fn_sernum) { /* Begin at the root if the tree changed */ cb->args[5] = w->root->fn_sernum; @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, } else w->skip = 0; - read_lock_bh(&table->tb6_lock); res = fib6_walk_continue(w); read_unlock_bh(&table->tb6_lock); if (res <= 0) { Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-08 21:27 ` Ben Greear @ 2017-06-09 5:55 ` Cong Wang 2017-06-09 13:27 ` David Ahern 0 siblings, 1 reply; 25+ messages in thread From: Cong Wang @ 2017-06-09 5:55 UTC (permalink / raw) To: Ben Greear; +Cc: Eric Dumazet, David Ahern, netdev On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> wrote: > > As far as I can tell, the patch did not help, or at least we still reproduce > the > crash easily. netlink dump is serialized by nlk->cb_mutex so I don't think that patch makes any sense w.r.t race condition. > (gdb) l *(fib6_walk_continue+0x76) > 0x188c6 is in fib6_walk_continue > (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). > 1588 if (fn == w->root) > 1589 return 0; > 1590 pn = fn->parent; > 1591 w->node = pn; > 1592 #ifdef CONFIG_IPV6_SUBTREES > 1593 if (FIB6_SUBTREE(pn) == fn) { Apparently fn->parent is NULL here for some reason, but I don't know if that is expected or not. If a simple NULL check is not enough here, we have to trace why it is NULL. > 1594 WARN_ON(!(fn->fn_flags & RTN_ROOT)); > 1595 w->state = FWS_L; > 1596 continue; > 1597 } > (gdb) l *(inet6_dump_fib+0x1ab) > 0x1939b is in inet6_dump_fib > (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392). > 387 w->skip = w->count; > 388 } else > 389 w->skip = 0; > 390 > 391 res = fib6_walk_continue(w); > 392 read_unlock_bh(&table->tb6_lock); > 393 if (res <= 0) { > 394 fib6_walker_unlink(net, w); > 395 cb->args[4] = 0; > 396 } > (gdb) Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-09 5:55 ` Cong Wang @ 2017-06-09 13:27 ` David Ahern 2017-06-09 21:25 ` Eric Dumazet 0 siblings, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-09 13:27 UTC (permalink / raw) To: Cong Wang, Ben Greear; +Cc: Eric Dumazet, netdev On 6/8/17 11:55 PM, Cong Wang wrote: > On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> wrote: >> >> As far as I can tell, the patch did not help, or at least we still reproduce >> the >> crash easily. > > netlink dump is serialized by nlk->cb_mutex so I don't think that > patch makes any sense w.r.t race condition. >From what I can see fn_sernum should be accessed under table lock, so when saving and checking it during a walk make sure it the lock is held. That has nothing to do with the netlink dump, but the table changing during a walk. >> (gdb) l *(fib6_walk_continue+0x76) >> 0x188c6 is in fib6_walk_continue >> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). >> 1588 if (fn == w->root) >> 1589 return 0; >> 1590 pn = fn->parent; >> 1591 w->node = pn; >> 1592 #ifdef CONFIG_IPV6_SUBTREES >> 1593 if (FIB6_SUBTREE(pn) == fn) { > > Apparently fn->parent is NULL here for some reason, but > I don't know if that is expected or not. If a simple NULL check > is not enough here, we have to trace why it is NULL. >From my understanding, parent should not be null hence the attempts to fix access to table nodes under a lock. ie., figuring out why it is null here. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-09 13:27 ` David Ahern @ 2017-06-09 21:25 ` Eric Dumazet 2017-06-13 20:16 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: Eric Dumazet @ 2017-06-09 21:25 UTC (permalink / raw) To: David Ahern; +Cc: Cong Wang, Ben Greear, netdev On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: > On 6/8/17 11:55 PM, Cong Wang wrote: > > On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> wrote: > >> > >> As far as I can tell, the patch did not help, or at least we still reproduce > >> the > >> crash easily. > > > > netlink dump is serialized by nlk->cb_mutex so I don't think that > > patch makes any sense w.r.t race condition. > > From what I can see fn_sernum should be accessed under table lock, so > when saving and checking it during a walk make sure it the lock is held. > That has nothing to do with the netlink dump, but the table changing > during a walk. Yes, your patch makes total sense, of course. > > > >> (gdb) l *(fib6_walk_continue+0x76) > >> 0x188c6 is in fib6_walk_continue > >> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). > >> 1588 if (fn == w->root) > >> 1589 return 0; > >> 1590 pn = fn->parent; > >> 1591 w->node = pn; > >> 1592 #ifdef CONFIG_IPV6_SUBTREES > >> 1593 if (FIB6_SUBTREE(pn) == fn) { > > > > Apparently fn->parent is NULL here for some reason, but > > I don't know if that is expected or not. If a simple NULL check > > is not enough here, we have to trace why it is NULL. > > From my understanding, parent should not be null hence the attempts to > fix access to table nodes under a lock. ie., figuring out why it is null > here. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-09 21:25 ` Eric Dumazet @ 2017-06-13 20:16 ` Ben Greear 2017-06-13 20:28 ` David Ahern 2017-06-13 21:42 ` Cong Wang 0 siblings, 2 replies; 25+ messages in thread From: Ben Greear @ 2017-06-13 20:16 UTC (permalink / raw) To: Eric Dumazet, David Ahern; +Cc: Cong Wang, netdev On 06/09/2017 02:25 PM, Eric Dumazet wrote: > On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: >> On 6/8/17 11:55 PM, Cong Wang wrote: >>> On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> wrote: >>>> >>>> As far as I can tell, the patch did not help, or at least we still reproduce >>>> the >>>> crash easily. >>> >>> netlink dump is serialized by nlk->cb_mutex so I don't think that >>> patch makes any sense w.r.t race condition. >> >> From what I can see fn_sernum should be accessed under table lock, so >> when saving and checking it during a walk make sure it the lock is held. >> That has nothing to do with the netlink dump, but the table changing >> during a walk. > > > Yes, your patch makes total sense, of course. I guess someone should go ahead and make an official patch and submit it, even if it doesn't fix my problem. >>>> (gdb) l *(fib6_walk_continue+0x76) >>>> 0x188c6 is in fib6_walk_continue >>>> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). >>>> 1588 if (fn == w->root) >>>> 1589 return 0; >>>> 1590 pn = fn->parent; >>>> 1591 w->node = pn; >>>> 1592 #ifdef CONFIG_IPV6_SUBTREES >>>> 1593 if (FIB6_SUBTREE(pn) == fn) { >>> >>> Apparently fn->parent is NULL here for some reason, but >>> I don't know if that is expected or not. If a simple NULL check >>> is not enough here, we have to trace why it is NULL. >> >> From my understanding, parent should not be null hence the attempts to >> fix access to table nodes under a lock. ie., figuring out why it is null >> here. If someone has more suggestions, I'll be happy to test. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-13 20:16 ` Ben Greear @ 2017-06-13 20:28 ` David Ahern 2017-06-13 20:39 ` Ben Greear 2017-06-13 21:42 ` Cong Wang 1 sibling, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-13 20:28 UTC (permalink / raw) To: Ben Greear, Eric Dumazet; +Cc: Cong Wang, netdev On 6/13/17 2:16 PM, Ben Greear wrote: > On 06/09/2017 02:25 PM, Eric Dumazet wrote: >> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: >>> On 6/8/17 11:55 PM, Cong Wang wrote: >>>> On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> >>>> wrote: >>>>> >>>>> As far as I can tell, the patch did not help, or at least we still >>>>> reproduce >>>>> the >>>>> crash easily. >>>> >>>> netlink dump is serialized by nlk->cb_mutex so I don't think that >>>> patch makes any sense w.r.t race condition. >>> >>> From what I can see fn_sernum should be accessed under table lock, so >>> when saving and checking it during a walk make sure it the lock is held. >>> That has nothing to do with the netlink dump, but the table changing >>> during a walk. >> >> >> Yes, your patch makes total sense, of course. > > I guess someone should go ahead and make an official patch and > submit it, even if it doesn't fix my problem. I can do that; was hoping to root cause the problem first. > >>>>> (gdb) l *(fib6_walk_continue+0x76) >>>>> 0x188c6 is in fib6_walk_continue >>>>> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). >>>>> 1588 if (fn == w->root) >>>>> 1589 return 0; >>>>> 1590 pn = fn->parent; >>>>> 1591 w->node = pn; >>>>> 1592 #ifdef CONFIG_IPV6_SUBTREES >>>>> 1593 if (FIB6_SUBTREE(pn) == fn) { >>>> >>>> Apparently fn->parent is NULL here for some reason, but >>>> I don't know if that is expected or not. If a simple NULL check >>>> is not enough here, we have to trace why it is NULL. >>> >>> From my understanding, parent should not be null hence the attempts to >>> fix access to table nodes under a lock. ie., figuring out why it is null >>> here. > > If someone has more suggestions, I'll be happy to test. I have looked at the code again and nothing is jumping out. Will look again later today. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-13 20:28 ` David Ahern @ 2017-06-13 20:39 ` Ben Greear 0 siblings, 0 replies; 25+ messages in thread From: Ben Greear @ 2017-06-13 20:39 UTC (permalink / raw) To: David Ahern, Eric Dumazet; +Cc: Cong Wang, netdev On 06/13/2017 01:28 PM, David Ahern wrote: > On 6/13/17 2:16 PM, Ben Greear wrote: >> On 06/09/2017 02:25 PM, Eric Dumazet wrote: >>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: >>>> On 6/8/17 11:55 PM, Cong Wang wrote: >>>>> On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <greearb@candelatech.com> >>>>> wrote: >>>>>> >>>>>> As far as I can tell, the patch did not help, or at least we still >>>>>> reproduce >>>>>> the >>>>>> crash easily. >>>>> >>>>> netlink dump is serialized by nlk->cb_mutex so I don't think that >>>>> patch makes any sense w.r.t race condition. >>>> >>>> From what I can see fn_sernum should be accessed under table lock, so >>>> when saving and checking it during a walk make sure it the lock is held. >>>> That has nothing to do with the netlink dump, but the table changing >>>> during a walk. >>> >>> >>> Yes, your patch makes total sense, of course. >> >> I guess someone should go ahead and make an official patch and >> submit it, even if it doesn't fix my problem. > > I can do that; was hoping to root cause the problem first. > > >> >>>>>> (gdb) l *(fib6_walk_continue+0x76) >>>>>> 0x188c6 is in fib6_walk_continue >>>>>> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593). >>>>>> 1588 if (fn == w->root) >>>>>> 1589 return 0; >>>>>> 1590 pn = fn->parent; >>>>>> 1591 w->node = pn; >>>>>> 1592 #ifdef CONFIG_IPV6_SUBTREES >>>>>> 1593 if (FIB6_SUBTREE(pn) == fn) { >>>>> >>>>> Apparently fn->parent is NULL here for some reason, but >>>>> I don't know if that is expected or not. If a simple NULL check >>>>> is not enough here, we have to trace why it is NULL. >>>> >>>> From my understanding, parent should not be null hence the attempts to >>>> fix access to table nodes under a lock. ie., figuring out why it is null >>>> here. >> >> If someone has more suggestions, I'll be happy to test. > > I have looked at the code again and nothing is jumping out. Will look > again later today. > I noticed there is some code to help fix up the walkers when nodes are deleted. They use lock: read_lock(&net->ipv6.fib6_walker_lock); The code you were tweaking uses a different lock: read_lock_bh(&table->tb6_lock); In is certainly not simple code, so I don't know if that is correct or not, but might possibly be a place to start looking. I'm going to re-test with a WARN_ON to see if that triggers since previous suggestion was that f->parent was NULL. diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 51cd637..86295df 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1571,6 +1571,10 @@ static int fib6_walk_continue(struct fib6_walker *w) case FWS_U: if (fn == w->root) return 0; + if (!fn->parent) { + WARN_ON_ONCE(0); + return 0; + } pn = fn->parent; w->node = pn; #ifdef CONFIG_IPV6_SUBTREES Thanks, Ben Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-13 20:16 ` Ben Greear 2017-06-13 20:28 ` David Ahern @ 2017-06-13 21:42 ` Cong Wang 2017-06-14 2:27 ` David Ahern 1 sibling, 1 reply; 25+ messages in thread From: Cong Wang @ 2017-06-13 21:42 UTC (permalink / raw) To: Ben Greear; +Cc: Eric Dumazet, David Ahern, netdev On Tue, Jun 13, 2017 at 1:16 PM, Ben Greear <greearb@candelatech.com> wrote: > On 06/09/2017 02:25 PM, Eric Dumazet wrote: >> >> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: >>> >>> On 6/8/17 11:55 PM, Cong Wang wrote: >>>> Apparently fn->parent is NULL here for some reason, but >>>> I don't know if that is expected or not. If a simple NULL check >>>> is not enough here, we have to trace why it is NULL. >>> >>> >>> From my understanding, parent should not be null hence the attempts to >>> fix access to table nodes under a lock. ie., figuring out why it is null >>> here. > > > If someone has more suggestions, I'll be happy to test. Can you enable RT6_TRACE() by changing RT6_DEBUG from 2 to 3? We may collect some useful log with it. diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index d4bf2c6..1941595 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -37,7 +37,7 @@ #include <net/ip6_fib.h> #include <net/ip6_route.h> -#define RT6_DEBUG 2 +#define RT6_DEBUG 3 #if RT6_DEBUG >= 3 #define RT6_TRACE(x...) pr_debug(x) ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-13 21:42 ` Cong Wang @ 2017-06-14 2:27 ` David Ahern 2017-06-14 22:23 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-14 2:27 UTC (permalink / raw) To: Cong Wang, Ben Greear; +Cc: Eric Dumazet, netdev [-- Attachment #1: Type: text/plain, Size: 1198 bytes --] On 6/13/17 3:42 PM, Cong Wang wrote: > On Tue, Jun 13, 2017 at 1:16 PM, Ben Greear <greearb@candelatech.com> wrote: >> On 06/09/2017 02:25 PM, Eric Dumazet wrote: >>> >>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote: >>>> >>>> On 6/8/17 11:55 PM, Cong Wang wrote: >>>>> Apparently fn->parent is NULL here for some reason, but >>>>> I don't know if that is expected or not. If a simple NULL check >>>>> is not enough here, we have to trace why it is NULL. >>>> >>>> >>>> From my understanding, parent should not be null hence the attempts to >>>> fix access to table nodes under a lock. ie., figuring out why it is null >>>> here. >> >> >> If someone has more suggestions, I'll be happy to test. > > Can you enable RT6_TRACE() by changing RT6_DEBUG > from 2 to 3? We may collect some useful log with it. > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index d4bf2c6..1941595 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -37,7 +37,7 @@ > #include <net/ip6_fib.h> > #include <net/ip6_route.h> > > -#define RT6_DEBUG 2 > +#define RT6_DEBUG 3 > > #if RT6_DEBUG >= 3 > #define RT6_TRACE(x...) pr_debug(x) > Let's try a targeted debug patch. See attached [-- Attachment #2: fib6-walk-debug.patch --] [-- Type: text/plain, Size: 1145 bytes --] diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index deea901746c8..367f1284f05b 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1396,6 +1396,8 @@ static struct fib6_node *fib6_repair_tree(struct net *net, RT6_TRACE("W %p adjusted by delnode 2, s=%d\n", w, w->state); w->state = w->state >= FWS_C ? FWS_U : FWS_INIT; } + if (w->state == FWS_U) + pr_warn("fib6_repair_tree: W %p adjusted by delnode 2, state FWS_U\n", w, w->state); } } } @@ -1447,8 +1449,10 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp, if (w->state == FWS_C && w->leaf == rt) { RT6_TRACE("walker %p adjusted by delroute\n", w); w->leaf = rt->dst.rt6_next; - if (!w->leaf) + if (!w->leaf) { + pr_warn("fib6_del_route: walker %p adjusted by delroute - state FWS_U\n", w); w->state = FWS_U; + } } } read_unlock(&net->ipv6.fib6_walker_lock); @@ -1591,6 +1595,7 @@ static int fib6_walk_continue(struct fib6_walker *w) continue; } skip: + pr_warn("fib6_walk_continue: set state to FWS_U\n"); w->state = FWS_U; case FWS_U: if (fn == w->root) ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-14 2:27 ` David Ahern @ 2017-06-14 22:23 ` Ben Greear 2017-06-14 22:25 ` David Ahern 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2017-06-14 22:23 UTC (permalink / raw) To: David Ahern, Cong Wang; +Cc: Eric Dumazet, netdev On 06/13/2017 07:27 PM, David Ahern wrote: > Let's try a targeted debug patch. See attached I had to change it to pr_err so it would go to our serial console since the system locked hard on crash, and that appears to be enough to change the timing where we can no longer reproduce the problem. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-14 22:23 ` Ben Greear @ 2017-06-14 22:25 ` David Ahern 2017-06-20 14:12 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: David Ahern @ 2017-06-14 22:25 UTC (permalink / raw) To: Ben Greear, Cong Wang; +Cc: Eric Dumazet, netdev On 6/14/17 4:23 PM, Ben Greear wrote: > On 06/13/2017 07:27 PM, David Ahern wrote: > >> Let's try a targeted debug patch. See attached > > I had to change it to pr_err so it would go to our serial console > since the system locked hard on crash, > and that appears to be enough to change the timing where we can no longer > reproduce the problem. ok, let's figure out which one is doing that. There are 3 debug statements. I suspect fib6_del_route is the one setting the state to FWS_U. Can you remove the debug prints in fib6_repair_tree and fib6_walk_continue and try again? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-14 22:25 ` David Ahern @ 2017-06-20 14:12 ` Ben Greear 2017-06-20 18:05 ` Michal Kubecek 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2017-06-20 14:12 UTC (permalink / raw) To: David Ahern, Cong Wang; +Cc: Eric Dumazet, netdev On 06/14/2017 03:25 PM, David Ahern wrote: > On 6/14/17 4:23 PM, Ben Greear wrote: >> On 06/13/2017 07:27 PM, David Ahern wrote: >> >>> Let's try a targeted debug patch. See attached >> >> I had to change it to pr_err so it would go to our serial console >> since the system locked hard on crash, >> and that appears to be enough to change the timing where we can no longer >> reproduce the problem. > > > ok, let's figure out which one is doing that. There are 3 debug > statements. I suspect fib6_del_route is the one setting the state to > FWS_U. Can you remove the debug prints in fib6_repair_tree and > fib6_walk_continue and try again? We cannot reproduce with just that one printf in the kernel either. It must change the timing too much to trigger the bug. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-20 14:12 ` Ben Greear @ 2017-06-20 18:05 ` Michal Kubecek 2017-06-20 21:41 ` Ben Greear 0 siblings, 1 reply; 25+ messages in thread From: Michal Kubecek @ 2017-06-20 18:05 UTC (permalink / raw) To: Ben Greear; +Cc: David Ahern, Cong Wang, Eric Dumazet, netdev On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: > On 06/14/2017 03:25 PM, David Ahern wrote: > >On 6/14/17 4:23 PM, Ben Greear wrote: > >>On 06/13/2017 07:27 PM, David Ahern wrote: > >> > >>>Let's try a targeted debug patch. See attached > >> > >>I had to change it to pr_err so it would go to our serial console > >>since the system locked hard on crash, > >>and that appears to be enough to change the timing where we can no longer > >>reproduce the problem. > > > > > >ok, let's figure out which one is doing that. There are 3 debug > >statements. I suspect fib6_del_route is the one setting the state to > >FWS_U. Can you remove the debug prints in fib6_repair_tree and > >fib6_walk_continue and try again? > > We cannot reproduce with just that one printf in the kernel either. It > must change the timing too much to trigger the bug. You might try trace_printk() which should have less impact (don't forget to enable /proc/sys/kernel/ftrace_dump_on_oops). Michal Kubecek ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-20 18:05 ` Michal Kubecek @ 2017-06-20 21:41 ` Ben Greear 2017-06-21 3:03 ` David Ahern 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2017-06-20 21:41 UTC (permalink / raw) To: Michal Kubecek; +Cc: David Ahern, Cong Wang, Eric Dumazet, netdev On 06/20/2017 11:05 AM, Michal Kubecek wrote: > On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: >> On 06/14/2017 03:25 PM, David Ahern wrote: >>> On 6/14/17 4:23 PM, Ben Greear wrote: >>>> On 06/13/2017 07:27 PM, David Ahern wrote: >>>> >>>>> Let's try a targeted debug patch. See attached >>>> >>>> I had to change it to pr_err so it would go to our serial console >>>> since the system locked hard on crash, >>>> and that appears to be enough to change the timing where we can no longer >>>> reproduce the problem. >>> >>> >>> ok, let's figure out which one is doing that. There are 3 debug >>> statements. I suspect fib6_del_route is the one setting the state to >>> FWS_U. Can you remove the debug prints in fib6_repair_tree and >>> fib6_walk_continue and try again? >> >> We cannot reproduce with just that one printf in the kernel either. It >> must change the timing too much to trigger the bug. > > You might try trace_printk() which should have less impact (don't forget > to enable /proc/sys/kernel/ftrace_dump_on_oops). We cannot reproduce with trace_printk() either. Thanks, Ben > > Michal Kubecek > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-20 21:41 ` Ben Greear @ 2017-06-21 3:03 ` David Ahern 2017-06-25 21:59 ` David Ahern 2018-01-24 23:59 ` Ben Greear 0 siblings, 2 replies; 25+ messages in thread From: David Ahern @ 2017-06-21 3:03 UTC (permalink / raw) To: Ben Greear, Michal Kubecek; +Cc: Cong Wang, Eric Dumazet, netdev On 6/20/17 5:41 PM, Ben Greear wrote: > On 06/20/2017 11:05 AM, Michal Kubecek wrote: >> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: >>> On 06/14/2017 03:25 PM, David Ahern wrote: >>>> On 6/14/17 4:23 PM, Ben Greear wrote: >>>>> On 06/13/2017 07:27 PM, David Ahern wrote: >>>>> >>>>>> Let's try a targeted debug patch. See attached >>>>> >>>>> I had to change it to pr_err so it would go to our serial console >>>>> since the system locked hard on crash, >>>>> and that appears to be enough to change the timing where we can no >>>>> longer >>>>> reproduce the problem. >>>> >>>> >>>> ok, let's figure out which one is doing that. There are 3 debug >>>> statements. I suspect fib6_del_route is the one setting the state to >>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and >>>> fib6_walk_continue and try again? >>> >>> We cannot reproduce with just that one printf in the kernel either. It >>> must change the timing too much to trigger the bug. >> >> You might try trace_printk() which should have less impact (don't forget >> to enable /proc/sys/kernel/ftrace_dump_on_oops). > > We cannot reproduce with trace_printk() either. I think that suggests the walker state is set to FWS_U in fib6_del_route, and it is the FWS_U case in fib6_walk_continue that triggers the fault -- the null parent (pn = fn->parent). So we have the 2 areas of code that are interacting. I'm on a road trip through the end of this week with little time to focus on this problem. I'll get back to you another suggestion when I can. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-21 3:03 ` David Ahern @ 2017-06-25 21:59 ` David Ahern 2018-01-24 23:59 ` Ben Greear 1 sibling, 0 replies; 25+ messages in thread From: David Ahern @ 2017-06-25 21:59 UTC (permalink / raw) To: Ben Greear, Michal Kubecek; +Cc: Cong Wang, Eric Dumazet, netdev [-- Attachment #1: Type: text/plain, Size: 1754 bytes --] On 6/20/17 9:03 PM, David Ahern wrote: > On 6/20/17 5:41 PM, Ben Greear wrote: >> On 06/20/2017 11:05 AM, Michal Kubecek wrote: >>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: >>>> On 06/14/2017 03:25 PM, David Ahern wrote: >>>>> On 6/14/17 4:23 PM, Ben Greear wrote: >>>>>> On 06/13/2017 07:27 PM, David Ahern wrote: >>>>>> >>>>>>> Let's try a targeted debug patch. See attached >>>>>> >>>>>> I had to change it to pr_err so it would go to our serial console >>>>>> since the system locked hard on crash, >>>>>> and that appears to be enough to change the timing where we can no >>>>>> longer >>>>>> reproduce the problem. >>>>> >>>>> >>>>> ok, let's figure out which one is doing that. There are 3 debug >>>>> statements. I suspect fib6_del_route is the one setting the state to >>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and >>>>> fib6_walk_continue and try again? >>>> >>>> We cannot reproduce with just that one printf in the kernel either. It >>>> must change the timing too much to trigger the bug. >>> >>> You might try trace_printk() which should have less impact (don't forget >>> to enable /proc/sys/kernel/ftrace_dump_on_oops). >> >> We cannot reproduce with trace_printk() either. > > I think that suggests the walker state is set to FWS_U in > fib6_del_route, and it is the FWS_U case in fib6_walk_continue that > triggers the fault -- the null parent (pn = fn->parent). So we have the > 2 areas of code that are interacting. > > I'm on a road trip through the end of this week with little time to > focus on this problem. I'll get back to you another suggestion when I can. > Remove all other changes and try the attached. The fault should not happen so you need to watch dmesg / console output. [-- Attachment #2: ip6-walker-panic.patch --] [-- Type: text/plain, Size: 2063 bytes --] diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index deea901746c8..245941e9db8a 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, read_lock_bh(&table->tb6_lock); res = fib6_walk(net, w); - read_unlock_bh(&table->tb6_lock); if (res > 0) { cb->args[4] = 1; cb->args[5] = w->root->fn_sernum; } + read_unlock_bh(&table->tb6_lock); } else { + read_lock_bh(&table->tb6_lock); if (cb->args[5] != w->root->fn_sernum) { /* Begin at the root if the tree changed */ cb->args[5] = w->root->fn_sernum; @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, } else w->skip = 0; - read_lock_bh(&table->tb6_lock); res = fib6_walk_continue(w); read_unlock_bh(&table->tb6_lock); if (res <= 0) { @@ -1422,7 +1422,6 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp, /* Unlink it */ *rtp = rt->dst.rt6_next; - rt->rt6i_node = NULL; net->ipv6.rt6_stats->fib_rt_entries--; net->ipv6.rt6_stats->fib_discarded_routes++; @@ -1447,12 +1446,24 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp, if (w->state == FWS_C && w->leaf == rt) { RT6_TRACE("walker %p adjusted by delroute\n", w); w->leaf = rt->dst.rt6_next; - if (!w->leaf) - w->state = FWS_U; + if (!w->leaf) { + if (!w->node->parent) { +pr_warn("fib6_del_route: setting walker node to null while deleting route: " + "dst %pI6c/%d src %pI6c/%d dev %s siblings %d\n", + &rt->rt6i_dst.addr, rt->rt6i_dst.plen, &rt->rt6i_src.addr, rt->rt6i_src.plen, + rt->dst.dev ? rt->dst.dev->name : "<not set>", rt->rt6i_nsiblings); + +if (rt->rt6i_node == w->node) + pr_warn("fib6_del_route: walker node matches deleted route\n"); + w->node = NULL; + } else + w->state = FWS_U; + } } } read_unlock(&net->ipv6.fib6_walker_lock); + rt->rt6i_node = NULL; rt->dst.rt6_next = NULL; /* If it was last route, expunge its radix tree node */ ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-21 3:03 ` David Ahern 2017-06-25 21:59 ` David Ahern @ 2018-01-24 23:59 ` Ben Greear 2018-04-17 23:29 ` Ben Greear 1 sibling, 1 reply; 25+ messages in thread From: Ben Greear @ 2018-01-24 23:59 UTC (permalink / raw) To: David Ahern, Michal Kubecek; +Cc: Cong Wang, Eric Dumazet, netdev On 06/20/2017 08:03 PM, David Ahern wrote: > On 6/20/17 5:41 PM, Ben Greear wrote: >> On 06/20/2017 11:05 AM, Michal Kubecek wrote: >>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: >>>> On 06/14/2017 03:25 PM, David Ahern wrote: >>>>> On 6/14/17 4:23 PM, Ben Greear wrote: >>>>>> On 06/13/2017 07:27 PM, David Ahern wrote: >>>>>> >>>>>>> Let's try a targeted debug patch. See attached >>>>>> >>>>>> I had to change it to pr_err so it would go to our serial console >>>>>> since the system locked hard on crash, >>>>>> and that appears to be enough to change the timing where we can no >>>>>> longer >>>>>> reproduce the problem. >>>>> >>>>> >>>>> ok, let's figure out which one is doing that. There are 3 debug >>>>> statements. I suspect fib6_del_route is the one setting the state to >>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and >>>>> fib6_walk_continue and try again? >>>> >>>> We cannot reproduce with just that one printf in the kernel either. It >>>> must change the timing too much to trigger the bug. >>> >>> You might try trace_printk() which should have less impact (don't forget >>> to enable /proc/sys/kernel/ftrace_dump_on_oops). >> >> We cannot reproduce with trace_printk() either. > > I think that suggests the walker state is set to FWS_U in > fib6_del_route, and it is the FWS_U case in fib6_walk_continue that > triggers the fault -- the null parent (pn = fn->parent). So we have the > 2 areas of code that are interacting. > > I'm on a road trip through the end of this week with little time to > focus on this problem. I'll get back to you another suggestion when I can. So, though I don't know the right way to fix it, the patch below appears to make the system not crash. diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 68b9cc7..bf19a14 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w) pn = fn->parent; w->node = pn; #ifdef CONFIG_IPV6_SUBTREES + if (WARN_ON_ONCE(!pn)) { + pr_err("FWS-U, w: %p fn: %p pn: %p\n", + w, fn, pn); + /* Attempt to work around crash that has been here forever. --Ben */ + return 0; + } if (FIB6_SUBTREE(pn) == fn) { WARN_ON(!(fn->fn_flags & RTN_ROOT)); w->state = FWS_L; The printout looks like this (when adding 4000 mac-vlans, so it is pretty rare). PN is definitely NULL sometimes: [root@2u-6n ~]# journalctl -f|grep FWS Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0 fn: ffff880856a09260 pn: (null) Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff8807e3963de0 fn: ffff880856a09260 pn: (null) Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff88081ac22de0 fn: ffff880856a09260 pn: (null) Jan 24 15:53:13 2u-6n kernel: IPv6: FWS-U, w: ffff8808290c69c0 fn: ffff8807e369f920 pn: (null) Jan 24 15:53:24 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea3156c0 fn: ffff88082d1eeb60 pn: (null) 8066 Jan 24 15:48:04 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1006 8067 Jan 24 15:48:05 2u-6n kernel: ------------[ cut here ]------------ 8068 Jan 24 15:48:05 2u-6n kernel: WARNING: CPU: 5 PID: 3346 at /home/greearb/git/linux-4.13.dev.y/net/ipv6/ip6_fib.c:1617 fib6_walk_continue+ 0x154/0x1b0 [ipv6] 8069 Jan 24 15:48:05 2u-6n kernel: Modules linked in: 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen ipmi_ssif coretemp intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm ath9k irqbypass iTCO_wdt ath9k_common iTCO_vendor_support ath9k_hw ath i2c_i801 mac80211 joydev lpc_ich cfg80211 ioatdma shpchp tpm_tis tpm_tis_core wmi tpm ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter nfsd auth_rpcgss nfs_acl sch_fq_codel lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_core ipv6 crc_ccitt 8070 Jan 24 15:48:05 2u-6n kernel: CPU: 5 PID: 3346 Comm: ip Tainted: G O 4.13.16+ #22 8071 Jan 24 15:48:05 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 8072 Jan 24 15:48:05 2u-6n kernel: task: ffff8807e9ef1dc0 task.stack: ffffc9002083c000 8073 Jan 24 15:48:05 2u-6n kernel: RIP: 0010:fib6_walk_continue+0x154/0x1b0 [ipv6] 8074 Jan 24 15:48:05 2u-6n kernel: RSP: 0018:ffffc9002083fbc0 EFLAGS: 00010246 8075 Jan 24 15:48:05 2u-6n kernel: RAX: 0000000000000000 RBX: ffff8807ea121ba0 RCX: 0000000000000000 8076 Jan 24 15:48:05 2u-6n kernel: RDX: ffff880856a09260 RSI: ffffc9002083fc00 RDI: ffffffff81ef2140 8077 Jan 24 15:48:05 2u-6n kernel: RBP: ffffc9002083fbc8 R08: 0000000000000008 R09: ffff8807e36f6b25 8078 Jan 24 15:48:05 2u-6n kernel: R10: ffffc9002083fb70 R11: 0000000000000000 R12: 0000000000000002 8079 Jan 24 15:48:05 2u-6n kernel: R13: 0000000000000002 R14: ffff8807ea121ba0 R15: ffff8807ebcc8d80 8080 Jan 24 15:48:05 2u-6n kernel: FS: 00007f77a5d0f700(0000) GS:ffff88087fd40000(0000) knlGS:0000000000000000 8081 Jan 24 15:48:05 2u-6n kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 8082 Jan 24 15:48:05 2u-6n kernel: CR2: 0000000003d56c88 CR3: 00000007f3106000 CR4: 00000000003406e0 8083 Jan 24 15:48:05 2u-6n kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 8084 Jan 24 15:48:05 2u-6n kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 8085 Jan 24 15:48:05 2u-6n kernel: Call Trace: 8086 Jan 24 15:48:05 2u-6n kernel: inet6_dump_fib+0x1ab/0x2a0 [ipv6] 8087 Jan 24 15:48:05 2u-6n kernel: netlink_dump+0x22c/0x2b0 8088 Jan 24 15:48:05 2u-6n kernel: netlink_recvmsg+0x260/0x3f0 8089 Jan 24 15:48:05 2u-6n kernel: sock_recvmsg+0x38/0x40 8090 Jan 24 15:48:05 2u-6n kernel: ___sys_recvmsg+0xe9/0x230 8091 Jan 24 15:48:05 2u-6n kernel: ? alloc_pages_vma+0x83/0x1e0 8092 Jan 24 15:48:05 2u-6n kernel: ? page_add_new_anon_rmap+0x88/0xc0 8093 Jan 24 15:48:05 2u-6n kernel: ? lru_cache_add_active_or_unevictable+0x31/0xb0 8094 Jan 24 15:48:05 2u-6n kernel: ? __handle_mm_fault+0x5e5/0xfa0 8095 Jan 24 15:48:05 2u-6n kernel: __sys_recvmsg+0x3d/0x70 8096 Jan 24 15:48:05 2u-6n kernel: ? __sys_recvmsg+0x3d/0x70 8097 Jan 24 15:48:05 2u-6n kernel: SyS_recvmsg+0xd/0x20 8098 Jan 24 15:48:05 2u-6n kernel: do_syscall_64+0x56/0xc0 8099 Jan 24 15:48:05 2u-6n kernel: entry_SYSCALL64_slow_path+0x25/0x25 8100 Jan 24 15:48:05 2u-6n kernel: RIP: 0033:0x7f77a5644030 8101 Jan 24 15:48:05 2u-6n kernel: RSP: 002b:00007ffc3e783e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002f 8102 Jan 24 15:48:05 2u-6n kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77a5644030 8103 Jan 24 15:48:05 2u-6n kernel: RDX: 0000000000000000 RSI: 00007ffc3e783ed0 RDI: 0000000000000004 8104 Jan 24 15:48:05 2u-6n kernel: RBP: 00007ffc3e787ef4 R08: 0000000000003fe4 R09: 0 8105 Jan 24 15:48:05 2u-6n kernel: R10: 00007ffc3e783f10 R11: 0000000000000246 R12: 000000000064f360 8106 Jan 24 15:48:05 2u-6n kernel: R13: 00007ffc3e787f60 R14: 0000000000003fe4 R15: 0000000000000000 8107 Jan 24 15:48:05 2u-6n kernel: Code: ff 24 c5 a8 e5 04 a0 f6 42 2a 02 74 68 c7 43 28 01 00 00 00 48 89 c2 e9 c7 fe ff ff c7 43 28 02 00 00 00 48 89 c2 e9 b8 fe ff ff <0f> ff 31 c9 48 89 de 48 c7 c7 78 36 05 a0 e8 65 e4 14 e1 31 c0 8108 Jan 24 15:48:05 2u-6n kernel: ---[ end trace 1d1c7028c9dec459 ]--- 8109 Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0 fn: ffff880856a09260 pn: (null) 8110 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1008 8111 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1009 .... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2018-01-24 23:59 ` Ben Greear @ 2018-04-17 23:29 ` Ben Greear 2018-04-18 0:38 ` David Ahern 0 siblings, 1 reply; 25+ messages in thread From: Ben Greear @ 2018-04-17 23:29 UTC (permalink / raw) To: David Ahern, Michal Kubecek; +Cc: Cong Wang, Eric Dumazet, netdev On 01/24/2018 03:59 PM, Ben Greear wrote: > On 06/20/2017 08:03 PM, David Ahern wrote: >> On 6/20/17 5:41 PM, Ben Greear wrote: >>> On 06/20/2017 11:05 AM, Michal Kubecek wrote: >>>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote: >>>>> On 06/14/2017 03:25 PM, David Ahern wrote: >>>>>> On 6/14/17 4:23 PM, Ben Greear wrote: >>>>>>> On 06/13/2017 07:27 PM, David Ahern wrote: >>>>>>> >>>>>>>> Let's try a targeted debug patch. See attached >>>>>>> >>>>>>> I had to change it to pr_err so it would go to our serial console >>>>>>> since the system locked hard on crash, >>>>>>> and that appears to be enough to change the timing where we can no >>>>>>> longer >>>>>>> reproduce the problem. >>>>>> >>>>>> >>>>>> ok, let's figure out which one is doing that. There are 3 debug >>>>>> statements. I suspect fib6_del_route is the one setting the state to >>>>>> FWS_U. Can you remove the debug prints in fib6_repair_tree and >>>>>> fib6_walk_continue and try again? >>>>> >>>>> We cannot reproduce with just that one printf in the kernel either. It >>>>> must change the timing too much to trigger the bug. >>>> >>>> You might try trace_printk() which should have less impact (don't forget >>>> to enable /proc/sys/kernel/ftrace_dump_on_oops). >>> >>> We cannot reproduce with trace_printk() either. >> >> I think that suggests the walker state is set to FWS_U in >> fib6_del_route, and it is the FWS_U case in fib6_walk_continue that >> triggers the fault -- the null parent (pn = fn->parent). So we have the >> 2 areas of code that are interacting. >> >> I'm on a road trip through the end of this week with little time to >> focus on this problem. I'll get back to you another suggestion when I can. FYI, problem still happens in 4.16. I'm going to re-enable my hack below for this kernel as well...I had hopes it might be fixed... BUG: unable to handle kernel NULL pointer dereference at 8 IP: fib6_walk_continue+0x5b/0x140 [ipv6] PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf] CPU: 3 PID: 15117 Comm: ip Tainted: G O 4.16.0+ #5 Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6] RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287 RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240 RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071 R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0 R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000 FS: 00007f2f04342700(0000) GS:ffff88087fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: inet6_dump_fib+0x14b/0x2c0 [ipv6] netlink_dump+0x216/0x2a0 netlink_recvmsg+0x254/0x400 ? copy_msghdr_from_user+0xb5/0x110 ___sys_recvmsg+0xe9/0x230 ? find_held_lock+0x3b/0xb0 ? __handle_mm_fault+0x617/0x1180 ? __audit_syscall_entry+0xb3/0x110 ? __sys_recvmsg+0x39/0x70 __sys_recvmsg+0x39/0x70 do_syscall_64+0x63/0x120 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7f2f03a72030 RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030 RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8 R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608 R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000 Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 89 53 2c c7 4 RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10 CR2: 0000000000000008 ---[ end trace bd03458864eb266c ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Rebooting in 10 seconds.. ACPI MEMORY or I/O RESET_REG. > > So, though I don't know the right way to fix it, the patch below appears > to make the system not crash. > > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 68b9cc7..bf19a14 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w) > pn = fn->parent; > w->node = pn; > #ifdef CONFIG_IPV6_SUBTREES > + if (WARN_ON_ONCE(!pn)) { > + pr_err("FWS-U, w: %p fn: %p pn: %p\n", > + w, fn, pn); > + /* Attempt to work around crash that has been here forever. --Ben */ > + return 0; > + } > if (FIB6_SUBTREE(pn) == fn) { > WARN_ON(!(fn->fn_flags & RTN_ROOT)); > w->state = FWS_L; > > > > The printout looks like this (when adding 4000 mac-vlans, so it is pretty rare). PN is definitely NULL sometimes: > > [root@2u-6n ~]# journalctl -f|grep FWS > Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0 fn: ffff880856a09260 pn: (null) > Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff8807e3963de0 fn: ffff880856a09260 pn: (null) > Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: ffff88081ac22de0 fn: ffff880856a09260 pn: (null) > Jan 24 15:53:13 2u-6n kernel: IPv6: FWS-U, w: ffff8808290c69c0 fn: ffff8807e369f920 pn: (null) > Jan 24 15:53:24 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea3156c0 fn: ffff88082d1eeb60 pn: (null) > > > > 8066 Jan 24 15:48:04 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1006 > 8067 Jan 24 15:48:05 2u-6n kernel: ------------[ cut here ]------------ > 8068 Jan 24 15:48:05 2u-6n kernel: WARNING: CPU: 5 PID: 3346 at /home/greearb/git/linux-4.13.dev.y/net/ipv6/ip6_fib.c:1617 fib6_walk_continue+ 0x154/0x1b0 [ipv6] > 8069 Jan 24 15:48:05 2u-6n kernel: Modules linked in: 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen ipmi_ssif coretemp intel_rapl sb_edac > x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm ath9k irqbypass iTCO_wdt ath9k_common iTCO_vendor_support ath9k_hw ath i2c_i801 mac80211 joydev > lpc_ich cfg80211 ioatdma shpchp tpm_tis tpm_tis_core wmi tpm ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter nfsd auth_rpcgss nfs_acl > sch_fq_codel lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_core ipv6 crc_ccitt > 8070 Jan 24 15:48:05 2u-6n kernel: CPU: 5 PID: 3346 Comm: ip Tainted: G O 4.13.16+ #22 > 8071 Jan 24 15:48:05 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 > 8072 Jan 24 15:48:05 2u-6n kernel: task: ffff8807e9ef1dc0 task.stack: ffffc9002083c000 > 8073 Jan 24 15:48:05 2u-6n kernel: RIP: 0010:fib6_walk_continue+0x154/0x1b0 [ipv6] > 8074 Jan 24 15:48:05 2u-6n kernel: RSP: 0018:ffffc9002083fbc0 EFLAGS: 00010246 > 8075 Jan 24 15:48:05 2u-6n kernel: RAX: 0000000000000000 RBX: ffff8807ea121ba0 RCX: 0000000000000000 > 8076 Jan 24 15:48:05 2u-6n kernel: RDX: ffff880856a09260 RSI: ffffc9002083fc00 RDI: ffffffff81ef2140 > 8077 Jan 24 15:48:05 2u-6n kernel: RBP: ffffc9002083fbc8 R08: 0000000000000008 R09: ffff8807e36f6b25 > 8078 Jan 24 15:48:05 2u-6n kernel: R10: ffffc9002083fb70 R11: 0000000000000000 R12: 0000000000000002 > 8079 Jan 24 15:48:05 2u-6n kernel: R13: 0000000000000002 R14: ffff8807ea121ba0 R15: ffff8807ebcc8d80 > 8080 Jan 24 15:48:05 2u-6n kernel: FS: 00007f77a5d0f700(0000) GS:ffff88087fd40000(0000) knlGS:0000000000000000 > 8081 Jan 24 15:48:05 2u-6n kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 8082 Jan 24 15:48:05 2u-6n kernel: CR2: 0000000003d56c88 CR3: 00000007f3106000 CR4: 00000000003406e0 > 8083 Jan 24 15:48:05 2u-6n kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > 8084 Jan 24 15:48:05 2u-6n kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > 8085 Jan 24 15:48:05 2u-6n kernel: Call Trace: > 8086 Jan 24 15:48:05 2u-6n kernel: inet6_dump_fib+0x1ab/0x2a0 [ipv6] > 8087 Jan 24 15:48:05 2u-6n kernel: netlink_dump+0x22c/0x2b0 > 8088 Jan 24 15:48:05 2u-6n kernel: netlink_recvmsg+0x260/0x3f0 > 8089 Jan 24 15:48:05 2u-6n kernel: sock_recvmsg+0x38/0x40 > 8090 Jan 24 15:48:05 2u-6n kernel: ___sys_recvmsg+0xe9/0x230 > 8091 Jan 24 15:48:05 2u-6n kernel: ? alloc_pages_vma+0x83/0x1e0 > 8092 Jan 24 15:48:05 2u-6n kernel: ? page_add_new_anon_rmap+0x88/0xc0 > 8093 Jan 24 15:48:05 2u-6n kernel: ? lru_cache_add_active_or_unevictable+0x31/0xb0 > 8094 Jan 24 15:48:05 2u-6n kernel: ? __handle_mm_fault+0x5e5/0xfa0 > 8095 Jan 24 15:48:05 2u-6n kernel: __sys_recvmsg+0x3d/0x70 > 8096 Jan 24 15:48:05 2u-6n kernel: ? __sys_recvmsg+0x3d/0x70 > 8097 Jan 24 15:48:05 2u-6n kernel: SyS_recvmsg+0xd/0x20 > 8098 Jan 24 15:48:05 2u-6n kernel: do_syscall_64+0x56/0xc0 > 8099 Jan 24 15:48:05 2u-6n kernel: entry_SYSCALL64_slow_path+0x25/0x25 > 8100 Jan 24 15:48:05 2u-6n kernel: RIP: 0033:0x7f77a5644030 > 8101 Jan 24 15:48:05 2u-6n kernel: RSP: 002b:00007ffc3e783e68 EFLAGS: 00000246 ORIG_RAX: 000000000000002f > 8102 Jan 24 15:48:05 2u-6n kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f77a5644030 > 8103 Jan 24 15:48:05 2u-6n kernel: RDX: 0000000000000000 RSI: 00007ffc3e783ed0 RDI: 0000000000000004 > 8104 Jan 24 15:48:05 2u-6n kernel: RBP: 00007ffc3e787ef4 R08: 0000000000003fe4 R09: 0 > 8105 Jan 24 15:48:05 2u-6n kernel: R10: 00007ffc3e783f10 R11: 0000000000000246 R12: 000000000064f360 > 8106 Jan 24 15:48:05 2u-6n kernel: R13: 00007ffc3e787f60 R14: 0000000000003fe4 R15: 0000000000000000 > 8107 Jan 24 15:48:05 2u-6n kernel: Code: ff 24 c5 a8 e5 04 a0 f6 42 2a 02 74 68 c7 43 28 01 00 00 00 48 89 c2 e9 c7 fe ff ff c7 43 28 02 00 00 00 48 89 > c2 e9 b8 fe ff ff <0f> ff 31 c9 48 89 de 48 c7 c7 78 36 05 a0 e8 65 e4 14 e1 31 c0 > 8108 Jan 24 15:48:05 2u-6n kernel: ---[ end trace 1d1c7028c9dec459 ]--- > 8109 Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: ffff8807ea121ba0 fn: ffff880856a09260 pn: (null) > 8110 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1008 > 8111 Jan 24 15:48:05 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device eth2#1009 > .... > > Thanks, > Ben > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2018-04-17 23:29 ` Ben Greear @ 2018-04-18 0:38 ` David Ahern 0 siblings, 0 replies; 25+ messages in thread From: David Ahern @ 2018-04-18 0:38 UTC (permalink / raw) To: Ben Greear, Michal Kubecek; +Cc: Cong Wang, Eric Dumazet, netdev On 4/17/18 5:29 PM, Ben Greear wrote: > > FYI, problem still happens in 4.16. I'm going to re-enable my hack below > for this kernel as well...I had hopes it might be fixed... Interesting. I was hoping the same. > > BUG: unable to handle kernel NULL pointer dereference at 8 > IP: fib6_walk_continue+0x5b/0x140 [ipv6] > PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0 > Oops: 0000 [#1] PREEMPT SMP PTI > Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink > nf_defrag_ipv4 libcrc32c vrf] > CPU: 3 PID: 15117 Comm: ip Tainted: G O 4.16.0+ #5 > Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b > 05/02/2017 > RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6] > RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287 > RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240 > RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071 > R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0 > R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000 > FS: 00007f2f04342700(0000) GS:ffff88087fcc0000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > inet6_dump_fib+0x14b/0x2c0 [ipv6] > netlink_dump+0x216/0x2a0 > netlink_recvmsg+0x254/0x400 > ? copy_msghdr_from_user+0xb5/0x110 > ___sys_recvmsg+0xe9/0x230 > ? find_held_lock+0x3b/0xb0 > ? __handle_mm_fault+0x617/0x1180 > ? __audit_syscall_entry+0xb3/0x110 > ? __sys_recvmsg+0x39/0x70 > __sys_recvmsg+0x39/0x70 > do_syscall_64+0x63/0x120 > entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > RIP: 0033:0x7f2f03a72030 > RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f > RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030 > RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004 > RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8 > R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608 > R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000 > Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 > ea 01 89 53 2c c7 4 > RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10 > CR2: 0000000000000008 > ---[ end trace bd03458864eb266c ]--- > Kernel panic - not syncing: Fatal exception in interrupt > Kernel Offset: disabled > Rebooting in 10 seconds.. > ACPI MEMORY or I/O RESET_REG. > Since you can reproduce, would you mind trying https://github.com/dsahern/linux.git ipv6/fib6-change-v2 Hopefully these will be committed upstream soon. It changes the game a bit with the FIB walker. Would be interesting to know if this problem goes away. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ 2017-06-07 0:27 ` Eric Dumazet 2017-06-07 0:34 ` David Ahern @ 2017-06-07 0:48 ` Ben Greear 1 sibling, 0 replies; 25+ messages in thread From: Ben Greear @ 2017-06-07 0:48 UTC (permalink / raw) To: Eric Dumazet, David Ahern; +Cc: netdev On 06/06/2017 05:27 PM, Eric Dumazet wrote: > On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote: >> On 6/6/17 3:06 PM, Ben Greear wrote: >>> This bug has been around forever, and we recently got an intern and >>> stuck him with >>> trying to reproduce it on the latest kernel. It is still here. I'm not >>> super excited >>> about trying to fix this, but we can easily test patches if someone has a >>> patch to try. >> >> Can you try this (whitespace damaged on paste, but it is moving the lock >> ahead of the fn_sernum check): >> >> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c >> index deea901746c8..7a44c49055c0 100644 >> --- a/net/ipv6/ip6_fib.c >> +++ b/net/ipv6/ip6_fib.c >> @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table, >> struct sk_buff *skb, >> cb->args[5] = w->root->fn_sernum; >> } >> } else { >> + read_lock_bh(&table->tb6_lock); >> if (cb->args[5] != w->root->fn_sernum) { >> /* Begin at the root if the tree changed */ >> cb->args[5] = w->root->fn_sernum; >> @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, >> struct sk_buff *skb, >> } else >> w->skip = 0; >> >> - read_lock_bh(&table->tb6_lock); >> res = fib6_walk_continue(w); >> read_unlock_bh(&table->tb6_lock); >> if (res <= 0) { > > > Good catch, but it looks like similar fix is needed a few lines before. We will test this tomorrow. Thanks, Ben > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, > > read_lock_bh(&table->tb6_lock); > res = fib6_walk(net, w); > - read_unlock_bh(&table->tb6_lock); > if (res > 0) { > cb->args[4] = 1; > cb->args[5] = w->root->fn_sernum; > } > + read_unlock_bh(&table->tb6_lock); > } else { > + read_lock_bh(&table->tb6_lock); > if (cb->args[5] != w->root->fn_sernum) { > /* Begin at the root if the tree changed */ > cb->args[5] = w->root->fn_sernum; > @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, > } else > w->skip = 0; > > - read_lock_bh(&table->tb6_lock); > res = fib6_walk_continue(w); > read_unlock_bh(&table->tb6_lock); > if (res <= 0) { > > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2018-04-18 0:38 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-06 21:06 Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+ Ben Greear 2017-06-07 0:00 ` David Ahern 2017-06-07 0:27 ` Eric Dumazet 2017-06-07 0:34 ` David Ahern 2017-06-07 4:19 ` Eric Dumazet 2017-06-08 21:27 ` Ben Greear 2017-06-09 5:55 ` Cong Wang 2017-06-09 13:27 ` David Ahern 2017-06-09 21:25 ` Eric Dumazet 2017-06-13 20:16 ` Ben Greear 2017-06-13 20:28 ` David Ahern 2017-06-13 20:39 ` Ben Greear 2017-06-13 21:42 ` Cong Wang 2017-06-14 2:27 ` David Ahern 2017-06-14 22:23 ` Ben Greear 2017-06-14 22:25 ` David Ahern 2017-06-20 14:12 ` Ben Greear 2017-06-20 18:05 ` Michal Kubecek 2017-06-20 21:41 ` Ben Greear 2017-06-21 3:03 ` David Ahern 2017-06-25 21:59 ` David Ahern 2018-01-24 23:59 ` Ben Greear 2018-04-17 23:29 ` Ben Greear 2018-04-18 0:38 ` David Ahern 2017-06-07 0:48 ` Ben Greear
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.