* 4.1.12 kernel crash in rtnetlink_put_metrics
From: Andrew @ 2015-11-04 16:00 UTC
To: netdev
Hi all.
Today I got a crash on one of the servers (a PPPoE BRAS running BGP/OSPF).
This server became unstable after updating from a 3.2.x kernel to 4.1.x
(other servers with slightly different CPUs/motherboards also have trouble,
but they hang less frequently).
Location in the kernel code:
(gdb) list *rtnetlink_put_metrics+0x50
0xc131c7d0 is in rtnetlink_put_metrics (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
667         mx = nla_nest_start(skb, RTA_METRICS);
668         if (mx == NULL)
669                 return -ENOBUFS;
670
671         for (i = 0; i < RTAX_MAX; i++) {
672                 if (metrics[i]) {
673                         if (i == RTAX_CC_ALGO - 1) {
674                                 char tmp[TCP_CA_NAME_MAX], *name;
675
676                                 name = tcp_ca_get_name_by_key(metrics[i], tmp);
Here's the trace:
[41358.475254] BUG: unable to handle kernel NULL pointer dereference at (null)
[41358.475333] IP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180
[41358.475376] *pdpt = 0000000026d58001 *pde = 0000000000000000
[41358.475413] Oops: 0000 [#1] SMP
[41358.475453] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.475807] CPU: 2 PID: 10877 Comm: bird Tainted: G O 4.1.12-i686 #1
[41358.475880] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.475955] task: f5302da0 ti: e1364000 task.ti: e1364000
[41358.475993] EIP: 0060:[<c131c7d0>] EFLAGS: 00010282 CPU: 2
[41358.476030] EIP is at rtnetlink_put_metrics+0x50/0x180
[41358.476066] EAX: 00000000 EBX: 00000001 ECX: 00000004 EDX: 00000000
[41358.476106] ESI: 00000000 EDI: e0b38000 EBP: e1365ca8 ESP: e1365c78
[41358.476143]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[41358.476179] CR0: 8005003b CR2: 00000000 CR3: 34966ac0 CR4: 000006f0
[41358.476216] Stack:
[41358.476249]  00000000 c1213873 d4316f64 00000000 e0b38000 e1365d00 c1213989 00000fe4
[41358.476330]  e0b38000 00000000 d4316f30 e0b38000 e1365d00 c138362e e1365cd8 0000000c
[41358.476405]  00000002 00000002 00000000 00000000 c13bba01 e0b38000 000000fe 007d8196
[41358.476482] Call Trace:
[41358.476522]  [<c1213873>] ? __nla_reserve+0x23/0xe0
[41358.476557]  [<c1213989>] ? __nla_put+0x9/0xb0
[41358.476595]  [<c138362e>] ? fib_dump_info+0x15e/0x3e0
[41358.476636]  [<c13bba01>] ? irq_entries_start+0x639/0x678
[41358.476671]  [<c1386823>] ? fib_table_dump+0xf3/0x180
[41358.476708]  [<c138053d>] ? inet_dump_fib+0x7d/0x100
[41358.476746]  [<c1337ef1>] ? netlink_dump+0x121/0x270
[41358.476781]  [<c1303572>] ? skb_free_datagram+0x12/0x40
[41358.476818]  [<c1338284>] ? netlink_recvmsg+0x244/0x360
[41358.476855]  [<c12f3f8d>] ? sock_recvmsg+0x1d/0x30
[41358.476890]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476924]  [<c12f5cec>] ? ___sys_recvmsg+0x9c/0x120
[41358.476958]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476994]  [<c10740e4>] ? update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030]  [<c1094bb4>] ? hrtimer_forward+0xa4/0x1c0
[41358.477065]  [<c12f4cdd>] ? sockfd_lookup_light+0x1d/0x80
[41358.477099]  [<c12f6c5e>] ? __sys_recvmsg+0x3e/0x80
[41358.477134]  [<c12f6ff1>] ? SyS_socketcall+0xb1/0x2a0
[41358.477168]  [<c108657c>] ? handle_irq_event+0x3c/0x60
[41358.477203]  [<c1088efd>] ? handle_edge_irq+0x7d/0x100
[41358.477238]  [<c130a2e6>] ? rps_trigger_softirq+0x26/0x30
[41358.477273]  [<c10a88e3>] ? flush_smp_call_function_queue+0x83/0x120
[41358.477307]  [<c13bb2be>] ? syscall_call+0x7/0x7
[41358.477341] Code: 00 89 45 d8 89 c3 89 f8 e8 7e 72 ef ff 85 c0 0f 88 9e 00 00 00 85 db 0f 84 96 00 00 00 bb 01 00 00 00 c7 45 dc 00 00 00 00 66 90 <8b> 44 9e fc 85 c0 74 2b 83 fb 10 0f 84 84 00 00 00 89 45 e0 8d
[41358.477509] EIP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180 SS:ESP 0068:e1365c78
[41358.477576] CR2: 0000000000000000
[41358.477880] ---[ end trace 6e3e7e6b81407c0a ]---
[41358.499813] ------------[ cut here ]------------
[41358.499879] WARNING: CPU: 2 PID: 0 at /var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/netlink/af_netlink.c:944 netlink_sock_destruct+0xa8/0xc0()
[41358.500003] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.502110] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D O 4.1.12-i686 #1
[41358.502213] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.502305]  c14b0540 f5259f40 c13b6ee2 00000000 c104b5a3 c1475fd4 00000002 00000000
[41358.502610]  c14b0540 000003b0 c13373e8 00000009 c13373e8 f2204c00 0000000a 0000000a
[41358.502920]  f5259f50 c104b680 00000009 00000000 f5259f64 c13373e8 c108f4d7 c108f4d7
[41358.503230] Call Trace:
[41358.503292]  [<c13b6ee2>] ? dump_stack+0x3e/0x4e
[41358.503357]  [<c104b5a3>] ? warn_slowpath_common+0x93/0xd0
[41358.503420]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503484]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503548]  [<c104b680>] ? warn_slowpath_null+0x20/0x30
[41358.503609]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503671]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503732]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503794]  [<c12f9b88>] ? __sk_free+0x18/0xf0
[41358.503862]  [<c108f513>] ? rcu_process_callbacks+0x1f3/0x4e0
[41358.503929]  [<c104e753>] ? __do_softirq+0xc3/0x240
[41358.503992]  [<c104e690>] ? __tasklet_hrtimer_trampoline+0x50/0x50
[41358.504056]  [<c1004729>] ? do_softirq_own_stack+0x29/0x40
[41358.504117]  <IRQ>  [<c104ea9e>] ? irq_exit+0x6e/0x90
[41358.504208]  [<c13bc3f8>] ? smp_apic_timer_interrupt+0x38/0x50
[41358.504270]  [<c13bbcd9>] ? apic_timer_interrupt+0x2d/0x34
[41358.504332]  [<c100bfc9>] ? default_idle+0x19/0xb0
[41358.504395]  [<c100cd2e>] ? arch_cpu_idle+0xe/0x10
[41358.504458]  [<c107ec55>] ? cpu_startup_entry+0x215/0x310
[41358.504519] ---[ end trace 6e3e7e6b81407c0b ]---
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: Daniel Borkmann @ 2015-11-04 19:55 UTC
To: Andrew; +Cc: netdev
Hi Andrew,
thanks for the report!
On 11/04/2015 05:00 PM, Andrew wrote:
> Hi all.
>
> Today I've got a crash on one of servers (PPPoE BRAS with BGP/OSPF). This server becomes unstable after updating from 3.2.x kernel to 4.1.x (other servers with slightly different CPUs/MBs also have troubles - but they hang less frequently).
>
> Place in kernel code:
> (gdb) list *rtnetlink_put_metrics+0x50
> 0xc131c7d0 is in rtnetlink_put_metrics (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
> 667 mx = nla_nest_start(skb, RTA_METRICS);
> 668 if (mx == NULL)
> 669 return -ENOBUFS;
> 670
> 671 for (i = 0; i < RTAX_MAX; i++) {
> 672 if (metrics[i]) {
( Making the trace a bit more readable ... )
[41358.475254]BUG:unable to handle kernel NULL pointer dereference at (null)
[41358.475333]IP:[<c131c7d0>]rtnetlink_put_metrics+0x50/0x180
[...]
CallTrace:
[41358.476522][<c1213873>]?__nla_reserve+0x23/0xe0
[41358.476557][<c1213989>]?__nla_put+0x9/0xb0
[41358.476595][<c138362e>]?fib_dump_info+0x15e/0x3e0
[41358.476636][<c13bba01>]?irq_entries_start+0x639/0x678
[41358.476671][<c1386823>]?fib_table_dump+0xf3/0x180
[41358.476708][<c138053d>]?inet_dump_fib+0x7d/0x100
[41358.476746][<c1337ef1>]?netlink_dump+0x121/0x270
[41358.476781][<c1303572>]?skb_free_datagram+0x12/0x40
[41358.476818][<c1338284>]?netlink_recvmsg+0x244/0x360
[41358.476855][<c12f3f8d>]?sock_recvmsg+0x1d/0x30
[41358.476890][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
[41358.476924][<c12f5cec>]?___sys_recvmsg+0x9c/0x120
[41358.476958][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
[41358.476994][<c10740e4>]?update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030][<c1094bb4>]?hrtimer_forward+0xa4/0x1c0
[41358.477065][<c12f4cdd>]?sockfd_lookup_light+0x1d/0x80
[41358.477099][<c12f6c5e>]?__sys_recvmsg+0x3e/0x80
[41358.477134][<c12f6ff1>]?SyS_socketcall+0xb1/0x2a0
[41358.477168][<c108657c>]?handle_irq_event+0x3c/0x60
[41358.477203][<c1088efd>]?handle_edge_irq+0x7d/0x100
[41358.477238][<c130a2e6>]?rps_trigger_softirq+0x26/0x30
[41358.477273][<c10a88e3>]?flush_smp_call_function_queue+0x83/0x120
[41358.477307][<c13bb2be>]?syscall_call+0x7/0x7
[...]
Strange that rtnetlink_put_metrics() itself is not part of the above
call trace (it's an exported symbol).
So, your analysis suggests that metrics itself is NULL in this case?
(Can you confirm that?)
How frequently does this trigger? Are the call traces you have seen all of the same kind?
Is there an easy way to reproduce this?
I presume you don't use any per-route congestion control settings, right?
Thanks,
Daniel
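
A rough way to cross-check the "is metrics itself NULL?" question from the register dump alone: the faulting bytes in the Code line, `<8b> 44 9e fc`, appear to decode as `mov -0x4(%esi,%ebx,4),%eax`, so the accessed address would be ESI + EBX*4 - 4. The sketch below is a userspace illustration only; `sib_disp8_address` is a hypothetical helper, not a kernel function, and it handles just this one ModRM/SIB/disp8 form.

```c
#include <assert.h>
#include <stdint.h>

/* x86 register numbers as used in the ModRM/SIB fields. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/*
 * Hypothetical helper (not kernel code): effective address of a 32-bit
 * instruction encoded as opcode, ModRM (mod=01, rm=100), SIB, disp8.
 * Returns 0 and stores the address in *ea, or -1 for other encodings.
 */
int sib_disp8_address(const uint8_t insn[4], const uint32_t regs[8],
		      uint32_t *ea)
{
	assert(insn && regs && ea);

	uint8_t modrm = insn[1], sib = insn[2];
	unsigned scale = sib >> 6;
	unsigned index = (sib >> 3) & 7;
	unsigned base = sib & 7;

	if ((modrm >> 6) != 1 || (modrm & 7) != 4)
		return -1;

	/* Arithmetic wraps modulo 2^32, as address computation does on i386. */
	*ea = regs[base] + (uint32_t)(int32_t)(int8_t)insn[3];
	if (index != ESP)	/* index field 100b means "no index register" */
		*ea += regs[index] << scale;
	return 0;
}
```

With ESI = 00000000 and EBX = 00000001 from the oops, this yields address 0, which matches CR2. If ESI holds the `metrics` pointer, as the addressing mode suggests, that is consistent with `metrics` being NULL on the first loop iteration.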
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: subashab @ 2016-03-07 22:15 UTC
To: Daniel Borkmann; +Cc: Andrew, netdev, netdev-owner
On , Daniel Borkmann wrote:
> Hi Andrew,
>
> thanks for the report!
>
> ( Making the trace a bit more readable ... )
>
> [41358.475254]BUG:unable to handle kernel NULL pointer dereference at
> (null)
> [41358.475333]IP:[<c131c7d0>]rtnetlink_put_metrics+0x50/0x180
> [...]
> CallTrace:
> [41358.476522][<c1213873>]?__nla_reserve+0x23/0xe0
> [41358.476557][<c1213989>]?__nla_put+0x9/0xb0
> [41358.476595][<c138362e>]?fib_dump_info+0x15e/0x3e0
> [41358.476636][<c13bba01>]?irq_entries_start+0x639/0x678
> [41358.476671][<c1386823>]?fib_table_dump+0xf3/0x180
> [41358.476708][<c138053d>]?inet_dump_fib+0x7d/0x100
> [41358.476746][<c1337ef1>]?netlink_dump+0x121/0x270
> [41358.476781][<c1303572>]?skb_free_datagram+0x12/0x40
> [41358.476818][<c1338284>]?netlink_recvmsg+0x244/0x360
> [41358.476855][<c12f3f8d>]?sock_recvmsg+0x1d/0x30
> [41358.476890][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
> [41358.476924][<c12f5cec>]?___sys_recvmsg+0x9c/0x120
> [41358.476958][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
> [41358.476994][<c10740e4>]?update_cfs_rq_blocked_load+0xc4/0x130
> [41358.477030][<c1094bb4>]?hrtimer_forward+0xa4/0x1c0
> [41358.477065][<c12f4cdd>]?sockfd_lookup_light+0x1d/0x80
> [41358.477099][<c12f6c5e>]?__sys_recvmsg+0x3e/0x80
> [41358.477134][<c12f6ff1>]?SyS_socketcall+0xb1/0x2a0
> [41358.477168][<c108657c>]?handle_irq_event+0x3c/0x60
> [41358.477203][<c1088efd>]?handle_edge_irq+0x7d/0x100
> [41358.477238][<c130a2e6>]?rps_trigger_softirq+0x26/0x30
> [41358.477273][<c10a88e3>]?flush_smp_call_function_queue+0x83/0x120
> [41358.477307][<c13bb2be>]?syscall_call+0x7/0x7
> [...]
>
> Strange that rtnetlink_put_metrics() itself is not part of the above
> call trace (it's an exported symbol).
>
> So, your analysis suggests that metrics itself is NULL in this case?
> (Can you confirm that?)
>
> How frequently does this trigger? Are the seen call traces all the same
> kind?
>
> Is there an easy way to reproduce this?
>
> I presume you don't use any per route congestion control settings,
> right?
>
> Thanks,
> Daniel
Hi Daniel
I am observing a similar crash as well. This is on a 3.10-based ARM64
kernel.
Unfortunately, the crash occurs in a regression test rack, so I am not
sure of the exact test case that reproduces it. It has occurred twice so
far, and in both cases metrics was NULL.
| rt_=_0xFFFFFFC012DA4300 -> (
| dst = (
| callback_head = (next = 0x0, func = 0xFFFFFF800262D040),
| child = 0xFFFFFFC03B8BC2B0,
| dev = 0xFFFFFFC012DA4318,
| ops = 0xFFFFFFC012DA4318,
| _metrics = 0,
| expires = 0,
| path = 0x0,
| from = 0x0,
| xfrm = 0x0,
| input = 0xFFFFFFC0AD498000,
| output = 0x000000010401C411,
| flags = 0,
| pending_confirm = 0,
| error = 0,
| obsolete = 0,
| header_len = 3,
| trailer_len = 0,
| __pad2 = 4096,
168539.549000: <6> Process ip (pid: 28473, stack limit = 0xffffffc04b584060)
168539.549006: <2> Call trace:
168539.549016: <2> [<ffffffc000a95900>] rtnetlink_put_metrics+0x4c/0xec
168539.549027: <2> [<ffffffc000b5e198>] rt6_fill_node.isra.34+0x2b8/0x3c8
168539.549035: <2> [<ffffffc000b5e6e0>] rt6_dump_route+0x68/0x7c
168539.549043: <2> [<ffffffc000b5edec>] fib6_dump_node+0x2c/0x74
168539.549051: <2> [<ffffffc000b5ec24>] fib6_walk_continue+0xf8/0x1b4
168539.549059: <2> [<ffffffc000b5f140>] fib6_walk+0x5c/0xb8
168539.549067: <2> [<ffffffc000b5f2a0>] inet6_dump_fib+0x104/0x234
168539.549076: <2> [<ffffffc000ab1510>] netlink_dump+0x7c/0x1cc
168539.549084: <2> [<ffffffc000ab22f0>] __netlink_dump_start+0x128/0x170
168539.549093: <2> [<ffffffc000a98ddc>] rtnetlink_rcv_msg+0x12c/0x1a0
168539.549101: <2> [<ffffffc000ab3a80>] netlink_rcv_skb+0x64/0xc8
168539.549110: <2> [<ffffffc000a97644>] rtnetlink_rcv+0x1c/0x2c
168539.549117: <2> [<ffffffc000ab34cc>] netlink_unicast+0x108/0x1b8
168539.549125: <2> [<ffffffc000ab38b8>] netlink_sendmsg+0x27c/0x2d4
168539.549134: <2> [<ffffffc000a73f04>] sock_sendmsg+0x8c/0xb0
168539.549143: <2> [<ffffffc000a75f04>] SyS_sendto+0xcc/0x110
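
To illustrate why `_metrics = 0` in the dump above would end up as a NULL `metrics` argument: as far as I understand it, the dst entry stores the metrics pointer in a single word whose low bit doubles as a read-only flag, and callers mask that bit off before use. The following is a userspace model under that assumption; `model_dst`, `MODEL_METRICS_READ_ONLY` and `model_metrics_ptr` are illustrative names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bit of the stored word used as a flag (assumed layout for this model). */
#define MODEL_METRICS_READ_ONLY ((uintptr_t)0x1)

/* Cut-down stand-in for a dst entry, keeping only the field of interest. */
struct model_dst {
	uintptr_t _metrics;
};

/* Strip the flag bit and return the metrics array, or NULL if the word is 0. */
uint32_t *model_metrics_ptr(const struct model_dst *dst)
{
	assert(dst);
	return (uint32_t *)(dst->_metrics & ~MODEL_METRICS_READ_ONLY);
}
```

In this model a zeroed `_metrics` word can only produce a NULL pointer, so any consumer that indexes the result without a check would fault at address 0, which matches the reported symptom.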
I am using the following patch as a workaround for now. I do not have any
per-route congestion control settings enabled.
Any pointers on how to debug this would be greatly appreciated.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a67310e..c63098e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -566,7 +566,7 @@ int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
 	int i, valid = 0;
 
 	mx = nla_nest_start(skb, RTA_METRICS);
-	if (mx == NULL)
+	if (mx == NULL || metrics == NULL)
 		return -ENOBUFS;
 
 	for (i = 0; i < RTAX_MAX; i++) {
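
The effect of the guard can be sketched in userspace as follows. This is a simplified model, not the kernel function: it drops the netlink attribute handling entirely, `model_put_metrics` is a hypothetical name, and `MODEL_RTAX_MAX` is an assumed table size chosen only for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RTAX_MAX 16	/* assumed table size for this sketch only */

/*
 * Userspace model of the guarded loop: returns the number of non-zero
 * metrics, or -ENOBUFS when no table is attached (mirroring the workaround).
 */
int model_put_metrics(const uint32_t *metrics)
{
	int i, valid = 0;

	/* The added check: bail out before the first metrics[i] access. */
	if (metrics == NULL)
		return -ENOBUFS;

	for (i = 0; i < MODEL_RTAX_MAX; i++) {
		if (metrics[i])
			valid++;
	}

	assert(valid <= MODEL_RTAX_MAX);
	return valid;
}
```

Note that the guard only hides the symptom: it does not explain how a route ends up without a metrics table in the first place, which is the open question in this thread.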
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: Daniel Borkmann @ 2016-03-07 23:39 UTC
To: subashab; +Cc: Andrew, netdev, kafai
On 03/07/2016 11:15 PM, subashab@codeaurora.org wrote:
> Hi Daniel
>
> I am observing a similar crash as well. This is on a 3.10 based ARM64 kernel.
> Unfortunately, the crash is occurring in a regression test rack, so I am not
> sure of the exact test case to reproduce this crash. This seems to have
> occurred twice so far with both cases having metrics as NULL.
>
> | rt_=_0xFFFFFFC012DA4300 -> (
> | dst = (
> | callback_head = (next = 0x0, func = 0xFFFFFF800262D040),
> | child = 0xFFFFFFC03B8BC2B0,
> | dev = 0xFFFFFFC012DA4318,
> | ops = 0xFFFFFFC012DA4318,
> | _metrics = 0,
> | expires = 0,
> | path = 0x0,
> | from = 0x0,
> | xfrm = 0x0,
> | input = 0xFFFFFFC0AD498000,
> | output = 0x000000010401C411,
> | flags = 0,
> | pending_confirm = 0,
> | error = 0,
> | obsolete = 0,
> | header_len = 3,
> | trailer_len = 0,
> | __pad2 = 4096,
>
> 168539.549000: <6> Process ip (pid: 28473, stack limit = 0xffffffc04b584060)
> 168539.549006: <2> Call trace:
> 168539.549016: <2> [<ffffffc000a95900>] rtnetlink_put_metrics+0x4c/0xec
> 168539.549027: <2> [<ffffffc000b5e198>] rt6_fill_node.isra.34+0x2b8/0x3c8
> 168539.549035: <2> [<ffffffc000b5e6e0>] rt6_dump_route+0x68/0x7c
> 168539.549043: <2> [<ffffffc000b5edec>] fib6_dump_node+0x2c/0x74
> 168539.549051: <2> [<ffffffc000b5ec24>] fib6_walk_continue+0xf8/0x1b4
> 168539.549059: <2> [<ffffffc000b5f140>] fib6_walk+0x5c/0xb8
> 168539.549067: <2> [<ffffffc000b5f2a0>] inet6_dump_fib+0x104/0x234
> 168539.549076: <2> [<ffffffc000ab1510>] netlink_dump+0x7c/0x1cc
> 168539.549084: <2> [<ffffffc000ab22f0>] __netlink_dump_start+0x128/0x170
> 168539.549093: <2> [<ffffffc000a98ddc>] rtnetlink_rcv_msg+0x12c/0x1a0
> 168539.549101: <2> [<ffffffc000ab3a80>] netlink_rcv_skb+0x64/0xc8
> 168539.549110: <2> [<ffffffc000a97644>] rtnetlink_rcv+0x1c/0x2c
> 168539.549117: <2> [<ffffffc000ab34cc>] netlink_unicast+0x108/0x1b8
> 168539.549125: <2> [<ffffffc000ab38b8>] netlink_sendmsg+0x27c/0x2d4
> 168539.549134: <2> [<ffffffc000a73f04>] sock_sendmsg+0x8c/0xb0
> 168539.549143: <2> [<ffffffc000a75f04>] SyS_sendto+0xcc/0x110
>
> I am using the following patch as a workaround now. I do not have any
> per route congestion control settings enabled.
> Any pointers to debug this would be greatly appreciated.
Hmm, if this were 4.1.x as in the original reporter's case, I would have
suspected something like commit 0a1f59620068 ("ipv6: Initialize rt6_info
properly in ip6_blackhole_route()") ... Any chance of reproducing this on
the latest kernel?
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a67310e..c63098e 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -566,7 +566,7 @@ int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
> int i, valid = 0;
>
> mx = nla_nest_start(skb, RTA_METRICS);
> - if (mx == NULL)
> + if (mx == NULL || metrics == NULL)
> return -ENOBUFS;
>
> for (i = 0; i < RTAX_MAX; i++) {
>
>
>
* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
From: subashab @ 2016-03-08 4:27 UTC
To: Daniel Borkmann; +Cc: Andrew, netdev, kafai
> Hmm, if it was 4.1.X like in original reporter case, I might have thought
> something like commit 0a1f59620068 ("ipv6: Initialize rt6_info properly
> in ip6_blackhole_route()") ... any chance on reproducing this on a latest
> kernel?
>
Unfortunately, I haven't encountered a similar crash on newer kernels as of now.