* 4.1.12 kernel crash in rtnetlink_put_metrics
@ 2015-11-04 16:00 Andrew
  2015-11-04 19:55 ` Daniel Borkmann
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew @ 2015-11-04 16:00 UTC
  To: netdev

Hi all.

Today I got a crash on one of our servers (a PPPoE BRAS running BGP/OSPF).
This server became unstable after updating from a 3.2.x kernel to 4.1.x
(other servers with slightly different CPUs/motherboards also have trouble,
but they hang less frequently).

Place in kernel code:
(gdb) list *rtnetlink_put_metrics+0x50
0xc131c7d0 is in rtnetlink_put_metrics 
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
667        mx = nla_nest_start(skb, RTA_METRICS);
668        if (mx == NULL)
669            return -ENOBUFS;
670
671        for (i = 0; i < RTAX_MAX; i++) {
672            if (metrics[i]) {
673                if (i == RTAX_CC_ALGO - 1) {
674                    char tmp[TCP_CA_NAME_MAX], *name;
675
676                    name = tcp_ca_get_name_by_key(metrics[i], tmp);


Here's trace:

[41358.475254] BUG: unable to handle kernel NULL pointer dereference at (null)
[41358.475333] IP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180
[41358.475376] *pdpt = 0000000026d58001 *pde = 0000000000000000
[41358.475413] Oops: 0000 [#1] SMP
[41358.475453] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.475807] CPU: 2 PID: 10877 Comm: bird Tainted: G           O 4.1.12-i686 #1
[41358.475880] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.475955] task: f5302da0 ti: e1364000 task.ti: e1364000
[41358.475993] EIP: 0060:[<c131c7d0>] EFLAGS: 00010282 CPU: 2
[41358.476030] EIP is at rtnetlink_put_metrics+0x50/0x180
[41358.476066] EAX: 00000000 EBX: 00000001 ECX: 00000004 EDX: 00000000
[41358.476106] ESI: 00000000 EDI: e0b38000 EBP: e1365ca8 ESP: e1365c78
[41358.476143]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[41358.476179] CR0: 8005003b CR2: 00000000 CR3: 34966ac0 CR4: 000006f0
[41358.476216] Stack:
[41358.476249]  00000000 c1213873 d4316f64 00000000 e0b38000 e1365d00 c1213989 00000fe4
[41358.476330]  e0b38000 00000000 d4316f30 e0b38000 e1365d00 c138362e e1365cd8 0000000c
[41358.476405]  00000002 00000002 00000000 00000000 c13bba01 e0b38000 000000fe 007d8196
[41358.476482] Call Trace:
[41358.476522]  [<c1213873>] ? __nla_reserve+0x23/0xe0
[41358.476557]  [<c1213989>] ? __nla_put+0x9/0xb0
[41358.476595]  [<c138362e>] ? fib_dump_info+0x15e/0x3e0
[41358.476636]  [<c13bba01>] ? irq_entries_start+0x639/0x678
[41358.476671]  [<c1386823>] ? fib_table_dump+0xf3/0x180
[41358.476708]  [<c138053d>] ? inet_dump_fib+0x7d/0x100
[41358.476746]  [<c1337ef1>] ? netlink_dump+0x121/0x270
[41358.476781]  [<c1303572>] ? skb_free_datagram+0x12/0x40
[41358.476818]  [<c1338284>] ? netlink_recvmsg+0x244/0x360
[41358.476855]  [<c12f3f8d>] ? sock_recvmsg+0x1d/0x30
[41358.476890]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476924]  [<c12f5cec>] ? ___sys_recvmsg+0x9c/0x120
[41358.476958]  [<c12f3f70>] ? sock_recvmsg_nosec+0x30/0x30
[41358.476994]  [<c10740e4>] ? update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030]  [<c1094bb4>] ? hrtimer_forward+0xa4/0x1c0
[41358.477065]  [<c12f4cdd>] ? sockfd_lookup_light+0x1d/0x80
[41358.477099]  [<c12f6c5e>] ? __sys_recvmsg+0x3e/0x80
[41358.477134]  [<c12f6ff1>] ? SyS_socketcall+0xb1/0x2a0
[41358.477168]  [<c108657c>] ? handle_irq_event+0x3c/0x60
[41358.477203]  [<c1088efd>] ? handle_edge_irq+0x7d/0x100
[41358.477238]  [<c130a2e6>] ? rps_trigger_softirq+0x26/0x30
[41358.477273]  [<c10a88e3>] ? flush_smp_call_function_queue+0x83/0x120
[41358.477307]  [<c13bb2be>] ? syscall_call+0x7/0x7
[41358.477341] Code: 00 89 45 d8 89 c3 89 f8 e8 7e 72 ef ff 85 c0 0f 88 9e 00 00 00 85 db 0f 84 96 00 00 00 bb 01 00 00 00 c7 45 dc 00 00 00 00 66 90 <8b> 44 9e fc 85 c0 74 2b 83 fb 10 0f 84 84 00 00 00 89 45 e0 8d
[41358.477509] EIP: [<c131c7d0>] rtnetlink_put_metrics+0x50/0x180 SS:ESP 0068:e1365c78
[41358.477576] CR2: 0000000000000000
[41358.477880] ---[ end trace 6e3e7e6b81407c0a ]---
[41358.499813] ------------[ cut here ]------------
[41358.499879] WARNING: CPU: 2 PID: 0 at /var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/netlink/af_netlink.c:944 netlink_sock_destruct+0xa8/0xc0()
[41358.500003] Modules linked in: act_mirred pppoe pppox ppp_generic slhc iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables ipv6 sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog parport_pc parport acpi_cpufreq processor thermal_sys igb(O) k10temp hwmon dca ohci_pci ohci_hcd ptp pps_core i2c_piix4 i2c_core sp5100_tco sd_mod pata_acpi pata_atiixp pcspkr ata_generic ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common ext4 mbcache jbd2 crc16 vfat fat isofs
[41358.502110] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D    O 4.1.12-i686 #1
[41358.502213] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.3 01/12/2012
[41358.502305]  c14b0540 f5259f40 c13b6ee2 00000000 c104b5a3 c1475fd4 00000002 00000000
[41358.502610]  c14b0540 000003b0 c13373e8 00000009 c13373e8 f2204c00 0000000a 0000000a
[41358.502920]  f5259f50 c104b680 00000009 00000000 f5259f64 c13373e8 c108f4d7 c108f4d7
[41358.503230] Call Trace:
[41358.503292]  [<c13b6ee2>] ? dump_stack+0x3e/0x4e
[41358.503357]  [<c104b5a3>] ? warn_slowpath_common+0x93/0xd0
[41358.503420]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503484]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503548]  [<c104b680>] ? warn_slowpath_null+0x20/0x30
[41358.503609]  [<c13373e8>] ? netlink_sock_destruct+0xa8/0xc0
[41358.503671]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503732]  [<c108f4d7>] ? rcu_process_callbacks+0x1b7/0x4e0
[41358.503794]  [<c12f9b88>] ? __sk_free+0x18/0xf0
[41358.503862]  [<c108f513>] ? rcu_process_callbacks+0x1f3/0x4e0
[41358.503929]  [<c104e753>] ? __do_softirq+0xc3/0x240
[41358.503992]  [<c104e690>] ? __tasklet_hrtimer_trampoline+0x50/0x50
[41358.504056]  [<c1004729>] ? do_softirq_own_stack+0x29/0x40
[41358.504117]  <IRQ>  [<c104ea9e>] ? irq_exit+0x6e/0x90
[41358.504208]  [<c13bc3f8>] ? smp_apic_timer_interrupt+0x38/0x50
[41358.504270]  [<c13bbcd9>] ? apic_timer_interrupt+0x2d/0x34
[41358.504332]  [<c100bfc9>] ? default_idle+0x19/0xb0
[41358.504395]  [<c100cd2e>] ? arch_cpu_idle+0xe/0x10
[41358.504458]  [<c107ec55>] ? cpu_startup_entry+0x215/0x310
[41358.504519] ---[ end trace 6e3e7e6b81407c0b ]---


* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
  2015-11-04 16:00 4.1.12 kernel crash in rtnetlink_put_metrics Andrew
@ 2015-11-04 19:55 ` Daniel Borkmann
  2016-03-07 22:15   ` subashab
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Borkmann @ 2015-11-04 19:55 UTC
  To: Andrew; +Cc: netdev

Hi Andrew,

thanks for the report!

On 11/04/2015 05:00 PM, Andrew wrote:
> Hi all.
>
> Today I got a crash on one of our servers (a PPPoE BRAS running BGP/OSPF). This server became unstable after updating from a 3.2.x kernel to 4.1.x (other servers with slightly different CPUs/motherboards also have trouble, but they hang less frequently).
>
> Place in kernel code:
> (gdb) list *rtnetlink_put_metrics+0x50
> 0xc131c7d0 is in rtnetlink_put_metrics (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/net/core/rtnetlink.c:672).
> 667        mx = nla_nest_start(skb, RTA_METRICS);
> 668        if (mx == NULL)
> 669            return -ENOBUFS;
> 670
> 671        for (i = 0; i < RTAX_MAX; i++) {
> 672            if (metrics[i]) {

( Making the trace a bit more readable ... )

[41358.475254]BUG:unable to handle kernel NULL pointer dereference at (null)
[41358.475333]IP:[<c131c7d0>]rtnetlink_put_metrics+0x50/0x180
[...]
CallTrace:
[41358.476522][<c1213873>]?__nla_reserve+0x23/0xe0
[41358.476557][<c1213989>]?__nla_put+0x9/0xb0
[41358.476595][<c138362e>]?fib_dump_info+0x15e/0x3e0
[41358.476636][<c13bba01>]?irq_entries_start+0x639/0x678
[41358.476671][<c1386823>]?fib_table_dump+0xf3/0x180
[41358.476708][<c138053d>]?inet_dump_fib+0x7d/0x100
[41358.476746][<c1337ef1>]?netlink_dump+0x121/0x270
[41358.476781][<c1303572>]?skb_free_datagram+0x12/0x40
[41358.476818][<c1338284>]?netlink_recvmsg+0x244/0x360
[41358.476855][<c12f3f8d>]?sock_recvmsg+0x1d/0x30
[41358.476890][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
[41358.476924][<c12f5cec>]?___sys_recvmsg+0x9c/0x120
[41358.476958][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
[41358.476994][<c10740e4>]?update_cfs_rq_blocked_load+0xc4/0x130
[41358.477030][<c1094bb4>]?hrtimer_forward+0xa4/0x1c0
[41358.477065][<c12f4cdd>]?sockfd_lookup_light+0x1d/0x80
[41358.477099][<c12f6c5e>]?__sys_recvmsg+0x3e/0x80
[41358.477134][<c12f6ff1>]?SyS_socketcall+0xb1/0x2a0
[41358.477168][<c108657c>]?handle_irq_event+0x3c/0x60
[41358.477203][<c1088efd>]?handle_edge_irq+0x7d/0x100
[41358.477238][<c130a2e6>]?rps_trigger_softirq+0x26/0x30
[41358.477273][<c10a88e3>]?flush_smp_call_function_queue+0x83/0x120
[41358.477307][<c13bb2be>]?syscall_call+0x7/0x7
[...]

Strange that rtnetlink_put_metrics() itself is not part of the above
call trace (it's an exported symbol).

So, your analysis suggests that metrics itself is NULL in this case?
(Can you confirm that?)
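
One way to check this from the oops alone is to decode the faulting instruction. The marked bytes in the Code: line are `<8b> 44 9e fc`, which on x86 is a `mov` from `[base + index*scale + disp8]`. The sketch below is only an illustration of that arithmetic, not a general disassembler: it handles this one addressing form, and the names `fault_address` and the `REG_*` enum are made up for the example. The register values are the ones printed in the trace (ESI = 0, EBX = 1).

```c
#include <stdint.h>

/* Register numbers in x86 ModRM/SIB encoding order. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX, REG_ESP, REG_EBP, REG_ESI, REG_EDI };

/*
 * Hypothetical helper: compute the memory address read by a 4-byte
 * "8b modrm sib disp8" instruction (mod == 01, rm == 100 means SIB + disp8).
 * Returns 0 on success, -1 if the bytes are not in that form.
 */
static int fault_address(const uint8_t insn[4], const uint32_t regs[8],
                         uint32_t *ea)
{
    uint8_t modrm = insn[1];
    uint8_t sib = insn[2];
    unsigned scale, index, base;
    int32_t disp;
    uint32_t addr;

    if (insn[0] != 0x8b)                        /* MOV r32, r/m32 */
        return -1;
    if ((modrm >> 6) != 1 || (modrm & 7) != 4)  /* need SIB + disp8 */
        return -1;

    scale = 1u << (sib >> 6);
    index = (sib >> 3) & 7;
    base = sib & 7;

    /* Sign-extend the 8-bit displacement. */
    disp = insn[3] >= 0x80 ? (int32_t)insn[3] - 256 : (int32_t)insn[3];

    addr = regs[base] + (uint32_t)disp;
    if (index != 4)                             /* index 100 means "none" */
        addr += regs[index] * scale;

    *ea = addr;
    return 0;
}
```

With the bytes `8b 44 9e fc` this resolves to `[esi + ebx*4 - 4]`. Plugging in ESI = 0 and EBX = 1 gives address 0, which matches CR2 = 00000000, so ESI appears to be holding a NULL metrics pointer, with the loop counter in EBX offset by one.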

How frequently does this trigger? Are the seen call traces all the same kind?

Is there an easy way to reproduce this?

I presume you don't use any per route congestion control settings, right?

Thanks,
Daniel

> 673                if (i == RTAX_CC_ALGO - 1) {
> 674                    char tmp[TCP_CA_NAME_MAX], *name;
> 675
> 676                    name = tcp_ca_get_name_by_key(metrics[i], tmp);


* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
  2015-11-04 19:55 ` Daniel Borkmann
@ 2016-03-07 22:15   ` subashab
  2016-03-07 23:39     ` Daniel Borkmann
  0 siblings, 1 reply; 5+ messages in thread
From: subashab @ 2016-03-07 22:15 UTC
  To: Daniel Borkmann; +Cc: Andrew, netdev, netdev-owner

On , Daniel Borkmann wrote:
> Hi Andrew,
> 
> thanks for the report!
> 
> ( Making the trace a bit more readable ... )
> 
> [41358.475254]BUG:unable to handle kernel NULL pointer dereference at 
> (null)
> [41358.475333]IP:[<c131c7d0>]rtnetlink_put_metrics+0x50/0x180
> [...]
> CallTrace:
> [41358.476522][<c1213873>]?__nla_reserve+0x23/0xe0
> [41358.476557][<c1213989>]?__nla_put+0x9/0xb0
> [41358.476595][<c138362e>]?fib_dump_info+0x15e/0x3e0
> [41358.476636][<c13bba01>]?irq_entries_start+0x639/0x678
> [41358.476671][<c1386823>]?fib_table_dump+0xf3/0x180
> [41358.476708][<c138053d>]?inet_dump_fib+0x7d/0x100
> [41358.476746][<c1337ef1>]?netlink_dump+0x121/0x270
> [41358.476781][<c1303572>]?skb_free_datagram+0x12/0x40
> [41358.476818][<c1338284>]?netlink_recvmsg+0x244/0x360
> [41358.476855][<c12f3f8d>]?sock_recvmsg+0x1d/0x30
> [41358.476890][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
> [41358.476924][<c12f5cec>]?___sys_recvmsg+0x9c/0x120
> [41358.476958][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
> [41358.476994][<c10740e4>]?update_cfs_rq_blocked_load+0xc4/0x130
> [41358.477030][<c1094bb4>]?hrtimer_forward+0xa4/0x1c0
> [41358.477065][<c12f4cdd>]?sockfd_lookup_light+0x1d/0x80
> [41358.477099][<c12f6c5e>]?__sys_recvmsg+0x3e/0x80
> [41358.477134][<c12f6ff1>]?SyS_socketcall+0xb1/0x2a0
> [41358.477168][<c108657c>]?handle_irq_event+0x3c/0x60
> [41358.477203][<c1088efd>]?handle_edge_irq+0x7d/0x100
> [41358.477238][<c130a2e6>]?rps_trigger_softirq+0x26/0x30
> [41358.477273][<c10a88e3>]?flush_smp_call_function_queue+0x83/0x120
> [41358.477307][<c13bb2be>]?syscall_call+0x7/0x7
> [...]
> 
> Strange that rtnetlink_put_metrics() itself is not part of the above
> call trace (it's an exported symbol).
> 
> So, your analysis suggests that metrics itself is NULL in this case?
> (Can you confirm that?)
> 
> How frequently does this trigger? Are the seen call traces all the same 
> kind?
> 
> Is there an easy way to reproduce this?
> 
> I presume you don't use any per route congestion control settings, 
> right?
> 
> Thanks,
> Daniel

Hi Daniel

I am observing a similar crash as well. This is on a 3.10 based ARM64 kernel.
Unfortunately, the crash is occurring in a regression test rack, so I am not
sure of the exact test case to reproduce this crash. This seems to have
occurred twice so far with both cases having metrics as NULL.

     |  rt_=_0xFFFFFFC012DA4300 -> (
     |    dst = (
     |      callback_head = (next = 0x0, func = 0xFFFFFF800262D040),
     |      child = 0xFFFFFFC03B8BC2B0,
     |      dev = 0xFFFFFFC012DA4318,
     |      ops = 0xFFFFFFC012DA4318,
     |      _metrics = 0,
     |      expires = 0,
     |      path = 0x0,
     |      from = 0x0,
     |      xfrm = 0x0,
     |      input = 0xFFFFFFC0AD498000,
     |      output = 0x000000010401C411,
     |      flags = 0,
     |      pending_confirm = 0,
     |      error = 0,
     |      obsolete = 0,
     |      header_len = 3,
     |      trailer_len = 0,
     |      __pad2 = 4096,

168539.549000:   <6> Process ip (pid: 28473, stack limit = 0xffffffc04b584060)
168539.549006:   <2> Call trace:
168539.549016:   <2> [<ffffffc000a95900>] rtnetlink_put_metrics+0x4c/0xec
168539.549027:   <2> [<ffffffc000b5e198>] rt6_fill_node.isra.34+0x2b8/0x3c8
168539.549035:   <2> [<ffffffc000b5e6e0>] rt6_dump_route+0x68/0x7c
168539.549043:   <2> [<ffffffc000b5edec>] fib6_dump_node+0x2c/0x74
168539.549051:   <2> [<ffffffc000b5ec24>] fib6_walk_continue+0xf8/0x1b4
168539.549059:   <2> [<ffffffc000b5f140>] fib6_walk+0x5c/0xb8
168539.549067:   <2> [<ffffffc000b5f2a0>] inet6_dump_fib+0x104/0x234
168539.549076:   <2> [<ffffffc000ab1510>] netlink_dump+0x7c/0x1cc
168539.549084:   <2> [<ffffffc000ab22f0>] __netlink_dump_start+0x128/0x170
168539.549093:   <2> [<ffffffc000a98ddc>] rtnetlink_rcv_msg+0x12c/0x1a0
168539.549101:   <2> [<ffffffc000ab3a80>] netlink_rcv_skb+0x64/0xc8
168539.549110:   <2> [<ffffffc000a97644>] rtnetlink_rcv+0x1c/0x2c
168539.549117:   <2> [<ffffffc000ab34cc>] netlink_unicast+0x108/0x1b8
168539.549125:   <2> [<ffffffc000ab38b8>] netlink_sendmsg+0x27c/0x2d4
168539.549134:   <2> [<ffffffc000a73f04>] sock_sendmsg+0x8c/0xb0
168539.549143:   <2> [<ffffffc000a75f04>] SyS_sendto+0xcc/0x110

I am using the following patch as a workaround now. I do not have any
per route congestion control settings enabled.
Any pointers to debug this would be greatly appreciated.

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a67310e..c63098e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -566,7 +566,7 @@ int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
         int i, valid = 0;

         mx = nla_nest_start(skb, RTA_METRICS);
-       if (mx == NULL)
+       if (mx == NULL || metrics == NULL)
                 return -ENOBUFS;

         for (i = 0; i < RTAX_MAX; i++) {


* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
  2016-03-07 22:15   ` subashab
@ 2016-03-07 23:39     ` Daniel Borkmann
  2016-03-08  4:27       ` subashab
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Borkmann @ 2016-03-07 23:39 UTC
  To: subashab; +Cc: Andrew, netdev, kafai

On 03/07/2016 11:15 PM, subashab@codeaurora.org wrote:
> On , Daniel Borkmann wrote:
>> Hi Andrew,
>>
>> thanks for the report!
>>
>> ( Making the trace a bit more readable ... )
>>
>> [41358.475254]BUG:unable to handle kernel NULL pointer dereference at (null)
>> [41358.475333]IP:[<c131c7d0>]rtnetlink_put_metrics+0x50/0x180
>> [...]
>> CallTrace:
>> [41358.476522][<c1213873>]?__nla_reserve+0x23/0xe0
>> [41358.476557][<c1213989>]?__nla_put+0x9/0xb0
>> [41358.476595][<c138362e>]?fib_dump_info+0x15e/0x3e0
>> [41358.476636][<c13bba01>]?irq_entries_start+0x639/0x678
>> [41358.476671][<c1386823>]?fib_table_dump+0xf3/0x180
>> [41358.476708][<c138053d>]?inet_dump_fib+0x7d/0x100
>> [41358.476746][<c1337ef1>]?netlink_dump+0x121/0x270
>> [41358.476781][<c1303572>]?skb_free_datagram+0x12/0x40
>> [41358.476818][<c1338284>]?netlink_recvmsg+0x244/0x360
>> [41358.476855][<c12f3f8d>]?sock_recvmsg+0x1d/0x30
>> [41358.476890][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
>> [41358.476924][<c12f5cec>]?___sys_recvmsg+0x9c/0x120
>> [41358.476958][<c12f3f70>]?sock_recvmsg_nosec+0x30/0x30
>> [41358.476994][<c10740e4>]?update_cfs_rq_blocked_load+0xc4/0x130
>> [41358.477030][<c1094bb4>]?hrtimer_forward+0xa4/0x1c0
>> [41358.477065][<c12f4cdd>]?sockfd_lookup_light+0x1d/0x80
>> [41358.477099][<c12f6c5e>]?__sys_recvmsg+0x3e/0x80
>> [41358.477134][<c12f6ff1>]?SyS_socketcall+0xb1/0x2a0
>> [41358.477168][<c108657c>]?handle_irq_event+0x3c/0x60
>> [41358.477203][<c1088efd>]?handle_edge_irq+0x7d/0x100
>> [41358.477238][<c130a2e6>]?rps_trigger_softirq+0x26/0x30
>> [41358.477273][<c10a88e3>]?flush_smp_call_function_queue+0x83/0x120
>> [41358.477307][<c13bb2be>]?syscall_call+0x7/0x7
>> [...]
>>
>> Strange that rtnetlink_put_metrics() itself is not part of the above
>> call trace (it's an exported symbol).
>>
>> So, your analysis suggests that metrics itself is NULL in this case?
>> (Can you confirm that?)
>>
>> How frequently does this trigger? Are the seen call traces all the same kind?
>>
>> Is there an easy way to reproduce this?
>>
>> I presume you don't use any per route congestion control settings, right?
>>
>> Thanks,
>> Daniel
>
> Hi Daniel
>
> I am observing a similar crash as well. This is on a 3.10 based ARM64 kernel.
> Unfortunately, the crash is occurring in a regression test rack, so I am not
> sure of the exact test case to reproduce this crash. This seems to have
> occurred twice so far with both cases having metrics as NULL.
>
>      |  rt_=_0xFFFFFFC012DA4300 -> (
>      |    dst = (
>      |      callback_head = (next = 0x0, func = 0xFFFFFF800262D040),
>      |      child = 0xFFFFFFC03B8BC2B0,
>      |      dev = 0xFFFFFFC012DA4318,
>      |      ops = 0xFFFFFFC012DA4318,
>      |      _metrics = 0,
>      |      expires = 0,
>      |      path = 0x0,
>      |      from = 0x0,
>      |      xfrm = 0x0,
>      |      input = 0xFFFFFFC0AD498000,
>      |      output = 0x000000010401C411,
>      |      flags = 0,
>      |      pending_confirm = 0,
>      |      error = 0,
>      |      obsolete = 0,
>      |      header_len = 3,
>      |      trailer_len = 0,
>      |      __pad2 = 4096,
>
> 168539.549000:   <6> Process ip (pid: 28473, stack limit = 0xffffffc04b584060)
> 168539.549006:   <2> Call trace:
> 168539.549016:   <2> [<ffffffc000a95900>] rtnetlink_put_metrics+0x4c/0xec
> 168539.549027:   <2> [<ffffffc000b5e198>] rt6_fill_node.isra.34+0x2b8/0x3c8
> 168539.549035:   <2> [<ffffffc000b5e6e0>] rt6_dump_route+0x68/0x7c
> 168539.549043:   <2> [<ffffffc000b5edec>] fib6_dump_node+0x2c/0x74
> 168539.549051:   <2> [<ffffffc000b5ec24>] fib6_walk_continue+0xf8/0x1b4
> 168539.549059:   <2> [<ffffffc000b5f140>] fib6_walk+0x5c/0xb8
> 168539.549067:   <2> [<ffffffc000b5f2a0>] inet6_dump_fib+0x104/0x234
> 168539.549076:   <2> [<ffffffc000ab1510>] netlink_dump+0x7c/0x1cc
> 168539.549084:   <2> [<ffffffc000ab22f0>] __netlink_dump_start+0x128/0x170
> 168539.549093:   <2> [<ffffffc000a98ddc>] rtnetlink_rcv_msg+0x12c/0x1a0
> 168539.549101:   <2> [<ffffffc000ab3a80>] netlink_rcv_skb+0x64/0xc8
> 168539.549110:   <2> [<ffffffc000a97644>] rtnetlink_rcv+0x1c/0x2c
> 168539.549117:   <2> [<ffffffc000ab34cc>] netlink_unicast+0x108/0x1b8
> 168539.549125:   <2> [<ffffffc000ab38b8>] netlink_sendmsg+0x27c/0x2d4
> 168539.549134:   <2> [<ffffffc000a73f04>] sock_sendmsg+0x8c/0xb0
> 168539.549143:   <2> [<ffffffc000a75f04>] SyS_sendto+0xcc/0x110
>
> I am using the following patch as a workaround now. I do not have any
> per route congestion control settings enabled.
> Any pointers to debug this would be greatly appreciated.

Hmm, if this were 4.1.x as in the original reporter's case, I would have
suspected something like commit 0a1f59620068 ("ipv6: Initialize rt6_info
properly in ip6_blackhole_route()") ... any chance of reproducing this on the
latest kernel?
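
To illustrate why a missing initialization would show up as `_metrics = 0` in the dump above and then as a NULL `metrics` argument: as far as I understand the dst layout of that era, `_metrics` is an integer that carries the metrics table pointer with flag bits folded into its low bits, and the accessor masks the flags off. The userspace model below is a sketch under that assumption. The `sketch_*` names and the flag values are illustrative, not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_METRICS_READ_ONLY ((uintptr_t)0x1)  /* assumed flag bit */
#define SKETCH_METRICS_FLAGS     ((uintptr_t)0x3)  /* assumed flag mask */

/* Minimal stand-in for the relevant part of a dst entry. */
struct sketch_dst {
    uintptr_t metrics;      /* table pointer with flags in the low bits */
};

/* Point the entry at a metrics table, optionally marking it read-only. */
static void sketch_init_metrics(struct sketch_dst *dst, const uint32_t *src,
                                int read_only)
{
    dst->metrics = (uintptr_t)src |
                   (read_only ? SKETCH_METRICS_READ_ONLY : 0);
}

/* Recover the table pointer by masking off the flag bits. */
static uint32_t *sketch_metrics_ptr(const struct sketch_dst *dst)
{
    return (uint32_t *)(dst->metrics & ~SKETCH_METRICS_FLAGS);
}

static int sketch_metrics_read_only(const struct sketch_dst *dst)
{
    return (dst->metrics & SKETCH_METRICS_READ_ONLY) != 0;
}
```

In this model an entry that was zero-filled but never passed through `sketch_init_metrics()` yields a NULL table pointer, which is the value the dump loop would then index.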

> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a67310e..c63098e 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -566,7 +566,7 @@ int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
>          int i, valid = 0;
>
>          mx = nla_nest_start(skb, RTA_METRICS);
> -       if (mx == NULL)
> +       if (mx == NULL || metrics == NULL)
>                  return -ENOBUFS;
>
>          for (i = 0; i < RTAX_MAX; i++) {
>
>
>


* Re: 4.1.12 kernel crash in rtnetlink_put_metrics
  2016-03-07 23:39     ` Daniel Borkmann
@ 2016-03-08  4:27       ` subashab
  0 siblings, 0 replies; 5+ messages in thread
From: subashab @ 2016-03-08  4:27 UTC
  To: Daniel Borkmann; +Cc: Andrew, netdev, kafai

> Hmm, if this were 4.1.x as in the original reporter's case, I would have
> suspected something like commit 0a1f59620068 ("ipv6: Initialize rt6_info
> properly in ip6_blackhole_route()") ... any chance of reproducing this on the
> latest kernel?
>

Unfortunately, I haven't encountered a similar crash on newer kernels as of now.
