* IPv6 kernel warning @ 2013-09-20 13:11 Russell King - ARM Linux 2013-09-20 16:08 ` Michele Baldessari 0 siblings, 1 reply; 15+ messages in thread From: Russell King - ARM Linux @ 2013-09-20 13:11 UTC (permalink / raw) To: netdev While running v3.11 on my firewall, I saw this warning. I'm not sure what it means or what its implications are: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at /home/rmk/git/linux-rmk/net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert+0x178/0x840() Modules linked in: ipt_REJECT xt_multiport iptable_filter ipt_MASQUERADE xt_nat xt_mark iptable_nat nf_nat_ipv4 nf_nat ip6table_mangle xt_LOG xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter pata_pcmcia libata scsi_mod 3c589_cs ide_gd_mod ide_cs ide_core sa1111_cs sa1100_cs sa11xx_base soc_common sa11x0_dma virt_dma usbcore usb_common CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.0+ #15 Backtrace: [<c02111a8>] (dump_backtrace+0x0/0x114) from [<c02115a0>] (show_stack+0x18/0x1c) r6:c0520824 r5:00000009 r4:00000000 [<c0211588>] (show_stack+0x0/0x1c) from [<c04d65e0>] (dump_stack+0x20/0x28) [<c04d65c0>] (dump_stack+0x0/0x28) from [<c021bfb0>] (warn_slowpath_common+0x68/0x88) [<c021bf48>] (warn_slowpath_common+0x0/0x88) from [<c021bff4>] (warn_slowpath_null+0x24/0x28) r8:00000000 r7:00000001 r6:00000006 r5:00004320 r4:c11c6580 [<c021bfd0>] (warn_slowpath_null+0x0/0x28) from [<c04515bc>] (tcp_fastretrans_alert+0x178/0x840) [<c0451444>] (tcp_fastretrans_alert+0x0/0x840) from [<c045273c>] (tcp_ack+0xa14/0xc18) [<c0451d28>] (tcp_ack+0x0/0xc18) from [<c0453138>] (tcp_rcv_established+0x494/0x594) [<c0452ca4>] (tcp_rcv_established+0x0/0x594) from [<c04a977c>] (tcp_v6_do_rcv+0xd0/0x428) [<c04a96ac>] (tcp_v6_do_rcv+0x0/0x428) from [<c04a9e70>] (tcp_v6_rcv+0x340/0x63c) [<c04a9b30>] (tcp_v6_rcv+0x0/0x63c) from [<c048b334>] (ip6_input_finish+0x214/0x3c4) [<c048b120>] (ip6_input_finish+0x0/0x3c4) from [<c048ba60>] (ip6_input+0x64/0x74) [<c048b9fc>] (ip6_input+0x0/0x74) from 
[<c048b564>] (ip6_rcv_finish+0x80/0x8c) r4:c1c9ee20 [<c048b4e4>] (ip6_rcv_finish+0x0/0x8c) from [<c048b994>] (ipv6_rcv+0x424/0x48c) r4:c1c9ee20 [<c048b570>] (ipv6_rcv+0x0/0x48c) from [<c0407624>] (__netif_receive_skb_core+0x618/0x688) r8:0000dd86 r7:00000000 r6:c11f6800 r5:c05ee6cc r4:c05f1b98 [<c040700c>] (__netif_receive_skb_core+0x0/0x688) from [<c040770c>] (__netif_receive_skb+0x78/0x80) [<c0407694>] (__netif_receive_skb+0x0/0x80) from [<c04077a8>] (process_backlog+0x94/0x14c) r5:c06091e0 r4:c0609220 [<c0407714>] (process_backlog+0x0/0x14c) from [<c0407af4>] (net_rx_action+0x78/0x1ac) [<c0407a7c>] (net_rx_action+0x0/0x1ac) from [<c021f500>] (__do_softirq+0xb4/0x198) [<c021f44c>] (__do_softirq+0x0/0x198) from [<c021f90c>] (irq_exit+0x74/0xc8) [<c021f898>] (irq_exit+0x0/0xc8) from [<c020f1ac>] (handle_IRQ+0x68/0x88) r4:0000000b [<c020f144>] (handle_IRQ+0x0/0x88) from [<c0208210>] (asm_do_IRQ+0x10/0x14) r5:60000013 r4:c0246818 [<c0208200>] (asm_do_IRQ+0x0/0x14) from [<c0211fcc>] (__irq_svc+0x2c/0x98) Exception stack(0xc05e7f54 to 0xc05e7f9c) 7f40: 00000000 00000000 00000000 7f60: 60000013 c05e6000 c06092a4 c05ee080 00000001 c0204000 6901b115 c05e0800 7f80: c05e7fb8 c05e7f9c c05e7f9c c020f348 c0246818 60000013 ffffffff [<c0246794>] (cpu_startup_entry+0x0/0xe8) from [<c04d4d70>] (rest_init+0x64/0x7c) r7:c05f3940 r6:c0922200 r5:c0609340 r4:c05ee0c0 [<c04d4d0c>] (rest_init+0x0/0x7c) from [<c05c3acc>] (start_kernel+0x350/0x3ac) [<c05c377c>] (start_kernel+0x0/0x3ac) from [<c0208040>] (0xc0208040) ---[ end trace ab55f0e3f592fa5e ]--- ^ permalink raw reply [flat|nested] 15+ messages in thread
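For readers unfamiliar with the trace above: each frame is printed as `symbol+0xOFFSET/0xSIZE`, where the offset is the position inside the function and the size is the function's total length in bytes. The following is a minimal sketch of splitting one such frame apart. The `struct frame` type and `parse_frame()` helper are hypothetical names made up for illustration and are not part of any kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for one decoded backtrace frame. */
struct frame {
	char symbol[64];      /* function name, e.g. "tcp_ack" */
	unsigned long offset; /* byte offset inside the function */
	unsigned long size;   /* total size of the function in bytes */
};

/*
 * Parse text of the form "symbol+0xOFF/0xSIZE".
 * Returns 0 on success and -1 if the text does not match.
 */
int parse_frame(const char *text, struct frame *out)
{
	assert(text != NULL && out != NULL);

	/* %63[^+] copies everything up to the '+' into symbol. */
	if (sscanf(text, "%63[^+]+0x%lx/0x%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}
```

Calling `parse_frame("tcp_fastretrans_alert+0x178/0x840", &f)` fills `f` with the symbol name, offset 0x178 and size 0x840, which says the warning fired 0x178 bytes into a function that is 0x840 bytes long.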
* Re: IPv6 kernel warning 2013-09-20 13:11 IPv6 kernel warning Russell King - ARM Linux @ 2013-09-20 16:08 ` Michele Baldessari 2013-09-20 16:40 ` Yuchung Cheng 0 siblings, 1 reply; 15+ messages in thread From: Michele Baldessari @ 2013-09-20 16:08 UTC (permalink / raw) To: Russell King - ARM Linux; +Cc: netdev, ycheng Hi Russell, On Fri, Sep 20, 2013 at 02:11:53PM +0100, Russell King - ARM Linux wrote: > While running v3.11 on my firewall, I saw this warning. I'm not sure > what it means or what its implications are: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at /home/rmk/git/linux-rmk/net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert+0x178/0x840() > Modules linked in: ipt_REJECT xt_multiport iptable_filter ipt_MASQUERADE xt_nat > xt_mark iptable_nat nf_nat_ipv4 nf_nat ip6table_mangle xt_LOG xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter pata_pcmcia libata scsi_mod 3c589_cs ide_gd_mod ide_cs ide_core sa1111_cs sa1100_cs sa11xx_base soc_common sa11x0_dma virt_dma usbcore usb_common > CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.0+ #15 > Backtrace: > [<c02111a8>] (dump_backtrace+0x0/0x114) from [<c02115a0>] (show_stack+0x18/0x1c) r6:c0520824 r5:00000009 r4:00000000 > [<c0211588>] (show_stack+0x0/0x1c) from [<c04d65e0>] (dump_stack+0x20/0x28) > [<c04d65c0>] (dump_stack+0x0/0x28) from [<c021bfb0>] (warn_slowpath_common+0x68/0x88) > [<c021bf48>] (warn_slowpath_common+0x0/0x88) from [<c021bff4>] (warn_slowpath_null+0x24/0x28) > r8:00000000 r7:00000001 r6:00000006 r5:00004320 r4:c11c6580 > [<c021bfd0>] (warn_slowpath_null+0x0/0x28) from [<c04515bc>] (tcp_fastretrans_alert+0x178/0x840) > [<c0451444>] (tcp_fastretrans_alert+0x0/0x840) from [<c045273c>] (tcp_ack+0xa14/0xc18) > [<c0451d28>] (tcp_ack+0x0/0xc18) from [<c0453138>] (tcp_rcv_established+0x494/0x594) > [<c0452ca4>] (tcp_rcv_established+0x0/0x594) from [<c04a977c>] (tcp_v6_do_rcv+0xd0/0x428) > [<c04a96ac>] (tcp_v6_do_rcv+0x0/0x428) from [<c04a9e70>] (tcp_v6_rcv+0x340/0x63c) 
> [<c04a9b30>] (tcp_v6_rcv+0x0/0x63c) from [<c048b334>] (ip6_input_finish+0x214/0x3c4) > [<c048b120>] (ip6_input_finish+0x0/0x3c4) from [<c048ba60>] (ip6_input+0x64/0x74) > [<c048b9fc>] (ip6_input+0x0/0x74) from [<c048b564>] (ip6_rcv_finish+0x80/0x8c) > r4:c1c9ee20 > [<c048b4e4>] (ip6_rcv_finish+0x0/0x8c) from [<c048b994>] (ipv6_rcv+0x424/0x48c) > r4:c1c9ee20 > [<c048b570>] (ipv6_rcv+0x0/0x48c) from [<c0407624>] (__netif_receive_skb_core+0x618/0x688) > r8:0000dd86 r7:00000000 r6:c11f6800 r5:c05ee6cc r4:c05f1b98 > [<c040700c>] (__netif_receive_skb_core+0x0/0x688) from [<c040770c>] (__netif_receive_skb+0x78/0x80) > [<c0407694>] (__netif_receive_skb+0x0/0x80) from [<c04077a8>] (process_backlog+0x94/0x14c) > r5:c06091e0 r4:c0609220 > [<c0407714>] (process_backlog+0x0/0x14c) from [<c0407af4>] (net_rx_action+0x78/0x1ac) > [<c0407a7c>] (net_rx_action+0x0/0x1ac) from [<c021f500>] (__do_softirq+0xb4/0x198) > [<c021f44c>] (__do_softirq+0x0/0x198) from [<c021f90c>] (irq_exit+0x74/0xc8) > [<c021f898>] (irq_exit+0x0/0xc8) from [<c020f1ac>] (handle_IRQ+0x68/0x88) > r4:0000000b > [<c020f144>] (handle_IRQ+0x0/0x88) from [<c0208210>] (asm_do_IRQ+0x10/0x14) > r5:60000013 r4:c0246818 > [<c0208200>] (asm_do_IRQ+0x0/0x14) from [<c0211fcc>] (__irq_svc+0x2c/0x98) > Exception stack(0xc05e7f54 to 0xc05e7f9c) > 7f40: 00000000 00000000 00000000 > 7f60: 60000013 c05e6000 c06092a4 c05ee080 00000001 c0204000 6901b115 c05e0800 > 7f80: c05e7fb8 c05e7f9c c05e7f9c c020f348 c0246818 60000013 ffffffff > [<c0246794>] (cpu_startup_entry+0x0/0xe8) from [<c04d4d70>] (rest_init+0x64/0x7c) > r7:c05f3940 r6:c0922200 r5:c0609340 r4:c05ee0c0 > [<c04d4d0c>] (rest_init+0x0/0x7c) from [<c05c3acc>] (start_kernel+0x350/0x3ac) > [<c05c377c>] (start_kernel+0x0/0x3ac) from [<c0208040>] (0xc0208040) > ---[ end trace ab55f0e3f592fa5e ]--- there's been multiple reports about this one: https://bugzilla.redhat.com/show_bug.cgi?id=989251 http://bugzilla.kernel.org/show_bug.cgi?id=60779 Could you try Yuchung's debug patch? 
http://www.spinics.net/lists/netdev/msg250193.html

thanks,
Michele
-- 
Michele Baldessari <michele@acksyn.org>
C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D
* Re: IPv6 kernel warning 2013-09-20 16:08 ` Michele Baldessari @ 2013-09-20 16:40 ` Yuchung Cheng 2013-10-07 18:13 ` dormando 0 siblings, 1 reply; 15+ messages in thread From: Yuchung Cheng @ 2013-09-20 16:40 UTC (permalink / raw) To: Michele Baldessari; +Cc: Russell King - ARM Linux, netdev On Fri, Sep 20, 2013 at 9:08 AM, Michele Baldessari <michele@acksyn.org> wrote: > Hi Russell, > > On Fri, Sep 20, 2013 at 02:11:53PM +0100, Russell King - ARM Linux wrote: >> While running v3.11 on my firewall, I saw this warning. I'm not sure >> what it means or what its implications are: >> >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 0 at /home/rmk/git/linux-rmk/net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert+0x178/0x840() >> Modules linked in: ipt_REJECT xt_multiport iptable_filter ipt_MASQUERADE xt_nat >> xt_mark iptable_nat nf_nat_ipv4 nf_nat ip6table_mangle xt_LOG xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6table_filter pata_pcmcia libata scsi_mod 3c589_cs ide_gd_mod ide_cs ide_core sa1111_cs sa1100_cs sa11xx_base soc_common sa11x0_dma virt_dma usbcore usb_common >> CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.0+ #15 >> Backtrace: >> [<c02111a8>] (dump_backtrace+0x0/0x114) from [<c02115a0>] (show_stack+0x18/0x1c) r6:c0520824 r5:00000009 r4:00000000 >> [<c0211588>] (show_stack+0x0/0x1c) from [<c04d65e0>] (dump_stack+0x20/0x28) >> [<c04d65c0>] (dump_stack+0x0/0x28) from [<c021bfb0>] (warn_slowpath_common+0x68/0x88) >> [<c021bf48>] (warn_slowpath_common+0x0/0x88) from [<c021bff4>] (warn_slowpath_null+0x24/0x28) >> r8:00000000 r7:00000001 r6:00000006 r5:00004320 r4:c11c6580 >> [<c021bfd0>] (warn_slowpath_null+0x0/0x28) from [<c04515bc>] (tcp_fastretrans_alert+0x178/0x840) >> [<c0451444>] (tcp_fastretrans_alert+0x0/0x840) from [<c045273c>] (tcp_ack+0xa14/0xc18) >> [<c0451d28>] (tcp_ack+0x0/0xc18) from [<c0453138>] (tcp_rcv_established+0x494/0x594) >> [<c0452ca4>] (tcp_rcv_established+0x0/0x594) from [<c04a977c>] (tcp_v6_do_rcv+0xd0/0x428) >> 
[<c04a96ac>] (tcp_v6_do_rcv+0x0/0x428) from [<c04a9e70>] (tcp_v6_rcv+0x340/0x63c) >> [<c04a9b30>] (tcp_v6_rcv+0x0/0x63c) from [<c048b334>] (ip6_input_finish+0x214/0x3c4) >> [<c048b120>] (ip6_input_finish+0x0/0x3c4) from [<c048ba60>] (ip6_input+0x64/0x74) >> [<c048b9fc>] (ip6_input+0x0/0x74) from [<c048b564>] (ip6_rcv_finish+0x80/0x8c) >> r4:c1c9ee20 >> [<c048b4e4>] (ip6_rcv_finish+0x0/0x8c) from [<c048b994>] (ipv6_rcv+0x424/0x48c) >> r4:c1c9ee20 >> [<c048b570>] (ipv6_rcv+0x0/0x48c) from [<c0407624>] (__netif_receive_skb_core+0x618/0x688) >> r8:0000dd86 r7:00000000 r6:c11f6800 r5:c05ee6cc r4:c05f1b98 >> [<c040700c>] (__netif_receive_skb_core+0x0/0x688) from [<c040770c>] (__netif_receive_skb+0x78/0x80) >> [<c0407694>] (__netif_receive_skb+0x0/0x80) from [<c04077a8>] (process_backlog+0x94/0x14c) >> r5:c06091e0 r4:c0609220 >> [<c0407714>] (process_backlog+0x0/0x14c) from [<c0407af4>] (net_rx_action+0x78/0x1ac) >> [<c0407a7c>] (net_rx_action+0x0/0x1ac) from [<c021f500>] (__do_softirq+0xb4/0x198) >> [<c021f44c>] (__do_softirq+0x0/0x198) from [<c021f90c>] (irq_exit+0x74/0xc8) >> [<c021f898>] (irq_exit+0x0/0xc8) from [<c020f1ac>] (handle_IRQ+0x68/0x88) >> r4:0000000b >> [<c020f144>] (handle_IRQ+0x0/0x88) from [<c0208210>] (asm_do_IRQ+0x10/0x14) >> r5:60000013 r4:c0246818 >> [<c0208200>] (asm_do_IRQ+0x0/0x14) from [<c0211fcc>] (__irq_svc+0x2c/0x98) >> Exception stack(0xc05e7f54 to 0xc05e7f9c) >> 7f40: 00000000 00000000 00000000 >> 7f60: 60000013 c05e6000 c06092a4 c05ee080 00000001 c0204000 6901b115 c05e0800 >> 7f80: c05e7fb8 c05e7f9c c05e7f9c c020f348 c0246818 60000013 ffffffff >> [<c0246794>] (cpu_startup_entry+0x0/0xe8) from [<c04d4d70>] (rest_init+0x64/0x7c) >> r7:c05f3940 r6:c0922200 r5:c0609340 r4:c05ee0c0 >> [<c04d4d0c>] (rest_init+0x0/0x7c) from [<c05c3acc>] (start_kernel+0x350/0x3ac) >> [<c05c377c>] (start_kernel+0x0/0x3ac) from [<c0208040>] (0xc0208040) >> ---[ end trace ab55f0e3f592fa5e ]--- > > there's been multiple reports about this one: > 
https://bugzilla.redhat.com/show_bug.cgi?id=989251
> http://bugzilla.kernel.org/show_bug.cgi?id=60779
>
> Could you try Yuchung's debug patch?
> http://www.spinics.net/lists/netdev/msg250193.html
Yes it looks like the same bug. Please try that patch to help identify
this elusive bug.

>
> thanks,
> Michele
> -- 
> Michele Baldessari <michele@acksyn.org>
> C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D
* Re: IPv6 kernel warning 2013-09-20 16:40 ` Yuchung Cheng @ 2013-10-07 18:13 ` dormando 2013-10-07 19:51 ` Yuchung Cheng 0 siblings, 1 reply; 15+ messages in thread From: dormando @ 2013-10-07 18:13 UTC (permalink / raw) To: Yuchung Cheng; +Cc: Michele Baldessari, Russell King - ARM Linux, netdev > > > > there's been multiple reports about this one: > > https://bugzilla.redhat.com/show_bug.cgi?id=989251 > > http://bugzilla.kernel.org/show_bug.cgi?id=60779 > > > > Could you try Yuchung's debug patch? > > http://www.spinics.net/lists/netdev/msg250193.html > Yes it looks like the same bug. Please try that patch to help identify > this elusive bug. > Hi! We get this one a few times a day in production. Here's a warning with your debug trace in the line immediately following: (I censored a few things) [125311.721950] ------------[ cut here ]------------ [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80() [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1 [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012 [125311.721984] ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998 [125311.721986] ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120 [125311.721989] 0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8 [125311.721991] Call Trace: [125311.721992] <IRQ> [<ffffffff816bb9cc>] dump_stack+0x19/0x1d [125311.722002] [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0 [125311.722005] [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20 [125311.722007] [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80 [125311.722011] [<ffffffff8161891f>] tcp_ack+0x6df/0xe90 [125311.722016] 
[<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
[125311.722018] [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
[125311.722021] [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
[125311.722023] [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
[125311.722025] [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
[125311.722027] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
[125311.722032] [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
[125311.722034] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
[125311.722036] [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
[125311.722037] [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
[125311.722039] [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
[125311.722040] [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
[125311.722046] [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
[125311.722049] [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
[125311.722051] [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
[125311.722053] [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
[125311.722056] [<ffffffff81053a5f>] __do_softirq+0xef/0x270
[125311.722058] [<ffffffff81053cb5>] irq_exit+0x95/0xa0
[125311.722062] [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
[125311.722065] [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
[125311.722065] <EOI> [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
[125311.722082] [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
[125311.722086] [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
[125311.722091] [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
[125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
[125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120

It's been happening with all 3.10 kernels; the one above is 3.10.13, as
stated in the trace.
* Re: IPv6 kernel warning 2013-10-07 18:13 ` dormando @ 2013-10-07 19:51 ` Yuchung Cheng 2013-10-07 19:56 ` dormando 2013-10-08 14:05 ` Neal Cardwell 0 siblings, 2 replies; 15+ messages in thread From: Yuchung Cheng @ 2013-10-07 19:51 UTC (permalink / raw) To: dormando Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote: > > > > > > > there's been multiple reports about this one: > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251 > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779 > > > > > > Could you try Yuchung's debug patch? > > > http://www.spinics.net/lists/netdev/msg250193.html > > Yes it looks like the same bug. Please try that patch to help identify > > this elusive bug. > > > > Hi! > > We get this one a few times a day in production. Here's a warning with > your debug trace in the line immediately following: > (I censored a few things) > > [125311.721950] ------------[ cut here ]------------ > [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80() > [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core > [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1 > [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012 > [125311.721984] ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998 > [125311.721986] ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120 > [125311.721989] 0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8 > [125311.721991] Call Trace: > [125311.721992] <IRQ> [<ffffffff816bb9cc>] dump_stack+0x19/0x1d > [125311.722002] [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0 > [125311.722005] 
[<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20 > [125311.722007] [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80 > [125311.722011] [<ffffffff8161891f>] tcp_ack+0x6df/0xe90 > [125311.722016] [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680 > [125311.722018] [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320 > [125311.722021] [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810 > [125311.722023] [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0 > [125311.722025] [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750 > [125311.722027] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350 > [125311.722032] [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160 > [125311.722034] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350 > [125311.722036] [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250 > [125311.722037] [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80 > [125311.722039] [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360 > [125311.722040] [<ffffffff815ff8e0>] ip_rcv+0x230/0x350 > [125311.722046] [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600 > [125311.722049] [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70 > [125311.722051] [<ffffffff815b4354>] process_backlog+0xf4/0x1e0 > [125311.722053] [<ffffffff815b4b45>] net_rx_action+0xf5/0x250 > [125311.722056] [<ffffffff81053a5f>] __do_softirq+0xef/0x270 > [125311.722058] [<ffffffff81053cb5>] irq_exit+0x95/0xa0 > [125311.722062] [<ffffffff816c8f26>] do_IRQ+0x66/0xe0 > [125311.722065] [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a > [125311.722065] <EOI> [<ffffffff8100abf1>] ? 
default_idle+0x21/0xc0
> [125311.722082] [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
> [125311.722086] [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
> [125311.722091] [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
> [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
> [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
>
> It's been happening with all 3.10 kernels, and the one above is .13 as
> stated in the trace.

Thanks! Could you post the output of `sysctl -a |grep tcp`?

I suspect tcp_process_tlp_ack() should not revert the state to Open
directly, but should call tcp_try_keep_open() instead, similar to all
the undo processing in tcp_fastretrans_alert(): after
tcp_end_cwnd_reduction(), the process (E) falls back to check other
stats before moving to CA_Open.


index 9c62257..9012b42 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
 			tcp_init_cwnd_reduction(sk, true);
 			tcp_set_ca_state(sk, TCP_CA_CWR);
 			tcp_end_cwnd_reduction(sk);
-			tcp_set_ca_state(sk, TCP_CA_Open);
+			tcp_try_keep_open(sk);
 			NET_INC_STATS_BH(sock_net(sk),
 					 LINUX_MIB_TCPLOSSPROBERECOVERY);
 		}
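To make the idea behind the proposed change concrete, here is a toy user-space model, not kernel code. It assumes, following the message above, that the problem is a jump straight to Open while other per-connection counters still show outstanding work. The `toy_tcp` struct, its three counters and all `toy_*` functions are hypothetical simplifications; the real tcp_try_keep_open() performs its own checks, which this sketch only approximates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the congestion-avoidance states named in the thread. */
enum toy_ca_state { TOY_CA_OPEN, TOY_CA_DISORDER, TOY_CA_CWR };

/* Hypothetical, heavily reduced view of the per-connection counters. */
struct toy_tcp {
	enum toy_ca_state state;
	unsigned int sacked_out;  /* segments SACKed by the receiver */
	unsigned int lost_out;    /* segments currently marked lost */
	unsigned int retrans_out; /* retransmissions still in flight */
};

/* Assumption: Open is only a sane state when nothing is outstanding. */
bool toy_open_is_consistent(const struct toy_tcp *tp)
{
	assert(tp != NULL);
	return tp->sacked_out == 0 && tp->lost_out == 0 &&
	       tp->retrans_out == 0;
}

/* Models the line the patch removes: force Open unconditionally. */
void toy_tlp_end_direct(struct toy_tcp *tp)
{
	tp->state = TOY_CA_OPEN;
}

/* Models the line the patch adds: look at the counters first. */
void toy_tlp_end_keep_open(struct toy_tcp *tp)
{
	tp->state = toy_open_is_consistent(tp) ? TOY_CA_OPEN
					       : TOY_CA_DISORDER;
}
```

With one SACKed segment still recorded, `toy_tlp_end_direct()` leaves the connection in Open with inconsistent counters, while `toy_tlp_end_keep_open()` parks it in Disorder until the counters clear. Whether this is the actual trigger for the warning is exactly what the thread is trying to establish.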
* Re: IPv6 kernel warning 2013-10-07 19:51 ` Yuchung Cheng @ 2013-10-07 19:56 ` dormando 2013-10-07 20:00 ` Yuchung Cheng 2013-10-08 14:05 ` Neal Cardwell 1 sibling, 1 reply; 15+ messages in thread From: dormando @ 2013-10-07 19:56 UTC (permalink / raw) To: Yuchung Cheng Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati On Mon, 7 Oct 2013, Yuchung Cheng wrote: > On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote: > > > > > > > > > > there's been multiple reports about this one: > > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251 > > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779 > > > > > > > > Could you try Yuchung's debug patch? > > > > http://www.spinics.net/lists/netdev/msg250193.html > > > Yes it looks like the same bug. Please try that patch to help identify > > > this elusive bug. > > > > > > > Hi! > > > > We get this one a few times a day in production. Here's a warning with > > your debug trace in the line immediately following: > > (I censored a few things) > > > > [125311.721950] ------------[ cut here ]------------ > > [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80() > > [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core > > [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1 > > [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012 > > [125311.721984] ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998 > > [125311.721986] ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120 > > [125311.721989] 0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8 > > [125311.721991] Call Trace: > > [125311.721992] <IRQ> [<ffffffff816bb9cc>] 
dump_stack+0x19/0x1d > > [125311.722002] [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0 > > [125311.722005] [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20 > > [125311.722007] [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80 > > [125311.722011] [<ffffffff8161891f>] tcp_ack+0x6df/0xe90 > > [125311.722016] [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680 > > [125311.722018] [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320 > > [125311.722021] [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810 > > [125311.722023] [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0 > > [125311.722025] [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750 > > [125311.722027] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350 > > [125311.722032] [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160 > > [125311.722034] [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350 > > [125311.722036] [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250 > > [125311.722037] [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80 > > [125311.722039] [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360 > > [125311.722040] [<ffffffff815ff8e0>] ip_rcv+0x230/0x350 > > [125311.722046] [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600 > > [125311.722049] [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70 > > [125311.722051] [<ffffffff815b4354>] process_backlog+0xf4/0x1e0 > > [125311.722053] [<ffffffff815b4b45>] net_rx_action+0xf5/0x250 > > [125311.722056] [<ffffffff81053a5f>] __do_softirq+0xef/0x270 > > [125311.722058] [<ffffffff81053cb5>] irq_exit+0x95/0xa0 > > [125311.722062] [<ffffffff816c8f26>] do_IRQ+0x66/0xe0 > > [125311.722065] [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a > > [125311.722065] <EOI> [<ffffffff8100abf1>] ? 
default_idle+0x21/0xc0 > > [125311.722082] [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20 > > [125311.722086] [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230 > > [125311.722091] [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3 > > [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]--- > > [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120 > > > > It's been happening with all 3.10 kernels, and the one above is .13 as > > stated in the trace. > > Thanks! could you post the output of `sysctl -a |grep tcp`? > > I suspect tcp_process_tlp_ack() should not revert state to Open > directly, but calling tcp_try_keep_open() instead, similar to all the > undo processing in the tcp_fastretrans_alert(): after > tcp_end_cwnd_reduction(), the process (E) falls back to check other > stats before moving to CA_Open. > > > index 9c62257..9012b42 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, > tcp_init_cwnd_reduction(sk, true); > tcp_set_ca_state(sk, TCP_CA_CWR); > tcp_end_cwnd_reduction(sk); > - tcp_set_ca_state(sk, TCP_CA_Open); > + tcp_try_keep_open(sk); > NET_INC_STATS_BH(sock_net(sk), > LINUX_MIB_TCPLOSSPROBERECOVERY); > } > Should I apply this and see if the warning stops? 
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = cubic reno
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_available_congestion_control = cubic reno westwood
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_challenge_ack_limit = 100
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dma_copybreak = 262144
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 0
net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_frto = 0
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_limit_output_bytes = 131072
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 2000000
net.ipv4.tcp_max_ssthresh = 0
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_mem = 6188001 8250670 12376002
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_dupack = 0
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_user_cwnd_max = 20
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.vs.secure_tcp = 0
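The listing above reports net.ipv4.tcp_early_retrans = 3, and on kernels of this era that sysctl also controls tail loss probe (TLP). The sketch below checks whether a given value leaves TLP switched on. The mapping (3 and 4 enable TLP, 0 to 2 do not) is an assumption recalled from the kernel's ip-sysctl documentation of that period and should be verified against the running kernel's own documentation; `tlp_enabled()` and `read_tcp_early_retrans()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Assumed mapping for 3.10-era kernels: values 3 and 4 turn TLP on,
 * values 0, 1 and 2 leave it off.
 */
bool tlp_enabled(int early_retrans)
{
	return early_retrans == 3 || early_retrans == 4;
}

/*
 * Read an integer sysctl through procfs, for example
 * "/proc/sys/net/ipv4/tcp_early_retrans".
 * Returns the value, or -1 if the file is missing or unparsable.
 */
int read_tcp_early_retrans(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Passing the result of `read_tcp_early_retrans("/proc/sys/net/ipv4/tcp_early_retrans")` to `tlp_enabled()` gives a quick yes or no for the box being debugged. For the value 3 shown in the listing, the answer under the stated assumption is yes.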
* Re: IPv6 kernel warning 2013-10-07 19:56 ` dormando @ 2013-10-07 20:00 ` Yuchung Cheng 2013-10-07 20:15 ` dormando 2013-10-08 18:24 ` dormando 0 siblings, 2 replies; 15+ messages in thread From: Yuchung Cheng @ 2013-10-07 20:00 UTC (permalink / raw) To: dormando Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote: > On Mon, 7 Oct 2013, Yuchung Cheng wrote: > >> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote: >> > >> > > > >> > > > there's been multiple reports about this one: >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251 >> > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779 >> > > > >> > > > Could you try Yuchung's debug patch? >> > > > http://www.spinics.net/lists/netdev/msg250193.html >> > > Yes it looks like the same bug. Please try that patch to help identify >> > > this elusive bug. >> > > >> > >> > Hi! >> > >> > We get this one a few times a day in production. 
Here's a warning with
>> > your debug trace in the line immediately following:
>> > (I censored a few things)
>> >
>> > [125311.721950] ------------[ cut here ]------------
>> > [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
>> > [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
>> > [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
>> > [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
>> > [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
>> > [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
>> > [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
>> > [125311.721991] Call Trace:
>> > [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
>> > [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
>> > [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
>> > [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
>> > [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
>> > [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
>> > [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
>> > [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
>> > [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
>> > [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
>> > [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>> > [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
>> > [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>> > [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
>> > [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
>> > [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
>> > [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
>> > [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
>> > [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
>> > [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
>> > [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
>> > [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
>> > [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
>> > [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
>> > [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
>> > [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
>> > [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
>> > [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
>> > [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
>> > [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
>> > [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
>> >
>> > It's been happening with all 3.10 kernels, and the one above is .13 as
>> > stated in the trace.
>>
>> Thanks! could you post the output of `sysctl -a |grep tcp`?
>>
>> I suspect tcp_process_tlp_ack() should not revert state to Open
>> directly, but calling tcp_try_keep_open() instead, similar to all the
>> undo processing in the tcp_fastretrans_alert(): after
>> tcp_end_cwnd_reduction(), the process (E) falls back to check other
>> stats before moving to CA_Open.
>>
>>
>> index 9c62257..9012b42 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>>                 tcp_init_cwnd_reduction(sk, true);
>>                 tcp_set_ca_state(sk, TCP_CA_CWR);
>>                 tcp_end_cwnd_reduction(sk);
>> -               tcp_set_ca_state(sk, TCP_CA_Open);
>> +               tcp_try_keep_open(sk);
>>                 NET_INC_STATS_BH(sock_net(sk),
>>                                  LINUX_MIB_TCPLOSSPROBERECOVERY);
>>         }
>>
>
> Should I apply this and see if the warning stops?

I'd like to hear what the authors of TLP think. In the mean time could
you help us collect more evidence by disabling TLP with
sysctl net.ipv4.tcp_early_retrans=2
and see if the problem still occurs? (it should not).

thanks

>
> net.ipv4.tcp_abort_on_overflow = 0
> net.ipv4.tcp_adv_win_scale = 1
> net.ipv4.tcp_allowed_congestion_control = cubic reno
> net.ipv4.tcp_app_win = 31
> net.ipv4.tcp_available_congestion_control = cubic reno westwood
> net.ipv4.tcp_base_mss = 512
> net.ipv4.tcp_challenge_ack_limit = 100
> net.ipv4.tcp_congestion_control = cubic
> net.ipv4.tcp_dma_copybreak = 262144
> net.ipv4.tcp_dsack = 1
> net.ipv4.tcp_early_retrans = 3
> net.ipv4.tcp_ecn = 2
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_fastopen = 0
> net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
> net.ipv4.tcp_fin_timeout = 5
> net.ipv4.tcp_frto = 0
> net.ipv4.tcp_keepalive_intvl = 75
> net.ipv4.tcp_keepalive_probes = 9
> net.ipv4.tcp_keepalive_time = 1800
> net.ipv4.tcp_limit_output_bytes = 131072
> net.ipv4.tcp_low_latency = 0
> net.ipv4.tcp_max_orphans = 2000000
> net.ipv4.tcp_max_ssthresh = 0
> net.ipv4.tcp_max_syn_backlog = 65536
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.ipv4.tcp_mem = 6188001 8250670 12376002
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.ipv4.tcp_mtu_probing = 0
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_orphan_retries = 0
> net.ipv4.tcp_reordering = 3
> net.ipv4.tcp_retrans_collapse = 1
> net.ipv4.tcp_retries1 = 3
> net.ipv4.tcp_retries2 = 15
> net.ipv4.tcp_rfc1337 = 0
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_sack = 1
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.tcp_stdurg = 0
> net.ipv4.tcp_syn_retries = 6
> net.ipv4.tcp_synack_retries = 5
> net.ipv4.tcp_syncookies = 1
> net.ipv4.tcp_thin_dupack = 0
> net.ipv4.tcp_thin_linear_timeouts = 0
> net.ipv4.tcp_timestamps = 1
> net.ipv4.tcp_tso_win_divisor = 3
> net.ipv4.tcp_tw_recycle = 0
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_user_cwnd_max = 20
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.ipv4.tcp_workaround_signed_windows = 0
> net.ipv4.vs.secure_tcp = 0
>
* Re: IPv6 kernel warning
  2013-10-07 20:00 ` Yuchung Cheng
@ 2013-10-07 20:15 ` dormando
  2013-10-08 18:24 ` dormando
  1 sibling, 0 replies; 15+ messages in thread
From: dormando @ 2013-10-07 20:15 UTC
To: Yuchung Cheng
Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati

On Mon, 7 Oct 2013, Yuchung Cheng wrote:

> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> > Should I apply this and see if the warning stops?
> I'd like to hear what the authors of TLP think. In the mean time could
> you help us collect more evidence by disabling TLP with
> sysctl net.ipv4.tcp_early_retrans=2
> and see if the problem still occurs? (it should not).
>
> thanks

Changed on one machine. We tend to only see one per box every 12-24
hours, so it'll take a while to confirm.
* Re: IPv6 kernel warning
  2013-10-07 20:00 ` Yuchung Cheng
  2013-10-07 20:15 ` dormando
@ 2013-10-08 18:24 ` dormando
  2013-10-08 20:53 ` Yuchung Cheng
  1 sibling, 1 reply; 15+ messages in thread
From: dormando @ 2013-10-08 18:24 UTC
To: Yuchung Cheng
Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati

On Mon, 7 Oct 2013, Yuchung Cheng wrote:

> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> > Should I apply this and see if the warning stops?
> I'd like to hear what the authors of TLP think. In the mean time could
> you help us collect more evidence by disabling TLP with
> sysctl net.ipv4.tcp_early_retrans=2
> and see if the problem still occurs? (it should not).
>
> thanks

Box hasn't had a warning in the last 24ish hours. A neighboring machine
with the default tcp_early_retrans setting has had 5-6 in the same
timeframe.

Is this a harmful situation to the socket in any way, or is it just
informational weirdness?
* Re: IPv6 kernel warning
  2013-10-08 18:24 ` dormando
@ 2013-10-08 20:53 ` Yuchung Cheng
  2013-10-09 17:33 ` Yuchung Cheng
  0 siblings, 1 reply; 15+ messages in thread
From: Yuchung Cheng @ 2013-10-08 20:53 UTC
To: dormando
Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati

On Tue, Oct 8, 2013 at 11:24 AM, dormando <dormando@rydia.net> wrote:
> On Mon, 7 Oct 2013, Yuchung Cheng wrote:
>
>> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
>> > Should I apply this and see if the warning stops?
>> I'd like to hear what the authors of TLP think. In the mean time could
>> you help us collect more evidence by disabling TLP with
>> sysctl net.ipv4.tcp_early_retrans=2
>> and see if the problem still occurs? (it should not).
>>
>> thanks
>
> Box hasn't had a warning in the last 24ish hours. A neighboring machine
> with the default tcp_early_retrans setting has had 5-6 in the same
> timeframe.
>
> Is this a harmful situation to the socket in any way, or is it just
> informational weirdness?

It should be fairly harmless. The ack that triggers the warning should
set the TCP back to the good (non-Open) state, but it's still good to
get rid of.
* Re: IPv6 kernel warning
  2013-10-08 20:53 ` Yuchung Cheng
@ 2013-10-09 17:33 ` Yuchung Cheng
  2013-10-09 18:48 ` dormando
  0 siblings, 1 reply; 15+ messages in thread
From: Yuchung Cheng @ 2013-10-09 17:33 UTC
To: dormando
Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati

On Tue, Oct 8, 2013 at 1:53 PM, Yuchung Cheng <ycheng@google.com> wrote:
>
> On Tue, Oct 8, 2013 at 11:24 AM, dormando <dormando@rydia.net> wrote:
> > On Mon, 7 Oct 2013, Yuchung Cheng wrote:
> >
> >> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> >> > Should I apply this and see if the warning stops?

Hi Dormando,

Could you try this patch to make sure it fixes the warning (with
sysctl net.ipv4.tcp_early_retrans=3)?

[-- Attachment #2: 0001-tcp-fix-incorrect-ca_state-in-tail-loss-probe.patch --]
[-- Type: application/octet-stream, Size: 1356 bytes --]

From 6aacfe24692341ac93c1d153a801c34066b86262 Mon Sep 17 00:00:00 2001
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 9 Oct 2013 10:08:52 -0700
Subject: [PATCH] tcp: fix incorrect ca_state in tail loss probe

On receiving an ACK that covers the loss probe sequence, TLP
immediately sets the congestion state to Open, even though some packets
are not recovered and retransmissions are on the way. The later ACKs may
trigger a WARN_ON check of step D in tcp_fastretrans_alert().

The fix is to follow the similar procedure in recovery by calling
tcp_try_keep_open(). The sender switches to Open state if no packets
are retransmitted. Otherwise it goes to Disorder and lets subsequent
ACKs move the state to Recovery or Open.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 113dc5f..53974c7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3291,7 +3291,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
 		tcp_init_cwnd_reduction(sk, true);
 		tcp_set_ca_state(sk, TCP_CA_CWR);
 		tcp_end_cwnd_reduction(sk);
-		tcp_set_ca_state(sk, TCP_CA_Open);
+		tcp_try_keep_open(sk);
 		NET_INC_STATS_BH(sock_net(sk),
 				 LINUX_MIB_TCPLOSSPROBERECOVERY);
 	}
--
1.8.4
* Re: IPv6 kernel warning
  2013-10-09 17:33 ` Yuchung Cheng
@ 2013-10-09 18:48 ` dormando
  2013-10-11 18:15 ` dormando
  0 siblings, 1 reply; 15+ messages in thread
From: dormando @ 2013-10-09 18:48 UTC
To: Yuchung Cheng
Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati

On Wed, 9 Oct 2013, Yuchung Cheng wrote:

> On Tue, Oct 8, 2013 at 1:53 PM, Yuchung Cheng <ycheng@google.com> wrote:
> >
> > On Tue, Oct 8, 2013 at 11:24 AM, dormando <dormando@rydia.net> wrote:
> > > On Mon, 7 Oct 2013, Yuchung Cheng wrote:
> > >
> > >> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> > >> > On Mon, 7 Oct 2013, Yuchung Cheng wrote:
> > >> >
> > >> >> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
> > >> >> > [...]
> > >> >>
> > >> >> Thanks! could you post the output of `sysctl -a |grep tcp`?
> > >> >> > > >> >> I suspect tcp_process_tlp_ack() should not revert state to Open > > >> >> directly, but calling tcp_try_keep_open() instead, similar to all the > > >> >> undo processing in the tcp_fastretrans_alert(): after > > >> >> tcp_end_cwnd_reduction(), the process (E) falls back to check other > > >> >> stats before moving to CA_Open. > > >> >> > > >> >> > > >> >> index 9c62257..9012b42 100644 > > >> >> --- a/net/ipv4/tcp_input.c > > >> >> +++ b/net/ipv4/tcp_input.c > > >> >> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, > > >> >> tcp_init_cwnd_reduction(sk, true); > > >> >> tcp_set_ca_state(sk, TCP_CA_CWR); > > >> >> tcp_end_cwnd_reduction(sk); > > >> >> - tcp_set_ca_state(sk, TCP_CA_Open); > > >> >> + tcp_try_keep_open(sk); > > >> >> NET_INC_STATS_BH(sock_net(sk), > > >> >> LINUX_MIB_TCPLOSSPROBERECOVERY); > > >> >> } > > >> >> > > >> > > > >> > Should I apply this and see if the warning stops? > Hi Dormando, > > Could you try this patch to make sure it fixes the warning (with > sysctl net.ipv4.early_retrans=3)? It's now running on one machine, with early_retrans=3. Will have to give it 24 hours to confirm. > > >> I'd like to hear what the authors of TLP think. In the mean time could > > >> you help us collect more evidence by disabling TLP with > > >> sysctl net.ipv4.tcp_early_retrans=2 > > >> and see if the problem still occurs? (it should not). > > >> > > >> thanks > > > > > > Box hasn't had a warning in the last 24ish hours. A neighboring machine > > > with the default tcp_early_retrans setting has had 5-6 in the same > > > timeframe. > > > > > > Is this a harmful situation to the socket in any way, or is it just > > > informational weirdness? > > It should be fairly harmless. The ack that triggers the warning should > > set the TCP back to the good (non-Open) state, but it's still good to > > get rid of. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: IPv6 kernel warning 2013-10-09 18:48 ` dormando @ 2013-10-11 18:15 ` dormando 0 siblings, 0 replies; 15+ messages in thread From: dormando @ 2013-10-11 18:15 UTC (permalink / raw) To: Yuchung Cheng Cc: Michele Baldessari, Russell King - ARM Linux, netdev, Neal Cardwell, Nandita Dukkipati On Wed, 9 Oct 2013, dormando wrote: > > > >> >> > > > >> > > > > >> > Should I apply this and see if the warning stops? > > Hi Dormando, > > > > Could you try this patch to make sure it fixes the warning (with > > sysctl net.ipv4.early_retrans=3)? > > It's now running on one machine, with early_retrans=3. Will have to give > it 24 hours to confirm. > Almost 48 hours, early_retrans=3, no warnings! (or crashes...) Good catch :) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: IPv6 kernel warning 2013-10-07 19:51 ` Yuchung Cheng 2013-10-07 19:56 ` dormando @ 2013-10-08 14:05 ` Neal Cardwell 2013-10-08 17:56 ` Yuchung Cheng 1 sibling, 1 reply; 15+ messages in thread From: Neal Cardwell @ 2013-10-08 14:05 UTC (permalink / raw) To: Yuchung Cheng Cc: dormando, Michele Baldessari, Russell King - ARM Linux, netdev, Nandita Dukkipati On Mon, Oct 7, 2013 at 3:51 PM, Yuchung Cheng <ycheng@google.com> wrote: > I suspect tcp_process_tlp_ack() should not revert state to Open > directly, but calling tcp_try_keep_open() instead, similar to all the > undo processing in the tcp_fastretrans_alert(): after > tcp_end_cwnd_reduction(), the process (E) falls back to check other > stats before moving to CA_Open. > > > index 9c62257..9012b42 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, > tcp_init_cwnd_reduction(sk, true); > tcp_set_ca_state(sk, TCP_CA_CWR); > tcp_end_cwnd_reduction(sk); > - tcp_set_ca_state(sk, TCP_CA_Open); > + tcp_try_keep_open(sk); > NET_INC_STATS_BH(sock_net(sk), > LINUX_MIB_TCPLOSSPROBERECOVERY); > } Yes, nice catch! This looks good to me. My testing confirms that this definitely fixes a bug when this code fires and there are segments SACKed out. Since it will stay in CA_Disorder if there are outstanding retransmissions, I bet it will also fix the WARN_ON(tp->retrans_out != 0) in state TCP_CA_Open that people are seeing. neal ^ permalink raw reply [flat|nested] 15+ messages in thread
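The condition Neal refers to can be written down as a tiny predicate. The sketch below is an assumption-laden illustration, not kernel code: `open_state_warning_fires()` is a hypothetical helper that only mirrors the check described above (state Open while `retrans_out` is non-zero), and the enum is a simplified stand-in for the kernel's state constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's congestion-avoidance states. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/*
 * Hypothetical helper: true when the situation behind the reported
 * WARN_ON(tp->retrans_out != 0) would arise, i.e. the socket is in
 * Open while retransmissions are still outstanding.
 */
static bool open_state_warning_fires(enum ca_state state,
				     unsigned int retrans_out)
{
	return state == CA_OPEN && retrans_out != 0;
}

/* Small self-check covering the three cases discussed in the thread. */
static void warning_model_demo(void)
{
	assert(open_state_warning_fires(CA_OPEN, 1));      /* buggy case */
	assert(!open_state_warning_fires(CA_OPEN, 0));     /* clean Open */
	assert(!open_state_warning_fires(CA_DISORDER, 1)); /* after the fix */
}
```

Staying in Disorder while retransmissions are outstanding means the first case can no longer be reached from the TLP path in this model.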
* Re: IPv6 kernel warning 2013-10-08 14:05 ` Neal Cardwell @ 2013-10-08 17:56 ` Yuchung Cheng 0 siblings, 0 replies; 15+ messages in thread From: Yuchung Cheng @ 2013-10-08 17:56 UTC (permalink / raw) To: Neal Cardwell Cc: dormando, Michele Baldessari, Russell King - ARM Linux, netdev, Nandita Dukkipati On Tue, Oct 8, 2013 at 7:05 AM, Neal Cardwell <ncardwell@google.com> wrote: > On Mon, Oct 7, 2013 at 3:51 PM, Yuchung Cheng <ycheng@google.com> wrote: >> I suspect tcp_process_tlp_ack() should not revert state to Open >> directly, but calling tcp_try_keep_open() instead, similar to all the >> undo processing in the tcp_fastretrans_alert(): after >> tcp_end_cwnd_reduction(), the process (E) falls back to check other >> stats before moving to CA_Open. >> >> >> index 9c62257..9012b42 100644 >> --- a/net/ipv4/tcp_input.c >> +++ b/net/ipv4/tcp_input.c >> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, >> tcp_init_cwnd_reduction(sk, true); >> tcp_set_ca_state(sk, TCP_CA_CWR); >> tcp_end_cwnd_reduction(sk); >> - tcp_set_ca_state(sk, TCP_CA_Open); >> + tcp_try_keep_open(sk); >> NET_INC_STATS_BH(sock_net(sk), >> LINUX_MIB_TCPLOSSPROBERECOVERY); >> } > > Yes, nice catch! This looks good to me. My testing confirms that this > definitely fixes a bug when this code fires and there are segments > SACKed out. Since it will stay in CA_Disorder if there are outstanding > retransmissions, I bet it will also fix the WARN_ON(tp->retrans_out != > 0) in state TCP_CA_Open that people are seeing. Sounds good. Let me do more tests then I will submit a bug fix. > > neal ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-10-11 18:15 UTC | newest]

Thread overview: 15+ messages
2013-09-20 13:11 IPv6 kernel warning Russell King - ARM Linux
2013-09-20 16:08 ` Michele Baldessari
2013-09-20 16:40 ` Yuchung Cheng
2013-10-07 18:13 ` dormando
2013-10-07 19:51 ` Yuchung Cheng
2013-10-07 19:56 ` dormando
2013-10-07 20:00 ` Yuchung Cheng
2013-10-07 20:15 ` dormando
2013-10-08 18:24 ` dormando
2013-10-08 20:53 ` Yuchung Cheng
2013-10-09 17:33 ` Yuchung Cheng
2013-10-09 18:48 ` dormando
2013-10-11 18:15 ` dormando
2013-10-08 14:05 ` Neal Cardwell
2013-10-08 17:56 ` Yuchung Cheng