linux-kernel.vger.kernel.org archive mirror
* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
@ 2017-09-21  1:46 Roman Gushchin
  2017-09-21 17:07 ` Yuchung Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2017-09-21  1:46 UTC
  To: Oleksandr Natalenko
  Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
> warning shown below. Most of the time it is harmless, but rarely it just 
> causes either freeze or (I believe, this is related too) panic in 
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
> Unfortunately, I still do not have proper stacktrace from panic, but will try 
> to capture it if possible.
> 
> Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
> used to shape traffic with tc.
> 
> Please note this regression was already reported as BZ [1] and as a letter to 
> ML [2], but got neither attention nor resolution. It is reproducible for (not 
> only) me on my home router since v4.11 till v4.13.1 incl.
> 
> Please advise on how to deal with it. I'll provide any additional info if 
> necessary, also ready to test patches if any.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> [2] https://www.spinics.net/lists/netdev/msg436158.html

We're experiencing the same problems on some machines in our fleet.
Exactly the same symptoms: tcp_fastretrans_alert() warnings and
sometimes panics in tcp_sacktag_walk().

Here is an example of a backtrace with the panic log:

[973978.210080]  fuse
[973978.214099]  sg
[973978.217789]  loop
[973978.221829]  efivarfs
[973978.226544]  autofs4
[973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973978.266558] Call Trace:
[973978.271615]  <IRQ>
[973978.275817]  dump_stack+0x4d/0x70
[973978.282626]  __warn+0xd3/0xf0
[973978.288727]  warn_slowpath_null+0x1e/0x20
[973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
[973978.305974]  tcp_ack+0xbce/0x1390
[973978.312770]  tcp_rcv_established+0x1ce/0x740
[973978.321488]  tcp_v6_do_rcv+0x195/0x440
[973978.329166]  tcp_v6_rcv+0x94c/0x9f0
[973978.336323]  ip6_input_finish+0xea/0x430
[973978.344330]  ip6_input+0x32/0xa0
[973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
[973978.358799]  ip6_rcv_finish+0x4b/0xa0
[973978.366289]  ipv6_rcv+0x2ec/0x4f0
[973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
[973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
[973978.390505]  __netif_receive_skb+0x16/0x70
[973978.398875]  netif_receive_skb_internal+0x23/0x80
[973978.408462]  napi_gro_receive+0x113/0x1d0
[973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973978.434405]  mlx5e_napi_poll+0x8c/0x380
[973978.442238]  ? mlx5_cq_completion+0x54/0xb0
[973978.450770]  net_rx_action+0x22e/0x380
[973978.458447]  __do_softirq+0x106/0x2e8
[973978.465950]  irq_exit+0xb0/0xc0
[973978.472396]  do_IRQ+0x4f/0xd0
[973978.478495]  common_interrupt+0x86/0x86
[973978.486329] RIP: 0033:0x7f8dee58d8ae
[973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
[973978.504251]  ORIG_RAX: ffffffffffffff5f
[973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
[973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
[973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
[973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
[973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
[973978.584189]  </IRQ>
[973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
[973995.178183] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
[973995.209032] PGD 102d856067 PUD fded0d067 PMD 0
[973995.227732] ------------[ cut here ]------------
[973995.237128] Oops: 0000 [#1] SMP
[973995.243575] Modules linked in: mptctl mptbase xt_DSCP xt_set ip_set_hash_ip cls_u32 sch_sfq cls_fw sch_htb mpt3sas raid_class ip6table_mangle iptable_mangle cls_bpf tcp_diag udp_diag inet_diag ip6table_filter xt_NFLOG nfnetlink_log xt_comment xt_statistic iptable_filter xt_mark sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ses iTCO_wdt iTCO_vendor_support enclosure lpc_ich scsi_transport_sas mfd_core efivars ipmi_si ipmi_devintf i2c_i801 ipmi_msghandler acpi_cpufreq button sch_fq_codel nfsd auth_rpcgss nfs_acl oid_registry lockd grace sunrpc megaraid_sas fuse sg loop efivarfs autofs4
[973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
[973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
[973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
[973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
[973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
[973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
[973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
[973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
[973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
[973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
[973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[973995.754595] Call Trace:
[973995.759652]  <IRQ>
[973995.763855]  ? kprobe_perf_func+0x28/0x210
[973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
[973995.781615]  tcp_ack+0x677/0x1390
[973995.788408]  tcp_rcv_established+0x1ce/0x740
[973995.797112]  tcp_v6_do_rcv+0x195/0x440
[973995.804767]  tcp_v6_rcv+0x94c/0x9f0
[973995.811911]  ip6_input_finish+0xea/0x430
[973995.819917]  ip6_input+0x32/0xa0
[973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
[973995.834373]  ip6_rcv_finish+0x4b/0xa0
[973995.841859]  ipv6_rcv+0x2ec/0x4f0
[973995.848653]  ? ip6_fragment+0x9f0/0x9f0
[973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
[973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
[973995.873891]  __netif_receive_skb+0x16/0x70
[973995.882244]  netif_receive_skb_internal+0x23/0x80
[973995.891812]  napi_gro_receive+0x113/0x1d0
[973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973995.917740]  mlx5e_napi_poll+0x8c/0x380
[973995.925576]  ? mlx5_cq_completion+0x54/0xb0
[973995.934101]  net_rx_action+0x22e/0x380
[973995.941764]  __do_softirq+0x106/0x2e8
[973995.949255]  irq_exit+0xb0/0xc0
[973995.955696]  do_IRQ+0x4f/0xd0
[973995.961798]  common_interrupt+0x86/0x86
[973995.969634] RIP: 0033:0x7f8dec97a557
[973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
[973995.987552]  ORIG_RAX: ffffffffffffff20
[973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
[973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
[973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
[973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
[973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
[973996.067494]  </IRQ>
[973996.071858] Code: b9 01 00 00 00 85 c0 0f 8e da fd ff ff 85 c0 75 28 0f b7 43 30 41 01 45 04 48 8b 1b 4c 39 fb 74 8c 49 3b 9c 24 50 01 00 00 74 c1 <8b> 43 28 3b 45 d4 0f 88 9f fd ff ff eb b3 48 8d 43 10 8b 4b 28
[973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
[973996.324007] ------------[ cut here ]------------
[973996.333399] CR2: 0000000000000028
[973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
[973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
[973996.362521] Kernel Offset: disabled
TBOOT: wait until all APs ready for txt shutdown
TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
TBOOT: CPU is SMX-capable
TBOOT: CPU is VMX-capable
TBOOT: SMX is enabled
TBOOT: TXT chipset and all needed capabilities present
TBOOT: TPM: Pcr 17 extend, return value = 0000003D
TBOOT: TPM: Pcr 18 extend, return value = 0000003D
TBOOT: TPM: Pcr 19 extend, return value = 0000003D
TBOOT: cap'ed dynamic PCRs
TBOOT: waiting for APs (0) to exit guests...
TBOOT: .
TBOOT: 
TBOOT: all APs exited guests
TBOOT: calling txt_shutdown on AP
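
One way to cross-check the NULL dereference from the oops itself: the byte
marked <8b> in the Code: dump starts the faulting instruction, and the bytes
8b 43 28 decode as a 32-bit load from RBX + 0x28. With RBX = 0 in the register
dump, that gives address 0x28, which matches CR2. The Python sketch below
handles only this one instruction form; the helper names are made up for
illustration, and which structure field sits at offset 0x28 is an assumption
the log does not confirm.

```python
# Register selected by the ModRM r/m field when no REX prefix is present.
# Index 4 means "a SIB byte follows", which this sketch does not handle.
RM_REGS = ["RAX", "RCX", "RDX", "RBX", None, "RBP", "RSI", "RDI"]


def faulting_bytes(code_line):
    """Return the bytes of a 'Code:' dump starting at the <xx> marker."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok[1:-1], 16)
            return [first] + [int(t, 16) for t in tokens[i + 1:]]
    raise ValueError("no <xx> marker in Code: line")


def effective_address(insn, regs):
    """Decode an '8b /r' load with an 8-bit displacement.

    Returns (base register name, displacement, effective address).
    """
    if len(insn) < 3 or insn[0] != 0x8B:
        raise ValueError("only opcode 8b is handled")
    mod, rm = insn[1] >> 6, insn[1] & 0x7
    base = RM_REGS[rm]
    if mod != 0b01 or base is None:
        raise ValueError("only the [reg + disp8] form is handled")
    disp = insn[2] - 0x100 if insn[2] >= 0x80 else insn[2]
    return base, disp, (regs[base] + disp) & 0xFFFFFFFFFFFFFFFF


# Tail of the Code: dump and the RBX value from the register dump in the log.
code = "Code: 74 c1 <8b> 43 28 3b 45 d4"
base, disp, addr = effective_address(faulting_bytes(code), {"RBX": 0x0})
print(f"load from {base}{disp:+#x} = {addr:#x}")
```

The computed address can be compared by eye with the CR2 line; a mismatch
would suggest the register dump and the Code: bytes come from different oopses.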


Thanks!


* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-21  1:46 [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Roman Gushchin
@ 2017-09-21 17:07 ` Yuchung Cheng
       [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Yuchung Cheng @ 2017-09-21 17:07 UTC
  To: 10035198.1vE6NFrMDO
  Cc: Oleksandr Natalenko, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>
> > Hello.
> >
> > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
> > warning shown below. Most of the time it is harmless, but rarely it just
> > causes either freeze or (I believe, this is related too) panic in
> > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
> > Unfortunately, I still do not have proper stacktrace from panic, but will try
> > to capture it if possible.
> >
> > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
> > used to shape traffic with tc.
> >
> > Please note this regression was already reported as BZ [1] and as a letter to
> > ML [2], but got neither attention nor resolution. It is reproducible for (not
> > only) me on my home router since v4.11 till v4.13.1 incl.
> >
> > Please advise on how to deal with it. I'll provide any additional info if
> > necessary, also ready to test patches if any.
> >
> > Thanks.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> > [2] https://www.spinics.net/lists/netdev/msg436158.html
>
> We're experiencing the same problems on some machines in our fleet.
> Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> sometimes panics in tcp_sacktag_walk().
>
> Here is an example of a backtrace with the panic log:
Do you still see the panics if you disable RACK
(sysctl net.ipv4.tcp_recovery=0)?

Also, have you experienced any SACK reneging? Could you post the output
of 'nstat | grep -i TCP'? Thanks.
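
For checking the current setting before and after the change, a minimal
Python sketch follows. It assumes the sysctl is exposed at
/proc/sys/net/ipv4/tcp_recovery and that bit 0x1 is the RACK loss detection
flag; the function names are hypothetical.

```python
from pathlib import Path

RACK_LOSS_DETECTION = 0x1  # assumed meaning of bit 0 of net.ipv4.tcp_recovery
TCP_RECOVERY_PATH = Path("/proc/sys/net/ipv4/tcp_recovery")


def rack_bit_set(value_text):
    """Return True if the sysctl value (as text) has the RACK bit set."""
    return bool(int(value_text.strip()) & RACK_LOSS_DETECTION)


def rack_enabled(path=TCP_RECOVERY_PATH):
    """Read the sysctl file and report whether RACK loss detection is on."""
    return rack_bit_set(Path(path).read_text())


if __name__ == "__main__":
    if TCP_RECOVERY_PATH.exists():
        print("RACK enabled:", rack_enabled())
    else:
        print("tcp_recovery sysctl not found on this system")
```

Reading the file needs no privileges; changing the value does.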




* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
       [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
@ 2017-09-26 13:10     ` Roman Gushchin
  2017-09-27  0:12       ` Yuchung Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2017-09-26 13:10 UTC
  To: Yuchung Cheng
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
> >
> > > Hello.
> > >
> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
> > > warning shown below. Most of the time it is harmless, but rarely it just
> > > causes either freeze or (I believe, this is related too) panic in
> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
> > > to capture it if possible.
> > >
> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
> > > used to shape traffic with tc.
> > >
> > > Please note this regression was already reported as BZ [1] and as a letter to
> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
> > > only) me on my home router since v4.11 till v4.13.1 incl.
> > >
> > > Please advise on how to deal with it. I'll provide any additional info if
> > > necessary, also ready to test patches if any.
> > >
> > > Thanks.
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> > > > [2] https://www.spinics.net/lists/netdev/msg436158.html
> >
> > We're experiencing the same problems on some machines in our fleet.
> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> > sometimes panics in tcp_sacktag_walk().
> >
> > Here is an example of a backtrace with the panic log:

Hi Yuchung!

> Do you still see the panics if you disable RACK
> (sysctl net.ipv4.tcp_recovery=0)?

No, we haven't seen any crashes since disabling it.

> 
> Also, have you experienced any SACK reneging? Could you post the output
> of 'nstat | grep -i TCP'? Thanks.

hostname	TcpActiveOpens                  2289680            0.0
hostname	TcpPassiveOpens                 3592758            0.0
hostname	TcpAttemptFails                 746910             0.0
hostname	TcpEstabResets                  154988             0.0
hostname	TcpInSegs                       16258678255        0.0
hostname	TcpOutSegs                      46967011611        0.0
hostname	TcpRetransSegs                  13724310           0.0
hostname	TcpInErrs                       2                  0.0
hostname	TcpOutRsts                      9418798            0.0
hostname	TcpExtEmbryonicRsts             2303               0.0
hostname	TcpExtPruneCalled               90192              0.0
hostname	TcpExtOfoPruned                 57274              0.0
hostname	TcpExtOutOfWindowIcmps          3                  0.0
hostname	TcpExtTW                        1164705            0.0
hostname	TcpExtTWRecycled                2                  0.0
hostname	TcpExtPAWSEstab                 159                0.0
hostname	TcpExtDelayedACKs               209207209          0.0
hostname	TcpExtDelayedACKLocked          508571             0.0
hostname	TcpExtDelayedACKLost            1713248            0.0
hostname	TcpExtListenOverflows           625                0.0
hostname	TcpExtListenDrops               625                0.0
hostname	TcpExtTCPHPHits                 9341188489         0.0
hostname	TcpExtTCPPureAcks               1434646465         0.0
hostname	TcpExtTCPHPAcks                 5733614672         0.0
hostname	TcpExtTCPSackRecovery           3261698            0.0
hostname	TcpExtTCPSACKReneging           12203              0.0
hostname	TcpExtTCPSACKReorder            433189             0.0
hostname	TcpExtTCPTSReorder              22694              0.0
hostname	TcpExtTCPFullUndo               45092              0.0
hostname	TcpExtTCPPartialUndo            22016              0.0
hostname	TcpExtTCPLossUndo               2150040            0.0
hostname	TcpExtTCPLostRetransmit         60119              0.0
hostname	TcpExtTCPSackFailures           2626782            0.0
hostname	TcpExtTCPLossFailures           182999             0.0
hostname	TcpExtTCPFastRetrans            4334275            0.0
hostname	TcpExtTCPSlowStartRetrans       3453348            0.0
hostname	TcpExtTCPTimeouts               1070997            0.0
hostname	TcpExtTCPLossProbes             2633545            0.0
hostname	TcpExtTCPLossProbeRecovery      941647             0.0
hostname	TcpExtTCPSackRecoveryFail       336302             0.0
hostname	TcpExtTCPRcvCollapsed           461354             0.0
hostname	TcpExtTCPAbortOnData            349196             0.0
hostname	TcpExtTCPAbortOnClose           3395               0.0
hostname	TcpExtTCPAbortOnTimeout         51201              0.0
hostname	TcpExtTCPMemoryPressures        2                  0.0
hostname	TcpExtTCPSpuriousRTOs           2120503            0.0
hostname	TcpExtTCPSackShifted            2613736            0.0
hostname	TcpExtTCPSackMerged             21358743           0.0
hostname	TcpExtTCPSackShiftFallback      8769387            0.0
hostname	TcpExtTCPBacklogDrop            5                  0.0
hostname	TcpExtTCPRetransFail            843                0.0
hostname	TcpExtTCPRcvCoalesce            949068035          0.0
hostname	TcpExtTCPOFOQueue               470118             0.0
hostname	TcpExtTCPOFODrop                9915               0.0
hostname	TcpExtTCPOFOMerge               9                  0.0
hostname	TcpExtTCPChallengeACK           90                 0.0
hostname	TcpExtTCPSYNChallenge           3                  0.0
hostname	TcpExtTCPFastOpenActive         2089               0.0
hostname	TcpExtTCPSpuriousRtxHostQueues  896596             0.0
hostname	TcpExtTCPAutoCorking            547386735          0.0
hostname	TcpExtTCPFromZeroWindowAdv      28757              0.0
hostname	TcpExtTCPToZeroWindowAdv        28761              0.0
hostname	TcpExtTCPWantZeroWindowAdv      322431             0.0
hostname	TcpExtTCPSynRetrans             3026               0.0
hostname	TcpExtTCPOrigDataSent           40976870977        0.0
hostname	TcpExtTCPHystartTrainDetect     453920             0.0
hostname	TcpExtTCPHystartTrainCwnd       11586273           0.0
hostname	TcpExtTCPHystartDelayDetect     10943              0.0
hostname	TcpExtTCPHystartDelayCwnd       763554             0.0
hostname	TcpExtTCPACKSkippedPAWS         30                 0.0
hostname	TcpExtTCPACKSkippedSeq          218                0.0
hostname	TcpExtTCPWinProbe               2408               0.0
hostname	TcpExtTCPKeepAlive              213768             0.0
hostname	TcpExtTCPMTUPFail               69                 0.0
hostname	TcpExtTCPMTUPSuccess            8811               0.0
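
For anyone comparing these counters across hosts, here is a minimal Python
sketch that parses rows in this layout and pulls out the SACK reneging count
and the retransmit ratio. It assumes each row ends with name, value, rate and
tolerates an extra leading column such as the hostname prefix; the function
names are hypothetical.

```python
def parse_nstat(text):
    """Map counter name -> integer value.

    Accepts plain nstat rows (name, value, rate) as well as rows with an
    extra leading column, such as a hostname prefix.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines and the '#kernel' header
        name, value = fields[-3], fields[-2]
        if value.isdigit():
            counters[name] = int(value)
    return counters


def retrans_ratio(counters):
    """Fraction of sent segments that were retransmissions."""
    return counters["TcpRetransSegs"] / counters["TcpOutSegs"]


# Three rows copied from the output in this message.
sample = """\
hostname\tTcpOutSegs                      46967011611        0.0
hostname\tTcpRetransSegs                  13724310           0.0
hostname\tTcpExtTCPSACKReneging           12203              0.0
"""
counters = parse_nstat(sample)
print("SACK reneging events:", counters["TcpExtTCPSACKReneging"])
print(f"retransmit ratio: {retrans_ratio(counters):.4%}")
```

The ratio only puts the reneging count in context; it does not say anything
about which connections hit the warning.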

Thanks!


* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-26 13:10     ` Roman Gushchin
@ 2017-09-27  0:12       ` Yuchung Cheng
  2017-09-27  0:18         ` Yuchung Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Yuchung Cheng @ 2017-09-27  0:12 UTC
  To: Roman Gushchin
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>> >
>> > > Hello.
>> > >
>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
>> > > warning shown below. Most of the time it is harmless, but rarely it just
>> > > causes either freeze or (I believe, this is related too) panic in
>> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
>> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
>> > > to capture it if possible.
>> > >
>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>> > > used to shape traffic with tc.
>> > >
>> > > Please note this regression was already reported as BZ [1] and as a letter to
>> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
>> > > only) me on my home router since v4.11 till v4.13.1 incl.
>> > >
>> > > Please advise on how to deal with it. I'll provide any additional info if
>> > > necessary, also ready to test patches if any.
>> > >
>> > > Thanks.
>> > >
>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>> >
>> > We're experiencing the same problems on some machines in our fleet.
>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> > sometimes panics in tcp_sacktag_walk().
>> >
>> > Here is an example of a backtrace with the panic log:
>
> Hi Yuchung!
>
>> do you still see the panics if you disable RACK?
>> sysctl net.ipv4.tcp_recovery=0?
>
> No, we haven't seen any crash since that.
I am out of ideas as to how RACK could cause tcp_sacktag_walk() to
take an empty skb :-( Do you have a stack trace or any hint on which call
to tcp_sacktag_walk() triggered the panic? Internally at Google we have
never seen that.


>
>>
>> also have you experience any sack reneg? could you post the output of
>> ' nstat |grep -i TCP' thanks
>
> hostname        TcpActiveOpens                  2289680            0.0
> hostname        TcpPassiveOpens                 3592758            0.0
> hostname        TcpAttemptFails                 746910             0.0
> hostname        TcpEstabResets                  154988             0.0
> hostname        TcpInSegs                       16258678255        0.0
> hostname        TcpOutSegs                      46967011611        0.0
> hostname        TcpRetransSegs                  13724310           0.0
> hostname        TcpInErrs                       2                  0.0
> hostname        TcpOutRsts                      9418798            0.0
> hostname        TcpExtEmbryonicRsts             2303               0.0
> hostname        TcpExtPruneCalled               90192              0.0
> hostname        TcpExtOfoPruned                 57274              0.0
> hostname        TcpExtOutOfWindowIcmps          3                  0.0
> hostname        TcpExtTW                        1164705            0.0
> hostname        TcpExtTWRecycled                2                  0.0
> hostname        TcpExtPAWSEstab                 159                0.0
> hostname        TcpExtDelayedACKs               209207209          0.0
> hostname        TcpExtDelayedACKLocked          508571             0.0
> hostname        TcpExtDelayedACKLost            1713248            0.0
> hostname        TcpExtListenOverflows           625                0.0
> hostname        TcpExtListenDrops               625                0.0
> hostname        TcpExtTCPHPHits                 9341188489         0.0
> hostname        TcpExtTCPPureAcks               1434646465         0.0
> hostname        TcpExtTCPHPAcks                 5733614672         0.0
> hostname        TcpExtTCPSackRecovery           3261698            0.0
> hostname        TcpExtTCPSACKReneging           12203              0.0
> hostname        TcpExtTCPSACKReorder            433189             0.0
> hostname        TcpExtTCPTSReorder              22694              0.0
> hostname        TcpExtTCPFullUndo               45092              0.0
> hostname        TcpExtTCPPartialUndo            22016              0.0
> hostname        TcpExtTCPLossUndo               2150040            0.0
> hostname        TcpExtTCPLostRetransmit         60119              0.0
> hostname        TcpExtTCPSackFailures           2626782            0.0
> hostname        TcpExtTCPLossFailures           182999             0.0
> hostname        TcpExtTCPFastRetrans            4334275            0.0
> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
> hostname        TcpExtTCPTimeouts               1070997            0.0
> hostname        TcpExtTCPLossProbes             2633545            0.0
> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
> hostname        TcpExtTCPRcvCollapsed           461354             0.0
> hostname        TcpExtTCPAbortOnData            349196             0.0
> hostname        TcpExtTCPAbortOnClose           3395               0.0
> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
> hostname        TcpExtTCPMemoryPressures        2                  0.0
> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
> hostname        TcpExtTCPSackShifted            2613736            0.0
> hostname        TcpExtTCPSackMerged             21358743           0.0
> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
> hostname        TcpExtTCPBacklogDrop            5                  0.0
> hostname        TcpExtTCPRetransFail            843                0.0
> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
> hostname        TcpExtTCPOFOQueue               470118             0.0
> hostname        TcpExtTCPOFODrop                9915               0.0
> hostname        TcpExtTCPOFOMerge               9                  0.0
> hostname        TcpExtTCPChallengeACK           90                 0.0
> hostname        TcpExtTCPSYNChallenge           3                  0.0
> hostname        TcpExtTCPFastOpenActive         2089               0.0
> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
> hostname        TcpExtTCPAutoCorking            547386735          0.0
> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
> hostname        TcpExtTCPSynRetrans             3026               0.0
> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
> hostname        TcpExtTCPACKSkippedSeq          218                0.0
> hostname        TcpExtTCPWinProbe               2408               0.0
> hostname        TcpExtTCPKeepAlive              213768             0.0
> hostname        TcpExtTCPMTUPFail               69                 0.0
> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>
> Thanks!
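
For anyone comparing counters across hosts, the SACK-related rows can be
pulled out of a dump like the one quoted above with a few lines of Python.
This is only a rough sketch: it assumes each row ends in "<counter> <value>
<rate>" (with an optional leading hostname column and optional ">" quote
markers), and the names parse_nstat and sack_counters are made up for
illustration.

```python
def parse_nstat(text):
    """Return {counter_name: value} for every row that looks like nstat output."""
    counters = {}
    for line in text.splitlines():
        # Drop e-mail quote markers and leading whitespace.
        fields = line.lstrip("> \t").split()
        if len(fields) < 3:
            continue
        name, value = fields[-3], fields[-2]
        try:
            counters[name] = int(value)
        except ValueError:
            continue  # not a counter row
    return counters


def sack_counters(counters):
    """Keep only the counters whose name mentions SACK (case-insensitive)."""
    return {k: v for k, v in counters.items() if "sack" in k.lower()}


# A few rows copied from the dump quoted above.
SAMPLE = """\
> hostname        TcpRetransSegs                  13724310           0.0
> hostname        TcpExtTCPSackRecovery           3261698            0.0
> hostname        TcpExtTCPSACKReneging           12203              0.0
> hostname        TcpExtTCPSACKReorder            433189             0.0
>
> Thanks!
"""

if __name__ == "__main__":
    for name, value in sorted(sack_counters(parse_nstat(SAMPLE)).items()):
        print(f"{name:32s} {value}")
```

Matching on the last three whitespace-separated fields keeps the parser
indifferent to whether a hostname column is present.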

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-27  0:12       ` Yuchung Cheng
@ 2017-09-27  0:18         ` Yuchung Cheng
  2017-09-28  8:14           ` Oleksandr Natalenko
  0 siblings, 1 reply; 8+ messages in thread
From: Yuchung Cheng @ 2017-09-27  0:18 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>>> >
>>> > > Hello.
>>> > >
>>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
>>> > > warning shown below. Most of the time it is harmless, but rarely it just
>>> > > causes either freeze or (I believe, this is related too) panic in
>>> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
>>> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
>>> > > to capture it if possible.
>>> > >
>>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>>> > > used to shape traffic with tc.
>>> > >
>>> > > Please note this regression was already reported as BZ [1] and as a letter to
>>> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
>>> > > only) me on my home router since v4.11 till v4.13.1 incl.
>>> > >
>>> > > Please advise on how to deal with it. I'll provide any additional info if
>>> > > necessary, also ready to test patches if any.
>>> > >
>>> > > Thanks.
>>> > >
>>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>>> >
>>> > We're experiencing the same problems on some machines in our fleet.
>>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>>> > sometimes panics in tcp_sacktag_walk().
>>> >
>>> > Here is an example of a backtrace with the panic log:
>>
>> Hi Yuchung!
>>
>>> do you still see the panics if you disable RACK?
>>> sysctl net.ipv4.tcp_recovery=0?
>>
>> No, we haven't seen any crash since that.
> I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
> take an empty skb :-( Do you have stack trace or any hint on which call
> to tcp_sacktag_walk triggered the panic? internally at Google we never
> see that.
Hmm, something just struck me: could you try
sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
and see if the kernel still panics on SACK processing?
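
If it is easier to script this across a fleet than to run sysctl by hand,
the same experiment could be sketched in Python roughly as below. This
assumes the usual mapping of a dotted sysctl name to a file under /proc/sys;
the helper names and the EXPERIMENT dict are hypothetical, and writing to the
real /proc/sys needs root.

```python
import os

# The two settings from the experiment above: RACK on, retrans collapse off.
EXPERIMENT = {
    "net.ipv4.tcp_recovery": 1,
    "net.ipv4.tcp_retrans_collapse": 0,
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file, e.g. net.ipv4.x -> root/net/ipv4/x."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_sysctl(name, value, root="/proc/sys"):
    """Set a sysctl (requires root when root is the real /proc/sys)."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(f"{value}\n")


def apply_experiment(settings=EXPERIMENT, root="/proc/sys"):
    """Apply the settings and return the previous values so they can be restored."""
    previous = {}
    for name, value in settings.items():
        previous[name] = read_sysctl(name, root)
        write_sysctl(name, value, root)
    return previous
```

Keeping the dict returned by apply_experiment() makes it simple to put the
old values back once the test run is over.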

>
>
>>
>>>
>>> also have you experience any sack reneg? could you post the output of
>>> ' nstat |grep -i TCP' thanks
>>
>> hostname        TcpActiveOpens                  2289680            0.0
>> hostname        TcpPassiveOpens                 3592758            0.0
>> hostname        TcpAttemptFails                 746910             0.0
>> hostname        TcpEstabResets                  154988             0.0
>> hostname        TcpInSegs                       16258678255        0.0
>> hostname        TcpOutSegs                      46967011611        0.0
>> hostname        TcpRetransSegs                  13724310           0.0
>> hostname        TcpInErrs                       2                  0.0
>> hostname        TcpOutRsts                      9418798            0.0
>> hostname        TcpExtEmbryonicRsts             2303               0.0
>> hostname        TcpExtPruneCalled               90192              0.0
>> hostname        TcpExtOfoPruned                 57274              0.0
>> hostname        TcpExtOutOfWindowIcmps          3                  0.0
>> hostname        TcpExtTW                        1164705            0.0
>> hostname        TcpExtTWRecycled                2                  0.0
>> hostname        TcpExtPAWSEstab                 159                0.0
>> hostname        TcpExtDelayedACKs               209207209          0.0
>> hostname        TcpExtDelayedACKLocked          508571             0.0
>> hostname        TcpExtDelayedACKLost            1713248            0.0
>> hostname        TcpExtListenOverflows           625                0.0
>> hostname        TcpExtListenDrops               625                0.0
>> hostname        TcpExtTCPHPHits                 9341188489         0.0
>> hostname        TcpExtTCPPureAcks               1434646465         0.0
>> hostname        TcpExtTCPHPAcks                 5733614672         0.0
>> hostname        TcpExtTCPSackRecovery           3261698            0.0
>> hostname        TcpExtTCPSACKReneging           12203              0.0
>> hostname        TcpExtTCPSACKReorder            433189             0.0
>> hostname        TcpExtTCPTSReorder              22694              0.0
>> hostname        TcpExtTCPFullUndo               45092              0.0
>> hostname        TcpExtTCPPartialUndo            22016              0.0
>> hostname        TcpExtTCPLossUndo               2150040            0.0
>> hostname        TcpExtTCPLostRetransmit         60119              0.0
>> hostname        TcpExtTCPSackFailures           2626782            0.0
>> hostname        TcpExtTCPLossFailures           182999             0.0
>> hostname        TcpExtTCPFastRetrans            4334275            0.0
>> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
>> hostname        TcpExtTCPTimeouts               1070997            0.0
>> hostname        TcpExtTCPLossProbes             2633545            0.0
>> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
>> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
>> hostname        TcpExtTCPRcvCollapsed           461354             0.0
>> hostname        TcpExtTCPAbortOnData            349196             0.0
>> hostname        TcpExtTCPAbortOnClose           3395               0.0
>> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
>> hostname        TcpExtTCPMemoryPressures        2                  0.0
>> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
>> hostname        TcpExtTCPSackShifted            2613736            0.0
>> hostname        TcpExtTCPSackMerged             21358743           0.0
>> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
>> hostname        TcpExtTCPBacklogDrop            5                  0.0
>> hostname        TcpExtTCPRetransFail            843                0.0
>> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
>> hostname        TcpExtTCPOFOQueue               470118             0.0
>> hostname        TcpExtTCPOFODrop                9915               0.0
>> hostname        TcpExtTCPOFOMerge               9                  0.0
>> hostname        TcpExtTCPChallengeACK           90                 0.0
>> hostname        TcpExtTCPSYNChallenge           3                  0.0
>> hostname        TcpExtTCPFastOpenActive         2089               0.0
>> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
>> hostname        TcpExtTCPAutoCorking            547386735          0.0
>> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
>> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
>> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
>> hostname        TcpExtTCPSynRetrans             3026               0.0
>> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
>> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
>> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
>> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
>> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
>> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
>> hostname        TcpExtTCPACKSkippedSeq          218                0.0
>> hostname        TcpExtTCPWinProbe               2408               0.0
>> hostname        TcpExtTCPKeepAlive              213768             0.0
>> hostname        TcpExtTCPMTUPFail               69                 0.0
>> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>>
>> Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-27  0:18         ` Yuchung Cheng
@ 2017-09-28  8:14           ` Oleksandr Natalenko
  2017-09-28 23:36             ` Yuchung Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Oleksandr Natalenko @ 2017-09-28  8:14 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Roman Gushchin, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

Hi.

I can't say anything about the panic in tcp_sacktag_walk() since I cannot
trigger it intentionally, but setting net.ipv4.tcp_retrans_collapse to 0
*does not* fix the warning in tcp_fastretrans_alert() for me.

On Wednesday, 27 September 2017 2:18:32 CEST Yuchung Cheng wrote:
> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
> >>> > > Hello.
> >>> > > 
> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting
> >>> > > in the
> >>> > > warning shown below. Most of the time it is harmless, but rarely it
> >>> > > just
> >>> > > causes either freeze or (I believe, this is related too) panic in
> >>> > > tcp_sacktag_walk() (because sk_buff passed to this function is
> >>> > > NULL).
> >>> > > Unfortunately, I still do not have proper stacktrace from panic, but
> >>> > > will try to capture it if possible.
> >>> > > 
> >>> > > Also, I have custom settings regarding TCP stack, shown below as
> >>> > > well. ifb is used to shape traffic with tc.
> >>> > > 
> >>> > > Please note this regression was already reported as BZ [1] and as a
> >>> > > letter to ML [2], but got neither attention nor resolution. It is
> >>> > > reproducible for (not only) me on my home router since v4.11 till
> >>> > > v4.13.1 incl.
> >>> > > 
> >>> > > Please advise on how to deal with it. I'll provide any additional
> >>> > > info if
> >>> > > necessary, also ready to test patches if any.
> >>> > > 
> >>> > > Thanks.
> >>> > > 
> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> >>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
> >>> > 
> >>> > We're experiencing the same problems on some machines in our fleet.
> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> >>> > sometimes panics in tcp_sacktag_walk().
> >> 
> >>> > Here is an example of a backtrace with the panic log:
> >> Hi Yuchung!
> >> 
> >>> do you still see the panics if you disable RACK?
> >>> sysctl net.ipv4.tcp_recovery=0?
> >> 
> >> No, we haven't seen any crash since that.
> > 
> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
> > take an empty skb :-( Do you have stack trace or any hint on which call
> > to tcp_sacktag_walk triggered the panic? internally at Google we never
> > see that.
> 
> hmm something just struck me: could you try
> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
> and see if kernel still panics on sack processing?
> 
> >>> also have you experience any sack reneg? could you post the output of
> >>> ' nstat |grep -i TCP' thanks
> >> 
> >> hostname        TcpActiveOpens                  2289680            0.0
> >> hostname        TcpPassiveOpens                 3592758            0.0
> >> hostname        TcpAttemptFails                 746910             0.0
> >> hostname        TcpEstabResets                  154988             0.0
> >> hostname        TcpInSegs                       16258678255        0.0
> >> hostname        TcpOutSegs                      46967011611        0.0
> >> hostname        TcpRetransSegs                  13724310           0.0
> >> hostname        TcpInErrs                       2                  0.0
> >> hostname        TcpOutRsts                      9418798            0.0
> >> hostname        TcpExtEmbryonicRsts             2303               0.0
> >> hostname        TcpExtPruneCalled               90192              0.0
> >> hostname        TcpExtOfoPruned                 57274              0.0
> >> hostname        TcpExtOutOfWindowIcmps          3                  0.0
> >> hostname        TcpExtTW                        1164705            0.0
> >> hostname        TcpExtTWRecycled                2                  0.0
> >> hostname        TcpExtPAWSEstab                 159                0.0
> >> hostname        TcpExtDelayedACKs               209207209          0.0
> >> hostname        TcpExtDelayedACKLocked          508571             0.0
> >> hostname        TcpExtDelayedACKLost            1713248            0.0
> >> hostname        TcpExtListenOverflows           625                0.0
> >> hostname        TcpExtListenDrops               625                0.0
> >> hostname        TcpExtTCPHPHits                 9341188489         0.0
> >> hostname        TcpExtTCPPureAcks               1434646465         0.0
> >> hostname        TcpExtTCPHPAcks                 5733614672         0.0
> >> hostname        TcpExtTCPSackRecovery           3261698            0.0
> >> hostname        TcpExtTCPSACKReneging           12203              0.0
> >> hostname        TcpExtTCPSACKReorder            433189             0.0
> >> hostname        TcpExtTCPTSReorder              22694              0.0
> >> hostname        TcpExtTCPFullUndo               45092              0.0
> >> hostname        TcpExtTCPPartialUndo            22016              0.0
> >> hostname        TcpExtTCPLossUndo               2150040            0.0
> >> hostname        TcpExtTCPLostRetransmit         60119              0.0
> >> hostname        TcpExtTCPSackFailures           2626782            0.0
> >> hostname        TcpExtTCPLossFailures           182999             0.0
> >> hostname        TcpExtTCPFastRetrans            4334275            0.0
> >> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
> >> hostname        TcpExtTCPTimeouts               1070997            0.0
> >> hostname        TcpExtTCPLossProbes             2633545            0.0
> >> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
> >> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
> >> hostname        TcpExtTCPRcvCollapsed           461354             0.0
> >> hostname        TcpExtTCPAbortOnData            349196             0.0
> >> hostname        TcpExtTCPAbortOnClose           3395               0.0
> >> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
> >> hostname        TcpExtTCPMemoryPressures        2                  0.0
> >> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
> >> hostname        TcpExtTCPSackShifted            2613736            0.0
> >> hostname        TcpExtTCPSackMerged             21358743           0.0
> >> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
> >> hostname        TcpExtTCPBacklogDrop            5                  0.0
> >> hostname        TcpExtTCPRetransFail            843                0.0
> >> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
> >> hostname        TcpExtTCPOFOQueue               470118             0.0
> >> hostname        TcpExtTCPOFODrop                9915               0.0
> >> hostname        TcpExtTCPOFOMerge               9                  0.0
> >> hostname        TcpExtTCPChallengeACK           90                 0.0
> >> hostname        TcpExtTCPSYNChallenge           3                  0.0
> >> hostname        TcpExtTCPFastOpenActive         2089               0.0
> >> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
> >> hostname        TcpExtTCPAutoCorking            547386735          0.0
> >> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
> >> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
> >> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
> >> hostname        TcpExtTCPSynRetrans             3026               0.0
> >> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
> >> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
> >> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
> >> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
> >> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
> >> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
> >> hostname        TcpExtTCPACKSkippedSeq          218                0.0
> >> hostname        TcpExtTCPWinProbe               2408               0.0
> >> hostname        TcpExtTCPKeepAlive              213768             0.0
> >> hostname        TcpExtTCPMTUPFail               69                 0.0
> >> hostname        TcpExtTCPMTUPSuccess            8811               0.0
> >> 
> >> Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-28  8:14           ` Oleksandr Natalenko
@ 2017-09-28 23:36             ` Yuchung Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Yuchung Cheng @ 2017-09-28 23:36 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Roman Gushchin, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> warning in tcp_fastretrans_alert() for me.

Hi Oleksandr: no, retrans_collapse should not matter for that warning
in tcp_fastretrans_alert(). The warning, as I explained earlier, is
likely a false alarm. Neal and I are more concerned about the panic in
tcp_sacktag_walk(). This was just a blind shot, but thanks for retrying.

We can submit a one-liner to remove the fast retrans warning but want
to nail the bigger issue first.

>
> On Wednesday, 27 September 2017 2:18:32 CEST Yuchung Cheng wrote:
>> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>> >>> > > Hello.
>> >>> > >
>> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting
>> >>> > > in the
>> >>> > > warning shown below. Most of the time it is harmless, but rarely it
>> >>> > > just
>> >>> > > causes either freeze or (I believe, this is related too) panic in
>> >>> > > tcp_sacktag_walk() (because sk_buff passed to this function is
>> >>> > > NULL).
>> >>> > > Unfortunately, I still do not have proper stacktrace from panic, but
>> >>> > > will try to capture it if possible.
>> >>> > >
>> >>> > > Also, I have custom settings regarding TCP stack, shown below as
>> >>> > > well. ifb is used to shape traffic with tc.
>> >>> > >
>> >>> > > Please note this regression was already reported as BZ [1] and as a
>> >>> > > letter to ML [2], but got neither attention nor resolution. It is
>> >>> > > reproducible for (not only) me on my home router since v4.11 till
>> >>> > > v4.13.1 incl.
>> >>> > >
>> >>> > > Please advise on how to deal with it. I'll provide any additional
>> >>> > > info if
>> >>> > > necessary, also ready to test patches if any.
>> >>> > >
>> >>> > > Thanks.
>> >>> > >
>> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> >>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>> >>> >
>> >>> > We're experiencing the same problems on some machines in our fleet.
>> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> >>> > sometimes panics in tcp_sacktag_walk().
>> >>
>> >>> > Here is an example of a backtrace with the panic log:
>> >> Hi Yuchung!
>> >>
>> >>> do you still see the panics if you disable RACK?
>> >>> sysctl net.ipv4.tcp_recovery=0?
>> >>
>> >> No, we haven't seen any crash since that.
>> >
>> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
>> > take an empty skb :-( Do you have stack trace or any hint on which call
>> > to tcp_sacktag_walk triggered the panic? internally at Google we never
>> > see that.
>>
>> hmm something just struck me: could you try
>> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
>> and see if kernel still panics on sack processing?
>>
>> >>> also have you experience any sack reneg? could you post the output of
>> >>> ' nstat |grep -i TCP' thanks
>> >>
>> >> hostname        TcpActiveOpens                  2289680            0.0
>> >> hostname        TcpPassiveOpens                 3592758            0.0
>> >> hostname        TcpAttemptFails                 746910             0.0
>> >> hostname        TcpEstabResets                  154988             0.0
>> >> hostname        TcpInSegs                       16258678255        0.0
>> >> hostname        TcpOutSegs                      46967011611        0.0
>> >> hostname        TcpRetransSegs                  13724310           0.0
>> >> hostname        TcpInErrs                       2                  0.0
>> >> hostname        TcpOutRsts                      9418798            0.0
>> >> hostname        TcpExtEmbryonicRsts             2303               0.0
>> >> hostname        TcpExtPruneCalled               90192              0.0
>> >> hostname        TcpExtOfoPruned                 57274              0.0
>> >> hostname        TcpExtOutOfWindowIcmps          3                  0.0
>> >> hostname        TcpExtTW                        1164705            0.0
>> >> hostname        TcpExtTWRecycled                2                  0.0
>> >> hostname        TcpExtPAWSEstab                 159                0.0
>> >> hostname        TcpExtDelayedACKs               209207209          0.0
>> >> hostname        TcpExtDelayedACKLocked          508571             0.0
>> >> hostname        TcpExtDelayedACKLost            1713248            0.0
>> >> hostname        TcpExtListenOverflows           625                0.0
>> >> hostname        TcpExtListenDrops               625                0.0
>> >> hostname        TcpExtTCPHPHits                 9341188489         0.0
>> >> hostname        TcpExtTCPPureAcks               1434646465         0.0
>> >> hostname        TcpExtTCPHPAcks                 5733614672         0.0
>> >> hostname        TcpExtTCPSackRecovery           3261698            0.0
>> >> hostname        TcpExtTCPSACKReneging           12203              0.0
>> >> hostname        TcpExtTCPSACKReorder            433189             0.0
>> >> hostname        TcpExtTCPTSReorder              22694              0.0
>> >> hostname        TcpExtTCPFullUndo               45092              0.0
>> >> hostname        TcpExtTCPPartialUndo            22016              0.0
>> >> hostname        TcpExtTCPLossUndo               2150040            0.0
>> >> hostname        TcpExtTCPLostRetransmit         60119              0.0
>> >> hostname        TcpExtTCPSackFailures           2626782            0.0
>> >> hostname        TcpExtTCPLossFailures           182999             0.0
>> >> hostname        TcpExtTCPFastRetrans            4334275            0.0
>> >> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
>> >> hostname        TcpExtTCPTimeouts               1070997            0.0
>> >> hostname        TcpExtTCPLossProbes             2633545            0.0
>> >> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
>> >> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
>> >> hostname        TcpExtTCPRcvCollapsed           461354             0.0
>> >> hostname        TcpExtTCPAbortOnData            349196             0.0
>> >> hostname        TcpExtTCPAbortOnClose           3395               0.0
>> >> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
>> >> hostname        TcpExtTCPMemoryPressures        2                  0.0
>> >> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
>> >> hostname        TcpExtTCPSackShifted            2613736            0.0
>> >> hostname        TcpExtTCPSackMerged             21358743           0.0
>> >> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
>> >> hostname        TcpExtTCPBacklogDrop            5                  0.0
>> >> hostname        TcpExtTCPRetransFail            843                0.0
>> >> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
>> >> hostname        TcpExtTCPOFOQueue               470118             0.0
>> >> hostname        TcpExtTCPOFODrop                9915               0.0
>> >> hostname        TcpExtTCPOFOMerge               9                  0.0
>> >> hostname        TcpExtTCPChallengeACK           90                 0.0
>> >> hostname        TcpExtTCPSYNChallenge           3                  0.0
>> >> hostname        TcpExtTCPFastOpenActive         2089               0.0
>> >> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
>> >> hostname        TcpExtTCPAutoCorking            547386735          0.0
>> >> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
>> >> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
>> >> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
>> >> hostname        TcpExtTCPSynRetrans             3026               0.0
>> >> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
>> >> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
>> >> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
>> >> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
>> >> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
>> >> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
>> >> hostname        TcpExtTCPACKSkippedSeq          218                0.0
>> >> hostname        TcpExtTCPWinProbe               2408               0.0
>> >> hostname        TcpExtTCPKeepAlive              213768             0.0
>> >> hostname        TcpExtTCPMTUPFail               69                 0.0
>> >> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>> >>
>> >> Thanks!
>
>


* [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
@ 2017-09-10 20:53 Oleksandr Natalenko
  0 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2017-09-10 20:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

Hello.

Since v4.11 (IIRC), there has been a regression in the TCP stack that results
in the warning shown below. Most of the time it is harmless, but occasionally
it causes either a freeze or (I believe this is related) a panic in
tcp_sacktag_walk(), because the sk_buff passed to this function is NULL.
Unfortunately, I do not yet have a proper stack trace from the panic, but I
will try to capture one if possible.

I also use custom settings for the TCP stack, shown below. ifb is used to
shape traffic with tc.

Please note that this regression has already been reported in Bugzilla [1] and
on the mailing list [2], but it received neither attention nor a resolution.
It is reproducible on my home router (and not only for me) from v4.11 through
v4.13.1 inclusive.

Please advise on how to deal with it. I can provide any additional information
if necessary, and I am ready to test patches.

Thanks.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
[2] https://www.spinics.net/lists/netdev/msg436158.html

=== warning
[14407.060066] ------------[ cut here ]------------
[14407.060353] WARNING: CPU: 0 PID: 719 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
[14407.060747] Modules linked in: netconsole ctr ccm cls_bpf sch_htb act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel 8021q mrp nf_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4 nf_nat_masquerade_ipv4 nft_masq nft_nat nft_counter nft_meta nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic nf_tables_ipv4 tun nf_tables nfnetlink nct6775 hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4 mbcache jbd2 arc4 f2fs snd_hda_codec_hdmi fscrypto snd_hda_codec_realtek snd_hda_codec_generic intel_rapl intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support ath9k ath9k_common kvm_intel ath9k_hw kvm ath irqbypass intel_cstate mac80211 pcspkr snd_intel_sst_acpi i2c_i801 i915 snd_hda_intel
[14407.063800]  snd_intel_sst_core r8169 cfg80211 evdev mii snd_hda_codec joydev mousedev input_leds snd_soc_rt5670 mei_txe snd_soc_sst_atom_hifi2_platform snd_hda_core snd_soc_rl6231 snd_soc_sst_match mac_hid mei lpc_ich shpchp drm_kms_helper snd_hwdep snd_soc_core snd_compress battery snd_pcm_dmaengine drm hci_uart ov2722(C) snd_pcm lm3554(C) ov5693(C) snd_timer v4l2_common btbcm snd intel_gtt btqca btintel videodev syscopyarea bluetooth video soundcore sysfillrect media sysimgblt ac97_bus ecdh_generic rfkill_gpio i2c_hid rfkill tpm_tis crc16 fb_sys_fops i2c_algo_bit 8250_dw tpm_tis_core tpm soc_button_array pinctrl_cherryview intel_int0002_vgpio acpi_pad button sch_fq_codel tcp_bbr ifb ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj usbhid hid uas usb_storage
[14407.066873]  dm_crypt dm_mod dax raid10 md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci xhci_pci libahci xhci_hcd libata usbcore scsi_mod usb_common serio sdhci_acpi sdhci led_class mmc_core
[14407.068034] CPU: 0 PID: 719 Comm: irq/123-enp3s0 Tainted: G         C      4.13.0-pf2 #1
[14407.068403] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J3710-ITX, BIOS P1.30 03/30/2016
[14407.068827] task: ffff98b1c0a05400 task.stack: ffffbb59c15c0000
[14407.069111] RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
[14407.069358] RSP: 0018:ffff98b1ffc03a78 EFLAGS: 00010202
[14407.069607] RAX: 0000000000000000 RBX: ffff98b135ae0000 RCX: ffff98b1ffc03b0c
[14407.069928] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff98b135ae0000
[14407.070248] RBP: ffff98b1ffc03ab8 R08: 0000000000000000 R09: ffff98b1ffc03b60
[14407.070565] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005120
[14407.070884] R13: ffff98b1ffc03b10 R14: 0000000000000001 R15: ffff98b1ffc03b0c
[14407.071205] FS:  0000000000000000(0000) GS:ffff98b1ffc00000(0000) knlGS:0000000000000000
[14407.071564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14407.071827] CR2: 00007ffc580b2f0f CR3: 0000000010a09000 CR4: 00000000001006f0
[14407.072146] Call Trace:
[14407.072279]  <IRQ>
[14407.072412]  ? sk_reset_timer+0x18/0x30
[14407.072610]  tcp_ack+0x741/0x1110
[14407.072810]  tcp_rcv_established+0x325/0x770
[14407.073033]  ? sk_filter_trim_cap+0xd4/0x1a0
[14407.073249]  tcp_v4_do_rcv+0x90/0x1e0
[14407.073449]  tcp_v4_rcv+0x950/0xa10
[14407.073647]  ? nf_ct_deliver_cached_events+0xb8/0x110 [nf_conntrack]
[14407.073955]  ip_local_deliver_finish+0x68/0x210
[14407.074183]  ip_local_deliver+0xfa/0x110
[14407.074385]  ? ip_rcv_finish+0x410/0x410
[14407.074589]  ip_rcv_finish+0x120/0x410
[14407.074782]  ip_rcv+0x28e/0x3b0
[14407.074952]  ? inet_del_offload+0x40/0x40
[14407.075154]  __netif_receive_skb_core+0x39b/0xb00
[14407.075389]  ? netif_receive_skb_internal+0xa0/0x480
[14407.075635]  ? skb_release_all+0x24/0x30
[14407.075832]  ? consume_skb+0x38/0xa0
[14407.076025]  __netif_receive_skb+0x18/0x60
[14407.076230]  netif_receive_skb_internal+0x98/0x480
[14407.076470]  netif_receive_skb+0x1c/0x80
[14407.087463]  ifb_ri_tasklet+0x109/0x26a [ifb]
[14407.090528]  tasklet_action+0x63/0x120
[14407.093258]  __do_softirq+0xdf/0x2e5
[14407.095974]  ? irq_finalize_oneshot.part.39+0xe0/0xe0
[14407.098708]  do_softirq_own_stack+0x1c/0x30
[14407.101437]  </IRQ>
[14407.104139]  do_softirq.part.17+0x4e/0x60
[14407.106854]  __local_bh_enable_ip+0x77/0x80
[14407.109671]  irq_forced_thread_fn+0x5c/0x70
[14407.112407]  irq_thread+0x131/0x1a0
[14407.115120]  ? wake_threads_waitq+0x30/0x30
[14407.117836]  kthread+0x126/0x140
[14407.120541]  ? irq_thread_check_affinity+0x90/0x90
[14407.123244]  ? kthread_create_on_node+0x70/0x70
[14407.125913]  ret_from_fork+0x25/0x30
[14407.128548] Code: 05 00 00 3b 83 30 05 00 00 0f 88 ca 01 00 00 0f b6 83 3c 06 00 00 80 a3 cd 05 00 00 7f c0 e8 04 0f 85 3b fb ff ff e9 2c fb ff ff <0f> ff e9 46 f9 ff ff 31 d2 48 89 df e8 47 aa ff ff e9 f9 f9 ff
[14407.133867] ---[ end trace 4bb223d8deb9f077 ]---
===

=== code
2823     /* D. Check state exit conditions. State can be terminated
2824      *    when high_seq is ACKed. */
2825     if (icsk->icsk_ca_state == TCP_CA_Open) {
2826         WARN_ON(tp->retrans_out != 0); // here
2827         tp->retrans_stamp = 0;
===
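
For readers unfamiliar with this check, here is a minimal user-space sketch of
the condition it tests. It is not kernel code: struct conn_model, enum
model_ca_state and would_warn() are hypothetical names made up for
illustration, and only the two fields involved are modelled. In the kernel,
tp->retrans_out counts retransmitted segments that are still outstanding, so
the warning means the connection is in the Open state while retransmissions
are still accounted as in flight.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the congestion-avoidance states.
 * The order mirrors the kernel's TCP_CA_* states, but this enum is
 * an illustration only. */
enum model_ca_state {
    MODEL_CA_OPEN,
    MODEL_CA_DISORDER,
    MODEL_CA_CWR,
    MODEL_CA_RECOVERY,
    MODEL_CA_LOSS
};

/* Hypothetical stand-in for the two fields the check reads. */
struct conn_model {
    enum model_ca_state ca_state; /* models icsk->icsk_ca_state */
    unsigned int retrans_out;     /* models tp->retrans_out */
};

/* Returns true when the modelled state would trip the WARN_ON() at
 * line 2826: state is Open but retransmissions are still outstanding. */
static bool would_warn(const struct conn_model *c)
{
    return c->ca_state == MODEL_CA_OPEN && c->retrans_out != 0;
}
```

In this model, { MODEL_CA_OPEN, 1 } trips the check, while
{ MODEL_CA_OPEN, 0 } and { MODEL_CA_RECOVERY, 3 } do not.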

=== sysctl custom settings
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_local_port_range = 1026 59999
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.route.max_size = 16384
net.ipv4.ip_dynaddr = 1
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_rmem = 4096 262143 4194304
net.ipv4.tcp_wmem = 4096 262143 4194304
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_retries2 = 5
net.core.rmem_max = 4194304
net.core.rmem_default = 262143
net.core.wmem_max = 4194304
net.core.wmem_default = 262143
net.core.bpf_jit_enable = 1
net.ipv4.tcp_ecn = 1
===
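
To read any of these values back on a running system, the dotted key can be
mapped to a file under /proc/sys. Below is a minimal sketch of that mapping,
assuming keys with no dots inside a path component (an interface name such as
eth0.100 would break this naive version). sysctl_key_to_path() is a
hypothetical helper, not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: turn "net.ipv4.tcp_fack" into
 * "/proc/sys/net/ipv4/tcp_fack".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int sysctl_key_to_path(const char *key, char *out, size_t out_len)
{
    static const char prefix[] = "/proc/sys/";
    int n = snprintf(out, out_len, "%s%s", prefix, key);

    if (n < 0 || (size_t)n >= out_len)
        return -1; /* formatting error or truncated output */

    /* Replace every '.' in the key part with '/'. */
    for (char *p = out + sizeof(prefix) - 1; *p != '\0'; p++) {
        if (*p == '.')
            *p = '/';
    }
    return 0;
}
```

For example, the key net.ipv4.tcp_congestion_control maps to
/proc/sys/net/ipv4/tcp_congestion_control, which can then be opened and read
like a regular file.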

=== kernel cmdline
BOOT_IMAGE=/vmlinuz-linux-pf root=/dev/mapper/system-root rw cryptdevice=/dev/md0:system:allow-discards resume=/dev/mapper/system-swap quiet zswap.enabled=1 threadirqs
===


end of thread, other threads:[~2017-09-28 23:37 UTC | newest]

Thread overview: 8+ messages
2017-09-21  1:46 [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Roman Gushchin
2017-09-21 17:07 ` Yuchung Cheng
     [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
2017-09-26 13:10     ` Roman Gushchin
2017-09-27  0:12       ` Yuchung Cheng
2017-09-27  0:18         ` Yuchung Cheng
2017-09-28  8:14           ` Oleksandr Natalenko
2017-09-28 23:36             ` Yuchung Cheng
  -- strict thread matches above, loose matches on Subject: below --
2017-09-10 20:53 Oleksandr Natalenko
