All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
@ 2017-09-21  1:46 Roman Gushchin
  2017-09-21 17:07 ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Roman Gushchin @ 2017-09-21  1:46 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
> warning shown below. Most of the time it is harmless, but rarely it just 
> causes either freeze or (I believe, this is related too) panic in 
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
> Unfortunately, I still do not have proper stacktrace from panic, but will try 
> to capture it if possible.
> 
> Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
> used to shape traffic with tc.
> 
> Please note this regression was already reported as BZ [1] and as a letter to 
> ML [2], but got neither attention nor resolution. It is reproducible for (not 
> only) me on my home router since v4.11 till v4.13.1 incl.
> 
> Please advise on how to deal with it. I'll provide any additional info if 
> necessary, also ready to test patches if any.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> [2] https://www.spinics.net/lists/netdev/msg436158.html

We're experiencing the same problems on some machines in our fleet.
Exactly the same symptoms: tcp_fastretrans_alert() warnings and
sometimes panics in tcp_sacktag_walk().

Here is an example of a backtrace with the panic log:

978.210080]  fuse
[973978.214099]  sg
[973978.217789]  loop
[973978.221829]  efivarfs
[973978.226544]  autofs4
[973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973978.266558] Call Trace:
[973978.271615]  <IRQ>
[973978.275817]  dump_stack+0x4d/0x70
[973978.282626]  __warn+0xd3/0xf0
[973978.288727]  warn_slowpath_null+0x1e/0x20
[973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
[973978.305974]  tcp_ack+0xbce/0x1390
[973978.312770]  tcp_rcv_established+0x1ce/0x740
[973978.321488]  tcp_v6_do_rcv+0x195/0x440
[973978.329166]  tcp_v6_rcv+0x94c/0x9f0
[973978.336323]  ip6_input_finish+0xea/0x430
[973978.344330]  ip6_input+0x32/0xa0
[973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
[973978.358799]  ip6_rcv_finish+0x4b/0xa0
[973978.366289]  ipv6_rcv+0x2ec/0x4f0
[973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
[973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
[973978.390505]  __netif_receive_skb+0x16/0x70
[973978.398875]  netif_receive_skb_internal+0x23/0x80
[973978.408462]  napi_gro_receive+0x113/0x1d0
[973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973978.434405]  mlx5e_napi_poll+0x8c/0x380
[973978.442238]  ? mlx5_cq_completion+0x54/0xb0
[973978.450770]  net_rx_action+0x22e/0x380
[973978.458447]  __do_softirq+0x106/0x2e8
[973978.465950]  irq_exit+0xb0/0xc0
[973978.472396]  do_IRQ+0x4f/0xd0
[973978.478495]  common_interrupt+0x86/0x86
[973978.486329] RIP: 0033:0x7f8dee58d8ae
[973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
[973978.504251]  ORIG_RAX: ffffffffffffff5f
[973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
[973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
[973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
[973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
[973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
[973978.584189]  </IRQ>
[973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
[973995.178183] BUG: unable to handle kernel 
[973995.186385] NULL pointer dereference
[973995.193692]  at 0000000000000028
[973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
[973995.209032] PGD 102d856067 
[973995.214789] PUD fded0d067 
[973995.220385] PMD 0 
[973995.224577] 
[973995.227732] ------------[ cut here ]------------
[973995.237128] Oops: 0000 [#1] SMP
[973995.243575] Modules linked in:
[973995.249868]  mptctl
[973995.254251]  mptbase
[973995.258792]  xt_DSCP
[973995.263331]  xt_set
[973995.267698]  ip_set_hash_ip
[973995.273452]  cls_u32
[973995.277993]  sch_sfq
[973995.282535]  cls_fw
[973995.286904]  sch_htb
[973995.291444]  mpt3sas
[973995.295982]  raid_class
[973995.301044]  ip6table_mangle
[973995.306973]  iptable_mangle
[973995.312726]  cls_bpf
[973995.317268]  tcp_diag
[973995.321983]  udp_diag
[973995.326697]  inet_diag
[973995.331585]  ip6table_filter
[973995.337513]  xt_NFLOG
[973995.342226]  nfnetlink_log
[973995.347807]  xt_comment
[973995.352866]  xt_statistic
[973995.358276]  iptable_filter
[973995.364029]  xt_mark
[973995.368572]  sb_edac
[973995.373113]  edac_core
[973995.378001]  x86_pkg_temp_thermal
[973995.384795]  intel_powerclamp
[973995.390897]  coretemp
[973995.395608]  kvm_intel
[973995.400498]  kvm
[973995.404345]  irqbypass
[973995.409235]  ses
[973995.413078]  iTCO_wdt
[973995.417794]  iTCO_vendor_support
[973995.424415]  enclosure
[973995.429301]  lpc_ich
[973995.433843]  scsi_transport_sas
[973995.440292]  mfd_core
[973995.445007]  efivars
[973995.449548]  ipmi_si
[973995.454087]  ipmi_devintf
[973995.459496]  i2c_i801
[973995.464209]  ipmi_msghandler
[973995.470138]  acpi_cpufreq
[973995.475545]  button
[973995.479914]  sch_fq_codel
[973995.485319]  nfsd
[973995.489341]  auth_rpcgss
[973995.494573]  nfs_acl
[973995.499114]  oid_registry
[973995.504524]  lockd
[973995.508717]  grace
[973995.512912]  sunrpc
[973995.517280]  megaraid_sas
[973995.522689]  fuse
[973995.526709]  sg
[973995.530382]  loop
[973995.534405]  efivarfs
[973995.539118]  autofs4
[973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
[973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
[973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
[973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
[973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
[973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
[973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
[973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
[973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
[973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
[973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[973995.754595] Call Trace:
[973995.759652]  <IRQ>
[973995.763855]  ? kprobe_perf_func+0x28/0x210
[973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
[973995.781615]  tcp_ack+0x677/0x1390
[973995.788408]  tcp_rcv_established+0x1ce/0x740
[973995.797112]  tcp_v6_do_rcv+0x195/0x440
[973995.804767]  tcp_v6_rcv+0x94c/0x9f0
[973995.811911]  ip6_input_finish+0xea/0x430
[973995.819917]  ip6_input+0x32/0xa0
[973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
[973995.834373]  ip6_rcv_finish+0x4b/0xa0
[973995.841859]  ipv6_rcv+0x2ec/0x4f0
[973995.848653]  ? ip6_fragment+0x9f0/0x9f0
[973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
[973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
[973995.873891]  __netif_receive_skb+0x16/0x70
[973995.882244]  netif_receive_skb_internal+0x23/0x80
[973995.891812]  napi_gro_receive+0x113/0x1d0
[973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973995.917740]  mlx5e_napi_poll+0x8c/0x380
[973995.925576]  ? mlx5_cq_completion+0x54/0xb0
[973995.934101]  net_rx_action+0x22e/0x380
[973995.941764]  __do_softirq+0x106/0x2e8
[973995.949255]  irq_exit+0xb0/0xc0
[973995.955696]  do_IRQ+0x4f/0xd0
[973995.961798]  common_interrupt+0x86/0x86
[973995.969634] RIP: 0033:0x7f8dec97a557
[973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
[973995.987552]  ORIG_RAX: ffffffffffffff20
[973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
[973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
[973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
[973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
[973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
[973996.067494]  </IRQ>
[973996.071858] Code: 
[973996.076051] b9 
[973996.079723] 01 
[973996.083383] 00 
[973996.087056] 00 
[973996.090730] 00 
[973996.094388] 85 
[973996.098062] c0 
[973996.101738] 0f 
[973996.105410] 8e 
[973996.109087] da 
[973996.112759] fd 
[973996.116433] ff 
[973996.120109] ff 
[973996.123783] 85 
[973996.127457] c0 
[973996.131132] 75 
[973996.134806] 28 
[973996.138481] 0f 
[973996.142156] b7 
[973996.145831] 43 
[973996.149504] 30 
[973996.153180] 41 
[973996.156835] 01 
[973996.160511] 45 
[973996.164168] 04 
[973996.167843] 48 
[973996.171517] 8b 
[973996.175190] 1b 
[973996.178848] 4c 
[973996.182521] 39 
[973996.186198] fb 
[973996.189872] 74 
[973996.193529] 8c 
[973996.197202] 49 
[973996.200877] 3b 
[973996.204534] 9c 
[973996.208209] 24 
[973996.211883] 50 
[973996.215559] 01 
[973996.219215] 00 
[973996.222889] 00 
[973996.226562] 74 
[973996.230221] c1 
[973996.233894] <8b> 
[973996.237916] 43 
[973996.241590] 28 
[973996.245264] 3b 
[973996.248921] 45 
[973996.252596] d4 
[973996.256271] 0f 
[973996.259929] 88 
[973996.263601] 9f 
[973996.267276] fd 
[973996.270935] ff 
[973996.274592] ff 
[973996.278267] eb 
[973996.281938] b3 
[973996.285614] 48 
[973996.289289] 8d 
[973996.292964] 43 
[973996.296638] 10 
[973996.300296] 8b 
[973996.303969] 4b 
[973996.307642] 28 
[973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
[973996.324007] ------------[ cut here ]------------
[973996.333399] CR2: 0000000000000028
[973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
[973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
[973996.362521] Kernel Offset: disabled
TBOOT: wait until all APs ready for txt shutdown
TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
TBOOT: CPU is SMX-capable
TBOOT: CPU is VMX-capable
TBOOT: SMX is enabled
TBOOT: TXT chipset and all needed capabilities present
TBOOT: TPM: Pcr 17 extend, return value = 0000003D
TBOOT: TPM: Pcr 18 extend, return value = 0000003D
TBOOT: TPM: Pcr 19 extend, return value = 0000003D
TBOOT: cap'ed dynamic PCRs
TBOOT: waiting for APs (0) to exit guests...
TBOOT: .
TBOOT: 
TBOOT: all APs exited guests
TBOOT: calling txt_shutdown on AP


Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-21  1:46 [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Roman Gushchin
@ 2017-09-21 17:07 ` Yuchung Cheng
       [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-21 17:07 UTC (permalink / raw)
  To: 10035198.1vE6NFrMDO
  Cc: Oleksandr Natalenko, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>
> > Hello.
> >
> > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
> > warning shown below. Most of the time it is harmless, but rarely it just
> > causes either freeze or (I believe, this is related too) panic in
> > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
> > Unfortunately, I still do not have proper stacktrace from panic, but will try
> > to capture it if possible.
> >
> > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
> > used to shape traffic with tc.
> >
> > Please note this regression was already reported as BZ [1] and as a letter to
> > ML [2], but got neither attention nor resolution. It is reproducible for (not
> > only) me on my home router since v4.11 till v4.13.1 incl.
> >
> > Please advise on how to deal with it. I'll provide any additional info if
> > necessary, also ready to test patches if any.
> >
> > Thanks.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> > [2] https://www.spinics.net/lists/netdev/msg436158.html
>
> We're experiencing the same problems on some machines in our fleet.
> Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> sometimes panics in tcp_sacktag_walk().
>
> Here is an example of a backtrace with the panic log:
do you still see the panics if you disable RACK?
sysctl net.ipv4.tcp_recovery=0?

also have you experience any sack reneg? could you post the output of
' nstat |grep -i TCP' thanks


>
> 978.210080]  fuse
> [973978.214099]  sg
> [973978.217789]  loop
> [973978.221829]  efivarfs
> [973978.226544]  autofs4
> [973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
> [973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
> [973978.266558] Call Trace:
> [973978.271615]  <IRQ>
> [973978.275817]  dump_stack+0x4d/0x70
> [973978.282626]  __warn+0xd3/0xf0
> [973978.288727]  warn_slowpath_null+0x1e/0x20
> [973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
> [973978.305974]  tcp_ack+0xbce/0x1390
> [973978.312770]  tcp_rcv_established+0x1ce/0x740
> [973978.321488]  tcp_v6_do_rcv+0x195/0x440
> [973978.329166]  tcp_v6_rcv+0x94c/0x9f0
> [973978.336323]  ip6_input_finish+0xea/0x430
> [973978.344330]  ip6_input+0x32/0xa0
> [973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
> [973978.358799]  ip6_rcv_finish+0x4b/0xa0
> [973978.366289]  ipv6_rcv+0x2ec/0x4f0
> [973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
> [973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
> [973978.390505]  __netif_receive_skb+0x16/0x70
> [973978.398875]  netif_receive_skb_internal+0x23/0x80
> [973978.408462]  napi_gro_receive+0x113/0x1d0
> [973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
> [973978.434405]  mlx5e_napi_poll+0x8c/0x380
> [973978.442238]  ? mlx5_cq_completion+0x54/0xb0
> [973978.450770]  net_rx_action+0x22e/0x380
> [973978.458447]  __do_softirq+0x106/0x2e8
> [973978.465950]  irq_exit+0xb0/0xc0
> [973978.472396]  do_IRQ+0x4f/0xd0
> [973978.478495]  common_interrupt+0x86/0x86
> [973978.486329] RIP: 0033:0x7f8dee58d8ae
> [973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
> [973978.504251]  ORIG_RAX: ffffffffffffff5f
> [973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
> [973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
> [973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
> [973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
> [973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
> [973978.584189]  </IRQ>
> [973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
> [973995.178183] BUG: unable to handle kernel
> [973995.186385] NULL pointer dereference
> [973995.193692]  at 0000000000000028
> [973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
> [973995.209032] PGD 102d856067
> [973995.214789] PUD fded0d067
> [973995.220385] PMD 0
> [973995.224577]
> [973995.227732] ------------[ cut here ]------------
> [973995.237128] Oops: 0000 [#1] SMP
> [973995.243575] Modules linked in:
> [973995.249868]  mptctl
> [973995.254251]  mptbase
> [973995.258792]  xt_DSCP
> [973995.263331]  xt_set
> [973995.267698]  ip_set_hash_ip
> [973995.273452]  cls_u32
> [973995.277993]  sch_sfq
> [973995.282535]  cls_fw
> [973995.286904]  sch_htb
> [973995.291444]  mpt3sas
> [973995.295982]  raid_class
> [973995.301044]  ip6table_mangle
> [973995.306973]  iptable_mangle
> [973995.312726]  cls_bpf
> [973995.317268]  tcp_diag
> [973995.321983]  udp_diag
> [973995.326697]  inet_diag
> [973995.331585]  ip6table_filter
> [973995.337513]  xt_NFLOG
> [973995.342226]  nfnetlink_log
> [973995.347807]  xt_comment
> [973995.352866]  xt_statistic
> [973995.358276]  iptable_filter
> [973995.364029]  xt_mark
> [973995.368572]  sb_edac
> [973995.373113]  edac_core
> [973995.378001]  x86_pkg_temp_thermal
> [973995.384795]  intel_powerclamp
> [973995.390897]  coretemp
> [973995.395608]  kvm_intel
> [973995.400498]  kvm
> [973995.404345]  irqbypass
> [973995.409235]  ses
> [973995.413078]  iTCO_wdt
> [973995.417794]  iTCO_vendor_support
> [973995.424415]  enclosure
> [973995.429301]  lpc_ich
> [973995.433843]  scsi_transport_sas
> [973995.440292]  mfd_core
> [973995.445007]  efivars
> [973995.449548]  ipmi_si
> [973995.454087]  ipmi_devintf
> [973995.459496]  i2c_i801
> [973995.464209]  ipmi_msghandler
> [973995.470138]  acpi_cpufreq
> [973995.475545]  button
> [973995.479914]  sch_fq_codel
> [973995.485319]  nfsd
> [973995.489341]  auth_rpcgss
> [973995.494573]  nfs_acl
> [973995.499114]  oid_registry
> [973995.504524]  lockd
> [973995.508717]  grace
> [973995.512912]  sunrpc
> [973995.517280]  megaraid_sas
> [973995.522689]  fuse
> [973995.526709]  sg
> [973995.530382]  loop
> [973995.534405]  efivarfs
> [973995.539118]  autofs4
> [973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
> [973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
> [973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
> [973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
> [973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
> [973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
> [973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
> [973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
> [973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
> [973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
> [973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
> [973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
> [973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [973995.754595] Call Trace:
> [973995.759652]  <IRQ>
> [973995.763855]  ? kprobe_perf_func+0x28/0x210
> [973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
> [973995.781615]  tcp_ack+0x677/0x1390
> [973995.788408]  tcp_rcv_established+0x1ce/0x740
> [973995.797112]  tcp_v6_do_rcv+0x195/0x440
> [973995.804767]  tcp_v6_rcv+0x94c/0x9f0
> [973995.811911]  ip6_input_finish+0xea/0x430
> [973995.819917]  ip6_input+0x32/0xa0
> [973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
> [973995.834373]  ip6_rcv_finish+0x4b/0xa0
> [973995.841859]  ipv6_rcv+0x2ec/0x4f0
> [973995.848653]  ? ip6_fragment+0x9f0/0x9f0
> [973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
> [973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
> [973995.873891]  __netif_receive_skb+0x16/0x70
> [973995.882244]  netif_receive_skb_internal+0x23/0x80
> [973995.891812]  napi_gro_receive+0x113/0x1d0
> [973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
> [973995.917740]  mlx5e_napi_poll+0x8c/0x380
> [973995.925576]  ? mlx5_cq_completion+0x54/0xb0
> [973995.934101]  net_rx_action+0x22e/0x380
> [973995.941764]  __do_softirq+0x106/0x2e8
> [973995.949255]  irq_exit+0xb0/0xc0
> [973995.955696]  do_IRQ+0x4f/0xd0
> [973995.961798]  common_interrupt+0x86/0x86
> [973995.969634] RIP: 0033:0x7f8dec97a557
> [973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
> [973995.987552]  ORIG_RAX: ffffffffffffff20
> [973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
> [973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
> [973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
> [973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
> [973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
> [973996.067494]  </IRQ>
> [973996.071858] Code:
> [973996.076051] b9
> [973996.079723] 01
> [973996.083383] 00
> [973996.087056] 00
> [973996.090730] 00
> [973996.094388] 85
> [973996.098062] c0
> [973996.101738] 0f
> [973996.105410] 8e
> [973996.109087] da
> [973996.112759] fd
> [973996.116433] ff
> [973996.120109] ff
> [973996.123783] 85
> [973996.127457] c0
> [973996.131132] 75
> [973996.134806] 28
> [973996.138481] 0f
> [973996.142156] b7
> [973996.145831] 43
> [973996.149504] 30
> [973996.153180] 41
> [973996.156835] 01
> [973996.160511] 45
> [973996.164168] 04
> [973996.167843] 48
> [973996.171517] 8b
> [973996.175190] 1b
> [973996.178848] 4c
> [973996.182521] 39
> [973996.186198] fb
> [973996.189872] 74
> [973996.193529] 8c
> [973996.197202] 49
> [973996.200877] 3b
> [973996.204534] 9c
> [973996.208209] 24
> [973996.211883] 50
> [973996.215559] 01
> [973996.219215] 00
> [973996.222889] 00
> [973996.226562] 74
> [973996.230221] c1
> [973996.233894] <8b>
> [973996.237916] 43
> [973996.241590] 28
> [973996.245264] 3b
> [973996.248921] 45
> [973996.252596] d4
> [973996.256271] 0f
> [973996.259929] 88
> [973996.263601] 9f
> [973996.267276] fd
> [973996.270935] ff
> [973996.274592] ff
> [973996.278267] eb
> [973996.281938] b3
> [973996.285614] 48
> [973996.289289] 8d
> [973996.292964] 43
> [973996.296638] 10
> [973996.300296] 8b
> [973996.303969] 4b
> [973996.307642] 28
> [973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
> [973996.324007] ------------[ cut here ]------------
> [973996.333399] CR2: 0000000000000028
> [973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
> [973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
> [973996.362521] Kernel Offset: disabled
> TBOOT: wait until all APs ready for txt shutdown
> TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
> TBOOT: CPU is SMX-capable
> TBOOT: CPU is VMX-capable
> TBOOT: SMX is enabled
> TBOOT: TXT chipset and all needed capabilities present
> TBOOT: TPM: Pcr 17 extend, return value = 0000003D
> TBOOT: TPM: Pcr 18 extend, return value = 0000003D
> TBOOT: TPM: Pcr 19 extend, return value = 0000003D
> TBOOT: cap'ed dynamic PCRs
> TBOOT: waiting for APs (0) to exit guests...
> TBOOT: .
> TBOOT:
> TBOOT: all APs exited guests
> TBOOT: calling txt_shutdown on AP
>
>
> Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
       [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
@ 2017-09-26 13:10     ` Roman Gushchin
  2017-09-27  0:12       ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Roman Gushchin @ 2017-09-26 13:10 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
> >
> > > Hello.
> > >
> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
> > > warning shown below. Most of the time it is harmless, but rarely it just
> > > causes either freeze or (I believe, this is related too) panic in
> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
> > > to capture it if possible.
> > >
> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
> > > used to shape traffic with tc.
> > >
> > > Please note this regression was already reported as BZ [1] and as a letter to
> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
> > > only) me on my home router since v4.11 till v4.13.1 incl.
> > >
> > > Please advise on how to deal with it. I'll provide any additional info if
> > > necessary, also ready to test patches if any.
> > >
> > > Thanks.
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> > > [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s=-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e= 
> >
> > We're experiencing the same problems on some machines in our fleet.
> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> > sometimes panics in tcp_sacktag_walk().
> >
> > Here is an example of a backtrace with the panic log:

Hi Yuchung!

> do you still see the panics if you disable RACK?
> sysctl net.ipv4.tcp_recovery=0?

No, we haven't seen any crash since that.

> 
> also have you experience any sack reneg? could you post the output of
> ' nstat |grep -i TCP' thanks

hostname	TcpActiveOpens                  2289680            0.0
hostname	TcpPassiveOpens                 3592758            0.0
hostname	TcpAttemptFails                 746910             0.0
hostname	TcpEstabResets                  154988             0.0
hostname	TcpInSegs                       16258678255        0.0
hostname	TcpOutSegs                      46967011611        0.0
hostname	TcpRetransSegs                  13724310           0.0
hostname	TcpInErrs                       2                  0.0
hostname	TcpOutRsts                      9418798            0.0
hostname	TcpExtEmbryonicRsts             2303               0.0
hostname	TcpExtPruneCalled               90192              0.0
hostname	TcpExtOfoPruned                 57274              0.0
hostname	TcpExtOutOfWindowIcmps          3                  0.0
hostname	TcpExtTW                        1164705            0.0
hostname	TcpExtTWRecycled                2                  0.0
hostname	TcpExtPAWSEstab                 159                0.0
hostname	TcpExtDelayedACKs               209207209          0.0
hostname	TcpExtDelayedACKLocked          508571             0.0
hostname	TcpExtDelayedACKLost            1713248            0.0
hostname	TcpExtListenOverflows           625                0.0
hostname	TcpExtListenDrops               625                0.0
hostname	TcpExtTCPHPHits                 9341188489         0.0
hostname	TcpExtTCPPureAcks               1434646465         0.0
hostname	TcpExtTCPHPAcks                 5733614672         0.0
hostname	TcpExtTCPSackRecovery           3261698            0.0
hostname	TcpExtTCPSACKReneging           12203              0.0
hostname	TcpExtTCPSACKReorder            433189             0.0
hostname	TcpExtTCPTSReorder              22694              0.0
hostname	TcpExtTCPFullUndo               45092              0.0
hostname	TcpExtTCPPartialUndo            22016              0.0
hostname	TcpExtTCPLossUndo               2150040            0.0
hostname	TcpExtTCPLostRetransmit         60119              0.0
hostname	TcpExtTCPSackFailures           2626782            0.0
hostname	TcpExtTCPLossFailures           182999             0.0
hostname	TcpExtTCPFastRetrans            4334275            0.0
hostname	TcpExtTCPSlowStartRetrans       3453348            0.0
hostname	TcpExtTCPTimeouts               1070997            0.0
hostname	TcpExtTCPLossProbes             2633545            0.0
hostname	TcpExtTCPLossProbeRecovery      941647             0.0
hostname	TcpExtTCPSackRecoveryFail       336302             0.0
hostname	TcpExtTCPRcvCollapsed           461354             0.0
hostname	TcpExtTCPAbortOnData            349196             0.0
hostname	TcpExtTCPAbortOnClose           3395               0.0
hostname	TcpExtTCPAbortOnTimeout         51201              0.0
hostname	TcpExtTCPMemoryPressures        2                  0.0
hostname	TcpExtTCPSpuriousRTOs           2120503            0.0
hostname	TcpExtTCPSackShifted            2613736            0.0
hostname	TcpExtTCPSackMerged             21358743           0.0
hostname	TcpExtTCPSackShiftFallback      8769387            0.0
hostname	TcpExtTCPBacklogDrop            5                  0.0
hostname	TcpExtTCPRetransFail            843                0.0
hostname	TcpExtTCPRcvCoalesce            949068035          0.0
hostname	TcpExtTCPOFOQueue               470118             0.0
hostname	TcpExtTCPOFODrop                9915               0.0
hostname	TcpExtTCPOFOMerge               9                  0.0
hostname	TcpExtTCPChallengeACK           90                 0.0
hostname	TcpExtTCPSYNChallenge           3                  0.0
hostname	TcpExtTCPFastOpenActive         2089               0.0
hostname	TcpExtTCPSpuriousRtxHostQueues  896596             0.0
hostname	TcpExtTCPAutoCorking            547386735          0.0
hostname	TcpExtTCPFromZeroWindowAdv      28757              0.0
hostname	TcpExtTCPToZeroWindowAdv        28761              0.0
hostname	TcpExtTCPWantZeroWindowAdv      322431             0.0
hostname	TcpExtTCPSynRetrans             3026               0.0
hostname	TcpExtTCPOrigDataSent           40976870977        0.0
hostname	TcpExtTCPHystartTrainDetect     453920             0.0
hostname	TcpExtTCPHystartTrainCwnd       11586273           0.0
hostname	TcpExtTCPHystartDelayDetect     10943              0.0
hostname	TcpExtTCPHystartDelayCwnd       763554             0.0
hostname	TcpExtTCPACKSkippedPAWS         30                 0.0
hostname	TcpExtTCPACKSkippedSeq          218                0.0
hostname	TcpExtTCPWinProbe               2408               0.0
hostname	TcpExtTCPKeepAlive              213768             0.0
hostname	TcpExtTCPMTUPFail               69                 0.0
hostname	TcpExtTCPMTUPSuccess            8811               0.0

Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-26 13:10     ` Roman Gushchin
@ 2017-09-27  0:12       ` Yuchung Cheng
  2017-09-27  0:18         ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-27  0:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>> >
>> > > Hello.
>> > >
>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
>> > > warning shown below. Most of the time it is harmless, but rarely it just
>> > > causes either freeze or (I believe, this is related too) panic in
>> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
>> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
>> > > to capture it if possible.
>> > >
>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>> > > used to shape traffic with tc.
>> > >
>> > > Please note this regression was already reported as BZ [1] and as a letter to
>> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
>> > > only) me on my home router since v4.11 till v4.13.1 incl.
>> > >
>> > > Please advise on how to deal with it. I'll provide any additional info if
>> > > necessary, also ready to test patches if any.
>> > >
>> > > Thanks.
>> > >
>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> > > [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s=-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e=
>> >
>> > We're experiencing the same problems on some machines in our fleet.
>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> > sometimes panics in tcp_sacktag_walk().
>> >
>> > Here is an example of a backtrace with the panic log:
>
> Hi Yuchung!
>
>> do you still see the panics if you disable RACK?
>> sysctl net.ipv4.tcp_recovery=0?
>
> No, we haven't seen any crash since that.
I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
take an empty skb :-( Do you have stack trace or any hint on which call
to tcp-sacktag_walk triggered the panic? internally at Google we never
see that.


>
>>
>> also have you experience any sack reneg? could you post the output of
>> ' nstat |grep -i TCP' thanks
>
> hostname        TcpActiveOpens                  2289680            0.0
> hostname        TcpPassiveOpens                 3592758            0.0
> hostname        TcpAttemptFails                 746910             0.0
> hostname        TcpEstabResets                  154988             0.0
> hostname        TcpInSegs                       16258678255        0.0
> hostname        TcpOutSegs                      46967011611        0.0
> hostname        TcpRetransSegs                  13724310           0.0
> hostname        TcpInErrs                       2                  0.0
> hostname        TcpOutRsts                      9418798            0.0
> hostname        TcpExtEmbryonicRsts             2303               0.0
> hostname        TcpExtPruneCalled               90192              0.0
> hostname        TcpExtOfoPruned                 57274              0.0
> hostname        TcpExtOutOfWindowIcmps          3                  0.0
> hostname        TcpExtTW                        1164705            0.0
> hostname        TcpExtTWRecycled                2                  0.0
> hostname        TcpExtPAWSEstab                 159                0.0
> hostname        TcpExtDelayedACKs               209207209          0.0
> hostname        TcpExtDelayedACKLocked          508571             0.0
> hostname        TcpExtDelayedACKLost            1713248            0.0
> hostname        TcpExtListenOverflows           625                0.0
> hostname        TcpExtListenDrops               625                0.0
> hostname        TcpExtTCPHPHits                 9341188489         0.0
> hostname        TcpExtTCPPureAcks               1434646465         0.0
> hostname        TcpExtTCPHPAcks                 5733614672         0.0
> hostname        TcpExtTCPSackRecovery           3261698            0.0
> hostname        TcpExtTCPSACKReneging           12203              0.0
> hostname        TcpExtTCPSACKReorder            433189             0.0
> hostname        TcpExtTCPTSReorder              22694              0.0
> hostname        TcpExtTCPFullUndo               45092              0.0
> hostname        TcpExtTCPPartialUndo            22016              0.0
> hostname        TcpExtTCPLossUndo               2150040            0.0
> hostname        TcpExtTCPLostRetransmit         60119              0.0
> hostname        TcpExtTCPSackFailures           2626782            0.0
> hostname        TcpExtTCPLossFailures           182999             0.0
> hostname        TcpExtTCPFastRetrans            4334275            0.0
> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
> hostname        TcpExtTCPTimeouts               1070997            0.0
> hostname        TcpExtTCPLossProbes             2633545            0.0
> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
> hostname        TcpExtTCPRcvCollapsed           461354             0.0
> hostname        TcpExtTCPAbortOnData            349196             0.0
> hostname        TcpExtTCPAbortOnClose           3395               0.0
> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
> hostname        TcpExtTCPMemoryPressures        2                  0.0
> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
> hostname        TcpExtTCPSackShifted            2613736            0.0
> hostname        TcpExtTCPSackMerged             21358743           0.0
> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
> hostname        TcpExtTCPBacklogDrop            5                  0.0
> hostname        TcpExtTCPRetransFail            843                0.0
> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
> hostname        TcpExtTCPOFOQueue               470118             0.0
> hostname        TcpExtTCPOFODrop                9915               0.0
> hostname        TcpExtTCPOFOMerge               9                  0.0
> hostname        TcpExtTCPChallengeACK           90                 0.0
> hostname        TcpExtTCPSYNChallenge           3                  0.0
> hostname        TcpExtTCPFastOpenActive         2089               0.0
> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
> hostname        TcpExtTCPAutoCorking            547386735          0.0
> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
> hostname        TcpExtTCPSynRetrans             3026               0.0
> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
> hostname        TcpExtTCPACKSkippedSeq          218                0.0
> hostname        TcpExtTCPWinProbe               2408               0.0
> hostname        TcpExtTCPKeepAlive              213768             0.0
> hostname        TcpExtTCPMTUPFail               69                 0.0
> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>
> Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-27  0:12       ` Yuchung Cheng
@ 2017-09-27  0:18         ` Yuchung Cheng
  2017-09-28  8:14           ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-27  0:18 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>>> >
>>> > > Hello.
>>> > >
>>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
>>> > > warning shown below. Most of the time it is harmless, but rarely it just
>>> > > causes either freeze or (I believe, this is related too) panic in
>>> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
>>> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
>>> > > to capture it if possible.
>>> > >
>>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>>> > > used to shape traffic with tc.
>>> > >
>>> > > Please note this regression was already reported as BZ [1] and as a letter to
>>> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
>>> > > only) me on my home router since v4.11 till v4.13.1 incl.
>>> > >
>>> > > Please advise on how to deal with it. I'll provide any additional info if
>>> > > necessary, also ready to test patches if any.
>>> > >
>>> > > Thanks.
>>> > >
>>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>>> > > [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s=-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e=
>>> >
>>> > We're experiencing the same problems on some machines in our fleet.
>>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>>> > sometimes panics in tcp_sacktag_walk().
>>> >
>>> > Here is an example of a backtrace with the panic log:
>>
>> Hi Yuchung!
>>
>>> do you still see the panics if you disable RACK?
>>> sysctl net.ipv4.tcp_recovery=0?
>>
>> No, we haven't seen any crash since that.
> I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
> take an empty skb :-( Do you have stack trace or any hint on which call
> to tcp-sacktag_walk triggered the panic? internally at Google we never
> see that.
hmm something just struck me: could you try
sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
and see if kernel still panics on sack processing?

>
>
>>
>>>
>>> also have you experience any sack reneg? could you post the output of
>>> ' nstat |grep -i TCP' thanks
>>
>> hostname        TcpActiveOpens                  2289680            0.0
>> hostname        TcpPassiveOpens                 3592758            0.0
>> hostname        TcpAttemptFails                 746910             0.0
>> hostname        TcpEstabResets                  154988             0.0
>> hostname        TcpInSegs                       16258678255        0.0
>> hostname        TcpOutSegs                      46967011611        0.0
>> hostname        TcpRetransSegs                  13724310           0.0
>> hostname        TcpInErrs                       2                  0.0
>> hostname        TcpOutRsts                      9418798            0.0
>> hostname        TcpExtEmbryonicRsts             2303               0.0
>> hostname        TcpExtPruneCalled               90192              0.0
>> hostname        TcpExtOfoPruned                 57274              0.0
>> hostname        TcpExtOutOfWindowIcmps          3                  0.0
>> hostname        TcpExtTW                        1164705            0.0
>> hostname        TcpExtTWRecycled                2                  0.0
>> hostname        TcpExtPAWSEstab                 159                0.0
>> hostname        TcpExtDelayedACKs               209207209          0.0
>> hostname        TcpExtDelayedACKLocked          508571             0.0
>> hostname        TcpExtDelayedACKLost            1713248            0.0
>> hostname        TcpExtListenOverflows           625                0.0
>> hostname        TcpExtListenDrops               625                0.0
>> hostname        TcpExtTCPHPHits                 9341188489         0.0
>> hostname        TcpExtTCPPureAcks               1434646465         0.0
>> hostname        TcpExtTCPHPAcks                 5733614672         0.0
>> hostname        TcpExtTCPSackRecovery           3261698            0.0
>> hostname        TcpExtTCPSACKReneging           12203              0.0
>> hostname        TcpExtTCPSACKReorder            433189             0.0
>> hostname        TcpExtTCPTSReorder              22694              0.0
>> hostname        TcpExtTCPFullUndo               45092              0.0
>> hostname        TcpExtTCPPartialUndo            22016              0.0
>> hostname        TcpExtTCPLossUndo               2150040            0.0
>> hostname        TcpExtTCPLostRetransmit         60119              0.0
>> hostname        TcpExtTCPSackFailures           2626782            0.0
>> hostname        TcpExtTCPLossFailures           182999             0.0
>> hostname        TcpExtTCPFastRetrans            4334275            0.0
>> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
>> hostname        TcpExtTCPTimeouts               1070997            0.0
>> hostname        TcpExtTCPLossProbes             2633545            0.0
>> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
>> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
>> hostname        TcpExtTCPRcvCollapsed           461354             0.0
>> hostname        TcpExtTCPAbortOnData            349196             0.0
>> hostname        TcpExtTCPAbortOnClose           3395               0.0
>> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
>> hostname        TcpExtTCPMemoryPressures        2                  0.0
>> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
>> hostname        TcpExtTCPSackShifted            2613736            0.0
>> hostname        TcpExtTCPSackMerged             21358743           0.0
>> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
>> hostname        TcpExtTCPBacklogDrop            5                  0.0
>> hostname        TcpExtTCPRetransFail            843                0.0
>> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
>> hostname        TcpExtTCPOFOQueue               470118             0.0
>> hostname        TcpExtTCPOFODrop                9915               0.0
>> hostname        TcpExtTCPOFOMerge               9                  0.0
>> hostname        TcpExtTCPChallengeACK           90                 0.0
>> hostname        TcpExtTCPSYNChallenge           3                  0.0
>> hostname        TcpExtTCPFastOpenActive         2089               0.0
>> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
>> hostname        TcpExtTCPAutoCorking            547386735          0.0
>> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
>> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
>> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
>> hostname        TcpExtTCPSynRetrans             3026               0.0
>> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
>> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
>> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
>> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
>> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
>> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
>> hostname        TcpExtTCPACKSkippedSeq          218                0.0
>> hostname        TcpExtTCPWinProbe               2408               0.0
>> hostname        TcpExtTCPKeepAlive              213768             0.0
>> hostname        TcpExtTCPMTUPFail               69                 0.0
>> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>>
>> Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-27  0:18         ` Yuchung Cheng
@ 2017-09-28  8:14           ` Oleksandr Natalenko
  2017-09-28 23:36             ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-28  8:14 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Roman Gushchin, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

Hi.

Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it 
intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix 
warning in tcp_fastretrans_alert() for me.

On středa 27. září 2017 2:18:32 CEST Yuchung Cheng wrote:
> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
> >>> > > Hello.
> >>> > > 
> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting
> >>> > > in the
> >>> > > warning shown below. Most of the time it is harmless, but rarely it
> >>> > > just
> >>> > > causes either freeze or (I believe, this is related too) panic in
> >>> > > tcp_sacktag_walk() (because sk_buff passed to this function is
> >>> > > NULL).
> >>> > > Unfortunately, I still do not have proper stacktrace from panic, but
> >>> > > will try to capture it if possible.
> >>> > > 
> >>> > > Also, I have custom settings regarding TCP stack, shown below as
> >>> > > well. ifb is used to shape traffic with tc.
> >>> > > 
> >>> > > Please note this regression was already reported as BZ [1] and as a
> >>> > > letter to ML [2], but got neither attention nor resolution. It is
> >>> > > reproducible for (not only) me on my home router since v4.11 till
> >>> > > v4.13.1 incl.
> >>> > > 
> >>> > > Please advise on how to deal with it. I'll provide any additional
> >>> > > info if
> >>> > > necessary, also ready to test patches if any.
> >>> > > 
> >>> > > Thanks.
> >>> > > 
> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> >>> > > [2]
> >>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.ne
> >>> > > t_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJ
> >>> > > YgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s
> >>> > > =-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e=>>> > 
> >>> > We're experiencing the same problems on some machines in our fleet.
> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
> >>> > sometimes panics in tcp_sacktag_walk().
> >> 
> >>> > Here is an example of a backtrace with the panic log:
> >> Hi Yuchung!
> >> 
> >>> do you still see the panics if you disable RACK?
> >>> sysctl net.ipv4.tcp_recovery=0?
> >> 
> >> No, we haven't seen any crash since that.
> > 
> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
> > take an empty skb :-( Do you have stack trace or any hint on which call
> > to tcp-sacktag_walk triggered the panic? internally at Google we never
> > see that.
> 
> hmm something just struck me: could you try
> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
> and see if kernel still panics on sack processing?
> 
> >>> also have you experience any sack reneg? could you post the output of
> >>> ' nstat |grep -i TCP' thanks
> >> 
> >> hostname        TcpActiveOpens                  2289680            0.0
> >> hostname        TcpPassiveOpens                 3592758            0.0
> >> hostname        TcpAttemptFails                 746910             0.0
> >> hostname        TcpEstabResets                  154988             0.0
> >> hostname        TcpInSegs                       16258678255        0.0
> >> hostname        TcpOutSegs                      46967011611        0.0
> >> hostname        TcpRetransSegs                  13724310           0.0
> >> hostname        TcpInErrs                       2                  0.0
> >> hostname        TcpOutRsts                      9418798            0.0
> >> hostname        TcpExtEmbryonicRsts             2303               0.0
> >> hostname        TcpExtPruneCalled               90192              0.0
> >> hostname        TcpExtOfoPruned                 57274              0.0
> >> hostname        TcpExtOutOfWindowIcmps          3                  0.0
> >> hostname        TcpExtTW                        1164705            0.0
> >> hostname        TcpExtTWRecycled                2                  0.0
> >> hostname        TcpExtPAWSEstab                 159                0.0
> >> hostname        TcpExtDelayedACKs               209207209          0.0
> >> hostname        TcpExtDelayedACKLocked          508571             0.0
> >> hostname        TcpExtDelayedACKLost            1713248            0.0
> >> hostname        TcpExtListenOverflows           625                0.0
> >> hostname        TcpExtListenDrops               625                0.0
> >> hostname        TcpExtTCPHPHits                 9341188489         0.0
> >> hostname        TcpExtTCPPureAcks               1434646465         0.0
> >> hostname        TcpExtTCPHPAcks                 5733614672         0.0
> >> hostname        TcpExtTCPSackRecovery           3261698            0.0
> >> hostname        TcpExtTCPSACKReneging           12203              0.0
> >> hostname        TcpExtTCPSACKReorder            433189             0.0
> >> hostname        TcpExtTCPTSReorder              22694              0.0
> >> hostname        TcpExtTCPFullUndo               45092              0.0
> >> hostname        TcpExtTCPPartialUndo            22016              0.0
> >> hostname        TcpExtTCPLossUndo               2150040            0.0
> >> hostname        TcpExtTCPLostRetransmit         60119              0.0
> >> hostname        TcpExtTCPSackFailures           2626782            0.0
> >> hostname        TcpExtTCPLossFailures           182999             0.0
> >> hostname        TcpExtTCPFastRetrans            4334275            0.0
> >> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
> >> hostname        TcpExtTCPTimeouts               1070997            0.0
> >> hostname        TcpExtTCPLossProbes             2633545            0.0
> >> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
> >> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
> >> hostname        TcpExtTCPRcvCollapsed           461354             0.0
> >> hostname        TcpExtTCPAbortOnData            349196             0.0
> >> hostname        TcpExtTCPAbortOnClose           3395               0.0
> >> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
> >> hostname        TcpExtTCPMemoryPressures        2                  0.0
> >> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
> >> hostname        TcpExtTCPSackShifted            2613736            0.0
> >> hostname        TcpExtTCPSackMerged             21358743           0.0
> >> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
> >> hostname        TcpExtTCPBacklogDrop            5                  0.0
> >> hostname        TcpExtTCPRetransFail            843                0.0
> >> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
> >> hostname        TcpExtTCPOFOQueue               470118             0.0
> >> hostname        TcpExtTCPOFODrop                9915               0.0
> >> hostname        TcpExtTCPOFOMerge               9                  0.0
> >> hostname        TcpExtTCPChallengeACK           90                 0.0
> >> hostname        TcpExtTCPSYNChallenge           3                  0.0
> >> hostname        TcpExtTCPFastOpenActive         2089               0.0
> >> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
> >> hostname        TcpExtTCPAutoCorking            547386735          0.0
> >> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
> >> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
> >> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
> >> hostname        TcpExtTCPSynRetrans             3026               0.0
> >> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
> >> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
> >> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
> >> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
> >> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
> >> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
> >> hostname        TcpExtTCPACKSkippedSeq          218                0.0
> >> hostname        TcpExtTCPWinProbe               2408               0.0
> >> hostname        TcpExtTCPKeepAlive              213768             0.0
> >> hostname        TcpExtTCPMTUPFail               69                 0.0
> >> hostname        TcpExtTCPMTUPSuccess            8811               0.0
> >> 
> >> Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-28  8:14           ` Oleksandr Natalenko
@ 2017-09-28 23:36             ` Yuchung Cheng
  2017-10-26  2:07               ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-28 23:36 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Roman Gushchin, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
	linux-kernel

On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> warning in tcp_fastretrans_alert() for me.

Hi Oleksandr: no retrans_collapse should not matter for that warning
in tcp_fstretrans_alert(). the warning as I explained earlier is
likely false. Neal and I are more concerned the panic in
tcp_sacktag_walk. This is just a blind shot but thx for retrying.

We can submit a one-liner to remove the fast retrans warning but want
to nail the bigger issue first.

>
> On středa 27. září 2017 2:18:32 CEST Yuchung Cheng wrote:
>> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>> >>> > > Hello.
>> >>> > >
>> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting
>> >>> > > in the
>> >>> > > warning shown below. Most of the time it is harmless, but rarely it
>> >>> > > just
>> >>> > > causes either freeze or (I believe, this is related too) panic in
>> >>> > > tcp_sacktag_walk() (because sk_buff passed to this function is
>> >>> > > NULL).
>> >>> > > Unfortunately, I still do not have proper stacktrace from panic, but
>> >>> > > will try to capture it if possible.
>> >>> > >
>> >>> > > Also, I have custom settings regarding TCP stack, shown below as
>> >>> > > well. ifb is used to shape traffic with tc.
>> >>> > >
>> >>> > > Please note this regression was already reported as BZ [1] and as a
>> >>> > > letter to ML [2], but got neither attention nor resolution. It is
>> >>> > > reproducible for (not only) me on my home router since v4.11 till
>> >>> > > v4.13.1 incl.
>> >>> > >
>> >>> > > Please advise on how to deal with it. I'll provide any additional
>> >>> > > info if
>> >>> > > necessary, also ready to test patches if any.
>> >>> > >
>> >>> > > Thanks.
>> >>> > >
>> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> >>> > > [2]
>> >>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.ne
>> >>> > > t_lists_netdev_msg436158.html&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJ
>> >>> > > YgtDM7QT-W-Fz_d29HYQ&m=MDDRfLG5DvdOeniMpaZDJI8ulKQ6PQ6OX_1YtRsiTMA&s
>> >>> > > =-n3dGZw-pQ95kMBUfq5G9nYZFcuWtbTDlYFkcvQPoKc&e=>>> >
>> >>> > We're experiencing the same problems on some machines in our fleet.
>> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> >>> > sometimes panics in tcp_sacktag_walk().
>> >>
>> >>> > Here is an example of a backtrace with the panic log:
>> >> Hi Yuchung!
>> >>
>> >>> do you still see the panics if you disable RACK?
>> >>> sysctl net.ipv4.tcp_recovery=0?
>> >>
>> >> No, we haven't seen any crash since that.
>> >
>> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
>> > take an empty skb :-( Do you have stack trace or any hint on which call
>> > to tcp-sacktag_walk triggered the panic? internally at Google we never
>> > see that.
>>
>> hmm something just struck me: could you try
>> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
>> and see if kernel still panics on sack processing?
>>
>> >>> also have you experience any sack reneg? could you post the output of
>> >>> ' nstat |grep -i TCP' thanks
>> >>
>> >> hostname        TcpActiveOpens                  2289680            0.0
>> >> hostname        TcpPassiveOpens                 3592758            0.0
>> >> hostname        TcpAttemptFails                 746910             0.0
>> >> hostname        TcpEstabResets                  154988             0.0
>> >> hostname        TcpInSegs                       16258678255        0.0
>> >> hostname        TcpOutSegs                      46967011611        0.0
>> >> hostname        TcpRetransSegs                  13724310           0.0
>> >> hostname        TcpInErrs                       2                  0.0
>> >> hostname        TcpOutRsts                      9418798            0.0
>> >> hostname        TcpExtEmbryonicRsts             2303               0.0
>> >> hostname        TcpExtPruneCalled               90192              0.0
>> >> hostname        TcpExtOfoPruned                 57274              0.0
>> >> hostname        TcpExtOutOfWindowIcmps          3                  0.0
>> >> hostname        TcpExtTW                        1164705            0.0
>> >> hostname        TcpExtTWRecycled                2                  0.0
>> >> hostname        TcpExtPAWSEstab                 159                0.0
>> >> hostname        TcpExtDelayedACKs               209207209          0.0
>> >> hostname        TcpExtDelayedACKLocked          508571             0.0
>> >> hostname        TcpExtDelayedACKLost            1713248            0.0
>> >> hostname        TcpExtListenOverflows           625                0.0
>> >> hostname        TcpExtListenDrops               625                0.0
>> >> hostname        TcpExtTCPHPHits                 9341188489         0.0
>> >> hostname        TcpExtTCPPureAcks               1434646465         0.0
>> >> hostname        TcpExtTCPHPAcks                 5733614672         0.0
>> >> hostname        TcpExtTCPSackRecovery           3261698            0.0
>> >> hostname        TcpExtTCPSACKReneging           12203              0.0
>> >> hostname        TcpExtTCPSACKReorder            433189             0.0
>> >> hostname        TcpExtTCPTSReorder              22694              0.0
>> >> hostname        TcpExtTCPFullUndo               45092              0.0
>> >> hostname        TcpExtTCPPartialUndo            22016              0.0
>> >> hostname        TcpExtTCPLossUndo               2150040            0.0
>> >> hostname        TcpExtTCPLostRetransmit         60119              0.0
>> >> hostname        TcpExtTCPSackFailures           2626782            0.0
>> >> hostname        TcpExtTCPLossFailures           182999             0.0
>> >> hostname        TcpExtTCPFastRetrans            4334275            0.0
>> >> hostname        TcpExtTCPSlowStartRetrans       3453348            0.0
>> >> hostname        TcpExtTCPTimeouts               1070997            0.0
>> >> hostname        TcpExtTCPLossProbes             2633545            0.0
>> >> hostname        TcpExtTCPLossProbeRecovery      941647             0.0
>> >> hostname        TcpExtTCPSackRecoveryFail       336302             0.0
>> >> hostname        TcpExtTCPRcvCollapsed           461354             0.0
>> >> hostname        TcpExtTCPAbortOnData            349196             0.0
>> >> hostname        TcpExtTCPAbortOnClose           3395               0.0
>> >> hostname        TcpExtTCPAbortOnTimeout         51201              0.0
>> >> hostname        TcpExtTCPMemoryPressures        2                  0.0
>> >> hostname        TcpExtTCPSpuriousRTOs           2120503            0.0
>> >> hostname        TcpExtTCPSackShifted            2613736            0.0
>> >> hostname        TcpExtTCPSackMerged             21358743           0.0
>> >> hostname        TcpExtTCPSackShiftFallback      8769387            0.0
>> >> hostname        TcpExtTCPBacklogDrop            5                  0.0
>> >> hostname        TcpExtTCPRetransFail            843                0.0
>> >> hostname        TcpExtTCPRcvCoalesce            949068035          0.0
>> >> hostname        TcpExtTCPOFOQueue               470118             0.0
>> >> hostname        TcpExtTCPOFODrop                9915               0.0
>> >> hostname        TcpExtTCPOFOMerge               9                  0.0
>> >> hostname        TcpExtTCPChallengeACK           90                 0.0
>> >> hostname        TcpExtTCPSYNChallenge           3                  0.0
>> >> hostname        TcpExtTCPFastOpenActive         2089               0.0
>> >> hostname        TcpExtTCPSpuriousRtxHostQueues  896596             0.0
>> >> hostname        TcpExtTCPAutoCorking            547386735          0.0
>> >> hostname        TcpExtTCPFromZeroWindowAdv      28757              0.0
>> >> hostname        TcpExtTCPToZeroWindowAdv        28761              0.0
>> >> hostname        TcpExtTCPWantZeroWindowAdv      322431             0.0
>> >> hostname        TcpExtTCPSynRetrans             3026               0.0
>> >> hostname        TcpExtTCPOrigDataSent           40976870977        0.0
>> >> hostname        TcpExtTCPHystartTrainDetect     453920             0.0
>> >> hostname        TcpExtTCPHystartTrainCwnd       11586273           0.0
>> >> hostname        TcpExtTCPHystartDelayDetect     10943              0.0
>> >> hostname        TcpExtTCPHystartDelayCwnd       763554             0.0
>> >> hostname        TcpExtTCPACKSkippedPAWS         30                 0.0
>> >> hostname        TcpExtTCPACKSkippedSeq          218                0.0
>> >> hostname        TcpExtTCPWinProbe               2408               0.0
>> >> hostname        TcpExtTCPKeepAlive              213768             0.0
>> >> hostname        TcpExtTCPMTUPFail               69                 0.0
>> >> hostname        TcpExtTCPMTUPSuccess            8811               0.0
>> >>
>> >> Thanks!
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-28 23:36             ` Yuchung Cheng
@ 2017-10-26  2:07               ` Alexei Starovoitov
  2017-10-26  5:37                 ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2017-10-26  2:07 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Oleksandr Natalenko, Roman Gushchin, netdev, Eric Dumazet,
	Neal Cardwell, David S. Miller, brakmo

On Thu, Sep 28, 2017 at 04:36:58PM -0700, Yuchung Cheng wrote:
> On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
> <oleksandr@natalenko.name> wrote:
> > Hi.
> >
> > Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> > intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> > warning in tcp_fastretrans_alert() for me.
> 
> Hi Oleksandr: no retrans_collapse should not matter for that warning
> in tcp_fstretrans_alert(). the warning as I explained earlier is
> likely false. Neal and I are more concerned the panic in
> tcp_sacktag_walk. This is just a blind shot but thx for retrying.
> 
> We can submit a one-liner to remove the fast retrans warning but want
> to nail the bigger issue first.

we're still seeing the warnings followed by crashes and it's very concerning.
We hoped that most recent Neal's patches from Sep 18 around this area may
magically fix the issue, but no. The panics are still there.
It's confirmed that net.ipv4.tcp_retrans_collapse=0 does not help
whereas net.ipv4.tcp_recovery=0 works, but obviously undesirable.
We're out of ideas on how to debug this.

Warning + panic:
[ 7678.309043] Call Trace:
[ 7678.313925]  <IRQ>
[ 7678.317955]  dump_stack+0x4d/0x70
[ 7678.324573]  __warn+0xd3/0xf0
[ 7678.330497]  warn_slowpath_null+0x1e/0x20
[ 7678.338510]  tcp_fastretrans_alert+0xacf/0xbd0
[ 7678.347380]  tcp_ack+0xbce/0x1390
[ 7678.354006]  tcp_rcv_established+0x1ce/0x740
[ 7678.362536]  tcp_v6_do_rcv+0x15b/0x400
[ 7678.370025]  tcp_v6_rcv+0x94c/0x9f0
[ 7678.376989]  ? tcp_rcv_established+0x42b/0x740
[ 7678.385866]  ip6_input_finish+0xea/0x430
[ 7678.393702]  ip6_input+0x32/0xa0
[ 7678.400155]  ? ip6_rcv_finish+0xa0/0xa0
[ 7678.407834]  ip6_rcv_finish+0x4b/0xa0
[ 7678.415164]  ipv6_rcv+0x2ec/0x4f0
[ 7678.421778]  ? ip6_make_skb+0x1c0/0x1c0
[ 7678.429447]  __netif_receive_skb_core+0x2d5/0x9a0
[ 7678.438845]  ? tcp6_gro_receive+0x11d/0x1c0
[ 7678.447211]  ? ipv6_gro_receive+0x1d0/0x380
[ 7678.455566]  __netif_receive_skb+0x16/0x70
[ 7678.463748]  netif_receive_skb_internal+0x32/0x3b0
[ 7678.473318]  ? dev_gro_receive+0x27c/0x470
[ 7678.481502]  napi_gro_receive+0x113/0x1d0
[ 7678.489510]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[ 7678.499077]  mlx5e_poll_rx_cq+0x79/0x830
[ 7678.506911]  mlx5e_napi_poll+0x8c/0x3d0
[ 7678.514579]  ? mlx5_cq_completion+0x54/0xb0
[ 7678.522950]  net_rx_action+0x22e/0x380
[ 7678.530450]  __do_softirq+0x106/0x2e8
[ 7678.537768]  irq_exit+0xb0/0xc0
[ 7678.544042]  do_IRQ+0x4f/0xd0
[ 7678.549970]  common_interrupt+0x86/0x86
...
[ 7685.690697] BUG: unable to handle kernel 
[ 7685.698737] NULL pointer dereference
[ 7685.705889]  at 0000000000000035
[ 7685.712357] IP: tcp_sacktag_walk+0xfd/0x460
[ 7685.720726] PGD 1002d75067 
[ 7685.726320] PUD 102c506067 
[ 7685.731902] PMD 0 
...
[ 7686.070541] task: ffff880ff5959b00 task.stack: ffffc90024708000
[ 7686.082363] RIP: 0010:tcp_sacktag_walk+0xfd/0x460
[ 7686.091754] RSP: 0000:ffff88203eec38b0 EFLAGS: 00010217
[ 7686.102188] RAX: ffff881953620a00 RBX: ffff881009273200 RCX: 00000000fbfa50ec
[ 7686.116439] RDX: 00000000fbfa54a0 RSI: 0000000000000000 RDI: ffff881009273200
[ 7686.130688] RBP: ffff88203eec3908 R08: 0000000000000048 R09: 00000000fbfa55fb
[ 7686.144936] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881ec9ff98c0
[ 7686.159184] R13: ffff88203eec3a10 R14: 0000000000000000 R15: ffff881ec9ff9a18
[ 7686.173431] FS:  00007f2b235ff700(0000) GS:ffff88203eec0000(0000) knlGS:0000000000000000
[ 7686.189589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7686.201062] CR2: 0000000000000035 CR3: 0000000fd8229000 CR4: 00000000003406e0
[ 7686.215310] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7686.229560] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7686.243807] Call Trace:
[ 7686.248696]  <IRQ>
[ 7686.252727]  tcp_sacktag_write_queue+0x5ff/0x9e0
[ 7686.261943]  tcp_ack+0x677/0x1390
[ 7686.268569]  ? select_task_rq_fair+0x9e3/0xf70
[ 7686.277437]  ? tcp_write_xmit+0x230/0x10a0
[ 7686.285619]  tcp_rcv_established+0x1ce/0x740
[ 7686.294151]  tcp_v6_do_rcv+0x15b/0x400
[ 7686.301636]  tcp_v6_rcv+0x94c/0x9f0
[ 7686.308606]  ? default_wake_function+0x12/0x20
[ 7686.317483]  ? __wake_up_common+0x53/0x90
[ 7686.325492]  ip6_input_finish+0xea/0x430
[ 7686.333329]  ip6_input+0x32/0xa0
[ 7686.339791]  ? ip6_rcv_finish+0xa0/0xa0
[ 7686.347451]  ip6_rcv_finish+0x4b/0xa0
[ 7686.354767]  ipv6_rcv+0x2ec/0x4f0
[ 7686.361389]  ? ip6_make_skb+0x1c0/0x1c0
[ 7686.369050]  __netif_receive_skb_core+0x2d5/0x9a0
[ 7686.378449]  ? tcp6_gro_receive+0x11d/0x1c0
[ 7686.386797]  ? ipv6_gro_receive+0x1d0/0x380
[ 7686.395160]  __netif_receive_skb+0x16/0x70
[ 7686.403342]  netif_receive_skb_internal+0x32/0x3b0
[ 7686.412931]  ? dev_gro_receive+0x27c/0x470
[ 7686.421110]  napi_gro_receive+0x113/0x1d0
[ 7686.429136]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[ 7686.438722]  mlx5e_poll_rx_cq+0x79/0x830
[ 7686.446572]  mlx5e_napi_poll+0x8c/0x3d0
[ 7686.454233]  ? mlx5_cq_completion+0x54/0xb0
[ 7686.462591]  net_rx_action+0x22e/0x380
[ 7686.470099]  __do_softirq+0x106/0x2e8
[ 7686.477431]  irq_exit+0xb0/0xc0
[ 7686.483700]  do_IRQ+0x4f/0xd0
[ 7686.489629]  common_interrupt+0x86/0x86

Sometimes soft lockup is also seen:

[2676627.345885] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[2676627.717771] RIP: 0010:queued_spin_lock_slowpath+0x15c/0x180
[2676627.729239] RSP: 0018:ffff88103f603e78 EFLAGS: 00000202
[2676627.740019]  ORIG_RAX: ffffffffffffff10
[2676627.748027] RAX: 0000000000000101 RBX: ffff8811181d8840 RCX: 0000000000000101
[2676627.762622] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff8811181d88c8
[2676627.777217] RBP: ffff88103f603e88 R08: 0000000000000001 R09: ffff88103f95a800
[2676627.791811] R10: 0000000000037a13 R11: 000000001a37bb20 R12: 0000000000000100
[2676627.806405] R13: ffff8811181d8c18 R14: ffffffff81857270 R15: ffff8811181d8840
[2676627.821000] FS:  0000000000000000(0000) GS:ffff88103f600000(0000) knlGS:0000000000000000
[2676627.837500] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2676627.849318] CR2: 00007f6e44c1f000 CR3: 0000000001e09000 CR4: 00000000003406f0
[2676627.863916] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2676627.878508] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2676627.893103] Call Trace:
[2676627.898338]  <IRQ>
[2676627.902711]  ? _raw_spin_lock+0x21/0x30
[2676627.910715]  tcp_write_timer+0x19/0x70
[2676627.918548]  call_timer_fn+0x35/0x150
[2676627.926214]  run_timer_softirq+0x1be/0x3e0
[2676627.934742]  ? timerqueue_add+0x59/0xb0
[2676627.942745]  ? ktime_get+0x3c/0xb0
[2676627.949889]  __do_softirq+0x106/0x2e8
[2676627.957551]  irq_exit+0xb0/0xc0
[2676627.964174]  smp_apic_timer_interrupt+0x68/0x90
[2676627.973563]  apic_timer_interrupt+0x86/0x90
[2676628.084861] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 21s! [swapper/11:0]
[2676628.084896] RIP: 0010:tcp_match_skb_to_sack+0x2d/0xb0
[2676628.084897] RSP: 0018:ffff88103f8c38a8 EFLAGS: 00000297
[2676628.084897]  ORIG_RAX: ffffffffffffff10
[2676628.084898] RAX: 0000000000000000 RBX: ffff88010f9dbb18 RCX: 000000002d5da350
[2676628.084899] RDX: 000000002d5da141 RSI: ffff88010f9dbb18 RDI: ffff8811181d8840
[2676628.084899] RBP: ffff88103f8c3908 R08: 00000000ffffffff R09: 00000000d2a25ebe
[2676628.084900] R10: 000000000000e323 R11: ffff8811181d8998 R12: ffff8811181d8840
[2676628.084900] R13: ffff88103f8c3a10 R14: 0000000000000000 R15: ffff8811181d8998
[2676628.084901] FS:  0000000000000000(0000) GS:ffff88103f8c0000(0000) knlGS:0000000000000000
[2676628.084902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2676628.084903] CR2: 00007fbc07410000 CR3: 0000000001e09000 CR4: 00000000003406e0
[2676628.084903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2676628.084904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2676628.084904] Call Trace:
[2676628.084905]  <IRQ>
[2676628.084907]  ? tcp_sacktag_walk+0x270/0x460
[2676628.084909]  tcp_sacktag_write_queue+0x5ff/0x9e0
[2676628.084911]  tcp_ack+0x677/0x1390
[2676628.084913]  ? tcp_write_xmit+0x230/0x10a0
[2676628.084915]  tcp_rcv_established+0x1ce/0x740
[2676628.084919]  tcp_v6_do_rcv+0x15b/0x400
[2676628.084921]  tcp_v6_rcv+0x94c/0x9f0
[2676628.084924]  ? __dev_queue_xmit+0x251/0x650
[2676628.084928]  ip6_input_finish+0xea/0x430
[2676628.084930]  ip6_input+0x32/0xa0
[2676628.084932]  ? ip6_rcv_finish+0xa0/0xa0
[2676628.084933]  ip6_rcv_finish+0x4b/0xa0
[2676628.084935]  ipv6_rcv+0x2ec/0x4f0
[2676628.084937]  ? ip6_make_skb+0x1c0/0x1c0
[2676628.084938]  __netif_receive_skb_core+0x2d5/0x9a0
[2676628.084941]  ? tcp6_gro_receive+0x11d/0x1c0
[2676628.084943]  ? ipv6_gro_receive+0x1d0/0x380
[2676628.084944]  __netif_receive_skb+0x16/0x70
[2676628.084945]  netif_receive_skb_internal+0x32/0x3b0
[2676628.084947]  ? dev_gro_receive+0x27c/0x470
[2676628.084948]  napi_gro_receive+0x113/0x1d0
[2676628.084952]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[2676628.084954]  mlx5e_poll_rx_cq+0x79/0x830
[2676628.084956]  mlx5e_napi_poll+0x8c/0x3d0
[2676628.084959]  ? mlx5_cq_completion+0x54/0xb0
[2676628.084960]  net_rx_action+0x22e/0x380
[2676628.084962]  __do_softirq+0x106/0x2e8
[2676628.084965]  irq_exit+0xb0/0xc0
[2676628.084967]  do_IRQ+0x4f/0xd0
[2676656.083964] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [swapper/11:0]
[2676656.083992] RIP: 0010:tcp_sacktag_walk+0x25f/0x460
[2676656.083993] RSP: 0018:ffff88103f8c38b0 EFLAGS: 00000207
[2676656.083993]  ORIG_RAX: ffffffffffffff10
[2676656.083994] RAX: 0000000000021500 RBX: ffff8810d84c3600 RCX: 000000002d5d9e41
[2676656.083994] RDX: 00000000f15de229 RSI: ffff8810d84c0200 RDI: ffff8811181d8840
[2676656.083995] RBP: ffff88103f8c3908 R08: 000000000000ff21 R09: 0000000000000000
[2676656.083996] R10: 000000000000e323 R11: ffff8811181d8998 R12: ffff8811181d8840
[2676656.083996] R13: ffff88103f8c3a10 R14: 0000000000000000 R15: ffff8811181d8998
[2676656.083997] FS:  0000000000000000(0000) GS:ffff88103f8c0000(0000) knlGS:0000000000000000
[2676656.083998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2676656.083998] CR2: 00007fbc07410000 CR3: 0000000001e09000 CR4: 00000000003406e0
[2676656.083999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2676656.083999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2676656.084000] Call Trace:
[2676656.084001]  <IRQ>
[2676656.084003]  tcp_sacktag_write_queue+0x5ff/0x9e0
[2676656.084005]  tcp_ack+0x677/0x1390
[2676656.084007]  ? tcp_write_xmit+0x230/0x10a0
[2676656.084009]  tcp_rcv_established+0x1ce/0x740
[2676656.084013]  tcp_v6_do_rcv+0x15b/0x400
[2676656.084015]  tcp_v6_rcv+0x94c/0x9f0
[2676656.084018]  ? __dev_queue_xmit+0x251/0x650
[2676656.084022]  ip6_input_finish+0xea/0x430
[2676656.084024]  ip6_input+0x32/0xa0
[2676656.084026]  ? ip6_rcv_finish+0xa0/0xa0
[2676656.084027]  ip6_rcv_finish+0x4b/0xa0
[2676656.084029]  ipv6_rcv+0x2ec/0x4f0
[2676656.084031]  ? ip6_make_skb+0x1c0/0x1c0
[2676656.084032]  __netif_receive_skb_core+0x2d5/0x9a0
[2676656.084035]  ? tcp6_gro_receive+0x11d/0x1c0
[2676656.084037]  ? ipv6_gro_receive+0x1d0/0x380
[2676656.084038]  __netif_receive_skb+0x16/0x70
[2676656.084039]  netif_receive_skb_internal+0x32/0x3b0
[2676656.084041]  ? dev_gro_receive+0x27c/0x470
[2676656.084042]  napi_gro_receive+0x113/0x1d0
[2676656.084046]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[2676656.084048]  mlx5e_poll_rx_cq+0x79/0x830
[2676656.084050]  mlx5e_napi_poll+0x8c/0x3d0

Any help greatly appreciated.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-10-26  2:07               ` Alexei Starovoitov
@ 2017-10-26  5:37                 ` Yuchung Cheng
  2017-10-27 20:38                   ` Eric Dumazet
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-10-26  5:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Oleksandr Natalenko, Roman Gushchin, netdev, Eric Dumazet,
	Neal Cardwell, David S. Miller, Lawrence Brakmo

On Wed, Oct 25, 2017 at 7:07 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Sep 28, 2017 at 04:36:58PM -0700, Yuchung Cheng wrote:
> > On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
> > <oleksandr@natalenko.name> wrote:
> > > Hi.
> > >
> > > Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> > > intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> > > warning in tcp_fastretrans_alert() for me.
> >
> > Hi Oleksandr: no retrans_collapse should not matter for that warning
> > in tcp_fstretrans_alert(). the warning as I explained earlier is
> > likely false. Neal and I are more concerned the panic in
> > tcp_sacktag_walk. This is just a blind shot but thx for retrying.
> >
> > We can submit a one-liner to remove the fast retrans warning but want
> > to nail the bigger issue first.
>
> we're still seeing the warnings followed by crashes and it's very concerning.
> We hoped that most recent Neal's patches from Sep 18 around this area may
> magically fix the issue, but no. The panics are still there.
> It's confirmed that net.ipv4.tcp_retrans_collapse=0 does not help
> whereas net.ipv4.tcp_recovery=0 works, but obviously undesirable.
> We're out of ideas on how to debug this.
Can you try Eric's latest SACK rb-tree patches?
https://patchwork.ozlabs.org/cover/822218/

Roman's SNMP data suggests MTU probing is enabled. Another blind shot
is to disable it.
>
> Warning + panic:
> [ 7678.309043] Call Trace:
> [ 7678.313925]  <IRQ>
> [ 7678.317955]  dump_stack+0x4d/0x70
> [ 7678.324573]  __warn+0xd3/0xf0
> [ 7678.330497]  warn_slowpath_null+0x1e/0x20
> [ 7678.338510]  tcp_fastretrans_alert+0xacf/0xbd0
> [ 7678.347380]  tcp_ack+0xbce/0x1390
> [ 7678.354006]  tcp_rcv_established+0x1ce/0x740
> [ 7678.362536]  tcp_v6_do_rcv+0x15b/0x400
> [ 7678.370025]  tcp_v6_rcv+0x94c/0x9f0
> [ 7678.376989]  ? tcp_rcv_established+0x42b/0x740
> [ 7678.385866]  ip6_input_finish+0xea/0x430
> [ 7678.393702]  ip6_input+0x32/0xa0
> [ 7678.400155]  ? ip6_rcv_finish+0xa0/0xa0
> [ 7678.407834]  ip6_rcv_finish+0x4b/0xa0
> [ 7678.415164]  ipv6_rcv+0x2ec/0x4f0
> [ 7678.421778]  ? ip6_make_skb+0x1c0/0x1c0
> [ 7678.429447]  __netif_receive_skb_core+0x2d5/0x9a0
> [ 7678.438845]  ? tcp6_gro_receive+0x11d/0x1c0
> [ 7678.447211]  ? ipv6_gro_receive+0x1d0/0x380
> [ 7678.455566]  __netif_receive_skb+0x16/0x70
> [ 7678.463748]  netif_receive_skb_internal+0x32/0x3b0
> [ 7678.473318]  ? dev_gro_receive+0x27c/0x470
> [ 7678.481502]  napi_gro_receive+0x113/0x1d0
> [ 7678.489510]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [ 7678.499077]  mlx5e_poll_rx_cq+0x79/0x830
> [ 7678.506911]  mlx5e_napi_poll+0x8c/0x3d0
> [ 7678.514579]  ? mlx5_cq_completion+0x54/0xb0
> [ 7678.522950]  net_rx_action+0x22e/0x380
> [ 7678.530450]  __do_softirq+0x106/0x2e8
> [ 7678.537768]  irq_exit+0xb0/0xc0
> [ 7678.544042]  do_IRQ+0x4f/0xd0
> [ 7678.549970]  common_interrupt+0x86/0x86
> ...
> [ 7685.690697] BUG: unable to handle kernel
> [ 7685.698737] NULL pointer dereference
> [ 7685.705889]  at 0000000000000035
> [ 7685.712357] IP: tcp_sacktag_walk+0xfd/0x460
> [ 7685.720726] PGD 1002d75067
> [ 7685.726320] PUD 102c506067
> [ 7685.731902] PMD 0
> ...
> [ 7686.070541] task: ffff880ff5959b00 task.stack: ffffc90024708000
> [ 7686.082363] RIP: 0010:tcp_sacktag_walk+0xfd/0x460
> [ 7686.091754] RSP: 0000:ffff88203eec38b0 EFLAGS: 00010217
> [ 7686.102188] RAX: ffff881953620a00 RBX: ffff881009273200 RCX: 00000000fbfa50ec
> [ 7686.116439] RDX: 00000000fbfa54a0 RSI: 0000000000000000 RDI: ffff881009273200
> [ 7686.130688] RBP: ffff88203eec3908 R08: 0000000000000048 R09: 00000000fbfa55fb
> [ 7686.144936] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881ec9ff98c0
> [ 7686.159184] R13: ffff88203eec3a10 R14: 0000000000000000 R15: ffff881ec9ff9a18
> [ 7686.173431] FS:  00007f2b235ff700(0000) GS:ffff88203eec0000(0000) knlGS:0000000000000000
> [ 7686.189589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7686.201062] CR2: 0000000000000035 CR3: 0000000fd8229000 CR4: 00000000003406e0
> [ 7686.215310] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7686.229560] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7686.243807] Call Trace:
> [ 7686.248696]  <IRQ>
> [ 7686.252727]  tcp_sacktag_write_queue+0x5ff/0x9e0
> [ 7686.261943]  tcp_ack+0x677/0x1390
> [ 7686.268569]  ? select_task_rq_fair+0x9e3/0xf70
> [ 7686.277437]  ? tcp_write_xmit+0x230/0x10a0
> [ 7686.285619]  tcp_rcv_established+0x1ce/0x740
> [ 7686.294151]  tcp_v6_do_rcv+0x15b/0x400
> [ 7686.301636]  tcp_v6_rcv+0x94c/0x9f0
> [ 7686.308606]  ? default_wake_function+0x12/0x20
> [ 7686.317483]  ? __wake_up_common+0x53/0x90
> [ 7686.325492]  ip6_input_finish+0xea/0x430
> [ 7686.333329]  ip6_input+0x32/0xa0
> [ 7686.339791]  ? ip6_rcv_finish+0xa0/0xa0
> [ 7686.347451]  ip6_rcv_finish+0x4b/0xa0
> [ 7686.354767]  ipv6_rcv+0x2ec/0x4f0
> [ 7686.361389]  ? ip6_make_skb+0x1c0/0x1c0
> [ 7686.369050]  __netif_receive_skb_core+0x2d5/0x9a0
> [ 7686.378449]  ? tcp6_gro_receive+0x11d/0x1c0
> [ 7686.386797]  ? ipv6_gro_receive+0x1d0/0x380
> [ 7686.395160]  __netif_receive_skb+0x16/0x70
> [ 7686.403342]  netif_receive_skb_internal+0x32/0x3b0
> [ 7686.412931]  ? dev_gro_receive+0x27c/0x470
> [ 7686.421110]  napi_gro_receive+0x113/0x1d0
> [ 7686.429136]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [ 7686.438722]  mlx5e_poll_rx_cq+0x79/0x830
> [ 7686.446572]  mlx5e_napi_poll+0x8c/0x3d0
> [ 7686.454233]  ? mlx5_cq_completion+0x54/0xb0
> [ 7686.462591]  net_rx_action+0x22e/0x380
> [ 7686.470099]  __do_softirq+0x106/0x2e8
> [ 7686.477431]  irq_exit+0xb0/0xc0
> [ 7686.483700]  do_IRQ+0x4f/0xd0
> [ 7686.489629]  common_interrupt+0x86/0x86
>
> Sometimes soft lockup is also seen:
>
> [2676627.345885] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> [2676627.717771] RIP: 0010:queued_spin_lock_slowpath+0x15c/0x180
> [2676627.729239] RSP: 0018:ffff88103f603e78 EFLAGS: 00000202
> [2676627.740019]  ORIG_RAX: ffffffffffffff10
> [2676627.748027] RAX: 0000000000000101 RBX: ffff8811181d8840 RCX: 0000000000000101
> [2676627.762622] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff8811181d88c8
> [2676627.777217] RBP: ffff88103f603e88 R08: 0000000000000001 R09: ffff88103f95a800
> [2676627.791811] R10: 0000000000037a13 R11: 000000001a37bb20 R12: 0000000000000100
> [2676627.806405] R13: ffff8811181d8c18 R14: ffffffff81857270 R15: ffff8811181d8840
> [2676627.821000] FS:  0000000000000000(0000) GS:ffff88103f600000(0000) knlGS:0000000000000000
> [2676627.837500] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2676627.849318] CR2: 00007f6e44c1f000 CR3: 0000000001e09000 CR4: 00000000003406f0
> [2676627.863916] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2676627.878508] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2676627.893103] Call Trace:
> [2676627.898338]  <IRQ>
> [2676627.902711]  ? _raw_spin_lock+0x21/0x30
> [2676627.910715]  tcp_write_timer+0x19/0x70
> [2676627.918548]  call_timer_fn+0x35/0x150
> [2676627.926214]  run_timer_softirq+0x1be/0x3e0
> [2676627.934742]  ? timerqueue_add+0x59/0xb0
> [2676627.942745]  ? ktime_get+0x3c/0xb0
> [2676627.949889]  __do_softirq+0x106/0x2e8
> [2676627.957551]  irq_exit+0xb0/0xc0
> [2676627.964174]  smp_apic_timer_interrupt+0x68/0x90
> [2676627.973563]  apic_timer_interrupt+0x86/0x90
> [2676628.084861] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 21s! [swapper/11:0]
> [2676628.084896] RIP: 0010:tcp_match_skb_to_sack+0x2d/0xb0
> [2676628.084897] RSP: 0018:ffff88103f8c38a8 EFLAGS: 00000297
> [2676628.084897]  ORIG_RAX: ffffffffffffff10
> [2676628.084898] RAX: 0000000000000000 RBX: ffff88010f9dbb18 RCX: 000000002d5da350
> [2676628.084899] RDX: 000000002d5da141 RSI: ffff88010f9dbb18 RDI: ffff8811181d8840
> [2676628.084899] RBP: ffff88103f8c3908 R08: 00000000ffffffff R09: 00000000d2a25ebe
> [2676628.084900] R10: 000000000000e323 R11: ffff8811181d8998 R12: ffff8811181d8840
> [2676628.084900] R13: ffff88103f8c3a10 R14: 0000000000000000 R15: ffff8811181d8998
> [2676628.084901] FS:  0000000000000000(0000) GS:ffff88103f8c0000(0000) knlGS:0000000000000000
> [2676628.084902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2676628.084903] CR2: 00007fbc07410000 CR3: 0000000001e09000 CR4: 00000000003406e0
> [2676628.084903] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2676628.084904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2676628.084904] Call Trace:
> [2676628.084905]  <IRQ>
> [2676628.084907]  ? tcp_sacktag_walk+0x270/0x460
> [2676628.084909]  tcp_sacktag_write_queue+0x5ff/0x9e0
> [2676628.084911]  tcp_ack+0x677/0x1390
> [2676628.084913]  ? tcp_write_xmit+0x230/0x10a0
> [2676628.084915]  tcp_rcv_established+0x1ce/0x740
> [2676628.084919]  tcp_v6_do_rcv+0x15b/0x400
> [2676628.084921]  tcp_v6_rcv+0x94c/0x9f0
> [2676628.084924]  ? __dev_queue_xmit+0x251/0x650
> [2676628.084928]  ip6_input_finish+0xea/0x430
> [2676628.084930]  ip6_input+0x32/0xa0
> [2676628.084932]  ? ip6_rcv_finish+0xa0/0xa0
> [2676628.084933]  ip6_rcv_finish+0x4b/0xa0
> [2676628.084935]  ipv6_rcv+0x2ec/0x4f0
> [2676628.084937]  ? ip6_make_skb+0x1c0/0x1c0
> [2676628.084938]  __netif_receive_skb_core+0x2d5/0x9a0
> [2676628.084941]  ? tcp6_gro_receive+0x11d/0x1c0
> [2676628.084943]  ? ipv6_gro_receive+0x1d0/0x380
> [2676628.084944]  __netif_receive_skb+0x16/0x70
> [2676628.084945]  netif_receive_skb_internal+0x32/0x3b0
> [2676628.084947]  ? dev_gro_receive+0x27c/0x470
> [2676628.084948]  napi_gro_receive+0x113/0x1d0
> [2676628.084952]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [2676628.084954]  mlx5e_poll_rx_cq+0x79/0x830
> [2676628.084956]  mlx5e_napi_poll+0x8c/0x3d0
> [2676628.084959]  ? mlx5_cq_completion+0x54/0xb0
> [2676628.084960]  net_rx_action+0x22e/0x380
> [2676628.084962]  __do_softirq+0x106/0x2e8
> [2676628.084965]  irq_exit+0xb0/0xc0
> [2676628.084967]  do_IRQ+0x4f/0xd0
> [2676656.083964] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [swapper/11:0]
> [2676656.083992] RIP: 0010:tcp_sacktag_walk+0x25f/0x460
> [2676656.083993] RSP: 0018:ffff88103f8c38b0 EFLAGS: 00000207
> [2676656.083993]  ORIG_RAX: ffffffffffffff10
> [2676656.083994] RAX: 0000000000021500 RBX: ffff8810d84c3600 RCX: 000000002d5d9e41
> [2676656.083994] RDX: 00000000f15de229 RSI: ffff8810d84c0200 RDI: ffff8811181d8840
> [2676656.083995] RBP: ffff88103f8c3908 R08: 000000000000ff21 R09: 0000000000000000
> [2676656.083996] R10: 000000000000e323 R11: ffff8811181d8998 R12: ffff8811181d8840
> [2676656.083996] R13: ffff88103f8c3a10 R14: 0000000000000000 R15: ffff8811181d8998
> [2676656.083997] FS:  0000000000000000(0000) GS:ffff88103f8c0000(0000) knlGS:0000000000000000
> [2676656.083998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2676656.083998] CR2: 00007fbc07410000 CR3: 0000000001e09000 CR4: 00000000003406e0
> [2676656.083999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2676656.083999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2676656.084000] Call Trace:
> [2676656.084001]  <IRQ>
> [2676656.084003]  tcp_sacktag_write_queue+0x5ff/0x9e0
> [2676656.084005]  tcp_ack+0x677/0x1390
> [2676656.084007]  ? tcp_write_xmit+0x230/0x10a0
> [2676656.084009]  tcp_rcv_established+0x1ce/0x740
> [2676656.084013]  tcp_v6_do_rcv+0x15b/0x400
> [2676656.084015]  tcp_v6_rcv+0x94c/0x9f0
> [2676656.084018]  ? __dev_queue_xmit+0x251/0x650
> [2676656.084022]  ip6_input_finish+0xea/0x430
> [2676656.084024]  ip6_input+0x32/0xa0
> [2676656.084026]  ? ip6_rcv_finish+0xa0/0xa0
> [2676656.084027]  ip6_rcv_finish+0x4b/0xa0
> [2676656.084029]  ipv6_rcv+0x2ec/0x4f0
> [2676656.084031]  ? ip6_make_skb+0x1c0/0x1c0
> [2676656.084032]  __netif_receive_skb_core+0x2d5/0x9a0
> [2676656.084035]  ? tcp6_gro_receive+0x11d/0x1c0
> [2676656.084037]  ? ipv6_gro_receive+0x1d0/0x380
> [2676656.084038]  __netif_receive_skb+0x16/0x70
> [2676656.084039]  netif_receive_skb_internal+0x32/0x3b0
> [2676656.084041]  ? dev_gro_receive+0x27c/0x470
> [2676656.084042]  napi_gro_receive+0x113/0x1d0
> [2676656.084046]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
> [2676656.084048]  mlx5e_poll_rx_cq+0x79/0x830
> [2676656.084050]  mlx5e_napi_poll+0x8c/0x3d0
>
> Any help greatly appreciated.
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-10-26  5:37                 ` Yuchung Cheng
@ 2017-10-27 20:38                   ` Eric Dumazet
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
  2017-11-06 22:27                     ` [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Yuchung Cheng
  0 siblings, 2 replies; 39+ messages in thread
From: Eric Dumazet @ 2017-10-27 20:38 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Alexei Starovoitov, Oleksandr Natalenko, Roman Gushchin, netdev,
	Neal Cardwell, David S. Miller, Lawrence Brakmo

On Wed, Oct 25, 2017 at 10:37 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Wed, Oct 25, 2017 at 7:07 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Thu, Sep 28, 2017 at 04:36:58PM -0700, Yuchung Cheng wrote:
>> > On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
>> > <oleksandr@natalenko.name> wrote:
>> > > Hi.
>> > >
>> > > Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
>> > > intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
>> > > warning in tcp_fastretrans_alert() for me.
>> >
>> > Hi Oleksandr: no retrans_collapse should not matter for that warning
>> > in tcp_fstretrans_alert(). the warning as I explained earlier is
>> > likely false. Neal and I are more concerned the panic in
>> > tcp_sacktag_walk. This is just a blind shot but thx for retrying.
>> >
>> > We can submit a one-liner to remove the fast retrans warning but want
>> > to nail the bigger issue first.
>>
>> we're still seeing the warnings followed by crashes and it's very concerning.
>> We hoped that most recent Neal's patches from Sep 18 around this area may
>> magically fix the issue, but no. The panics are still there.
>> It's confirmed that net.ipv4.tcp_retrans_collapse=0 does not help
>> whereas net.ipv4.tcp_recovery=0 works, but obviously undesirable.
>> We're out of ideas on how to debug this.
> Can you try Eric's latest SACK rb-tree patches?
> https://patchwork.ozlabs.org/cover/822218/
>
> Roman's SNMP data suggests MTU probing is enabled. Another blind shot
> is to disable it.


Or alternatively try this fix :

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1151870018e345592853b035a0902121c41e268d..6a849c7028f06f31b36a906be37995b28b579a40
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2062,6 +2062,8 @@ static int tcp_mtu_probe(struct sock *sk)
        nskb->ip_summed = skb->ip_summed;

        tcp_insert_write_queue_before(nskb, skb, sk);
+       if (skb == tp->highest_sack)
+               tp->highest_sack = nskb;

        len = 0;
        tcp_for_write_queue_from_safe(skb, next, sk) {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-27 20:38                   ` Eric Dumazet
@ 2017-10-31  6:08                     ` Eric Dumazet
  2017-10-31  6:17                       ` Alexei Starovoitov
                                         ` (3 more replies)
  2017-11-06 22:27                     ` [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Yuchung Cheng
  1 sibling, 4 replies; 39+ messages in thread
From: Eric Dumazet @ 2017-10-31  6:08 UTC (permalink / raw)
  To: David Miller
  Cc: Yuchung Cheng, Alexei Starovoitov, Oleksandr Natalenko,
	Roman Gushchin, netdev, Neal Cardwell, Lawrence Brakmo

From: Eric Dumazet <edumazet@google.com>

Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.

Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.

If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.

Bad things would then happen, including infinite loops.

This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.

Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.

Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 include/net/tcp.h     |    6 +++---
 net/ipv4/tcp_output.c |    3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 33599d17522d6a19b9d9a316cc1579cd5e71ee32..e6d0002a1b0bc5f28c331a760823c8dc92f8fe24 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1771,12 +1771,12 @@ static inline void tcp_highest_sack_reset(struct sock *sk)
 	tcp_sk(sk)->highest_sack = tcp_write_queue_head(sk);
 }
 
-/* Called when old skb is about to be deleted (to be combined with new skb) */
-static inline void tcp_highest_sack_combine(struct sock *sk,
+/* Called when old skb is about to be deleted and replaced by new skb */
+static inline void tcp_highest_sack_replace(struct sock *sk,
 					    struct sk_buff *old,
 					    struct sk_buff *new)
 {
-	if (tcp_sk(sk)->sacked_out && (old == tcp_sk(sk)->highest_sack))
+	if (old == tcp_highest_sack(sk))
 		tcp_sk(sk)->highest_sack = new;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ae60dd3faed0adc71731bc686f878afd4c628d32..823003eef3a21a5cc5c27e0be9f46159afa060df 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2062,6 +2062,7 @@ static int tcp_mtu_probe(struct sock *sk)
 	nskb->ip_summed = skb->ip_summed;
 
 	tcp_insert_write_queue_before(nskb, skb, sk);
+	tcp_highest_sack_replace(sk, skb, nskb);
 
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
@@ -2665,7 +2666,7 @@ static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 		else if (!skb_shift(skb, next_skb, next_skb_size))
 			return false;
 	}
-	tcp_highest_sack_combine(sk, next_skb, skb);
+	tcp_highest_sack_replace(sk, next_skb, skb);
 
 	tcp_unlink_write_queue(next_skb, sk);
 

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
@ 2017-10-31  6:17                       ` Alexei Starovoitov
  2017-10-31  6:21                         ` Eric Dumazet
  2017-11-01  5:50                         ` Yuchung Cheng
  2017-10-31 13:51                       ` Neal Cardwell
                                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2017-10-31  6:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Yuchung Cheng, Oleksandr Natalenko, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Thanks!

Acked-by: Alexei Starovoitov <ast@kernel.org>

wow. a bug from 2007.
Any idea why it only started to bite us in 4.11 ?

It's not trivial for us to reproduce it, but we will definitely
test the patch as soon as we can.
Do you have packet drill test or something for easy repro?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:17                       ` Alexei Starovoitov
@ 2017-10-31  6:21                         ` Eric Dumazet
  2017-10-31  6:30                           ` Alexei Starovoitov
  2017-11-01  5:50                         ` Yuchung Cheng
  1 sibling, 1 reply; 39+ messages in thread
From: Eric Dumazet @ 2017-10-31  6:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Yuchung Cheng, Oleksandr Natalenko, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

On Mon, 2017-10-30 at 23:17 -0700, Alexei Starovoitov wrote:
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> > 
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> > 
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> > 
> > Bad things would then happen, including infinite loops.
> > 
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> > 
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> > 
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Reported-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> 
> Thanks!
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> 
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
> 
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have packet drill test or something for easy repro?

I tried to cook a packetdrill test but could not trigger the issue.

When have you started to enable mtu probing ?

(Linux defaults to not enabling it )

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:21                         ` Eric Dumazet
@ 2017-10-31  6:30                           ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2017-10-31  6:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Yuchung Cheng, Oleksandr Natalenko, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

On Mon, Oct 30, 2017 at 11:21:42PM -0700, Eric Dumazet wrote:
> On Mon, 2017-10-30 at 23:17 -0700, Alexei Starovoitov wrote:
> > On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > Based on SNMP values provided by Roman, Yuchung made the observation
> > > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> > > 
> > > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > > in front of the write queue, we were not updating tcp highest sack.
> > > 
> > > If one skb is freed because all its content was copied to the new skb
> > > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> > > 
> > > Bad things would then happen, including infinite loops.
> > > 
> > > This patch renames tcp_highest_sack_combine() and uses it
> > > from tcp_mtu_probe() to fix the bug.
> > > 
> > > Note that I also removed one test against tp->sacked_out,
> > > since we want to replace tp->highest_sack regardless of whatever
> > > condition, since keeping a stale pointer to freed skb is a recipe
> > > for disaster.
> > > 
> > > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Reported-by: Roman Gushchin <guro@fb.com>
> > > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > 
> > Thanks!
> > 
> > Acked-by: Alexei Starovoitov <ast@kernel.org>
> > 
> > wow. a bug from 2007.
> > Any idea why it only started to bite us in 4.11 ?
> > 
> > It's not trivial for us to reproduce it, but we will definitely
> > test the patch as soon as we can.
> > Do you have packet drill test or something for easy repro?
> 
> I tried to cook a packetdrill test but could not trigger the issue.
> 
> When have you started to enable mtu probing ?

for some time. somehow 4.6 based kernel didn't trigger it.
May be it's a different bug still...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
  2017-10-31  6:17                       ` Alexei Starovoitov
@ 2017-10-31 13:51                       ` Neal Cardwell
  2017-11-01 12:20                       ` David Miller
  2017-11-03 18:22                       ` Oleksandr Natalenko
  3 siblings, 0 replies; 39+ messages in thread
From: Neal Cardwell @ 2017-10-31 13:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Yuchung Cheng, Alexei Starovoitov,
	Oleksandr Natalenko, Roman Gushchin, netdev, Lawrence Brakmo

On Tue, Oct 31, 2017 at 2:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
>
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
>
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
>
> Bad things would then happen, including infinite loops.
>
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
>
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
>
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  include/net/tcp.h     |    6 +++---
>  net/ipv4/tcp_output.c |    3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice catch, indeed. Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:17                       ` Alexei Starovoitov
  2017-10-31  6:21                         ` Eric Dumazet
@ 2017-11-01  5:50                         ` Yuchung Cheng
  1 sibling, 0 replies; 39+ messages in thread
From: Yuchung Cheng @ 2017-11-01  5:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, David Miller, Oleksandr Natalenko, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

On Mon, Oct 30, 2017 at 11:17 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> >
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> >
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> >
> > Bad things would then happen, including infinite loops.
> >
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> >
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> >
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Reported-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>
> Thanks!
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
>
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
FWIW some random guess:
Since RACK was confirmed to trigger the issue, and RACK enables
detecting lost retransmission w/o limited-transmit in CA_Loss state, I
guess RACK create a new type of "fast retransmit" that caused some
previously impossible SACK during MTU probing.

Acked-by: Yuchung Cheng <ycheng@google.com>


>
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have packet drill test or something for easy repro?
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
  2017-10-31  6:17                       ` Alexei Starovoitov
  2017-10-31 13:51                       ` Neal Cardwell
@ 2017-11-01 12:20                       ` David Miller
  2017-11-03 18:22                       ` Oleksandr Natalenko
  3 siblings, 0 replies; 39+ messages in thread
From: David Miller @ 2017-11-01 12:20 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ycheng, alexei.starovoitov, oleksandr, guro, netdev, ncardwell, brakmo

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 30 Oct 2017 23:08:20 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
                                         ` (2 preceding siblings ...)
  2017-11-01 12:20                       ` David Miller
@ 2017-11-03 18:22                       ` Oleksandr Natalenko
  2017-11-03 21:31                         ` Eric Dumazet
  3 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-11-03 18:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Yuchung Cheng, Alexei Starovoitov, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

Hi.

Thanks for the fix.

However, tcp_fastretrans_alert() warning case still remains open even with 
this patch. Do I understand correctly that these are 2 different issues?

Currently, I use latest 4.13 stable kernel + this patch and still get:

WARNING: CPU: 1 PID: 736 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert
+0x7c8/0x990

Any idea on this?

On úterý 31. října 2017 7:08:20 CET Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct
> access") Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  include/net/tcp.h     |    6 +++---
>  net/ipv4/tcp_output.c |    3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index
> 33599d17522d6a19b9d9a316cc1579cd5e71ee32..e6d0002a1b0bc5f28c331a760823c8dc9
> 2f8fe24 100644 --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1771,12 +1771,12 @@ static inline void tcp_highest_sack_reset(struct
> sock *sk) tcp_sk(sk)->highest_sack = tcp_write_queue_head(sk);
>  }
> 
> -/* Called when old skb is about to be deleted (to be combined with new skb)
> */ -static inline void tcp_highest_sack_combine(struct sock *sk,
> +/* Called when old skb is about to be deleted and replaced by new skb */
> +static inline void tcp_highest_sack_replace(struct sock *sk,
>  					    struct sk_buff *old,
>  					    struct sk_buff *new)
>  {
> -	if (tcp_sk(sk)->sacked_out && (old == tcp_sk(sk)->highest_sack))
> +	if (old == tcp_highest_sack(sk))
>  		tcp_sk(sk)->highest_sack = new;
>  }
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index
> ae60dd3faed0adc71731bc686f878afd4c628d32..823003eef3a21a5cc5c27e0be9f46159a
> fa060df 100644 --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2062,6 +2062,7 @@ static int tcp_mtu_probe(struct sock *sk)
>  	nskb->ip_summed = skb->ip_summed;
> 
>  	tcp_insert_write_queue_before(nskb, skb, sk);
> +	tcp_highest_sack_replace(sk, skb, nskb);
> 
>  	len = 0;
>  	tcp_for_write_queue_from_safe(skb, next, sk) {
> @@ -2665,7 +2666,7 @@ static bool tcp_collapse_retrans(struct sock *sk,
> struct sk_buff *skb) else if (!skb_shift(skb, next_skb, next_skb_size))
>  			return false;
>  	}
> -	tcp_highest_sack_combine(sk, next_skb, skb);
> +	tcp_highest_sack_replace(sk, next_skb, skb);
> 
>  	tcp_unlink_write_queue(next_skb, sk);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack
  2017-11-03 18:22                       ` Oleksandr Natalenko
@ 2017-11-03 21:31                         ` Eric Dumazet
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Dumazet @ 2017-11-03 21:31 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: David Miller, Yuchung Cheng, Alexei Starovoitov, Roman Gushchin,
	netdev, Neal Cardwell, Lawrence Brakmo

On Fri, 2017-11-03 at 19:22 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> Thanks for the fix.
> 
> However, tcp_fastretrans_alert() warning case still remains open even with 
> this patch. Do I understand correctly that these are 2 different issues?
> 
> Currently, I use latest 4.13 stable kernel + this patch and still get:
> 
> WARNING: CPU: 1 PID: 736 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert
> +0x7c8/


My patch only fixed the panics that you guys reported.

The warning issue in fastretrans is a separate problem,
we are still working on it, but at least the effects are not
catastrophic.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-10-27 20:38                   ` Eric Dumazet
  2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
@ 2017-11-06 22:27                     ` Yuchung Cheng
  2017-11-10 13:15                       ` Oleksandr Natalenko
  1 sibling, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-11-06 22:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Oleksandr Natalenko, Roman Gushchin, netdev,
	Neal Cardwell, David S. Miller, Lawrence Brakmo

On Fri, Oct 27, 2017 at 1:38 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Oct 25, 2017 at 10:37 PM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Wed, Oct 25, 2017 at 7:07 PM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Thu, Sep 28, 2017 at 04:36:58PM -0700, Yuchung Cheng wrote:
> >> > On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
> >> > <oleksandr@natalenko.name> wrote:
> >> > > Hi.
> >> > >
> >> > > Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> >> > > intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> >> > > warning in tcp_fastretrans_alert() for me.
> >> >
> >> > Hi Oleksandr: no retrans_collapse should not matter for that warning
> >> > in tcp_fstretrans_alert(). the warning as I explained earlier is
hi guys can you try if the warning goes away w/ this quick fix?


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0ada8bfc2ebd..072aab2a8226 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2626,7 +2626,7 @@ void tcp_simple_retransmit(struct sock *sk)

        tcp_clear_retrans_hints_partial(tp);

-       if (prior_lost == tp->lost_out)
+       if (!tp->lost_out)
                return;

        if (tcp_is_reno(tp))



> >> > likely false. Neal and I are more concerned the panic in
> >> > tcp_sacktag_walk. This is just a blind shot but thx for retrying.
> >> >
> >> > We can submit a one-liner to remove the fast retrans warning but want
> >> > to nail the bigger issue first.
> >>
> >> we're still seeing the warnings followed by crashes and it's very concerning.
> >> We hoped that most recent Neal's patches from Sep 18 around this area may
> >> magically fix the issue, but no. The panics are still there.
> >> It's confirmed that net.ipv4.tcp_retrans_collapse=0 does not help
> >> whereas net.ipv4.tcp_recovery=0 works, but obviously undesirable.
> >> We're out of ideas on how to debug this.
> > Can you try Eric's latest SACK rb-tree patches?
> > https://patchwork.ozlabs.org/cover/822218/
> >
> > Roman's SNMP data suggests MTU probing is enabled. Another blind shot
> > is to disable it.
>
>
> Or alternatively try this fix :
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 1151870018e345592853b035a0902121c41e268d..6a849c7028f06f31b36a906be37995b28b579a40
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2062,6 +2062,8 @@ static int tcp_mtu_probe(struct sock *sk)
>         nskb->ip_summed = skb->ip_summed;
>
>         tcp_insert_write_queue_before(nskb, skb, sk);
> +       if (skb == tp->highest_sack)
> +               tp->highest_sack = nskb;
>
>         len = 0;
>         tcp_for_write_queue_from_safe(skb, next, sk) {

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-11-06 22:27                     ` [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Yuchung Cheng
@ 2017-11-10 13:15                       ` Oleksandr Natalenko
  2017-11-10 13:40                         ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-11-10 13:15 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, Alexei Starovoitov, Roman Gushchin, netdev,
	Neal Cardwell, David S. Miller, Lawrence Brakmo

Hi.

I'm running the machine with this patch applied for 7 hours now, and the 
warning hasn't appeared yet. Typically, it should be there within the 
first hour.

I'll keep an eye on it for a longer time, but as of now it looks good.

Some explanation on this please?

Thanks!

06.11.2017 23:27, Yuchung Cheng wrote:
...snip...
> hi guys can you try if the warning goes away w/ this quick fix?
> 
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 0ada8bfc2ebd..072aab2a8226 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2626,7 +2626,7 @@ void tcp_simple_retransmit(struct sock *sk)
> 
>         tcp_clear_retrans_hints_partial(tp);
> 
> -       if (prior_lost == tp->lost_out)
> +       if (!tp->lost_out)
>                 return;
> 
>         if (tcp_is_reno(tp))
...snip...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-11-10 13:15                       ` Oleksandr Natalenko
@ 2017-11-10 13:40                         ` Oleksandr Natalenko
  0 siblings, 0 replies; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-11-10 13:40 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Eric Dumazet, Alexei Starovoitov, Roman Gushchin, netdev,
	Neal Cardwell, David S. Miller, Lawrence Brakmo

Uhh, sorry, just found the original submission [1].

[1] https://marc.info/?l=linux-netdev&m=151009763926816&w=2

10.11.2017 14:15, Oleksandr Natalenko wrote:
> Hi.
> 
> I'm running the machine with this patch applied for 7 hours now, and
> the warning hasn't appeared yet. Typically, it should be there within
> the first hour.
> 
> I'll keep an eye on it for a longer time, but as of now it looks good.
> 
> Some explanation on this please?
> 
> Thanks!
> 
> 06.11.2017 23:27, Yuchung Cheng wrote:
> ...snip...
>> hi guys can you try if the warning goes away w/ this quick fix?
>> 
>> 
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 0ada8bfc2ebd..072aab2a8226 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -2626,7 +2626,7 @@ void tcp_simple_retransmit(struct sock *sk)
>> 
>>         tcp_clear_retrans_hints_partial(tp);
>> 
>> -       if (prior_lost == tp->lost_out)
>> +       if (!tp->lost_out)
>>                 return;
>> 
>>         if (tcp_is_reno(tp))
> ...snip...

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-19 11:04                       ` Oleksandr Natalenko
@ 2017-09-19 18:16                         ` Yuchung Cheng
  0 siblings, 0 replies; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-19 18:16 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On Tue, Sep 19, 2017 at 4:04 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> 18.09.2017 23:40, Yuchung Cheng wrote:
>>
>> I assume this kernel does not have the patch that Neal proposed in his
>> first reply?
>
>
> Correct.
>
>> The main warning needs to be triggered by another peculiar SACK that
>> kicks the sender into recovery again (after undo). Please let it run
>> longer if possible to see if we can get both. But the new data does
>> indicate the we can (validly) be in CA_Open with retrans_out > 0.
>
>
> OK, here it is:
>
> ===
> » LC_TIME=C jctl -kb | grep RIP
> …
> Sep 19 12:54:03 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:54:22 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:54:25 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:56:00 defiant kernel: RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
> Sep 19 12:57:07 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:57:14 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 19 12:58:04 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> …
> ===
>
> Note timestamps — two types of warning are distant in time, so didn't happen
> at once.
>
> While still running this kernel, anything else I can check for you?
Thanks. Based on all the experiments you did I believe there's other
code path than my hypothesis that'd cause the warning:
1) Neal's proposed F-RTO fix didn't work
2) the main warning is not being triggered together with the newly-instrumented
warning in undo
3) Disabling RACK stopped the warning

We couldn't figure out exactly what. So we'll do a bit code auditing
first to find more suspects

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 21:40                     ` Yuchung Cheng
  2017-09-19 11:04                       ` Oleksandr Natalenko
@ 2017-09-19 16:05                       ` Oleksandr Natalenko
  1 sibling, 0 replies; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-19 16:05 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

And 2 more events:

===
$ dmesg --time-format iso | grep RIP
…
2017-09-19T16:52:21,623328+0200 RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
2017-09-19T16:52:40,455296+0200 RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
2017-09-19T16:52:41,047378+0200 RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
…
2017-09-19T16:54:59,930726+0200 RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
2017-09-19T16:55:07,985767+0200 RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
2017-09-19T16:55:41,911527+0200 RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
…
===

On pondělí 18. září 2017 23:40:08 CEST Yuchung Cheng wrote:
> On Mon, Sep 18, 2017 at 1:46 PM, Oleksandr Natalenko
> 
> <oleksandr@natalenko.name> wrote:
> > Actually, same warning was just triggered with RACK enabled. But main
> > warning was not triggered in this case.
> 
> Thanks.
> 
> I assume this kernel does not have the patch that Neal proposed in his
> first reply?
> 
> The main warning needs to be triggered by another peculiar SACK that
> kicks the sender into recovery again (after undo). Please let it run
> longer if possible to see if we can get both. But the new data does
> indicate the we can (validly) be in CA_Open with retrans_out > 0.
> 
> > ===
> > Sep 18 22:44:32 defiant kernel: ------------[ cut here ]------------
> > Sep 18 22:44:32 defiant kernel: WARNING: CPU: 1 PID: 702 at net/ipv4/
> > tcp_input.c:2392 tcp_undo_cwnd_reduction+0xbd/0xd0
> > Sep 18 22:44:32 defiant kernel: Modules linked in: netconsole ctr ccm
> > cls_bpf sch_htb act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel
> > 8021q mrp nf_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap
> > nft_set_hash nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4
> > nf_nat_masquerade_ipv4 nft_masq nft_nat nft_counter nft_meta
> > nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> > nf_conntrack libcrc32c crc32c_generic nf_tables_ipv4 nf_tables tun nct6775
> > nfnetlink hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4
> > snd_hda_codec_hdmi mbcache jbd2 snd_hda_codec_realtek
> > snd_hda_codec_generic f2fs arc4 fscrypto intel_rapl iTCO_wdt ath9k
> > iTCO_vendor_support intel_powerclamp ath9k_common ath9k_hw coretemp
> > kvm_intel ath mac80211 kvm irqbypass intel_cstate cfg80211 pcspkr
> > snd_hda_intel snd_hda_codec r8169
> > Sep 18 22:44:32 defiant kernel:  joydev evdev mii snd_hda_core mousedev
> > mei_txe input_leds i2c_i801 mac_hid i915 lpc_ich mei shpchp snd_hwdep
> > snd_intel_sst_acpi snd_intel_sst_core snd_soc_rt5670
> > snd_soc_sst_atom_hifi2_platform battery snd_soc_sst_match snd_soc_rl6231
> > drm_kms_helper hci_uart ov5693(C) ov2722(C) lm3554(C) btbcm btqca
> > v4l2_common snd_soc_core btintel snd_compress videodev snd_pcm_dmaengine
> > snd_pcm video bluetooth snd_timer drm media tpm_tis snd i2c_hid soundcore
> > tpm_tis_core rfkill_gpio ac97_bus soc_button_array ecdh_generic rfkill
> > crc16 tpm 8250_dw intel_gtt syscopyarea sysfillrect acpi_pad sysimgblt
> > intel_int0002_vgpio fb_sys_fops pinctrl_cherryview i2c_algo_bit button
> > sch_fq_codel tcp_bbr ifb ip_tables x_tables btrfs xor raid6_pq
> > algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj usbhid hid uas
> > Sep 18 22:44:32 defiant kernel:  usb_storage dm_crypt dm_mod dax raid10
> > md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel
> > ghash_clmulni_intel pcbc ahci aesni_intel xhci_pci libahci aes_x86_64
> > crypto_simd glue_helper xhci_hcd cryptd libata usbcore scsi_mod
> > usb_common serio sdhci_acpi sdhci led_class mmc_core
> > Sep 18 22:44:32 defiant kernel: CPU: 1 PID: 702 Comm: irq/123-enp3s0
> > Tainted: G        WC      4.13.0-pf4 #1
> > Sep 18 22:44:32 defiant kernel: Hardware name: To Be Filled By O.E.M. To
> > Be
> > Filled By O.E.M./J3710-ITX, BIOS P1.30 03/30/2016
> > Sep 18 22:44:32 defiant kernel: task: ffff88923a738000 task.stack:
> > ffff958001500000
> > Sep 18 22:44:32 defiant kernel: RIP:
> > 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> > Sep 18 22:44:32 defiant kernel: RSP: 0018:ffff88927fc83a48 EFLAGS:
> > 00010202
> > Sep 18 22:44:32 defiant kernel: RAX: 0000000000000001 RBX:
> > ffff8892412d9800
> > RCX: ffff88927fc83b0c
> > Sep 18 22:44:32 defiant kernel: RDX: 000000007fffffff RSI:
> > 0000000000000001
> > RDI: ffff8892412d9800
> > Sep 18 22:44:32 defiant kernel: RBP: ffff88927fc83a50 R08:
> > 0000000000000000
> > R09: 0000000018dfb063
> > Sep 18 22:44:32 defiant kernel: R10: 0000000018dfd223 R11:
> > 0000000018dfb063
> > R12: 0000000000005320
> > Sep 18 22:44:32 defiant kernel: R13: ffff88927fc83b10 R14:
> > 0000000000000001
> > R15: ffff88927fc83b0c
> > Sep 18 22:44:32 defiant kernel: FS:  0000000000000000(0000)
> > GS:ffff88927fc80000(0000) knlGS:0000000000000000
> > Sep 18 22:44:32 defiant kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> > Sep 18 22:44:32 defiant kernel: CR2: 00007f1cd1a43620 CR3:
> > 0000000114a09000
> > CR4: 00000000001006e0
> > Sep 18 22:44:32 defiant kernel: Call Trace:
> > Sep 18 22:44:32 defiant kernel:  <IRQ>
> > Sep 18 22:44:32 defiant kernel:  tcp_try_undo_loss+0xb3/0xf0
> > Sep 18 22:44:32 defiant kernel:  tcp_fastretrans_alert+0x746/0x990
> > Sep 18 22:44:32 defiant kernel:  tcp_ack+0x741/0x1110
> > Sep 18 22:44:32 defiant kernel:  tcp_rcv_established+0x325/0x770
> > Sep 18 22:44:32 defiant kernel:  ? sk_filter_trim_cap+0xd4/0x1a0
> > Sep 18 22:44:32 defiant kernel:  tcp_v4_do_rcv+0x90/0x1e0
> > Sep 18 22:44:32 defiant kernel:  tcp_v4_rcv+0x950/0xa10
> > Sep 18 22:44:32 defiant kernel:  ? nf_ct_deliver_cached_events+0xb8/0x110
> > [nf_conntrack]
> > Sep 18 22:44:32 defiant kernel:  ip_local_deliver_finish+0x68/0x210
> > Sep 18 22:44:32 defiant kernel:  ip_local_deliver+0xfa/0x110
> > Sep 18 22:44:32 defiant kernel:  ? ip_rcv_finish+0x410/0x410
> > Sep 18 22:44:32 defiant kernel:  ip_rcv_finish+0x120/0x410
> > Sep 18 22:44:32 defiant kernel:  ip_rcv+0x28e/0x3b0
> > Sep 18 22:44:32 defiant kernel:  ? inet_del_offload+0x40/0x40
> > Sep 18 22:44:32 defiant kernel:  __netif_receive_skb_core+0x39b/0xb00
> > Sep 18 22:44:32 defiant kernel:  ? netif_receive_skb_internal+0xa0/0x480
> > Sep 18 22:44:32 defiant kernel:  ? dev_gro_receive+0x2eb/0x4a0
> > Sep 18 22:44:32 defiant kernel:  __netif_receive_skb+0x18/0x60
> > Sep 18 22:44:32 defiant kernel:  netif_receive_skb_internal+0x98/0x480
> > Sep 18 22:44:32 defiant kernel:  netif_receive_skb+0x1c/0x80
> > Sep 18 22:44:32 defiant kernel:  ifb_ri_tasklet+0x109/0x26a [ifb]
> > Sep 18 22:44:32 defiant kernel:  tasklet_action+0x63/0x120
> > Sep 18 22:44:32 defiant kernel:  __do_softirq+0xdf/0x2e5
> > Sep 18 22:44:32 defiant kernel:  ? irq_finalize_oneshot.part.39+0xe0/0xe0
> > Sep 18 22:44:32 defiant kernel:  do_softirq_own_stack+0x1c/0x30
> > Sep 18 22:44:32 defiant kernel:  </IRQ>
> > Sep 18 22:44:32 defiant kernel:  do_softirq.part.17+0x4e/0x60
> > Sep 18 22:44:32 defiant kernel:  __local_bh_enable_ip+0x77/0x80
> > Sep 18 22:44:32 defiant kernel:  irq_forced_thread_fn+0x5c/0x70
> > Sep 18 22:44:32 defiant kernel:  irq_thread+0x131/0x1a0
> > Sep 18 22:44:32 defiant kernel:  ? wake_threads_waitq+0x30/0x30
> > Sep 18 22:44:32 defiant kernel:  kthread+0x126/0x140
> > Sep 18 22:44:32 defiant kernel:  ? irq_thread_check_affinity+0x90/0x90
> > Sep 18 22:44:32 defiant kernel:  ? kthread_create_on_node+0x70/0x70
> > Sep 18 22:44:32 defiant kernel:  ret_from_fork+0x25/0x30
> > Sep 18 22:44:32 defiant kernel: Code: 5d c3 80 60 35 fb 48 8b 00 48 39 c2
> > 74 85 48 3b 83 50 01 00 00 75 eb e9 77 ff ff ff 89 83 48 06 00 00 80 a3
> > 1e 06 00 00 fb eb b3 <0f> ff 5b 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00
> > 00 00 0f 1f Sep 18 22:44:32 defiant kernel: ---[ end trace
> > 1aea180efeedb474 ]--- ===
> > 
> > On pondělí 18. září 2017 20:01:42 CEST Yuchung Cheng wrote:
> >> On Mon, Sep 18, 2017 at 10:59 AM, Oleksandr Natalenko
> >> 
> >> <oleksandr@natalenko.name> wrote:
> >> > OK. Should I keep FACK disabled?
> >> 
> >> Yes since it is disabled in the upstream by default. Although you can
> >> experiment FACK enabled additionally.
> >> 
> >> Do we know the crash you first experienced is tied to this issue?
> >> 
> >> > On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
> >> >> Can you try this patch to verify my theory with tcp_recovery=0 and 1?
> >> >> thanks
> >> >> 
> >> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> >> >> index 5af2f04f8859..9253d9ee7d0e 100644
> >> >> --- a/net/ipv4/tcp_input.c
> >> >> +++ b/net/ipv4/tcp_input.c
> >> >> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
> >> >> *sk, bool unmark_loss)
> >> >> 
> >> >>         }
> >> >>         tp->snd_cwnd_stamp = tcp_time_stamp;
> >> >>         tp->undo_marker = 0;
> >> >> 
> >> >> +       WARN_ON(tp->retrans_out);
> >> >> 
> >> >>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 21:40                     ` Yuchung Cheng
@ 2017-09-19 11:04                       ` Oleksandr Natalenko
  2017-09-19 18:16                         ` Yuchung Cheng
  2017-09-19 16:05                       ` Oleksandr Natalenko
  1 sibling, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-19 11:04 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

Hi.

18.09.2017 23:40, Yuchung Cheng wrote:
> I assume this kernel does not have the patch that Neal proposed in his
> first reply?

Correct.

> The main warning needs to be triggered by another peculiar SACK that
> kicks the sender into recovery again (after undo). Please let it run
> longer if possible to see if we can get both. But the new data does
> indicate the we can (validly) be in CA_Open with retrans_out > 0.

OK, here it is:

===
» LC_TIME=C jctl -kb | grep RIP
…
Sep 19 12:54:03 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:54:22 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:54:25 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:56:00 defiant kernel: RIP: 
0010:tcp_fastretrans_alert+0x7c8/0x990
Sep 19 12:57:07 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:57:14 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 19 12:58:04 defiant kernel: RIP: 
0010:tcp_undo_cwnd_reduction+0xbd/0xd0
…
===

Note timestamps — two types of warning are distant in time, so didn't 
happen at once.

While still running this kernel, anything else I can check for you?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 20:46                   ` Oleksandr Natalenko
@ 2017-09-18 21:40                     ` Yuchung Cheng
  2017-09-19 11:04                       ` Oleksandr Natalenko
  2017-09-19 16:05                       ` Oleksandr Natalenko
  0 siblings, 2 replies; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-18 21:40 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On Mon, Sep 18, 2017 at 1:46 PM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Actually, same warning was just triggered with RACK enabled. But main warning
> was not triggered in this case.
Thanks.

I assume this kernel does not have the patch that Neal proposed in his
first reply?

The main warning needs to be triggered by another peculiar SACK that
kicks the sender into recovery again (after undo). Please let it run
longer if possible to see if we can get both. But the new data does
indicate the we can (validly) be in CA_Open with retrans_out > 0.

>
> ===
> Sep 18 22:44:32 defiant kernel: ------------[ cut here ]------------
> Sep 18 22:44:32 defiant kernel: WARNING: CPU: 1 PID: 702 at net/ipv4/
> tcp_input.c:2392 tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 18 22:44:32 defiant kernel: Modules linked in: netconsole ctr ccm cls_bpf
> sch_htb act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel 8021q mrp
> nf_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap nft_set_hash
> nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4
> nf_nat_masquerade_ipv4 nft_masq nft_nat nft_counter nft_meta
> nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack libcrc32c crc32c_generic nf_tables_ipv4 nf_tables tun nct6775
> nfnetlink hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4 snd_hda_codec_hdmi
> mbcache jbd2 snd_hda_codec_realtek snd_hda_codec_generic f2fs arc4 fscrypto
> intel_rapl iTCO_wdt ath9k iTCO_vendor_support intel_powerclamp ath9k_common
> ath9k_hw coretemp kvm_intel ath mac80211 kvm irqbypass intel_cstate cfg80211
> pcspkr snd_hda_intel snd_hda_codec r8169
> Sep 18 22:44:32 defiant kernel:  joydev evdev mii snd_hda_core mousedev
> mei_txe input_leds i2c_i801 mac_hid i915 lpc_ich mei shpchp snd_hwdep
> snd_intel_sst_acpi snd_intel_sst_core snd_soc_rt5670
> snd_soc_sst_atom_hifi2_platform battery snd_soc_sst_match snd_soc_rl6231
> drm_kms_helper hci_uart ov5693(C) ov2722(C) lm3554(C) btbcm btqca v4l2_common
> snd_soc_core btintel snd_compress videodev snd_pcm_dmaengine snd_pcm video
> bluetooth snd_timer drm media tpm_tis snd i2c_hid soundcore tpm_tis_core
> rfkill_gpio ac97_bus soc_button_array ecdh_generic rfkill crc16 tpm 8250_dw
> intel_gtt syscopyarea sysfillrect acpi_pad sysimgblt intel_int0002_vgpio
> fb_sys_fops pinctrl_cherryview i2c_algo_bit button sch_fq_codel tcp_bbr ifb
> ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg hid_logitech_hidpp
> hid_logitech_dj usbhid hid uas
> Sep 18 22:44:32 defiant kernel:  usb_storage dm_crypt dm_mod dax raid10 md_mod
> sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc
> ahci aesni_intel xhci_pci libahci aes_x86_64 crypto_simd glue_helper xhci_hcd
> cryptd libata usbcore scsi_mod usb_common serio sdhci_acpi sdhci led_class
> mmc_core
> Sep 18 22:44:32 defiant kernel: CPU: 1 PID: 702 Comm: irq/123-enp3s0 Tainted:
> G        WC      4.13.0-pf4 #1
> Sep 18 22:44:32 defiant kernel: Hardware name: To Be Filled By O.E.M. To Be
> Filled By O.E.M./J3710-ITX, BIOS P1.30 03/30/2016
> Sep 18 22:44:32 defiant kernel: task: ffff88923a738000 task.stack:
> ffff958001500000
> Sep 18 22:44:32 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
> Sep 18 22:44:32 defiant kernel: RSP: 0018:ffff88927fc83a48 EFLAGS: 00010202
> Sep 18 22:44:32 defiant kernel: RAX: 0000000000000001 RBX: ffff8892412d9800
> RCX: ffff88927fc83b0c
> Sep 18 22:44:32 defiant kernel: RDX: 000000007fffffff RSI: 0000000000000001
> RDI: ffff8892412d9800
> Sep 18 22:44:32 defiant kernel: RBP: ffff88927fc83a50 R08: 0000000000000000
> R09: 0000000018dfb063
> Sep 18 22:44:32 defiant kernel: R10: 0000000018dfd223 R11: 0000000018dfb063
> R12: 0000000000005320
> Sep 18 22:44:32 defiant kernel: R13: ffff88927fc83b10 R14: 0000000000000001
> R15: ffff88927fc83b0c
> Sep 18 22:44:32 defiant kernel: FS:  0000000000000000(0000)
> GS:ffff88927fc80000(0000) knlGS:0000000000000000
> Sep 18 22:44:32 defiant kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Sep 18 22:44:32 defiant kernel: CR2: 00007f1cd1a43620 CR3: 0000000114a09000
> CR4: 00000000001006e0
> Sep 18 22:44:32 defiant kernel: Call Trace:
> Sep 18 22:44:32 defiant kernel:  <IRQ>
> Sep 18 22:44:32 defiant kernel:  tcp_try_undo_loss+0xb3/0xf0
> Sep 18 22:44:32 defiant kernel:  tcp_fastretrans_alert+0x746/0x990
> Sep 18 22:44:32 defiant kernel:  tcp_ack+0x741/0x1110
> Sep 18 22:44:32 defiant kernel:  tcp_rcv_established+0x325/0x770
> Sep 18 22:44:32 defiant kernel:  ? sk_filter_trim_cap+0xd4/0x1a0
> Sep 18 22:44:32 defiant kernel:  tcp_v4_do_rcv+0x90/0x1e0
> Sep 18 22:44:32 defiant kernel:  tcp_v4_rcv+0x950/0xa10
> Sep 18 22:44:32 defiant kernel:  ? nf_ct_deliver_cached_events+0xb8/0x110
> [nf_conntrack]
> Sep 18 22:44:32 defiant kernel:  ip_local_deliver_finish+0x68/0x210
> Sep 18 22:44:32 defiant kernel:  ip_local_deliver+0xfa/0x110
> Sep 18 22:44:32 defiant kernel:  ? ip_rcv_finish+0x410/0x410
> Sep 18 22:44:32 defiant kernel:  ip_rcv_finish+0x120/0x410
> Sep 18 22:44:32 defiant kernel:  ip_rcv+0x28e/0x3b0
> Sep 18 22:44:32 defiant kernel:  ? inet_del_offload+0x40/0x40
> Sep 18 22:44:32 defiant kernel:  __netif_receive_skb_core+0x39b/0xb00
> Sep 18 22:44:32 defiant kernel:  ? netif_receive_skb_internal+0xa0/0x480
> Sep 18 22:44:32 defiant kernel:  ? dev_gro_receive+0x2eb/0x4a0
> Sep 18 22:44:32 defiant kernel:  __netif_receive_skb+0x18/0x60
> Sep 18 22:44:32 defiant kernel:  netif_receive_skb_internal+0x98/0x480
> Sep 18 22:44:32 defiant kernel:  netif_receive_skb+0x1c/0x80
> Sep 18 22:44:32 defiant kernel:  ifb_ri_tasklet+0x109/0x26a [ifb]
> Sep 18 22:44:32 defiant kernel:  tasklet_action+0x63/0x120
> Sep 18 22:44:32 defiant kernel:  __do_softirq+0xdf/0x2e5
> Sep 18 22:44:32 defiant kernel:  ? irq_finalize_oneshot.part.39+0xe0/0xe0
> Sep 18 22:44:32 defiant kernel:  do_softirq_own_stack+0x1c/0x30
> Sep 18 22:44:32 defiant kernel:  </IRQ>
> Sep 18 22:44:32 defiant kernel:  do_softirq.part.17+0x4e/0x60
> Sep 18 22:44:32 defiant kernel:  __local_bh_enable_ip+0x77/0x80
> Sep 18 22:44:32 defiant kernel:  irq_forced_thread_fn+0x5c/0x70
> Sep 18 22:44:32 defiant kernel:  irq_thread+0x131/0x1a0
> Sep 18 22:44:32 defiant kernel:  ? wake_threads_waitq+0x30/0x30
> Sep 18 22:44:32 defiant kernel:  kthread+0x126/0x140
> Sep 18 22:44:32 defiant kernel:  ? irq_thread_check_affinity+0x90/0x90
> Sep 18 22:44:32 defiant kernel:  ? kthread_create_on_node+0x70/0x70
> Sep 18 22:44:32 defiant kernel:  ret_from_fork+0x25/0x30
> Sep 18 22:44:32 defiant kernel: Code: 5d c3 80 60 35 fb 48 8b 00 48 39 c2 74
> 85 48 3b 83 50 01 00 00 75 eb e9 77 ff ff ff 89 83 48 06 00 00 80 a3 1e 06 00
> 00 fb eb b3 <0f> ff 5b 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> Sep 18 22:44:32 defiant kernel: ---[ end trace 1aea180efeedb474 ]---
> ===
>
> On pondělí 18. září 2017 20:01:42 CEST Yuchung Cheng wrote:
>> On Mon, Sep 18, 2017 at 10:59 AM, Oleksandr Natalenko
>>
>> <oleksandr@natalenko.name> wrote:
>> > OK. Should I keep FACK disabled?
>>
>> Yes since it is disabled in the upstream by default. Although you can
>> experiment FACK enabled additionally.
>>
>> Do we know the crash you first experienced is tied to this issue?
>>
>> > On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
>> >> Can you try this patch to verify my theory with tcp_recovery=0 and 1?
>> >> thanks
>> >>
>> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> >> index 5af2f04f8859..9253d9ee7d0e 100644
>> >> --- a/net/ipv4/tcp_input.c
>> >> +++ b/net/ipv4/tcp_input.c
>> >> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
>> >> *sk, bool unmark_loss)
>> >>
>> >>         }
>> >>         tp->snd_cwnd_stamp = tcp_time_stamp;
>> >>         tp->undo_marker = 0;
>> >>
>> >> +       WARN_ON(tp->retrans_out);
>> >>
>> >>  }
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 18:01                 ` Yuchung Cheng
  2017-09-18 18:04                   ` Oleksandr Natalenko
  2017-09-18 20:41                   ` Oleksandr Natalenko
@ 2017-09-18 20:46                   ` Oleksandr Natalenko
  2017-09-18 21:40                     ` Yuchung Cheng
  2 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-18 20:46 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

Actually, same warning was just triggered with RACK enabled. But main warning 
was not triggered in this case.

===
Sep 18 22:44:32 defiant kernel: ------------[ cut here ]------------
Sep 18 22:44:32 defiant kernel: WARNING: CPU: 1 PID: 702 at net/ipv4/
tcp_input.c:2392 tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 18 22:44:32 defiant kernel: Modules linked in: netconsole ctr ccm cls_bpf 
sch_htb act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel 8021q mrp 
nf_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap nft_set_hash 
nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4 
nf_nat_masquerade_ipv4 nft_masq nft_nat nft_counter nft_meta 
nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack libcrc32c crc32c_generic nf_tables_ipv4 nf_tables tun nct6775 
nfnetlink hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4 snd_hda_codec_hdmi 
mbcache jbd2 snd_hda_codec_realtek snd_hda_codec_generic f2fs arc4 fscrypto 
intel_rapl iTCO_wdt ath9k iTCO_vendor_support intel_powerclamp ath9k_common 
ath9k_hw coretemp kvm_intel ath mac80211 kvm irqbypass intel_cstate cfg80211 
pcspkr snd_hda_intel snd_hda_codec r8169
Sep 18 22:44:32 defiant kernel:  joydev evdev mii snd_hda_core mousedev 
mei_txe input_leds i2c_i801 mac_hid i915 lpc_ich mei shpchp snd_hwdep 
snd_intel_sst_acpi snd_intel_sst_core snd_soc_rt5670 
snd_soc_sst_atom_hifi2_platform battery snd_soc_sst_match snd_soc_rl6231 
drm_kms_helper hci_uart ov5693(C) ov2722(C) lm3554(C) btbcm btqca v4l2_common 
snd_soc_core btintel snd_compress videodev snd_pcm_dmaengine snd_pcm video 
bluetooth snd_timer drm media tpm_tis snd i2c_hid soundcore tpm_tis_core 
rfkill_gpio ac97_bus soc_button_array ecdh_generic rfkill crc16 tpm 8250_dw 
intel_gtt syscopyarea sysfillrect acpi_pad sysimgblt intel_int0002_vgpio 
fb_sys_fops pinctrl_cherryview i2c_algo_bit button sch_fq_codel tcp_bbr ifb 
ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg hid_logitech_hidpp 
hid_logitech_dj usbhid hid uas
Sep 18 22:44:32 defiant kernel:  usb_storage dm_crypt dm_mod dax raid10 md_mod 
sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
ahci aesni_intel xhci_pci libahci aes_x86_64 crypto_simd glue_helper xhci_hcd 
cryptd libata usbcore scsi_mod usb_common serio sdhci_acpi sdhci led_class 
mmc_core
Sep 18 22:44:32 defiant kernel: CPU: 1 PID: 702 Comm: irq/123-enp3s0 Tainted: 
G        WC      4.13.0-pf4 #1
Sep 18 22:44:32 defiant kernel: Hardware name: To Be Filled By O.E.M. To Be 
Filled By O.E.M./J3710-ITX, BIOS P1.30 03/30/2016
Sep 18 22:44:32 defiant kernel: task: ffff88923a738000 task.stack: 
ffff958001500000
Sep 18 22:44:32 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 18 22:44:32 defiant kernel: RSP: 0018:ffff88927fc83a48 EFLAGS: 00010202
Sep 18 22:44:32 defiant kernel: RAX: 0000000000000001 RBX: ffff8892412d9800 
RCX: ffff88927fc83b0c
Sep 18 22:44:32 defiant kernel: RDX: 000000007fffffff RSI: 0000000000000001 
RDI: ffff8892412d9800
Sep 18 22:44:32 defiant kernel: RBP: ffff88927fc83a50 R08: 0000000000000000 
R09: 0000000018dfb063
Sep 18 22:44:32 defiant kernel: R10: 0000000018dfd223 R11: 0000000018dfb063 
R12: 0000000000005320
Sep 18 22:44:32 defiant kernel: R13: ffff88927fc83b10 R14: 0000000000000001 
R15: ffff88927fc83b0c
Sep 18 22:44:32 defiant kernel: FS:  0000000000000000(0000) 
GS:ffff88927fc80000(0000) knlGS:0000000000000000
Sep 18 22:44:32 defiant kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Sep 18 22:44:32 defiant kernel: CR2: 00007f1cd1a43620 CR3: 0000000114a09000 
CR4: 00000000001006e0
Sep 18 22:44:32 defiant kernel: Call Trace:
Sep 18 22:44:32 defiant kernel:  <IRQ>
Sep 18 22:44:32 defiant kernel:  tcp_try_undo_loss+0xb3/0xf0
Sep 18 22:44:32 defiant kernel:  tcp_fastretrans_alert+0x746/0x990
Sep 18 22:44:32 defiant kernel:  tcp_ack+0x741/0x1110
Sep 18 22:44:32 defiant kernel:  tcp_rcv_established+0x325/0x770
Sep 18 22:44:32 defiant kernel:  ? sk_filter_trim_cap+0xd4/0x1a0
Sep 18 22:44:32 defiant kernel:  tcp_v4_do_rcv+0x90/0x1e0
Sep 18 22:44:32 defiant kernel:  tcp_v4_rcv+0x950/0xa10
Sep 18 22:44:32 defiant kernel:  ? nf_ct_deliver_cached_events+0xb8/0x110 
[nf_conntrack]
Sep 18 22:44:32 defiant kernel:  ip_local_deliver_finish+0x68/0x210
Sep 18 22:44:32 defiant kernel:  ip_local_deliver+0xfa/0x110
Sep 18 22:44:32 defiant kernel:  ? ip_rcv_finish+0x410/0x410
Sep 18 22:44:32 defiant kernel:  ip_rcv_finish+0x120/0x410
Sep 18 22:44:32 defiant kernel:  ip_rcv+0x28e/0x3b0
Sep 18 22:44:32 defiant kernel:  ? inet_del_offload+0x40/0x40
Sep 18 22:44:32 defiant kernel:  __netif_receive_skb_core+0x39b/0xb00
Sep 18 22:44:32 defiant kernel:  ? netif_receive_skb_internal+0xa0/0x480
Sep 18 22:44:32 defiant kernel:  ? dev_gro_receive+0x2eb/0x4a0
Sep 18 22:44:32 defiant kernel:  __netif_receive_skb+0x18/0x60
Sep 18 22:44:32 defiant kernel:  netif_receive_skb_internal+0x98/0x480
Sep 18 22:44:32 defiant kernel:  netif_receive_skb+0x1c/0x80
Sep 18 22:44:32 defiant kernel:  ifb_ri_tasklet+0x109/0x26a [ifb]
Sep 18 22:44:32 defiant kernel:  tasklet_action+0x63/0x120
Sep 18 22:44:32 defiant kernel:  __do_softirq+0xdf/0x2e5
Sep 18 22:44:32 defiant kernel:  ? irq_finalize_oneshot.part.39+0xe0/0xe0
Sep 18 22:44:32 defiant kernel:  do_softirq_own_stack+0x1c/0x30
Sep 18 22:44:32 defiant kernel:  </IRQ>
Sep 18 22:44:32 defiant kernel:  do_softirq.part.17+0x4e/0x60
Sep 18 22:44:32 defiant kernel:  __local_bh_enable_ip+0x77/0x80
Sep 18 22:44:32 defiant kernel:  irq_forced_thread_fn+0x5c/0x70
Sep 18 22:44:32 defiant kernel:  irq_thread+0x131/0x1a0
Sep 18 22:44:32 defiant kernel:  ? wake_threads_waitq+0x30/0x30
Sep 18 22:44:32 defiant kernel:  kthread+0x126/0x140
Sep 18 22:44:32 defiant kernel:  ? irq_thread_check_affinity+0x90/0x90
Sep 18 22:44:32 defiant kernel:  ? kthread_create_on_node+0x70/0x70
Sep 18 22:44:32 defiant kernel:  ret_from_fork+0x25/0x30
Sep 18 22:44:32 defiant kernel: Code: 5d c3 80 60 35 fb 48 8b 00 48 39 c2 74 
85 48 3b 83 50 01 00 00 75 eb e9 77 ff ff ff 89 83 48 06 00 00 80 a3 1e 06 00 
00 fb eb b3 <0f> ff 5b 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 
Sep 18 22:44:32 defiant kernel: ---[ end trace 1aea180efeedb474 ]---
===

On pondělí 18. září 2017 20:01:42 CEST Yuchung Cheng wrote:
> On Mon, Sep 18, 2017 at 10:59 AM, Oleksandr Natalenko
> 
> <oleksandr@natalenko.name> wrote:
> > OK. Should I keep FACK disabled?
> 
> Yes since it is disabled in the upstream by default. Although you can
> experiment FACK enabled additionally.
> 
> Do we know the crash you first experienced is tied to this issue?
> 
> > On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
> >> Can you try this patch to verify my theory with tcp_recovery=0 and 1?
> >> thanks
> >> 
> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> >> index 5af2f04f8859..9253d9ee7d0e 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
> >> *sk, bool unmark_loss)
> >> 
> >>         }
> >>         tp->snd_cwnd_stamp = tcp_time_stamp;
> >>         tp->undo_marker = 0;
> >> 
> >> +       WARN_ON(tp->retrans_out);
> >> 
> >>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 18:01                 ` Yuchung Cheng
  2017-09-18 18:04                   ` Oleksandr Natalenko
@ 2017-09-18 20:41                   ` Oleksandr Natalenko
  2017-09-18 20:46                   ` Oleksandr Natalenko
  2 siblings, 0 replies; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-18 20:41 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

OK,

with:

net.ipv4.tcp_recovery = 0
net.ipv4.tcp_fack = 0

and your patch I got the following warning within 10 min uptime:

===
Sep 18 22:18:34 defiant kernel: ------------[ cut here ]------------
Sep 18 22:18:34 defiant kernel: WARNING: CPU: 0 PID: 702 at net/ipv4/
tcp_input.c:2392 tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 18 22:18:34 defiant kernel: Modules linked in: netconsole ctr ccm cls_bpf 
sch_htb act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel 8021q mrp 
nf_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap nft_set_hash 
nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4 
nf_nat_masquerade_ipv4 nft_masq nft_nat nft_counter nft_meta 
nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack libcrc32c crc32c_generic nf_tables_ipv4 nf_tables tun nct6775 
nfnetlink hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4 snd_hda_codec_hdmi 
mbcache jbd2 snd_hda_codec_realtek snd_hda_codec_generic f2fs arc4 fscrypto 
intel_rapl iTCO_wdt ath9k iTCO_vendor_support intel_powerclamp ath9k_common 
ath9k_hw coretemp kvm_intel ath mac80211 kvm irqbypass intel_cstate cfg80211 
pcspkr snd_hda_intel snd_hda_codec r8169
Sep 18 22:18:34 defiant kernel:  joydev evdev mii snd_hda_core mousedev 
mei_txe input_leds i2c_i801 mac_hid i915 lpc_ich mei shpchp snd_hwdep 
snd_intel_sst_acpi snd_intel_sst_core snd_soc_rt5670 
snd_soc_sst_atom_hifi2_platform battery snd_soc_sst_match snd_soc_rl6231 
drm_kms_helper hci_uart ov5693(C) ov2722(C) lm3554(C) btbcm btqca v4l2_common 
snd_soc_core btintel snd_compress videodev snd_pcm_dmaengine snd_pcm video 
bluetooth snd_timer drm media tpm_tis snd i2c_hid soundcore tpm_tis_core 
rfkill_gpio ac97_bus soc_button_array ecdh_generic rfkill crc16 tpm 8250_dw 
intel_gtt syscopyarea sysfillrect acpi_pad sysimgblt intel_int0002_vgpio 
fb_sys_fops pinctrl_cherryview i2c_algo_bit button sch_fq_codel tcp_bbr ifb 
ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg hid_logitech_hidpp 
hid_logitech_dj usbhid hid uas
Sep 18 22:18:34 defiant kernel:  usb_storage dm_crypt dm_mod dax raid10 md_mod 
sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc 
ahci aesni_intel xhci_pci libahci aes_x86_64 crypto_simd glue_helper xhci_hcd 
cryptd libata usbcore scsi_mod usb_common serio sdhci_acpi sdhci led_class 
mmc_core
Sep 18 22:18:34 defiant kernel: CPU: 0 PID: 702 Comm: irq/123-enp3s0 Tainted: 
G         C      4.13.0-pf4 #1
Sep 18 22:18:34 defiant kernel: Hardware name: To Be Filled By O.E.M. To Be 
Filled By O.E.M./J3710-ITX, BIOS P1.30 03/30/2016
Sep 18 22:18:34 defiant kernel: task: ffff88923a738000 task.stack: 
ffff958001500000
Sep 18 22:18:34 defiant kernel: RIP: 0010:tcp_undo_cwnd_reduction+0xbd/0xd0
Sep 18 22:18:34 defiant kernel: RSP: 0018:ffff88927fc03a48 EFLAGS: 00010202
Sep 18 22:18:34 defiant kernel: RAX: 0000000000000001 RBX: ffff889241130800 
RCX: ffff88927fc03b0c
Sep 18 22:18:34 defiant kernel: RDX: 000000007fffffff RSI: 0000000000000001 
RDI: ffff889241130800
Sep 18 22:18:34 defiant kernel: RBP: ffff88927fc03a50 R08: 0000000000000000 
R09: 000000007b3e89be
Sep 18 22:18:34 defiant kernel: R10: 000000007b3ea2f6 R11: 000000007b3e89be 
R12: 0000000000005320
Sep 18 22:18:34 defiant kernel: R13: ffff88927fc03b10 R14: 0000000000000001 
R15: ffff88927fc03b0c
Sep 18 22:18:34 defiant kernel: FS:  0000000000000000(0000) 
GS:ffff88927fc00000(0000) knlGS:0000000000000000
Sep 18 22:18:34 defiant kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Sep 18 22:18:34 defiant kernel: CR2: 00007fe9c4106898 CR3: 0000000114a09000 
CR4: 00000000001006f0
Sep 18 22:18:34 defiant kernel: Call Trace:
Sep 18 22:18:34 defiant kernel:  <IRQ>
Sep 18 22:18:34 defiant kernel:  tcp_try_undo_loss+0xb3/0xf0
Sep 18 22:18:34 defiant kernel:  tcp_fastretrans_alert+0x746/0x990
Sep 18 22:18:34 defiant kernel:  tcp_ack+0x741/0x1110
Sep 18 22:18:34 defiant kernel:  tcp_rcv_established+0x325/0x770
Sep 18 22:18:34 defiant kernel:  ? sk_filter_trim_cap+0xd4/0x1a0
Sep 18 22:18:34 defiant kernel:  tcp_v4_do_rcv+0x90/0x1e0
Sep 18 22:18:34 defiant kernel:  tcp_v4_rcv+0x950/0xa10
Sep 18 22:18:34 defiant kernel:  ? nf_ct_deliver_cached_events+0xb8/0x110 
[nf_conntrack]
Sep 18 22:18:34 defiant kernel:  ip_local_deliver_finish+0x68/0x210
Sep 18 22:18:34 defiant kernel:  ip_local_deliver+0xfa/0x110
Sep 18 22:18:34 defiant kernel:  ? ip_rcv_finish+0x410/0x410
Sep 18 22:18:34 defiant kernel:  ip_rcv_finish+0x120/0x410
Sep 18 22:18:34 defiant kernel:  ip_rcv+0x28e/0x3b0
Sep 18 22:18:34 defiant kernel:  ? inet_del_offload+0x40/0x40
Sep 18 22:18:34 defiant kernel:  __netif_receive_skb_core+0x39b/0xb00
Sep 18 22:18:34 defiant kernel:  ? skb_release_all+0x24/0x30
Sep 18 22:18:34 defiant kernel:  ? consume_skb+0x38/0xa0
Sep 18 22:18:34 defiant kernel:  __netif_receive_skb+0x18/0x60
Sep 18 22:18:34 defiant kernel:  netif_receive_skb_internal+0x98/0x480
Sep 18 22:18:34 defiant kernel:  netif_receive_skb+0x1c/0x80
Sep 18 22:18:34 defiant kernel:  ifb_ri_tasklet+0x109/0x26a [ifb]
Sep 18 22:18:34 defiant kernel:  tasklet_action+0x63/0x120
Sep 18 22:18:34 defiant kernel:  __do_softirq+0xdf/0x2e5
Sep 18 22:18:34 defiant kernel:  ? irq_finalize_oneshot.part.39+0xe0/0xe0
Sep 18 22:18:34 defiant kernel:  do_softirq_own_stack+0x1c/0x30
Sep 18 22:18:34 defiant kernel:  </IRQ>
Sep 18 22:18:34 defiant kernel:  do_softirq.part.17+0x4e/0x60
Sep 18 22:18:34 defiant kernel:  __local_bh_enable_ip+0x77/0x80
Sep 18 22:18:34 defiant kernel:  irq_forced_thread_fn+0x5c/0x70
Sep 18 22:18:34 defiant kernel:  irq_thread+0x131/0x1a0
Sep 18 22:18:34 defiant kernel:  ? wake_threads_waitq+0x30/0x30
Sep 18 22:18:34 defiant kernel:  kthread+0x126/0x140
Sep 18 22:18:34 defiant kernel:  ? irq_thread_check_affinity+0x90/0x90
Sep 18 22:18:34 defiant kernel:  ? kthread_create_on_node+0x70/0x70
Sep 18 22:18:34 defiant kernel:  ret_from_fork+0x25/0x30
Sep 18 22:18:34 defiant kernel: Code: 5d c3 80 60 35 fb 48 8b 00 48 39 c2 74 
85 48 3b 83 50 01 00 00 75 eb e9 77 ff ff ff 89 83 48 06 00 00 80 a3 1e 06 00 
00 fb eb b3 <0f> ff 5b 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 
Sep 18 22:18:34 defiant kernel: ---[ end trace 1aea180efeedb473 ]---
===

Should I continue with net.ipv4.tcp_recovery = 1, or this is enough?

On pondělí 18. září 2017 20:01:42 CEST Yuchung Cheng wrote:
> On Mon, Sep 18, 2017 at 10:59 AM, Oleksandr Natalenko
> 
> <oleksandr@natalenko.name> wrote:
> > OK. Should I keep FACK disabled?
> 
> Yes since it is disabled in the upstream by default. Although you can
> experiment FACK enabled additionally.
> 
> Do we know the crash you first experienced is tied to this issue?
> 
> > On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
> >> Can you try this patch to verify my theory with tcp_recovery=0 and 1?
> >> thanks
> >> 
> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> >> index 5af2f04f8859..9253d9ee7d0e 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
> >> *sk, bool unmark_loss)
> >> 
> >>         }
> >>         tp->snd_cwnd_stamp = tcp_time_stamp;
> >>         tp->undo_marker = 0;
> >> 
> >> +       WARN_ON(tp->retrans_out);
> >> 
> >>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 18:01                 ` Yuchung Cheng
@ 2017-09-18 18:04                   ` Oleksandr Natalenko
  2017-09-18 20:41                   ` Oleksandr Natalenko
  2017-09-18 20:46                   ` Oleksandr Natalenko
  2 siblings, 0 replies; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-18 18:04 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On pondělí 18. září 2017 20:01:42 CEST Yuchung Cheng wrote:
> Yes since it is disabled in the upstream by default. Although you can
> experiment FACK enabled additionally.

OK.

> Do we know the crash you first experienced is tied to this issue?

No, unfortunately. I wasn't able to re-create it again, so lets focus on 
tcp_fastretrans_alert warning only.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 17:59               ` Oleksandr Natalenko
@ 2017-09-18 18:01                 ` Yuchung Cheng
  2017-09-18 18:04                   ` Oleksandr Natalenko
                                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-18 18:01 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On Mon, Sep 18, 2017 at 10:59 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> OK. Should I keep FACK disabled?
Yes since it is disabled in the upstream by default. Although you can
experiment FACK enabled additionally.

Do we know the crash you first experienced is tied to this issue?

>
> On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
>> Can you try this patch to verify my theory with tcp_recovery=0 and 1? thanks
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 5af2f04f8859..9253d9ee7d0e 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
>> *sk, bool unmark_loss)
>>         }
>>         tp->snd_cwnd_stamp = tcp_time_stamp;
>>         tp->undo_marker = 0;
>> +       WARN_ON(tp->retrans_out);
>>  }
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 17:51             ` Yuchung Cheng
@ 2017-09-18 17:59               ` Oleksandr Natalenko
  2017-09-18 18:01                 ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-18 17:59 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

OK. Should I keep FACK disabled?

On pondělí 18. září 2017 19:51:21 CEST Yuchung Cheng wrote:
> Can you try this patch to verify my theory with tcp_recovery=0 and 1? thanks
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 5af2f04f8859..9253d9ee7d0e 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
> *sk, bool unmark_loss)
>         }
>         tp->snd_cwnd_stamp = tcp_time_stamp;
>         tp->undo_marker = 0;
> +       WARN_ON(tp->retrans_out);
>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-18 17:18           ` Yuchung Cheng
@ 2017-09-18 17:51             ` Yuchung Cheng
  2017-09-18 17:59               ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-18 17:51 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On Mon, Sep 18, 2017 at 10:18 AM, Yuchung Cheng <ycheng@google.com> wrote:
> On Sun, Sep 17, 2017 at 11:43 AM, Oleksandr Natalenko
> <oleksandr@natalenko.name> wrote:
>> Hi.
>>
>> Just to note that it looks like disabling RACK and re-enabling FACK prevents
>> warning from happening:
>>
>> net.ipv4.tcp_fack = 1
>> net.ipv4.tcp_recovery = 0
>>
>> Hope I get semantics of these tunables right.
> Thanks.
>
> One difference between RACK and FACK is that RACK can detect lost
> retransmission in CA_Recovery (fast recovery) and CA_Loss  (post RTO)
> mode, while the current FACK can not. A previous FACK version can also
> detect lost retransmission in CA_recovery with limited-transmit. I
> suspect it is RACK's special ability that triggers this warning.
>
> IMO, however, this warning itself is questionably valid: with undo
> (TCP Eifel), the sender can detect and revert a false CA_Recovery /
> CA_Loss to CA_Open, with spurious retransmission in-flight
> (tp->retrans_out > 0). Then another SACK after undo triggers this
> warning. Neal and I are not sure if this is causing the panics you're
> seeing, but personally I'd argue this warning is false, or at least
> should be revised to skip undo case.
Can you try this patch to verify my theory with tcp_recovery=0 and 1? thanks

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5af2f04f8859..9253d9ee7d0e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2381,6 +2381,7 @@ static void tcp_undo_cwnd_reduction(struct sock
*sk, bool unmark_loss)
        }
        tp->snd_cwnd_stamp = tcp_time_stamp;
        tp->undo_marker = 0;
+       WARN_ON(tp->retrans_out);
 }




>
>
>>
>> On pátek 15. září 2017 21:04:36 CEST Oleksandr Natalenko wrote:
>>> Hello.
>>>
>>> With net.ipv4.tcp_fack set to 0 the warning still appears:
>>>
>>> ===
>>> » sysctl net.ipv4.tcp_fack
>>> net.ipv4.tcp_fack = 0
>>>
>>> » LC_TIME=C dmesg -T | grep WARNING
>>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
>>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
>>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>>> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
>>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>>> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
>>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>>>
>>> » ps -up 711
>>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>> root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-
>>> enp3s0]
>>> ===
>>>
>>> Any suggestions?
>>>
>>> On pátek 15. září 2017 16:03:00 CEST Neal Cardwell wrote:
>>> > Thanks for testing that. That is a very useful data point.
>>> >
>>> > I was able to cook up a packetdrill test that could put the connection
>>> > in CA_Disorder with retransmitted packets out, but not in CA_Open. So
>>> > we do not yet have a test case to reproduce this.
>>> >
>>> > We do not see this warning on our fleet at Google. One significant
>>> > difference I see between our environment and yours is that it seems
>>> >
>>> > you run with FACK enabled:
>>> >   net.ipv4.tcp_fack = 1
>>> >
>>> > Note that FACK was disabled by default (since it was replaced by RACK)
>>> > between kernel v4.10 and v4.11. And this is exactly the time when this
>>> > bug started manifesting itself for you and some others, but not our
>>> > fleet. So my new working hypothesis would be that this warning is due
>>> > to a behavior that only shows up in kernels >=4.11 when FACK is
>>> > enabled.
>>> >
>>> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
>>> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
>>> > reboot, and test the kernel for a few days to see if the warning still
>>> > pops up?
>>> >
>>> > thanks,
>>> > neal
>>> >
>>> > [ps: apologies for the previous, mis-formatted post...]
>>
>>

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-17 18:43         ` Oleksandr Natalenko
@ 2017-09-18 17:18           ` Yuchung Cheng
  2017-09-18 17:51             ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Yuchung Cheng @ 2017-09-18 17:18 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Neal Cardwell, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Netdev

On Sun, Sep 17, 2017 at 11:43 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> Just to note that it looks like disabling RACK and re-enabling FACK prevents
> warning from happening:
>
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_recovery = 0
>
> Hope I get semantics of these tunables right.
Thanks.

One difference between RACK and FACK is that RACK can detect lost
retransmission in CA_Recovery (fast recovery) and CA_Loss  (post RTO)
mode, while the current FACK can not. A previous FACK version can also
detect lost retransmission in CA_recovery with limited-transmit. I
suspect it is RACK's special ability that triggers this warning.

IMO, however, this warning itself is questionably valid: with undo
(TCP Eifel), the sender can detect and revert a false CA_Recovery /
CA_Loss to CA_Open, with spurious retransmission in-flight
(tp->retrans_out > 0). Then another SACK after undo triggers this
warning. Neal and I are not sure if this is causing the panics you're
seeing, but personally I'd argue this warning is false, or at least
should be revised to skip undo case.


>
> On pátek 15. září 2017 21:04:36 CEST Oleksandr Natalenko wrote:
>> Hello.
>>
>> With net.ipv4.tcp_fack set to 0 the warning still appears:
>>
>> ===
>> » sysctl net.ipv4.tcp_fack
>> net.ipv4.tcp_fack = 0
>>
>> » LC_TIME=C dmesg -T | grep WARNING
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
>> 2826 tcp_fastretrans_alert+0x7c8/0x990
>>
>> » ps -up 711
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-
>> enp3s0]
>> ===
>>
>> Any suggestions?
>>
>> On pátek 15. září 2017 16:03:00 CEST Neal Cardwell wrote:
>> > Thanks for testing that. That is a very useful data point.
>> >
>> > I was able to cook up a packetdrill test that could put the connection
>> > in CA_Disorder with retransmitted packets out, but not in CA_Open. So
>> > we do not yet have a test case to reproduce this.
>> >
>> > We do not see this warning on our fleet at Google. One significant
>> > difference I see between our environment and yours is that it seems
>> >
>> > you run with FACK enabled:
>> >   net.ipv4.tcp_fack = 1
>> >
>> > Note that FACK was disabled by default (since it was replaced by RACK)
>> > between kernel v4.10 and v4.11. And this is exactly the time when this
>> > bug started manifesting itself for you and some others, but not our
>> > fleet. So my new working hypothesis would be that this warning is due
>> > to a behavior that only shows up in kernels >=4.11 when FACK is
>> > enabled.
>> >
>> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
>> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
>> > reboot, and test the kernel for a few days to see if the warning still
>> > pops up?
>> >
>> > thanks,
>> > neal
>> >
>> > [ps: apologies for the previous, mis-formatted post...]
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-15 19:04       ` Oleksandr Natalenko
@ 2017-09-17 18:43         ` Oleksandr Natalenko
  2017-09-18 17:18           ` Yuchung Cheng
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-17 18:43 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Netdev,
	Yuchung Cheng

Hi.

Just to note that it looks like disabling RACK and re-enabling FACK prevents 
warning from happening:

net.ipv4.tcp_fack = 1
net.ipv4.tcp_recovery = 0

Hope I get semantics of these tunables right.

On pátek 15. září 2017 21:04:36 CEST Oleksandr Natalenko wrote:
> Hello.
> 
> With net.ipv4.tcp_fack set to 0 the warning still appears:
> 
> ===
> » sysctl net.ipv4.tcp_fack
> net.ipv4.tcp_fack = 0
> 
> » LC_TIME=C dmesg -T | grep WARNING
> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
> 2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
> 2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
> 2826 tcp_fastretrans_alert+0x7c8/0x990
> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
> 2826 tcp_fastretrans_alert+0x7c8/0x990
> 
> » ps -up 711
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-
> enp3s0]
> ===
> 
> Any suggestions?
> 
> On pátek 15. září 2017 16:03:00 CEST Neal Cardwell wrote:
> > Thanks for testing that. That is a very useful data point.
> > 
> > I was able to cook up a packetdrill test that could put the connection
> > in CA_Disorder with retransmitted packets out, but not in CA_Open. So
> > we do not yet have a test case to reproduce this.
> > 
> > We do not see this warning on our fleet at Google. One significant
> > difference I see between our environment and yours is that it seems
> > 
> > you run with FACK enabled:
> >   net.ipv4.tcp_fack = 1
> > 
> > Note that FACK was disabled by default (since it was replaced by RACK)
> > between kernel v4.10 and v4.11. And this is exactly the time when this
> > bug started manifesting itself for you and some others, but not our
> > fleet. So my new working hypothesis would be that this warning is due
> > to a behavior that only shows up in kernels >=4.11 when FACK is
> > enabled.
> > 
> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
> > reboot, and test the kernel for a few days to see if the warning still
> > pops up?
> > 
> > thanks,
> > neal
> > 
> > [ps: apologies for the previous, mis-formatted post...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-15 14:03     ` Neal Cardwell
@ 2017-09-15 19:04       ` Oleksandr Natalenko
  2017-09-17 18:43         ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-15 19:04 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Netdev,
	Yuchung Cheng

Hello.

With net.ipv4.tcp_fack set to 0 the warning still appears:

===
» sysctl net.ipv4.tcp_fack     
net.ipv4.tcp_fack = 0

» LC_TIME=C dmesg -T | grep WARNING
[Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
2826 tcp_fastretrans_alert+0x7c8/0x990
[Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
2826 tcp_fastretrans_alert+0x7c8/0x990
[Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:
2826 tcp_fastretrans_alert+0x7c8/0x990
[Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:
2826 tcp_fastretrans_alert+0x7c8/0x990

» ps -up 711
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-
enp3s0]
===

Any suggestions?

On pátek 15. září 2017 16:03:00 CEST Neal Cardwell wrote:
> Thanks for testing that. That is a very useful data point.
> 
> I was able to cook up a packetdrill test that could put the connection
> in CA_Disorder with retransmitted packets out, but not in CA_Open. So
> we do not yet have a test case to reproduce this.
> 
> We do not see this warning on our fleet at Google. One significant
> difference I see between our environment and yours is that it seems
> you run with FACK enabled:
> 
>   net.ipv4.tcp_fack = 1
> 
> Note that FACK was disabled by default (since it was replaced by RACK)
> between kernel v4.10 and v4.11. And this is exactly the time when this
> bug started manifesting itself for you and some others, but not our
> fleet. So my new working hypothesis would be that this warning is due
> to a behavior that only shows up in kernels >=4.11 when FACK is
> enabled.
> 
> Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
> boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
> reboot, and test the kernel for a few days to see if the warning still
> pops up?
> 
> thanks,
> neal
> 
> [ps: apologies for the previous, mis-formatted post...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-15  5:03   ` Oleksandr Natalenko
@ 2017-09-15 14:03     ` Neal Cardwell
  2017-09-15 19:04       ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Neal Cardwell @ 2017-09-15 14:03 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Netdev,
	Yuchung Cheng

On Fri, Sep 15, 2017 at 1:03 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> I've applied your test patch but it doesn't fix the issue for me since the
> warning is still there.
>
> Were you able to reproduce it?

Hi,

Thanks for testing that. That is a very useful data point.

I was able to cook up a packetdrill test that could put the connection
in CA_Disorder with retransmitted packets out, but not in CA_Open. So
we do not yet have a test case to reproduce this.

We do not see this warning on our fleet at Google. One significant
difference I see between our environment and yours is that it seems
you run with FACK enabled:

  net.ipv4.tcp_fack = 1

Note that FACK was disabled by default (since it was replaced by RACK)
between kernel v4.10 and v4.11. And this is exactly the time when this
bug started manifesting itself for you and some others, but not our
fleet. So my new working hypothesis would be that this warning is due
to a behavior that only shows up in kernels >=4.11 when FACK is
enabled.

Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
reboot, and test the kernel for a few days to see if the warning still
pops up?

thanks,
neal

[ps: apologies for the previous, mis-formatted post...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-10 23:59 ` Neal Cardwell
@ 2017-09-15  5:03   ` Oleksandr Natalenko
  2017-09-15 14:03     ` Neal Cardwell
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-15  5:03 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Netdev,
	Yuchung Cheng

Hi.

I've applied your test patch but it doesn't fix the issue for me since the 
warning is still there.

Were you able to reproduce it?

On pondělí 11. září 2017 1:59:02 CEST Neal Cardwell wrote:
> Thanks for the detailed report!
> 
> I suspect this is due to the following commit, which happened between
> 4.10 and 4.11:
> 
>   89fe18e44f7e tcp: extend F-RTO to catch more spurious timeouts
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?
> id=89fe18e44f7e
> 
> This commit expanded the set of scenarios where we would undo a
> CA_Loss cwnd reduction and return to TCP_CA_Open, but did not include
> a check to see if there were any in-flight retransmissions. I think we
> need a fix like the following:
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 659d1baefb2b..730a2de9d2b0 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2439,7 +2439,7 @@ static bool tcp_try_undo_loss(struct sock *sk,
> bool frto_undo)
>  {
>         struct tcp_sock *tp = tcp_sk(sk);
> 
> -       if (frto_undo || tcp_may_undo(tp)) {
> +       if ((frto_undo || tcp_may_undo(tp)) && !tp->retrans_out) {
>                 tcp_undo_cwnd_reduction(sk, true);
> 
>                 DBGUNDO(sk, "partial loss");
> 
> I will try a packetdrill test to see if I can reproduce this issue and
> verify the fix.
> 
> thanks,
> neal

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
  2017-09-10 20:53 Oleksandr Natalenko
@ 2017-09-10 23:59 ` Neal Cardwell
  2017-09-15  5:03   ` Oleksandr Natalenko
  0 siblings, 1 reply; 39+ messages in thread
From: Neal Cardwell @ 2017-09-10 23:59 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Netdev,
	Yuchung Cheng

On Sun, Sep 10, 2017 at 4:53 PM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
> warning shown below. Most of the time it is harmless, but rarely it just
> causes either freeze or (I believe, this is related too) panic in
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
> Unfortunately, I still do not have proper stacktrace from panic, but will try
> to capture it if possible.
...
> [14407.060066] ------------[ cut here ]------------
> [14407.060353] WARNING: CPU: 0 PID: 719 at net/ipv4/tcp_input.c:2826
> tcp_fastretrans_alert+0x7c8/0x990
...
> 2823     /* D. Check state exit conditions. State can be terminated
> 2824      *    when high_seq is ACKed. */
> 2825     if (icsk->icsk_ca_state == TCP_CA_Open) {
> 2826         WARN_ON(tp->retrans_out != 0); // here
> 2827         tp->retrans_stamp = 0;

Thanks for the detailed report!

I suspect this is due to the following commit, which happened between
4.10 and 4.11:

  89fe18e44f7e tcp: extend F-RTO to catch more spurious timeouts
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89fe18e44f7e

This commit expanded the set of scenarios where we would undo a
CA_Loss cwnd reduction and return to TCP_CA_Open, but did not include
a check to see if there were any in-flight retransmissions. I think we
need a fix like the following:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 659d1baefb2b..730a2de9d2b0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2439,7 +2439,7 @@ static bool tcp_try_undo_loss(struct sock *sk,
bool frto_undo)
 {
        struct tcp_sock *tp = tcp_sk(sk);

-       if (frto_undo || tcp_may_undo(tp)) {
+       if ((frto_undo || tcp_may_undo(tp)) && !tp->retrans_out) {
                tcp_undo_cwnd_reduction(sk, true);

                DBGUNDO(sk, "partial loss");

I will try a packetdrill test to see if I can reproduce this issue and
verify the fix.

thanks,
neal

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
@ 2017-09-10 20:53 Oleksandr Natalenko
  2017-09-10 23:59 ` Neal Cardwell
  0 siblings, 1 reply; 39+ messages in thread
From: Oleksandr Natalenko @ 2017-09-10 20:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

Hello.

Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
warning shown below. Most of the time it is harmless, but rarely it just 
causes either freeze or (I believe, this is related too) panic in 
tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
Unfortunately, I still do not have proper stacktrace from panic, but will try 
to capture it if possible.

Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
used to shape traffic with tc.

Please note this regression was already reported as BZ [1] and as a letter to 
ML [2], but got neither attention nor resolution. It is reproducible for (not 
only) me on my home router since v4.11 till v4.13.1 incl.

Please advise on how to deal with it. I'll provide any additional info if 
necessary, also ready to test patches if any.

Thanks.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
[2] https://www.spinics.net/lists/netdev/msg436158.html

=== warning
[14407.060066] ------------[ cut here ]------------
[14407.060353] WARNING: CPU: 0 PID: 719 at net/ipv4/tcp_input.c:2826 
tcp_fastretrans_alert+0x7c8/0x990
[14407.060747] Modules linked in: netconsole ctr ccm cls_bpf sch_htb 
act_mirred cls_u32 sch_ingress sit tunnel4 ip_tunnel 8021q mrp nf
_conntrack_ipv6 nf_defrag_ipv6 nft_ct nft_set_bitmap nft_set_hash 
nft_set_rbtree nf_tables_inet nf_tables_ipv6 nft_masq_ipv4 nf_nat_ma
squerade_ipv4 nft_masq nft_nat nft_counter nft_meta nft_chain_nat_ipv4 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrac
k libcrc32c crc32c_generic nf_tables_ipv4 tun nf_tables nfnetlink nct6775 
hwmon_vid nls_iso8859_1 nls_cp437 vfat fat ext4 mbcache jbd2
 arc4 f2fs snd_hda_codec_hdmi fscrypto snd_hda_codec_realtek 
snd_hda_codec_generic intel_rapl intel_powerclamp coretemp iTCO_wdt iTCO_
vendor_support ath9k ath9k_common kvm_intel ath9k_hw kvm ath irqbypass 
intel_cstate mac80211 pcspkr snd_intel_sst_acpi i2c_i801 i915 s
nd_hda_intel
[14407.063800]  snd_intel_sst_core r8169 cfg80211 evdev mii snd_hda_codec 
joydev mousedev input_leds snd_soc_rt5670 mei_txe snd_soc_ss
t_atom_hifi2_platform snd_hda_core snd_soc_rl6231 snd_soc_sst_match mac_hid 
mei lpc_ich shpchp drm_kms_helper snd_hwdep snd_soc_core s
nd_compress battery snd_pcm_dmaengine drm hci_uart ov2722(C) snd_pcm lm3554(C) 
ov5693(C) snd_timer v4l2_common btbcm snd intel_gtt btq
ca btintel videodev syscopyarea bluetooth video soundcore sysfillrect media 
sysimgblt ac97_bus ecdh_generic rfkill_gpio i2c_hid rfkill
 tpm_tis crc16 fb_sys_fops i2c_algo_bit 8250_dw tpm_tis_core tpm 
soc_button_array pinctrl_cherryview intel_int0002_vgpio acpi_pad butt
on sch_fq_codel tcp_bbr ifb ip_tables x_tables btrfs xor raid6_pq 
algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj usbhid hid
uas usb_storage
[14407.066873]  dm_crypt dm_mod dax raid10 md_mod sd_mod crct10dif_pclmul 
crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_int
el aes_x86_64 crypto_simd glue_helper cryptd ahci xhci_pci libahci xhci_hcd 
libata usbcore scsi_mod usb_common serio sdhci_acpi sdhci
led_class mmc_core
[14407.068034] CPU: 0 PID: 719 Comm: irq/123-enp3s0 Tainted: G         C      
4.13.0-pf2 #1
[14407.068403] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./
J3710-ITX, BIOS P1.30 03/30/2016
[14407.068827] task: ffff98b1c0a05400 task.stack: ffffbb59c15c0000
[14407.069111] RIP: 0010:tcp_fastretrans_alert+0x7c8/0x990
[14407.069358] RSP: 0018:ffff98b1ffc03a78 EFLAGS: 00010202
[14407.069607] RAX: 0000000000000000 RBX: ffff98b135ae0000 RCX: 
ffff98b1ffc03b0c
[14407.069928] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 
ffff98b135ae0000
[14407.070248] RBP: ffff98b1ffc03ab8 R08: 0000000000000000 R09: 
ffff98b1ffc03b60
[14407.070565] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000005120
[14407.070884] R13: ffff98b1ffc03b10 R14: 0000000000000001 R15: 
ffff98b1ffc03b0c
[14407.071205] FS:  0000000000000000(0000) GS:ffff98b1ffc00000(0000) knlGS:
0000000000000000
[14407.071564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14407.071827] CR2: 00007ffc580b2f0f CR3: 0000000010a09000 CR4: 
00000000001006f0
[14407.072146] Call Trace:
[14407.072279]  <IRQ>
[14407.072412]  ? sk_reset_timer+0x18/0x30
[14407.072610]  tcp_ack+0x741/0x1110
[14407.072810]  tcp_rcv_established+0x325/0x770
[14407.073033]  ? sk_filter_trim_cap+0xd4/0x1a0
[14407.073249]  tcp_v4_do_rcv+0x90/0x1e0
[14407.073449]  tcp_v4_rcv+0x950/0xa10
[14407.073647]  ? nf_ct_deliver_cached_events+0xb8/0x110 [nf_conntrack]
[14407.073955]  ip_local_deliver_finish+0x68/0x210
[14407.074183]  ip_local_deliver+0xfa/0x110
[14407.074385]  ? ip_rcv_finish+0x410/0x410
[14407.074589]  ip_rcv_finish+0x120/0x410
[14407.074782]  ip_rcv+0x28e/0x3b0
[14407.074952]  ? inet_del_offload+0x40/0x40
[14407.075154]  __netif_receive_skb_core+0x39b/0xb00
[14407.075389]  ? netif_receive_skb_internal+0xa0/0x480
[14407.075635]  ? skb_release_all+0x24/0x30
[14407.075832]  ? consume_skb+0x38/0xa0
[14407.076025]  __netif_receive_skb+0x18/0x60
[14407.076230]  netif_receive_skb_internal+0x98/0x480
[14407.076470]  netif_receive_skb+0x1c/0x80
[14407.087463]  ifb_ri_tasklet+0x109/0x26a [ifb]
[14407.090528]  tasklet_action+0x63/0x120
[14407.093258]  __do_softirq+0xdf/0x2e5
[14407.095974]  ? irq_finalize_oneshot.part.39+0xe0/0xe0
[14407.098708]  do_softirq_own_stack+0x1c/0x30
[14407.101437]  </IRQ>
[14407.104139]  do_softirq.part.17+0x4e/0x60
[14407.106854]  __local_bh_enable_ip+0x77/0x80
[14407.109671]  irq_forced_thread_fn+0x5c/0x70
[14407.112407]  irq_thread+0x131/0x1a0
[14407.115120]  ? wake_threads_waitq+0x30/0x30
[14407.117836]  kthread+0x126/0x140
[14407.120541]  ? irq_thread_check_affinity+0x90/0x90
[14407.123244]  ? kthread_create_on_node+0x70/0x70
[14407.125913]  ret_from_fork+0x25/0x30
[14407.128548] Code: 05 00 00 3b 83 30 05 00 00 0f 88 ca 01 00 00 0f b6 83 3c 
06 00 00 80 a3 cd 05 00 00 7f c0 e8 04 0f 85 3b fb ff ff
 e9 2c fb ff ff <0f> ff e9 46 f9 ff ff 31 d2 48 89 df e8 47 aa ff ff e9 f9 f9 
ff
[14407.133867] ---[ end trace 4bb223d8deb9f077 ]---
===

=== code
2823     /* D. Check state exit conditions. State can be terminated
2824      *    when high_seq is ACKed. */
2825     if (icsk->icsk_ca_state == TCP_CA_Open) {
2826         WARN_ON(tp->retrans_out != 0); // here
2827         tp->retrans_stamp = 0;
===

=== sysctl custom settings
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_local_port_range = 1026 59999
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.route.max_size = 16384
net.ipv4.ip_dynaddr = 1
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_rmem = 4096 262143 4194304
net.ipv4.tcp_wmem = 4096 262143 4194304
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_retries2 = 5
net.core.rmem_max = 4194304
net.core.rmem_default = 262143
net.core.wmem_max = 4194304
net.core.wmem_default = 262143
net.core.bpf_jit_enable = 1
net.ipv4.tcp_ecn = 1
===

=== kernel cmdline
BOOT_IMAGE=/vmlinuz-linux-pf root=/dev/mapper/system-root rw cryptdevice=/dev/
md0:system:allow-discards resume=/dev/mapper/system-swap quiet zswap.enabled=1 
threadirqs
===

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-11-10 13:40 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-21  1:46 [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Roman Gushchin
2017-09-21 17:07 ` Yuchung Cheng
     [not found]   ` <CAK6E8=cGF+xKiixRVvA=3PVPA7OQta9hVLTgCbKgvYf3e9Eu-A@mail.gmail.com>
2017-09-26 13:10     ` Roman Gushchin
2017-09-27  0:12       ` Yuchung Cheng
2017-09-27  0:18         ` Yuchung Cheng
2017-09-28  8:14           ` Oleksandr Natalenko
2017-09-28 23:36             ` Yuchung Cheng
2017-10-26  2:07               ` Alexei Starovoitov
2017-10-26  5:37                 ` Yuchung Cheng
2017-10-27 20:38                   ` Eric Dumazet
2017-10-31  6:08                     ` [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack Eric Dumazet
2017-10-31  6:17                       ` Alexei Starovoitov
2017-10-31  6:21                         ` Eric Dumazet
2017-10-31  6:30                           ` Alexei Starovoitov
2017-11-01  5:50                         ` Yuchung Cheng
2017-10-31 13:51                       ` Neal Cardwell
2017-11-01 12:20                       ` David Miller
2017-11-03 18:22                       ` Oleksandr Natalenko
2017-11-03 21:31                         ` Eric Dumazet
2017-11-06 22:27                     ` [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c Yuchung Cheng
2017-11-10 13:15                       ` Oleksandr Natalenko
2017-11-10 13:40                         ` Oleksandr Natalenko
  -- strict thread matches above, loose matches on Subject: below --
2017-09-10 20:53 Oleksandr Natalenko
2017-09-10 23:59 ` Neal Cardwell
2017-09-15  5:03   ` Oleksandr Natalenko
2017-09-15 14:03     ` Neal Cardwell
2017-09-15 19:04       ` Oleksandr Natalenko
2017-09-17 18:43         ` Oleksandr Natalenko
2017-09-18 17:18           ` Yuchung Cheng
2017-09-18 17:51             ` Yuchung Cheng
2017-09-18 17:59               ` Oleksandr Natalenko
2017-09-18 18:01                 ` Yuchung Cheng
2017-09-18 18:04                   ` Oleksandr Natalenko
2017-09-18 20:41                   ` Oleksandr Natalenko
2017-09-18 20:46                   ` Oleksandr Natalenko
2017-09-18 21:40                     ` Yuchung Cheng
2017-09-19 11:04                       ` Oleksandr Natalenko
2017-09-19 18:16                         ` Yuchung Cheng
2017-09-19 16:05                       ` Oleksandr Natalenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.