On Wed, 7 Mar 2018, Yuchung Cheng wrote: > On Wed, Mar 7, 2018 at 12:19 PM, Neal Cardwell wrote: > > > > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo J�rvinen wrote: > > > A bogus undo may/will trigger when the loss recovery state is > > > kept until snd_una is above high_seq. If tcp_any_retrans_done > > > is zero, retrans_stamp is cleared in this transient state. On > > > the next ACK, tcp_try_undo_recovery again executes and > > > tcp_may_undo will always return true because tcp_packet_delayed > > > has this condition: > > > return !tp->retrans_stamp || ... > > > > > > Check for the false fast retransmit transient condition in > > > tcp_packet_delayed to avoid bogus undos. Since snd_una may have > > > advanced on this ACK but CA state still remains unchanged, > > > prior_snd_una needs to be passed instead of tp->snd_una. > > > > This one also seems like a case where it would be nice to have a > > specific packet-by-packet example, or trace, or packetdrill scenario. > > Something that we might be able to translate into a test, or at least > > to document the issue more explicitly. > > I am hesitate for further logic to make undo "perfect" on non-sack > cases b/c undo is very complicated and SACK is extremely > well-supported today. so a trace to demonstrate how severe this issue > is appreciated. This is not just some remote corner cases to which I perhaps would understand your "making undo perfect" comment. Those undos result in a burst that, at worst, triggers additional buffer overflow because the correct CC action is cancelled. Unfortunately I don't have now permission to publish the time-seq graph about it but I've tried to improve the changelog messages so that you can better understand under which conditions the problem occurs. SACK case remains the same even after this change. I did rework the logic a bit though (pass prior_snd_una rather than flag around) to make it more obvious but the change is unfortunately lengthy (no matter what I pass through the call-chain). -- i.