All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Neal Cardwell <ncardwell@google.com>
Cc: Steve Ibanez <sibanez@stanford.edu>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Netdev <netdev@vger.kernel.org>, Florian Westphal <fw@strlen.de>,
	Mohammad Alizadeh <alizadeh@csail.mit.edu>,
	Lawrence Brakmo <brakmo@fb.com>,
	Yuchung Cheng <ycheng@google.com>
Subject: Re: Linux ECN Handling
Date: Mon, 20 Nov 2017 07:40:52 -0800	[thread overview]
Message-ID: <CANn89iK-hwXjO6y36ORN_JeXSA2aaRsPOc=p3T4qCQREEzg4uw@mail.gmail.com> (raw)
In-Reply-To: <CADVnQy=q4qBpe0Ymo8dtFTYU_0x0q_XKE+ZvazLQE-ULwu7pQA@mail.gmail.com>

On Mon, Nov 20, 2017 at 7:01 AM, Neal Cardwell <ncardwell@google.com> wrote:
> Going back to one of your Oct 19 trace snapshots (attached), AFAICT at the
> time of the timeout there is actually almost 64KBytes  (352553398 + 1448 -
> 352489686 = 65160) of unacknowledged data. So there really does seem to be a
> significant chunk of packets that were in-flight that were then declared
> lost.
>
> So here is a possibility: perhaps the combination of CWR+PRR plus
> tcp_tso_should_defer() means that PRR can make cwnd so gentle that
> tcp_tso_should_defer() thinks we should wait for another ACK to send, and
> that ACK doesn't come. Breaking it, down, the potential sequence would be:
>
> (1) tcp_write_xmit() does not send, because the CWR behavior, using PRR,
> does not leave enough cwnd for tcp_tso_should_defer() to think we should
> send (PRR was originally designed for recovery, which did not have TSO
> deferral)
>
> (2) TLP does not fire, because we are in state CWR, not Open
>
> (3) The only remaining option is an RTO, which fires.
>
> In other words, the possibility is that, at the time of the stall, the cwnd
> is reasonably high, but tcp_packets_in_flight() is also quite high, so
> either there is (a) literally no unused cwnd left ( tcp_packets_in_flight()
> == cwnd), or (b) some mechanism like tcp_tso_should_defer() is deciding that
> there is not enough available cwnd for it to make sense to chop off a
> fraction of a TSO skb to send now.
>
> One way to test that conjecture would be to disable tcp_tso_should_defer()
> by adding a:
>
>    goto send_now;
>
> at the top of tcp_tso_should_defer().
>
> If that doesn't prevent the freezes then I would recommend adding printks or
> other instrumentation to  tcp_write_xmit() to log:
>
> - time
> - ca_state
> - cwnd
> - ssthresh
> - tcp_packets_in_flight()
> - the reason for breaking out of the tcp_write_xmit() loop (tso deferral, no
> packets left, tcp_snd_wnd_test, tcp_nagle_test, etc)
>
> cheers,
> neal
>
>
>
> On Mon, Nov 20, 2017 at 2:31 AM, Steve Ibanez <sibanez@stanford.edu> wrote:
>>
>> Hi Folks,
>>
>> I wanted to check back in on this for another update and to solicit
>> some more suggestions. I did a bit more digging to try an isolate the
>> problem.
>>
>> As I explained earlier, the log generated by tcp_probe indicates that
>> the snd_cwnd is set to 1 just before the end host receives an ECN
>> marked ACK and unexpectedly enters a timeout (
>> https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ).
>> I was trying to track down where this is happening, but the only place
>> I could find that might be setting the snd_cwnd to 1 is in the
>> tcp_enter_loss() function. I inserted a printk() call in this function
>> to see when it is being invoked and it looks like it is only called by
>> the tcp_retransmit_timer() function after the timer expires.
>>
>> I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state
>> inside the tcp_fastretrans_alert() function whenever it processes an
>> ECN marked ACK (
>> https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB )
>> This plot also shows when the tcp_retransmit_timer() and
>> tcp_enter_loss() functions are invoked (red and purple dots
>> respectively). And I see that the ACK state machine is always either
>> in the TCP_CA_Open or TCP_CA_CWR state whenever the
>> tcp_fastretrans_alert() function processes ECN marked ACKs (
>> https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ).
>> So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as
>> Neal suggested) just before entering a timeout. Any suggestions here?
>>
>> In order to do a bit of profiling of the tcp_dctcp code I added
>> support into tcp_probe for recording the dctcp alpha parameter. I see
>> that alpha oscillates around about 0.1 when the flow rates have
>> converged, it goes to zero when the other host enters a timeout, and I
>> don't see any unexpected behavior just before the timeout (
>> https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ).
>>
>> So I haven't had much luck yet trying to track down where the problem
>> is. If you have any suggestions that would help me to focus my search
>> efforts, I would appreciate the comments.
>>
>> Thanks!
>> -Steve

Steve, what HZ value your kernel is compiled with ?

  parent reply	other threads:[~2017-11-20 15:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CACJspmLFdy9i8K=TkzXHnofyFMNoJf9HkYE7On8uG+PREc2Dqw@mail.gmail.com>
2017-10-19 12:43 ` Linux ECN Handling Florian Westphal
2017-10-23 22:15   ` Steve Ibanez
2017-10-24  1:11     ` Neal Cardwell
2017-11-06 14:08       ` Daniel Borkmann
2017-11-06 23:31         ` Steve Ibanez
2017-11-20  7:31           ` Steve Ibanez
2017-11-20 15:05             ` Neal Cardwell
     [not found]             ` <CADVnQy=q4qBpe0Ymo8dtFTYU_0x0q_XKE+ZvazLQE-ULwu7pQA@mail.gmail.com>
2017-11-20 15:40               ` Eric Dumazet [this message]
2017-11-21  5:58               ` Steve Ibanez
2017-11-21 15:01                 ` Neal Cardwell
2017-11-21 15:51                   ` Yuchung Cheng
2017-11-21 16:20                     ` Neal Cardwell
2017-11-21 16:52                       ` Eric Dumazet
2017-11-22  3:02                         ` Steve Ibanez
2017-11-22  3:46                           ` Neal Cardwell
2017-11-27 18:49                             ` Steve Ibanez
2017-12-01 16:35                               ` Neal Cardwell
2017-12-05  5:22                                 ` Steve Ibanez
2017-12-05 15:23                                   ` Neal Cardwell
2017-12-05 19:36                                     ` Steve Ibanez
2017-12-05 20:04                                       ` Neal Cardwell
2017-12-19  5:16                                         ` Steve Ibanez
2017-12-19 15:28                                           ` Neal Cardwell
2017-12-19 22:00                                             ` Steve Ibanez
2017-12-20  0:08                                               ` Neal Cardwell
2017-12-20 19:20                                                 ` Steve Ibanez
2017-12-20 20:17                                                   ` Neal Cardwell
2018-01-02  7:43                                                     ` Steve Ibanez
2018-01-02 16:27                                                       ` Neal Cardwell
2018-01-02 23:57                                                         ` Steve Ibanez
2018-01-03 19:39                                                           ` Neal Cardwell
2018-01-03 22:21                                                             ` Steve Ibanez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANn89iK-hwXjO6y36ORN_JeXSA2aaRsPOc=p3T4qCQREEzg4uw@mail.gmail.com' \
    --to=edumazet@google.com \
    --cc=alizadeh@csail.mit.edu \
    --cc=brakmo@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=fw@strlen.de \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=sibanez@stanford.edu \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.