All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steve Ibanez <sibanez@stanford.edu>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Neal Cardwell <ncardwell@google.com>,
	Netdev <netdev@vger.kernel.org>, Florian Westphal <fw@strlen.de>,
	Mohammad Alizadeh <alizadeh@csail.mit.edu>,
	Lawrence Brakmo <brakmo@fb.com>
Subject: Re: Linux ECN Handling
Date: Sun, 19 Nov 2017 23:31:04 -0800	[thread overview]
Message-ID: <CACJspmLJWE0Ld1ZW_PgKoyeNgbDumL-P3mWAN-8WZRM9fGVvFg@mail.gmail.com> (raw)
In-Reply-To: <CACJspmL3SKhsAa9_2uEFNGADd0Q3e8S1hrgRAmvXjbnuevasyg@mail.gmail.com>

Hi Folks,

I wanted to check back in on this for another update and to solicit
some more suggestions. I did a bit more digging to try an isolate the
problem.

As I explained earlier, the log generated by tcp_probe indicates that
the snd_cwnd is set to 1 just before the end host receives an ECN
marked ACK and unexpectedly enters a timeout (
https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ).
I was trying to track down where this is happening, but the only place
I could find that might be setting the snd_cwnd to 1 is in the
tcp_enter_loss() function. I inserted a printk() call in this function
to see when it is being invoked and it looks like it is only called by
the tcp_retransmit_timer() function after the timer expires.

I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state
inside the tcp_fastretrans_alert() function whenever it processes an
ECN marked ACK (
https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB )
This plot also shows when the tcp_retransmit_timer() and
tcp_enter_loss() functions are invoked (red and purple dots
respectively). And I see that the ACK state machine is always either
in the TCP_CA_Open or TCP_CA_CWR state whenever the
tcp_fastretrans_alert() function processes ECN marked ACKs (
https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ).
So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as
Neal suggested) just before entering a timeout. Any suggestions here?

In order to do a bit of profiling of the tcp_dctcp code I added
support into tcp_probe for recording the dctcp alpha parameter. I see
that alpha oscillates around about 0.1 when the flow rates have
converged, it goes to zero when the other host enters a timeout, and I
don't see any unexpected behavior just before the timeout (
https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ).

So I haven't had much luck yet trying to track down where the problem
is. If you have any suggestions that would help me to focus my search
efforts, I would appreciate the comments.

Thanks!
-Steve


On Mon, Nov 6, 2017 at 3:31 PM, Steve Ibanez <sibanez@stanford.edu> wrote:
> Hi Daniel,
>
> Apologies for the delay. I tried out Neal's suggestion to printk the
> cwnd and ss-thresh in the tcp_init_cwnd_reduction(),
> tcp_cwnd_reduction(), and tcp_end_cwnd_reduction() functions in
> tcp_input.c. From what I can tell, none of these functions are setting
> the cwnd to 0.
>
> Here is the kernel log with the cwnd and ss-thresh print statements:
> https://drive.google.com/open?id=1LEWIkz64NuZN3yuDpBOAXbUfJfiju55O
> And here is the corresponding packet trace at this end host:
> https://drive.google.com/open?id=1qf4cSW3wzsiwPngcYpZY-AoBspuqONLH
> (The kernel log buffer was not large enough to capture the full
> 3-second experiment, so there's only about a second of data for the
> log buffer and about 3 seconds for the packet trace.)
>
> Here are is a plot of the cwnd and ss-thresh from each of the three functions:
>     - tcp_init_cwnd_reduction:
> https://drive.google.com/open?id=1KOEXG2ISJQMi9c6KyPOQ6rpVUVsQwtWU
>     - tcp_cwnd_reduction:
> https://drive.google.com/open?id=1awoPWC3hi4CGZt7HyuI4aAaLG1LPLwJE
>     - tcp_end_cwnd_reduction:
> https://drive.google.com/open?id=1G7XUSnkX8tP7Z5XdY2O97OWj6jguQHO5
>
> Here is a plot of the measured flow rates:
> https://drive.google.com/open?id=1XwmGve10J4qa1nPE3LustK8NbvhZscac
>
> The kernel log and packet trace data was collected on the 10.0.0.3
> host. The cwnd and ss-thresh plots are from the final second or so of
> the experiment and they show two timeout events. In the first event,
> the 10.0.0.1 host times out allowing 10.0.0.3 to increase it's cwnd.
> And in the second event, the 10.0.0.3 host times out causing the cwnd
> to decrease from ~100 to about ~10. The cwnd samples from tcp_probe (
> https://drive.google.com/open?id=1QCuPspLqbGoA68MKTaAh7rx2wCv3Cr_e )
> indicate that the cwnd is 1 MSS just before the timeout event, but I
> don't see that in the data collected from the tcp_*_cwnd_reduction
> functions.
>
> Here is a diff of the changes that I applied to the tcp_input.c file:
> https://drive.google.com/open?id=1k5x3AkfTr3tJhohSIcmQp-3g2yTVNMWm
>
> Are there other places in the code that you would suggest I check for
> how the cwnd and ss-thresh are changing?
>
> Thanks,
> -Steve
>
>
> On Mon, Nov 6, 2017 at 6:08 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 10/24/2017 03:11 AM, Neal Cardwell wrote:
>>>
>>> On Mon, Oct 23, 2017 at 6:15 PM, Steve Ibanez <sibanez@stanford.edu>
>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I upgraded the kernel on all of our machines to Linux
>>>> 4.13.8-041308-lowlatency. However, I'm still observing the same
>>>> behavior where the source enters a timeout when the CWND=1MSS and it
>>>> receives ECN marks.
>>>>
>>>> Here are the measured flow rates:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing>
>>>>
>>>> Here are snapshots of the packet traces at the sources when they both
>>>> enter a timeout at t=1.6sec:
>>>>
>>>> 10.0.0.1 timeout event:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing>
>>>>
>>>> 10.0.0.3 timeout event:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing>
>>>>
>>>> Both still essentially follow the same sequence of events that I
>>>> mentioned earlier:
>>>> (1) receives an ACK for byte XYZ with the ECN flag set
>>>> (2) stops sending for RTO_min=300ms
>>>> (3) sends a retransmission for byte XYZ
>>>>
>>>> The cwnd samples reported by tcp_probe still indicate that the sources
>>>> are reacting to the ECN marks more than once per window. Here are the
>>>> cwnd samples at the same timeout event mentioned above:
>>>>
>>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing>
>>>>
>>>> Let me know if there is anything else you think I should try.
>>>
>>>
>>> Sounds like perhaps cwnd is being set to 0 somewhere in this DCTCP
>>> scenario. Would you be able to add printk statements in
>>> tcp_init_cwnd_reduction(), tcp_cwnd_reduction(), and
>>> tcp_end_cwnd_reduction(), printing the IP:port, tp->snd_cwnd, and
>>> tp->snd_ssthresh?
>>>
>>> Based on the output you may be able to figure out where cwnd is being
>>> set to zero. If not, could you please post the printk output and
>>> tcpdump traces (.pcap, headers-only is fine) from your tests?
>>
>>
>> Hi Steve, do you have any updates on your debugging?
>>
>>> thanks,
>>> neal
>>>
>>

  reply	other threads:[~2017-11-20  7:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CACJspmLFdy9i8K=TkzXHnofyFMNoJf9HkYE7On8uG+PREc2Dqw@mail.gmail.com>
2017-10-19 12:43 ` Linux ECN Handling Florian Westphal
2017-10-23 22:15   ` Steve Ibanez
2017-10-24  1:11     ` Neal Cardwell
2017-11-06 14:08       ` Daniel Borkmann
2017-11-06 23:31         ` Steve Ibanez
2017-11-20  7:31           ` Steve Ibanez [this message]
2017-11-20 15:05             ` Neal Cardwell
     [not found]             ` <CADVnQy=q4qBpe0Ymo8dtFTYU_0x0q_XKE+ZvazLQE-ULwu7pQA@mail.gmail.com>
2017-11-20 15:40               ` Eric Dumazet
2017-11-21  5:58               ` Steve Ibanez
2017-11-21 15:01                 ` Neal Cardwell
2017-11-21 15:51                   ` Yuchung Cheng
2017-11-21 16:20                     ` Neal Cardwell
2017-11-21 16:52                       ` Eric Dumazet
2017-11-22  3:02                         ` Steve Ibanez
2017-11-22  3:46                           ` Neal Cardwell
2017-11-27 18:49                             ` Steve Ibanez
2017-12-01 16:35                               ` Neal Cardwell
2017-12-05  5:22                                 ` Steve Ibanez
2017-12-05 15:23                                   ` Neal Cardwell
2017-12-05 19:36                                     ` Steve Ibanez
2017-12-05 20:04                                       ` Neal Cardwell
2017-12-19  5:16                                         ` Steve Ibanez
2017-12-19 15:28                                           ` Neal Cardwell
2017-12-19 22:00                                             ` Steve Ibanez
2017-12-20  0:08                                               ` Neal Cardwell
2017-12-20 19:20                                                 ` Steve Ibanez
2017-12-20 20:17                                                   ` Neal Cardwell
2018-01-02  7:43                                                     ` Steve Ibanez
2018-01-02 16:27                                                       ` Neal Cardwell
2018-01-02 23:57                                                         ` Steve Ibanez
2018-01-03 19:39                                                           ` Neal Cardwell
2018-01-03 22:21                                                             ` Steve Ibanez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACJspmLJWE0Ld1ZW_PgKoyeNgbDumL-P3mWAN-8WZRM9fGVvFg@mail.gmail.com \
    --to=sibanez@stanford.edu \
    --cc=alizadeh@csail.mit.edu \
    --cc=brakmo@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=fw@strlen.de \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.