All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neal Cardwell <ncardwell@google.com>
To: Leonard Crestez <lcrestez@drivenets.com>
Cc: Matt Mathis <mattmathis@google.com>,
	Willem de Bruijn <willemb@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Roopa Prabhu <roopa@cumulusnetworks.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Yuchung Cheng <ycheng@google.com>,
	John Heffner <johnwheffner@gmail.com>
Subject: Re: [RFC] tcp: Delay sending non-probes for RFC4821 mtu probing
Date: Mon, 26 Apr 2021 13:24:07 -0400	[thread overview]
Message-ID: <CADVnQykNRumYbEj4s-NV=BQLqVAbR2An4rLVZft4gq_E-JdNug@mail.gmail.com> (raw)
In-Reply-To: <5af52ab4-237f-8646-76e4-5e24236d9b4a@drivenets.com>

On Mon, Apr 26, 2021 at 1:09 PM Leonard Crestez <lcrestez@drivenets.com> wrote:
>
> On 26.04.2021 18:59, Neal Cardwell wrote:
> > On Sun, Apr 25, 2021 at 10:34 PM Leonard Crestez <lcrestez@drivenets.com> wrote:
> >> On 4/21/21 3:47 PM, Neal Cardwell wrote:
> >>> On Wed, Apr 21, 2021 at 6:21 AM Leonard Crestez <cdleonard@gmail.com> wrote:
>
> >>> If the goal is to increase the frequency of PMTU probes, which seems
> >>> like a valid goal, I would suggest that we rethink the Linux heuristic
> >>> for triggering PMTU probes in the light of the fact that the loss
> >>> detection mechanism is now RACK-TLP, which provides quick recovery in
> >>> a much wider variety of scenarios.
> >>
> >>> You mention:
> >>>> Linux waits for probe_size + (1 + retries) * mss_cache to be available
> >>>
> >>> The code in question seems to be:
> >>>
> >>>     size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
> >>> How about just changing this to:
> >>>
> >>>     size_needed = probe_size + tp->mss_cache;
> >>>
> >>> The rationale would be that if that amount of data is available, then
> >>> the sender can send one probe and one following current-mss-size
> >>> packet. If the path MTU has not increased to allow the probe of size
> >>> probe_size to pass through the network, then the following
> >>> current-mss-size packet will likely pass through the network, generate
> >>> a SACK, and trigger a RACK fast recovery 1/4*min_rtt later, when the
> >>> RACK reorder timer fires.
> >>
> >> This appears to almost work except it stalls after a while. I spend some
> >> time investigating it and it seems that cwnd is shrunk on mss increases
> >> and does not go back up. This causes probes to be skipped because of a
> >> "snd_cwnd < 11" condition.
> >>
> >> I don't undestand where that magical "11" comes from, could that be
> >> shrunk. Maybe it's meant to only send probes when the cwnd is above the
> >> default of 10? Then maybe mtu_probe_success shouldn't shrink mss below
> >> what is required for an additional probe, or at least round-up.
> >>
> >> The shrinkage of cwnd is a problem with this "short probes" approach
> >> because tcp_is_cwnd_limited returns false because tp->max_packets_out is
> >> smaller (4). With longer probes tp->max_packets_out is larger (6) so
> >> tcp_is_cwnd_limited returns true even for a cwnd of 10.
> >>
> >> I'm testing using namespace-to-namespace loopback so my delays are close
> >> to zero. I tried to introduce an artificial delay of 30ms (using tc
> >> netem) and it works but 20ms does not.
> >
> > I agree the magic 11 seems outdated and unnecessarily high, given RACK-TLP.
> >
> > I think it would be fine to change the magic 11 to a magic
> > (TCP_FASTRETRANS_THRESH+1), aka 3+1=4:
> >
> >    - tp->snd_cwnd < 11 ||
> >    + p->snd_cwnd < (TCP_FASTRETRANS_THRESH + 1) ||
> >
> > As long as the cwnd is >= TCP_FASTRETRANS_THRESH+1 then the sender
> > should usually be able to send the 1 probe packet and then 3
> > additional packets beyond the probe, and in the common case (with no
> > reordering) then with failed probes this should allow the sender to
> > quickly receive 3 SACKed segments and enter fast recovery quickly.
> > Even if the sender doesn't have 3 additional packets, or if reordering
> > has been detected, then RACK-TLP should be able to start recovery
> > quickly (5/4*RTT if there is at least one SACK, or 2*RTT for a TLP if
> > there is no SACK).
>
> As far as I understand tp->reordering is a dynamic evaluation of the
> fastretrans threshold to deal with environments with lots of reordering.
> Your suggestion seems equivalent to the current size_needed calculation
> except using packets instead of bytes.
>
> Wouldn't it be easier to drop the "11" check and just verify that
> size_needed fits into cwnd as bytes?

Yes, that sounds good to me (dropping the "11" check in favor of
verifying that size_needed fits into the cwnd).

neal

  reply	other threads:[~2021-04-26 17:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21 10:21 [RFC] tcp: Delay sending non-probes for RFC4821 mtu probing Leonard Crestez
2021-04-21 12:47 ` Neal Cardwell
     [not found]   ` <CAH56bmDBGsHOSjJpo=TseUATOh0cZqTMFyFO1sqtQmMrTPHtrA@mail.gmail.com>
2021-04-21 16:45     ` Fwd: " Matt Mathis
2021-04-26  2:32       ` Leonard Crestez
2021-04-26  2:34   ` Leonard Crestez
2021-04-26  3:20     ` Matt Mathis
2021-04-26 15:59     ` Neal Cardwell
2021-04-26 17:09       ` Leonard Crestez
2021-04-26 17:24         ` Neal Cardwell [this message]
2021-04-21 13:05 ` kernel test robot
2021-04-21 13:05 ` [RFC PATCH] tcp: tcp_mtu_probe_wait_stop() can be static kernel test robot
2021-04-21 13:15 ` [RFC] tcp: Delay sending non-probes for RFC4821 mtu probing kernel test robot
2021-04-21 14:53 ` kernel test robot
2021-04-21 15:09 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADVnQykNRumYbEj4s-NV=BQLqVAbR2An4rLVZft4gq_E-JdNug@mail.gmail.com' \
    --to=ncardwell@google.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=johnwheffner@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lcrestez@drivenets.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mattmathis@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=roopa@cumulusnetworks.com \
    --cc=soheil@google.com \
    --cc=willemb@google.com \
    --cc=ycheng@google.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.