linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
@ 2019-04-04 12:24 Tilmans, Olivier (Nokia - BE/Antwerp)
  2019-04-04 13:17 ` Florian Westphal
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Tilmans, Olivier (Nokia - BE/Antwerp) @ 2019-04-04 12:24 UTC (permalink / raw)
  Cc: Tilmans, Olivier (Nokia - BE/Antwerp),
	De Schepper, Koen (Nokia - BE/Antwerp),
	Bob Briscoe, Lawrence Brakmo, Florian Westphal, Daniel Borkmann,
	Yuchung Cheng, Neal Cardwell, Eric Dumazet, Andrew Shewmaker,
	Glenn Judd, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	netdev, linux-kernel

From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>

RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.

This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                          |  Link   |   RTT    |    Drop
                 TCP CC   |  speed  | base+AQM | probability
        ==================|=========|==========|============
                    CUBIC |  40Mbps |  7+20ms  |    0.21%
                     RENO |         |          |    0.19%
        DCTCP-CLAMP-ALPHA |         |          |   25.80%
         DCTCP-HALVE-CWND |         |          |    0.22%
        ------------------|---------|----------|------------
                    CUBIC | 100Mbps |  7+20ms  |    0.03%
                     RENO |         |          |    0.02%
        DCTCP-CLAMP-ALPHA |         |          |   23.30%
         DCTCP-HALVE-CWND |         |          |    0.04%
        ------------------|---------|----------|------------
                    CUBIC | 800Mbps |   1+1ms  |    0.04%
                     RENO |         |          |    0.05%
        DCTCP-CLAMP-ALPHA |         |          |   18.70%
         DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>

---
v1 -> v2:
    - Remove the module parameter that was making the reaction to losses
      optional as RFC8257 specifies the behavior as a MUST.
---
 net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index cd4814f7e962..359da68d7c06 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -67,11 +67,6 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA;
 module_param(dctcp_alpha_on_init, uint, 0644);
 MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
 
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
-module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
-MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
-		 "parameter for clamping alpha on loss");
-
 static struct tcp_congestion_ops dctcp_reno;
 
 static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
@@ -164,21 +159,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
 	}
 }
 
-static void dctcp_state(struct sock *sk, u8 new_state)
+static void dctcp_react_to_loss(struct sock *sk)
 {
-	if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
-		struct dctcp *ca = inet_csk_ca(sk);
+	struct dctcp *ca = inet_csk_ca(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 
-		/* If this extension is enabled, we clamp dctcp_alpha to
-		 * max on packet loss; the motivation is that dctcp_alpha
-		 * is an indicator to the extend of congestion and packet
-		 * loss is an indicator of extreme congestion; setting
-		 * this in practice turned out to be beneficial, and
-		 * effectively assumes total congestion which reduces the
-		 * window by half.
-		 */
-		ca->dctcp_alpha = DCTCP_MAX_ALPHA;
-	}
+	ca->loss_cwnd = tp->snd_cwnd;
+	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
+}
+
+static void dctcp_state(struct sock *sk, u8 new_state)
+{
+	if (new_state == TCP_CA_Recovery &&
+	    new_state != inet_csk(sk)->icsk_ca_state)
+		dctcp_react_to_loss(sk);
+	/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
+	 * one loss-adjustment per RTT.
+	 */
 }
 
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
@@ -190,6 +187,9 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
 	case CA_EVENT_ECN_NO_CE:
 		dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state);
 		break;
+	case CA_EVENT_LOSS:
+		dctcp_react_to_loss(sk);
+		break;
 	default:
 		/* Don't care for the rest. */
 		break;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 12:24 [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses Tilmans, Olivier (Nokia - BE/Antwerp)
@ 2019-04-04 13:17 ` Florian Westphal
  2019-04-04 13:48 ` Neal Cardwell
  2019-04-04 17:52 ` David Miller
  2 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2019-04-04 13:17 UTC (permalink / raw)
  To: Tilmans, Olivier (Nokia - BE/Antwerp)
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Bob Briscoe, Lawrence Brakmo, Daniel Borkmann, Yuchung Cheng,
	Neal Cardwell, Eric Dumazet, Andrew Shewmaker, Glenn Judd,
	David S. Miller, Hideaki YOSHIFUJI, netdev, linux-kernel

Tilmans, Olivier (Nokia - BE/Antwerp) <olivier.tilmans@nokia-bell-labs.com> wrote:
> RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
> loss episodes in the same way as conventional TCP".

Thanks, this looks good.

Acked-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 12:24 [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses Tilmans, Olivier (Nokia - BE/Antwerp)
  2019-04-04 13:17 ` Florian Westphal
@ 2019-04-04 13:48 ` Neal Cardwell
  2019-04-04 13:55   ` Neal Cardwell
  2019-04-04 17:52 ` David Miller
  2 siblings, 1 reply; 10+ messages in thread
From: Neal Cardwell @ 2019-04-04 13:48 UTC (permalink / raw)
  To: Tilmans, Olivier (Nokia - BE/Antwerp)
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Bob Briscoe, Lawrence Brakmo, Florian Westphal, Daniel Borkmann,
	Yuchung Cheng, Eric Dumazet, Andrew Shewmaker, Glenn Judd,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

On Thu, Apr 4, 2019 at 8:24 AM Tilmans, Olivier (Nokia - BE/Antwerp)
<olivier.tilmans@nokia-bell-labs.com> wrote:
>
> From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
>
> RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
> loss episodes in the same way as conventional TCP".
...
> This patch replace the clamping of alpha by setting ssthresh to
> half of cwnd for both fast retransmissions and RTOs, at most once
> per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
> has been removed.

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks!

neal

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 13:48 ` Neal Cardwell
@ 2019-04-04 13:55   ` Neal Cardwell
  2019-04-04 13:59     ` Daniel Borkmann
  0 siblings, 1 reply; 10+ messages in thread
From: Neal Cardwell @ 2019-04-04 13:55 UTC (permalink / raw)
  To: Tilmans, Olivier (Nokia - BE/Antwerp)
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Bob Briscoe, Lawrence Brakmo, Florian Westphal, Daniel Borkmann,
	Yuchung Cheng, Eric Dumazet, Andrew Shewmaker, Glenn Judd,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

On Thu, Apr 4, 2019 at 9:48 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Thu, Apr 4, 2019 at 8:24 AM Tilmans, Olivier (Nokia - BE/Antwerp)
> <olivier.tilmans@nokia-bell-labs.com> wrote:
> >
> > From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
> >
> > RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
> > loss episodes in the same way as conventional TCP".
> ...
> > This patch replace the clamping of alpha by setting ssthresh to
> > half of cwnd for both fast retransmissions and RTOs, at most once
> > per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
> > has been removed.
>
> Acked-by: Neal Cardwell <ncardwell@google.com>
>
> Thanks!

FWIW, my vote is that this is an important bug fix that is appropriate
for the net branch and -stable releases, rather than net-next.

neal

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 13:55   ` Neal Cardwell
@ 2019-04-04 13:59     ` Daniel Borkmann
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Borkmann @ 2019-04-04 13:59 UTC (permalink / raw)
  To: Neal Cardwell, Tilmans, Olivier (Nokia - BE/Antwerp)
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	Bob Briscoe, Lawrence Brakmo, Florian Westphal, Daniel Borkmann,
	Yuchung Cheng, Eric Dumazet, Andrew Shewmaker, Glenn Judd,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

On 04/04/2019 03:55 PM, Neal Cardwell wrote:
> On Thu, Apr 4, 2019 at 9:48 AM Neal Cardwell <ncardwell@google.com> wrote:
>> On Thu, Apr 4, 2019 at 8:24 AM Tilmans, Olivier (Nokia - BE/Antwerp)
>> <olivier.tilmans@nokia-bell-labs.com> wrote:
>>>
>>> From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
>>>
>>> RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
>>> loss episodes in the same way as conventional TCP".
>> ...
>>> This patch replace the clamping of alpha by setting ssthresh to
>>> half of cwnd for both fast retransmissions and RTOs, at most once
>>> per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
>>> has been removed.
>>
>> Acked-by: Neal Cardwell <ncardwell@google.com>
>>
>> Thanks!
> 
> FWIW, my vote is that this is an important bug fix that is appropriate
> for the net branch and -stable releases, rather than net-next.

Agree, my preference would be for -net tree as well. Thanks for the fix!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 12:24 [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses Tilmans, Olivier (Nokia - BE/Antwerp)
  2019-04-04 13:17 ` Florian Westphal
  2019-04-04 13:48 ` Neal Cardwell
@ 2019-04-04 17:52 ` David Miller
  2019-04-04 18:10   ` Lawrence Brakmo
  2 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2019-04-04 17:52 UTC (permalink / raw)
  To: olivier.tilmans
  Cc: koen.de_schepper, research, brakmo, fw, borkmann, ycheng,
	ncardwell, edumazet, agshew, glenn.judd, kuznet, yoshfuji,
	netdev, linux-kernel

From: "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>
Date: Thu, 4 Apr 2019 12:24:02 +0000

> RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
> loss episodes in the same way as conventional TCP".
...
> This patch replace the clamping of alpha by setting ssthresh to
> half of cwnd for both fast retransmissions and RTOs, at most once
> per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
> has been removed.

Applied to 'net' and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 17:52 ` David Miller
@ 2019-04-04 18:10   ` Lawrence Brakmo
  2019-04-04 18:39     ` Tilmans, Olivier (Nokia - BE/Antwerp)
  0 siblings, 1 reply; 10+ messages in thread
From: Lawrence Brakmo @ 2019-04-04 18:10 UTC (permalink / raw)
  To: David Miller, olivier.tilmans
  Cc: koen.de_schepper, research, fw, borkmann, ycheng, ncardwell,
	edumazet, agshew, glenn.judd, kuznet, yoshfuji, netdev,
	linux-kernel


On 4/4/19, 10:53 AM, "David Miller" <davem@davemloft.net> wrote:

    From: "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com>
    Date: Thu, 4 Apr 2019 12:24:02 +0000
    
    > RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
    > loss episodes in the same way as conventional TCP".
    ...
    > This patch replace the clamping of alpha by setting ssthresh to
    > half of cwnd for both fast retransmissions and RTOs, at most once
    > per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
    > has been removed.
    
    Applied to 'net' and queued up for -stable, thanks.
    
DCTCP is meant to be used in environments where the switches/routers do ECN marking, so it is not surprising that it performs badly when used in environments where it was not meant to be used. Has anyone measured the effect of this changed when DCTCP is used in environments where all the switches/routers do ECN marking? My concern is that we could end up hurting performance when DCTCP is used how it was meant to be used in order to protect incorrect uses of DCTCP.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 18:10   ` Lawrence Brakmo
@ 2019-04-04 18:39     ` Tilmans, Olivier (Nokia - BE/Antwerp)
  2019-04-04 18:55       ` Lawrence Brakmo
  0 siblings, 1 reply; 10+ messages in thread
From: Tilmans, Olivier (Nokia - BE/Antwerp) @ 2019-04-04 18:39 UTC (permalink / raw)
  To: Lawrence Brakmo, David Miller
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	research, fw, borkmann, ycheng, ncardwell, edumazet, agshew,
	glenn.judd, kuznet, yoshfuji, netdev, linux-kernel

> DCTCP is meant to be used in environments where the switches/routers do
> ECN marking, so it is not surprising that it performs badly when used in
> environments where it was not meant to be used. Has anyone measured the
> effect of this changed when DCTCP is used in environments where all the
> switches/routers do ECN marking? My concern is that we could end up
> hurting performance when DCTCP is used how it was meant to be used in
> order to protect incorrect uses of DCTCP.

The results reported are indeed from a non-optimal setting, and mostly to
show that it was ignoring losses. In practice, we only use DCTCP on 
ECN-enabled AQMs, and rarely see the loss reaction (e.g., a burst of new flows
IW that congest the ToR switch, in which case I'd argue the behavior is
beneficial). I cannot estimate the impact on FB's workloads though.

I had originally put a module parameter to make this loss reaction behavior
optional, mostly to enable people to check first whether it was safe to use 
with their configuration. In hindsight, I should have waited a bit more
before submitting the v2 with its removal as requested.


Olivier

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 18:39     ` Tilmans, Olivier (Nokia - BE/Antwerp)
@ 2019-04-04 18:55       ` Lawrence Brakmo
  2019-04-05  9:23         ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Lawrence Brakmo @ 2019-04-04 18:55 UTC (permalink / raw)
  To: Tilmans, Olivier (Nokia - BE/Antwerp), David Miller
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	research, fw, borkmann, ycheng, ncardwell, edumazet, agshew,
	glenn.judd, kuznet, yoshfuji, netdev, linux-kernel


On 4/4/19, 11:39 AM, "Tilmans, Olivier (Nokia - BE/Antwerp)" <olivier.tilmans@nokia-bell-labs.com> wrote:

    > DCTCP is meant to be used in environments where the switches/routers do
    > ECN marking, so it is not surprising that it performs badly when used in
    > environments where it was not meant to be used. Has anyone measured the
    > effect of this changed when DCTCP is used in environments where all the
    > switches/routers do ECN marking? My concern is that we could end up
    > hurting performance when DCTCP is used how it was meant to be used in
    > order to protect incorrect uses of DCTCP.
    
    The results reported are indeed from a non-optimal setting, and mostly to
    show that it was ignoring losses. In practice, we only use DCTCP on 
    ECN-enabled AQMs, and rarely see the loss reaction (e.g., a burst of new flows
    IW that congest the ToR switch, in which case I'd argue the behavior is
    beneficial). I cannot estimate the impact on FB's workloads though.
    
In some of our environments we do see packet losses at high workloads with DCTCP.
My concern is that I have no idea whether this change will be beneficial or
harmful in those environments.

    I had originally put a module parameter to make this loss reaction behavior
    optional, mostly to enable people to check first whether it was safe to use 
    with their configuration. In hindsight, I should have waited a bit more
    before submitting the v2 with its removal as requested.
    
The module parameter, even if enabled by default, would have been
preferable since it would support environments where this feature
turned out to be sub-optimal.
    
    Olivier
    


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses
  2019-04-04 18:55       ` Lawrence Brakmo
@ 2019-04-05  9:23         ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2019-04-05  9:23 UTC (permalink / raw)
  To: Lawrence Brakmo, Tilmans, Olivier (Nokia - BE/Antwerp), David Miller
  Cc: De Schepper, Koen (Nokia - BE/Antwerp),
	research, fw, borkmann, ycheng, ncardwell, edumazet, agshew,
	glenn.judd, kuznet, yoshfuji, netdev, linux-kernel



On 04/04/2019 11:55 AM, Lawrence Brakmo wrote:
> 
> The module parameter, even if enabled by default, would have been
> preferable since it would support environments where this feature
> turned out to be sub-optimal.

Module parameters are awful, just add a sysctl if you need.

Bonus : per netns thing, meaning writing tests (say with packetdrill) is easier,
since you can launch dozens of them in parallel, each one in a different netns.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-05  9:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-04 12:24 [PATCH net-next v2] tcp: Ensure DCTCP reacts to losses Tilmans, Olivier (Nokia - BE/Antwerp)
2019-04-04 13:17 ` Florian Westphal
2019-04-04 13:48 ` Neal Cardwell
2019-04-04 13:55   ` Neal Cardwell
2019-04-04 13:59     ` Daniel Borkmann
2019-04-04 17:52 ` David Miller
2019-04-04 18:10   ` Lawrence Brakmo
2019-04-04 18:39     ` Tilmans, Olivier (Nokia - BE/Antwerp)
2019-04-04 18:55       ` Lawrence Brakmo
2019-04-05  9:23         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).