linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>,
	Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>,
	Bob Briscoe <research@bobbriscoe.net>,
	Lawrence Brakmo <brakmo@fb.com>, Florian Westphal <fw@strlen.de>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrew Shewmaker <agshew@gmail.com>,
	Glenn Judd <glenn.judd@morganstanley.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.9 55/76] tcp: Ensure DCTCP reacts to losses
Date: Mon, 15 Apr 2019 20:44:19 +0200	[thread overview]
Message-ID: <20190415183723.537531204@linuxfoundation.org> (raw)
In-Reply-To: <20190415183707.712011689@linuxfoundation.org>

From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>

[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]

RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.

This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                          |  Link   |   RTT    |    Drop
                 TCP CC   |  speed  | base+AQM | probability
        ==================|=========|==========|============
                    CUBIC |  40Mbps |  7+20ms  |    0.21%
                     RENO |         |          |    0.19%
        DCTCP-CLAMP-ALPHA |         |          |   25.80%
         DCTCP-HALVE-CWND |         |          |    0.22%
        ------------------|---------|----------|------------
                    CUBIC | 100Mbps |  7+20ms  |    0.03%
                     RENO |         |          |    0.02%
        DCTCP-CLAMP-ALPHA |         |          |   23.30%
         DCTCP-HALVE-CWND |         |          |    0.04%
        ------------------|---------|----------|------------
                    CUBIC | 800Mbps |   1+1ms  |    0.04%
                     RENO |         |          |    0.05%
        DCTCP-CLAMP-ALPHA |         |          |   18.70%
         DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_dctcp.c |   36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -66,11 +66,6 @@ static unsigned int dctcp_alpha_on_init
 module_param(dctcp_alpha_on_init, uint, 0644);
 MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
 
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
-module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
-MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
-		 "parameter for clamping alpha on loss");
-
 static struct tcp_congestion_ops dctcp_reno;
 
 static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
@@ -211,21 +206,23 @@ static void dctcp_update_alpha(struct so
 	}
 }
 
-static void dctcp_state(struct sock *sk, u8 new_state)
+static void dctcp_react_to_loss(struct sock *sk)
 {
-	if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
-		struct dctcp *ca = inet_csk_ca(sk);
+	struct dctcp *ca = inet_csk_ca(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 
-		/* If this extension is enabled, we clamp dctcp_alpha to
-		 * max on packet loss; the motivation is that dctcp_alpha
-		 * is an indicator to the extend of congestion and packet
-		 * loss is an indicator of extreme congestion; setting
-		 * this in practice turned out to be beneficial, and
-		 * effectively assumes total congestion which reduces the
-		 * window by half.
-		 */
-		ca->dctcp_alpha = DCTCP_MAX_ALPHA;
-	}
+	ca->loss_cwnd = tp->snd_cwnd;
+	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
+}
+
+static void dctcp_state(struct sock *sk, u8 new_state)
+{
+	if (new_state == TCP_CA_Recovery &&
+	    new_state != inet_csk(sk)->icsk_ca_state)
+		dctcp_react_to_loss(sk);
+	/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
+	 * one loss-adjustment per RTT.
+	 */
 }
 
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
@@ -237,6 +234,9 @@ static void dctcp_cwnd_event(struct sock
 	case CA_EVENT_ECN_NO_CE:
 		dctcp_ce_state_1_to_0(sk);
 		break;
+	case CA_EVENT_LOSS:
+		dctcp_react_to_loss(sk);
+		break;
 	default:
 		/* Don't care for the rest. */
 		break;



  parent reply	other threads:[~2019-04-15 18:47 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-15 18:43 [PATCH 4.9 00/76] 4.9.169-stable review Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 01/76] x86/power: Fix some ordering bugs in __restore_processor_context() Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 02/76] x86/power/64: Use struct desc_ptr for the IDT in struct saved_context Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 03/76] x86/power/32: Move SYSENTER MSR restoration to fix_processor_context() Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 04/76] x86/power: Make restore_processor_context() sane Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 05/76] powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 06/76] kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 07/76] x86: vdso: Use $LD instead of $CC to link Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 08/76] x86/vdso: Drop implicit common-page-size linker flag Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 09/76] lib/string.c: implement a basic bcmp Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 10/76] powerpc: Fix invalid use of register expressions Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 11/76] powerpc/64s: Add barrier_nospec Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 12/76] powerpc/64s: Add support for ori barrier_nospec patching Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 13/76] powerpc: Avoid code patching freed init sections Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 14/76] powerpc/64s: Patch barrier_nospec in modules Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 15/76] powerpc/64s: Enable barrier_nospec based on firmware settings Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 16/76] powerpc: Use barrier_nospec in copy_from_user() Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 17/76] powerpc/64: Use barrier_nospec in syscall entry Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 18/76] powerpc/64s: Enhance the information in cpu_show_spectre_v1() Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 19/76] powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2 Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 20/76] powerpc/64: Disable the speculation barrier from the command line Greg Kroah-Hartman
2019-04-16 15:43   ` Diana Madalina Craciun
2019-04-16 18:46     ` Greg Kroah-Hartman
2019-04-17  8:41     ` Michael Ellerman
2019-04-15 18:43 ` [PATCH 4.9 21/76] powerpc/64: Make stf barrier PPC_BOOK3S_64 specific Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 22/76] powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 23/76] powerpc/64: Call setup_barrier_nospec() from setup_arch() Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 24/76] powerpc/64: Make meltdown reporting Book3S 64 specific Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 25/76] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 26/76] powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 27/76] powerpc/asm: Add a patch_site macro & helpers for patching instructions Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 28/76] powerpc/64s: Add new security feature flags for count cache flush Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 29/76] powerpc/64s: Add support for software " Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 30/76] powerpc/pseries: Query hypervisor for count cache flush settings Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 31/76] powerpc/powernv: Query firmware " Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 32/76] powerpc/fsl: Add infrastructure to fixup branch predictor flush Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 33/76] powerpc/fsl: Add macro to flush the branch predictor Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 34/76] powerpc/fsl: Fix spectre_v2 mitigations reporting Greg Kroah-Hartman
2019-04-15 18:43 ` [PATCH 4.9 35/76] powerpc/fsl: Emulate SPRN_BUCSR register Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 36/76] powerpc/fsl: Add nospectre_v2 command line argument Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 37/76] powerpc/fsl: Flush the branch predictor at each kernel entry (64bit) Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 38/76] powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit) Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 39/76] powerpc/fsl: Flush branch predictor when entering KVM Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 40/76] powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 41/76] powerpc/fsl: Update Spectre v2 reporting Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 42/76] powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 43/76] powerpc/fsl: Fix the flush of branch predictor Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 44/76] powerpc/security: Fix spectre_v2 reporting Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 45/76] arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 46/76] tty: mark Siemens R3964 line discipline as BROKEN Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 47/76] tty: ldisc: add sysctl to prevent autoloading of ldiscs Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 48/76] ipv6: Fix dangling pointer when ipv6 fragment Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 49/76] ipv6: sit: reset ip header pointer in ipip6_rcv Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 50/76] kcm: switch order of device registration to fix a crash Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 51/76] net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock() Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 52/76] openvswitch: fix flow actions reallocation Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 53/76] qmi_wwan: add Olicard 600 Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 54/76] sctp: initialize _pad of sockaddr_in before copying to user memory Greg Kroah-Hartman
2019-04-15 18:44 ` Greg Kroah-Hartman [this message]
2019-04-15 18:44 ` [PATCH 4.9 56/76] vrf: check accept_source_route on the original netdevice Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 57/76] bnxt_en: Reset device on RX buffer errors Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 58/76] bnxt_en: Improve RX consumer index validity check Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 59/76] net/mlx5e: Add a lock on tir list Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 60/76] netns: provide pure entropy for net_hash_mix() Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 61/76] net: ethtool: not call vzalloc for zero sized memory request Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 62/76] ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 63/76] ALSA: seq: Fix OOB-reads from strlcpy Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 64/76] parisc: Detect QEMU earlier in boot process Greg Kroah-Hartman
2019-04-16 13:49   ` Helge Deller
2019-04-16 13:50   ` Helge Deller
2019-04-16 14:23     ` Greg Kroah-Hartman
2019-04-16 15:00       ` Helge Deller
2019-04-16 18:03         ` Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 65/76] include/linux/bitrev.h: fix constant bitrev Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 66/76] ASoC: fsl_esai: fix channel swap issue when stream starts Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 67/76] Btrfs: do not allow trimming when a fs is mounted with the nologreplay option Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 68/76] block: do not leak memory in bio_copy_user_iov() Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 69/76] genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent() Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 70/76] virtio: Honour may_reduce_num in vring_create_virtqueue Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 71/76] ARM: dts: at91: Fix typo in ISC_D0 on PC9 Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 72/76] arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value Greg Kroah-Hartman
2019-04-15 22:01   ` Nathan Chancellor
2019-04-16  9:00     ` Greg Kroah-Hartman
2019-04-16 16:47       ` Nathan Chancellor
2019-04-17  6:15         ` Greg Kroah-Hartman
2019-04-17  6:41           ` Nathan Chancellor
2019-04-17  6:47             ` Greg Kroah-Hartman
2019-04-16  9:13     ` Will Deacon
2019-04-16 16:49       ` Nathan Chancellor
2019-04-15 18:44 ` [PATCH 4.9 73/76] xen: Prevent buffer overflow in privcmd ioctl Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 74/76] sched/fair: Do not re-read ->h_load_next during hierarchical load calculation Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 75/76] xtensa: fix return_address Greg Kroah-Hartman
2019-04-15 18:44 ` [PATCH 4.9 76/76] PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller Greg Kroah-Hartman
2019-04-16  0:44 ` [PATCH 4.9 00/76] 4.9.169-stable review kernelci.org bot
2019-04-16  4:06 ` Naresh Kamboju
2019-04-16 10:33 ` Jon Hunter
2019-04-16 13:12 ` Guenter Roeck
2019-04-16 16:29 ` Guenter Roeck
2019-04-16 18:46   ` Greg Kroah-Hartman
2019-04-16 20:19     ` Guenter Roeck
2019-04-16 20:27       ` Greg Kroah-Hartman
2019-04-16 21:39 ` shuah
2019-04-16 22:05 ` Bharath Vedartham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190415183723.537531204@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=agshew@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=brakmo@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=glenn.judd@morganstanley.com \
    --cc=koen.de_schepper@nokia-bell-labs.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=olivier.tilmans@nokia-bell-labs.com \
    --cc=research@bobbriscoe.net \
    --cc=stable@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).