Stable Archive on lore.kernel.org
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>,
	Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>,
	Bob Briscoe <research@bobbriscoe.net>,
	Lawrence Brakmo <brakmo@fb.com>, Florian Westphal <fw@strlen.de>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrew Shewmaker <agshew@gmail.com>,
	Glenn Judd <glenn.judd@morganstanley.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.14 26/69] tcp: Ensure DCTCP reacts to losses
Date: Mon, 15 Apr 2019 20:58:44 +0200
Message-ID: <20190415183731.114640169@linuxfoundation.org>
In-Reply-To: <20190415183726.036654568@linuxfoundation.org>

From: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>

[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]

RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss module
parameter resets alpha to its maximal value if an RTO happens. This
behavior is sub-optimal for at least two reasons: i) it ignores losses
that trigger fast retransmissions; and ii) if the loss was isolated,
it causes unnecessarily large cwnd reductions in the future, because it
resets the historical term of DCTCP's alpha EWMA to its maximal value
(i.e., denoting total congestion). The second reason has an especially
noticeable effect when using DCTCP in high-BDP environments, where
alpha normally stays at low values.
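To make reason ii) concrete, the following user-space sketch reuses
the fixed-point arithmetic of the 4.14 DCTCP code as we understand it:
alpha is scaled so that 1024 (DCTCP_MAX_ALPHA) means 1.0, each RTT
without CE marks decays it by g = 1/16 (the default dctcp_shift_g of
4), and the ECN-driven ssthresh is cwnd * (1 - alpha/2). The function
names and the low-alpha starting point are hypothetical; this is an
illustration, not the kernel code.

```c
#define DCTCP_MAX_ALPHA 1024U /* fixed point: 1024 represents alpha = 1.0 */
#define SHIFT_G 4U            /* g = 1/2^4, assumed module default */

/* Hypothetical helper: one RTT with no CE marks, alpha <- (1 - g) * alpha.
 * When alpha >> g rounds to 0, subtract alpha itself so small values
 * reach 0 (the kernel uses min_not_zero() for the same effect). */
unsigned int alpha_decay(unsigned int alpha)
{
	unsigned int step = alpha >> SHIFT_G;

	if (step == 0)
		step = alpha;
	return alpha - step;
}

/* Hypothetical helper: DCTCP's ECN reaction cwnd * (1 - alpha / 2),
 * computed as cwnd - cwnd * alpha / 2048 and floored at 2 segments. */
unsigned int dctcp_ssthresh_sketch(unsigned int cwnd, unsigned int alpha)
{
	unsigned int reduced = cwnd - ((cwnd * alpha) >> 11U);

	return reduced > 2U ? reduced : 2U;
}

/* Hypothetical helper: count unmarked RTTs until alpha, starting from
 * 'alpha', has decayed to 'target' or below. */
unsigned int rtts_to_recover(unsigned int alpha, unsigned int target)
{
	unsigned int rtts = 0;

	while (alpha > target) {
		alpha = alpha_decay(alpha);
		rtts++;
	}
	return rtts;
}
```

For example, with cwnd = 100 a low alpha of 32 gives
dctcp_ssthresh_sketch(100, 32) == 99, whereas right after a clamp
dctcp_ssthresh_sketch(100, DCTCP_MAX_ALPHA) == 50, and
rtts_to_recover(DCTCP_MAX_ALPHA, 32) shows that dozens of unmarked
RTTs pass before alpha is back at its low value.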

This patch replaces the alpha clamping with a conventional loss
reaction: ssthresh is set to half of cwnd on both fast retransmissions
and RTOs, at most once per RTT. Consequently, the
dctcp_clamp_alpha_on_loss module parameter has been removed.
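The new rule can be sketched outside the kernel as follows. The struct
and enum below are simplified, hypothetical stand-ins for struct
tcp_sock and the TCP_CA_* states; only the halving expression, the
loss_cwnd bookkeeping and the state-transition check mirror the patch.
Entering Recovery from another state triggers one reduction, staying
in Recovery triggers none, and an RTO (CA_EVENT_LOSS in the patch)
triggers one.

```c
/* Hypothetical, simplified stand-ins for the kernel's CA states. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical per-flow state, loosely modeled on struct tcp_sock. */
struct flow {
	unsigned int snd_cwnd;
	unsigned int snd_ssthresh;
	unsigned int loss_cwnd; /* cwnd remembered at the time of the loss */
	enum ca_state state;
};

/* Conventional reaction: ssthresh = max(cwnd / 2, 2). */
void react_to_loss(struct flow *f)
{
	unsigned int half = f->snd_cwnd >> 1U;

	f->loss_cwnd = f->snd_cwnd;
	f->snd_ssthresh = half > 2U ? half : 2U;
}

/* Fast-retransmit path: react only on the transition INTO Recovery,
 * so a recovery episode covering several losses reduces once. */
void on_state_change(struct flow *f, enum ca_state new_state)
{
	if (new_state == CA_RECOVERY && new_state != f->state)
		react_to_loss(f);
	f->state = new_state;
}

/* RTO path: in this sketch the caller is assumed to invoke this once
 * per timeout episode, as the comment in the patch describes for
 * CA_EVENT_LOSS. */
void on_rto(struct flow *f)
{
	react_to_loss(f);
	f->state = CA_LOSS;
}
```

Because the check compares the new state with the current one, a
recovery episode that repairs several lost segments still lowers
ssthresh only once.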

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                          |  Link   |   RTT    |    Drop
                 TCP CC   |  speed  | base+AQM | probability
        ==================|=========|==========|============
                    CUBIC |  40Mbps |  7+20ms  |    0.21%
                     RENO |         |          |    0.19%
        DCTCP-CLAMP-ALPHA |         |          |   25.80%
         DCTCP-HALVE-CWND |         |          |    0.22%
        ------------------|---------|----------|------------
                    CUBIC | 100Mbps |  7+20ms  |    0.03%
                     RENO |         |          |    0.02%
        DCTCP-CLAMP-ALPHA |         |          |   23.30%
         DCTCP-HALVE-CWND |         |          |    0.04%
        ------------------|---------|----------|------------
                    CUBIC | 800Mbps |   1+1ms  |    0.04%
                     RENO |         |          |    0.05%
        DCTCP-CLAMP-ALPHA |         |          |   18.70%
         DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all sources of loss,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all sources of loss, the AQM can control
it using drop levels similar to those of cubic or reno.

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_dctcp.c |   36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -66,11 +66,6 @@ static unsigned int dctcp_alpha_on_init
 module_param(dctcp_alpha_on_init, uint, 0644);
 MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
 
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
-module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
-MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
-		 "parameter for clamping alpha on loss");
-
 static struct tcp_congestion_ops dctcp_reno;
 
 static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
@@ -211,21 +206,23 @@ static void dctcp_update_alpha(struct so
 	}
 }
 
-static void dctcp_state(struct sock *sk, u8 new_state)
+static void dctcp_react_to_loss(struct sock *sk)
 {
-	if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
-		struct dctcp *ca = inet_csk_ca(sk);
+	struct dctcp *ca = inet_csk_ca(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 
-		/* If this extension is enabled, we clamp dctcp_alpha to
-		 * max on packet loss; the motivation is that dctcp_alpha
-		 * is an indicator to the extend of congestion and packet
-		 * loss is an indicator of extreme congestion; setting
-		 * this in practice turned out to be beneficial, and
-		 * effectively assumes total congestion which reduces the
-		 * window by half.
-		 */
-		ca->dctcp_alpha = DCTCP_MAX_ALPHA;
-	}
+	ca->loss_cwnd = tp->snd_cwnd;
+	tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
+}
+
+static void dctcp_state(struct sock *sk, u8 new_state)
+{
+	if (new_state == TCP_CA_Recovery &&
+	    new_state != inet_csk(sk)->icsk_ca_state)
+		dctcp_react_to_loss(sk);
+	/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
+	 * one loss-adjustment per RTT.
+	 */
 }
 
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
@@ -237,6 +234,9 @@ static void dctcp_cwnd_event(struct sock
 	case CA_EVENT_ECN_NO_CE:
 		dctcp_ce_state_1_to_0(sk);
 		break;
+	case CA_EVENT_LOSS:
+		dctcp_react_to_loss(sk);
+		break;
 	default:
 		/* Don't care for the rest. */
 		break;




Thread overview: 88+ messages
2019-04-15 18:58 [PATCH 4.14 00/69] 4.14.112-stable review Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 01/69] net: sfp: move sfp_register_socket call from sfp_remove to sfp_probe Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 02/69] x86/power: Fix some ordering bugs in __restore_processor_context() Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 03/69] x86/power/64: Use struct desc_ptr for the IDT in struct saved_context Greg Kroah-Hartman
2019-04-15 20:07   ` Pavel Machek
2019-04-15 18:58 ` [PATCH 4.14 04/69] x86/power/32: Move SYSENTER MSR restoration to fix_processor_context() Greg Kroah-Hartman
2019-04-15 20:07   ` Pavel Machek
2019-04-15 22:07     ` Sasha Levin
2019-04-15 18:58 ` [PATCH 4.14 05/69] x86/power: Make restore_processor_context() sane Greg Kroah-Hartman
2019-04-15 20:04   ` Pavel Machek
2019-04-15 20:18     ` Andy Lutomirski
2019-04-16 11:56       ` Pavel Machek
2019-04-16 13:22         ` Sasha Levin
2019-04-15 18:58 ` [PATCH 4.14 06/69] drm/i915/gvt: do not let pin count of shadow mm go negative Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 07/69] powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 08/69] kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 09/69] x86: vdso: Use $LD instead of $CC to link Greg Kroah-Hartman
2019-04-26 11:41   ` Rantala, Tommi T. (Nokia - FI/Espoo)
2019-04-26 12:48     ` Nathan Chancellor
2019-04-26 13:23       ` Rantala, Tommi T. (Nokia - FI/Espoo)
2019-04-26 13:34         ` Nathan Chancellor
2019-04-27 13:54           ` gregkh
2019-04-15 18:58 ` [PATCH 4.14 10/69] x86/vdso: Drop implicit common-page-size linker flag Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 11/69] lib/string.c: implement a basic bcmp Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 12/69] stating: ccree: revert "staging: ccree: fix leak of import() after init()" Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 13/69] arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 14/69] tty: mark Siemens R3964 line discipline as BROKEN Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 15/69] tty: ldisc: add sysctl to prevent autoloading of ldiscs Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 16/69] ipv6: Fix dangling pointer when ipv6 fragment Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 17/69] ipv6: sit: reset ip header pointer in ipip6_rcv Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 18/69] kcm: switch order of device registration to fix a crash Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 19/69] net-gro: Fix GRO flush when receiving a GSO packet Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 20/69] net/mlx5: Decrease default mr cache size Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 21/69] net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock() Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 22/69] net/sched: fix ->get helper of the matchall cls Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 23/69] openvswitch: fix flow actions reallocation Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 24/69] qmi_wwan: add Olicard 600 Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 25/69] sctp: initialize _pad of sockaddr_in before copying to user memory Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 26/69] tcp: Ensure DCTCP reacts to losses Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 27/69] vrf: check accept_source_route on the original netdevice Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 28/69] net/mlx5e: Fix error handling when refreshing TIRs Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 29/69] net/mlx5e: Add a lock on tir list Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 30/69] nfp: validate the return code from dev_queue_xmit() Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 31/69] bnxt_en: Improve RX consumer index validity check Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 32/69] bnxt_en: Reset device on RX buffer errors Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 33/69] net/sched: act_sample: fix divide by zero in the traffic path Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 34/69] netns: provide pure entropy for net_hash_mix() Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 35/69] net: ethtool: not call vzalloc for zero sized memory request Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 36/69] ALSA: seq: Fix OOB-reads from strlcpy Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 37/69] ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 38/69] hv_netvsc: Fix unwanted wakeup after tx_disable Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 39/69] arm64: dts: rockchip: fix rk3328 sdmmc0 write errors Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 40/69] parisc: Detect QEMU earlier in boot process Greg Kroah-Hartman
2019-04-15 18:58 ` [PATCH 4.14 41/69] parisc: regs_return_value() should return gpr28 Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 42/69] alarmtimer: Return correct remaining time Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 43/69] drm/udl: add a release method and delay modeset teardown Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 44/69] include/linux/bitrev.h: fix constant bitrev Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 45/69] ASoC: fsl_esai: fix channel swap issue when stream starts Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 46/69] Btrfs: do not allow trimming when a fs is mounted with the nologreplay option Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 47/69] btrfs: prop: fix zstd compression parameter validation Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 48/69] btrfs: prop: fix vanished compression property after failed set Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 49/69] block: do not leak memory in bio_copy_user_iov() Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 50/69] block: fix the return errno for direct IO Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 51/69] genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent() Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 52/69] genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 53/69] virtio: Honour may_reduce_num in vring_create_virtqueue Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 54/69] ARM: dts: am335x-evmsk: Correct the regulators for the audio codec Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 55/69] ARM: dts: am335x-evm: " Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 56/69] ARM: dts: at91: Fix typo in ISC_D0 on PC9 Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 57/69] arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 58/69] arm64: dts: rockchip: fix rk3328 rgmii high tx error rate Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 59/69] arm64: backtrace: Dont bother trying to unwind the userspace stack Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 60/69] xen: Prevent buffer overflow in privcmd ioctl Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 61/69] sched/fair: Do not re-read ->h_load_next during hierarchical load calculation Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 62/69] xtensa: fix return_address Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 63/69] x86/perf/amd: Resolve race condition when disabling PMC Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 64/69] x86/perf/amd: Resolve NMI latency issues for active PMCs Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 65/69] x86/perf/amd: Remove need to check "running" bit in NMI handler Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 66/69] PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 67/69] dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 68/69] arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64 Greg Kroah-Hartman
2019-04-15 18:59 ` [PATCH 4.14 69/69] arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity " Greg Kroah-Hartman
2019-04-16  0:04 ` [PATCH 4.14 00/69] 4.14.112-stable review kernelci.org bot
2019-04-16  4:29 ` Naresh Kamboju
2019-04-16 10:33 ` Jon Hunter
2019-04-16 16:30 ` Guenter Roeck
2019-04-16 21:40 ` shuah
2019-04-16 22:17 ` Bharath Vedartham

