All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Larry Brakmo <brakmo@fb.com>,
	Yuchung Cheng <ycheng@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.4 05/23] tcp: fix dctcp delayed ACK schedule
Date: Fri, 27 Jul 2018 12:09:06 +0200	[thread overview]
Message-ID: <20180727100845.383929376@linuxfoundation.org> (raw)
In-Reply-To: <20180727100845.054377089@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yuchung Cheng <ycheng@google.com>

[ Upstream commit b0c05d0e99d98d7f0cd41efc1eeec94efdc3325d ]

Previously, when a data segment was sent an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state ca->delayed_ack_reserved. Previously it may accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries a older ACK sequence. This inconsistency would
lead to DCTCP receiver never acknowledging the latest data until the
sender times out and retry in some cases.

Packetdrill script (provided by Larry Brakmo)

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
+0 > [ect01] . 4:4(0) ack 6501     // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Reported-by: Larry Brakmo <brakmo@fb.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_dctcp.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -134,7 +134,8 @@ static void dctcp_ce_state_0_to_1(struct
 	/* State has changed from CE=0 to CE=1 and delayed
 	 * ACK has not sent yet.
 	 */
-	if (!ca->ce_state && ca->delayed_ack_reserved) {
+	if (!ca->ce_state &&
+	    inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
 		u32 tmp_rcv_nxt;
 
 		/* Save current rcv_nxt. */
@@ -164,7 +165,8 @@ static void dctcp_ce_state_1_to_0(struct
 	/* State has changed from CE=1 to CE=0 and delayed
 	 * ACK has not sent yet.
 	 */
-	if (ca->ce_state && ca->delayed_ack_reserved) {
+	if (ca->ce_state &&
+	    inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
 		u32 tmp_rcv_nxt;
 
 		/* Save current rcv_nxt. */



  parent reply	other threads:[~2018-07-27 10:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-27 10:09 [PATCH 4.4 00/23] 4.4.145-stable review Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 01/23] MIPS: ath79: fix register address in ath79_ddr_wb_flush() Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 02/23] ip: hash fragments consistently Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 03/23] net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 04/23] rtnetlink: add rtnl_link_state check in rtnl_configure_link Greg Kroah-Hartman
2018-07-27 10:09 ` Greg Kroah-Hartman [this message]
2018-07-27 10:09 ` [PATCH 4.4 06/23] tcp: helpers to send special DCTCP ack Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 07/23] tcp: do not cancel delay-AcK on DCTCP special ACK Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 08/23] tcp: do not delay ACK in DCTCP upon CE status change Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 09/23] tcp: avoid collapses in tcp_prune_queue() if possible Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 10/23] tcp: detect malicious patterns in tcp_collapse_ofo_queue() Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 11/23] ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 12/23] usb: cdc_acm: Add quirk for Castles VEGA3000 Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 13/23] usb: core: handle hub C_PORT_OVER_CURRENT condition Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 14/23] usb: gadget: f_fs: Only return delayed status when len is 0 Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 15/23] driver core: Partially revert "driver core: correct devices shutdown order" Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 16/23] can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 17/23] can: xilinx_can: fix recovery from error states not being propagated Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 18/23] can: xilinx_can: fix device dropping off bus on RX overrun Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 19/23] can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 20/23] can: xilinx_can: fix incorrect clear of non-processed interrupts Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 21/23] can: xilinx_can: fix RX overflow interrupt not being enabled Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 22/23] turn off -Wattribute-alias Greg Kroah-Hartman
2018-07-27 10:09 ` [PATCH 4.4 23/23] ARM: fix put_user() for gcc-8 Greg Kroah-Hartman
2018-07-27 12:22 ` [PATCH 4.4 00/23] 4.4.145-stable review Nathan Chancellor
2018-07-27 17:28 ` Guenter Roeck
2018-07-27 20:06 ` Shuah Khan
2018-07-28  6:55 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180727100845.383929376@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=brakmo@fb.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=stable@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.