All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version
@ 2007-02-19 11:37 Ilpo Järvinen
  2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Hi,

Here is a set of patches that fix most of the flaws the current FRTO
implementation (specified in RFC4138) has, besides that, the last two
patches add SACK-enhanced FRTO (not enabled unless frto sysctl is set
to 2, which allows using of the basic version also with SACK). There
are some depencies to the earlier patches in the set (hard to list
all thoughts I've had, but not all combinations are not good ones
even if they apply cleanly).

 Documentation/networking/ip-sysctl.txt |    5 -
 include/net/tcp.h                      |   14 --
 net/ipv4/tcp_input.c                   |  265 ++++++++++++++++++++++++++------
 3 files changed, 221 insertions(+), 63 deletions(-)

(At least) one interpretation issue exists, see patch "FRTO: Entry is
allowed only during (New)Reno like recovery".

Besides that, these things should/could be solved (later on):
	- Setting undo_marker when RTO is not spurious (FRTO has been
	  clearing it, which disabled DSACK undos for conventional
	  recovery).
	- Interaction with Eifel
	- Different response (new sysctl to select them?)
	- When cumulative ACK arrives to the frto_highseq during FRTO,
	  it could be useful to go directly to CA_Open because then
	  duplicate ACKs for that segment could then be used initiate
	  recovery if it was lost. Most of the time, the duplicate ACKs
	  won't be false ones (we might have made too many unnecessary
	  retransmission but that's less likely with FRTO and it could
	  be consider while making state decision).
	- Maybe the frto_highmark should be reset somewhere during a
	  connection due to wrapping of seqnos (reord adjustment relies
	  on it having a valid after relation...)?
	- tcp_use_frto and tcp_enter_loss now both scan skb list from
	  the beginning, it might be possible to take advantage of this
	  either by combining them or by passing skb from use_frto
	  iteration to tcp_enter_loss.

I did some tests with FACK + SACK FRTO, results seemed to be correct but
the conservative response had really poor performance. I'm more familiar
with more aggressive response time-seq graphs and I was really wondering
does this thing really work at all (in couple of cases), but yes, I found
after tracing that it worked although the results was not a very good
looking one due to interaction with rate halving, maybe a "rate-halving
aware" response could do much better (or alternatively one that does more
aggressive undo).

# Test 1: normal TCP
# Test 2: spurious RTO
# Test 3: drop the segment
# Test 4: drop a delayed segment
# Test 5: drop the next segment
# Test 6: drop in window segment
# Test 7: drop the segment and the next segment
# Test 8: drop the segment and in window segment
# Test 9: delay the first and next (spurious RTOs, for different segments)
# Test 10: delay the first excessively (two spurious RTOs)
# Test n+1: drop rexmission
# Test n+2: delay rexmission (spurious RTO also after frto_highmark)
# Test n+3: delay rexmission (spurious RTO also after highmark), drop RTO seg
# Test n+4: drop the segment and rexmit
# Test n+5: drop the segment and first new data
# Test n+6: drop the segment and second new data

The tests were run in 2.6.18, I have quite a lot of own modifications
included in but they were disable using sysctls except for a change in
mark_head_lost: if condition from !TAGBITS -> !(TAGBITS & ~SACKED_RETRANS)
but afaict, it shouldn't affect, and if it does, it should be included
(if you received this mail from previous send attempt, I claimed by a
mistakenly that SACKED_ACKED was the bit that was excluded and had
incorrect parenthesized it here). I couldn't come up with a scenario in
mainline only code where SACKED_RETRANS would be set for a skb when LOST
has not been set, except for the head by FRTO itself which will not be a
problem. I have checked that the FRTO parts used in tests were identical
to the result of this patchset. Compile tested againts the net-2.6 (also
intermediate steps).


-- 
 i.

ps. I'm sorry if you receive these twice, the previous attempted had some
charset problems and was rejected at least by netdev.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit
  2007-02-19 11:37 [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version Ilpo Järvinen
@ 2007-02-19 11:37 ` Ilpo Järvinen
  2007-02-19 11:37   ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

FRTO was slightly too brave... Should only clear
TCPCB_SACKED_RETRANS bit.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1a14191..b21e232 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1266,7 +1266,7 @@ void tcp_enter_frto(struct sock *sk)
 	tp->undo_retrans = 0;
 
 	sk_stream_for_retrans_queue(skb, sk) {
-		TCP_SKB_CB(skb)->sacked &= ~TCPCB_RETRANS;
+		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
 	}
 	tcp_sync_left_out(tp);
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm
  2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen
@ 2007-02-19 11:37   ` Ilpo Järvinen
  2007-02-19 11:37     ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

FRTO spurious RTO detection algorithm (RFC4138) does not include response
to a detected spurious RTO but can use different response algorithms.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b21e232..c5be3d0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2467,6 +2467,15 @@ static int tcp_ack_update_window(struct 
 	return flag;
 }
 
+/* A very conservative spurious RTO response algorithm: reduce cwnd and
+ * continue in congestion avoidance.
+ */
+static void tcp_conservative_spur_to_response(struct tcp_sock *tp)
+{
+	tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
+	tcp_moderate_cwnd(tp);
+}
+
 static void tcp_process_frto(struct sock *sk, u32 prior_snd_una)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -2488,12 +2497,7 @@ static void tcp_process_frto(struct sock
 		 */
 		tp->snd_cwnd = tcp_packets_in_flight(tp) + 2;
 	} else {
-		/* Also the second ACK after RTO advances the window.
-		 * The RTO was likely spurious. Reduce cwnd and continue
-		 * in congestion avoidance
-		 */
-		tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
-		tcp_moderate_cwnd(tp);
+		tcp_conservative_spur_to_response(tp);
 	}
 
 	/* F-RTO affects on two new ACKs following RTO.
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c
  2007-02-19 11:37   ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen
@ 2007-02-19 11:37     ` Ilpo Järvinen
  2007-02-19 11:37       ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

In addition, removed inline.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h    |   14 +-------------
 net/ipv4/tcp_input.c |   13 +++++++++++++
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5c472f2..572a77b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -341,6 +341,7 @@ extern struct sock *		tcp_check_req(stru
 extern int			tcp_child_process(struct sock *parent,
 						  struct sock *child,
 						  struct sk_buff *skb);
+extern int			tcp_use_frto(const struct sock *sk);
 extern void			tcp_enter_frto(struct sock *sk);
 extern void			tcp_enter_loss(struct sock *sk, int how);
 extern void			tcp_clear_retrans(struct tcp_sock *tp);
@@ -1033,19 +1034,6 @@ static inline int tcp_paws_check(const s
 
 #define TCP_CHECK_TIMER(sk) do { } while (0)
 
-static inline int tcp_use_frto(const struct sock *sk)
-{
-	const struct tcp_sock *tp = tcp_sk(sk);
-	
-	/* F-RTO must be activated in sysctl and there must be some
-	 * unsent new data, and the advertised window should allow
-	 * sending it.
-	 */
-	return (sysctl_tcp_frto && sk->sk_send_head &&
-		!after(TCP_SKB_CB(sk->sk_send_head)->end_seq,
-		       tp->snd_una + tp->snd_wnd));
-}
-
 static inline void tcp_mib_init(void)
 {
 	/* See RFC 2012 */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5be3d0..294cb44 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1236,6 +1236,19 @@ #endif
 	return flag;
 }
 
+int tcp_use_frto(const struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	
+	/* F-RTO must be activated in sysctl and there must be some
+	 * unsent new data, and the advertised window should allow
+	 * sending it.
+	 */
+	return (sysctl_tcp_frto && sk->sk_send_head &&
+		!after(TCP_SKB_CB(sk->sk_send_head)->end_seq,
+		       tp->snd_una + tp->snd_wnd));
+}
+
 /* RTO occurred, but do not yet enter loss state. Instead, transmit two new
  * segments to see from the next ACKs whether any data was really missing.
  * If the RTO was spurious, new ACKs should arrive.
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement
  2007-02-19 11:37     ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen
@ 2007-02-19 11:37       ` Ilpo Järvinen
  2007-02-19 11:37         ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Moved comments out from the body of process_frto() to the head
(preferred way; see Documentation/CodingStyle). Bonus: it's much
easier to read in this compacted form.

FRTO algorithm and implementation is described in greater detail.
For interested reader, more information is available in RFC4138.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   49 ++++++++++++++++++++++++++++++++-----------------
 1 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 294cb44..f645c3e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1236,22 +1236,22 @@ #endif
 	return flag;
 }
 
+/* F-RTO can only be used if these conditions are satisfied:
+ *  - there must be some unsent new data
+ *  - the advertised window should allow sending it
+ */
 int tcp_use_frto(const struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	
-	/* F-RTO must be activated in sysctl and there must be some
-	 * unsent new data, and the advertised window should allow
-	 * sending it.
-	 */
 	return (sysctl_tcp_frto && sk->sk_send_head &&
 		!after(TCP_SKB_CB(sk->sk_send_head)->end_seq,
 		       tp->snd_una + tp->snd_wnd));
 }
 
-/* RTO occurred, but do not yet enter loss state. Instead, transmit two new
- * segments to see from the next ACKs whether any data was really missing.
- * If the RTO was spurious, new ACKs should arrive.
+/* RTO occurred, but do not yet enter Loss state. Instead, defer RTO
+ * recovery a bit and use heuristics in tcp_process_frto() to detect if
+ * the RTO was spurious.
  */
 void tcp_enter_frto(struct sock *sk)
 {
@@ -2489,6 +2489,30 @@ static void tcp_conservative_spur_to_res
 	tcp_moderate_cwnd(tp);
 }
 
+/* F-RTO spurious RTO detection algorithm (RFC4138)
+ *
+ * F-RTO affects during two new ACKs following RTO. State (ACK number) is kept
+ * in frto_counter. When ACK advances window (but not to or beyond highest
+ * sequence sent before RTO):
+ *   On First ACK,  send two new segments out.
+ *   On Second ACK, RTO was likely spurious. Do spurious response (response
+ *                  algorithm is not part of the F-RTO detection algorithm
+ *                  given in RFC4138 but can be selected separately).
+ * Otherwise (basically on duplicate ACK), RTO was (likely) caused by a loss
+ * and TCP falls back to conventional RTO recovery.
+ *
+ * Rationale: if the RTO was spurious, new ACKs should arrive from the
+ * original window even after we transmit two new data segments.
+ *
+ * F-RTO is implemented (mainly) in four functions:
+ *   - tcp_use_frto() is used to determine if TCP is can use F-RTO
+ *   - tcp_enter_frto() prepares TCP state on RTO if F-RTO is used, it is
+ *     called when tcp_use_frto() showed green light
+ *   - tcp_process_frto() handles incoming ACKs during F-RTO algorithm
+ *   - tcp_enter_frto_loss() is called if there is not enough evidence
+ *     to prove that the RTO is indeed spurious. It transfers the control
+ *     from F-RTO to the conventional RTO recovery
+ */
 static void tcp_process_frto(struct sock *sk, u32 prior_snd_una)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -2497,25 +2521,16 @@ static void tcp_process_frto(struct sock
 
 	if (tp->snd_una == prior_snd_una ||
 	    !before(tp->snd_una, tp->frto_highmark)) {
-		/* RTO was caused by loss, start retransmitting in
-		 * go-back-N slow start
-		 */
 		tcp_enter_frto_loss(sk);
 		return;
 	}
 
 	if (tp->frto_counter == 1) {
-		/* First ACK after RTO advances the window: allow two new
-		 * segments out.
-		 */
 		tp->snd_cwnd = tcp_packets_in_flight(tp) + 2;
-	} else {
+	} else /* frto_counter == 2 */ {
 		tcp_conservative_spur_to_response(tp);
 	}
 
-	/* F-RTO affects on two new ACKs following RTO.
-	 * At latest on third ACK the TCP behavior is back to normal.
-	 */
 	tp->frto_counter = (tp->frto_counter + 1) % 3;
 }
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh
  2007-02-19 11:37       ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen
@ 2007-02-19 11:37         ` Ilpo Järvinen
  2007-02-19 11:38           ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

In case a latency spike causes more than one RTO, the later should not
cause the already reduced ssthresh to propagate into the prior_ssthresh
since FRTO declares all such RTOs spurious at once or none of them. In
treating of ssthresh, we mimic what tcp_enter_loss() does.

The previous state (in frto_counter) must be available until we have
checked it in tcp_enter_frto(), and also ACK information flag in
process_frto().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f645c3e..c846beb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1252,6 +1252,10 @@ int tcp_use_frto(const struct sock *sk)
 /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO
  * recovery a bit and use heuristics in tcp_process_frto() to detect if
  * the RTO was spurious.
+ *
+ * Do like tcp_enter_loss() would; when RTO expires the second time it
+ * does:
+ *  "Reduce ssthresh if it has not yet been made inside this window."
  */
 void tcp_enter_frto(struct sock *sk)
 {
@@ -1259,11 +1263,10 @@ void tcp_enter_frto(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 
-	tp->frto_counter = 1;
-
-	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
+	if ((!tp->frto_counter && icsk->icsk_ca_state <= TCP_CA_Disorder) ||
 	    tp->snd_una == tp->high_seq ||
-	    (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
+	    ((icsk->icsk_ca_state == TCP_CA_Loss || tp->frto_counter) &&
+	     !icsk->icsk_retransmits)) {
 		tp->prior_ssthresh = tcp_current_ssthresh(sk);
 		tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
 		tcp_ca_event(sk, CA_EVENT_FRTO);
@@ -1285,6 +1288,7 @@ void tcp_enter_frto(struct sock *sk)
 
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tp->frto_highmark = tp->snd_nxt;
+	tp->frto_counter = 1;
 }
 
 /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO,
@@ -2513,12 +2517,16 @@ static void tcp_conservative_spur_to_res
  *     to prove that the RTO is indeed spurious. It transfers the control
  *     from F-RTO to the conventional RTO recovery
  */
-static void tcp_process_frto(struct sock *sk, u32 prior_snd_una)
+static void tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	tcp_sync_left_out(tp);
 
+	/* Duplicate the behavior from Loss state (fastretrans_alert) */
+	if (flag&FLAG_DATA_ACKED)
+		inet_csk(sk)->icsk_retransmits = 0;
+
 	if (tp->snd_una == prior_snd_una ||
 	    !before(tp->snd_una, tp->frto_highmark)) {
 		tcp_enter_frto_loss(sk);
@@ -2607,7 +2615,7 @@ static int tcp_ack(struct sock *sk, stru
 	flag |= tcp_clean_rtx_queue(sk, &seq_rtt);
 
 	if (tp->frto_counter)
-		tcp_process_frto(sk, prior_snd_una);
+		tcp_process_frto(sk, prior_snd_una, flag);
 
 	if (tcp_ack_is_dubious(sk, flag)) {
 		/* Advance CWND, if state allows this. */
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open
  2007-02-19 11:37         ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen
@ 2007-02-19 11:38           ` Ilpo Järvinen
  2007-02-19 11:38             ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Retransmission counter assumptions are to be changed. Forcing
reason to do this exist: Using sysctl in check would be racy
as soon as FRTO starts to ignore some ACKs (doing that in the
following patches). Userspace may disable it at any moment
giving nice oops if timing is right. frto_counter would be
inaccessible from userspace, but with SACK enhanced FRTO
retrans_out can include other than head, and possibly leaving
it non-zero after spurious RTO, boom again.

Luckily, solution seems rather simple: never go directly to Open
state but use Disorder instead. This does not really change much,
since TCP could anyway change its state to Disorder during FRTO
using path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when
a SACK block makes ACK dubious). Besides, Disorder seems to be
the state where TCP should be if not recovering (in Recovery or
Loss state) while having some retransmissions in-flight (see
tcp_try_to_open), which is exactly what happens with FRTO.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c846beb..d1e731f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1286,7 +1286,8 @@ void tcp_enter_frto(struct sock *sk)
 	}
 	tcp_sync_left_out(tp);
 
-	tcp_set_ca_state(sk, TCP_CA_Open);
+	tcp_set_ca_state(sk, TCP_CA_Disorder);
+	tp->high_seq = tp->snd_nxt;
 	tp->frto_highmark = tp->snd_nxt;
 	tp->frto_counter = 1;
 }
@@ -2014,8 +2015,7 @@ tcp_fastretrans_alert(struct sock *sk, u
 	/* E. Check state exit conditions. State can be terminated
 	 *    when high_seq is ACKed. */
 	if (icsk->icsk_ca_state == TCP_CA_Open) {
-		if (!sysctl_tcp_frto)
-			BUG_TRAP(tp->retrans_out == 0);
+		BUG_TRAP(tp->retrans_out == 0);
 		tp->retrans_stamp = 0;
 	} else if (!before(tp->snd_una, tp->high_seq)) {
 		switch (icsk->icsk_ca_state) {
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs
  2007-02-19 11:38           ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen
@ 2007-02-19 11:38             ` Ilpo Järvinen
  2007-02-19 11:38               ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Handles RFC4138 shortcoming (in step 2); it should also have case
c) which ignores ACKs that are not duplicates nor advance window
(opposite dir data, winupdate).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d1e731f..5831daa 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2495,9 +2495,9 @@ static void tcp_conservative_spur_to_res
 
 /* F-RTO spurious RTO detection algorithm (RFC4138)
  *
- * F-RTO affects during two new ACKs following RTO. State (ACK number) is kept
- * in frto_counter. When ACK advances window (but not to or beyond highest
- * sequence sent before RTO):
+ * F-RTO affects during two new ACKs following RTO (well, almost, see inline
+ * comments). State (ACK number) is kept in frto_counter. When ACK advances
+ * window (but not to or beyond highest sequence sent before RTO):
  *   On First ACK,  send two new segments out.
  *   On Second ACK, RTO was likely spurious. Do spurious response (response
  *                  algorithm is not part of the F-RTO detection algorithm
@@ -2527,6 +2527,13 @@ static void tcp_process_frto(struct sock
 	if (flag&FLAG_DATA_ACKED)
 		inet_csk(sk)->icsk_retransmits = 0;
 
+	/* RFC4138 shortcoming in step 2; should also have case c): ACK isn't
+	 * duplicate nor advances window, e.g., opposite dir data, winupdate
+	 */
+	if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) &&
+	    !(flag&FLAG_FORWARD_PROGRESS))
+		return;
+
 	if (tp->snd_una == prior_snd_una ||
 	    !before(tp->snd_una, tp->frto_highmark)) {
 		tcp_enter_frto_loss(sk);
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery
  2007-02-19 11:38             ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen
@ 2007-02-19 11:38               ` Ilpo Järvinen
  2007-02-19 11:38                 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

The FRTO detection did not care how ACK pattern affects to cwnd
calculation of the conventional recovery. This caused incorrect
setting of cwnd when the fallback becames necessary. The
knowledge tcp_process_frto() has about the incoming ACK is now
passed on to tcp_enter_frto_loss() in allowed_segments parameter
that gives the number of segments that must be added to
packets-in-flight while calculating the new cwnd.

Instead of snd_una we use FLAG_DATA_ACKED in duplicate ACK
detection because RFC4138 states (in Section 2.2):
  If the first acknowledgment after the RTO retransmission
  does not acknowledge all of the data that was retransmitted
  in step 1, the TCP sender reverts to the conventional RTO
  recovery.  Otherwise, a malicious receiver acknowledging
  partial segments could cause the sender to declare the
  timeout spurious in a case where data was lost.

If the next ACK after RTO is duplicate, we do not retransmit
anything, which is equal to what conservative conventional
recovery does in such case.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5831daa..2679279 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1296,7 +1296,7 @@ void tcp_enter_frto(struct sock *sk)
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk)
+static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
@@ -1326,7 +1326,7 @@ static void tcp_enter_frto_loss(struct s
 	}
 	tcp_sync_left_out(tp);
 
-	tp->snd_cwnd = tp->frto_counter + tcp_packets_in_flight(tp)+1;
+	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 	tp->undo_marker = 0;
@@ -2527,6 +2527,11 @@ static void tcp_process_frto(struct sock
 	if (flag&FLAG_DATA_ACKED)
 		inet_csk(sk)->icsk_retransmits = 0;
 
+	if (!before(tp->snd_una, tp->frto_highmark)) {
+		tcp_enter_frto_loss(sk, tp->frto_counter + 1);
+		return;
+	}
+
 	/* RFC4138 shortcoming in step 2; should also have case c): ACK isn't
 	 * duplicate nor advances window, e.g., opposite dir data, winupdate
 	 */
@@ -2534,9 +2539,8 @@ static void tcp_process_frto(struct sock
 	    !(flag&FLAG_FORWARD_PROGRESS))
 		return;
 
-	if (tp->snd_una == prior_snd_una ||
-	    !before(tp->snd_una, tp->frto_highmark)) {
-		tcp_enter_frto_loss(sk);
+	if (!(flag&FLAG_DATA_ACKED)) {
+		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3));
 		return;
 	}
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt
  2007-02-19 11:38               ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen
@ 2007-02-19 11:38                 ` Ilpo Järvinen
  2007-02-19 11:38                   ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Since purpose is to reduce CWND, we prevent immediate growth. This
is not a major issue nor is "the correct way" specified anywhere.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2679279..9637abd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2490,6 +2490,7 @@ static int tcp_ack_update_window(struct 
 static void tcp_conservative_spur_to_response(struct tcp_sock *tp)
 {
 	tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
+	tp->snd_cwnd_cnt = 0;
 	tcp_moderate_cwnd(tp);
 }
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO
  2007-02-19 11:38                 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen
@ 2007-02-19 11:38                   ` Ilpo Järvinen
  2007-02-19 11:38                     ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Because TCP is not in Loss state during FRTO recovery, fast
recovery could be triggered by accident. Non-SACK FRTO is more
robust than not yet included SACK-enhanced version (that can
receiver high number of duplicate ACKs with SACK blocks during
FRTO), at least with unidirectional transfers, but under
extraordinary patterns fast recovery can be incorrectly
triggered, e.g., Data loss+ACK losses => cumulative ACK with
enough SACK blocks to meet sacked_out >= dupthresh condition).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9637abd..309da3e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1547,6 +1547,10 @@ static int tcp_time_to_recover(struct so
 {
 	__u32 packets_out;
 
+	/* Do not perform any recovery during FRTO algorithm */
+	if (tp->frto_counter)
+		return 0;
+
 	/* Trick#1: The loss is proven. */
 	if (tp->lost_out)
 		return 1;
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments
  2007-02-19 11:38                   ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen
@ 2007-02-19 11:38                     ` Ilpo Järvinen
  2007-02-19 11:38                       ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 309da3e..9fc7f66 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2551,11 +2551,11 @@ static void tcp_process_frto(struct sock
 
 	if (tp->frto_counter == 1) {
 		tp->snd_cwnd = tcp_packets_in_flight(tp) + 2;
+		tp->frto_counter = 2;
 	} else /* frto_counter == 2 */ {
 		tcp_conservative_spur_to_response(tp);
+		tp->frto_counter = 0;
 	}
-
-	tp->frto_counter = (tp->frto_counter + 1) % 3;
 }
 
 /* This routine deals with incoming acks, but not outgoing ones. */
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO
  2007-02-19 11:38                     ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen
@ 2007-02-19 11:38                       ` Ilpo Järvinen
  2007-02-19 11:38                         ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

FRTO controls cwnd when it still processes the ACK input or it
has just reverted back to conventional RTO recovery; the normal
rules apply when FRTO has reverted to standard congestion
control.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9fc7f66..5e952f0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2522,7 +2522,7 @@ static void tcp_conservative_spur_to_res
  *     to prove that the RTO is indeed spurious. It transfers the control
  *     from F-RTO to the conventional RTO recovery
  */
-static void tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag)
+static int tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
@@ -2534,7 +2534,7 @@ static void tcp_process_frto(struct sock
 
 	if (!before(tp->snd_una, tp->frto_highmark)) {
 		tcp_enter_frto_loss(sk, tp->frto_counter + 1);
-		return;
+		return 1;
 	}
 
 	/* RFC4138 shortcoming in step 2; should also have case c): ACK isn't
@@ -2542,20 +2542,22 @@ static void tcp_process_frto(struct sock
 	 */
 	if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) &&
 	    !(flag&FLAG_FORWARD_PROGRESS))
-		return;
+		return 1;
 
 	if (!(flag&FLAG_DATA_ACKED)) {
 		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3));
-		return;
+		return 1;
 	}
 
 	if (tp->frto_counter == 1) {
 		tp->snd_cwnd = tcp_packets_in_flight(tp) + 2;
 		tp->frto_counter = 2;
+		return 1;
 	} else /* frto_counter == 2 */ {
 		tcp_conservative_spur_to_response(tp);
 		tp->frto_counter = 0;
 	}
+	return 0;
 }
 
 /* This routine deals with incoming acks, but not outgoing ones. */
@@ -2569,6 +2571,7 @@ static int tcp_ack(struct sock *sk, stru
 	u32 prior_in_flight;
 	s32 seq_rtt;
 	int prior_packets;
+	int frto_cwnd = 0;
 
 	/* If the ack is newer than sent or older than previous acks
 	 * then we can probably ignore it.
@@ -2631,15 +2634,16 @@ static int tcp_ack(struct sock *sk, stru
 	flag |= tcp_clean_rtx_queue(sk, &seq_rtt);
 
 	if (tp->frto_counter)
-		tcp_process_frto(sk, prior_snd_una, flag);
+		frto_cwnd = tcp_process_frto(sk, prior_snd_una, flag);
 
 	if (tcp_ack_is_dubious(sk, flag)) {
 		/* Advance CWND, if state allows this. */
-		if ((flag & FLAG_DATA_ACKED) && tcp_may_raise_cwnd(sk, flag))
+		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd &&
+		    tcp_may_raise_cwnd(sk, flag))
 			tcp_cong_avoid(sk, ack,  seq_rtt, prior_in_flight, 0);
 		tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag);
 	} else {
-		if ((flag & FLAG_DATA_ACKED))
+		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd)
 			tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
 	}
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery
  2007-02-19 11:38                       ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen
@ 2007-02-19 11:38                         ` Ilpo Järvinen
  2007-02-19 11:38                           ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
     than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
     NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the segment that has not been SACKed, if
that segment is not retransmitted nor anything before it, we know
for sure, that nothing after the non-SACKed segment should be
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

SACKed skb can turn into non-SACked only in the extremely rare
case of SACK reneging, in this case we might fail to detect
retransmissions if there were them for any other than head. To
get rid of that feature, whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h    |    2 +-
 net/ipv4/tcp_input.c |   25 +++++++++++++++++++++----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 572a77b..7fd6b77 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -341,7 +341,7 @@ extern struct sock *		tcp_check_req(stru
 extern int			tcp_child_process(struct sock *parent,
 						  struct sock *child,
 						  struct sk_buff *skb);
-extern int			tcp_use_frto(const struct sock *sk);
+extern int			tcp_use_frto(struct sock *sk);
 extern void			tcp_enter_frto(struct sock *sk);
 extern void			tcp_enter_loss(struct sock *sk, int how);
 extern void			tcp_clear_retrans(struct tcp_sock *tp);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5e952f0..8f0aa9d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1239,14 +1239,31 @@ #endif
 /* F-RTO can only be used if these conditions are satisfied:
  *  - there must be some unsent new data
  *  - the advertised window should allow sending it
+ *  - TCP has never retransmitted anything other than head
  */
-int tcp_use_frto(const struct sock *sk)
+int tcp_use_frto(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
 	
-	return (sysctl_tcp_frto && sk->sk_send_head &&
-		!after(TCP_SKB_CB(sk->sk_send_head)->end_seq,
-		       tp->snd_una + tp->snd_wnd));
+	if (!sysctl_tcp_frto || !sk->sk_send_head ||
+		after(TCP_SKB_CB(sk->sk_send_head)->end_seq,
+		      tp->snd_una + tp->snd_wnd))
+		return 0;
+
+	/* Avoid expensive walking of rexmit queue if possible */
+	if (tp->retrans_out > 1)
+		return 0;
+
+	skb = skb_peek(&sk->sk_write_queue)->next;	/* Skips head */
+	sk_stream_for_retrans_queue_from(skb, sk) {
+		if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS)
+			return 0;
+		/* Short-circuit when first non-SACKed skb has been checked */
+		if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED))
+			break;
+	}
+	return 1;
 }
 
 /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic
  2007-02-19 11:38                         ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen
@ 2007-02-19 11:38                           ` Ilpo Järvinen
       [not found]                             ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi>
  2007-02-19 11:38                             ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen
  0 siblings, 2 replies; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Previously RETRANS bits were cleared on the entry to FRTO. We
postpone that into tcp_enter_frto_loss, which is really the
place were the clearing should be done anyway. This allows
simplification of the logic from a clearing loop to the head skb
clearing only.

Besides, the other changes made in the previous patches to
tcp_use_frto made it impossible for the non-SACKed FRTO to be
entered if other than the head has been rexmitted.

With SACK-enhanced FRTO (and Appendix B), however, there can be
a number retransmissions in flight when RTO expires (same thing
could happen before this patchset also with non-SACK FRTO). To
not introduce any jumpiness into the packet counting during FRTO,
instead of clearing RETRANS bits from skbs during entry, do it
later on.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8f0aa9d..2c0b387 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1268,7 +1268,11 @@ int tcp_use_frto(struct sock *sk)
 
 /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO
  * recovery a bit and use heuristics in tcp_process_frto() to detect if
- * the RTO was spurious.
+ * the RTO was spurious. Only clear SACKED_RETRANS of the head here to
+ * keep retrans_out counting accurate (with SACK F-RTO, other than head
+ * may still have that bit set); TCPCB_LOST and remaining SACKED_RETRANS
+ * bits are handled if the Loss state is really to be entered (in
+ * tcp_enter_frto_loss).
  *
  * Do like tcp_enter_loss() would; when RTO expires the second time it
  * does:
@@ -1289,17 +1293,13 @@ void tcp_enter_frto(struct sock *sk)
 		tcp_ca_event(sk, CA_EVENT_FRTO);
 	}
 
-	/* Have to clear retransmission markers here to keep the bookkeeping
-	 * in shape, even though we are not yet in Loss state.
-	 * If something was really lost, it is eventually caught up
-	 * in tcp_enter_frto_loss.
-	 */
-	tp->retrans_out = 0;
 	tp->undo_marker = tp->snd_una;
 	tp->undo_retrans = 0;
 
-	sk_stream_for_retrans_queue(skb, sk) {
+	skb = skb_peek(&sk->sk_write_queue);
+	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
 		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
+		tp->retrans_out -= tcp_skb_pcount(skb);
 	}
 	tcp_sync_left_out(tp);
 
@@ -1313,7 +1313,7 @@ void tcp_enter_frto(struct sock *sk)
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments)
+static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
@@ -1322,10 +1322,21 @@ static void tcp_enter_frto_loss(struct s
 	tp->sacked_out = 0;
 	tp->lost_out = 0;
 	tp->fackets_out = 0;
+	tp->retrans_out = 0;
 
 	sk_stream_for_retrans_queue(skb, sk) {
 		cnt += tcp_skb_pcount(skb);
-		TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
+		/*
+		 * Count the retransmission made on RTO correctly (only when
+		 * waiting for the first ACK and did not get it)...
+		 */
+		if ((tp->frto_counter == 1) && !(flag&FLAG_DATA_ACKED)) {
+			tp->retrans_out += tcp_skb_pcount(skb);
+			/* ...enter this if branch just for the first segment */
+			flag |= FLAG_DATA_ACKED;
+		} else {
+			TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+		}
 		if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED)) {
 
 			/* Do not mark those segments lost that were
@@ -2550,7 +2561,7 @@ static int tcp_process_frto(struct sock 
 		inet_csk(sk)->icsk_retransmits = 0;
 
 	if (!before(tp->snd_una, tp->frto_highmark)) {
-		tcp_enter_frto_loss(sk, tp->frto_counter + 1);
+		tcp_enter_frto_loss(sk, tp->frto_counter + 1, flag);
 		return 1;
 	}
 
@@ -2562,7 +2573,7 @@ static int tcp_process_frto(struct sock 
 		return 1;
 
 	if (!(flag&FLAG_DATA_ACKED)) {
-		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3));
+		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3), flag);
 		return 1;
 	}
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback
  2007-02-19 11:38                           ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen
       [not found]                             ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi>
@ 2007-02-19 11:38                             ` Ilpo Järvinen
  2007-02-19 11:38                               ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen
  1 sibling, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

TCP without FRTO would be in Loss state with small cwnd. FRTO,
however, leaves cwnd (typically) to a larger value which causes
ssthresh to become too large in case RTO is triggered again
compared to what conventional recovery would do. Because
consecutive RTOs result in only a single ssthresh reduction,
RTO+cumulative ACK+RTO pattern is required to trigger this
event.

A large comment is included for congestion control module writers
trying to figure out what CA_EVENT_FRTO handler should do because
there exists a remote possibility of incompatibility between
FRTO and module defined ssthresh functions.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c0b387..5d935b1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1289,7 +1289,31 @@ void tcp_enter_frto(struct sock *sk)
 	    ((icsk->icsk_ca_state == TCP_CA_Loss || tp->frto_counter) &&
 	     !icsk->icsk_retransmits)) {
 		tp->prior_ssthresh = tcp_current_ssthresh(sk);
-		tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
+		/* Our state is too optimistic in ssthresh() call because cwnd
+		 * is not reduced until tcp_enter_frto_loss() when previous FRTO
+		 * recovery has not yet completed. Pattern would be this: RTO,
+		 * Cumulative ACK, RTO (2xRTO for the same segment does not end
+		 * up here twice).
+		 * RFC4138 should be more specific on what to do, even though
+		 * RTO is quite unlikely to occur after the first Cumulative ACK
+		 * due to back-off and complexity of triggering events ...
+		 */
+		if (tp->frto_counter) {
+			u32 stored_cwnd;
+			stored_cwnd = tp->snd_cwnd;
+			tp->snd_cwnd = 2;
+			tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
+			tp->snd_cwnd = stored_cwnd;
+		} else {
+			tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
+		}
+		/* ... in theory, cong.control module could do "any tricks" in
+		 * ssthresh(), which means that ca_state, lost bits and lost_out
+		 * counter would have to be faked before the call occurs. We
+		 * consider that too expensive, unlikely and hacky, so modules
+		 * using these in ssthresh() must deal these incompatibility
+		 * issues if they receives CA_EVENT_FRTO and frto_counter != 0
+		 */
 		tcp_ca_event(sk, CA_EVENT_FRTO);
 	}
 
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO
  2007-02-19 11:38                             ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen
@ 2007-02-19 11:38                               ` Ilpo Järvinen
  2007-02-19 11:38                                 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

To be honest, I'm not too sure how the reord stuff works in the
first place but this seems necessary.

When FRTO has been active, the one and only retransmission could
be unnecessary but the state and sending order might not be what
the sacktag code expects it to be (to work correctly).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5d935b1..356de02 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1224,7 +1224,8 @@ tcp_sacktag_write_queue(struct sock *sk,
 
 	tp->left_out = tp->sacked_out + tp->lost_out;
 
-	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss)
+	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
+	    (tp->frto_highmark && after(tp->snd_una, tp->frto_highmark)))
 		tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
 
 #if FASTRETRANS_DEBUG > 0
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 17/18] [TCP]: SACK enhanced FRTO
  2007-02-19 11:38                               ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen
@ 2007-02-19 11:38                                 ` Ilpo Järvinen
  2007-02-19 11:38                                   ` [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version Ilpo Järvinen
  0 siblings, 1 reply; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

Implements the SACK-enhanced FRTO given in RFC4138 using the
variant given in Appendix B.

RFC4138, Appendix B:
  "This means that in order to declare timeout spurious, the TCP
   sender must receive an acknowledgment for non-retransmitted
   segment between SND.UNA and RecoveryPoint in algorithm step 3.
   RecoveryPoint is defined in conservative SACK-recovery
   algorithm [RFC3517]"

The basic version of the FRTO algorithm can still be used also
when SACK is enabled. To enabled SACK-enhanced version, tcp_frto
sysctl is set to 2.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   76 +++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 356de02..3ce4019 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -100,6 +100,7 @@ #define FLAG_DATA_SACKED	0x20 /* New SAC
 #define FLAG_ECE		0x40 /* ECE in this ACK				*/
 #define FLAG_DATA_LOST		0x80 /* SACK detected data lossage.		*/
 #define FLAG_SLOWPATH		0x100 /* Do not skip RFC checks for window update.*/
+#define FLAG_ONLY_ORIG_SACKED	0x200 /* SACKs only non-rexmit sent before RTO */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -110,6 +111,8 @@ #define IsReno(tp) ((tp)->rx_opt.sack_ok
 #define IsFack(tp) ((tp)->rx_opt.sack_ok & 2)
 #define IsDSack(tp) ((tp)->rx_opt.sack_ok & 4)
 
+#define IsSackFrto() (sysctl_tcp_frto == 0x2)
+
 #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
 
 /* Adapt the MSS value used to make delayed ack decision to the
@@ -1159,6 +1162,18 @@ tcp_sacktag_write_queue(struct sock *sk,
 						/* clear lost hint */
 						tp->retransmit_skb_hint = NULL;
 					}
+					/* SACK enhanced F-RTO detection.
+					 * Set flag if and only if non-rexmitted
+					 * segments below frto_highmark are
+					 * SACKed (RFC4138; Appendix B).
+					 * Clearing correct due to in-order walk
+					 */
+					if (after(end_seq, tp->frto_highmark)) {
+						flag &= ~FLAG_ONLY_ORIG_SACKED;
+					} else {
+						if (!(sacked & TCPCB_RETRANS))
+							flag |= FLAG_ONLY_ORIG_SACKED;
+					}
 				}
 
 				TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
@@ -1240,7 +1255,8 @@ #endif
 /* F-RTO can only be used if these conditions are satisfied:
  *  - there must be some unsent new data
  *  - the advertised window should allow sending it
- *  - TCP has never retransmitted anything other than head
+ *  - TCP has never retransmitted anything other than head (SACK enhanced
+ *    variant from Appendix B of RFC4138 is more robust here)
  */
 int tcp_use_frto(struct sock *sk)
 {
@@ -1252,6 +1268,9 @@ int tcp_use_frto(struct sock *sk)
 		      tp->snd_una + tp->snd_wnd))
 		return 0;
 
+	if (IsSackFrto())
+		return 1;
+
 	/* Avoid expensive walking of rexmit queue if possible */
 	if (tp->retrans_out > 1)
 		return 0;
@@ -1328,9 +1347,18 @@ void tcp_enter_frto(struct sock *sk)
 	}
 	tcp_sync_left_out(tp);
 
+	/* Earlier loss recovery underway (see RFC4138; Appendix B).
+	 * The last condition is necessary at least in tp->frto_counter case.
+	 */
+	if (IsSackFrto() && (tp->frto_counter ||
+	    ((1 << icsk->icsk_ca_state) & (TCPF_CA_Recovery|TCPF_CA_Loss))) &&
+	    after(tp->high_seq, tp->snd_una)) {
+		tp->frto_highmark = tp->high_seq;
+	} else {
+		tp->frto_highmark = tp->snd_nxt;
+	}
 	tcp_set_ca_state(sk, TCP_CA_Disorder);
 	tp->high_seq = tp->snd_nxt;
-	tp->frto_highmark = tp->snd_nxt;
 	tp->frto_counter = 1;
 }
 
@@ -2566,6 +2594,10 @@ static void tcp_conservative_spur_to_res
  * Rationale: if the RTO was spurious, new ACKs should arrive from the
  * original window even after we transmit two new data segments.
  *
+ * SACK version:
+ *   on first step, wait until first cumulative ACK arrives, then move to
+ *   the second step. In second step, the next ACK decides.
+ *
  * F-RTO is implemented (mainly) in four functions:
  *   - tcp_use_frto() is used to determine if TCP is can use F-RTO
  *   - tcp_enter_frto() prepares TCP state on RTO if F-RTO is used, it is
@@ -2590,16 +2622,38 @@ static int tcp_process_frto(struct sock 
 		return 1;
 	}
 
-	/* RFC4138 shortcoming in step 2; should also have case c): ACK isn't
-	 * duplicate nor advances window, e.g., opposite dir data, winupdate
-	 */
-	if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) &&
-	    !(flag&FLAG_FORWARD_PROGRESS))
-		return 1;
+	if (!IsSackFrto() || IsReno(tp)) {
+		/* RFC4138 shortcoming in step 2; should also have case c):
+		 * ACK isn't duplicate nor advances window, e.g., opposite dir
+		 * data, winupdate
+		 */
+		if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) &&
+		    !(flag&FLAG_FORWARD_PROGRESS))
+			return 1;
 
-	if (!(flag&FLAG_DATA_ACKED)) {
-		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3), flag);
-		return 1;
+		if (!(flag&FLAG_DATA_ACKED)) {
+			tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3),
+					    flag);
+			return 1;
+		}
+	} else {
+		if (!(flag&FLAG_DATA_ACKED) && (tp->frto_counter == 1)) {
+			/* Prevent sending of new data. */
+			tp->snd_cwnd = min(tp->snd_cwnd,
+					   tcp_packets_in_flight(tp));
+			return 1;
+		}
+
+		if ((tp->frto_counter == 2) &&
+		    (!(flag&FLAG_FORWARD_PROGRESS) ||
+		     ((flag&FLAG_DATA_SACKED) && !(flag&FLAG_ONLY_ORIG_SACKED)))) {
+			/* RFC4138 shortcoming (see comment above) */
+			if (!(flag&FLAG_FORWARD_PROGRESS) && (flag&FLAG_NOT_DUP))
+				return 1;
+
+			tcp_enter_frto_loss(sk, 3, flag);
+			return 1;
+		}
 	}
 
 	if (tp->frto_counter == 1) {
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version
  2007-02-19 11:38                                 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen
@ 2007-02-19 11:38                                   ` Ilpo Järvinen
  0 siblings, 0 replies; 19+ messages in thread
From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Pasi Sarolahti

The description is overly verbose to avoid ambiguity between
"SACK enabled" and "SACK enhanced FRTO"

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 Documentation/networking/ip-sysctl.txt |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a0f6842..d66777b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -178,7 +178,10 @@ tcp_frto - BOOLEAN
 	Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
 	timeouts.  It is particularly beneficial in wireless environments
 	where packet loss is typically due to random radio interference
-	rather than intermediate router congestion.
+	rather than intermediate router congestion. If set to 1, basic
+	version is enabled. 2 enables SACK enhanced FRTO, which is
+	EXPERIMENTAL. The basic version can be used also when SACK is
+	enabled for a flow through tcp_sack sysctl.
 
 tcp_keepalive_time - INTEGER
 	How often TCP sends out keepalive messages when keepalive is enabled.
-- 
1.4.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2007-02-19 11:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-19 11:37 [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version Ilpo Järvinen
2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen
2007-02-19 11:37   ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen
2007-02-19 11:37     ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen
2007-02-19 11:37       ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen
2007-02-19 11:37         ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen
2007-02-19 11:38           ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen
2007-02-19 11:38             ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen
2007-02-19 11:38               ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen
2007-02-19 11:38                 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen
2007-02-19 11:38                   ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen
2007-02-19 11:38                     ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen
2007-02-19 11:38                       ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen
2007-02-19 11:38                         ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen
2007-02-19 11:38                           ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen
     [not found]                             ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi>
2007-02-19 11:38                             ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen
2007-02-19 11:38                               ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen
2007-02-19 11:38                                 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen
2007-02-19 11:38                                   ` [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version Ilpo Järvinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.