* [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version @ 2007-02-19 11:37 Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Hi, Here is a set of patches that fix most of the flaws the current FRTO implementation (specified in RFC4138) has, besides that, the last two patches add SACK-enhanced FRTO (not enabled unless frto sysctl is set to 2, which allows using of the basic version also with SACK). There are some depencies to the earlier patches in the set (hard to list all thoughts I've had, but not all combinations are not good ones even if they apply cleanly). Documentation/networking/ip-sysctl.txt | 5 - include/net/tcp.h | 14 -- net/ipv4/tcp_input.c | 265 ++++++++++++++++++++++++++------ 3 files changed, 221 insertions(+), 63 deletions(-) (At least) one interpretation issue exists, see patch "FRTO: Entry is allowed only during (New)Reno like recovery". Besides that, these things should/could be solved (later on): - Setting undo_marker when RTO is not spurious (FRTO has been clearing it, which disabled DSACK undos for conventional recovery). - Interaction with Eifel - Different response (new sysctl to select them?) - When cumulative ACK arrives to the frto_highseq during FRTO, it could be useful to go directly to CA_Open because then duplicate ACKs for that segment could then be used initiate recovery if it was lost. Most of the time, the duplicate ACKs won't be false ones (we might have made too many unnecessary retransmission but that's less likely with FRTO and it could be consider while making state decision). - Maybe the frto_highmark should be reset somewhere during a connection due to wrapping of seqnos (reord adjustment relies on it having a valid after relation...)? - tcp_use_frto and tcp_enter_loss now both scan skb list from the beginning, it might be possible to take advantage of this either by combining them or by passing skb from use_frto iteration to tcp_enter_loss. I did some tests with FACK + SACK FRTO, results seemed to be correct but the conservative response had really poor performance. I'm more familiar with more aggressive response time-seq graphs and I was really wondering does this thing really work at all (in couple of cases), but yes, I found after tracing that it worked although the results was not a very good looking one due to interaction with rate halving, maybe a "rate-halving aware" response could do much better (or alternatively one that does more aggressive undo). # Test 1: normal TCP # Test 2: spurious RTO # Test 3: drop the segment # Test 4: drop a delayed segment # Test 5: drop the next segment # Test 6: drop in window segment # Test 7: drop the segment and the next segment # Test 8: drop the segment and in window segment # Test 9: delay the first and next (spurious RTOs, for different segments) # Test 10: delay the first excessively (two spurious RTOs) # Test n+1: drop rexmission # Test n+2: delay rexmission (spurious RTO also after frto_highmark) # Test n+3: delay rexmission (spurious RTO also after highmark), drop RTO seg # Test n+4: drop the segment and rexmit # Test n+5: drop the segment and first new data # Test n+6: drop the segment and second new data The tests were run in 2.6.18, I have quite a lot of own modifications included in but they were disable using sysctls except for a change in mark_head_lost: if condition from !TAGBITS -> !(TAGBITS & ~SACKED_RETRANS) but afaict, it shouldn't affect, and if it does, it should be included (if you received this mail from previous send attempt, I claimed by a mistakenly that SACKED_ACKED was the bit that was excluded and had incorrect parenthesized it here). I couldn't come up with a scenario in mainline only code where SACKED_RETRANS would be set for a skb when LOST has not been set, except for the head by FRTO itself which will not be a problem. I have checked that the FRTO parts used in tests were identical to the result of this patchset. Compile tested againts the net-2.6 (also intermediate steps). -- i. ps. I'm sorry if you receive these twice, the previous attempted had some charset problems and was rejected at least by netdev. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit 2007-02-19 11:37 [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version Ilpo Järvinen @ 2007-02-19 11:37 ` Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti FRTO was slightly too brave... Should only clear TCPCB_SACKED_RETRANS bit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1a14191..b21e232 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1266,7 +1266,7 @@ void tcp_enter_frto(struct sock *sk) tp->undo_retrans = 0; sk_stream_for_retrans_queue(skb, sk) { - TCP_SKB_CB(skb)->sacked &= ~TCPCB_RETRANS; + TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS; } tcp_sync_left_out(tp); -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm 2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen @ 2007-02-19 11:37 ` Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti FRTO spurious RTO detection algorithm (RFC4138) does not include response to a detected spurious RTO but can use different response algorithms. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b21e232..c5be3d0 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2467,6 +2467,15 @@ static int tcp_ack_update_window(struct return flag; } +/* A very conservative spurious RTO response algorithm: reduce cwnd and + * continue in congestion avoidance. + */ +static void tcp_conservative_spur_to_response(struct tcp_sock *tp) +{ + tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh); + tcp_moderate_cwnd(tp); +} + static void tcp_process_frto(struct sock *sk, u32 prior_snd_una) { struct tcp_sock *tp = tcp_sk(sk); @@ -2488,12 +2497,7 @@ static void tcp_process_frto(struct sock */ tp->snd_cwnd = tcp_packets_in_flight(tp) + 2; } else { - /* Also the second ACK after RTO advances the window. - * The RTO was likely spurious. Reduce cwnd and continue - * in congestion avoidance - */ - tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh); - tcp_moderate_cwnd(tp); + tcp_conservative_spur_to_response(tp); } /* F-RTO affects on two new ACKs following RTO. -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c 2007-02-19 11:37 ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen @ 2007-02-19 11:37 ` Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/net/tcp.h | 14 +------------- net/ipv4/tcp_input.c | 13 +++++++++++++ 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 5c472f2..572a77b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -341,6 +341,7 @@ extern struct sock * tcp_check_req(stru extern int tcp_child_process(struct sock *parent, struct sock *child, struct sk_buff *skb); +extern int tcp_use_frto(const struct sock *sk); extern void tcp_enter_frto(struct sock *sk); extern void tcp_enter_loss(struct sock *sk, int how); extern void tcp_clear_retrans(struct tcp_sock *tp); @@ -1033,19 +1034,6 @@ static inline int tcp_paws_check(const s #define TCP_CHECK_TIMER(sk) do { } while (0) -static inline int tcp_use_frto(const struct sock *sk) -{ - const struct tcp_sock *tp = tcp_sk(sk); - - /* F-RTO must be activated in sysctl and there must be some - * unsent new data, and the advertised window should allow - * sending it. - */ - return (sysctl_tcp_frto && sk->sk_send_head && - !after(TCP_SKB_CB(sk->sk_send_head)->end_seq, - tp->snd_una + tp->snd_wnd)); -} - static inline void tcp_mib_init(void) { /* See RFC 2012 */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c5be3d0..294cb44 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1236,6 +1236,19 @@ #endif return flag; } +int tcp_use_frto(const struct sock *sk) +{ + const struct tcp_sock *tp = tcp_sk(sk); + + /* F-RTO must be activated in sysctl and there must be some + * unsent new data, and the advertised window should allow + * sending it. + */ + return (sysctl_tcp_frto && sk->sk_send_head && + !after(TCP_SKB_CB(sk->sk_send_head)->end_seq, + tp->snd_una + tp->snd_wnd)); +} + /* RTO occurred, but do not yet enter loss state. Instead, transmit two new * segments to see from the next ACKs whether any data was really missing. * If the RTO was spurious, new ACKs should arrive. -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement 2007-02-19 11:37 ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen @ 2007-02-19 11:37 ` Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Moved comments out from the body of process_frto() to the head (preferred way; see Documentation/CodingStyle). Bonus: it's much easier to read in this compacted form. FRTO algorithm and implementation is described in greater detail. For interested reader, more information is available in RFC4138. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 49 ++++++++++++++++++++++++++++++++----------------- 1 files changed, 32 insertions(+), 17 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 294cb44..f645c3e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1236,22 +1236,22 @@ #endif return flag; } +/* F-RTO can only be used if these conditions are satisfied: + * - there must be some unsent new data + * - the advertised window should allow sending it + */ int tcp_use_frto(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); - /* F-RTO must be activated in sysctl and there must be some - * unsent new data, and the advertised window should allow - * sending it. - */ return (sysctl_tcp_frto && sk->sk_send_head && !after(TCP_SKB_CB(sk->sk_send_head)->end_seq, tp->snd_una + tp->snd_wnd)); } -/* RTO occurred, but do not yet enter loss state. Instead, transmit two new - * segments to see from the next ACKs whether any data was really missing. - * If the RTO was spurious, new ACKs should arrive. +/* RTO occurred, but do not yet enter Loss state. Instead, defer RTO + * recovery a bit and use heuristics in tcp_process_frto() to detect if + * the RTO was spurious. */ void tcp_enter_frto(struct sock *sk) { @@ -2489,6 +2489,30 @@ static void tcp_conservative_spur_to_res tcp_moderate_cwnd(tp); } +/* F-RTO spurious RTO detection algorithm (RFC4138) + * + * F-RTO affects during two new ACKs following RTO. State (ACK number) is kept + * in frto_counter. When ACK advances window (but not to or beyond highest + * sequence sent before RTO): + * On First ACK, send two new segments out. + * On Second ACK, RTO was likely spurious. Do spurious response (response + * algorithm is not part of the F-RTO detection algorithm + * given in RFC4138 but can be selected separately). + * Otherwise (basically on duplicate ACK), RTO was (likely) caused by a loss + * and TCP falls back to conventional RTO recovery. + * + * Rationale: if the RTO was spurious, new ACKs should arrive from the + * original window even after we transmit two new data segments. + * + * F-RTO is implemented (mainly) in four functions: + * - tcp_use_frto() is used to determine if TCP is can use F-RTO + * - tcp_enter_frto() prepares TCP state on RTO if F-RTO is used, it is + * called when tcp_use_frto() showed green light + * - tcp_process_frto() handles incoming ACKs during F-RTO algorithm + * - tcp_enter_frto_loss() is called if there is not enough evidence + * to prove that the RTO is indeed spurious. It transfers the control + * from F-RTO to the conventional RTO recovery + */ static void tcp_process_frto(struct sock *sk, u32 prior_snd_una) { struct tcp_sock *tp = tcp_sk(sk); @@ -2497,25 +2521,16 @@ static void tcp_process_frto(struct sock if (tp->snd_una == prior_snd_una || !before(tp->snd_una, tp->frto_highmark)) { - /* RTO was caused by loss, start retransmitting in - * go-back-N slow start - */ tcp_enter_frto_loss(sk); return; } if (tp->frto_counter == 1) { - /* First ACK after RTO advances the window: allow two new - * segments out. - */ tp->snd_cwnd = tcp_packets_in_flight(tp) + 2; - } else { + } else /* frto_counter == 2 */ { tcp_conservative_spur_to_response(tp); } - /* F-RTO affects on two new ACKs following RTO. - * At latest on third ACK the TCP behavior is back to normal. - */ tp->frto_counter = (tp->frto_counter + 1) % 3; } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh 2007-02-19 11:37 ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen @ 2007-02-19 11:37 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:37 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti In case a latency spike causes more than one RTO, the later should not cause the already reduced ssthresh to propagate into the prior_ssthresh since FRTO declares all such RTOs spurious at once or none of them. In treating of ssthresh, we mimic what tcp_enter_loss() does. The previous state (in frto_counter) must be available until we have checked it in tcp_enter_frto(), and also ACK information flag in process_frto(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 20 ++++++++++++++------ 1 files changed, 14 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f645c3e..c846beb 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1252,6 +1252,10 @@ int tcp_use_frto(const struct sock *sk) /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO * recovery a bit and use heuristics in tcp_process_frto() to detect if * the RTO was spurious. + * + * Do like tcp_enter_loss() would; when RTO expires the second time it + * does: + * "Reduce ssthresh if it has not yet been made inside this window." */ void tcp_enter_frto(struct sock *sk) { @@ -1259,11 +1263,10 @@ void tcp_enter_frto(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; - tp->frto_counter = 1; - - if (icsk->icsk_ca_state <= TCP_CA_Disorder || + if ((!tp->frto_counter && icsk->icsk_ca_state <= TCP_CA_Disorder) || tp->snd_una == tp->high_seq || - (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) { + ((icsk->icsk_ca_state == TCP_CA_Loss || tp->frto_counter) && + !icsk->icsk_retransmits)) { tp->prior_ssthresh = tcp_current_ssthresh(sk); tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); tcp_ca_event(sk, CA_EVENT_FRTO); @@ -1285,6 +1288,7 @@ void tcp_enter_frto(struct sock *sk) tcp_set_ca_state(sk, TCP_CA_Open); tp->frto_highmark = tp->snd_nxt; + tp->frto_counter = 1; } /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO, @@ -2513,12 +2517,16 @@ static void tcp_conservative_spur_to_res * to prove that the RTO is indeed spurious. It transfers the control * from F-RTO to the conventional RTO recovery */ -static void tcp_process_frto(struct sock *sk, u32 prior_snd_una) +static void tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag) { struct tcp_sock *tp = tcp_sk(sk); tcp_sync_left_out(tp); + /* Duplicate the behavior from Loss state (fastretrans_alert) */ + if (flag&FLAG_DATA_ACKED) + inet_csk(sk)->icsk_retransmits = 0; + if (tp->snd_una == prior_snd_una || !before(tp->snd_una, tp->frto_highmark)) { tcp_enter_frto_loss(sk); @@ -2607,7 +2615,7 @@ static int tcp_ack(struct sock *sk, stru flag |= tcp_clean_rtx_queue(sk, &seq_rtt); if (tp->frto_counter) - tcp_process_frto(sk, prior_snd_una); + tcp_process_frto(sk, prior_snd_una, flag); if (tcp_ack_is_dubious(sk, flag)) { /* Advance CWND, if state allows this. */ -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open 2007-02-19 11:37 ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Retransmission counter assumptions are to be changed. Forcing reason to do this exist: Using sysctl in check would be racy as soon as FRTO starts to ignore some ACKs (doing that in the following patches). Userspace may disable it at any moment giving nice oops if timing is right. frto_counter would be inaccessible from userspace, but with SACK enhanced FRTO retrans_out can include other than head, and possibly leaving it non-zero after spurious RTO, boom again. Luckily, solution seems rather simple: never go directly to Open state but use Disorder instead. This does not really change much, since TCP could anyway change its state to Disorder during FRTO using path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when a SACK block makes ACK dubious). Besides, Disorder seems to be the state where TCP should be if not recovering (in Recovery or Loss state) while having some retransmissions in-flight (see tcp_try_to_open), which is exactly what happens with FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c846beb..d1e731f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1286,7 +1286,8 @@ void tcp_enter_frto(struct sock *sk) } tcp_sync_left_out(tp); - tcp_set_ca_state(sk, TCP_CA_Open); + tcp_set_ca_state(sk, TCP_CA_Disorder); + tp->high_seq = tp->snd_nxt; tp->frto_highmark = tp->snd_nxt; tp->frto_counter = 1; } @@ -2014,8 +2015,7 @@ tcp_fastretrans_alert(struct sock *sk, u /* E. Check state exit conditions. State can be terminated * when high_seq is ACKed. */ if (icsk->icsk_ca_state == TCP_CA_Open) { - if (!sysctl_tcp_frto) - BUG_TRAP(tp->retrans_out == 0); + BUG_TRAP(tp->retrans_out == 0); tp->retrans_stamp = 0; } else if (!before(tp->snd_una, tp->high_seq)) { switch (icsk->icsk_ca_state) { -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs 2007-02-19 11:38 ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Handles RFC4138 shortcoming (in step 2); it should also have case c) which ignores ACKs that are not duplicates nor advance window (opposite dir data, winupdate). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 13 ++++++++++--- 1 files changed, 10 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d1e731f..5831daa 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2495,9 +2495,9 @@ static void tcp_conservative_spur_to_res /* F-RTO spurious RTO detection algorithm (RFC4138) * - * F-RTO affects during two new ACKs following RTO. State (ACK number) is kept - * in frto_counter. When ACK advances window (but not to or beyond highest - * sequence sent before RTO): + * F-RTO affects during two new ACKs following RTO (well, almost, see inline + * comments). State (ACK number) is kept in frto_counter. When ACK advances + * window (but not to or beyond highest sequence sent before RTO): * On First ACK, send two new segments out. * On Second ACK, RTO was likely spurious. Do spurious response (response * algorithm is not part of the F-RTO detection algorithm @@ -2527,6 +2527,13 @@ static void tcp_process_frto(struct sock if (flag&FLAG_DATA_ACKED) inet_csk(sk)->icsk_retransmits = 0; + /* RFC4138 shortcoming in step 2; should also have case c): ACK isn't + * duplicate nor advances window, e.g., opposite dir data, winupdate + */ + if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) && + !(flag&FLAG_FORWARD_PROGRESS)) + return; + if (tp->snd_una == prior_snd_una || !before(tp->snd_una, tp->frto_highmark)) { tcp_enter_frto_loss(sk); -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery 2007-02-19 11:38 ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti The FRTO detection did not care how ACK pattern affects to cwnd calculation of the conventional recovery. This caused incorrect setting of cwnd when the fallback becames necessary. The knowledge tcp_process_frto() has about the incoming ACK is now passed on to tcp_enter_frto_loss() in allowed_segments parameter that gives the number of segments that must be added to packets-in-flight while calculating the new cwnd. Instead of snd_una we use FLAG_DATA_ACKED in duplicate ACK detection because RFC4138 states (in Section 2.2): If the first acknowledgment after the RTO retransmission does not acknowledge all of the data that was retransmitted in step 1, the TCP sender reverts to the conventional RTO recovery. Otherwise, a malicious receiver acknowledging partial segments could cause the sender to declare the timeout spurious in a case where data was lost. If the next ACK after RTO is duplicate, we do not retransmit anything, which is equal to what conservative conventional recovery does in such case. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 14 +++++++++----- 1 files changed, 9 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5831daa..2679279 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1296,7 +1296,7 @@ void tcp_enter_frto(struct sock *sk) * which indicates that we should follow the traditional RTO recovery, * i.e. mark everything lost and do go-back-N retransmission. */ -static void tcp_enter_frto_loss(struct sock *sk) +static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; @@ -1326,7 +1326,7 @@ static void tcp_enter_frto_loss(struct s } tcp_sync_left_out(tp); - tp->snd_cwnd = tp->frto_counter + tcp_packets_in_flight(tp)+1; + tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments; tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_time_stamp; tp->undo_marker = 0; @@ -2527,6 +2527,11 @@ static void tcp_process_frto(struct sock if (flag&FLAG_DATA_ACKED) inet_csk(sk)->icsk_retransmits = 0; + if (!before(tp->snd_una, tp->frto_highmark)) { + tcp_enter_frto_loss(sk, tp->frto_counter + 1); + return; + } + /* RFC4138 shortcoming in step 2; should also have case c): ACK isn't * duplicate nor advances window, e.g., opposite dir data, winupdate */ @@ -2534,9 +2539,8 @@ static void tcp_process_frto(struct sock !(flag&FLAG_FORWARD_PROGRESS)) return; - if (tp->snd_una == prior_snd_una || - !before(tp->snd_una, tp->frto_highmark)) { - tcp_enter_frto_loss(sk); + if (!(flag&FLAG_DATA_ACKED)) { + tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3)); return; } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt 2007-02-19 11:38 ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Since purpose is to reduce CWND, we prevent immediate growth. This is not a major issue nor is "the correct way" specified anywhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2679279..9637abd 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2490,6 +2490,7 @@ static int tcp_ack_update_window(struct static void tcp_conservative_spur_to_response(struct tcp_sock *tp) { tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh); + tp->snd_cwnd_cnt = 0; tcp_moderate_cwnd(tp); } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO 2007-02-19 11:38 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Because TCP is not in Loss state during FRTO recovery, fast recovery could be triggered by accident. Non-SACK FRTO is more robust than not yet included SACK-enhanced version (that can receiver high number of duplicate ACKs with SACK blocks during FRTO), at least with unidirectional transfers, but under extraordinary patterns fast recovery can be incorrectly triggered, e.g., Data loss+ACK losses => cumulative ACK with enough SACK blocks to meet sacked_out >= dupthresh condition). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9637abd..309da3e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1547,6 +1547,10 @@ static int tcp_time_to_recover(struct so { __u32 packets_out; + /* Do not perform any recovery during FRTO algorithm */ + if (tp->frto_counter) + return 0; + /* Trick#1: The loss is proven. */ if (tp->lost_out) return 1; -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments 2007-02-19 11:38 ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 309da3e..9fc7f66 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2551,11 +2551,11 @@ static void tcp_process_frto(struct sock if (tp->frto_counter == 1) { tp->snd_cwnd = tcp_packets_in_flight(tp) + 2; + tp->frto_counter = 2; } else /* frto_counter == 2 */ { tcp_conservative_spur_to_response(tp); + tp->frto_counter = 0; } - - tp->frto_counter = (tp->frto_counter + 1) % 3; } /* This routine deals with incoming acks, but not outgoing ones. */ -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO 2007-02-19 11:38 ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti FRTO controls cwnd when it still processes the ACK input or it has just reverted back to conventional RTO recovery; the normal rules apply when FRTO has reverted to standard congestion control. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 18 +++++++++++------- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9fc7f66..5e952f0 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2522,7 +2522,7 @@ static void tcp_conservative_spur_to_res * to prove that the RTO is indeed spurious. It transfers the control * from F-RTO to the conventional RTO recovery */ -static void tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag) +static int tcp_process_frto(struct sock *sk, u32 prior_snd_una, int flag) { struct tcp_sock *tp = tcp_sk(sk); @@ -2534,7 +2534,7 @@ static void tcp_process_frto(struct sock if (!before(tp->snd_una, tp->frto_highmark)) { tcp_enter_frto_loss(sk, tp->frto_counter + 1); - return; + return 1; } /* RFC4138 shortcoming in step 2; should also have case c): ACK isn't @@ -2542,20 +2542,22 @@ static void tcp_process_frto(struct sock */ if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) && !(flag&FLAG_FORWARD_PROGRESS)) - return; + return 1; if (!(flag&FLAG_DATA_ACKED)) { tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3)); - return; + return 1; } if (tp->frto_counter == 1) { tp->snd_cwnd = tcp_packets_in_flight(tp) + 2; tp->frto_counter = 2; + return 1; } else /* frto_counter == 2 */ { tcp_conservative_spur_to_response(tp); tp->frto_counter = 0; } + return 0; } /* This routine deals with incoming acks, but not outgoing ones. */ @@ -2569,6 +2571,7 @@ static int tcp_ack(struct sock *sk, stru u32 prior_in_flight; s32 seq_rtt; int prior_packets; + int frto_cwnd = 0; /* If the ack is newer than sent or older than previous acks * then we can probably ignore it. @@ -2631,15 +2634,16 @@ static int tcp_ack(struct sock *sk, stru flag |= tcp_clean_rtx_queue(sk, &seq_rtt); if (tp->frto_counter) - tcp_process_frto(sk, prior_snd_una, flag); + frto_cwnd = tcp_process_frto(sk, prior_snd_una, flag); if (tcp_ack_is_dubious(sk, flag)) { /* Advance CWND, if state allows this. */ - if ((flag & FLAG_DATA_ACKED) && tcp_may_raise_cwnd(sk, flag)) + if ((flag & FLAG_DATA_ACKED) && !frto_cwnd && + tcp_may_raise_cwnd(sk, flag)) tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 0); tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag); } else { - if ((flag & FLAG_DATA_ACKED)) + if ((flag & FLAG_DATA_ACKED) && !frto_cwnd) tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1); } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery 2007-02-19 11:38 ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/net/tcp.h | 2 +- net/ipv4/tcp_input.c | 25 +++++++++++++++++++++---- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 572a77b..7fd6b77 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -341,7 +341,7 @@ extern struct sock * tcp_check_req(stru extern int tcp_child_process(struct sock *parent, struct sock *child, struct sk_buff *skb); -extern int tcp_use_frto(const struct sock *sk); +extern int tcp_use_frto(struct sock *sk); extern void tcp_enter_frto(struct sock *sk); extern void tcp_enter_loss(struct sock *sk, int how); extern void tcp_clear_retrans(struct tcp_sock *tp); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5e952f0..8f0aa9d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1239,14 +1239,31 @@ #endif /* F-RTO can only be used if these conditions are satisfied: * - there must be some unsent new data * - the advertised window should allow sending it + * - TCP has never retransmitted anything other than head */ -int tcp_use_frto(const struct sock *sk) +int tcp_use_frto(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); + struct sk_buff *skb; - return (sysctl_tcp_frto && sk->sk_send_head && - !after(TCP_SKB_CB(sk->sk_send_head)->end_seq, - tp->snd_una + tp->snd_wnd)); + if (!sysctl_tcp_frto || !sk->sk_send_head || + after(TCP_SKB_CB(sk->sk_send_head)->end_seq, + tp->snd_una + tp->snd_wnd)) + return 0; + + /* Avoid expensive walking of rexmit queue if possible */ + if (tp->retrans_out > 1) + return 0; + + skb = skb_peek(&sk->sk_write_queue)->next; /* Skips head */ + sk_stream_for_retrans_queue_from(skb, sk) { + if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS) + return 0; + /* Short-circuit when first non-SACKed skb has been checked */ + if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED)) + break; + } + return 1; } /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic 2007-02-19 11:38 ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen [not found] ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi> 2007-02-19 11:38 ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen 0 siblings, 2 replies; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Previously RETRANS bits were cleared on the entry to FRTO. We postpone that into tcp_enter_frto_loss, which is really the place were the clearing should be done anyway. This allows simplification of the logic from a clearing loop to the head skb clearing only. Besides, the other changes made in the previous patches to tcp_use_frto made it impossible for the non-SACKed FRTO to be entered if other than the head has been rexmitted. With SACK-enhanced FRTO (and Appendix B), however, there can be a number retransmissions in flight when RTO expires (same thing could happen before this patchset also with non-SACK FRTO). To not introduce any jumpiness into the packet counting during FRTO, instead of clearing RETRANS bits from skbs during entry, do it later on. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 35 +++++++++++++++++++++++------------ 1 files changed, 23 insertions(+), 12 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 8f0aa9d..2c0b387 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1268,7 +1268,11 @@ int tcp_use_frto(struct sock *sk) /* RTO occurred, but do not yet enter Loss state. Instead, defer RTO * recovery a bit and use heuristics in tcp_process_frto() to detect if - * the RTO was spurious. + * the RTO was spurious. Only clear SACKED_RETRANS of the head here to + * keep retrans_out counting accurate (with SACK F-RTO, other than head + * may still have that bit set); TCPCB_LOST and remaining SACKED_RETRANS + * bits are handled if the Loss state is really to be entered (in + * tcp_enter_frto_loss). * * Do like tcp_enter_loss() would; when RTO expires the second time it * does: @@ -1289,17 +1293,13 @@ void tcp_enter_frto(struct sock *sk) tcp_ca_event(sk, CA_EVENT_FRTO); } - /* Have to clear retransmission markers here to keep the bookkeeping - * in shape, even though we are not yet in Loss state. - * If something was really lost, it is eventually caught up - * in tcp_enter_frto_loss. - */ - tp->retrans_out = 0; tp->undo_marker = tp->snd_una; tp->undo_retrans = 0; - sk_stream_for_retrans_queue(skb, sk) { + skb = skb_peek(&sk->sk_write_queue); + if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) { TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS; + tp->retrans_out -= tcp_skb_pcount(skb); } tcp_sync_left_out(tp); @@ -1313,7 +1313,7 @@ void tcp_enter_frto(struct sock *sk) * which indicates that we should follow the traditional RTO recovery, * i.e. mark everything lost and do go-back-N retransmission. */ -static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments) +static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; @@ -1322,10 +1322,21 @@ static void tcp_enter_frto_loss(struct s tp->sacked_out = 0; tp->lost_out = 0; tp->fackets_out = 0; + tp->retrans_out = 0; sk_stream_for_retrans_queue(skb, sk) { cnt += tcp_skb_pcount(skb); - TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST; + /* + * Count the retransmission made on RTO correctly (only when + * waiting for the first ACK and did not get it)... + */ + if ((tp->frto_counter == 1) && !(flag&FLAG_DATA_ACKED)) { + tp->retrans_out += tcp_skb_pcount(skb); + /* ...enter this if branch just for the first segment */ + flag |= FLAG_DATA_ACKED; + } else { + TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS); + } if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED)) { /* Do not mark those segments lost that were @@ -2550,7 +2561,7 @@ static int tcp_process_frto(struct sock inet_csk(sk)->icsk_retransmits = 0; if (!before(tp->snd_una, tp->frto_highmark)) { - tcp_enter_frto_loss(sk, tp->frto_counter + 1); + tcp_enter_frto_loss(sk, tp->frto_counter + 1, flag); return 1; } @@ -2562,7 +2573,7 @@ static int tcp_process_frto(struct sock return 1; if (!(flag&FLAG_DATA_ACKED)) { - tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3)); + tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3), flag); return 1; } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
[parent not found: <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi>]
* [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback 2007-02-19 11:38 ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen [not found] ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi> @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen 1 sibling, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti TCP without FRTO would be in Loss state with small cwnd. FRTO, however, leaves cwnd (typically) to a larger value which causes ssthresh to become too large in case RTO is triggered again compared to what conventional recovery would do. Because consecutive RTOs result in only a single ssthresh reduction, RTO+cumulative ACK+RTO pattern is required to trigger this event. A large comment is included for congestion control module writers trying to figure out what CA_EVENT_FRTO handler should do because there exists a remote possibility of incompatibility between FRTO and module defined ssthresh functions. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 26 +++++++++++++++++++++++++- 1 files changed, 25 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2c0b387..5d935b1 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1289,7 +1289,31 @@ void tcp_enter_frto(struct sock *sk) ((icsk->icsk_ca_state == TCP_CA_Loss || tp->frto_counter) && !icsk->icsk_retransmits)) { tp->prior_ssthresh = tcp_current_ssthresh(sk); - tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); + /* Our state is too optimistic in ssthresh() call because cwnd + * is not reduced until tcp_enter_frto_loss() when previous FRTO + * recovery has not yet completed. Pattern would be this: RTO, + * Cumulative ACK, RTO (2xRTO for the same segment does not end + * up here twice). + * RFC4138 should be more specific on what to do, even though + * RTO is quite unlikely to occur after the first Cumulative ACK + * due to back-off and complexity of triggering events ... + */ + if (tp->frto_counter) { + u32 stored_cwnd; + stored_cwnd = tp->snd_cwnd; + tp->snd_cwnd = 2; + tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); + tp->snd_cwnd = stored_cwnd; + } else { + tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); + } + /* ... in theory, cong.control module could do "any tricks" in + * ssthresh(), which means that ca_state, lost bits and lost_out + * counter would have to be faked before the call occurs. We + * consider that too expensive, unlikely and hacky, so modules + * using these in ssthresh() must deal these incompatibility + * issues if they receives CA_EVENT_FRTO and frto_counter != 0 + */ tcp_ca_event(sk, CA_EVENT_FRTO); } -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO 2007-02-19 11:38 ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti To be honest, I'm not too sure how the reord stuff works in the first place but this seems necessary. When FRTO has been active, the one and only retransmission could be unnecessary but the state and sending order might not be what the sacktag code expects it to be (to work correctly). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5d935b1..356de02 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1224,7 +1224,8 @@ tcp_sacktag_write_queue(struct sock *sk, tp->left_out = tp->sacked_out + tp->lost_out; - if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss) + if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss && + (tp->frto_highmark && after(tp->snd_una, tp->frto_highmark))) tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0); #if FASTRETRANS_DEBUG > 0 -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 17/18] [TCP]: SACK enhanced FRTO 2007-02-19 11:38 ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version Ilpo Järvinen 0 siblings, 1 reply; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti Implements the SACK-enhanced FRTO given in RFC4138 using the variant given in Appendix B. RFC4138, Appendix B: "This means that in order to declare timeout spurious, the TCP sender must receive an acknowledgment for non-retransmitted segment between SND.UNA and RecoveryPoint in algorithm step 3. RecoveryPoint is defined in conservative SACK-recovery algorithm [RFC3517]" The basic version of the FRTO algorithm can still be used also when SACK is enabled. To enabled SACK-enhanced version, tcp_frto sysctl is set to 2. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 76 +++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 65 insertions(+), 11 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 356de02..3ce4019 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -100,6 +100,7 @@ #define FLAG_DATA_SACKED 0x20 /* New SAC #define FLAG_ECE 0x40 /* ECE in this ACK */ #define FLAG_DATA_LOST 0x80 /* SACK detected data lossage. */ #define FLAG_SLOWPATH 0x100 /* Do not skip RFC checks for window update.*/ +#define FLAG_ONLY_ORIG_SACKED 0x200 /* SACKs only non-rexmit sent before RTO */ #define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED) #define FLAG_NOT_DUP (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED) @@ -110,6 +111,8 @@ #define IsReno(tp) ((tp)->rx_opt.sack_ok #define IsFack(tp) ((tp)->rx_opt.sack_ok & 2) #define IsDSack(tp) ((tp)->rx_opt.sack_ok & 4) +#define IsSackFrto() (sysctl_tcp_frto == 0x2) + #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH) /* Adapt the MSS value used to make delayed ack decision to the @@ -1159,6 +1162,18 @@ tcp_sacktag_write_queue(struct sock *sk, /* clear lost hint */ tp->retransmit_skb_hint = NULL; } + /* SACK enhanced F-RTO detection. + * Set flag if and only if non-rexmitted + * segments below frto_highmark are + * SACKed (RFC4138; Appendix B). + * Clearing correct due to in-order walk + */ + if (after(end_seq, tp->frto_highmark)) { + flag &= ~FLAG_ONLY_ORIG_SACKED; + } else { + if (!(sacked & TCPCB_RETRANS)) + flag |= FLAG_ONLY_ORIG_SACKED; + } } TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED; @@ -1240,7 +1255,8 @@ #endif /* F-RTO can only be used if these conditions are satisfied: * - there must be some unsent new data * - the advertised window should allow sending it - * - TCP has never retransmitted anything other than head + * - TCP has never retransmitted anything other than head (SACK enhanced + * variant from Appendix B of RFC4138 is more robust here) */ int tcp_use_frto(struct sock *sk) { @@ -1252,6 +1268,9 @@ int tcp_use_frto(struct sock *sk) tp->snd_una + tp->snd_wnd)) return 0; + if (IsSackFrto()) + return 1; + /* Avoid expensive walking of rexmit queue if possible */ if (tp->retrans_out > 1) return 0; @@ -1328,9 +1347,18 @@ void tcp_enter_frto(struct sock *sk) } tcp_sync_left_out(tp); + /* Earlier loss recovery underway (see RFC4138; Appendix B). + * The last condition is necessary at least in tp->frto_counter case. + */ + if (IsSackFrto() && (tp->frto_counter || + ((1 << icsk->icsk_ca_state) & (TCPF_CA_Recovery|TCPF_CA_Loss))) && + after(tp->high_seq, tp->snd_una)) { + tp->frto_highmark = tp->high_seq; + } else { + tp->frto_highmark = tp->snd_nxt; + } tcp_set_ca_state(sk, TCP_CA_Disorder); tp->high_seq = tp->snd_nxt; - tp->frto_highmark = tp->snd_nxt; tp->frto_counter = 1; } @@ -2566,6 +2594,10 @@ static void tcp_conservative_spur_to_res * Rationale: if the RTO was spurious, new ACKs should arrive from the * original window even after we transmit two new data segments. * + * SACK version: + * on first step, wait until first cumulative ACK arrives, then move to + * the second step. In second step, the next ACK decides. + * * F-RTO is implemented (mainly) in four functions: * - tcp_use_frto() is used to determine if TCP is can use F-RTO * - tcp_enter_frto() prepares TCP state on RTO if F-RTO is used, it is @@ -2590,16 +2622,38 @@ static int tcp_process_frto(struct sock return 1; } - /* RFC4138 shortcoming in step 2; should also have case c): ACK isn't - * duplicate nor advances window, e.g., opposite dir data, winupdate - */ - if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) && - !(flag&FLAG_FORWARD_PROGRESS)) - return 1; + if (!IsSackFrto() || IsReno(tp)) { + /* RFC4138 shortcoming in step 2; should also have case c): + * ACK isn't duplicate nor advances window, e.g., opposite dir + * data, winupdate + */ + if ((tp->snd_una == prior_snd_una) && (flag&FLAG_NOT_DUP) && + !(flag&FLAG_FORWARD_PROGRESS)) + return 1; - if (!(flag&FLAG_DATA_ACKED)) { - tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3), flag); - return 1; + if (!(flag&FLAG_DATA_ACKED)) { + tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 0 : 3), + flag); + return 1; + } + } else { + if (!(flag&FLAG_DATA_ACKED) && (tp->frto_counter == 1)) { + /* Prevent sending of new data. */ + tp->snd_cwnd = min(tp->snd_cwnd, + tcp_packets_in_flight(tp)); + return 1; + } + + if ((tp->frto_counter == 2) && + (!(flag&FLAG_FORWARD_PROGRESS) || + ((flag&FLAG_DATA_SACKED) && !(flag&FLAG_ONLY_ORIG_SACKED)))) { + /* RFC4138 shortcoming (see comment above) */ + if (!(flag&FLAG_FORWARD_PROGRESS) && (flag&FLAG_NOT_DUP)) + return 1; + + tcp_enter_frto_loss(sk, 3, flag); + return 1; + } } if (tp->frto_counter == 1) { -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version 2007-02-19 11:38 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen @ 2007-02-19 11:38 ` Ilpo Järvinen 0 siblings, 0 replies; 19+ messages in thread From: Ilpo Järvinen @ 2007-02-19 11:38 UTC (permalink / raw) To: netdev; +Cc: David Miller, Pasi Sarolahti The description is overly verbose to avoid ambiguity between "SACK enabled" and "SACK enhanced FRTO" Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- Documentation/networking/ip-sysctl.txt | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index a0f6842..d66777b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -178,7 +178,10 @@ tcp_frto - BOOLEAN Enables F-RTO, an enhanced recovery algorithm for TCP retransmission timeouts. It is particularly beneficial in wireless environments where packet loss is typically due to random radio interference - rather than intermediate router congestion. + rather than intermediate router congestion. If set to 1, basic + version is enabled. 2 enables SACK enhanced FRTO, which is + EXPERIMENTAL. The basic version can be used also when SACK is + enabled for a flow through tcp_sack sysctl. tcp_keepalive_time - INTEGER How often TCP sends out keepalive messages when keepalive is enabled. -- 1.4.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2007-02-19 11:38 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-02-19 11:37 [PATCHSET 0/18] FRTO: fixes and small changes + SACK enhanced version Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 1/18] [TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 2/18] [TCP] FRTO: Separated response from FRTO detection algorithm Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 3/18] [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 4/18] [TCP] FRTO: Comment cleanup & improvement Ilpo Järvinen 2007-02-19 11:37 ` [PATCH 5/18] [TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 6/18] [TCP] FRTO: Use Disorder state during operation instead of Open Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 7/18] [TCP] FRTO: Ignore some uninteresting ACKs Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 8/18] [TCP] FRTO: fixes fallback to conventional recovery Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 9/18] [TCP] FRTO: Response should reset also snd_cwnd_cnt Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 10/18] [TCP]: Don't enter to fast recovery while using FRTO Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 11/18] [TCP] FRTO: frto_counter modulo-op converted to two assignments Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 12/18] [TCP]: Prevent unrelated cwnd adjustment while using FRTO Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 13/18] [TCP] FRTO: Entry is allowed only during (New)Reno like recovery Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 14/18] [TCP] FRTO: Reverse RETRANS bit clearing logic Ilpo Järvinen [not found] ` <1171885092244-git- send-email-ilpo.jarvinen@helsinki.fi> 2007-02-19 11:38 ` [PATCH 15/18] [TCP] FRTO: Fake cwnd for ssthresh callback Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 16/18] [TCP]: Prevent reordering adjustments during FRTO Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 17/18] [TCP]: SACK enhanced FRTO Ilpo Järvinen 2007-02-19 11:38 ` [PATCH 18/18] [TCP] FRTO: Sysctl documentation for SACK enhanced version Ilpo Järvinen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.