* tcp: picking a less conservative SACK RTT for congestion control
@ 2015-04-11 19:50 Kenneth Klette Jonassen
From: Kenneth Klette Jonassen @ 2015-04-11 19:50 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Neal Cardwell, Yuchung Cheng, Ilpo Järvinen,
	Stephen Hemminger

tcp_sacktag_one() currently picks the earliest sequence sacked for RTT. This
makes sense when data is sacked due to reordering as described in commit
832d11c5 ("Try to restore large SKBs while SACK processing"). But it might
not make sense for CC in cases where:

 1. ACKs are lost, i.e. a SACK subsequent to a lost SACK covers both a new
    and an old segment at the receiver. A concrete example follows below.
 2. The receiver disregards the RFC 5681 recommendation to immediately ACK
    out-of-order segments, perhaps due to a hardware offload mechanism.

We have an implementation of the experimental congestion controller CDG [1]
which can perform slightly better in environments with random loss. Unlike
e.g. Vegas, which resets all internal state when loss is detected, CDG is
quite sensitive to recent RTT changes even during loss recovery.

What would be a feasible approach to track the last segment sacked? I was
thinking of keeping first/last skb_mstamp's in struct tcp_sacktag_state akin
to the way it is done in tcp_clean_rtx_queue(). This would require passing
eight more bytes around on 64 bit. An alternative that is slightly obscure
is to store the delta between the first and last sack in a 4 byte value.
Since struct tcp_sacktag_state currently has 4 bytes padding, this does not
require passing more data around -- just changing "long sack_rtt_us" to
a pointer. It can have some microscale cache locality impacts though. I
envision that both approaches save the call to skb_mstamp_get() in
tcp_sacktag_one().

1. http://caia.swin.edu.au/cv/dahayes/content/networking2011-cdg-preprint.pdf

PS: The pkts_acked CC hook is not currently called unless new data is acked
sequentially. I have a simple patch that calls it for new SACK RTTs, but I
am holding it off until my recent patch is reviewed (fix bogus RTT for CC).

---

Concrete example. The path has 1% uniform loss, no reordering. The prints
below show delta-timestamped packets captured separately at the sender
and receiver.

Receiver sends two acks:
00:00:00.005018 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
3824632751, win 32746, options [nop,nop,TS val 1820536519 ecr
2169294,nop,nop,sack 1 {3824634199:3824651575}], length 0
00:00:00.004871 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr
2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0

One reaches the sender:
00:00:00.009842 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr
2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0

Trace output at sender:
8968.105153: tcp_sacktag_one: first sacked range 3824648679 -
3824651575 rtt 75129
8968.105157: tcp_sacktag_one: later sacked range 3824651575 -
3824653023 rtt 70224 (rtt not used)


* Re: tcp: picking a less conservative SACK RTT for congestion control
From: Yuchung Cheng @ 2015-04-12 15:59 UTC (permalink / raw)
  To: Kenneth Klette Jonassen
  Cc: netdev, Eric Dumazet, Neal Cardwell, Ilpo Järvinen,
	Stephen Hemminger

On Sat, Apr 11, 2015 at 12:50 PM, Kenneth Klette Jonassen
<kennetkl@ifi.uio.no> wrote:
> tcp_sacktag_one() currently picks the earliest sequence sacked for RTT. This
> makes sense when data is sacked due to reordering as described in commit
> 832d11c5 ("Try to restore large SKBs while SACK processing"). But it might
> not make sense for CC in cases where:
>
>  1. ACKs are lost, i.e. a SACK subsequent to a lost SACK covers both a new
>     and an old segment at the receiver. A concrete example follows below.
>  2. The receiver disregards the RFC 5681 recommendation to immediately ACK
>     out-of-order segments, perhaps due to a hardware offload mechanism.
>
> We have an implementation of the experimental congestion controller CDG [1]
> which can perform slightly better in environments with random loss. Unlike
> e.g. Vegas which resets all internal state when loss is detected, CDG is
> quite sensitive to recent RTT changes even during loss recovery.
>
> What would be a feasible approach to track the last segment sacked? I was
> thinking of keeping first/last skb_mstamp's in struct tcp_sacktag_state akin
> to the way it is done in tcp_clean_rtx_queue(). This would require passing
Or use the last sacked skb mstamp instead of the first for sack_rtt? IMO it
won't matter that much for RTTM.

Or add a new ca_sack_rtt to tcp_sacktag_state and pass the state to
tcp_clean_rtx_queue() as well. The latter is more generic and extensible?

> eight more bytes around on 64 bit. An alternative that is slightly obscure
> is to store the delta between the first and last sack in a 4 byte value.
> Since struct tcp_sacktag_state currently has 4 bytes padding, this does not
> require passing more data around -- just changing "long sack_rtt_us" to
> a pointer. It can have some microscale cache locality impacts though. I
Seems too complicated.

> envision that both approaches save the call to skb_mstamp_get() in
> tcp_sacktag_one().
>
> 1. http://caia.swin.edu.au/cv/dahayes/content/networking2011-cdg-preprint.pdf
>
> PS: The pkts_acked CC hook is not currently called unless new data is acked
> sequentially. I have a simple patch that calls it for new SACK RTTs, but I
> am holding it off until my recent patch is reviewed (fix bogus RTT for CC).
>
> ---
>
> Concrete example. The path has 1% uniform loss, no reordering. The prints
> below show delta-timestamped packets captured separately at the sender
> and receiver.
>
> Receiver sends two acks:
> 00:00:00.005018 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
> 3824632751, win 32746, options [nop,nop,TS val 1820536519 ecr
> 2169294,nop,nop,sack 1 {3824634199:3824651575}], length 0
> 00:00:00.004871 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
> 3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr
> 2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0
>
> One reaches the sender:
> 00:00:00.009842 IP 10.0.1.2.5001 > 10.0.0.2.48089: Flags [.], ack
> 3824632751, win 32746, options [nop,nop,TS val 1820536524 ecr
> 2169294,nop,nop,sack 1 {3824634199:3824653023}], length 0
>
> Trace output at sender:
> 8968.105153: tcp_sacktag_one: first sacked range 3824648679 -
> 3824651575 rtt 75129
> 8968.105157: tcp_sacktag_one: later sacked range 3824651575 -
> 3824653023 rtt 70224 (rtt not used)


* Re: tcp: picking a less conservative SACK RTT for congestion control
From: Kenneth Klette Jonassen @ 2015-04-12 17:53 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: netdev, Eric Dumazet, Neal Cardwell, Ilpo Järvinen,
	Stephen Hemminger

>> What would be a feasible approach to track the last segment sacked? I was
>> thinking of keeping first/last skb_mstamp's in struct tcp_sacktag_state akin
>> to the way it is done in tcp_clean_rtx_queue(). This would require passing
> Or use the last sacked skb mstamp instead of the first for sack_rtt? IMO it
> won't matter that much for RTTM.
>
> Or add a new ca_sack_rtt to tcp_sacktag_state and pass the state to
> tcp_clean_rtx_queue() as well. The latter is more generic and extensible?
That is a likely solution. Intuitively, though, I am not happy about
having two different skb_mstamp_get() calls in the ACK code path for
RTT, since it might contribute to the variance between RTTMs.

>> eight more bytes around on 64 bit. An alternative that is slightly obscure
>> is to store the delta between the first and last sack in a 4 byte value.
>> Since struct tcp_sacktag_state currently has 4 bytes padding, this does not
>> require passing more data around -- just changing "long sack_rtt_us" to
>> a pointer. It can have some microscale cache locality impacts though. I
> Seems too complicated.
Maybe. I thought so too at first. But then again, this stuff is
somewhat complicated either way. I took a swing at it today, submitted
as RFC just now:
http://patchwork.ozlabs.org/patch/460536/
(Only CC'd Yuchung on it to keep email volume down.)

^ permalink raw reply	[flat|nested] 3+ messages in thread
