netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Bendik Rønning Opstad" <bro.devel@gmail.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	"Neal Cardwell" <ncardwell@google.com>,
	"Andreas Petlund" <apetlund@simula.no>,
	"Carsten Griwodz" <griff@simula.no>,
	"Pål Halvorsen" <paalh@simula.no>,
	"Jonas Markussen" <jonassm@ifi.uio.no>,
	"Kristian Evensen" <kristian.evensen@gmail.com>,
	"Kenneth Klette Jonassen" <kennetkl@ifi.uio.no>
Subject: Re: [PATCH v6 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)
Date: Thu, 16 Jun 2016 19:12:58 +0200	[thread overview]
Message-ID: <5762DE1A.5000904@gmail.com> (raw)
In-Reply-To: <CAK6E8=cZUkTUzSh_qymBMozZyxgwBQsvq6bx38LzpGJehcLQiA@mail.gmail.com>

On 21/03/16 19:54, Yuchung Cheng wrote:
> On Thu, Mar 17, 2016 at 4:26 PM, Bendik Rønning Opstad
> <bro.devel@gmail.com> wrote:
>>
>>>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>>>> index 6a92b15..8f3f3bf 100644
>>>> --- a/Documentation/networking/ip-sysctl.txt
>>>> +++ b/Documentation/networking/ip-sysctl.txt
>>>> @@ -716,6 +716,21 @@ tcp_thin_dpifl_itt_lower_bound - INTEGER
>>>>         calculated, which is used to classify whether a stream is thin.
>>>>         Default: 10000
>>>>
>>>> +tcp_rdb - BOOLEAN
>>>> +       Enable RDB for all new TCP connections.
>>>   Please describe RDB briefly, perhaps with a pointer to your paper.
>>
>> Ah, yes, that description may have been a bit too brief...
>>
>> What about pointing to tcp-thin.txt in the brief description, and
>> rewrite tcp-thin.txt with a more detailed description of RDB along
>> with a paper reference?
> +1
>>
>>>    I suggest have three level of controls:
>>>    0: disable RDB completely
>>>    1: enable indiv. thin-stream conn. to use RDB via TCP_RDB socket
>>> options
>>>    2: enable RDB on all thin-stream conn. by default
>>>
>>>    currently it only provides mode 1 and 2. but there may be cases where
>>>    the administrator wants to disallow it (e.g., broken middle-boxes).
>>
>> Good idea. Will change this.

I have implemented your suggestion in the next patch.

>>> It also seems better to
>>> allow individual socket to select the redundancy level (e.g.,
>>> setsockopt TCP_RDB=3 means <=3 pkts per bundle) vs a global setting.
>>> This requires more bits in tcp_sock but 2-3 more is suffice.
>>
>> Most certainly. We decided not to implement this for the patch to keep
>> it as simple as possible, however, we surely prefer to have this
>> functionality included if possible.

Next patch version has a socket option to allow modifying the different
RDB settings.

>>>> +int tcp_transmit_rdb_skb(struct sock *sk, struct sk_buff *xmit_skb,
>>>> +                        unsigned int mss_now, gfp_t gfp_mask)
>>>> +{
>>>> +       struct sk_buff *rdb_skb = NULL;
>>>> +       struct sk_buff *first_to_bundle;
>>>> +       u32 bytes_in_rdb_skb = 0;
>>>> +
>>>> +       /* How we detect that RDB was used. When equal, no RDB data was sent */
>>>> +       TCP_SKB_CB(xmit_skb)->tx.rdb_start_seq = TCP_SKB_CB(xmit_skb)->seq;
>>>
>>>> +
>>>> +       if (!tcp_stream_is_thin_dpifl(tcp_sk(sk)))
>>> During loss recovery tcp inflight fluctuates and would like to trigger
>>> this check even for non-thin-stream connections.
>>
>> Good point.
>>
>>> Since the loss
>>> already occurs, RDB can only take advantage from limited-transmit,
>>> which it likely does not have (b/c its a thin-stream). It might be
>>> checking if the state is open.
>>
>> You mean to test for open state to avoid calling rdb_can_bundle_test()
>> unnecessarily if we (presume to) know it cannot bundle anyway? That
>> makes sense, however, I would like to do some tests on whether "state
>> != open" is a good indicator on when bundling is not possible.

When testing this I found that bundling can often be performed when
not in Open state. For the most part in CWR mode, but also the other
modes, so this does not seem like a good indicator.

The only problem with tcp_stream_is_thin_dpifl() triggering for
non-thin streams in loss recovery would be the performance penalty of
calling rdb_can_bundle_test(). It would not be able to bundle anyways
since the previous SKB would contain >= mss worth of data.

The most reliable test is to check available space in the previous
SKB, i.e. if (xmit_skb->prev->len == mss_now). Do you suggest, for
performance reasons, to do this before the call to
tcp_stream_is_thin_dpifl()?

>>> since RDB will cause DSACKs, and we only blindly count DSACKs to
>>> perform CWND undo. How does RDB handle that false positives?
>>
>> That is a very good question. The simple answer is that the
>> implementation does not handle any such false positives, which I
>> expect can result in incorrectly undoing CWND reduction in some cases.
>> This gets a bit complicated, so I'll have to do some more testing on
>> this to verify with certainty when it happens.
>>
>> When there is no loss, and each RDB packet arriving at the receiver
>> contains both already received and new data, the receiver will respond
>> with an ACK that acknowledges new data (moves snd_una), with the SACK
>> field populated with the already received sequence range (DSACK).
>>
>> The DSACKs in these incoming ACKs are not counted (tp->undo_retrans--)
>> unless tp->undo_marker has been set by tcp_init_undo(), which is
>> called by either tcp_enter_loss() or tcp_enter_recovery(). However,
>> whenever a loss is detected by rdb_detect_loss(), tcp_enter_cwr() is
>> called, which disables CWND undo. Therefore, I believe the incorrect
> thanks for the clarification. it might worth a short comment on why we
> use tcp_enter_cwr() (to disable undo)
>
>
>> counting of DSACKs from ACKs on RDB packets will only be a problem
>> after the regular loss detection mechanisms (Fast Retransmit/RTO) have
>> been triggered (i.e. we are in either TCP_CA_Recovery or TCP_CA_Loss).
>>
>> We have recorded the CWND values for both RDB and non-RDB streams in
>> our experiments, and have not found any obvious red flags when
>> analysing the results, so I presume (hope may be more precise) this is
>> not a major issue we have missed. Nevertheless, I will investigate
>> this in detail and get back to you.

I've looked into this and tried to figure out in which cases this is
actually a problem, but I have failed to find any.

One scenario I considered is when an RDB packet is sent right after a
retransmit, which would result in DSACK in the ACK in response to the
RDB packet.

With a bundling limit of 1 packet, two packets must be lost for RDB to
fail to repair the loss, causing dupACKs. So if three packets are sent,
where the first two are lost, the last packet will cause a dupACK,
resulting in a fast retransmit (and entering recovery which calls
tcp_init_undo()).

By writing new data to the socket right after the fast retransmit,
a new RDB packet is built with some old data that was just
retransmitted.

On the ACK on the fast retransmit the state is changed from Recovery
to Open. The next incoming ACK (on the RDB packet) will contain a DSACK
range, but it will not be considered dubious (tcp_ack_is_dubious())
since "!(flag & FLAG_NOT_DUP)" is false (new data was acked), state is
Open, and "flag & FLAG_CA_ALERT" evaluates to false.


Feel free to suggest scenarios (as detailed as possible) with the
potential to cause such false positives, and I'll test them with
packetdrill.

Bendik

  reply	other threads:[~2016-06-16 17:15 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-23 20:50 [PATCH RFC net-next 0/2] tcp: Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-10-23 20:50 ` [PATCH RFC net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2015-10-23 21:44   ` Eric Dumazet
     [not found]     ` <1445636654.22974.193.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-10-25  5:56       ` Bendik Rønning Opstad
2015-10-23 20:50 ` [PATCH RFC net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-10-26 14:50   ` Neal Cardwell
2015-10-26 21:35     ` Andreas Petlund
2015-10-26 21:58       ` Yuchung Cheng
2015-10-27 19:15         ` Jonas Markussen
2015-10-29 22:53         ` Bendik Rønning Opstad
2015-11-02  9:18           ` David Laight
2015-11-02  9:37   ` David Laight
2015-11-05  2:06     ` Bendik Rønning Opstad
2015-10-24  6:11 ` [PATCH RFC net-next 0/2] tcp: " Yuchung Cheng
2015-10-24  8:00   ` Jonas Markussen
     [not found]     ` <61F74109-9FDC-485A-978B-714B7AA27445-6miFZF/5cTBuMpJDpNschA@public.gmane.org>
2015-10-24 12:57       ` Eric Dumazet
2015-11-09 19:40         ` Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 " Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2015-11-23 16:26 ` [PATCH RFC v2 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2015-11-23 17:43   ` Eric Dumazet
2015-11-23 20:05     ` Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-02 19:23 ` [PATCH v3 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-02-02 20:35   ` Eric Dumazet
2016-02-03 18:17     ` Bendik Rønning Opstad
2016-02-03 19:34       ` Eric Dumazet
     [not found]         ` <CAF8eE=VOuoNLQHtkRwM9ZG+vJ-uH2ufVW5y_pS24rGqWh4Qa2g@mail.gmail.com>
2016-02-08 17:30           ` Bendik Rønning Opstad
2016-02-08 17:38         ` Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-16 13:51 ` [PATCH v4 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-02-18 15:18   ` Eric Dumazet
2016-02-19 14:12     ` Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-02-24 21:12 ` [PATCH v5 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-03-02 19:52   ` David Miller
2016-03-02 22:33     ` Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-03-07 19:36   ` David Miller
2016-03-10  0:20   ` Yuchung Cheng
2016-03-10  1:45     ` Jonas Markussen
2016-03-10  2:27       ` Yuchung Cheng
2016-03-12  9:23         ` Jonas Markussen
2016-03-13 23:18     ` Bendik Rønning Opstad
2016-03-14 21:59       ` Yuchung Cheng
2016-03-18 14:25         ` Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-03-03 18:06 ` [PATCH v6 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad
2016-03-14 21:15   ` Eric Dumazet
2016-03-15  1:04     ` Rick Jones
2016-03-15 18:09       ` Yuchung Cheng
2016-03-18 17:58     ` Bendik Rønning Opstad
2016-03-14 21:54   ` Yuchung Cheng
2016-03-15  0:40     ` Bill Fink
2016-03-17 23:26     ` Bendik Rønning Opstad
2016-03-21 18:54       ` Yuchung Cheng
2016-06-16 17:12         ` Bendik Rønning Opstad [this message]
2016-06-22 14:56 ` [PATCH v7 net-next 0/2] tcp: " Bendik Rønning Opstad
2016-06-22 14:56 ` [PATCH v7 net-next 1/2] tcp: Add DPIFL thin stream detection mechanism Bendik Rønning Opstad
2016-06-22 14:56 ` [PATCH v7 net-next 2/2] tcp: Add Redundant Data Bundling (RDB) Bendik Rønning Opstad

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5762DE1A.5000904@gmail.com \
    --to=bro.devel@gmail.com \
    --cc=apetlund@simula.no \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=griff@simula.no \
    --cc=jonassm@ifi.uio.no \
    --cc=kennetkl@ifi.uio.no \
    --cc=kristian.evensen@gmail.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=paalh@simula.no \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).