All of lore.kernel.org
 help / color / mirror / Atom feed
* Proposed NN Feature handling modification
@ 2011-02-14  1:49 Samuel Jero
  2011-02-22 11:34 ` Gerrit Renker
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Samuel Jero @ 2011-02-14  1:49 UTC (permalink / raw)
  To: dccp

[-- Attachment #1: Type: text/plain, Size: 1800 bytes --]

All,

In putting together the set of patches that I just submitted, I noticed
a less than desirable by-product of the way NN features are currently
handled. However, the reasons for handling them in this way are spelled
out in one of the function comments, so before submitted a patch to
change that, I figured I should throw my findings out on the list.

Right now, the the sender starts using the new value for an NN option as
soon as that option is sent (really, as soon as
dccp_feat_signal_nn_change is called). The justification presented in a
comment is as follows: "values are activated immediately after
validation, i.e. we don't wait for the Confirm: either the value is
accepted by the peer (and then the waiting is futile), or it is not
(Reset or empty Confirm). We don't accept empty Confirms - transmitted
values are validated, and the peer "MUST accept any valid value" (RFC
4340, 6.3.2)."

However, what I have noticed is that this causes a potential problem
with the Ack Ratio Feature. Specifically, this means that for 1/2 RTT
the sender thinks each Ack represents a different number of packets than
the receiver actually means each ack to represent. This is the time
during which that feature negotiation travels between the sender and the
receiver.

In practice, the Appropriate Byte Counting implemented in CCID 2
prevents any real problem, but a similar situation could crop up with
other NN options.

My recommendation is to wait until the Confirm has been received to
alter the value of an NN feature. This eliminates the problem above and
has the advantage that it follows RFC 4340 (section 6.6.1). Further,
making this change should require little code change.

Comments?


Samuel Jero
Internetworking Research Group
Ohio University


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Proposed NN Feature handling modification
  2011-02-14  1:49 Proposed NN Feature handling modification Samuel Jero
@ 2011-02-22 11:34 ` Gerrit Renker
  2011-02-26  2:04 ` Samuel Jero
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Gerrit Renker @ 2011-02-22 11:34 UTC (permalink / raw)
  To: dccp

| "values are activated immediately after validation, i.e. we don't wait
|  for the Confirm: either the value is accepted by the peer (and then
|  the waiting is futile), or it is not (Reset or empty Confirm). We
|  don't accept empty Confirms - transmitted values are validated, and
|  the peer "MUST accept any valid value" (RFC 4340, 6.3.2)."
| 
| However, what I have noticed is that this causes a potential problem
| with the Ack Ratio Feature. Specifically, this means that for 1/2 RTT
| the sender thinks each Ack represents a different number of packets than
| the receiver actually means each ack to represent. This is the time
| during which that feature negotiation travels between the sender and the
| receiver.
| 
| In practice, the Appropriate Byte Counting implemented in CCID 2
| prevents any real problem, but a similar situation could crop up with
| other NN options.
| 
| My recommendation is to wait until the Confirm has been received to
| alter the value of an NN feature. This eliminates the problem above and
| has the advantage that it follows RFC 4340 (section 6.6.1). Further,
| making this change should require little code change.
| 
I like the solution introduced in patch #3 because it is simple: Change
options are retransmitted until the correct Confirm option comes in,
instead of resetting the connection. It is an instance of the robustness
principle - the peer can send any outdated/reordered Confirm option.

If this simplicity is not sufficient, it can be extended to cover any cases
where mismatch between endpoints creates temporary problems.

I find it better / more robust to wait for such problems to surface and then
tackle them, rather than making the local state change dependent on the other
end. Doing this complicates processing since the other endpoint is out of local
control, hence we need to make assumptions that can break in practice
(e.g. that a client should reply within two RTTs, but there is a sudden jamming
on the wireless / ADSL, so that the reply comes in only after multiples of
seconds).

Hence I'd very much vote for your approach in patch #3, perhaps with a
debug message in order to track how often a mismatch of (rapidly changing)
options occurs.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Proposed NN Feature handling modification
  2011-02-14  1:49 Proposed NN Feature handling modification Samuel Jero
  2011-02-22 11:34 ` Gerrit Renker
@ 2011-02-26  2:04 ` Samuel Jero
  2011-02-28 11:24 ` Gerrit Renker
  2011-03-01  6:12 ` Samuel Jero
  3 siblings, 0 replies; 5+ messages in thread
From: Samuel Jero @ 2011-02-26  2:04 UTC (permalink / raw)
  To: dccp

[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]

> | My recommendation is to wait until the Confirm has been received to
> | alter the value of an NN feature. This eliminates the problem above and
> | has the advantage that it follows RFC 4340 (section 6.6.1). Further,
> | making this change should require little code change.
> | 
> I like the solution introduced in patch #3 because it is simple: Change
> options are retransmitted until the correct Confirm option comes in,
> instead of resetting the connection. It is an instance of the robustness
> principle - the peer can send any outdated/reordered Confirm option.

Patches #2 and 3 (and another patch I will send shortly) make the
current method usable. Without them, NN feature negotiation in an
ongoing connection causes the connection to be reset.

However, I wouldn't consider them a fix for what I'm talking about here.
The fact is that with the current code the sender thinks the receiver
has the new value of the feature while the feature is being negotiated.

Particularly for the Ack Ratio this can be confusing because the sender
would interpret the acks from the receiver using the new ack ratio. The
first half RTT of those would be from the old ack ratio. This would
cause inappropriate changes in the congestion window/pipe except for
appropriate byte counting which, I believe, mitigates that.

It just seems to me that there should be a better way to do this;
something that's clearer and easier to understand/debug. 

> I find it better / more robust to wait for such problems to surface and then
> tackle them, rather than making the local state change dependent on the other
> end. Doing this complicates processing since the other endpoint is out of local
> control, hence we need to make assumptions that can break in practice
> (e.g. that a client should reply within two RTTs, but there is a sudden jamming
> on the wireless / ADSL, so that the reply comes in only after multiples of
> seconds).

That makes sense. After having run more tests, this issue doesn't seem
to be causing any problems right now. I'm okay with waiting until this
causes a problem to try to come up with a better solution.

Samuel Jero
Internetworking Research Group
Ohio University 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Proposed NN Feature handling modification
  2011-02-14  1:49 Proposed NN Feature handling modification Samuel Jero
  2011-02-22 11:34 ` Gerrit Renker
  2011-02-26  2:04 ` Samuel Jero
@ 2011-02-28 11:24 ` Gerrit Renker
  2011-03-01  6:12 ` Samuel Jero
  3 siblings, 0 replies; 5+ messages in thread
From: Gerrit Renker @ 2011-02-28 11:24 UTC (permalink / raw)
  To: dccp

Quoting Samuel Jero:
| > | My recommendation is to wait until the Confirm has been received to
| > | alter the value of an NN feature. This eliminates the problem above and
| > | has the advantage that it follows RFC 4340 (section 6.6.1). Further,
| > | making this change should require little code change.
| > | 
| > I like the solution introduced in patch #3 because it is simple: Change
| > options are retransmitted until the correct Confirm option comes in,
| > instead of resetting the connection. It is an instance of the robustness
| > principle - the peer can send any outdated/reordered Confirm option.
| 
| Patches #2 and 3 (and another patch I will send shortly) make the
| current method usable. Without them, NN feature negotiation in an
| ongoing connection causes the connection to be reset.
| 
Thank you for sending the updates. I have applied them to the current tree,
please check them. Here are the changes that I made:

  Patch #1: only whitespace changes. When looking over this patch I noted a problem
            with setting the local sequence window directly: this is also done in 
            feat.c:dccp_hdlr_seq_win() - in particular, also adjusting GSS, AWL, AWH.
            Please see discussion below, this is related.
  Patch #4: I think your patch set has tackled some of the problems mentioned
            in the FIXME that it removes. It would be good to find out what
            remains to be tackled, to eventually remove the #ifdef.
  Patch #5: Added a dccp_feat_list_pop(entry) in case the new value overwrites
            an existing one. Otherwise there would be multiple ChangeL options
            in the same packet, with differing values. Please also see below.

  http://eden-feed.erg.abdn.ac.uk/cgi-bin/gitweb.cgi?pÜcp_exp.git;a=shortlog;hÜcp

| However, I wouldn't consider them a fix for what I'm talking about here.
| The fact is that with the current code the sender thinks the receiver
| has the new value of the feature while the feature is being negotiated.
| 
I think this was my fault: the RFC says in 6.6.1 that values MUST not be changed
until they are confirmed by the peer. When this is taken as a prerequisite, the
complexity you mentioned below and the weird cases of two ends being in a limbo
state will not appear.

I am therefore sending an RFC patch separately. Although I have tried it briefly
with some Speex-streaming yesterday, I have not yet applied it to the test tree:
could you give this one a brief spin with your test setup?

If it works out as expected, I think this will give a simpler (and more conformant)
solution.

| Particularly for the Ack Ratio this can be confusing because the sender
| would interpret the acks from the receiver using the new ack ratio. The
| first half RTT of those would be from the old ack ratio. This would
| cause inappropriate changes in the congestion window/pipe except for
| appropriate byte counting which, I believe, mitigates that.
| 
| It just seems to me that there should be a better way to do this;
| something that's clearer and easier to understand/debug. 
| 
I wonder if there are any corner cases with waiting for the ConfirmR option before
activating the case: if not, the case is simpler, and I think we can then even 
remove the following (changed) segment.

        entry = dccp_feat_list_lookup(fn, feat, 1);
        if (entry != NULL) {
                if (entry->val.nn = fval.nn)
                        return 0;
                dccp_pr_debug("Clobbering existing NN entry %llu -> %llu\n",
                              (unsigned long long)entry->val.nn,
                              (unsigned long long)nn_val);
                dccp_feat_list_pop(entry);
        }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Proposed NN Feature handling modification
  2011-02-14  1:49 Proposed NN Feature handling modification Samuel Jero
                   ` (2 preceding siblings ...)
  2011-02-28 11:24 ` Gerrit Renker
@ 2011-03-01  6:12 ` Samuel Jero
  3 siblings, 0 replies; 5+ messages in thread
From: Samuel Jero @ 2011-03-01  6:12 UTC (permalink / raw)
  To: dccp

[-- Attachment #1: Type: text/plain, Size: 4130 bytes --]

> Thank you for sending the updates. I have applied them to the current tree,
> please check them. Here are the changes that I made:
> 
>   Patch #1: only whitespace changes. When looking over this patch I noted a problem
>             with setting the local sequence window directly: this is also done in 
>             feat.c:dccp_hdlr_seq_win() - in particular, also adjusting GSS, AWL, AWH.
>             Please see discussion below, this is related.

Good catch. We should call dccp_update_gss() to adjust the GSS, etc.
However, since dccp_hdlr_seq_win() will be called almost immediately
after we set the value directly, not adjusting the GSS, etc at this
point won't actually be noticed.

Considering that the assignment will go away shortly (your new patch)
and that it doesn't actually affect performance, I don't think this is
worth fixing now.

>   Patch #4: I think your patch set has tackled some of the problems mentioned
>             in the FIXME that it removes. It would be good to find out what
>             remains to be tackled, to eventually remove the #ifdef.

That's on my list of things to do eventually.

>   Patch #5: Added a dccp_feat_list_pop(entry) in case the new value overwrites
>             an existing one. Otherwise there would be multiple ChangeL options
>             in the same packet, with differing values. Please also see below.

I believe dccp_feat_entry_new(), which is called from
dccp_feat_push_change(), already takes care of an existing option by
calling the destructor and reallocating the feature. That said, I don't
really have a problem with the change.


> I am therefore sending an RFC patch separately. Although I have tried it briefly
> with some Speex-streaming yesterday, I have not yet applied it to the test tree:
> could you give this one a brief spin with your test setup?

Excellent. I am always in favor of RFC compliance; if the code is
simpler, so much the better. I will try to test this patch in the next
few days.


> | It just seems to me that there should be a better way to do this;
> | something that's clearer and easier to understand/debug. 
> | 
> I wonder if there are any corner cases with waiting for the ConfirmR option before
> activating the case: if not, the case is simpler, and I think we can then even 
> remove the following (changed) segment.
> 
>         entry = dccp_feat_list_lookup(fn, feat, 1);
>         if (entry != NULL) {
>                 if (entry->val.nn == fval.nn)
>                         return 0;
>                 dccp_pr_debug("Clobbering existing NN entry %llu -> %llu\n",
>                               (unsigned long long)entry->val.nn,
>                               (unsigned long long)nn_val);
>                 dccp_feat_list_pop(entry);
>         }

I have actually done a little work on a patch similar to the one you
sent out. It turns out that there is at least one corner case.

Consider this scenario: Sender has ack ratio=1 and then starts feature
negotation for ack ratio=2. At this exact time, the sender goes idle.
This triggers the RTO timer, which will check the ack ratio and reset it
to 1 if it is larger. However, this code has no way of knowing that we
are currently negotiating for ack ratio=2. As a result, the first packet
sent after the idle period will change the ack ratio at the receiver to
2. With cwnd at 1, the sender will wait for the RTO to send a second
packet which will trigger an ack. The good news is that this ack will
confirm ack ratio=2, so the next RTO will start feature negotiation for
ack ratio=1 and this fixes the problem.

I think in order to implement this, we need two ack ratio (and possibly
other feature) variables. An ack_ratio_current (updated on confirm) and
an ack_ratio_next (updated on start of feature negotiation). That way,
code that checks for ack ratio larger than cwnd can utilize
ack_ratio_next to prevent a problem like above. However, code
interpreting acks and sending acks can use the current value.


Samuel Jero
Internetworking Research Group
Ohio University

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-03-01  6:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-14  1:49 Proposed NN Feature handling modification Samuel Jero
2011-02-22 11:34 ` Gerrit Renker
2011-02-26  2:04 ` Samuel Jero
2011-02-28 11:24 ` Gerrit Renker
2011-03-01  6:12 ` Samuel Jero

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.