All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2] sctp: Do not create SACK chunk if the final packet size
@ 2009-07-31 10:19 Wei Yongjun
  2009-07-31 14:01 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Wei Yongjun @ 2009-07-31 10:19 UTC (permalink / raw)
  To: linux-sctp

The sender should create a SACK only if the size of the final SCTP
packet does not exceed the current MTU. Base on RFC 4960:

  6.1.  Transmission of DATA Chunks

    Before an endpoint transmits a DATA chunk, if any received DATA
    chunks have not been acknowledged (e.g., due to delayed ack), the
    sender should create a SACK and bundle it with the outbound DATA
    chunk, as long as the size of the final SCTP packet does not exceed
    the current MTU.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 net/sctp/output.c |   28 ++++++++++++++++------------
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 94c110d..21b9efd 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -222,6 +222,16 @@ static sctp_xmit_t sctp_packet_bundle_auth(struct sctp_packet *pkt,
 	return retval;
 }
 
+static int can_append_chunk(struct sctp_packet *pkt, size_t size)
+{
+	size_t psize = pkt->size;
+	size_t pmtu = ((pkt->transport->asoc) ?
+		       (pkt->transport->asoc->pathmtu) :
+		       (pkt->transport->pathmtu));
+
+	return (psize + size <= pmtu);
+}
+
 /* Try to bundle a SACK with the packet. */
 static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
 					   struct sctp_chunk *chunk)
@@ -235,11 +245,15 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
 	    !pkt->has_cookie_echo) {
 		struct sctp_association *asoc;
 		struct timer_list *timer;
+		__u16 chunk_len;
+
 		asoc = pkt->transport->asoc;
 		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
+		chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length)) +
+			    sizeof(struct sctp_sack_chunk);
 
 		/* If the SACK timer is running, we have a pending SACK */
-		if (timer_pending(timer)) {
+		if (timer_pending(timer) && can_append_chunk(pkt, chunk_len)) {
 			struct sctp_chunk *sack;
 			asoc->a_rwnd = asoc->rwnd;
 			sack = sctp_make_sack(asoc);
@@ -262,9 +276,6 @@ sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *packet,
 {
 	sctp_xmit_t retval = SCTP_XMIT_OK;
 	__u16 chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length));
-	size_t psize;
-	size_t pmtu;
-	int too_big;
 
 	SCTP_DEBUG_PRINTK("%s: packet:%p chunk:%p\n", __func__, packet,
 			  chunk);
@@ -279,15 +290,8 @@ sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *packet,
 	if (retval != SCTP_XMIT_OK)
 		goto finish;
 
-	psize = packet->size;
-	pmtu  = ((packet->transport->asoc) ?
-		 (packet->transport->asoc->pathmtu) :
-		 (packet->transport->pathmtu));
-
-	too_big = (psize + chunk_len > pmtu);
-
 	/* Decide if we need to fragment or resubmit later. */
-	if (too_big) {
+	if (!can_append_chunk(packet, chunk_len)) {
 		/* It's OK to fragmet at IP level if any one of the following
 		 * is true:
 		 * 	1. The packet is empty (meaning this chunk is greater
-- 
1.6.2.2





^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
@ 2009-07-31 14:01 ` Vlad Yasevich
  2009-07-31 16:02 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Vlad Yasevich @ 2009-07-31 14:01 UTC (permalink / raw)
  To: linux-sctp

Wei Yongjun wrote:
> The sender should create a SACK only if the size of the final SCTP
> packet does not exceed the current MTU. Base on RFC 4960:
> 
>   6.1.  Transmission of DATA Chunks
> 
>     Before an endpoint transmits a DATA chunk, if any received DATA
>     chunks have not been acknowledged (e.g., due to delayed ack), the
>     sender should create a SACK and bundle it with the outbound DATA
>     chunk, as long as the size of the final SCTP packet does not exceed
>     the current MTU.


I like this much better, but the one thing I don't like is that we end
up delaying the SACK if it doesn't fit in the chunk.

May be we can add some checking to see if there are more chunks that we'll
be sending and try to bundle it later.

Another question is whether we should really be sending an immediate SACK
back after receiving just one DATA?

I still think that this should really be handled by SACK immediately extension.
The extension can apply when we are single sub-MTU packets, essentially a
request-response scenario.  There is no reason that Immediate SACKs can't be
bundled in this manner.

-vlad


> 
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
> ---
>  net/sctp/output.c |   28 ++++++++++++++++------------
>  1 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 94c110d..21b9efd 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -222,6 +222,16 @@ static sctp_xmit_t sctp_packet_bundle_auth(struct sctp_packet *pkt,
>  	return retval;
>  }
>  
> +static int can_append_chunk(struct sctp_packet *pkt, size_t size)
> +{
> +	size_t psize = pkt->size;
> +	size_t pmtu = ((pkt->transport->asoc) ?
> +		       (pkt->transport->asoc->pathmtu) :
> +		       (pkt->transport->pathmtu));
> +
> +	return (psize + size <= pmtu);
> +}
> +
>  /* Try to bundle a SACK with the packet. */
>  static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
>  					   struct sctp_chunk *chunk)
> @@ -235,11 +245,15 @@ static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
>  	    !pkt->has_cookie_echo) {
>  		struct sctp_association *asoc;
>  		struct timer_list *timer;
> +		__u16 chunk_len;
> +
>  		asoc = pkt->transport->asoc;
>  		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
> +		chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length)) +
> +			    sizeof(struct sctp_sack_chunk);
>  
>  		/* If the SACK timer is running, we have a pending SACK */
> -		if (timer_pending(timer)) {
> +		if (timer_pending(timer) && can_append_chunk(pkt, chunk_len)) {
>  			struct sctp_chunk *sack;
>  			asoc->a_rwnd = asoc->rwnd;
>  			sack = sctp_make_sack(asoc);
> @@ -262,9 +276,6 @@ sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *packet,
>  {
>  	sctp_xmit_t retval = SCTP_XMIT_OK;
>  	__u16 chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length));
> -	size_t psize;
> -	size_t pmtu;
> -	int too_big;
>  
>  	SCTP_DEBUG_PRINTK("%s: packet:%p chunk:%p\n", __func__, packet,
>  			  chunk);
> @@ -279,15 +290,8 @@ sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *packet,
>  	if (retval != SCTP_XMIT_OK)
>  		goto finish;
>  
> -	psize = packet->size;
> -	pmtu  = ((packet->transport->asoc) ?
> -		 (packet->transport->asoc->pathmtu) :
> -		 (packet->transport->pathmtu));
> -
> -	too_big = (psize + chunk_len > pmtu);
> -
>  	/* Decide if we need to fragment or resubmit later. */
> -	if (too_big) {
> +	if (!can_append_chunk(packet, chunk_len)) {
>  		/* It's OK to fragmet at IP level if any one of the following
>  		 * is true:
>  		 * 	1. The packet is empty (meaning this chunk is greater


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
  2009-07-31 14:01 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
@ 2009-07-31 16:02 ` Doug Graham
  2009-07-31 16:49 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Wei Yongjun
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-07-31 16:02 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 10:01:31AM -0400, Vlad Yasevich wrote:
> Wei Yongjun wrote:
> > The sender should create a SACK only if the size of the final SCTP
> > packet does not exceed the current MTU. Base on RFC 4960:
> > 
> >   6.1.  Transmission of DATA Chunks
> > 
> >     Before an endpoint transmits a DATA chunk, if any received DATA
> >     chunks have not been acknowledged (e.g., due to delayed ack), the
> >     sender should create a SACK and bundle it with the outbound DATA
> >     chunk, as long as the size of the final SCTP packet does not exceed
> >     the current MTU.
> 
> 
> I like this much better, but the one thing I don't like is that we end
> up delaying the SACK if it doesn't fit in the chunk.
> 
> May be we can add some checking to see if there are more chunks that we'll
> be sending and try to bundle it later.
> 
> Another question is whether we should really be sending an immediate SACK
> back after receiving just one DATA?

IMO, if there is data flowing in the other direction, a SACK should
always be sent back at the same time as the DATA.  If the SACK can't be
bundled into the last chunk of DATA within a message, then it should be
sent by itself.

I'll try coming up with a patch to implement that behaviour when I
get time to look at the code some more (and if nobody beats me to it),
but I think the logic when sending DATA should be:

  if (have SACK to send)
     if (this is the last or only DATA chunk of a message)
        if (SACK+DATA fits in MTU)
           bundle SACK
        else
           send SACK in a separate packet

Doing something like this would mean that a SACK would always be sent
back immediately if there is DATA to send.  The only time the SACK would
be sent in a separate packet would be if the last fragment of a message
was too large to be bundled with a SACK.  That's a pretty narrow range
of packet sizes (1436-1452 bytes), so it shouldn't happen often.  In all
other cases, the SACK would be bundled with the last message fragment.

How does that sound?  Looks like it should be quite simple to implement,
provided that it's not difficult to figure out whether a DATA chunk
is the last in a message.

--Doug.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
  2009-07-31 14:01 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
  2009-07-31 16:02 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
@ 2009-07-31 16:49 ` Wei Yongjun
  2009-07-31 17:04 ` Vlad Yasevich
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Wei Yongjun @ 2009-07-31 16:49 UTC (permalink / raw)
  To: linux-sctp

Vlad Yasevich wrote:
> Wei Yongjun wrote:
>   
>> The sender should create a SACK only if the size of the final SCTP
>> packet does not exceed the current MTU. Base on RFC 4960:
>>
>>   6.1.  Transmission of DATA Chunks
>>
>>     Before an endpoint transmits a DATA chunk, if any received DATA
>>     chunks have not been acknowledged (e.g., due to delayed ack), the
>>     sender should create a SACK and bundle it with the outbound DATA
>>     chunk, as long as the size of the final SCTP packet does not exceed
>>     the current MTU.
>>     
>
>
> I like this much better, but the one thing I don't like is that we end
> up delaying the SACK if it doesn't fit in the chunk.
>   

Why we need to send the SACK immediate? Just to make the one send one
recv mode happy?
In this case, it is better to disable the delay ack, is this correct?

> May be we can add some checking to see if there are more chunks that we'll
> be sending and try to bundle it later.
>
> Another question is whether we should really be sending an immediate SACK
> back after receiving just one DATA?
>
> I still think that this should really be handled by SACK immediately extension.
> The extension can apply when we are single sub-MTU packets, essentially a
> request-response scenario.  There is no reason that Immediate SACKs can't be
> bundled in this manner.
>   

The immediate SACK is sent without delay SACK timer.It is not generate by
sctp_packet_bundle_sack(), it is created by sctp_gen_sack().




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (2 preceding siblings ...)
  2009-07-31 16:49 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Wei Yongjun
@ 2009-07-31 17:04 ` Vlad Yasevich
  2009-07-31 17:09 ` Doug Graham
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Vlad Yasevich @ 2009-07-31 17:04 UTC (permalink / raw)
  To: linux-sctp

Wei Yongjun wrote:
> Vlad Yasevich wrote:
>> Wei Yongjun wrote:
>>   
>>> The sender should create a SACK only if the size of the final SCTP
>>> packet does not exceed the current MTU. Base on RFC 4960:
>>>
>>>   6.1.  Transmission of DATA Chunks
>>>
>>>     Before an endpoint transmits a DATA chunk, if any received DATA
>>>     chunks have not been acknowledged (e.g., due to delayed ack), the
>>>     sender should create a SACK and bundle it with the outbound DATA
>>>     chunk, as long as the size of the final SCTP packet does not exceed
>>>     the current MTU.
>>>     
>>
>> I like this much better, but the one thing I don't like is that we end
>> up delaying the SACK if it doesn't fit in the chunk.
>>   
> 
> Why we need to send the SACK immediate? Just to make the one send one
> recv mode happy?
> In this case, it is better to disable the delay ack, is this correct?

Consistent behavior.  A change in message size shouldn't cause a change in
on the wire behavior.

Either we always send a SACK if we have DATA to send, or we don't.  Having a
weird "only if it fits" behavior will have unpredictable results.

> 
>> May be we can add some checking to see if there are more chunks that we'll
>> be sending and try to bundle it later.
>>
>> Another question is whether we should really be sending an immediate SACK
>> back after receiving just one DATA?
>>
>> I still think that this should really be handled by SACK immediately extension.
>> The extension can apply when we are single sub-MTU packets, essentially a
>> request-response scenario.  There is no reason that Immediate SACKs can't be
>> bundled in this manner.
>>   
> 
> The immediate SACK is sent without delay SACK timer.It is not generate by
> sctp_packet_bundle_sack(), it is created by sctp_gen_sack().
> 

Matter of implementation.

-vlad

> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (3 preceding siblings ...)
  2009-07-31 17:04 ` Vlad Yasevich
@ 2009-07-31 17:09 ` Doug Graham
  2009-08-02  3:03 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-07-31 17:09 UTC (permalink / raw)
  To: linux-sctp

Vlad Yasevich wrote:
> Wei Yongjun wrote:
>   
>>
>> Why we need to send the SACK immediate? Just to make the one send one
>> recv mode happy?
>> In this case, it is better to disable the delay ack, is this correct?
>>     
>
> Consistent behavior.  A change in message size shouldn't cause a change in
> on the wire behavior.
>
> Either we always send a SACK if we have DATA to send, or we don't.  Having a
> weird "only if it fits" behavior will have unpredictable results.
>   
Exactly!

--Doug



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (4 preceding siblings ...)
  2009-07-31 17:09 ` Doug Graham
@ 2009-08-02  3:03 ` Doug Graham
  2009-08-03 17:19 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-08-02  3:03 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 01:04:44PM -0400, Vlad Yasevich wrote:
> Wei Yongjun wrote:
> > Vlad Yasevich wrote:
> >> I like this much better, but the one thing I don't like is that we end
> >> up delaying the SACK if it doesn't fit in the chunk.
> >>   
> > 
> > Why we need to send the SACK immediate? Just to make the one send one
> > recv mode happy?
> > In this case, it is better to disable the delay ack, is this correct?
> 
> Consistent behavior.  A change in message size shouldn't cause a change in
> on the wire behavior.
> 
> Either we always send a SACK if we have DATA to send, or we don't.  Having a
> weird "only if it fits" behavior will have unpredictable results.

My attempt at solving this is below.  Messages whose last or only
fragment are just less than or equal to the PMTU (meaning that no SACK
can be bundled in any fragment of the message) have a bare SACK sent just
before the last fragment of the message.  For all other message sizes,
normal piggybacking takes care of sending a bundled SACK.  I've tested
this by ping-ponging messages of different sizes between two applications
as fast as possible, and have not come across any sizes where performance
drops precipitously.

> > 
> >> May be we can add some checking to see if there are more chunks that we'll
> >> be sending and try to bundle it later.
> >>
> >> Another question is whether we should really be sending an immediate SACK
> >> back after receiving just one DATA?
> >>
> >> I still think that this should really be handled by SACK immediately extension.
> >> The extension can apply when we are single sub-MTU packets, essentially a
> >> request-response scenario.  There is no reason that Immediate SACKs can't be
> >> bundled in this manner.

The main problems I can think of with depending on this extension are:

 1) If it's left up to the user to decide when to use SCTP_SACK_IMMEDIATELY flag,
    rather than taking care to use it only under the conditions when it would
    actually help, they'd be likely to just use it all the time, thus defeating
    the purpose of delayed SACKs altogether.  I suspect that this is what is
    happening now with SCTP_NODELAY.

 2) Probably the only way a user could determine when SCTP_SACK_IMMEDIATELY
    would be useful would be to first get a PhD in networking and SCTP,
    then spend some time sniffing their application's on-the-wire
    behaviour.    

 3) Even if users are careful about when to use SCTP_SACK_IMMEDIATELY, there will
    still be cases where the SACK could have been piggybacked on returning DATA,
    but wasn't because this flag requested an immediate SACK.  This
    applies even if the entity deciding whether or not to set the I bit
    is the sending SCTP (as opposed to a user).

I think it is better if the receiving SCTP is able to determine without
any hints from the sender when to piggyback, when to send a bare immediate
SACK, and when to delay the SACK.  This may not always be possible,
so there may be cases where there is no other way to do what the I bit
does, but I don't think it's a good idea to use it to solve problems
that can be solved in other ways.  All the same issues with Nagle and
delayed-ACKs and whatever else exist in TCP as well, but to my knowledge,
nobody has ever seen the need to implement an extension like this for TCP.


Here's my patch to make sure a SACK gets sent back with a message when
the last or only fragment of the message is just less than the PMTU.

Signed-off-by: Doug Graham <dgraham@nortel.com>

---

--- linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:45	1.4
+++ linux-2.6.29/net/sctp/output.c	2009/08/02 01:21:37
@@ -249,14 +249,36 @@ static sctp_xmit_t sctp_packet_bundle_sa
 		struct sctp_association *asoc;
 		struct timer_list *timer;
 		__u16 chunk_len;
+		__u8 frag_bits, last_frag;
 
 		asoc = pkt->transport->asoc;
 		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
 		chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length)) +
 			    sizeof(struct sctp_sack_chunk);
+		frag_bits = chunk->chunk_hdr->flags & SCTP_DATA_FRAG_MASK;
+		last_frag = (frag_bits = SCTP_DATA_NOT_FRAG ||
+			     frag_bits = SCTP_DATA_LAST_FRAG);
 
-		/* If the SACK timer is running, we have a pending SACK */
-		if (timer_pending(timer) && can_append_chunk(pkt, chunk_len)) {
+		/*
+		 * If the SACK timer is running, we have a pending SACK
+		 *
+		 * If we have room, bundle it with the current DATA chunk.
+		 * If we do not have room but this is either the only
+		 * fragment of a message or the last fragment of a message,
+		 * append the SACK chunk anyway.  It'll be sent in a separate
+		 * packet just before the DATA chunk.  The reason for sending
+		 * the SACK immediately is to avoid odd anomalies in behaviour
+		 * when sending messages of a size just less than or equal
+		 * to a multiple of the PMTU.  Small messages get a SACK
+		 * bundled with them, since there is room.  Messages greater
+		 * than the PMTU also get a SACK bundled into the last DATA
+		 * chunk if there is room.  Messages whose last or only chunks
+		 * are just less than the PMTU can not bundle a SACK with
+		 * a DATA chunk, so we send one in a separate packet just
+		 * before the last DATA chunk in the message.
+		 */
+		if (timer_pending(timer) &&
+		    (can_append_chunk(pkt, chunk_len) || last_frag)) {
 			struct sctp_chunk *sack;
 			asoc->a_rwnd = asoc->rwnd;
 			sack = sctp_make_sack(asoc);


--Doug

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (5 preceding siblings ...)
  2009-08-02  3:03 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
@ 2009-08-03 17:19 ` Vlad Yasevich
  2009-08-04  2:45 ` Doug Graham
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Vlad Yasevich @ 2009-08-03 17:19 UTC (permalink / raw)
  To: linux-sctp

Hi Doug

Doug Graham wrote:
> On Fri, Jul 31, 2009 at 01:04:44PM -0400, Vlad Yasevich wrote:
>> Wei Yongjun wrote:
>>> Vlad Yasevich wrote:
>>>> I like this much better, but the one thing I don't like is that we end
>>>> up delaying the SACK if it doesn't fit in the chunk.
>>>>   
>>> Why we need to send the SACK immediate? Just to make the one send one
>>> recv mode happy?
>>> In this case, it is better to disable the delay ack, is this correct?
>> Consistent behavior.  A change in message size shouldn't cause a change in
>> on the wire behavior.
>>
>> Either we always send a SACK if we have DATA to send, or we don't.  Having a
>> weird "only if it fits" behavior will have unpredictable results.
> 
> My attempt at solving this is below.  Messages whose last or only
> fragment are just less than or equal to the PMTU (meaning that no SACK
> can be bundled in any fragment of the message) have a bare SACK sent just
> before the last fragment of the message.  For all other message sizes,
> normal piggybacking takes care of sending a bundled SACK.  I've tested
> this by ping-ponging messages of different sizes between two applications
> as fast as possible, and have not come across any sizes where performance
> drops precipitously.
> 
>>>> May be we can add some checking to see if there are more chunks that we'll
>>>> be sending and try to bundle it later.
>>>>
>>>> Another question is whether we should really be sending an immediate SACK
>>>> back after receiving just one DATA?
>>>>
>>>> I still think that this should really be handled by SACK immediately extension.
>>>> The extension can apply when we are single sub-MTU packets, essentially a
>>>> request-response scenario.  There is no reason that Immediate SACKs can't be
>>>> bundled in this manner.
> 
> The main problems I can think of with depending on this extension are:
> 
>  1) If it's left up to the user to decide when to use SCTP_SACK_IMMEDIATELY flag,
>     rather than taking care to use it only under the conditions when it would
>     actually help, they'd be likely to just use it all the time, thus defeating
>     the purpose of delayed SACKs altogether.  I suspect that this is what is
>     happening now with SCTP_NODELAY.
> 
>  2) Probably the only way a user could determine when SCTP_SACK_IMMEDIATELY
>     would be useful would be to first get a PhD in networking and SCTP,
>     then spend some time sniffing their application's on-the-wire
>     behaviour.    
> 
>  3) Even if users are careful about when to use SCTP_SACK_IMMEDIATELY, there will
>     still be cases where the SACK could have been piggybacked on returning DATA,
>     but wasn't because this flag requested an immediate SACK.  This
>     applies even if the entity deciding whether or not to set the I bit
>     is the sending SCTP (as opposed to a user).

SACK_IMMEDIATELY is not only done when the user marks it so.  The sctp
implementation may also mark chunks when it thing that it can benefit from
an immediate sack.  We can talk to Michael about when it can be done as well
as any potential bundling recommendations in a separate topic.


> 
> I think it is better if the receiving SCTP is able to determine without
> any hints from the sender when to piggyback, when to send a bare immediate
> SACK, and when to delay the SACK.  This may not always be possible,
> so there may be cases where there is no other way to do what the I bit
> does, but I don't think it's a good idea to use it to solve problems
> that can be solved in other ways.  All the same issues with Nagle and
> delayed-ACKs and whatever else exist in TCP as well, but to my knowledge,
> nobody has ever seen the need to implement an extension like this for TCP.
> 
> 
> Here's my patch to make sure a SACK gets sent back with a message when
> the last or only fragment of the message is just less than the PMTU.

The only thing that I find slightly odd about this behavior is that the bundling
occurs at the end of message and so I asked myself why aren't we accounting
for SACK chunk at segmentation time?

We could simply do something like this (completely not tested):

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 1748ef9..3bca0a2 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -193,6 +193,15 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct
sctp_association *asoc,
                                            hmac_desc->hmac_len);
        }

+       /* Check to see if we have a pending SACK and try to let it be bundled
+        * with this message.  Do this if we don't have any data to send.  To
+        * check that, look at out_qlen and retransmit list.
+        */
+       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
+           asoc->outqueue.out_qlen = 0 &&
+           list_empty(&asoc->outqueue.retransmit))
+               max -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
+
        whole = 0;
        first_len = max;


-vlad
> 
> Signed-off-by: Doug Graham <dgraham@nortel.com>
> 
> ---
> 
> --- linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:45	1.4
> +++ linux-2.6.29/net/sctp/output.c	2009/08/02 01:21:37
> @@ -249,14 +249,36 @@ static sctp_xmit_t sctp_packet_bundle_sa
>  		struct sctp_association *asoc;
>  		struct timer_list *timer;
>  		__u16 chunk_len;
> +		__u8 frag_bits, last_frag;
>  
>  		asoc = pkt->transport->asoc;
>  		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>  		chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length)) +
>  			    sizeof(struct sctp_sack_chunk);
> +		frag_bits = chunk->chunk_hdr->flags & SCTP_DATA_FRAG_MASK;
> +		last_frag = (frag_bits = SCTP_DATA_NOT_FRAG ||
> +			     frag_bits = SCTP_DATA_LAST_FRAG);
>  
> -		/* If the SACK timer is running, we have a pending SACK */
> -		if (timer_pending(timer) && can_append_chunk(pkt, chunk_len)) {
> +		/*
> +		 * If the SACK timer is running, we have a pending SACK
> +		 *
> +		 * If we have room, bundle it with the current DATA chunk.
> +		 * If we do not have room but this is either the only
> +		 * fragment of a message or the last fragment of a message,
> +		 * append the SACK chunk anyway.  It'll be sent in a separate
> +		 * packet just before the DATA chunk.  The reason for sending
> +		 * the SACK immediately is to avoid odd anomalies in behaviour
> +		 * when sending messages of a size just less than or equal
> +		 * to a multiple of the PMTU.  Small messages get a SACK
> +		 * bundled with them, since there is room.  Messages greater
> +		 * than the PMTU also get a SACK bundled into the last DATA
> +		 * chunk if there is room.  Messages whose last or only chunks
> +		 * are just less than the PMTU can not bundle a SACK with
> +		 * a DATA chunk, so we send one in a separate packet just
> +		 * before the last DATA chunk in the message.
> +		 */
> +		if (timer_pending(timer) &&
> +		    (can_append_chunk(pkt, chunk_len) || last_frag)) {
>  			struct sctp_chunk *sack;
>  			asoc->a_rwnd = asoc->rwnd;
>  			sack = sctp_make_sack(asoc);
> 
> 
> --Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (6 preceding siblings ...)
  2009-08-03 17:19 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
@ 2009-08-04  2:45 ` Doug Graham
  2009-08-04  3:08 ` Wei Yongjun
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-08-04  2:45 UTC (permalink / raw)
  To: linux-sctp

Hi Vlad,

> The only thing that I find slightly odd about this behavior is that the bundling
> occurs at the end of message and so I asked myself why aren't we accounting
> for SACK chunk at segmentation time?
>
> We could simply do something like this (completely not tested):
>
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 1748ef9..3bca0a2 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -193,6 +193,15 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct
> sctp_association *asoc,
>                                             hmac_desc->hmac_len);
>         }
>
> +       /* Check to see if we have a pending SACK and try to let it be bundled
> +        * with this message.  Do this if we don't have any data to send.  To
> +        * check that, look at out_qlen and retransmit list.
> +        */
> +       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
> +           asoc->outqueue.out_qlen = 0 &&
> +           list_empty(&asoc->outqueue.retransmit))
> +               max -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
> +
>         whole = 0;
>         first_len = max;
>
>   
Doesn't this reduce the  size of all chunks in a message?  You've 
descremented "max", which I think
is the maximum size of all chunks in the message.  I could see maybe 
doing this to leave space
in only the first chunk (ie: just decrement first_len by 
sizeof(sctp_sack_chunk_t), but even that
doesn't buy much I think.  It wouldn't reduce the number of packets 
sent, it would only mean
that the SACK is bundled with the first fragment, and then an extra DATA 
fragment would need
to be created to send the last sizeof(sctp_sack_chunk_t) bytes of 
DATA.   Example:

Assume that the PMTU is 1000, that the user is sending a message of 1990 
bytes,  and
that sizeof(sctp_sack_chunk_t) is 16.  With my scheme, we'd send this:

  DATA (1000 bytes)
  SACK
  DATA (990 bytes)

with your scheme exactly as you propose it, we'd send

  SACK + DATA (1000 - 16 = 984 bytes)
  DATA  (984 bytes)
  DATA  (22 bytes)

If the scheme is modified to only reduce the size of the first fragment, 
we'd send:

  SACK + DATA (984 bytes)
  DATA  (1000 bytes)
  DATA  (6 bytes)

The main point being that if a SACK won't fit in the last chunk of a 
message, then your scheme
would just push all the data down so that a new DATA chunk has to be 
sent to send the last
few bytes of data,

I don't think there's any real advantage in sending the SACK in the 
first packet rather than the
last one, but I agree that  it seems to make more sense.  It does seem 
to complicate the
code a bit though.  At the moment, all the code related to piggybacking 
SACKs is in
one place, whereas with this patch it'd be spread out to seemingly 
unrelated code.

--Doug


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (7 preceding siblings ...)
  2009-08-04  2:45 ` Doug Graham
@ 2009-08-04  3:08 ` Wei Yongjun
  2009-08-04 14:16 ` Vlad Yasevich
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Wei Yongjun @ 2009-08-04  3:08 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> Hi Vlad,
>
>> The only thing that I find slightly odd about this behavior is that
>> the bundling
>> occurs at the end of message and so I asked myself why aren't we
>> accounting
>> for SACK chunk at segmentation time?
>>
>> We could simply do something like this (completely not tested):
>>
>> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
>> index 1748ef9..3bca0a2 100644
>> --- a/net/sctp/chunk.c
>> +++ b/net/sctp/chunk.c
>> @@ -193,6 +193,15 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct
>> sctp_association *asoc,
>>                                             hmac_desc->hmac_len);
>>         }
>>
>> +       /* Check to see if we have a pending SACK and try to let it
>> be bundled
>> +        * with this message.  Do this if we don't have any data to
>> send.  To
>> +        * check that, look at out_qlen and retransmit list.
>> +        */
>> +       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
>> +           asoc->outqueue.out_qlen = 0 &&
>> +           list_empty(&asoc->outqueue.retransmit))
>> +               max -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
>> +
>>         whole = 0;
>>         first_len = max;
>>
>>   
> Doesn't this reduce the  size of all chunks in a message?  You've
> descremented "max", which I think
> is the maximum size of all chunks in the message.  I could see maybe
> doing this to leave space
> in only the first chunk (ie: just decrement first_len by
> sizeof(sctp_sack_chunk_t), but even that
> doesn't buy much I think.  It wouldn't reduce the number of packets
> sent, it would only mean
> that the SACK is bundled with the first fragment, and then an extra
> DATA fragment would need
> to be created to send the last sizeof(sctp_sack_chunk_t) bytes of
> DATA.   Example:
>
> Assume that the PMTU is 1000, that the user is sending a message of
> 1990 bytes,  and
> that sizeof(sctp_sack_chunk_t) is 16.  With my scheme, we'd send this:
>
>  DATA (1000 bytes)
>  SACK
>  DATA (990 bytes)
>
> with your scheme exactly as you propose it, we'd send
>
>  SACK + DATA (1000 - 16 = 984 bytes)
>  DATA  (984 bytes)
>  DATA  (22 bytes)
>
> If the scheme is modified to only reduce the size of the first
> fragment, we'd send:
>
>  SACK + DATA (984 bytes)
>  DATA  (1000 bytes)
>  DATA  (6 bytes)
>
> The main point being that if a SACK won't fit in the last chunk of a
> message, then your scheme
> would just push all the data down so that a new DATA chunk has to be
> sent to send the last
> few bytes of data,
>
> I don't think there's any real advantage in sending the SACK in the
> first packet rather than the
> last one, but I agree that  it seems to make more sense.  It does seem
> to complicate the
> code a bit though.  At the moment, all the code related to
> piggybacking SACKs is in
> one place, whereas with this patch it'd be spread out to seemingly
> unrelated code.

I think Vlad's patch should be like this:

        whole = 0;
        first_len = max;
        ~~~~~~~~~~~~~~~~~~~~

+       /* Check to see if we have a pending SACK and try to let it be bundled
+        * with this message.  Do this if we don't have any data to send.  To
+        * check that, look at out_qlen and retransmit list.
+        */
+       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
+           asoc->outqueue.out_qlen = 0 &&
+           list_empty(&asoc->outqueue.retransmit))
+               first_len -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
                ~~~~~~~~~~~~~~
......




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (8 preceding siblings ...)
  2009-08-04  3:08 ` Wei Yongjun
@ 2009-08-04 14:16 ` Vlad Yasevich
  2009-08-04 16:54 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Vlad Yasevich @ 2009-08-04 14:16 UTC (permalink / raw)
  To: linux-sctp

Wei Yongjun wrote:
> Doug Graham wrote:
>> Hi Vlad,
>>
>>> The only thing that I find slightly odd about this behavior is that
>>> the bundling
>>> occurs at the end of message and so I asked myself why aren't we
>>> accounting
>>> for SACK chunk at segmentation time?
>>>
>>> We could simply do something like this (completely not tested):
>>>
>>> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
>>> index 1748ef9..3bca0a2 100644
>>> --- a/net/sctp/chunk.c
>>> +++ b/net/sctp/chunk.c
>>> @@ -193,6 +193,15 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct
>>> sctp_association *asoc,
>>>                                             hmac_desc->hmac_len);
>>>         }
>>>
>>> +       /* Check to see if we have a pending SACK and try to let it
>>> be bundled
>>> +        * with this message.  Do this if we don't have any data to
>>> send.  To
>>> +        * check that, look at out_qlen and retransmit list.
>>> +        */
>>> +       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
>>> +           asoc->outqueue.out_qlen = 0 &&
>>> +           list_empty(&asoc->outqueue.retransmit))
>>> +               max -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
>>> +
>>>         whole = 0;
>>>         first_len = max;
>>>
>>>   
>> Doesn't this reduce the  size of all chunks in a message?  You've
>> descremented "max", which I think
>> is the maximum size of all chunks in the message.  I could see maybe
>> doing this to leave space
>> in only the first chunk (ie: just decrement first_len by
>> sizeof(sctp_sack_chunk_t), but even that
>> doesn't buy much I think.  It wouldn't reduce the number of packets
>> sent, it would only mean
>> that the SACK is bundled with the first fragment, and then an extra
>> DATA fragment would need
>> to be created to send the last sizeof(sctp_sack_chunk_t) bytes of
>> DATA.   Example:
>>
>> Assume that the PMTU is 1000, that the user is sending a message of
>> 1990 bytes,  and
>> that sizeof(sctp_sack_chunk_t) is 16.  With my scheme, we'd send this:
>>
>>  DATA (1000 bytes)
>>  SACK
>>  DATA (990 bytes)
>>
>> with your scheme exactly as you propose it, we'd send
>>
>>  SACK + DATA (1000 - 16 = 984 bytes)
>>  DATA  (984 bytes)
>>  DATA  (22 bytes)
>>
>> If the scheme is modified to only reduce the size of the first
>> fragment, we'd send:
>>
>>  SACK + DATA (984 bytes)
>>  DATA  (1000 bytes)
>>  DATA  (6 bytes)
>>
>> The main point being that if a SACK won't fit in the last chunk of a
>> message, then your scheme
>> would just push all the data down so that a new DATA chunk has to be
>> sent to send the last
>> few bytes of data,

One thing to mention is that a generally accepted practice is send control
chunks first, be it bundled or separate.  This why BSD ends up sending SACK
first when it can't bundle it with outgoing data.  Additionally, the way
the code is now, we might end up sending a SACK even when sending DATA is
prohibited via congestion or receive window.   Should we still be doing it
even if we do not send DATA?


>>
>> I don't think there's any real advantage in sending the SACK in the
>> first packet rather than the
>> last one, but I agree that  it seems to make more sense.  It does seem
>> to complicate the
>> code a bit though.  At the moment, all the code related to
>> piggybacking SACKs is in
>> one place, whereas with this patch it'd be spread out to seemingly
>> unrelated code.

Well, this is something that we already have wrt AUTH and COOKIE-ECHO chunks.
It has to do with where we end up segmenting.  If we want the ability to bundle,
we either need to size the data appropriately (this patch) or try to find the
chunk that we can bundle with (your patch).

My personal opinion is that sizing things properly from the beginning is the
right approach.  Note that this will only happen when we have no data queued up
so will not effect applications that try to do bulk transfers, where as you
patch might impact those applications unnecessarily, since they behave much
better with a delayed ACK behavior.

-vlad
> 
> I think Vlad's patch should be like this:
> 
>         whole = 0;
>         first_len = max;
>         ~~~~~~~~~~~~~~~~~~~~
> 
> +       /* Check to see if we have a pending SACK and try to let it be bundled
> +        * with this message.  Do this if we don't have any data to send.  To
> +        * check that, look at out_qlen and retransmit list.
> +        */
> +       if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
> +           asoc->outqueue.out_qlen = 0 &&
> +           list_empty(&asoc->outqueue.retransmit))
> +               first_len -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
>                 ~~~~~~~~~~~~~~
> ......
> 

Yes.  It should really only change the first segment size.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (9 preceding siblings ...)
  2009-08-04 14:16 ` Vlad Yasevich
@ 2009-08-04 16:54 ` Doug Graham
  2009-08-04 17:08 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
  2009-08-04 17:33 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-08-04 16:54 UTC (permalink / raw)
  To: linux-sctp

On Tue, Aug 04, 2009 at 10:16:32AM -0400, Vlad Yasevich wrote:
> >> The main point being that if a SACK won't fit in the last chunk of a
> >> message, then your scheme
> >> would just push all the data down so that a new DATA chunk has to be
> >> sent to send the last
> >> few bytes of data,
> 
> One thing to mention is that a generally accepted practice is send control
> chunks first, be it bundled or separate.  This why BSD ends up sending SACK
> first when it can't bundle it with outgoing data.  Additionally, the way
> the code is now, we might end up sending a SACK even when sending DATA is
> prohibited via congestion or receive window.   Should we still be doing it
> even if we do not send DATA?

I don't think BSD is doing this deliberately.  BSD is behaving exactly how
lksctp would have behaved after my very first patch and before Wei noticed
the bit in the RFC about only appending the SACK if it doesn't cause the MTU
to be exceeded.  So I think that when the SACK is too big to be bundled in
the first fragment of a message, BSD is sending the SACK in separate packet
before the DATA more or less by accident.

I don't think the SACK needs to be sent immediately if sending DATA
is prohibited; the normal delay of 200ms should be fine in that case.
The case we're trying to opimize (I think) is when a client is sending
requests and receiving responses as fast as possible from a server.
So we just want to ensure that when a message is sent in either direction,
that it includes a SACK to the message that triggered the request/reply
it's just about to send.  That way, nobody ever has to stop and wait for
a SACK before sending their next request or response.  If the congestion
window prevents the sending of a request from C to S, there's no rush
to send a SACK to S because it hasn't received the request yet, and so
probably isn't going to need a send a reply anytime soon.

--Doug.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (10 preceding siblings ...)
  2009-08-04 16:54 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
@ 2009-08-04 17:08 ` Vlad Yasevich
  2009-08-04 17:33 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
  12 siblings, 0 replies; 14+ messages in thread
From: Vlad Yasevich @ 2009-08-04 17:08 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Tue, Aug 04, 2009 at 10:16:32AM -0400, Vlad Yasevich wrote:
>>>> The main point being that if a SACK won't fit in the last chunk of a
>>>> message, then your scheme
>>>> would just push all the data down so that a new DATA chunk has to be
>>>> sent to send the last
>>>> few bytes of data,
>> One thing to mention is that a generally accepted practice is send control
>> chunks first, be it bundled or separate.  This why BSD ends up sending SACK
>> first when it can't bundle it with outgoing data.  Additionally, the way
>> the code is now, we might end up sending a SACK even when sending DATA is
>> prohibited via congestion or receive window.   Should we still be doing it
>> even if we do not send DATA?
> 
> I don't think BSD is doing this deliberately.  BSD is behaving exactly how
> lksctp would have behaved after my very first patch and before Wei noticed
> the bit in the RFC about only appending the SACK if it doesn't cause the MTU
> to be exceeded.  So I think that when the SACK is too big to be bundled in
> the first fragment of a message, BSD is sending the SACK in separate packet
> before the DATA more or less by accident.

Actually, it is rather deliberate.  Control chunks are always sent first.  If
we have data to send and can bundle the data along with control chunks, we do so.

So, the stack attempts to bundle the SACK, but it can't due to size and sends
the SACK first, followed by data.  It would be the same with other control
chunks that may have but out the send queue as well.

> 
> I don't think the SACK needs to be sent immediately if sending DATA
> is prohibited; the normal delay of 200ms should be fine in that case.
> The case we're trying to opimize (I think) is when a client is sending
> requests and receiving responses as fast as possible from a server.
> So we just want to ensure that when a message is sent in either direction,
> that it includes a SACK to the message that triggered the request/reply
> it's just about to send.  That way, nobody ever has to stop and wait for
> a SACK before sending their next request or response.  If the congestion
> window prevents the sending of a request from C to S, there's no rush
> to send a SACK to S because it hasn't received the request yet, and so
> probably isn't going to need a send a reply anytime soon.
> 

Yep, that's what I was thinking as well.  As such, I don't think that logic
of bundling the SACK only if it fits (Wei's patch) is quite right.  What we
want to do is see if we'll, in fact, send the data.  If we do, then try to
send the SACK letting it bundle or be standalone.  To promote bundling, we'll
try to size our DATA correctly in the first place when segmenting.

I think that will provide the most correct on-the-wire behavior.

-vlad

> --Doug.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU
  2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
                   ` (11 preceding siblings ...)
  2009-08-04 17:08 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
@ 2009-08-04 17:33 ` Doug Graham
  12 siblings, 0 replies; 14+ messages in thread
From: Doug Graham @ 2009-08-04 17:33 UTC (permalink / raw)
  To: linux-sctp

On Tue, Aug 04, 2009 at 01:08:05PM -0400, Vlad Yasevich wrote:
> Doug Graham wrote:
> > On Tue, Aug 04, 2009 at 10:16:32AM -0400, Vlad Yasevich wrote:
> >>>> The main point being that if a SACK won't fit in the last chunk of a
> >>>> message, then your scheme
> >>>> would just push all the data down so that a new DATA chunk has to be
> >>>> sent to send the last
> >>>> few bytes of data,
> >> One thing to mention is that a generally accepted practice is send control
> >> chunks first, be it bundled or separate.  This why BSD ends up sending SACK
> >> first when it can't bundle it with outgoing data.  Additionally, the way
> >> the code is now, we might end up sending a SACK even when sending DATA is
> >> prohibited via congestion or receive window.   Should we still be doing it
> >> even if we do not send DATA?
> > 
> > I don't think BSD is doing this deliberately.  BSD is behaving exactly how
> > lksctp would have behaved after my very first patch and before Wei noticed
> > the bit in the RFC about only appending the SACK if it doesn't cause the MTU
> > to be exceeded.  So I think that when the SACK is too big to be bundled in
> > the first fragment of a message, BSD is sending the SACK in separate packet
> > before the DATA more or less by accident.
> 
> Actually, it is rather deliberate.  Control chunks are always sent first.  If
> we have data to send and can bundle the data along with control chunks, we do so.
> 
> So, the stack attempts to bundle the SACK, but it can't due to size and sends
> the SACK first, followed by data.  It would be the same with other control
> chunks that may have but out the send queue as well.

Did you get this by reading the BSD code?  The reason I thought it was
more or less by accident is that, as I mentioned, BSD's behaviour is
the same as what lksctp would have done before Wei's MTU patch.  ie:
if it is sending a fragmented packet, it just blindly notices that the
first fragment it's about to send after the SACK becomes pending has no
room for a SACK, so it sends the SACK in a separate packet.  I'd have
thought that if they'd put much thought into it, they'd have implemented
something closer to what we're trying to do, which is that if there's
room in *any* fragment of a message, that they'd try to bundle the SACK.
Doesn't matter too much whether they leave room in the first fragment
and bundle the SACK there, or wait till the last fragment and bundle
the SACK there if there's room; they key thing is that BSD does not take
advantage of piggybacking the SACK onto fragmented messages even in the
"easy" cases.


> > 
> > I don't think the SACK needs to be sent immediately if sending DATA
> > is prohibited; the normal delay of 200ms should be fine in that case.
> > The case we're trying to opimize (I think) is when a client is sending
> > requests and receiving responses as fast as possible from a server.
> > So we just want to ensure that when a message is sent in either direction,
> > that it includes a SACK to the message that triggered the request/reply
> > it's just about to send.  That way, nobody ever has to stop and wait for
> > a SACK before sending their next request or response.  If the congestion
> > window prevents the sending of a request from C to S, there's no rush
> > to send a SACK to S because it hasn't received the request yet, and so
> > probably isn't going to need a send a reply anytime soon.
> > 
> 
> Yep, that's what I was thinking as well.  As such, I don't think that logic
> of bundling the SACK only if it fits (Wei's patch) is quite right.  What we
> want to do is see if we'll, in fact, send the data.  If we do, then try to
> send the SACK letting it bundle or be standalone.  To promote bundling, we'll
> try to size our DATA correctly in the first place when segmenting.
> 
> I think that will provide the most correct on-the-wire behavior.

Sure, works for me.

You'd think maybe the RFC would have more to say about all this!

--Doug.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-08-04 17:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-31 10:19 [PATCHv2] sctp: Do not create SACK chunk if the final packet size Wei Yongjun
2009-07-31 14:01 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
2009-07-31 16:02 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
2009-07-31 16:49 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Wei Yongjun
2009-07-31 17:04 ` Vlad Yasevich
2009-07-31 17:09 ` Doug Graham
2009-08-02  3:03 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
2009-08-03 17:19 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
2009-08-04  2:45 ` Doug Graham
2009-08-04  3:08 ` Wei Yongjun
2009-08-04 14:16 ` Vlad Yasevich
2009-08-04 16:54 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham
2009-08-04 17:08 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet Vlad Yasevich
2009-08-04 17:33 ` [PATCHv2] sctp: Do not create SACK chunk if the final packet size exceed current MTU Doug Graham

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.