* [PATCH] Fix piggybacked ACKs
@ 2009-07-29 16:05 Doug Graham
  2009-07-30  6:48 ` Wei Yongjun
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-29 16:05 UTC (permalink / raw)
  To: linux-sctp

This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet.  The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
following paragraph from section 6.1 had not been implemented correctly:

   Before an endpoint transmits a DATA chunk, if any received DATA
   chunks have not been acknowledged (e.g., due to delayed ack), the
   sender should create a SACK and bundle it with the outbound DATA
   chunk, as long as the size of the final SCTP packet does not exceed
   the current MTU.  See Section 6.2.

When about to send a DATA chunk, the code now checks to see if the SACK
timer is running.  If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer.  For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning that
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request.  Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms).  This is wasteful of bandwidth, and can also have a
major negative impact on performance due to the interaction of delayed ACKs
with the Nagle algorithm.

Signed-off-by: Doug Graham <dgraham@nortel.com>

---

--- linux-2.6.29/net/sctp/output.c	2009/07/24 23:37:44	1.1
+++ linux-2.6.29/net/sctp/output.c	2009/07/26 03:55:36
@@ -237,18 +237,19 @@ static sctp_xmit_t sctp_packet_bundle_sa
 	if (sctp_chunk_is_data(chunk) && !pkt->has_sack &&
 	    !pkt->has_cookie_echo) {
 		struct sctp_association *asoc;
+		struct timer_list *timer;
 		asoc = pkt->transport->asoc;
+		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
 
-		if (asoc->a_rwnd > asoc->rwnd) {
+		/* If the SACK timer is running, we have a pending SACK */
+		if (timer_pending(timer)) {
 			struct sctp_chunk *sack;
 			asoc->a_rwnd = asoc->rwnd;
 			sack = sctp_make_sack(asoc);
 			if (sack) {
-				struct timer_list *timer;
 				retval = sctp_packet_append_chunk(pkt, sack);
 				asoc->peer.sack_needed = 0;
-				timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
-				if (timer_pending(timer) && del_timer(timer))
+				if (del_timer(timer))
 					sctp_association_put(asoc);
 			}
 		}


* Re: [PATCH] Fix piggybacked ACKs
From: Wei Yongjun @ 2009-07-30  6:48 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> This patch corrects the conditions under which a SACK will be piggybacked
> on a DATA packet.  The previous condition was incorrect due to a
> misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
> following paragraph from section 6.2 had not been implemented correctly:
>
>    Before an endpoint transmits a DATA chunk, if any received DATA
>    chunks have not been acknowledged (e.g., due to delayed ack), the
>    sender should create a SACK and bundle it with the outbound DATA
>    chunk, as long as the size of the final SCTP packet does not exceed
>    the current MTU.  See Section 6.2.
>   

The text above says that a SACK is created only when the size of the final
SCTP packet does not exceed the current MTU.  What will happen with the patch?
If the packet is too large to bundle with a SACK, the packet sequence will
look like this:

Endpoint A                         Endpoint B

DATA            ------------->
                <-------------     SACK
                <-------------     DATA (size=1452)
SACK            ------------->
DATA (size=1452)------------->
                <-------------     SACK
                <-------------     DATA (size=1452)

The behavior is the same as having no delayed-ACK support, so I think
you also need to check the packet size before appending the SACK.


> When about to send a DATA chunk, the code now checks to see if the SACK
> timer is running.  If it is, we know we have a SACK to send to the
> peer, so we append the SACK (assuming available space in the packet)
> and turn off the timer.  For a simple request-response scenario, this
> will result in the SACK being bundled with the response, meaning the
> the SACK is received quickly by the client, and also meaning that no
> separate SACK packet needs to be sent by the server to acknowledge the
> request.  Prior to this patch, a separate SACK packet would have been
> sent by the server SCTP only after its delayed-ACK timer had expired
> (usually 200ms).  This is wasteful of bandwidth, and can also have a
> major negative impact on performance due the interaction of delayed ACKs
> with the Nagle algorithm.
>
> Signed-off-by: Doug Graham <dgraham@nortel.com>
>
> ---
>
> --- linux-2.6.29/net/sctp/output.c	2009/07/24 23:37:44	1.1
> +++ linux-2.6.29/net/sctp/output.c	2009/07/26 03:55:36
> @@ -237,18 +237,19 @@ static sctp_xmit_t sctp_packet_bundle_sa
>  	if (sctp_chunk_is_data(chunk) && !pkt->has_sack &&
>  	    !pkt->has_cookie_echo) {
>  		struct sctp_association *asoc;
> +		struct timer_list *timer;
>  		asoc = pkt->transport->asoc;
> +		timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
>  
> -		if (asoc->a_rwnd > asoc->rwnd) {
> +		/* If the SACK timer is running, we have a pending SACK */
> +		if (timer_pending(timer)) {
>  			struct sctp_chunk *sack;
>  			asoc->a_rwnd = asoc->rwnd;
>  			sack = sctp_make_sack(asoc);
>  			if (sack) {
> -				struct timer_list *timer;
>  				retval = sctp_packet_append_chunk(pkt, sack);
>  				asoc->peer.sack_needed = 0;
> -				timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
> -				if (timer_pending(timer) && del_timer(timer))
> +				if (del_timer(timer))
>  					sctp_association_put(asoc);
>  			}
>  		}

* Re: [PATCH] Fix piggybacked ACKs
From: Wei Yongjun @ 2009-07-30  9:51 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
>> This patch corrects the conditions under which a SACK will be piggybacked
>> on a DATA packet.  The previous condition was incorrect due to a
>> misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
>> following paragraph from section 6.2 had not been implemented correctly:
>>
>>    Before an endpoint transmits a DATA chunk, if any received DATA
>>    chunks have not been acknowledged (e.g., due to delayed ack), the
>>    sender should create a SACK and bundle it with the outbound DATA
>>    chunk, as long as the size of the final SCTP packet does not exceed
>>    the current MTU.  See Section 6.2.
>>   
>>     
>
> The above text said that SACK is create when the size of the final SCTP
> packet
> does not exceed  the current MTU. With the patch, what will happend?
> If the packet is too large for bundle with SACK, the packet sequence will
> like this:
>
> Endpoint A                         Endpoint B
>
> DATA            ------------->
>                 <-------------     SACK
>                 <-------------     DATA (size=1452)
> SACK            ------------->
> DATA (size=1452)------------->
>                 <-------------     SACK
>                 <-------------     DATA (size=1452)
>
> The behavior is the same as no delayed ack support. So I think
> you also need to check the packet size before append the SACK.

The patch should look like this:

[PATCH] sctp: Do not create SACK chunk if the final packet size exceed current MTU

The sender should create a SACK only if the size of the final SCTP
packet does not exceed the current MTU. Based on RFC 4960:

  6.1.  Transmission of DATA Chunks

    Before an endpoint transmits a DATA chunk, if any received DATA
    chunks have not been acknowledged (e.g., due to delayed ack), the
    sender should create a SACK and bundle it with the outbound DATA
    chunk, as long as the size of the final SCTP packet does not exceed
    the current MTU.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 net/sctp/output.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 94c110d..1aa3c6e 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -274,16 +274,19 @@ sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *packet,
 	if (retval != SCTP_XMIT_OK)
 		goto finish;
 
-	/* Try to bundle SACK chunk */
-	retval = sctp_packet_bundle_sack(packet, chunk);
-	if (retval != SCTP_XMIT_OK)
-		goto finish;
-
 	psize = packet->size;
 	pmtu  = ((packet->transport->asoc) ?
 		 (packet->transport->asoc->pathmtu) :
 		 (packet->transport->pathmtu));
 
+	/* Try to bundle SACK chunk if not exceed the current MTU */
+	if (psize + chunk_len + sizeof(struct sctp_sack_chunk) <= pmtu) {
+		retval = sctp_packet_bundle_sack(packet, chunk);
+		if (retval != SCTP_XMIT_OK)
+			goto finish;
+		psize = packet->size;
+	}
+
 	too_big = (psize + chunk_len > pmtu);
 
 	/* Decide if we need to fragment or resubmit later. */
-- 
1.6.2.2







* Re: [PATCH] Fix piggybacked ACKs
From: Doug Graham @ 2009-07-30 16:49 UTC (permalink / raw)
  To: linux-sctp

On Thu, Jul 30, 2009 at 05:51:18PM +0800, Wei Yongjun wrote:
> The sender should create a SACK only if the size of the final SCTP
> packet does not exceed the current MTU. Base on RFC 4960:
> 
>   6.1.  Transmission of DATA Chunks
> 
>     Before an endpoint transmits a DATA chunk, if any received DATA
>     chunks have not been acknowledged (e.g., due to delayed ack), the
>     sender should create a SACK and bundle it with the outbound DATA
>     chunk, as long as the size of the final SCTP packet does not exceed
>     the current MTU.

[patch deleted]

I think you're right that there's a real problem here, and that a patch
similar to yours is needed, but this is not a new problem introduced
with my patch.  I only changed the conditions under which a SACK chunk
was bundled with a DATA chunk, but the same bundling would have been
happening before under different conditions.

I'd really need to study the code more than I have to be able to say
whether or not your fix is correct (and who cares what I think anyway
:-)), but I've just spent all of about 15 minutes looking at parts of the
code that I'd never looked at before, and I see something that off the
top of my head looks a bit scary.  That is that sctp_packet_bundle_sack()
calls sctp_packet_append_chunk(), which calls sctp_packet_bundle_sack().
Aside from the possibility of infinite recursion (presumably this must
be prevented somehow, because it doesn't seem to happen), the logic of
this seems strangely circular to me.  If bundle_sack() is going to call
append_chunk(), I'd have guessed that append_chunk() would be a lower
level routine that just appends the chunk you give it, not one that
itself tries to bundle other chunks in.

Anyway, you've got me curious now.  I'll have a go at better understanding
the code when I get some time.

--Doug.


* Re: [PATCH] Fix piggybacked ACKs
From: Vlad Yasevich @ 2009-07-30 17:05 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Thu, Jul 30, 2009 at 05:51:18PM +0800, Wei Yongjun wrote:
>> The sender should create a SACK only if the size of the final SCTP
>> packet does not exceed the current MTU. Base on RFC 4960:
>>
>>   6.1.  Transmission of DATA Chunks
>>
>>     Before an endpoint transmits a DATA chunk, if any received DATA
>>     chunks have not been acknowledged (e.g., due to delayed ack), the
>>     sender should create a SACK and bundle it with the outbound DATA
>>     chunk, as long as the size of the final SCTP packet does not exceed
>>     the current MTU.
> 
> [patch deleted]
> 
> I think you're right that there's a real problem here, and that a patch
> similar to yours is needed, but this is not a new problem introduced
> with my patch.  I only changed the conditions under which a SACK chunk
> was bundled with a DATA chunk, but the same bundling would have been
> happening before under different conditions.
> 
> I'd really need to study the code more than I have to be able to say
> whether or not your fix is correct (and who cares what I think anyway
> :-)), but I've just spent all of about 15 minutes looking at parts of the
> code that I'd never looked at before, and I see something that off the
> top of my head looks a bit scary.  That is that sctp_packet_bundle_sack()
> calls sctp_packet_append_chunk(), which calls sctp_packet_bundle_sack().
> Aside from the possibility of infinite recursion (presumably this must
> be prevented somehow, because it doesn't seem to happen), the logic of
> this seems strangely circular to me.  If bundle_sack() is going to call
> append_chunk(), I'd have guessed that append_chunk() would be a lower
> level routine that just appends the chunk you give it, not one that
> itself tries to bundle other chunks in.

sctp_packet_append_chunk() tries to be very smart.  It's a generic routine that
you call, and it then also tries to figure out whether any special bundling can
be done.

The circular bundling doesn't happen because the chunk passed to
sctp_packet_bundle_sack() changes.  The first time in, the chunk is a DATA
chunk, which takes us down the path of adding the SACK.  At that point,
append_chunk() is called with a SACK chunk, and the seemingly recursive call to
bundle_sack() bails out since the chunk is not DATA.
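
The termination argument above can be sketched as a toy model.  This is not
the kernel code: struct toy_packet, toy_append_chunk() and toy_bundle_sack()
are hypothetical, simplified stand-ins that only track which chunk type is
being appended and how deeply the calls nest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are illustrative, not kernel symbols. */
enum chunk_type { CHUNK_DATA, CHUNK_SACK };

struct toy_packet {
	bool has_sack;      /* a SACK chunk is already in the packet */
	bool sack_pending;  /* we still owe the peer a SACK */
	size_t nchunks;     /* chunks appended so far */
	int depth;          /* current nesting of append calls */
	int max_depth;      /* deepest nesting seen */
};

static void toy_append_chunk(struct toy_packet *pkt, enum chunk_type type);

/* Only a DATA chunk can trigger bundling; a SACK chunk returns at once. */
static void toy_bundle_sack(struct toy_packet *pkt, enum chunk_type type)
{
	if (type != CHUNK_DATA || pkt->has_sack || !pkt->sack_pending)
		return;
	/* Re-enters toy_append_chunk(), but this time with a SACK chunk. */
	toy_append_chunk(pkt, CHUNK_SACK);
	pkt->sack_pending = false;
}

static void toy_append_chunk(struct toy_packet *pkt, enum chunk_type type)
{
	pkt->depth++;
	if (pkt->depth > pkt->max_depth)
		pkt->max_depth = pkt->depth;

	toy_bundle_sack(pkt, type);

	if (type == CHUNK_SACK)
		pkt->has_sack = true;
	pkt->nchunks++;
	pkt->depth--;
}
```

Appending one DATA chunk while a SACK is pending leaves two chunks in the
packet, and the nesting never goes deeper than two calls, because the inner
call always carries a SACK chunk.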

I think Wei's patch for MTU checking should probably move into bundle_sack().

-vlad

> 
> Anyway, you've got me curious now.  I'll have a go at better understanding
> the code when I get some time.
> 
> --Doug.
> 



* Re: [PATCH] Fix piggybacked ACKs
From: Vlad Yasevich @ 2009-07-30 21:24 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Thu, Jul 30, 2009 at 05:51:18PM +0800, Wei Yongjun wrote:
>> The sender should create a SACK only if the size of the final SCTP
>> packet does not exceed the current MTU. Base on RFC 4960:
>>
>>   6.1.  Transmission of DATA Chunks
>>
>>     Before an endpoint transmits a DATA chunk, if any received DATA
>>     chunks have not been acknowledged (e.g., due to delayed ack), the
>>     sender should create a SACK and bundle it with the outbound DATA
>>     chunk, as long as the size of the final SCTP packet does not exceed
>>     the current MTU.
> 
> [patch deleted]
> 
> I think you're right that there's a real problem here, and that a patch
> similar to yours is needed, but this is not a new problem introduced
> with my patch.  I only changed the conditions under which a SACK chunk
> was bundled with a DATA chunk, but the same bundling would have been
> happening before under different conditions.

Doug

If you still have the BSD setup, can you try increasing your message size
to, say, 1442 and see what happens?

I'd expect bundled SACKs at 1440 bytes, but then probably a separate SACK and DATA.

-vlad

> 
> I'd really need to study the code more than I have to be able to say
> whether or not your fix is correct (and who cares what I think anyway
> :-)), but I've just spent all of about 15 minutes looking at parts of the
> code that I'd never looked at before, and I see something that off the
> top of my head looks a bit scary.  That is that sctp_packet_bundle_sack()
> calls sctp_packet_append_chunk(), which calls sctp_packet_bundle_sack().
> Aside from the possibility of infinite recursion (presumably this must
> be prevented somehow, because it doesn't seem to happen), the logic of
> this seems strangely circular to me.  If bundle_sack() is going to call
> append_chunk(), I'd have guessed that append_chunk() would be a lower
> level routine that just appends the chunk you give it, not one that
> itself tries to bundle other chunks in.
> 
> Anyway, you've got me curious now.  I'll have a go at better understanding
> the code when I get some time.
> 
> --Doug.

* Re: [PATCH] Fix piggybacked ACKs
From: Doug Graham @ 2009-07-30 23:40 UTC (permalink / raw)
  To: linux-sctp

[-- Attachment #1: Type: text/plain, Size: 2470 bytes --]

On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
> Doug Graham wrote:
> > On Thu, Jul 30, 2009 at 05:51:18PM +0800, Wei Yongjun wrote:
> >> The sender should create a SACK only if the size of the final SCTP
> >> packet does not exceed the current MTU. Base on RFC 4960:
> >>
> >>   6.1.  Transmission of DATA Chunks
> >>
> >>     Before an endpoint transmits a DATA chunk, if any received DATA
> >>     chunks have not been acknowledged (e.g., due to delayed ack), the
> >>     sender should create a SACK and bundle it with the outbound DATA
> >>     chunk, as long as the size of the final SCTP packet does not exceed
> >>     the current MTU.
> > 
> > [patch deleted]
> > 
> > I think you're right that there's a real problem here, and that a patch
> > similar to yours is needed, but this is not a new problem introduced
> > with my patch.  I only changed the conditions under which a SACK chunk
> > was bundled with a DATA chunk, but the same bundling would have been
> > happening before under different conditions.
> 
> Doug
> 
> If you still have BSD setup, can you try increasing you message size
> to say 1442 and see what happens.
> 
> I'd expect bundles SACKs at 1440 bytes, but then probably a separate SACK and DATA.

The largest amount of data I can send and still have the BSD server bundle
a SACK with the response is 1436 bytes.  The total Ethernet frame size
at that point is 1514 bytes, so this seems correct.  I've attached
Wireshark captures with data sizes of 1436 bytes and 1438 bytes.
It's interesting to note that if BSD decides not to bundle a SACK,
it instead sends a separate SACK packet immediately; it does not wait
for the SACK timer to expire.  It first sends the SACK, then the DATA
immediately follows.  I don't think Wei's patch would do this; I think
that if his patch determined that bundling a SACK would cause the packet
to exceed the MTU, then the behaviour would revert to what it was before
my patch was applied, i.e. the SACK would not be sent for 200ms.
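
The 1436-byte limit and the 1514-byte frame size can be checked with a little
arithmetic.  The sketch below assumes a 1500-byte Ethernet MTU, an IPv4 header
without options, and a SACK chunk with no gap-ack blocks or duplicate TSNs; the
constant and function names are made up for illustration.

```c
#include <stddef.h>

/* Sizes in bytes.  Assumptions: IPv4 without options, minimal SACK chunk. */
enum {
	ETH_HDR_LEN    = 14,   /* Ethernet header, FCS not counted */
	IPV4_HDR_LEN   = 20,
	SCTP_HDR_LEN   = 12,   /* SCTP common header */
	SACK_CHUNK_LEN = 16,   /* SACK with no gap-ack blocks or dup TSNs */
	DATA_HDR_LEN   = 16,   /* DATA chunk header */
	ETH_MTU        = 1500
};

/* SCTP chunks are padded to a multiple of 4 bytes. */
static size_t pad4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Length of the IP packet carrying one DATA chunk, optionally with a SACK. */
static size_t ip_packet_len(size_t payload, int with_sack)
{
	return IPV4_HDR_LEN + SCTP_HDR_LEN +
	       (with_sack ? SACK_CHUNK_LEN : 0) +
	       pad4(DATA_HDR_LEN + payload);
}

/* Length of the Ethernet frame as a capture tool would report it. */
static size_t eth_frame_len(size_t payload, int with_sack)
{
	return ETH_HDR_LEN + ip_packet_len(payload, with_sack);
}

/* Nonzero if SACK + DATA still fit in a single MTU-sized packet. */
static int sack_fits(size_t payload)
{
	return ip_packet_len(payload, 1) <= ETH_MTU;
}
```

Under these assumptions, 1436 bytes of data plus a bundled SACK give exactly a
1500-byte IP packet (a 1514-byte frame), while 1438 bytes pad out to a
1504-byte packet, so the SACK no longer fits.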

So I guess the logic when about to send a DATA chunk should go something
like:

 if (sack_timer_running) {
   /* We know we owe the peer a SACK */
   if (SACK + DATA fits in the MTU)
      bundle SACK with DATA and send that
   else {
      send SACK in a separate packet
      send DATA in a separate packet
   }
   turn_off_sack_timer
 }

I don't think the RFC was explicit on what to do if the SACK+DATA
exceeds the MTU, but this makes sense to me.
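
As a rough illustration, the logic above could be written as a small decision
function.  This is a sketch, not a proposed patch: struct toy_assoc, enum
sack_action and sack_decision() are hypothetical names, and the caller is
assumed to act on the returned value.

```c
#include <stddef.h>

/* Hypothetical result type: what to do with the pending SACK. */
enum sack_action {
	SACK_NONE,      /* no SACK owed, send DATA as usual */
	SACK_BUNDLED,   /* SACK and DATA go out in one packet */
	SACK_SEPARATE   /* send SACK in its own packet, then DATA */
};

/* Hypothetical, stripped-down association state. */
struct toy_assoc {
	int sack_timer_running;  /* nonzero while a delayed SACK is owed */
	size_t pmtu;             /* current path MTU in bytes */
};

/*
 * used:     bytes already committed to the packet (headers included)
 * data_len: size of the DATA chunk about to be appended
 * sack_len: size of the SACK chunk that would be bundled
 */
static enum sack_action sack_decision(struct toy_assoc *asoc, size_t used,
				      size_t data_len, size_t sack_len)
{
	if (!asoc->sack_timer_running)
		return SACK_NONE;

	/* Either way the SACK goes out now, so the timer is turned off. */
	asoc->sack_timer_running = 0;

	if (used + sack_len + data_len <= asoc->pmtu)
		return SACK_BUNDLED;
	return SACK_SEPARATE;
}
```

With a 1500-byte MTU and 32 bytes of IP and SCTP headers already accounted
for, a 1452-byte DATA chunk and a 16-byte SACK would be bundled, while a
1456-byte DATA chunk would cause the SACK to be sent separately.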

--Doug.

[-- Attachment #2: bsd72_server_1436.cap --]
[-- Type: text/plain, Size: 13809 bytes --]

[binary packet capture data omitted]
\0\0\x0f
\0\0\vÐB06ïÿέ®Á…>\x03\0\0\x10äÉdð\0\0Öd\0\0\0\0K+rJ\x05>\b\06\0\0\06\0\0\0\0\x13 \x18¦ø\0\x15Å\x04ˆ.\b\0E\x02\0(\0\0@\0@„&7
\0\0\x0f
\0\0\vÐB06ïÿέûð°I\a\0\0\bäÉdðK+rJÏ>\b\0<\0\0\0<\0\0\0\0\x15Å\x04ˆ.\0\x13 \x18¦ø\b\0E\0\0$\x01Á@\0@„$|
\0\0\v
\0\0\x0f06ÐB\x184\x18Š3ç…G\b\0\0\x04\0\0\0\0\0\0\0\0\0\0K+rJó>\b\02\0\0\02\0\0\0\0\x13 \x18¦ø\0\x15Å\x04ˆ.\b\0E\x02\0$\0\0@\0@„&;
\0\0\x0f
\0\0\vÐB06ïÿέg¨\x0e\0\0\x04

[-- Attachment #3: bsd72_server_1438.cap --]
[-- Type: text/plain, Size: 14176 bytes --]


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (5 preceding siblings ...)
  2009-07-30 23:40 ` Doug Graham
@ 2009-07-31  0:53 ` Wei Yongjun
  2009-07-31  1:17 ` Doug Graham
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-07-31  0:53 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
>   
>> Doug Graham wrote:
>>     
>>> On Thu, Jul 30, 2009 at 05:51:18PM +0800, Wei Yongjun wrote:
>>>       
>>>> The sender should create a SACK only if the size of the final SCTP
>>>> packet does not exceed the current MTU. Based on RFC 4960:
>>>>
>>>>   6.1.  Transmission of DATA Chunks
>>>>
>>>>     Before an endpoint transmits a DATA chunk, if any received DATA
>>>>     chunks have not been acknowledged (e.g., due to delayed ack), the
>>>>     sender should create a SACK and bundle it with the outbound DATA
>>>>     chunk, as long as the size of the final SCTP packet does not exceed
>>>>     the current MTU.
>>>>         
>>> [patch deleted]
>>>
>>> I think you're right that there's a real problem here, and that a patch
>>> similar to yours is needed, but this is not a new problem introduced
>>> with my patch.  I only changed the conditions under which a SACK chunk
>>> was bundled with a DATA chunk, but the same bundling would have been
>>> happening before under different conditions.
>>>       
>> Doug
>>
>> If you still have the BSD setup, can you try increasing your message size
>> to, say, 1442 and see what happens?
>>
>> I'd expect bundled SACKs at 1440 bytes, but then probably a separate SACK and DATA.
>>     
>
> The largest amount of data I can send and still have the BSD server bundle
> a SACK with the response is 1436 bytes.  The total ethernet frame size
> at that point is 1514 bytes, so this seems correct.  I've attached
> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
> It's interesting to note that if BSD decides not to bundle a SACK,
> it instead sends a separate SACK packet immediately; it does not wait
> for the SACK timer to expire.  It first sends the SACK, then the DATA
> immediately follows. I don't think Wei's patch would do this; I think
> that if his patch determined that bundling a SACK would cause the packet
> to exceed the MTU, then the behaviour would revert to what it was before
> my patch was applied, i.e. the SACK would not be sent for 200ms.
>   

Before my patch, Linux sends the SACK the same way BSD does. But is BSD's
implementation really correct?

The RFC says:

the sender should create a SACK and bundle it with the outbound DATA
chunk, as long as the size of the final SCTP packet does not exceed
the current MTU.

So we only need to create a SACK if the final packet size does not
exceed the MTU. Always sending a SACK may lower performance.

> So I guess the logic when about to send a DATA chunk should go something
> like:
>
>  if (sack_timer_running) {
>    /* We know we owe the peer a SACK */
>    if (SACK + DATA fits in the MTU)
>       bundle SACK with DATA and send that
>    else {
>       send SACK in a separate packet
>       send DATA in a separate packet
>    }
>    turn_off_sack_timer
>  }
>
> I don't think the RFC was explicit on what to do if the SACK+DATA
> exceeds the MTU, but this makes sense to me.
>
> --Doug.
>   
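
The quoted pseudocode can be sketched in C as a pure decision function.
This is only an illustration of Doug's proposal (the BSD-like behaviour),
not the kernel's code: the enum, the function name and its parameters are
all hypothetical, and the lengths are assumed to be on-the-wire chunk
lengths compared against the space left for chunks in one packet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the Linux SCTP API. */
enum sack_action {
    SACK_NONE,     /* no SACK is owed, send DATA as usual */
    SACK_BUNDLE,   /* put SACK and DATA in the same packet */
    SACK_SEPARATE  /* send SACK in its own packet, then DATA */
};

/*
 * sack_timer_running: true if a delayed SACK is pending.
 * sack_len, data_len: chunk lengths in bytes, headers included.
 * max_chunk_bytes:    room for chunks in one packet (assumed to be the
 *                     path MTU minus the IP and SCTP common headers).
 */
static enum sack_action choose_sack_action(bool sack_timer_running,
                                           size_t sack_len,
                                           size_t data_len,
                                           size_t max_chunk_bytes)
{
    if (!sack_timer_running)
        return SACK_NONE;
    if (sack_len + data_len <= max_chunk_bytes)
        return SACK_BUNDLE;
    return SACK_SEPARATE;
}
```

In both the SACK_BUNDLE and SACK_SEPARATE cases the caller would stop the
SACK timer afterwards, as in the pseudocode. Under the stricter reading of
the RFC text quoted above, the last branch would instead leave the timer
running and send only the DATA.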



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (6 preceding siblings ...)
  2009-07-31  0:53 ` Wei Yongjun
@ 2009-07-31  1:17 ` Doug Graham
  2009-07-31  1:43 ` Doug Graham
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-31  1:17 UTC (permalink / raw)
  To: linux-sctp

[-- Attachment #1: Type: text/plain, Size: 3039 bytes --]

On Thu, Jul 30, 2009 at 07:40:47PM -0400, Doug Graham wrote:
> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
> > If you still have the BSD setup, can you try increasing your message size
> > to, say, 1442 and see what happens?
> > 
> > I'd expect bundled SACKs at 1440 bytes, but then probably a separate SACK and DATA.
> 
> The largest amount of data I can send and still have the BSD server bundle
> a SACK with the response is 1436 bytes.  The total ethernet frame size
> at that point is 1514 bytes, so this seems correct.  I've attached
> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
> It's interesting to note that if BSD decides not to bundle a SACK,
> it instead sends a separate SACK packet immediately; it does not wait
> for the SACK timer to expire.  It first sends the SACK, then the DATA
> immediately follows. I don't think Wei's patch would do this; I think
> that if his patch determined that bundling a SACK would cause the packet
> to exceed the MTU, then the behaviour would revert to what it was before
> my patch was applied, i.e. the SACK would not be sent for 200ms.

I think it's about time that I sat down and carefully read the RFC all the
way through before trying to do much more analysis of what's happening on
the wire, but I did just notice something surprising while trying slightly
larger packets.  For one, I could've sworn that I saw an ethernet frame
of 1516 bytes at one point, but I didn't save the capture and don't
know whether it was Linux or BSD that sent the oversized frame, or just
my imagination.  But here's one that I did capture when sending and
receiving 1454 bytes of data.  1452 bytes is the most data that will fit
in a single 1514 byte ethernet frame, so 1454 bytes must be fragmented.
The capture is attached, but here's one iteration:

 13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
 14 2.203092    10.0.0.11   10.0.0.15   SACK 
 15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
 16 2.203427    10.0.0.11   10.0.0.15   SACK 
 17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
 18 2.403524    10.0.0.15   10.0.0.11   SACK 
 19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
 20 2.603285    10.0.0.15   10.0.0.11   SACK 

What bothers me about this is that Nagle seems to be introducing a delay
here.  The first DATA packets in both directions are MTU-sized packets,
yet both the Linux client and the BSD server wait 200ms until they get
the SACK to the first fragment before sending the second fragment.
The server can't send its reply until it gets both fragments, and the
client can't reassemble the reply until it gets both fragments, so from
the application's point of view, the reply doesn't arrive until 400ms
after the request is sent.  This could probably be fixed by disabling
Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
supposed to prevent multiple outstanding *small* packets.

If you tell me I'm full of crap, I promise I'll shut up until I read
the whole RFC :-)

--Doug.
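
To make the trace above concrete: packet 15 leaves 2.203153 - 2.002632, or
about 200 ms, after packet 13, and the last fragment of the reply (packet
19) arrives 2.403686 - 2.002632, or about 401 ms, after the request
started. The sketch below shows how a textbook Nagle test would hold back
the 2-byte trailing fragment while the 1452-byte fragment is still
unacknowledged. It is a simplification with hypothetical names, not the
code of either stack, and the `is_fragment` exemption is an assumed rule
added only to show one way the delay could be avoided.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified Nagle decision; all names are hypothetical.
 *
 * payload:         user bytes in the chunk about to be sent
 * max_payload:     most user bytes that fit in one packet
 * bytes_in_flight: sent but not yet acknowledged bytes
 * nodelay:         true if the application disabled Nagle
 * is_fragment:     true if the chunk is a piece of a larger message
 *                  (assumed exemption, not taken from any RFC text)
 */
static bool nagle_should_delay(size_t payload, size_t max_payload,
                               size_t bytes_in_flight, bool nodelay,
                               bool is_fragment)
{
    if (nodelay || is_fragment)
        return false;            /* Nagle disabled or exempted */
    if (bytes_in_flight == 0)
        return false;            /* nothing outstanding, send now */
    if (payload >= max_payload)
        return false;            /* full-sized packets are never held */
    return true;                 /* small packet behind unacked data */
}
```

With `is_fragment` false, the call for the 2-byte chunk (payload 2,
max_payload 1452, bytes_in_flight 1452) returns true, which corresponds to
the 200 ms wait seen between packets 13 and 15.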

[-- Attachment #2: bsd72_server_1454.cap --]
[-- Type: application/octet-stream, Size: 15330 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (7 preceding siblings ...)
  2009-07-31  1:17 ` Doug Graham
@ 2009-07-31  1:43 ` Doug Graham
  2009-07-31  4:21 ` Wei Yongjun
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-31  1:43 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 08:53:55AM +0800, Wei Yongjun wrote:
> Doug Graham wrote:
> > The largest amount of data I can send and still have the BSD server bundle
> > a SACK with the response is 1436 bytes.  The total ethernet frame size
> > at that point is 1514 bytes, so this seems correct.  I've attached
> > wireshark captures with data sizes of 1436 bytes and 1438 bytes.
> > It's interesting to note that if BSD decides not to bundle a SACK,
> > it instead sends a separate SACK packet immediately; it does not wait
> > for the SACK timer to expire.  It first sends the SACK, then the DATA
> > immediately follows. I don't think Wei's patch would do this; I think
> > that if his patch determined that bundling a SACK would cause the packet
> > to exceed the MTU, then the behaviour would revert to what it was before
> > my patch was applied, i.e. the SACK would not be sent for 200ms.
> >   
> 
> Before my patch, Linux sends the SACK the same way BSD does.

I had it in my head that without your patch, the combined DATA+SACK packet
would have been fragmented at the IP level, but that's very likely my
unfamiliarity with the code kicking in.

> But is BSD's
> implementation really correct?
> 
> The RFC says:
> 
> the sender should create a SACK and bundle it with the outbound DATA
> chunk, as long as the size of the final SCTP packet does not exceed
> the current MTU.
> 
> So we only need to create a SACK if the final packet size does not
> exceed the MTU. Always sending a SACK may lower performance.

I agree that this section of the RFC implies that if the SACK won't
fit, it simply shouldn't be sent at this point.  Which would make
BSD's behaviour incorrect.  But to my mind, it makes sense to send it,
although I'm not sure I could make a strong case for that.

But consider that in the case of a client and server sending equal-sized
messages to each other (to keep it simple), there will be a message
size at which the behaviour changes noticeably.  Small messages will
be SACK'd immediately.  Messages slightly smaller than the MTU will
not be SACK'd until the delayed ACK timer expires.  Messages slightly
larger than the MTU will again be SACK'd immediately because the second
fragment in the response will have space for a SACK (assuming that the
Nagle problem I mentioned in my last email really is a problem that
needs to be fixed).

Perhaps Michael could explain which is the correct behaviour.

--Doug.
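
The three size regimes described above follow from simple header
arithmetic, sketched here in C. The constants are assumptions about this
particular setup rather than anything negotiated on the wire: a 1500-byte
IP MTU, an IPv4 header without options (20 bytes), the 12-byte SCTP common
header, a 16-byte DATA chunk header, and a 16-byte SACK with no gap-ack
blocks or duplicate TSNs. The function and enum names are hypothetical.

```c
#include <stddef.h>

enum {
    IP_MTU          = 1500, /* assumed Ethernet path MTU */
    IPV4_HDR        = 20,   /* IPv4 header, no options */
    SCTP_COMMON_HDR = 12,   /* ports, verification tag, checksum */
    DATA_CHUNK_HDR  = 16,   /* DATA chunk header */
    SACK_CHUNK_LEN  = 16    /* SACK without gap-ack blocks or dup TSNs */
};

enum size_regime {
    FITS_WITH_SACK,     /* DATA and SACK share one packet */
    FITS_WITHOUT_SACK,  /* DATA fits alone, no room for a SACK */
    MUST_FRAGMENT       /* message needs more than one DATA chunk */
};

/* Chunks are padded to a multiple of 4 bytes. */
static size_t pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Bytes available for chunks in one packet: 1500 - 20 - 12 = 1468. */
static size_t chunk_budget(void)
{
    return IP_MTU - IPV4_HDR - SCTP_COMMON_HDR;
}

static enum size_regime classify(size_t user_bytes)
{
    size_t data_chunk = DATA_CHUNK_HDR + pad4(user_bytes);

    if (data_chunk + SACK_CHUNK_LEN <= chunk_budget())
        return FITS_WITH_SACK;
    if (data_chunk <= chunk_budget())
        return FITS_WITHOUT_SACK;
    return MUST_FRAGMENT;
}
```

Under these assumptions, 1436 bytes is the largest message that still
leaves room for a bundled SACK (16 + 1436 + 16 = 1468), 1452 bytes is the
largest that fits unfragmented, and adding the 14-byte Ethernet header to
1500 gives the 1514-byte frame seen in the captures.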

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (8 preceding siblings ...)
  2009-07-31  1:43 ` Doug Graham
@ 2009-07-31  4:21 ` Wei Yongjun
  2009-07-31  7:30 ` Michael Tüxen
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-07-31  4:21 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Thu, Jul 30, 2009 at 07:40:47PM -0400, Doug Graham wrote:
>   
>> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
>>     
>>> If you still have the BSD setup, can you try increasing your message size
>>> to, say, 1442 and see what happens?
>>>
>>> I'd expect bundled SACKs at 1440 bytes, but then probably a separate SACK and DATA.
>>>       
>> The largest amount of data I can send and still have the BSD server bundle
>> a SACK with the response is 1436 bytes.  The total ethernet frame size
>> at that point is 1514 bytes, so this seems correct.  I've attached
>> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
>> It's interesting to note that if BSD decides not to bundle a SACK,
>> it instead sends a separate SACK packet immediately; it does not wait
>> for the SACK timer to expire.  It first sends the SACK, then the DATA
>> immediately follows. I don't think Wei's patch would do this; I think
>> that if his patch determined that bundling a SACK would cause the packet
>> to exceed the MTU, then the behaviour would revert to what it was before
>> my patch was applied, i.e. the SACK would not be sent for 200ms.
>>     
>
> I think it's about time that I sat down and carefully read the RFC all the
> way through before trying to do much more analysis of what's happening on
> the wire, but I did just notice something surprising while trying slightly
> larger packets.  For one, I could've sworn that I saw an ethernet frame
> of 1516 bytes at one point, but I didn't save the capture and don't
> know whether it was Linux or BSD that sent the oversized frame, or just
> my imagination.  But here's one that I did capture when sending and
> receiving 1454 bytes of data.  1452 bytes is the most data that will fit
> in a single 1514 byte ethernet frame, so 1454 bytes must be fragmented.
> The capture is attached, but here's one iteration:
>
>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>
> What bothers me about this is that Nagle seems to be introducing a delay
> here.  The first DATA packets in both directions are MTU-sized packets,
> yet both the Linux client and the BSD server wait 200ms until they get
> the SACK to the first fragment before sending the second fragment.
> The server can't send its reply until it gets both fragments, and the
> client can't reassemble the reply until it gets both fragments, so from
> the application's point of view, the reply doesn't arrive until 400ms
> after the request is sent.  This could probably be fixed by disabling
> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
> supposed to prevent multiple outstanding *small* packets.
>   
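
The frame sizes quoted above can be checked with a little arithmetic.
The following is only a sketch: the Ethernet, IPv4 and SCTP header
sizes are the standard ones, but it assumes IPv4 without options, a
frame length as Wireshark reports it (no FCS), and a SACK with no
gap-ack blocks or duplicate TSNs.  The names are made up for this
example and are not taken from the kernel.

```c
#include <assert.h>

/* Header sizes in bytes (IPv4 without options is an assumption). */
enum {
	ETH_HDR        = 14,	/* Ethernet header, FCS not counted */
	IPV4_HDR       = 20,
	SCTP_COMMON    = 12,	/* SCTP common header */
	DATA_CHUNK_HDR = 16,	/* DATA chunk header */
	SACK_CHUNK_MIN = 16,	/* SACK with no gap-ack blocks or dup TSNs */
};

/*
 * Largest user payload that fits in one frame of frame_len bytes,
 * optionally leaving room for a bundled minimal SACK chunk.
 */
static int max_payload(int frame_len, int bundle_sack)
{
	int room = frame_len - ETH_HDR - IPV4_HDR - SCTP_COMMON - DATA_CHUNK_HDR;

	if (bundle_sack)
		room -= SACK_CHUNK_MIN;
	assert(room >= 0);
	return room;
}
```

Under those assumptions, max_payload(1514, 0) gives the 1452 bytes
mentioned above, and max_payload(1514, 1) gives the 1436 bytes at
which BSD still bundles a SACK.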

I think you hit a case in which Nagle's algorithm should not be used.

Can you try the following patch?

[PATCH] sctp: do not use Nagle algorithm while fragmented data is transmitted

If fragmented data is being sent, Nagle's algorithm should not be
used. In the special case where only one large message is sent, delaying
the remaining fragment makes the receiver wait for more fragments
to reassemble without sending a SACK, while the sender is still
waiting for that SACK before it sends the last fragment.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 net/sctp/output.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index b94c211..0d8d765 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -718,7 +718,9 @@ static sctp_xmit_t sctp_packet_append_data(struct sctp_packet *packet,
 	 * unacknowledged.
 	 */
 	if (!sp->nodelay && sctp_packet_empty(packet) &&
-	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
+	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED) &&
+	    (chunk->chunk_hdr->flags & SCTP_DATA_FRAG_MASK) ==
+							SCTP_DATA_NOT_FRAG) {
 		unsigned len = datasize + q->out_qlen;
 
 		/* Check whether this chunk and all the rest of pending
-- 
1.6.2.2
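
For readers unfamiliar with the fragment bits, here is a minimal
standalone sketch of the test the patch adds.  The bit values follow
the DATA chunk flags in RFC 4960 section 3.3.1 (B = 0x02, E = 0x01);
the macro and function names are local to this example, and the
reduced condition leaves out the sctp_packet_empty() and association
state checks that the real code also performs.

```c
#include <assert.h>
#include <stdint.h>

/* B (beginning) and E (ending) fragment bits of the DATA chunk flags. */
#define FRAG_E_BIT	0x01u
#define FRAG_B_BIT	0x02u
#define FRAG_MASK	(FRAG_B_BIT | FRAG_E_BIT)

/* Returns 1 when the chunk carries a whole user message (B and E set). */
static int chunk_is_unfragmented(uint8_t flags)
{
	return (flags & FRAG_MASK) == FRAG_MASK;
}

/*
 * Hypothetical reduction of the patched condition: Nagle is only
 * considered when it is enabled, data is already in flight, and the
 * chunk is not a fragment of a larger message.
 */
static int nagle_applies(int nodelay, unsigned int outstanding_bytes,
			 uint8_t flags)
{
	return !nodelay && outstanding_bytes != 0 &&
	       chunk_is_unfragmented(flags);
}
```

With this reduction, the 2-byte last fragment from the capture (E bit
only) is never held back, while a small unfragmented message sent
behind unacknowledged data still is.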




> If you tell me I'm full of crap, I promise I'll shut up until I read
> the whole RFC :-)
>
> --Doug.
>   



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (9 preceding siblings ...)
  2009-07-31  4:21 ` Wei Yongjun
@ 2009-07-31  7:30 ` Michael Tüxen
  2009-07-31  7:34 ` Michael Tüxen
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Michael Tüxen @ 2009-07-31  7:30 UTC (permalink / raw)
  To: linux-sctp

On Jul 31, 2009, at 3:17 AM, Doug Graham wrote:

> On Thu, Jul 30, 2009 at 07:40:47PM -0400, Doug Graham wrote:
>> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
>>> If you still have BSD setup, can you try increasing you message size
>>> to say 1442 and see what happens.
>>>
>>> I'd expect bundles SACKs at 1440 bytes, but then probably a  
>>> separate SACK and DATA.
>>
>> The largest amount of data I can send and still have the BSD server  
>> bundle
>> a SACK with the response is 1436 bytes.  The total ethernet frame  
>> size
>> at that point is 1514 bytes, so this seems correct.  I've attached
>> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
>> It's interesting to note that if BSD decides not to bundle a SACK,
>> it instead sends a separate SACK packet immediately; it does not wait
>> for the SACK timer to timeout.  It first sends the SACK, then the  
>> DATA
>> immediately follows. I don't think Wei's patch would do this; I think
>> that if his patch determined that bundling a SACK would cause the  
>> packet
>> to exceed the MTU, then the behaviour will revert to what it was  
>> before
>> my patch is applied: ie the SACK will not be sent for 200ms.
>
> I think it's about time that I sat down and carefully read the RFC  
> all the
> way through before trying to do much more analysis of what's  
> happening on
> the wire, but I did just notice something surprising while try  
> slightly
> larger packets.  For one, I could've sworn that I saw a ethernet frame
> of 1516 bytes at one point, but I didn't save the capture and don't
> know whether it was Linux or BSD that sent the oversized frame, or  
> just
> my imagination.  But here's one that I did capture when sending and
> receiving 1454 bytes of data.  1452 bytes is the most data that will  
> fit
> in a single 1514 byte ethernet frame, so 1454 bytes must be  
> fragmented.
> The capture is attached, but here's one iteration:
>
> 13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
> 14 2.203092    10.0.0.11   10.0.0.15   SACK
> 15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
> 16 2.203427    10.0.0.11   10.0.0.15   SACK
> 17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
> 18 2.403524    10.0.0.15   10.0.0.11   SACK
> 19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
> 20 2.603285    10.0.0.15   10.0.0.11   SACK
>
> What bothers me about this is that Nagle seems to be introducing a  
> delay
This is the common bad interaction between Nagle and delayed SACKs.
> here.  The first DATA packets in both directions are MTU-sized  
> packets,
> yet both the Linux client and the BSD server wait 200ms until they get
> the SACK to the first fragment before sending the second fragment.
> The server can't send its reply until it gets both fragments, and the
> client can't reassemble the reply until it gets both fragments, so  
> from
> the application's point of view, the reply doesn't arrive until 400ms
> after the request is sent.  This could probably be fixed by disabling
> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is  
> only
> supposed to prevent multiple outstanding *small* packets.
Yes, but Nagle operates at the level of chunks...
This problem is one of the reasons why we have
draft-tuexen-tsvwg-sctp-sack-immediately-02
The kernel can set the I-Bit on the first chunk...
Currently the only way around this is to disable Nagle altogether...
>
> If you tell me I'm full of crap, I promise I'll shut up until I read
> the whole RFC :-)
>
> --Doug.
> <bsd72_server_1454.cap>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (10 preceding siblings ...)
  2009-07-31  7:30 ` Michael Tüxen
@ 2009-07-31  7:34 ` Michael Tüxen
  2009-07-31 12:59 ` Doug Graham
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Michael Tüxen @ 2009-07-31  7:34 UTC (permalink / raw)
  To: linux-sctp

On Jul 31, 2009, at 3:43 AM, Doug Graham wrote:

> On Fri, Jul 31, 2009 at 08:53:55AM +0800, Wei Yongjun wrote:
>> Doug Graham wrote:
>>> The largest amount of data I can send and still have the BSD  
>>> server bundle
>>> a SACK with the response is 1436 bytes.  The total ethernet frame  
>>> size
>>> at that point is 1514 bytes, so this seems correct.  I've attached
>>> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
>>> It's interesting to note that if BSD decides not to bundle a SACK,
>>> it instead sends a separate SACK packet immediately; it does not  
>>> wait
>>> for the SACK timer to timeout.  It first sends the SACK, then the  
>>> DATA
>>> immediately follows. I don't think Wei's patch would do this; I  
>>> think
>>> that if his patch determined that bundling a SACK would cause the  
>>> packet
>>> to exceed the MTU, then the behaviour will revert to what it was  
>>> before
>>> my patch is applied: ie the SACK will not be sent for 200ms.
>>>
>>
>> Before my patch, SACK sent on linux is the same as BSD.
>
> I had it in my head that without your patch, the combined DATA+SACK  
> packet
> would have been fragmented at the IP level, but that's very likely my
> unfamiliarity with the code kicking in.
>
>> But... BSD's
>> implemention is really correct?
>>
>> RFC said:
>>
>> the sender should create a SACK and bundle it with the outbound DATA
>> chunk, as long as the size of the final SCTP packet does not exceed
>> the current MTU.
>>
>> So, we just need create a SACK only if the final packet size does not
>> exceed the MTU. Always send SACK may cause lower performance.
>
> I agree that this section of the RFC implies that if the SACK won't
> fit, it simply shouldn't be sent at this point.  Which would make
> BSD's behaviour incorrect.  But to my mind, it makes sense to send it,
> although I'm not sure I could make a strong case for that.
>
> But consider that in the case of a client and server sending equal- 
> sized
> messages to each other (to keep it simple), there will be a message
> size at which the behaviour changes noticably.  Small messages will
> be SACK'd immediately.  Messages slightly smaller than the MTU will
> not be SACK'd until the delayed ACK timer expires.  Messages slightly
> larger than the MTU will again be SACKED immediately because the  
> second
> fragment in the response will have space for a SACK (assuming that the
> Nagle problem I mentioned in my last email really is a problem that
> needs to be fixed).
>
> Perhaps Michael could explain which is the correct behaviour.
As said in my earlier mail:
I have seen this multiple times: for one packet size the app
runs fine, make the packet size larger (1 byte is enough) and
your throughput drops to 5 packets/requests per second (assuming
a 200ms delayed ack timer).

I agree that this is something the kernel should take care of
and I think draft-tuexen-tsvwg-sctp-sack-immediately-02 is
the way to go...
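
The 5 requests per second figure is just the inverse of the stall
time.  As a rough sketch (the function name and the idea of adding the
round-trip time are assumptions made for illustration, not anything
from the kernel):

```c
#include <assert.h>

/*
 * Upper bound on strictly sequential request/response exchanges per
 * second when each exchange stalls for stall_ms (for example one
 * delayed-ack timeout) on top of a round-trip time of rtt_ms.
 */
static double max_exchanges_per_sec(double stall_ms, double rtt_ms)
{
	assert(stall_ms + rtt_ms > 0.0);
	return 1000.0 / (stall_ms + rtt_ms);
}
```

With one 200ms stall per exchange and a negligible round-trip time
this gives 5 per second; with a 200ms stall in each direction, as in
the capture earlier in the thread, it gives 2.5.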
>
> --Doug.
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (11 preceding siblings ...)
  2009-07-31  7:34 ` Michael Tüxen
@ 2009-07-31 12:59 ` Doug Graham
  2009-07-31 13:11 ` Doug Graham
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-31 12:59 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
> Doug Graham wrote:
> > On Thu, Jul 30, 2009 at 07:40:47PM -0400, Doug Graham wrote:
> >   
> >> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
> >>     
> >>> If you still have BSD setup, can you try increasing you message size
> >>> to say 1442 and see what happens.
> >>>
> >>> I'd expect bundles SACKs at 1440 bytes, but then probably a separate SACK and DATA.
> >>>       
> >> The largest amount of data I can send and still have the BSD server bundle
> >> a SACK with the response is 1436 bytes.  The total ethernet frame size
> >> at that point is 1514 bytes, so this seems correct.  I've attached
> >> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
> >> It's interesting to note that if BSD decides not to bundle a SACK,
> >> it instead sends a separate SACK packet immediately; it does not wait
> >> for the SACK timer to timeout.  It first sends the SACK, then the DATA
> >> immediately follows. I don't think Wei's patch would do this; I think
> >> that if his patch determined that bundling a SACK would cause the packet
> >> to exceed the MTU, then the behaviour will revert to what it was before
> >> my patch is applied: ie the SACK will not be sent for 200ms.
> >>     
> >
> > I think it's about time that I sat down and carefully read the RFC all the
> > way through before trying to do much more analysis of what's happening on
> > the wire, but I did just notice something surprising while try slightly
> > larger packets.  For one, I could've sworn that I saw a ethernet frame
> > of 1516 bytes at one point, but I didn't save the capture and don't
> > know whether it was Linux or BSD that sent the oversized frame, or just
> > my imagination.  But here's one that I did capture when sending and
> > receiving 1454 bytes of data.  1452 bytes is the most data that will fit
> > in a single 1514 byte ethernet frame, so 1454 bytes must be fragmented.
> > The capture is attached, but here's one iteration:
> >
> >  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
> >  14 2.203092    10.0.0.11   10.0.0.15   SACK 
> >  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
> >  16 2.203427    10.0.0.11   10.0.0.15   SACK 
> >  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
> >  18 2.403524    10.0.0.15   10.0.0.11   SACK 
> >  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
> >  20 2.603285    10.0.0.15   10.0.0.11   SACK 
> >
> > What bothers me about this is that Nagle seems to be introducing a delay
> > here.  The first DATA packets in both directions are MTU-sized packets,
> > yet both the Linux client and the BSD server wait 200ms until they get
> > the SACK to the first fragment before sending the second fragment.
> > The server can't send its reply until it gets both fragments, and the
> > client can't reassemble the reply until it gets both fragments, so from
> > the application's point of view, the reply doesn't arrive until 400ms
> > after the request is sent.  This could probably be fixed by disabling
> > Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
> > supposed to prevent multiple outstanding *small* packets.
> >   
> 
> I think you hit the point which Nagle's algorithm should be not used.
> 
> Can you try the following patch?
> 
> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
> 
> If fragmented data is sent, the Nagle's algorithm should not be
> used. In special case, if only one large packet is sent, the delay
> send of fragmented data will cause the receiver wait for more
> fragmented data to reassembe them and not send SACK, but the sender
> still wait for SACK before send the last fragment.
> 
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
> ---
>  net/sctp/output.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index b94c211..0d8d765 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -718,7 +718,9 @@ static sctp_xmit_t sctp_packet_append_data(struct sctp_packet *packet,
>  	 * unacknowledged.
>  	 */
>  	if (!sp->nodelay && sctp_packet_empty(packet) &&
> -	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
> +	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED) &&
> +	    (chunk->chunk_hdr->flags & SCTP_DATA_FRAG_MASK) ==
> +							SCTP_DATA_NOT_FRAG) {
>  		unsigned len = datasize + q->out_qlen;
>  
>  		/* Check whether this chunk and all the rest of pending
> -- 
> 1.6.2.2

This patch seems to do the job.  I applied it in a UML instance and ran
my server in that.  The client is still unpatched.  I see this:

 13 2.002638    10.0.0.15    10.0.0.249    DATA (1452 bytes data)
 14 2.204041    10.0.0.249   10.0.0.15     SACK 
 15 2.204090    10.0.0.15    10.0.0.249    DATA (2 bytes data)
 16 2.204428    10.0.0.249   10.0.0.15     SACK 
 17 2.204822    10.0.0.249   10.0.0.15     DATA (1452 bytes data)
 18 2.204856    10.0.0.249   10.0.0.15     DATA (2 bytes data)
 19 2.204890    10.0.0.15    10.0.0.249    SACK 

So 10.0.0.249 (the patched UML server) did send back-to-back data
packets without waiting for the SACK.

I have not applied your MTU patch yet, so the server also sent a
separate SACK immediately.  This is less than ideal, since it could have
piggybacked the SACK on the second DATA fragment (frame 18), which has
lots of room.  I think your MTU patch might accomplish that.

--Doug

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (12 preceding siblings ...)
  2009-07-31 12:59 ` Doug Graham
@ 2009-07-31 13:11 ` Doug Graham
  2009-07-31 13:39 ` Doug Graham
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-31 13:11 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 09:30:48AM +0200, Michael Tüxen wrote:
> On Jul 31, 2009, at 3:17 AM, Doug Graham wrote:
> >13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
> >14 2.203092    10.0.0.11   10.0.0.15   SACK
> >15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
> >16 2.203427    10.0.0.11   10.0.0.15   SACK
> >17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
> >18 2.403524    10.0.0.15   10.0.0.11   SACK
> >19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
> >20 2.603285    10.0.0.15   10.0.0.11   SACK
> >
> >What bothers me about this is that Nagle seems to be introducing a  
> >delay here.
>
> This is the common bad interaction between Nagle and delayed SACKs.
>
> >The first DATA packets in both directions are MTU-sized packets
> >yet both the Linux client and the BSD server wait 200ms until they get
> >the SACK to the first fragment before sending the second fragment.
> >The server can't send its reply until it gets both fragments, and the
> >client can't reassemble the reply until it gets both fragments, so  
> >from
> >the application's point of view, the reply doesn't arrive until 400ms
> >after the request is sent.  This could probably be fixed by disabling
> >Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is  
> >only supposed to prevent multiple outstanding *small* packets.

> Yes, but Nagle operates at the level of chunks...
> This problem is one of the reasons why we have
> draft-tuexen-tsvwg-sctp-sack-immediately-02
> The kernel can set the I-Bit on the first chunk...
> Currently the only way around this is to disable Nagle at all...

I don't understand how Nagle operating at the level of chunks makes a
difference here.  The first DATA chunk contained 1452 bytes of data,
which results in an MTU-sized frame.  How is this any different from TCP
sending 1460 bytes of data in one segment followed immediately by
another 2 bytes in a second segment?  ie: user tries to send 1462 bytes,
which TCP has to segment.  It first sends an MTU-sized packet followed
immediately by another packet containing whatever is left of the data,
all without Nagle getting in the way.

--Doug

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (13 preceding siblings ...)
  2009-07-31 13:11 ` Doug Graham
@ 2009-07-31 13:39 ` Doug Graham
  2009-07-31 14:18 ` Vlad Yasevich
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-07-31 13:39 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 08:59:29AM -0400, Doug Graham wrote:
> This patch seems to do the job.  I applied it in a UML instance and ran
> my server in that.  The client is still unpatched.  I see this:
> 
>  13 2.002638    10.0.0.15    10.0.0.249    DATA (1452 bytes data)
>  14 2.204041    10.0.0.249   10.0.0.15     SACK 
>  15 2.204090    10.0.0.15    10.0.0.249    DATA (2 bytes data)
>  16 2.204428    10.0.0.249   10.0.0.15     SACK 
>  17 2.204822    10.0.0.249   10.0.0.15     DATA (1452 bytes data)
>  18 2.204856    10.0.0.249   10.0.0.15     DATA (2 bytes data)
>  19 2.204890    10.0.0.15    10.0.0.249    SACK 
> 
> So 10.0.0.249 (the patched UML server) did send back-to-back data
> packets without waiting for the SACK,
> 
> I have not applied your MTU patch yet, so the server also sent a
> separate SACK immediately.  This is less than ideal, since it could have
> piggybacked the SACK on the second DATA fragment (frame 18), which has
> lots of room.  I think your MTU patch might accomplish that.

Here's what it looks like after I apply your V2 MTU patch:

 12 2.002750    10.0.0.15      10.0.0.249    DATA 
 13 2.204164    10.0.0.249     10.0.0.15     SACK 
 14 2.204204    10.0.0.15      10.0.0.249    DATA 
 15 2.204926    10.0.0.249     10.0.0.15     DATA 
 16 2.204950    10.0.0.249     10.0.0.15     SACK DATA 
 17 2.204974    10.0.0.15      10.0.0.249    SACK 

Starting to look pretty good!

--Doug

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (14 preceding siblings ...)
  2009-07-31 13:39 ` Doug Graham
@ 2009-07-31 14:18 ` Vlad Yasevich
  2009-08-02  2:03 ` Doug Graham
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Vlad Yasevich @ 2009-07-31 14:18 UTC (permalink / raw)
  To: linux-sctp

Michael Tüxen wrote:
> On Jul 31, 2009, at 3:17 AM, Doug Graham wrote:
> 
>> On Thu, Jul 30, 2009 at 07:40:47PM -0400, Doug Graham wrote:
>>> On Thu, Jul 30, 2009 at 05:24:09PM -0400, Vlad Yasevich wrote:
>>>> If you still have BSD setup, can you try increasing you message size
>>>> to say 1442 and see what happens.
>>>>
>>>> I'd expect bundles SACKs at 1440 bytes, but then probably a separate
>>>> SACK and DATA.
>>>
>>> The largest amount of data I can send and still have the BSD server
>>> bundle
>>> a SACK with the response is 1436 bytes.  The total ethernet frame size
>>> at that point is 1514 bytes, so this seems correct.  I've attached
>>> wireshark captures with data sizes of 1436 bytes and 1438 bytes.
>>> It's interesting to note that if BSD decides not to bundle a SACK,
>>> it instead sends a separate SACK packet immediately; it does not wait
>>> for the SACK timer to timeout.  It first sends the SACK, then the DATA
>>> immediately follows. I don't think Wei's patch would do this; I think
>>> that if his patch determined that bundling a SACK would cause the packet
>>> to exceed the MTU, then the behaviour will revert to what it was before
>>> my patch is applied: ie the SACK will not be sent for 200ms.
>>
>> I think it's about time that I sat down and carefully read the RFC all
>> the
>> way through before trying to do much more analysis of what's happening on
>> the wire, but I did just notice something surprising while try slightly
>> larger packets.  For one, I could've sworn that I saw a ethernet frame
>> of 1516 bytes at one point, but I didn't save the capture and don't
>> know whether it was Linux or BSD that sent the oversized frame, or just
>> my imagination.  But here's one that I did capture when sending and
>> receiving 1454 bytes of data.  1452 bytes is the most data that will fit
>> in a single 1514 byte ethernet frame, so 1454 bytes must be fragmented.
>> The capture is attached, but here's one iteration:
>>
>> 13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
>> 14 2.203092    10.0.0.11   10.0.0.15   SACK
>> 15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>> 16 2.203427    10.0.0.11   10.0.0.15   SACK
>> 17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>> 18 2.403524    10.0.0.15   10.0.0.11   SACK
>> 19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>> 20 2.603285    10.0.0.15   10.0.0.11   SACK
>>
>> What bothers me about this is that Nagle seems to be introducing a delay
> This is the common bad interaction between Nagle and delayed SACKs.
>> here.  The first DATA packets in both directions are MTU-sized packets,
>> yet both the Linux client and the BSD server wait 200ms until they get
>> the SACK to the first fragment before sending the second fragment.
>> The server can't send its reply until it gets both fragments, and the
>> client can't reassemble the reply until it gets both fragments, so from
>> the application's point of view, the reply doesn't arrive until 400ms
>> after the request is sent.  This could probably be fixed by disabling
>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>> supposed to prevent multiple outstanding *small* packets.
> Yes, but Nagle operates at the level of chunks...
> This problem is one of the reasons why we have

Michael

That doesn't make sense.  Nagle was meant to prevent sending a bunch of small
packets.  That doesn't apply if the user sends a large enough message that it
ends up being fragmented into a full-sized data chunk and a small-sized data
chunk.  It doesn't sound like Nagle should apply to the second fragment.

-vlad


> draft-tuexen-tsvwg-sctp-sack-immediately-02
> The kernel can set the I-Bit on the first chunk...
> Currently the only way around this is to disable Nagle at all...
>>
>> If you tell me I'm full of crap, I promise I'll shut up until I read
>> the whole RFC :-)
>>
>> --Doug.
>> <bsd72_server_1454.cap>
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (15 preceding siblings ...)
  2009-07-31 14:18 ` Vlad Yasevich
@ 2009-08-02  2:03 ` Doug Graham
  2009-08-03  2:00 ` Wei Yongjun
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-02  2:03 UTC (permalink / raw)
  To: linux-sctp

On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
> Doug Graham wrote:
> >  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
> >  14 2.203092    10.0.0.11   10.0.0.15   SACK 
> >  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
> >  16 2.203427    10.0.0.11   10.0.0.15   SACK 
> >  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
> >  18 2.403524    10.0.0.15   10.0.0.11   SACK 
> >  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
> >  20 2.603285    10.0.0.15   10.0.0.11   SACK 
> >
> > What bothers me about this is that Nagle seems to be introducing a delay
> > here.  The first DATA packets in both directions are MTU-sized packets,
> > yet both the Linux client and the BSD server wait 200ms until they get
> > the SACK to the first fragment before sending the second fragment.
> > The server can't send its reply until it gets both fragments, and the
> > client can't reassemble the reply until it gets both fragments, so from
> > the application's point of view, the reply doesn't arrive until 400ms
> > after the request is sent.  This could probably be fixed by disabling
> > Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
> > supposed to prevent multiple outstanding *small* packets.
> >   
> 
> I think you hit the point which Nagle's algorithm should be not used.
> 
> Can you try the following patch?
> 
> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
> 
> If fragmented data is sent, the Nagle's algorithm should not be
> used. In special case, if only one large packet is sent, the delay
> send of fragmented data will cause the receiver wait for more
> fragmented data to reassembe them and not send SACK, but the sender
> still wait for SACK before send the last fragment.

[patch deleted]

This patch seems to work quite well, but I think disabling Nagle
completely for large messages is not quite the right thing to do.
There's a draft-minshall-nagle-01.txt floating around that describes a
modified Nagle algorithm for TCP.  It appears to have been implemented
in Linux TCP even though the draft has expired.  The modified algorithm
is how I thought Nagle had always worked to begin with.  From the draft:

        "If a TCP has less than a full-sized packet to transmit,
        and if any previously transmitted less than full-sized
        packet has not yet been acknowledged, do not transmit
        a packet."

so in the case of sending a fragmented SCTP message, all but the last
fragment will be full-sized and will be sent without delay.  The last
fragment will usually not be full-sized, but it too will be sent without
delay because there are no outstanding non-full-sized packets.

The difference between this and your method is that yours would
allow many small fragments of big messages to be outstanding, whereas
this one would only allow the first big message to be sent in its
entirety, followed by the full-sized fragments of the next big
message.  When it came time to send the second small fragment,
Nagle would force it to wait for an ACK for the first small fragment.
I'm not convinced that the difference is all that important,
but who knows.

Here's my attempt at implementing the modified Nagle algorithm described
in draft-minshall-nagle-01.txt.  It should be applied instead of your
patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
is zero, no delay is introduced.  The assumption is that this means that
all outstanding packets (if any) are full-sized.

Signed-off-by: Doug Graham <dgraham@nortel.com>

---
--- linux-2.6.29/net/sctp/output.c	2009/08/02 00:47:44	1.3
+++ linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:18
@@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
 	 * unacknowledged.
 	 */
 	if (!sp->nodelay && sctp_packet_empty(packet) &&
-	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
+	    (q->outstanding_bytes % asoc->frag_point) != 0 &&
+	    sctp_state(asoc, ESTABLISHED)) {
 		unsigned len = datasize + q->out_qlen;
 
 		/* Check whether this chunk and all the rest of pending
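
To make the assumption behind the modulo test explicit, here is a
small standalone sketch of it.  The function name is invented for this
example, and it assumes the outstanding byte count and the fragmentation
point are measured in the same units, as the patch does.

```c
#include <assert.h>

/*
 * Returns 1 when the Nagle delay may apply: the outstanding byte count
 * is not a whole multiple of the fragmentation point, which is taken
 * to mean that some less-than-full-sized chunk is still unacknowledged.
 */
static int may_delay(unsigned int outstanding_bytes, unsigned int frag_point)
{
	assert(frag_point > 0);
	return (outstanding_bytes % frag_point) != 0;
}
```

With a frag_point of 1452, outstanding counts of 0, 1452 and 2904 do
not delay, while 2 or 1454 do.  The test is a heuristic: two
outstanding 726-byte chunks also add up to 1452 and would look like
one full-sized chunk.  That is arithmetic only; whether the
association can actually get into such a state is not established
here.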

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (16 preceding siblings ...)
  2009-08-02  2:03 ` Doug Graham
@ 2009-08-03  2:00 ` Wei Yongjun
  2009-08-03  2:15 ` Wei Yongjun
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-08-03  2:00 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>   
>> Doug Graham wrote:
>>     
>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>>>
>>> What bothers me about this is that Nagle seems to be introducing a delay
>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>> yet both the Linux client and the BSD server wait 200ms until they get
>>> the SACK to the first fragment before sending the second fragment.
>>> The server can't send its reply until it gets both fragments, and the
>>> client can't reassemble the reply until it gets both fragments, so from
>>> the application's point of view, the reply doesn't arrive until 400ms
>>> after the request is sent.  This could probably be fixed by disabling
>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>> supposed to prevent multiple outstanding *small* packets.
>>>   
>>>       
>> I think you hit the point which Nagle's algorithm should be not used.
>>
>> Can you try the following patch?
>>
>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>
>> If fragmented data is sent, the Nagle's algorithm should not be
>> used. In special case, if only one large packet is sent, the delay
>> send of fragmented data will cause the receiver wait for more
>> fragmented data to reassembe them and not send SACK, but the sender
>> still wait for SACK before send the last fragment.
>>     
>
> [patch deleted]
>
> This patch seems to work quite well, but I think disabling Nagle
> completely for large messages is not quite the right thing to do.
> There's a draft-minshall-nagle-01.txt floating around that describes a
> modified Nagle algorithm for TCP.  It appears to have been implemented
> in Linux TCP even though the draft has expired.  The modified algorithm
> is how I thought Nagle had always worked to begin with.  From the draft:
>
>         "If a TCP has less than a full-sized packet to transmit,
>         and if any previously transmitted less than full-sized
>         packet has not yet been acknowledged, do not transmit
>         a packet."
>
> so in the case of sending a fragmented SCTP message, all but the last
> fragment will be full-sized and will be sent without delay.  The last
> fragment will usually not be full-sized, but it too will be sent without
> delay because there are no outstanding non-full-sized packets.
>
> The difference between this and your method is that yours would
> allow many small fragments of big messages to be outstanding, whereas
> this one would only allow the first big message to be sent in its
> entirety, followed by the full-sized fragments of the next big
> message.  When it came time to send the second small fragment,
> Nagle would force it to wait for an ACK for the first small fragment.
> I'm not convinced that the difference is all that important,
> but who knows.
>
> Here's my attempt at implementing the modified Nagle algorithm described
> in draft-minshall-nagle-01.txt.  It should be applied instead of your
> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
> is zero, no delay is introduced.  The assumption is that this means that
> all outstanding packets (if any) are full-sized.
>
> Signed-off-by: Doug Graham <dgraham@nortel.com>
>
> ---
> --- linux-2.6.29/net/sctp/output.c	2009/08/02 00:47:44	1.3
> +++ linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:18
> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>  	 * unacknowledged.
>  	 */
>  	if (!sp->nodelay && sctp_packet_empty(packet) &&
> -	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
> +	    (q->outstanding_bytes % asoc->frag_point) != 0 &&
> +	    sctp_state(asoc, ESTABLISHED)) {
>  		unsigned len = datasize + q->out_qlen;
>  
>  		/* Check whether this chunk and all the rest of pending
>   


Seems good! But it may break the transmission of small packets, for which
the Nagle algorithm should still be used.
For example:

Endpoint A                Endpoint B
          <-------------  DATA (size=1452/2) delay send
          <-------------  DATA (size=1452/2) send immediately
          <-------------  DATA (size=1452/2) send immediately ** broken
          <-------------  DATA (size=1452/2) delay send
          <-------------  DATA (size=1452/2) send immediately
          <-------------  DATA (size=1452/2) send immediately ** broken
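
One reading of the scenario above, sketched as a simplified userspace model
rather than kernel code: if each message is 1452/2 = 726 bytes and
frag_point is 1452 (the fragment size seen in the trace earlier in the
thread), two small chunks in flight add up to exactly one frag_point, so
the modulo test cannot tell them apart from one full-sized chunk.  The
helper names are hypothetical, and the accounting assumes outstanding_bytes
is just the sum of unacknowledged payload sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the proposed check
 *   (q->outstanding_bytes % asoc->frag_point) != 0
 * Returns true when the check concludes that a small chunk is in flight,
 * which makes the next chunk a candidate for Nagle delay.
 */
static bool modulo_check_sees_small_chunk(size_t outstanding_bytes,
					  size_t frag_point)
{
	assert(frag_point > 0);
	return (outstanding_bytes % frag_point) != 0;
}

/* Sum the payload sizes of the chunks in flight (simplified accounting). */
static size_t sum_outstanding(const size_t *chunks, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += chunks[i];
	return total;
}
```

With one 726-byte chunk in flight the check reports a small chunk; with
two of them the sum is 1452 and the check reports none, even though both
outstanding chunks are small.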


Can you try this one?






* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (17 preceding siblings ...)
  2009-08-03  2:00 ` Wei Yongjun
@ 2009-08-03  2:15 ` Wei Yongjun
  2009-08-03  3:32 ` Wei Yongjun
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-08-03  2:15 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>   
>> Doug Graham wrote:
>>     
>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>>>
>>> What bothers me about this is that Nagle seems to be introducing a delay
>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>> yet both the Linux client and the BSD server wait 200ms until they get
>>> the SACK to the first fragment before sending the second fragment.
>>> The server can't send its reply until it gets both fragments, and the
>>> client can't reassemble the reply until it gets both fragments, so from
>>> the application's point of view, the reply doesn't arrive until 400ms
>>> after the request is sent.  This could probably be fixed by disabling
>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>> supposed to prevent multiple outstanding *small* packets.
>>>   
>>>       
>> I think you hit the point which Nagle's algorithm should be not used.
>>
>> Can you try the following patch?
>>
>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>
>> If fragmented data is sent, the Nagle's algorithm should not be
>> used. In special case, if only one large packet is sent, the delay
>> send of fragmented data will cause the receiver wait for more
>> fragmented data to reassembe them and not send SACK, but the sender
>> still wait for SACK before send the last fragment.
>>     
>
> [patch deleted]
>
> This patch seems to work quite well, but I think disabling Nagle
> completely for large messages is not quite the right thing to do.
> There's a draft-minshall-nagle-01.txt floating around that describes a
> modified Nagle algorithm for TCP.  It appears to have been implemented
> in Linux TCP even though the draft has expired.  The modified algorithm
> is how I thought Nagle had always worked to begin with.  From the draft:
>
>         "If a TCP has less than a full-sized packet to transmit,
>         and if any previously transmitted less than full-sized
>         packet has not yet been acknowledged, do not transmit
>         a packet."
>
> so in the case of sending a fragmented SCTP message, all but the last
> fragment will be full-sized and will be sent without delay.  The last
> fragment will usually not be full-sized, but it too will be sent without
> delay because there are no outstanding non-full-sized packets.
>
> The difference between this and your method is that yours would
> allow many small fragments of big messages to be outstanding, whereas
> this one would only allow the first big message to be sent in its
> entirety, followed by the full-sized fragments of the next big
> message.  When it came time to send the second small fragment,
> Nagle would force it to wait for an ACK for the first small fragment.
> I'm not convinced that the difference is all that important,
> but who knows.
>
> Here's my attempt at implementing the modified Nagle algorithm described
> in draft-minshall-nagle-01.txt.  It should be applied instead of your
> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
> is zero, no delay is introduced.  The assumption is that this means that
> all outstanding packets (if any) are full-sized.
>
> Signed-off-by: Doug Graham <dgraham@nortel.com>
>
> ---
> --- linux-2.6.29/net/sctp/output.c	2009/08/02 00:47:44	1.3
> +++ linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:18
> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>  	 * unacknowledged.
>  	 */
>  	if (!sp->nodelay && sctp_packet_empty(packet) &&
> -	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
> +	    (q->outstanding_bytes % asoc->frag_point) != 0 &&
>   
I think asoc->unack_data can be used to check for full-sized packets. Change
(q->outstanding_bytes % asoc->frag_point) != 0 to:

  +           q->outstanding_bytes != asoc->frag_point * (asoc->unack_data - 1) &&

> +	    sctp_state(asoc, ESTABLISHED)) {
>  		unsigned len = datasize + q->out_qlen;
>  
>  		/* Check whether this chunk and all the rest of pending
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
>   




* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (18 preceding siblings ...)
  2009-08-03  2:15 ` Wei Yongjun
@ 2009-08-03  3:32 ` Wei Yongjun
  2009-08-04  3:00 ` Doug Graham
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-08-03  3:32 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>   
>> Doug Graham wrote:
>>     
>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>>>
>>> What bothers me about this is that Nagle seems to be introducing a delay
>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>> yet both the Linux client and the BSD server wait 200ms until they get
>>> the SACK to the first fragment before sending the second fragment.
>>> The server can't send its reply until it gets both fragments, and the
>>> client can't reassemble the reply until it gets both fragments, so from
>>> the application's point of view, the reply doesn't arrive until 400ms
>>> after the request is sent.  This could probably be fixed by disabling
>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>> supposed to prevent multiple outstanding *small* packets.
>>>   
>>>       
>> I think you hit the point which Nagle's algorithm should be not used.
>>
>> Can you try the following patch?
>>
>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>
>> If fragmented data is sent, the Nagle's algorithm should not be
>> used. In special case, if only one large packet is sent, the delay
>> send of fragmented data will cause the receiver wait for more
>> fragmented data to reassembe them and not send SACK, but the sender
>> still wait for SACK before send the last fragment.
>>     
>
> [patch deleted]
>
> This patch seems to work quite well, but I think disabling Nagle
> completely for large messages is not quite the right thing to do.
> There's a draft-minshall-nagle-01.txt floating around that describes a
> modified Nagle algorithm for TCP.  It appears to have been implemented
> in Linux TCP even though the draft has expired.  The modified algorithm
> is how I thought Nagle had always worked to begin with.  From the draft:
>
>         "If a TCP has less than a full-sized packet to transmit,
>         and if any previously transmitted less than full-sized
>         packet has not yet been acknowledged, do not transmit
>         a packet."
>
> so in the case of sending a fragmented SCTP message, all but the last
> fragment will be full-sized and will be sent without delay.  The last
> fragment will usually not be full-sized, but it too will be sent without
> delay because there are no outstanding non-full-sized packets.
>
> The difference between this and your method is that yours would
> allow many small fragments of big messages to be outstanding, whereas
> this one would only allow the first big message to be sent in its
> entirety, followed by the full-sized fragments of the next big
> message.  When it came time to send the second small fragment,
> Nagle would force it to wait for an ACK for the first small fragment.
>   

This case will never happen, because when we fragment data, every fragment
except the last one is exactly frag_point bytes. So whether or not the last
fragment is full-sized, we should not use the Nagle algorithm.

The Nagle algorithm is not suited to fragmented data.


> I'm not convinced that the difference is all that important,
> but who knows.
>
> Here's my attempt at implementing the modified Nagle algorithm described
> in draft-minshall-nagle-01.txt.  It should be applied instead of your
> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
> is zero, no delay is introduced.  The assumption is that this means that
> all outstanding packets (if any) are full-sized.
>
> Signed-off-by: Doug Graham <dgraham@nortel.com>
>
> ---
> --- linux-2.6.29/net/sctp/output.c	2009/08/02 00:47:44	1.3
> +++ linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:18
> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>  	 * unacknowledged.
>  	 */
>  	if (!sp->nodelay && sctp_packet_empty(packet) &&
> -	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
> +	    (q->outstanding_bytes % asoc->frag_point) != 0 &&
> +	    sctp_state(asoc, ESTABLISHED)) {
>  		unsigned len = datasize + q->out_qlen;
>  
>  		/* Check whether this chunk and all the rest of pending




* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (19 preceding siblings ...)
  2009-08-03  3:32 ` Wei Yongjun
@ 2009-08-04  3:00 ` Doug Graham
  2009-08-04  3:03 ` Wei Yongjun
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-04  3:00 UTC (permalink / raw)
  To: linux-sctp

Oops.  Sent the last one in HTML, so the mailing list rejected it.
Damned GUI email clients!

Wei Yongjun wrote:
> Doug Graham wrote:
>   
>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>   
>>     
>>> Doug Graham wrote:
>>>     
>>>       
>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>>>>
>>>> What bothers me about this is that Nagle seems to be introducing a delay
>>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>>> yet both the Linux client and the BSD server wait 200ms until they get
>>>> the SACK to the first fragment before sending the second fragment.
>>>> The server can't send its reply until it gets both fragments, and the
>>>> client can't reassemble the reply until it gets both fragments, so from
>>>> the application's point of view, the reply doesn't arrive until 400ms
>>>> after the request is sent.  This could probably be fixed by disabling
>>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>>> supposed to prevent multiple outstanding *small* packets.
>>>>   
>>>>       
>>>>         
>>> I think you hit the point which Nagle's algorithm should be not used.
>>>
>>> Can you try the following patch?
>>>
>>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>>
>>> If fragmented data is sent, the Nagle's algorithm should not be
>>> used. In special case, if only one large packet is sent, the delay
>>> send of fragmented data will cause the receiver wait for more
>>> fragmented data to reassembe them and not send SACK, but the sender
>>> still wait for SACK before send the last fragment.
>>>     
>>>       
>> [patch deleted]
>>
>> This patch seems to work quite well, but I think disabling Nagle
>> completely for large messages is not quite the right thing to do.
>> There's a draft-minshall-nagle-01.txt floating around that describes a
>> modified Nagle algorithm for TCP.  It appears to have been implemented
>> in Linux TCP even though the draft has expired.  The modified algorithm
>> is how I thought Nagle had always worked to begin with.  From the draft:
>>
>>         "If a TCP has less than a full-sized packet to transmit,
>>         and if any previously transmitted less than full-sized
>>         packet has not yet been acknowledged, do not transmit
>>         a packet."
>>
>> so in the case of sending a fragmented SCTP message, all but the last
>> fragment will be full-sized and will be sent without delay.  The last
>> fragment will usually not be full-sized, but it too will be sent without
>> delay because there are no outstanding non-full-sized packets.
>>
>> The difference between this and your method is that yours would
>> allow many small fragments of big messages to be outstanding, whereas
>> this one would only allow the first big message to be sent in its
>> entirety, followed by the full-sized fragments of the next big
>> message.  When it came time to send the second small fragment,
>> Nagle would force it to wait for an ACK for the first small fragment.
>> I'm not convinced that the difference is all that important,
>> but who knows.
>>
>> Here's my attempt at implementing the modified Nagle algorithm described
>> in draft-minshall-nagle-01.txt.  It should be applied instead of your
>> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
>> is zero, no delay is introduced.  The assumption is that this means that
>> all outstanding packets (if any) are full-sized.
>>
>> Signed-off-by: Doug Graham <dgraham@nortel.com>
>>
>> ---
>> --- linux-2.6.29/net/sctp/output.c	2009/08/02 00:47:44	1.3
>> +++ linux-2.6.29/net/sctp/output.c	2009/08/02 00:51:18
>> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>>  	 * unacknowledged.
>>  	 */
>>  	if (!sp->nodelay && sctp_packet_empty(packet) &&
>> -	    q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
>> +	    (q->outstanding_bytes % asoc->frag_point) != 0 &&
>> +	    sctp_state(asoc, ESTABLISHED)) {
>>  		unsigned len = datasize + q->out_qlen;
>>  
>>  		/* Check whether this chunk and all the rest of pending
>>   
>>     
>
>
> Seems good! But it may break the transmission of small packets, for which
> the Nagle algorithm should still be used.
> For example:
>
> Endpoint A                Endpoint B
>           <-------------  DATA (size=1452/2) delay send
>           <-------------  DATA (size=1452/2) send immediately
>           <-------------  DATA (size=1452/2) send immediately ** broken
>           <-------------  DATA (size=1452/2) delay send
>           <-------------  DATA (size=1452/2) send immediately
>           <-------------  DATA (size=1452/2) send immediately ** broken
>
>
> Can you try this one?
>
>
>   

I would, except I don't understand what you're getting at.  Does this
mean to send a total of six 1454-byte messages from B to A?  If so, why
would the first one be delayed?

Assuming that no SACKs are received by B, this should result in the
first 3 packets getting sent immediately: a 1452-byte fragment, then a
2-byte fragment, then the second 1452-byte fragment.  When it comes time
to send the second 2-byte fragment, Nagle kicks in and prevents it from
being sent until a SACK is received.
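
The sequence described above can be sketched as a small userspace
simulation of the modified Nagle rule quoted from
draft-minshall-nagle-01.txt.  This is only an illustrative model, not the
kernel code: the struct and function names are hypothetical, bundling is
ignored, a full-sized fragment is assumed to be 1452 bytes, and no SACK
ever arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sender state tracked by this simplified model. */
struct nagle_state {
	size_t full_size;        /* size of a full-sized fragment */
	bool small_outstanding;  /* a less-than-full-sized packet is unacked */
};

/*
 * Modified Nagle rule: hold a packet back only if it is smaller than full
 * size AND an earlier small packet is still unacknowledged.
 * Returns true if the packet goes out now.
 */
static bool try_send(struct nagle_state *st, size_t len)
{
	assert(st->full_size > 0 && len > 0 && len <= st->full_size);
	if (len < st->full_size) {
		if (st->small_outstanding)
			return false;  /* delayed until a SACK arrives */
		st->small_outstanding = true;
	}
	return true;
}

/*
 * Send nmsgs messages of msg_len bytes each, fragmenting at full_size,
 * with no SACK ever arriving.  Returns how many packets leave before the
 * first one is held back (or the total if none is held back).
 */
static size_t packets_before_first_delay(struct nagle_state *st,
					 size_t msg_len, size_t nmsgs)
{
	size_t sent = 0;

	for (size_t m = 0; m < nmsgs; m++) {
		size_t left = msg_len;

		while (left > 0) {
			size_t len = left < st->full_size ? left : st->full_size;

			if (!try_send(st, len))
				return sent;
			sent++;
			left -= len;
		}
	}
	return sent;
}
```

Starting from a state of { 1452, false }, six 1454-byte messages give 3
packets (1452, 2, 1452) before the second 2-byte fragment is held back.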

But I'm pretty sure I missed your point.  Can you flesh it out a bit?

--Doug
>
>   


* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (20 preceding siblings ...)
  2009-08-04  3:00 ` Doug Graham
@ 2009-08-04  3:03 ` Wei Yongjun
  2009-08-04  3:28 ` Doug Graham
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Wei Yongjun @ 2009-08-04  3:03 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> Oops.  Sent the last one in HTML, so the mailing list rejected it.
> Damned GUI email clients!
>
> Wei Yongjun wrote:
>> Doug Graham wrote:
>>  
>>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>>      
>>>> Doug Graham wrote:
>>>>          
>>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
>>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK
>>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK
>>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK
>>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK
>>>>>
>>>>> What bothers me about this is that Nagle seems to be introducing a delay
>>>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>>>> yet both the Linux client and the BSD server wait 200ms until they get
>>>>> the SACK to the first fragment before sending the second fragment.
>>>>> The server can't send its reply until it gets both fragments, and the
>>>>> client can't reassemble the reply until it gets both fragments, so from
>>>>> the application's point of view, the reply doesn't arrive until 400ms
>>>>> after the request is sent.  This could probably be fixed by disabling
>>>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>>>> supposed to prevent multiple outstanding *small* packets.
>>>>>                 
>>>> I think you hit the point which Nagle's algorithm should be not used.
>>>>
>>>> Can you try the following patch?
>>>>
>>>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is
>>>> transmitted
>>>>
>>>> If fragmented data is sent, the Nagle's algorithm should not be
>>>> used. In special case, if only one large packet is sent, the delay
>>>> send of fragmented data will cause the receiver wait for more
>>>> fragmented data to reassembe them and not send SACK, but the sender
>>>> still wait for SACK before send the last fragment.
>>>>           
>>> [patch deleted]
>>>
>>> This patch seems to work quite well, but I think disabling Nagle
>>> completely for large messages is not quite the right thing to do.
>>> There's a draft-minshall-nagle-01.txt floating around that describes a
>>> modified Nagle algorithm for TCP.  It appears to have been implemented
>>> in Linux TCP even though the draft has expired.  The modified algorithm
>>> is how I thought Nagle had always worked to begin with.  From the draft:
>>>
>>>         "If a TCP has less than a full-sized packet to transmit,
>>>         and if any previously transmitted less than full-sized
>>>         packet has not yet been acknowledged, do not transmit
>>>         a packet."
>>>
>>> so in the case of sending a fragmented SCTP message, all but the last
>>> fragment will be full-sized and will be sent without delay.  The last
>>> fragment will usually not be full-sized, but it too will be sent without
>>> delay because there are no outstanding non-full-sized packets.
>>>
>>> The difference between this and your method is that yours would
>>> allow many small fragments of big messages to be outstanding, whereas
>>> this one would only allow the first big message to be sent in its
>>> entirety, followed by the full-sized fragments of the next big
>>> message.  When it came time to send the second small fragment,
>>> Nagle would force it to wait for an ACK for the first small fragment.
>>> I'm not convinced that the difference is all that important,
>>> but who knows.
>>>
>>> Here's my attempt at implementing the modified Nagle algorithm described
>>> in draft-minshall-nagle-01.txt.  It should be applied instead of your
>>> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
>>> is zero, no delay is introduced.  The assumption is that this means that
>>> all outstanding packets (if any) are full-sized.
>>>
>>> Signed-off-by: Doug Graham <dgraham@nortel.com>
>>>
>>> ---
>>> --- linux-2.6.29/net/sctp/output.c    2009/08/02 00:47:44    1.3
>>> +++ linux-2.6.29/net/sctp/output.c    2009/08/02 00:51:18
>>> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>>>       * unacknowledged.
>>>       */
>>>      if (!sp->nodelay && sctp_packet_empty(packet) &&
>>> -        q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
>>> +        (q->outstanding_bytes % asoc->frag_point) != 0 &&
>>> +        sctp_state(asoc, ESTABLISHED)) {
>>>          unsigned len = datasize + q->out_qlen;
>>>  
>>>          /* Check whether this chunk and all the rest of pending
>>>       
>>
>>
>> Seems good! But it may break the transmission of small packets, for which
>> the Nagle algorithm should still be used.
>> For example:
>>
>> Endpoint A                Endpoint B
>>           <-------------  DATA (size=1452/2) delay send
>>           <-------------  DATA (size=1452/2) send immediately
>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>           <-------------  DATA (size=1452/2) delay send
>>           <-------------  DATA (size=1452/2) send immediately
>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>
>>
>> Can you try this one?
>>
>>
>>   
>
> I would, except I don't understand what you're getting at.  Does this
> mean to send a total of six 1454-byte messages from B to A?  If so, why
> would the first one be delayed?

Oh, no: six 726-byte (1452/2) messages. Maybe the 1st and 2nd are bundled
into one packet, the 3rd is sent as a single packet, the 4th and 5th are
bundled, and the 6th is sent as a single packet.
I have not tested it.

>
> Assuming that no SACKs are received by B, this should result in the
> first 3 packets getting sent immediately: a 1452-byte fragment, then a
> 2-byte fragment, then the second 1452-byte fragment.  When it comes time
> to send the second 2-byte fragment, Nagle kicks in and prevents it from
> being sent until a SACK is received.
>
> But I'm pretty sure I missed your point.  Can you flesh it out a bit?
>
> --Doug
>>
>>   
>
>
>




* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (21 preceding siblings ...)
  2009-08-04  3:03 ` Wei Yongjun
@ 2009-08-04  3:28 ` Doug Graham
  2009-08-04  3:44 ` Doug Graham
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-04  3:28 UTC (permalink / raw)
  To: linux-sctp

Wei Yongjun wrote:
> Doug Graham wrote:
>   
>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>   
>>     
>>> Doug Graham wrote:
>>>     
>>>       
>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data) 
>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK 
>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK 
>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK 
>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK 
>>>>
>>>> What bothers me about this is that Nagle seems to be introducing a delay
>>>> here.  The first DATA packets in both directions are MTU-sized packets,
>>>> yet both the Linux client and the BSD server wait 200ms until they get
>>>> the SACK to the first fragment before sending the second fragment.
>>>> The server can't send its reply until it gets both fragments, and the
>>>> client can't reassemble the reply until it gets both fragments, so from
>>>> the application's point of view, the reply doesn't arrive until 400ms
>>>> after the request is sent.  This could probably be fixed by disabling
>>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is only
>>>> supposed to prevent multiple outstanding *small* packets.
>>>>   
>>>>       
>>>>         
>>> I think you hit the point which Nagle's algorithm should be not used.
>>>
>>> Can you try the following patch?
>>>
>>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>>
>>> If fragmented data is sent, the Nagle's algorithm should not be
>>> used. In special case, if only one large packet is sent, the delay
>>> send of fragmented data will cause the receiver wait for more
>>> fragmented data to reassembe them and not send SACK, but the sender
>>> still wait for SACK before send the last fragment.
>>>     
>>>       
>> [patch deleted]
>>
>> This patch seems to work quite well, but I think disabling Nagle
>> completely for large messages is not quite the right thing to do.
>> There's a draft-minshall-nagle-01.txt floating around that describes a
>> modified Nagle algorithm for TCP.  It appears to have been implemented
>> in Linux TCP even though the draft has expired.  The modified algorithm
>> is how I thought Nagle had always worked to begin with.  From the draft:
>>
>>         "If a TCP has less than a full-sized packet to transmit,
>>         and if any previously transmitted less than full-sized
>>         packet has not yet been acknowledged, do not transmit
>>         a packet."
>>
>> so in the case of sending a fragmented SCTP message, all but the last
>> fragment will be full-sized and will be sent without delay.  The last
>> fragment will usually not be full-sized, but it too will be sent without
>> delay because there are no outstanding non-full-sized packets.
>>
>> The difference between this and your method is that yours would
>> allow many small fragments of big messages to be outstanding, whereas
>> this one would only allow the first big message to be sent in its
>> entirety, followed by the full-sized fragments of the next big
>> message.  When it came time to send the second small fragment,
>> Nagle would force it to wait for an ACK for the first small fragment.
>>   
>>     
>
> This case will never happen, because when we fragment data, every fragment
> except the last one is exactly frag_point bytes. So whether or not the last
> fragment is full-sized, we should not use the Nagle algorithm.
>
> The Nagle algorithm is not suited to fragmented data.
>
>
>   
Why can it never happen?  If I send a bunch of large messages with
small last fragments, your modification will allow all messages
to be sent, because it disables Nagle for large messages, right?
If so, many small last fragments can be outstanding at any one
time (one from each message).  Technically, this violates Nagle,
which aims to prevent more than one small fragment from ever being
outstanding, but I'm not sure that it really violates the spirit
of what Nagle is trying to accomplish.

Nagle is really meant to prevent the case of an application like
telnet from sending a whole lot of small packets containing only 1
or a few characters.  If the receive window is, say, 10000 bytes,
Nagle would allow 10000 packets to be outstanding, all clogging
up the network.  But if the PMTU is, say, 1000 bytes and the user
tries to send a bunch of 1001 byte messages, your method (if I
understand it correctly) will allow 9 unacknowledged messages to
be outstanding.  Those 9 messages will be split into 9 full-sized
packets and 9 packets carrying only 1 byte of data.  18 outstanding
packets isn't all that bad.  If the user were instead sending 1000
byte messages, Nagle would have nothing to say about it, and you'd
be able to have 10 packets outstanding.  The increase from 10 to
18 outstanding packets isn't likely to cause network collapse.
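
The packet counts in the paragraph above can be checked with a small sketch. It assumes, as the example does, that headers are ignored, that only whole messages are sent, and that Nagle is disabled for fragmented messages. `struct outstanding` and `count_outstanding` are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>

/* Hypothetical helper, not kernel code: result of filling a receive
 * window with whole messages of one size. */
struct outstanding {
	unsigned messages;      /* whole messages that fit in the window */
	unsigned full_packets;  /* fragments of exactly pmtu bytes */
	unsigned small_packets; /* trailing fragments shorter than pmtu */
};

/* Count outstanding packets when messages of msg_size bytes are sent
 * until the next whole message no longer fits in rwnd bytes.  Each
 * message is split into pmtu-sized fragments plus, if needed, one
 * shorter last fragment.  Headers are ignored. */
static struct outstanding count_outstanding(unsigned rwnd, unsigned pmtu,
					    unsigned msg_size)
{
	struct outstanding o = { 0, 0, 0 };

	assert(pmtu > 0 && msg_size > 0);
	o.messages = rwnd / msg_size;
	o.full_packets = o.messages * (msg_size / pmtu);
	o.small_packets = (msg_size % pmtu) ? o.messages : 0;
	return o;
}
```

With `count_outstanding(10000, 1000, 1001)` this gives 9 messages, 9 full-sized packets and 9 small ones; with `count_outstanding(10000, 1000, 1000)` it gives 10 full-sized packets and no small ones.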

--Doug


* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (22 preceding siblings ...)
  2009-08-04  3:28 ` Doug Graham
@ 2009-08-04  3:44 ` Doug Graham
  2009-08-04  3:57 ` Doug Graham
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-04  3:44 UTC (permalink / raw)
  To: linux-sctp

Wei Yongjun wrote:
> Doug Graham wrote:
>   
>> Oops.  Sent the last one in HTML,  so the mailing list rejected it. 
>> Damned GUI email
>> clients!
>>
>> Wei Yongjun wrote:
>>     
>>> Doug Graham wrote:
>>>  
>>>       
>>>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>>>      
>>>>         
>>>>> Doug Graham wrote:
>>>>>          
>>>>>           
>>>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
>>>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK
>>>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK
>>>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK
>>>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK
>>>>>>
>>>>>> What bothers me about this is that Nagle seems to be introducing a
>>>>>> delay
>>>>>> here.  The first DATA packets in both directions are MTU-sized
>>>>>> packets,
>>>>>> yet both the Linux client and the BSD server wait 200ms until they
>>>>>> get
>>>>>> the SACK to the first fragment before sending the second fragment.
>>>>>> The server can't send its reply until it gets both fragments, and the
>>>>>> client can't reassemble the reply until it gets both fragments, so
>>>>>> from
>>>>>> the application's point of view, the reply doesn't arrive until 400ms
>>>>>> after the request is sent.  This could probably be fixed by disabling
>>>>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is
>>>>>> only
>>>>>> supposed to prevent multiple outstanding *small* packets.
>>>>>>                 
>>>>>>             
>>>>> I think you have hit a case where Nagle's algorithm should not be used.
>>>>>
>>>>> Can you try the following patch?
>>>>>
>>>>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is
>>>>> transmitted
>>>>>
>>>>> If fragmented data is sent, Nagle's algorithm should not be used.
>>>>> In the special case where only one large message is sent, delaying
>>>>> the last fragment causes the receiver to wait for more fragments to
>>>>> reassemble and not send a SACK, while the sender still waits for
>>>>> that SACK before sending the last fragment.
>>>>>           
>>>>>           
>>>> [patch deleted]
>>>>
>>>> This patch seems to work quite well, but I think disabling Nagle
>>>> completely for large messages is not quite the right thing to do.
>>>> There's a draft-minshall-nagle-01.txt floating around that describes a
>>>> modified Nagle algorithm for TCP.  It appears to have been implemented
>>>> in Linux TCP even though the draft has expired.  The modified algorithm
>>>> is how I thought Nagle had always worked to begin with.  From the
>>>> draft:
>>>>
>>>>         "If a TCP has less than a full-sized packet to transmit,
>>>>         and if any previously transmitted less than full-sized
>>>>         packet has not yet been acknowledged, do not transmit
>>>>         a packet."
>>>>
>>>> so in the case of sending a fragmented SCTP message, all but the last
>>>> fragment will be full-sized and will be sent without delay.  The last
>>>> fragment will usually not be full-sized, but it too will be sent
>>>> without
>>>> delay because there are no outstanding non-full-sized packets.
>>>>
>>>> The difference between this and your method is that yours would
>>>> allow many small fragments of big messages to be outstanding, whereas
>>>> this one would only allow the first big message to be sent in its
>>>> entirety, followed by the full-sized fragments of the next big
>>>> message.  When it came time to send the second small fragment,
>>>> Nagle would force it to wait for an ACK for the first small fragment.
>>>> I'm not convinced that the difference is all that important,
>>>> but who knows.
>>>>
>>>> Here's my attempt at implementing the modified Nagle algorithm
>>>> described
>>>> in draft-minshall-nagle-01.txt.  It should be applied instead of your
>>>> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
>>>> is zero, no delay is introduced.  The assumption is that this means
>>>> that
>>>> all outstanding packets (if any) are full-sized.
>>>>
>>>> Signed-off-by: Doug Graham <dgraham@nortel.com>
>>>>
>>>> ---
>>>> --- linux-2.6.29/net/sctp/output.c    2009/08/02 00:47:44    1.3
>>>> +++ linux-2.6.29/net/sctp/output.c    2009/08/02 00:51:18
>>>> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>>>>       * unacknowledged.
>>>>       */
>>>>      if (!sp->nodelay && sctp_packet_empty(packet) &&
>>>> -        q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
>>>> +        (q->outstanding_bytes % asoc->frag_point) != 0 &&
>>>> +        sctp_state(asoc, ESTABLISHED)) {
>>>>          unsigned len = datasize + q->out_qlen;
>>>>  
>>>>          /* Check whether this chunk and all the rest of pending
>>>>       
>>>>         
>>> Seems good! But it may break small-packet transmission, where the
>>> Nagle algorithm should still apply.
>>> For example:
>>>
>>> Endpoint A                Endpoint B
>>>           <-------------  DATA (size=1452/2) delay send
>>>           <-------------  DATA (size=1452/2) send immediately
>>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>>           <-------------  DATA (size=1452/2) delay send
>>>           <-------------  DATA (size=1452/2) send immediately
>>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>>
>>>
>>> Can you try this one?
>>>
>>>
>>>   
>>>       
>> I would, except I don't understand what you're getting at.  Does this
>> mean to send a total of
>> 6 1454 byte messages from B to A?  If so, why would the first one be
>> delayed?
>>     
>
> Oh, no: six 726-byte (1452/2) messages. Maybe the 1st and 2nd are
> bundled in one packet, the 3rd is a single packet, the 4th and 5th are
> bundled, and the 6th is single. I have not tested it.
>
>   
Ah, so that / meant *division*!  I thought that was your notation
meaning that the packets were fragmented into a 1452 byte chunk and
a 2 byte chunk!  Makes more sense now :-)

I admit that I didn't study too closely exactly what q->outstanding_bytes
represents.  I assumed it meant the number of bytes that had been sent
on the wire, but not yet acknowledged.  Any bytes that were delayed
because of Nagle would not be counted in outstanding_bytes (I assume).
So the first send of 726 would get sent immediately and counted in
outstanding_bytes.  The second one would get delayed by Nagle and not
counted in outstanding_bytes.  All the later ones would also get delayed
by Nagle because outstanding_bytes is still 726.
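
The reasoning above can be sketched in isolation. This models only the outer `%` test from the patch; it ignores the packet-empty check, the ESTABLISHED state check, and the bundling logic inside the `if`. It also assumes that delayed data is not added to the outstanding count and that no SACK arrives between sends. `modified_nagle_delays` and `sent_immediately` are hypothetical names, and 1452 as the frag_point is taken from the trace earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Outer test from the patch, on its own: delay only if the count of
 * outstanding bytes is not a whole multiple of frag_point. */
static bool modified_nagle_delays(unsigned outstanding_bytes,
				  unsigned frag_point)
{
	assert(frag_point > 0);
	return (outstanding_bytes % frag_point) != 0;
}

/* Hypothetical simulation: offer `count` messages of `size` bytes back
 * to back with no SACK in between.  Returns how many go out at once.
 * Delayed messages are assumed not to count as outstanding. */
static unsigned sent_immediately(unsigned count, unsigned size,
				 unsigned frag_point)
{
	unsigned outstanding = 0;
	unsigned sent = 0;
	unsigned i;

	for (i = 0; i < count; i++) {
		if (!modified_nagle_delays(outstanding, frag_point)) {
			outstanding += size;
			sent++;
		}
	}
	return sent;
}
```

Under these assumptions `sent_immediately(6, 726, 1452)` is 1: only the first message goes out and the other five wait. The sketch also shows a weakness of the test: `modified_nagle_delays(1452, 1452)` is false whether the 1452 outstanding bytes are one full-sized packet or two 726-byte ones.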

I do think that using outstanding_bytes the way I did is probably an
ugly kludge, and there's hopefully a better way.  But the right way will
probably involve adding some more state to each association (the snd.sml
variable mentioned in the minshall draft at the very least).  I'm not
sure that using asoc->frag_point the way I did is correct either, because
I think the frag_point can change during the lifetime of an association.

--Doug


* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (23 preceding siblings ...)
  2009-08-04  3:44 ` Doug Graham
@ 2009-08-04  3:57 ` Doug Graham
  2009-08-04 14:50 ` Vlad Yasevich
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-04  3:57 UTC (permalink / raw)
  To: linux-sctp


>>  
> Why can it never happen?  If I send a bunch of large messages with
> small last fragments, your modification will allow all messages
> to be sent, because it disables Nagle for large messages, right?
> If so, many small last fragments can be outstanding at any one
> time (one from each message).  Technically, this violates Nagle,
> which aims to prevent more than one small fragment from ever being
> outstanding, but I'm not sure that it really violates the spirit
> of what Nagle is trying to accomplish.
>
> Nagle is really meant to prevent the case of an application like
> telnet from sending a whole lot of small packets containing only 1
> or a few characters.  If the receive window is, say, 10000 bytes,
> Nagle would allow 10000 packets to be outstanding, all clogging
> up the network.

I meant to say that *without* Nagle, you'd be able to have 10000 packets
outstanding.  This is what Nagle is trying to prevent.

> But if the PMTU is, say, 1000 bytes and the user
> tries to send a bunch of 1001 byte messages, your method (if I
> understand it correctly) will allow 9 unacknowledged messages to
> be outstanding.  Those 9 messages will be split into 9 full-sized
> packets and 9 packets carrying only 1 byte of data.  18 outstanding
> packets isn't all that bad.  If the user were instead sending 1000
> byte messages, Nagle would have nothing to say about it, and you'd
> be able to have 10 packets outstanding.  The increase from 10 to
> 18 outstanding packets isn't likely to cause network collapse.
>
> --Doug
>
>

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (24 preceding siblings ...)
  2009-08-04  3:57 ` Doug Graham
@ 2009-08-04 14:50 ` Vlad Yasevich
  2009-08-04 17:05 ` Doug Graham
  2009-08-04 17:14 ` Vlad Yasevich
  27 siblings, 0 replies; 29+ messages in thread
From: Vlad Yasevich @ 2009-08-04 14:50 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> Wei Yongjun wrote:
>> Doug Graham wrote:
>>  
>>> Oops.  Sent the last one in HTML,  so the mailing list rejected it.
>>> Damned GUI email
>>> clients!
>>>
>>> Wei Yongjun wrote:
>>>    
>>>> Doug Graham wrote:
>>>>  
>>>>      
>>>>> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>>>>>             
>>>>>> Doug Graham wrote:
>>>>>>                   
>>>>>>>  13 2.002632    10.0.0.15   10.0.0.11   DATA (1452 bytes data)
>>>>>>>  14 2.203092    10.0.0.11   10.0.0.15   SACK
>>>>>>>  15 2.203153    10.0.0.15   10.0.0.11   DATA (2 bytes data)
>>>>>>>  16 2.203427    10.0.0.11   10.0.0.15   SACK
>>>>>>>  17 2.203808    10.0.0.11   10.0.0.15   DATA (1452 bytes data)
>>>>>>>  18 2.403524    10.0.0.15   10.0.0.11   SACK
>>>>>>>  19 2.403686    10.0.0.11   10.0.0.15   DATA (2 bytes data)
>>>>>>>  20 2.603285    10.0.0.15   10.0.0.11   SACK
>>>>>>>
>>>>>>> What bothers me about this is that Nagle seems to be introducing a
>>>>>>> delay
>>>>>>> here.  The first DATA packets in both directions are MTU-sized
>>>>>>> packets,
>>>>>>> yet both the Linux client and the BSD server wait 200ms until they
>>>>>>> get
>>>>>>> the SACK to the first fragment before sending the second fragment.
>>>>>>> The server can't send its reply until it gets both fragments, and
>>>>>>> the
>>>>>>> client can't reassemble the reply until it gets both fragments, so
>>>>>>> from
>>>>>>> the application's point of view, the reply doesn't arrive until
>>>>>>> 400ms
>>>>>>> after the request is sent.  This could probably be fixed by
>>>>>>> disabling
>>>>>>> Nagle with SCTP_NODELAY, but that shouldn't be required.  Nagle is
>>>>>>> only
>>>>>>> supposed to prevent multiple outstanding *small* packets.
>>>>>>>                             
>>>>>> I think you have hit a case where Nagle's algorithm should not be used.
>>>>>>
>>>>>> Can you try the following patch?
>>>>>>
>>>>>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is
>>>>>> transmitted
>>>>>>
>>>>>> If fragmented data is sent, Nagle's algorithm should not be used.
>>>>>> In the special case where only one large message is sent, delaying
>>>>>> the last fragment causes the receiver to wait for more fragments to
>>>>>> reassemble and not send a SACK, while the sender still waits for
>>>>>> that SACK before sending the last fragment.
>>>>>>                     
>>>>> [patch deleted]
>>>>>
>>>>> This patch seems to work quite well, but I think disabling Nagle
>>>>> completely for large messages is not quite the right thing to do.
>>>>> There's a draft-minshall-nagle-01.txt floating around that describes a
>>>>> modified Nagle algorithm for TCP.  It appears to have been implemented
>>>>> in Linux TCP even though the draft has expired.  The modified
>>>>> algorithm
>>>>> is how I thought Nagle had always worked to begin with.  From the
>>>>> draft:
>>>>>
>>>>>         "If a TCP has less than a full-sized packet to transmit,
>>>>>         and if any previously transmitted less than full-sized
>>>>>         packet has not yet been acknowledged, do not transmit
>>>>>         a packet."
>>>>>
>>>>> so in the case of sending a fragmented SCTP message, all but the last
>>>>> fragment will be full-sized and will be sent without delay.  The last
>>>>> fragment will usually not be full-sized, but it too will be sent
>>>>> without
>>>>> delay because there are no outstanding non-full-sized packets.
>>>>>
>>>>> The difference between this and your method is that yours would
>>>>> allow many small fragments of big messages to be outstanding, whereas
>>>>> this one would only allow the first big message to be sent in its
>>>>> entirety, followed by the full-sized fragments of the next big
>>>>> message.  When it came time to send the second small fragment,
>>>>> Nagle would force it to wait for an ACK for the first small fragment.
>>>>> I'm not convinced that the difference is all that important,
>>>>> but who knows.
>>>>>
>>>>> Here's my attempt at implementing the modified Nagle algorithm
>>>>> described
>>>>> in draft-minshall-nagle-01.txt.  It should be applied instead of your
>>>>> patch, not on top of it.  If (q->outstanding_bytes % asoc->frag_point)
>>>>> is zero, no delay is introduced.  The assumption is that this means
>>>>> that
>>>>> all outstanding packets (if any) are full-sized.
>>>>>
>>>>> Signed-off-by: Doug Graham <dgraham@nortel.com>
>>>>>
>>>>> ---
>>>>> --- linux-2.6.29/net/sctp/output.c    2009/08/02 00:47:44    1.3
>>>>> +++ linux-2.6.29/net/sctp/output.c    2009/08/02 00:51:18
>>>>> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
>>>>>       * unacknowledged.
>>>>>       */
>>>>>      if (!sp->nodelay && sctp_packet_empty(packet) &&
>>>>> -        q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
>>>>> +        (q->outstanding_bytes % asoc->frag_point) != 0 &&
>>>>> +        sctp_state(asoc, ESTABLISHED)) {
>>>>>          unsigned len = datasize + q->out_qlen;
>>>>>  
>>>>>          /* Check whether this chunk and all the rest of pending
>>>>>               
>>>> Seems good! But it may break small-packet transmission, where the
>>>> Nagle algorithm should still apply.
>>>> For example:
>>>>
>>>> Endpoint A                Endpoint B
>>>>           <-------------  DATA (size=1452/2) delay send
>>>>           <-------------  DATA (size=1452/2) send immediately
>>>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>>>           <-------------  DATA (size=1452/2) delay send
>>>>           <-------------  DATA (size=1452/2) send immediately
>>>>           <-------------  DATA (size=1452/2) send immediately ** broken
>>>>
>>>>
>>>> Can you try this one?
>>>>
>>>>
>>>>         
>>> I would, except I don't understand what you're getting at.  Does this
>>> mean to send a total of
>>> 6 1454 byte messages from B to A?  If so, why would the first one be
>>> delayed?
>>>     
>>
>> Oh, no: six 726-byte (1452/2) messages. Maybe the 1st and 2nd are
>> bundled in one packet, the 3rd is a single packet, the 4th and 5th are
>> bundled, and the 6th is single. I have not tested it.
>>
>>   
> Ah, so that / meant *division*!  I thought that was your notation
> meaning that the
> packets were fragmented into a 1452 byte chunk and a 2 byte chunk!
> Makes more sense now :-)
> 
> I admit that I didn't study too closely exactly what
> q->outstanding_bytes represents.  I assumed
> it meant the number of bytes that had been sent on the wire, but not yet
> acknowledged.
> Any bytes that were delayed because of Nagle would not be counted in
> outstanding_bytes
> (I assume).  So the first send of 726 would get sent immediately and
> counted in
> outstanding_bytes.  The second one would get delayed by Nagle and not
> counted
> in outstanding_bytes.  All the later ones would also get delayed by
> Nagle because
> outstanding_bytes is still 726.
> 
> I do think that using outstanding_bytes the way I did is probably an
> ugly kludge, and
> there's hopefully a better way.  But the right way will probably involve
> adding
> some more state to each association (the snd.sml variable mentioned in
> the minshall
> draft at the very least).  I'm not sure that using asoc->frag_point the
> way I did is correct
> either, because I think the frag_point can change during the lifetime of
> an association.

Using division in such a hot path is a non-starter to begin with, so we
definitely need to find a better way.

Using frag_point is not the right way to do it either, since it's affected by
the MTU and the user API.

I think we can add something to sctp_outq structure to properly track this.
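
One way such tracking could look is sketched below. It is loosely modelled on the snd.sml idea from the Minshall draft, using TSNs in place of TCP sequence numbers, and it needs no division. Everything here is hypothetical: `struct nagle_state` and its fields do not exist in `struct sctp_outq`, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state; not part of struct sctp_outq. */
struct nagle_state {
	bool     have_small;     /* a less-than-full-sized packet was sent */
	uint32_t last_small_tsn; /* highest TSN carried in that packet */
	uint32_t cum_ack_tsn;    /* highest TSN cumulatively acknowledged */
};

/* Serial-number comparison: true if a is later than b modulo 2^32. */
static bool tsn_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* True while the most recent small packet is still unacknowledged. */
static bool small_packet_outstanding(const struct nagle_state *s)
{
	return s->have_small && tsn_after(s->last_small_tsn, s->cum_ack_tsn);
}

/* Modified Nagle test: hold back a small packet only while an earlier
 * small packet is unacknowledged.  Full-sized packets are never held. */
static bool should_delay(const struct nagle_state *s, unsigned len,
			 unsigned full_size)
{
	assert(full_size > 0);
	return len < full_size && small_packet_outstanding(s);
}

/* Record a transmitted packet whose last chunk carries last_tsn. */
static void note_sent(struct nagle_state *s, unsigned len,
		      unsigned full_size, uint32_t last_tsn)
{
	if (len < full_size) {
		s->have_small = true;
		s->last_small_tsn = last_tsn;
	}
}

/* Record a cumulative TSN ack from an incoming SACK. */
static void note_sacked(struct nagle_state *s, uint32_t cum_tsn)
{
	s->cum_ack_tsn = cum_tsn;
}
```

In this sketch a 1452-byte fragment followed by a 2-byte last fragment both go out at once, a second small packet is held until the first is acknowledged, and what counts as "full size" is passed in per call, so it does not depend on frag_point staying fixed.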

-vlad


> 
> --Doug
> 


* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (25 preceding siblings ...)
  2009-08-04 14:50 ` Vlad Yasevich
@ 2009-08-04 17:05 ` Doug Graham
  2009-08-04 17:14 ` Vlad Yasevich
  27 siblings, 0 replies; 29+ messages in thread
From: Doug Graham @ 2009-08-04 17:05 UTC (permalink / raw)
  To: linux-sctp

On Tue, Aug 04, 2009 at 10:50:18AM -0400, Vlad Yasevich wrote:
> > I admit that I didn't study too closely exactly what
> > q->outstanding_bytes represents.  I assumed
> > it meant the number of bytes that had been sent on the wire, but not yet
> > acknowledged.
> > Any bytes that were delayed because of Nagle would not be counted in
> > outstanding_bytes
> > (I assume).  So the first send of 726 would get sent immediately and
> > counted in
> > outstanding_bytes.  The second one would get delayed by Nagle and not
> > counted
> > in outstanding_bytes.  All the later ones would also get delayed by
> > Nagle because
> > outstanding_bytes is still 726.
> > 
> > I do think that using outstanding_bytes the way I did is probably an
> > ugly kludge, and
> > there's hopefully a better way.  But the right way will probably involve
> > adding
> > some more state to each association (the snd.sml variable mentioned in
> > the minshall
> > draft at the very least).  I'm not sure that using asoc->frag_point the
> > way I did is correct
> > either, because I think the frag_point can change during the lifetime of
> > an association.
> 
> Using division in such a hot path is a non-starter to begin with, so we
> definitely need to find a better way.

That thought crossed my mind too, although I didn't consider it as much
of a show-stopper as you do.  32 bit integer division isn't really all
that expensive on modern processors, is it?  The C compiler is probably
doing it in places as a result of pointer arithmetic anyway.

> Using frag_point is not the right way to do it either, since it's affected by
> the MTU and the user API.
> 
> I think we can add something to sctp_outq structure to properly track this.

I've pretty much convinced myself that Wei's original Nagle patch is fine
anyway.  Just disable Nagle for large messages that need to be fragmented.

--Doug.

* Re: [PATCH] Fix piggybacked ACKs
  2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
                   ` (26 preceding siblings ...)
  2009-08-04 17:05 ` Doug Graham
@ 2009-08-04 17:14 ` Vlad Yasevich
  27 siblings, 0 replies; 29+ messages in thread
From: Vlad Yasevich @ 2009-08-04 17:14 UTC (permalink / raw)
  To: linux-sctp

Doug Graham wrote:
> On Tue, Aug 04, 2009 at 10:50:18AM -0400, Vlad Yasevich wrote:
>>> I admit that I didn't study too closely exactly what
>>> q->outstanding_bytes represents.  I assumed
>>> it meant the number of bytes that had been sent on the wire, but not yet
>>> acknowledged.
>>> Any bytes that were delayed because of Nagle would not be counted in
>>> outstanding_bytes
>>> (I assume).  So the first send of 726 would get sent immediately and
>>> counted in
>>> outstanding_bytes.  The second one would get delayed by Nagle and not
>>> counted
>>> in outstanding_bytes.  All the later ones would also get delayed by
>>> Nagle because
>>> outstanding_bytes is still 726.
>>>
>>> I do think that using outstanding_bytes the way I did is probably an
>>> ugly kludge, and
>>> there's hopefully a better way.  But the right way will probably involve
>>> adding
>>> some more state to each association (the snd.sml variable mentioned in
>>> the minshall
>>> draft at the very least).  I'm not sure that using asoc->frag_point the
>>> way I did is correct
>>> either, because I think the frag_point can change during the lifetime of
>>> an association.
>> Using division in such a hot path is a non-starter to begin with, so we
>> definitely need to find a better way.
> 
> That thought crossed my mind too, although I didn't consider it as much
> of a show-stopper as you do.  32 bit integer division isn't really all
> that expensive on modern processors, is it?  The C compiler is probably
> doing it in places as a result of pointer arithmetic anyway.
> 
>> Using frag_point is not the right way to do it either, since it's affected by
>> the MTU and the user API.
>>
>> I think we can add something to sctp_outq structure to properly track this.
> 
> I've pretty much convinced myself that Wei's original Nagle patch is fine
> anyway.  Just disable Nagle for large messages that need to be fragmented.
> 

We can't do it blindly since the user may set a fragmentation point such that
we'll send sub-MSS (for lack of a better term) segments.  I need to look at
this more closely.
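
A rough illustration of the concern, as a sketch only: if the fragmentation point is set below the MSS, every fragment of a large message is a small packet, so disabling Nagle for all fragmented messages would let them all out at once. `sub_mss_fragments` is a hypothetical helper, headers are ignored, and bundling of fragments into one packet is not modelled.

```c
#include <assert.h>

/* Hypothetical helper: number of fragments of a msg_len-byte message
 * that are smaller than mss when the message is cut every frag_point
 * bytes.  Headers and bundling are ignored. */
static unsigned sub_mss_fragments(unsigned msg_len, unsigned frag_point,
				  unsigned mss)
{
	unsigned full, tail, n = 0;

	assert(frag_point > 0);
	full = msg_len / frag_point; /* fragments of exactly frag_point */
	tail = msg_len % frag_point; /* bytes in the last, shorter one */

	if (frag_point < mss)
		n += full;
	if (tail != 0 && tail < mss)
		n++;
	return n;
}
```

For example, a 3000-byte message with a 500-byte fragmentation point and a 1452-byte MSS yields 6 sub-MSS fragments, whereas a 1454-byte message cut at 1452 bytes yields only the one 2-byte tail.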

-vlad

> --Doug.


Thread overview: 29+ messages
2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
2009-07-30  6:48 ` Wei Yongjun
2009-07-30  9:51 ` Wei Yongjun
2009-07-30 16:49 ` Doug Graham
2009-07-30 17:05 ` Vlad Yasevich
2009-07-30 21:24 ` Vlad Yasevich
2009-07-30 23:40 ` Doug Graham
2009-07-31  0:53 ` Wei Yongjun
2009-07-31  1:17 ` Doug Graham
2009-07-31  1:43 ` Doug Graham
2009-07-31  4:21 ` Wei Yongjun
2009-07-31  7:30 ` Michael Tüxen
2009-07-31  7:34 ` Michael Tüxen
2009-07-31 12:59 ` Doug Graham
2009-07-31 13:11 ` Doug Graham
2009-07-31 13:39 ` Doug Graham
2009-07-31 14:18 ` Vlad Yasevich
2009-08-02  2:03 ` Doug Graham
2009-08-03  2:00 ` Wei Yongjun
2009-08-03  2:15 ` Wei Yongjun
2009-08-03  3:32 ` Wei Yongjun
2009-08-04  3:00 ` Doug Graham
2009-08-04  3:03 ` Wei Yongjun
2009-08-04  3:28 ` Doug Graham
2009-08-04  3:44 ` Doug Graham
2009-08-04  3:57 ` Doug Graham
2009-08-04 14:50 ` Vlad Yasevich
2009-08-04 17:05 ` Doug Graham
2009-08-04 17:14 ` Vlad Yasevich