* [PATCH net] sctp: change to save MSG_MORE flag into assoc
@ 2017-03-03  6:23 ` Xin Long
  0 siblings, 0 replies; 14+ messages in thread
From: Xin Long @ 2017-03-03  6:23 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: davem, Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, David Laight

David Laight noticed that the MSG_MORE support based on datamsg->force_delay
didn't really work as expected: the first msg with MSG_MORE set would
always block the dequeuing of the following chunks.

This patch rewrites it by saving the MSG_MORE flag into the assoc, as
David Laight suggested.

asoc->force_delay is used to save the MSG_MORE flag before a msg is sent.
Once this msg is queued, asoc->force_delay is set back to 0, so that it
will not affect other code paths that flush out the queue.

asoc->force_delay works as a 'local param' here, as the msg sending is
done under the protection of the sock lock.  This makes sctp's MSG_MORE
work like tcp's.

Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h | 2 +-
 net/sctp/output.c          | 2 +-
 net/sctp/socket.c          | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a244db5..3378c02 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -497,7 +497,6 @@ struct sctp_datamsg {
 	/* Did the messenge fail to send? */
 	int send_error;
 	u8 send_failed:1,
-	   force_delay:1,
 	   can_delay;	    /* should this message be Nagle delayed */
 };
 
@@ -1876,6 +1875,7 @@ struct sctp_association {
 
 	__u8 need_ecne:1,	/* Need to send an ECNE Chunk? */
 	     temp:1,		/* Is it a temporary association? */
+	     force_delay:1,
 	     prsctp_enable:1,
 	     reconf_enable:1;
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 85406d5..5f6acac 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -705,7 +705,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
 	 */
 
 	if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
-	    !chunk->msg->force_delay)
+	    !asoc->force_delay)
 		/* Nothing unacked */
 		return SCTP_XMIT_OK;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 465a9c8..d62cf9e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1964,7 +1964,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 		err = PTR_ERR(datamsg);
 		goto out_free;
 	}
-	datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
+	asoc->force_delay = !!(msg->msg_flags & MSG_MORE);
 
 	/* Now send the (possibly) fragmented message. */
 	list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
@@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 	 * breaks.
 	 */
 	err = sctp_primitive_SEND(net, asoc, datamsg);
+	asoc->force_delay = 0;
 	/* Did the lower layer accept the chunk? */
 	if (err) {
 		sctp_datamsg_free(datamsg);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
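
For context, a minimal user-space sketch of the send pattern this patch is
meant to serve (a sketch only; it assumes 'fd' is an already-connected
one-to-one style SCTP socket and omits error handling): many small sends
with MSG_MORE set, then one send with MSG_MORE clear that lets the queued
chunks go out.

	#include <string.h>
	#include <sys/socket.h>

	/* 'fd': a connected SOCK_STREAM socket created with IPPROTO_SCTP. */
	static void send_batched(int fd)
	{
		char buf[100];
		int i;

		memset(buf, 'x', sizeof(buf));

		/* Each send() becomes one SCTP data chunk; MSG_MORE asks the
		 * stack to hold it so several chunks can share one frame. */
		for (i = 0; i < 50; i++)
			send(fd, buf, sizeof(buf), MSG_MORE);

		/* MSG_MORE clear: anything still queued should now go out. */
		send(fd, buf, sizeof(buf), 0);
	}

With the patch, the delay decision during the flush is keyed off
asoc->force_delay, which holds the MSG_MORE flag of the send in progress
and is cleared again once that msg has been queued.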

* RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-03  6:23 ` Xin Long
@ 2017-03-03 12:49   ` David Laight
  -1 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2017-03-03 12:49 UTC (permalink / raw)
  To: 'Xin Long', network dev, linux-sctp
  Cc: davem, Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich

From: Xin Long
> Sent: 03 March 2017 06:24
> David Laight noticed the support for MSG_MORE with datamsg->force_day
> didn't really work as we expected, as the first msg with MSG_MORE set
> would always block the following chunks' dequeuing.
> 
> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
> Divid Laight suggested.
   ^ typo

> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
> Once this msg is queued, asoc->force_delay is set back to 0, so that
> it will not affect other places flushing out queue.

That doesn't seem right, nor does it make sense.

> asoc->force_delay works as a 'local param' here as the msg sending is
> under protection of sock lock.  It would make sctp's MSG_MORE work as
> tcp's.

It is much more important to get MSG_MORE working 'properly' for SCTP
than for TCP. For TCP an application can always use a long send.

...
> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>  	 * breaks.
>  	 */
>  	err = sctp_primitive_SEND(net, asoc, datamsg);
> +	asoc->force_delay = 0;
>  	/* Did the lower layer accept the chunk? */
>  	if (err) {
>  		sctp_datamsg_free(datamsg);

I don't think this is right - or needed.
You only get to the above if some test has decided to send data chunks.
So it just means that the NEXT time someone tries to send data all the
queued data gets sent.
I'm guessing that the whole thing gets called in a loop (definitely needed
for very long data chunks, or after the window is opened).
Now if an application sends a lot of (say) 100-byte chunks with MSG_MORE
set, it would expect to see a lot of full ethernet frames being sent.
With the above, a frame will be sent (containing all but 1 chunk) when the
amount of queued data becomes too large for an ethernet frame, immediately
followed by a second ethernet frame with 1 chunk in it.
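
As a back-of-the-envelope check of the "full ethernet frames" point (a
sketch assuming IPv4 without options, a 1500-byte MTU and the standard
SCTP header sizes; the numbers are illustrative only):

	/* How many 100-byte user messages fit into one full frame?
	 * Headers: 20 (IPv4) + 12 (SCTP common) + 16 per DATA chunk. */
	#include <stdio.h>

	int main(void)
	{
		int mtu = 1500, ip_hdr = 20, sctp_hdr = 12, chunk_hdr = 16;
		int payload = 100;
		int per_chunk = chunk_hdr + payload;	/* 116, 4-byte aligned */
		int room = mtu - ip_hdr - sctp_hdr;	/* 1468 */

		printf("%d chunks per full frame\n", room / per_chunk); /* 12 */
		return 0;
	}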

Now it might be that the flag needs clearing when retransmissions are queued.
OTOH they might get sent for other reasons.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-03 12:49   ` David Laight
@ 2017-03-03 15:42     ` Xin Long
  -1 siblings, 0 replies; 14+ messages in thread
From: Xin Long @ 2017-03-03 15:42 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich

On Fri, Mar 3, 2017 at 8:49 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 03 March 2017 06:24
>> David Laight noticed the support for MSG_MORE with datamsg->force_day
>> didn't really work as we expected, as the first msg with MSG_MORE set
>> would always block the following chunks' dequeuing.
>>
>> This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
>> Divid Laight suggested.
>    ^ typo
ah, sorry. :P
>
>> asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
>> Once this msg is queued, asoc->force_delay is set back to 0, so that
>> it will not affect other places flushing out queue.
>
> That doesn't seem right nor make sense.
>
>> asoc->force_delay works as a 'local param' here as the msg sending is
>> under protection of sock lock.  It would make sctp's MSG_MORE work as
>> tcp's.
>
> It is much more important to get MSG_MORE working 'properly' for SCTP
> than for TCP. For TCP an application can always use a long send.
"long send" ?, you mean bigger data, or keeping sending?
I didn't get the difference between SCTP and TCP, they
are similar when sending data.

>
> ...
>> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>>        * breaks.
>>        */
>>       err = sctp_primitive_SEND(net, asoc, datamsg);
>> +     asoc->force_delay = 0;
>>       /* Did the lower layer accept the chunk? */
>>       if (err) {
>>               sctp_datamsg_free(datamsg);
>
> I don't think this is right - or needed.
> You only get to the above if some test has decided to send data chunks.
> So it just means that the NEXT time someone tries to send data all the
> queued data gets sent.
the NEXT time someone tries to send data with "MSG_MORE clear",
yes, but with "MSG_MORE set", it will still delay.

> I'm guessing that the whole thing gets called in a loop (definitely needed
> for very long data chunks, or after the window is opened).
Yes, if users keep sending data chunks with MSG_MORE set and there is
no send with "MSG_MORE clear" in between.

> Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
> set it would expect to see a lot of full ethernet frames be sent.
right.
> With the above a frame will be sent (containing all but 1 chunk) when the
> amount of queued data becomes too large for an ethernet frame, and immediately
> followed by a second ethernet frame with 1 chunk in it.
"followed by a second ethernet frame with 1 chunk in it.", I think this's
what you're really worried about, right ?
But sctp flush data queue NOT like what you think, it's not keep traversing
the queue untill the queue is empty.
once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
will return. it will pack chunks and send the next packet again untill some
other 'event' triggers it, like retransmission or data received from peer.
I don't think this is a problem.

>
> Now it might be that the flag needs clearing when retransmissions are queued.
> OTOH they might get sent for other reasons.
Before we overthink MSG_MORE, there is no need to care about
retransmissions. In my opinion MSG_MORE mostly matters when
*inflight is 0*; if it's not 0, we shouldn't stop other places from
flushing the queue.

We cannot let the asoc's MSG_MORE flag work as a global; it would block
data chunks being sent elsewhere, not only in sctp_sendmsg.

Thanks

>
>         David
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-03 15:42     ` Xin Long
@ 2017-03-03 16:31       ` David Laight
  -1 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2017-03-03 16:31 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich

From: Xin Long
> Sent: 03 March 2017 15:43
...
> > It is much more important to get MSG_MORE working 'properly' for SCTP
> > than for TCP. For TCP an application can always use a long send.

> "long send" ?, you mean bigger data, or keeping sending?
> I didn't get the difference between SCTP and TCP, they
> are similar when sending data.

With tcp an application can always replace two send()/write()
calls with a single call to writev().
For sctp two send() calls must be made in order to generate two
data chunks.
So it is much easier for a tcp application to generate 'full'
ethernet packets. 
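
A rough illustration of that difference (a sketch with hypothetical buffers
and no error handling): with TCP one writev() can put both pieces into a
single segment, while SCTP needs one send() per user message, so MSG_MORE
is what allows the two resulting data chunks to be bundled into one packet.

	#include <stddef.h>
	#include <sys/socket.h>
	#include <sys/uio.h>

	/* TCP: one call, both pieces can leave in a single segment. */
	static void tcp_two_pieces(int tcp_fd, const void *a, size_t alen,
				   const void *b, size_t blen)
	{
		struct iovec iov[2] = {
			{ .iov_base = (void *)a, .iov_len = alen },
			{ .iov_base = (void *)b, .iov_len = blen },
		};

		writev(tcp_fd, iov, 2);
	}

	/* SCTP: two user messages need two send() calls (two data chunks);
	 * MSG_MORE on the first asks the stack to bundle them. */
	static void sctp_two_chunks(int sctp_fd, const void *a, size_t alen,
				    const void *b, size_t blen)
	{
		send(sctp_fd, a, alen, MSG_MORE);
		send(sctp_fd, b, blen, 0);
	}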

> 
> >
> > ...
> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
> >>        * breaks.
> >>        */
> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
> >> +     asoc->force_delay = 0;
> >>       /* Did the lower layer accept the chunk? */
> >>       if (err) {
> >>               sctp_datamsg_free(datamsg);
> >
> > I don't think this is right - or needed.
> > You only get to the above if some test has decided to send data chunks.
> > So it just means that the NEXT time someone tries to send data all the
> > queued data gets sent.

> the NEXT time someone tries to send data with "MSG_MORE clear",
> yes, but with "MSG_MORE set", it will still delay.
> 
> > I'm guessing that the whole thing gets called in a loop (definitely needed
> > for very long data chunks, or after the window is opened).

> yes, if users keep sending data chunks with MSG_MORE set, no
> data with "MSG_MORE clear" gap.
> 
> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
> > set it would expect to see a lot of full ethernet frames be sent.

> right.

> > With the above a frame will be sent (containing all but 1 chunk) when the
> > amount of queued data becomes too large for an ethernet frame, and immediately
> > followed by a second ethernet frame with 1 chunk in it.

> "followed by a second ethernet frame with 1 chunk in it.", I think this's
> what you're really worried about, right ?
> But sctp flush data queue NOT like what you think, it's not keep traversing
> the queue untill the queue is empty.
> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
> will return. it will pack chunks and send the next packet again untill some
> other 'event' triggers it, like retransmission or data received from peer.
> I don't think this is a problem.

Erm.... that can't work.
I think there is code to convert a large user send into multiple data chunks.
So if the user does a 4k (say) send, several large chunks get queued.
These would all need to be sent at once.

Similarly when more transmit window is received.
So somewhere there ought to be a loop that will send more than one packet.

> > Now it might be that the flag needs clearing when retransmissions are queued.
> > OTOH they might get sent for other reasons.

> Before we really overthought about MSG_MORE, no need to care about
> retransmissions, define MSG_MORE, in my opinion, it works more for
> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.

Eh? And what about when Nagle is disabled?
If 'inflight' isn't 0 then most paths don't flush data.

> We cannot let asoc's more_more flag work as global, it will block elsewhere
> sending data chunks, not only sctp_sendmsg.

If the connection was flow controlled off, more 'credit' arrives, there
is less than an ethernet frame's worth of data pending, and the last send
said 'MSG_MORE', there is no point sending anything until the application
does a send with MSG_MORE clear.
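
Put as a sketch, the rule being described is roughly the following
(illustrative only, not kernel code; the names are made up):

	#include <stddef.h>

	/* Hold back transmission if the last send said MSG_MORE and less
	 * than a full frame's worth of payload is pending. */
	static int should_hold_back(int last_send_had_msg_more,
				    size_t pending_bytes, size_t frame_payload)
	{
		return last_send_had_msg_more && pending_bytes < frame_payload;
	}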

I'm not sure what causes a retransmission to send data; I suspect that 'inflight'
can easily be non-zero at that time.
Likely something causes a packet to be generated - which then collects the data chunks.

	David



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-03 16:31       ` David Laight
@ 2017-03-03 17:57         ` Xin Long
  -1 siblings, 0 replies; 14+ messages in thread
From: Xin Long @ 2017-03-03 17:57 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich

On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 03 March 2017 15:43
> ...
>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>> > than for TCP. For TCP an application can always use a long send.
>
>> "long send" ?, you mean bigger data, or keeping sending?
>> I didn't get the difference between SCTP and TCP, they
>> are similar when sending data.
>
> With tcp an application can always replace two send()/write()
> calls with a single call to writev().
> For sctp two send() calls must be made in order to generate two
> data chunks.
> So it is much easier for a tcp application to generate 'full'
> ethernet packets.
Okay, it should not be an important reason, and sctp might also support
it one day. :-)

>
>>
>> >
>> > ...
>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>> >>        * breaks.
>> >>        */
>> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
>> >> +     asoc->force_delay = 0;
>> >>       /* Did the lower layer accept the chunk? */
>> >>       if (err) {
>> >>               sctp_datamsg_free(datamsg);
>> >
>> > I don't think this is right - or needed.
>> > You only get to the above if some test has decided to send data chunks.
>> > So it just means that the NEXT time someone tries to send data all the
>> > queued data gets sent.
>
>> the NEXT time someone tries to send data with "MSG_MORE clear",
>> yes, but with "MSG_MORE set", it will still delay.
>>
>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>> > for very long data chunks, or after the window is opened).
>
>> yes, if users keep sending data chunks with MSG_MORE set, no
>> data with "MSG_MORE clear" gap.
>>
>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>> > set it would expect to see a lot of full ethernet frames be sent.
>
>> right.
>
>> > With the above a frame will be sent (containing all but 1 chunk) when the
>> > amount of queued data becomes too large for an ethernet frame, and immediately
>> > followed by a second ethernet frame with 1 chunk in it.
>
>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>> what you're really worried about, right ?
>> But sctp flush data queue NOT like what you think, it's not keep traversing
>> the queue untill the queue is empty.
>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>> will return. it will pack chunks and send the next packet again untill some
>> other 'event' triggers it, like retransmission or data received from peer.
>> I don't think this is a problem.
>
> Erm.... that can't work.
> I think there is code to convert a large user send into multiple data chunks.
> So if the user does a 4k (say) send several large chunks get queued.
> These would need to all be sent at once.
>
> Similarly when the transmit window is received.
> So somewhere there ought to be a loop that will send more than one packet.
As far as I can see, there is no loop like you said; mostly, an incoming
chunk (like a SACK) from the peer will trigger the next flush out.
I can trace the path in the kernel tomorrow to make sure.

>
>> > Now it might be that the flag needs clearing when retransmissions are queued.
>> > OTOH they might get sent for other reasons.
>
>> Before we really overthought about MSG_MORE, no need to care about
>> retransmissions, define MSG_MORE, in my opinion, it works more for
>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>
> Eh? and when nagle disabled.
> If 'inflight' isn't 0 then most paths don't flush data.
I know, but MSG_MORE is a different thing; it should only affect the
current and following data.

>
>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>> sending data chunks, not only sctp_sendmsg.
>
> If the connection was flow controlled off, and more 'credit' arrives and there
> is less that an ethernet frame's worth of data pending, and the last send
> said 'MSG_MORE' there is no point sending anything until the application
> does a send with MSG_MORE clear.
Got you. I think you have a different understanding of MSG_MORE,
while this patch just tries to make it work like TCP's MSG_MORE; what
you mentioned here applies to TCP as well, so it seems you also want
to improve TCP's MSG_MORE :-)

>
> I'm not sure what causes a retransmission to send data, I suspect that 'inflight'
> can easily be non-zero at that time.
What causes a retransmission to send data is that both tx and rtx send
data through sctp_outq_flush, which tries to send the rtx queue first,
then the tx queue.

Yes, once a packet has been sent out and not yet SACKed, "inflight" will not
be zero, so when retransmitting, "inflight" must be non-zero.
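
In other words, a much simplified sketch of the ordering being described
(the type and helper names here are invented placeholders, not the real
kernel code):

	/* Both first transmission and retransmission go through the same
	 * flush routine, which services the rtx queue before new data. */
	struct outq_sketch {
		int unused;	/* stand-in for the real rtx and data lists */
	};

	static void flush_rtx_queue(struct outq_sketch *q)  { (void)q; }
	static void flush_data_queue(struct outq_sketch *q) { (void)q; }

	static void outq_flush_sketch(struct outq_sketch *q)
	{
		flush_rtx_queue(q);	/* previously sent, unacked chunks first */
		flush_data_queue(q);	/* then chunks not yet transmitted */
	}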

> Likely something causes a packet be generated - which then collects the data chunks.
>
>         David
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-03 17:57         ` Xin Long
@ 2017-03-04  4:51           ` Xin Long
  -1 siblings, 0 replies; 14+ messages in thread
From: Xin Long @ 2017-03-04  4:51 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich

On Sat, Mar 4, 2017 at 1:57 AM, Xin Long <lucien.xin@gmail.com> wrote:
> On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@aculab.com> wrote:
>> From: Xin Long
>>> Sent: 03 March 2017 15:43
>> ...
>>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>>> > than for TCP. For TCP an application can always use a long send.
>>
>>> "long send" ?, you mean bigger data, or keeping sending?
>>> I didn't get the difference between SCTP and TCP, they
>>> are similar when sending data.
>>
>> With tcp an application can always replace two send()/write()
>> calls with a single call to writev().
>> For sctp two send() calls must be made in order to generate two
>> data chunks.
>> So it is much easier for a tcp application to generate 'full'
>> ethernet packets.
> okay, it should not be a important reason, and sctp might also support
> it one day. :-)
>
>>
>>>
>>> >
>>> > ...
>>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>>> >>        * breaks.
>>> >>        */
>>> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
>>> >> +     asoc->force_delay = 0;
>>> >>       /* Did the lower layer accept the chunk? */
>>> >>       if (err) {
>>> >>               sctp_datamsg_free(datamsg);
>>> >
>>> > I don't think this is right - or needed.
>>> > You only get to the above if some test has decided to send data chunks.
>>> > So it just means that the NEXT time someone tries to send data all the
>>> > queued data gets sent.
>>
>>> the NEXT time someone tries to send data with "MSG_MORE clear",
>>> yes, but with "MSG_MORE set", it will still delay.
>>>
>>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>>> > for very long data chunks, or after the window is opened).
>>
>>> yes, if users keep sending data chunks with MSG_MORE set, no
>>> data with "MSG_MORE clear" gap.
>>>
>>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>>> > set it would expect to see a lot of full ethernet frames be sent.
>>
>>> right.
>>
>>> > With the above a frame will be sent (containing all but 1 chunk) when the
>>> > amount of queued data becomes too large for an ethernet frame, and immediately
>>> > followed by a second ethernet frame with 1 chunk in it.
>>
>>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>>> what you're really worried about, right ?
>>> But sctp flush data queue NOT like what you think, it's not keep traversing
>>> the queue untill the queue is empty.
>>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>>> will return. it will pack chunks and send the next packet again untill some
>>> other 'event' triggers it, like retransmission or data received from peer.
>>> I don't think this is a problem.
>>
>> Erm.... that can't work.
>> I think there is code to convert a large user send into multiple data chunks.
>> So if the user does a 4k (say) send several large chunks get queued.
>> These would need to all be sent at once.
>>
>> Similarly when the transmit window is received.
>> So somewhere there ought to be a loop that will send more than one packet.
> As far as I can see, no loop like you said, mostly, the incoming
> chunk (like SACK) from peer will trigger the next flush out.
> I can try to trace the path in kernel for sure tomorrow.
Okay, you are right, I missed that sctp_packet_transmit_chunk also calls
sctp_packet_transmit to send the current packet. :)

But if we keep sending data with MSG_MORE, after one ethernet frame
is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT
happen, as in this loop the asoc's MSG_MORE flag is still set, and this
flush is called by sctp_sendmsg (the function MSG_MORE should care most
about).

Did I miss something?

>
>>
>>> > Now it might be that the flag needs clearing when retransmissions are queued.
>>> > OTOH they might get sent for other reasons.
>>
>>> Before we really overthought about MSG_MORE, no need to care about
>>> retransmissions, define MSG_MORE, in my opinion, it works more for
>>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>>
>> Eh? and when nagle disabled.
>> If 'inflight' isn't 0 then most paths don't flush data.
> I knew, but MSG_MORE is different thing, it should only try to work for the
> current and following data.
>
>>
>>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>>> sending data chunks, not only sctp_sendmsg.
>>
>> If the connection was flow controlled off, and more 'credit' arrives and there
>> is less that an ethernet frame's worth of data pending, and the last send
>> said 'MSG_MORE' there is no point sending anything until the application
>> does a send with MSG_MORE clear.
> got you, I think you have different understanding about MSG_MORE
> while this patch just try to make it work like TCP's msg_more, but what
> you mentioned here is the same as TCP thing, seems you also want
> to improve TCP's MSG_MORE :-)
>
>>
>> I'm not sure what causes a retransmission to send data, I suspect that 'inflight'
>> can easily be non-zero at that time.
> The thing that causes a retransmission to send data is that both tx and
> rtx send data through sctp_outq_flush, in which it will try to send rtx queue,
> then rx queue.
>
> yes, once a packet is sent out and not yet be SACKed, "inflight" will not be
> zero, so when retransmiting, "inflight" must be non-zero.
>
>> Likely something causes a packet be generated - which then collects the data chunks.
>>
>>         David
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net] sctp: change to save MSG_MORE flag into assoc
  2017-03-04  4:51           ` Xin Long
@ 2017-03-07  3:56             ` Xin Long
  -1 siblings, 0 replies; 14+ messages in thread
From: Xin Long @ 2017-03-07  3:56 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner,
	Neil Horman, Vlad Yasevich

On Sat, Mar 4, 2017 at 12:51 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Sat, Mar 4, 2017 at 1:57 AM, Xin Long <lucien.xin@gmail.com> wrote:
>> On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@aculab.com> wrote:
>>> From: Xin Long
>>>> Sent: 03 March 2017 15:43
>>> ...
>>>> > It is much more important to get MSG_MORE working 'properly' for SCTP
>>>> > than for TCP. For TCP an application can always use a long send.
>>>
>>>> "long send" ?, you mean bigger data, or keeping sending?
>>>> I didn't get the difference between SCTP and TCP, they
>>>> are similar when sending data.
>>>
>>> With tcp an application can always replace two send()/write()
>>> calls with a single call to writev().
>>> For sctp two send() calls must be made in order to generate two
>>> data chunks.
>>> So it is much easier for a tcp application to generate 'full'
>>> ethernet packets.
>> okay, it should not be a important reason, and sctp might also support
>> it one day. :-)
>>
>>>
>>>>
>>>> >
>>>> > ...
>>>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
>>>> >>        * breaks.
>>>> >>        */
>>>> >>       err = sctp_primitive_SEND(net, asoc, datamsg);
>>>> >> +     asoc->force_delay = 0;
>>>> >>       /* Did the lower layer accept the chunk? */
>>>> >>       if (err) {
>>>> >>               sctp_datamsg_free(datamsg);
>>>> >
>>>> > I don't think this is right - or needed.
>>>> > You only get to the above if some test has decided to send data chunks.
>>>> > So it just means that the NEXT time someone tries to send data all the
>>>> > queued data gets sent.
>>>
>>>> the NEXT time someone tries to send data with "MSG_MORE clear",
>>>> yes, but with "MSG_MORE set", it will still delay.
>>>>
>>>> > I'm guessing that the whole thing gets called in a loop (definitely needed
>>>> > for very long data chunks, or after the window is opened).
>>>
>>>> yes, if users keep sending data chunks with MSG_MORE set, no
>>>> data with "MSG_MORE clear" gap.
>>>>
>>>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
>>>> > set it would expect to see a lot of full ethernet frames be sent.
>>>
>>>> right.
>>>
>>>> > With the above a frame will be sent (containing all but 1 chunk) when the
>>>> > amount of queued data becomes too large for an ethernet frame, and immediately
>>>> > followed by a second ethernet frame with 1 chunk in it.
>>>
>>>> "followed by a second ethernet frame with 1 chunk in it.", I think this's
>>>> what you're really worried about, right ?
>>>> But sctp flush data queue NOT like what you think, it's not keep traversing
>>>> the queue untill the queue is empty.
>>>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
>>>> will return. it will pack chunks and send the next packet again untill some
>>>> other 'event' triggers it, like retransmission or data received from peer.
>>>> I don't think this is a problem.
>>>
>>> Erm.... that can't work.
>>> I think there is code to convert a large user send into multiple data chunks.
>>> So if the user does a 4k (say) send several large chunks get queued.
>>> These would need to all be sent at once.
>>>
>>> Similarly when the transmit window is received.
>>> So somewhere there ought to be a loop that will send more than one packet.
>> As far as I can see, no loop like you said, mostly, the incoming
>> chunk (like SACK) from peer will trigger the next flush out.
>> I can try to trace the path in kernel for sure tomorrow.
> okay, you are right, I missed sctp_packet_transmit_chunk also call
> sctp_packet_transmit to send the current packet. :)
>
> But if we keep sending data with "MSG_MORE", after one ethernet frame
> is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT
> happen, as in this loop the asoc's msg_more flag is still set, and this flush
> is called by sctp_sendmsg(the function msg_more should care more).
>
> did I miss something ?
Hi, David Laight,

As sctp's MSG_MORE is currently broken with datamsg->force_delay, I hope
we can use this patch until you come up with a better fix, so that users
are not confused by the strange behaviour.

What do you say?

>
>>
>>>
>>>> > Now it might be that the flag needs clearing when retransmissions are queued.
>>>> > OTOH they might get sent for other reasons.
>>>
>>>> Before we really overthought about MSG_MORE, no need to care about
>>>> retransmissions, define MSG_MORE, in my opinion, it works more for
>>>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.
>>>
>>> Eh? and when nagle disabled.
>>> If 'inflight' isn't 0 then most paths don't flush data.
>> I knew, but MSG_MORE is different thing, it should only try to work for the
>> current and following data.
>>
>>>
>>>> We cannot let asoc's more_more flag work as global, it will block elsewhere
>>>> sending data chunks, not only sctp_sendmsg.
>>>
>>> If the connection was flow controlled off, and more 'credit' arrives and there
>>> is less that an ethernet frame's worth of data pending, and the last send
>>> said 'MSG_MORE' there is no point sending anything until the application
>>> does a send with MSG_MORE clear.
>> got you, I think you have different understanding about MSG_MORE
>> while this patch just try to make it work like TCP's msg_more, but what
>> you mentioned here is the same as TCP thing, seems you also want
>> to improve TCP's MSG_MORE :-)
>>
>>>
>>> I'm not sure what causes a retransmission to send data; I suspect that
>>> 'inflight' can easily be non-zero at that time.
>> What causes a retransmission to send data is that both tx and rtx send
>> data through sctp_outq_flush, which first tries to send the rtx queue and
>> then the tx queue.
>>
>> yes, once a packet is sent out and not yet SACKed, "inflight" will not be
>> zero, so when retransmitting, "inflight" must be non-zero (see the sketch
>> at the end of this mail).
>>
>>> Likely something causes a packet to be generated, which then collects the
>>> data chunks.
>>>
>>>         David
>>>
>>>
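
Regarding the 'inflight' point above, here is a small userspace sketch of
my own, not from the patch: the amount of unacked data on an association
can be read with the SCTP_STATUS socket option, where sstat_unackdata
counts DATA chunks sent but not yet SACKed.  The helper name is made up
and error handling is omitted.

/* sketch: read the number of unacked DATA chunks on a one-to-one socket */
#include <string.h>
#include <netinet/in.h>
#include <netinet/sctp.h>	/* struct sctp_status, SCTP_STATUS */
#include <sys/socket.h>

static unsigned int unacked_chunks(int fd)
{
	struct sctp_status st;
	socklen_t len = sizeof(st);

	memset(&st, 0, sizeof(st));
	getsockopt(fd, IPPROTO_SCTP, SCTP_STATUS, &st, &len);

	return st.sstat_unackdata;	/* non-zero while data is in flight */
}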

^ permalink raw reply	[flat|nested] 14+ messages in thread


end of thread

Thread overview: 14+ messages
2017-03-03  6:23 [PATCH net] sctp: change to save MSG_MORE flag into assoc Xin Long
2017-03-03  6:23 ` Xin Long
2017-03-03 12:49 ` David Laight
2017-03-03 12:49   ` David Laight
2017-03-03 15:42   ` Xin Long
2017-03-03 15:42     ` Xin Long
2017-03-03 16:31     ` David Laight
2017-03-03 16:31       ` David Laight
2017-03-03 17:57       ` Xin Long
2017-03-03 17:57         ` Xin Long
2017-03-04  4:51         ` Xin Long
2017-03-04  4:51           ` Xin Long
2017-03-07  3:56           ` Xin Long
2017-03-07  3:56             ` Xin Long
