All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg
@ 2017-02-18 17:52 ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to add support for MSG_MORE on sctp. Patch 1/2 is an
improvement ahead of patch 2/2 to solve the close block problem
mentioned in https://patchwork.ozlabs.org/patch/372404/.

Xin Long (2):
  sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING
  sctp: add support for MSG_MORE

 include/net/sctp/structs.h | 1 +
 net/sctp/output.c          | 9 +++------
 net/sctp/sm_sideeffect.c   | 4 ++++
 net/sctp/socket.c          | 1 +
 4 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg
@ 2017-02-18 17:52 ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to add support for MSG_MORE on sctp. Patch 1/2 is an
improvement ahead of patch 2/2 to solve the close block problem
mentioned in https://patchwork.ozlabs.org/patch/372404/.

Xin Long (2):
  sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING
  sctp: add support for MSG_MORE

 include/net/sctp/structs.h | 1 +
 net/sctp/output.c          | 9 +++------
 net/sctp/sm_sideeffect.c   | 4 ++++
 net/sctp/socket.c          | 1 +
 4 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH net-next 1/2] sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING
  2017-02-18 17:52 ` Xin Long
@ 2017-02-18 17:52   ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to flush out queue when assoc state falls into
SHUTDOWN_PENDING if there are still chunks in it, so that the
data can be sent out as soon as possible before sending SHUTDOWN
chunk.

When sctp supports MSG_MORE flag in next patch, this improvement
can also solve the problem that the chunks with MSG_MORE flag
may be stuck in queue when closing an assoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/sm_sideeffect.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 51abcc9..25384fa 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -872,6 +872,10 @@ static void sctp_cmd_new_state(sctp_cmd_seq_t *cmds,
 		if (!sctp_style(sk, UDP))
 			sk->sk_state_change(sk);
 	}
+
+	if (sctp_state(asoc, SHUTDOWN_PENDING) &&
+	    !sctp_outq_is_empty(&asoc->outqueue))
+		sctp_outq_uncork(&asoc->outqueue, GFP_ATOMIC);
 }
 
 /* Helper function to delete an association. */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 1/2] sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING
@ 2017-02-18 17:52   ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to flush out queue when assoc state falls into
SHUTDOWN_PENDING if there are still chunks in it, so that the
data can be sent out as soon as possible before sending SHUTDOWN
chunk.

When sctp supports MSG_MORE flag in next patch, this improvement
can also solve the problem that the chunks with MSG_MORE flag
may be stuck in queue when closing an assoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/sm_sideeffect.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 51abcc9..25384fa 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -872,6 +872,10 @@ static void sctp_cmd_new_state(sctp_cmd_seq_t *cmds,
 		if (!sctp_style(sk, UDP))
 			sk->sk_state_change(sk);
 	}
+
+	if (sctp_state(asoc, SHUTDOWN_PENDING) &&
+	    !sctp_outq_is_empty(&asoc->outqueue))
+		sctp_outq_uncork(&asoc->outqueue, GFP_ATOMIC);
 }
 
 /* Helper function to delete an association. */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-18 17:52   ` Xin Long
@ 2017-02-18 17:52     ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to add support for MSG_MORE on sctp.

It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
creating datamsg according to the send flag. sctp_packet_can_append_data
then uses it to decide if the chunks of this msg will be sent at once or
delay it.

Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
in assoc. As sctp enqueues the chunks first, then dequeue them one by
one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
affect other chunks' bundling.

Since last patch, sctp flush out queue once assoc state falls into
SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
solved as well.

[1] https://patchwork.ozlabs.org/patch/372404/

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h | 1 +
 net/sctp/output.c          | 9 +++------
 net/sctp/socket.c          | 1 +
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 387c802..a244db5 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -497,6 +497,7 @@ struct sctp_datamsg {
 	/* Did the messenge fail to send? */
 	int send_error;
 	u8 send_failed:1,
+	   force_delay:1,
 	   can_delay;	    /* should this message be Nagle delayed */
 };
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 814eac0..85406d5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -704,18 +704,15 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
 	 * unacknowledged.
 	 */
 
-	if (sctp_sk(asoc->base.sk)->nodelay)
-		/* Nagle disabled */
+	if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
+	    !chunk->msg->force_delay)
+		/* Nothing unacked */
 		return SCTP_XMIT_OK;
 
 	if (!sctp_packet_empty(packet))
 		/* Append to packet */
 		return SCTP_XMIT_OK;
 
-	if (inflight == 0)
-		/* Nothing unacked */
-		return SCTP_XMIT_OK;
-
 	if (!sctp_state(asoc, ESTABLISHED))
 		return SCTP_XMIT_OK;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 75f35ce..b532148 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1964,6 +1964,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 		err = PTR_ERR(datamsg);
 		goto out_free;
 	}
+	datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
 
 	/* Now send the (possibly) fragmented message. */
 	list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-18 17:52     ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-18 17:52 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem, David Laight

This patch is to add support for MSG_MORE on sctp.

It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
creating datamsg according to the send flag. sctp_packet_can_append_data
then uses it to decide if the chunks of this msg will be sent at once or
delay it.

Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
in assoc. As sctp enqueues the chunks first, then dequeue them one by
one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
affect other chunks' bundling.

Since last patch, sctp flush out queue once assoc state falls into
SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
solved as well.

[1] https://patchwork.ozlabs.org/patch/372404/

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h | 1 +
 net/sctp/output.c          | 9 +++------
 net/sctp/socket.c          | 1 +
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 387c802..a244db5 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -497,6 +497,7 @@ struct sctp_datamsg {
 	/* Did the messenge fail to send? */
 	int send_error;
 	u8 send_failed:1,
+	   force_delay:1,
 	   can_delay;	    /* should this message be Nagle delayed */
 };
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 814eac0..85406d5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -704,18 +704,15 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
 	 * unacknowledged.
 	 */
 
-	if (sctp_sk(asoc->base.sk)->nodelay)
-		/* Nagle disabled */
+	if ((sctp_sk(asoc->base.sk)->nodelay || inflight = 0) &&
+	    !chunk->msg->force_delay)
+		/* Nothing unacked */
 		return SCTP_XMIT_OK;
 
 	if (!sctp_packet_empty(packet))
 		/* Append to packet */
 		return SCTP_XMIT_OK;
 
-	if (inflight = 0)
-		/* Nothing unacked */
-		return SCTP_XMIT_OK;
-
 	if (!sctp_state(asoc, ESTABLISHED))
 		return SCTP_XMIT_OK;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 75f35ce..b532148 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1964,6 +1964,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 		err = PTR_ERR(datamsg);
 		goto out_free;
 	}
+	datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
 
 	/* Now send the (possibly) fragmented message. */
 	list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg
  2017-02-18 17:52 ` Xin Long
@ 2017-02-20 15:26   ` David Miller
  -1 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2017-02-20 15:26 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, nhorman, vyasevich, david.laight

From: Xin Long <lucien.xin@gmail.com>
Date: Sun, 19 Feb 2017 01:52:44 +0800

> This patch is to add support for MSG_MORE on sctp. Patch 1/2 is an
> improvement ahead of patch 2/2 to solve the close block problem
> mentioned in https://patchwork.ozlabs.org/patch/372404/.

Series applied, thanks Xin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg
@ 2017-02-20 15:26   ` David Miller
  0 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2017-02-20 15:26 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, nhorman, vyasevich, david.laight

From: Xin Long <lucien.xin@gmail.com>
Date: Sun, 19 Feb 2017 01:52:44 +0800

> This patch is to add support for MSG_MORE on sctp. Patch 1/2 is an
> improvement ahead of patch 2/2 to solve the close block problem
> mentioned in https://patchwork.ozlabs.org/patch/372404/.

Series applied, thanks Xin.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-18 17:52     ` Xin Long
  (?)
@ 2017-02-21 14:27     ` David Laight
  2017-02-23  3:45         ` Xin Long
  -1 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2017-02-21 14:27 UTC (permalink / raw)
  To: 'Xin Long', network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Neil Horman, Vlad Yasevich, davem

From: Xin Long 
> Sent: 18 February 2017 17:53
> This patch is to add support for MSG_MORE on sctp.
> 
> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> creating datamsg according to the send flag. sctp_packet_can_append_data
> then uses it to decide if the chunks of this msg will be sent at once or
> delay it.
> 
> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> affect other chunks' bundling.

I thought about that and decided that the MSG_MORE flag on the last data
chunk was the only one that mattered.
Indeed looking at any others is broken.

Consider what happens if you have two small chunks queued, the first
with MSG_MORE set, the second with it clear.

I think that sctp_outq_flush() will look at the first chunk and decide it
doesn't need to do anything because sctp_packet_transmit_chunk()
returns SCTP_XMIT_DELAY.
The data chunk with MSG_MORE clear won't even be looked at.
So the data will never be sent.

I wouldn't worry about having messages queued that have MSG_MORE clean
when the final message has it set.
While it might be 'nice' to send the data (would have to be tx credit)
waiting for the next data chunk shouldn't be a problem.

I'm not sure I even want to test the current patch!

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-21 14:27     ` David Laight
@ 2017-02-23  3:45         ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-23  3:45 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 18 February 2017 17:53
>> This patch is to add support for MSG_MORE on sctp.
>>
>> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> creating datamsg according to the send flag. sctp_packet_can_append_data
>> then uses it to decide if the chunks of this msg will be sent at once or
>> delay it.
>>
>> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> affect other chunks' bundling.
>
> I thought about that and decided that the MSG_MORE flag on the last data
> chunk was the only one that mattered.
> Indeed looking at any others is broken.
>
> Consider what happens if you have two small chunks queued, the first
> with MSG_MORE set, the second with it clear.
>
> I think that sctp_outq_flush() will look at the first chunk and decide it
> doesn't need to do anything because sctp_packet_transmit_chunk()
> returns SCTP_XMIT_DELAY.
> The data chunk with MSG_MORE clear won't even be looked at.
> So the data will never be sent.
It's not that bad as you thought, in sctp_packet_can_append_data():
when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
would be still sent out.

What MSG_MORE flag actually does is ignore inflight == 0 and
sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
it has to respect the original logic (like !chunk->msg->can_delay
|| !sctp_packet_empty(packet) || ...)

To delay the chunks with MSG_MORE set even when inflight is 0
it especially important here for users.

>
> I wouldn't worry about having messages queued that have MSG_MORE clean
> when the final message has it set.
Yeah, It's an old optimization for bundling. MSG_MORE should NOT
break that.

> While it might be 'nice' to send the data (would have to be tx credit)
> waiting for the next data chunk shouldn't be a problem.
sorry, you mean it shouldn't send the data if it's waiting for the
next data whenever ?

>
> I'm not sure I even want to test the current patch!
>
>         David
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-23  3:45         ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-23  3:45 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 18 February 2017 17:53
>> This patch is to add support for MSG_MORE on sctp.
>>
>> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> creating datamsg according to the send flag. sctp_packet_can_append_data
>> then uses it to decide if the chunks of this msg will be sent at once or
>> delay it.
>>
>> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> affect other chunks' bundling.
>
> I thought about that and decided that the MSG_MORE flag on the last data
> chunk was the only one that mattered.
> Indeed looking at any others is broken.
>
> Consider what happens if you have two small chunks queued, the first
> with MSG_MORE set, the second with it clear.
>
> I think that sctp_outq_flush() will look at the first chunk and decide it
> doesn't need to do anything because sctp_packet_transmit_chunk()
> returns SCTP_XMIT_DELAY.
> The data chunk with MSG_MORE clear won't even be looked at.
> So the data will never be sent.
It's not that bad as you thought, in sctp_packet_can_append_data():
when inflight = 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
would be still sent out.

What MSG_MORE flag actually does is ignore inflight = 0 and
sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
it has to respect the original logic (like !chunk->msg->can_delay
|| !sctp_packet_empty(packet) || ...)

To delay the chunks with MSG_MORE set even when inflight is 0
it especially important here for users.

>
> I wouldn't worry about having messages queued that have MSG_MORE clean
> when the final message has it set.
Yeah, It's an old optimization for bundling. MSG_MORE should NOT
break that.

> While it might be 'nice' to send the data (would have to be tx credit)
> waiting for the next data chunk shouldn't be a problem.
sorry, you mean it shouldn't send the data if it's waiting for the
next data whenever ?

>
> I'm not sure I even want to test the current patch!
>
>         David
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-23  3:45         ` Xin Long
@ 2017-02-23 16:04           ` David Laight
  -1 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-23 16:04 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

From: Xin Long
> Sent: 23 February 2017 03:46
> On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> > From: Xin Long
> >> Sent: 18 February 2017 17:53
> >> This patch is to add support for MSG_MORE on sctp.
> >>
> >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> >> creating datamsg according to the send flag. sctp_packet_can_append_data
> >> then uses it to decide if the chunks of this msg will be sent at once or
> >> delay it.
> >>
> >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> >> affect other chunks' bundling.
> >
> > I thought about that and decided that the MSG_MORE flag on the last data
> > chunk was the only one that mattered.
> > Indeed looking at any others is broken.
> >
> > Consider what happens if you have two small chunks queued, the first
> > with MSG_MORE set, the second with it clear.
> >
> > I think that sctp_outq_flush() will look at the first chunk and decide it
> > doesn't need to do anything because sctp_packet_transmit_chunk()
> > returns SCTP_XMIT_DELAY.
> > The data chunk with MSG_MORE clear won't even be looked at.
> > So the data will never be sent.

> It's not that bad as you thought, in sctp_packet_can_append_data():
> when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> would be still sent out.

One of us isn't understanding the other :-)

IIRC sctp_packet_can_append_data() is called for the first queued
data chunk in order to decide whether to generate a message that
consists only of data chunks.
If it returns SCTP_XMIT_OK then a message is built collecting the
rest of the queued data chunks (until the window fills).

So if I send a message with MSG_MORE set (on an idle connection)
SCTP_XMIT_DELAY is returned and a message isn't sent.

I now send a second small message, this time with MSG_MORE clear.
The message is queued, then the code looks to see if it can send anything.

sctp_packet_can_append_data() is called for the first queued chunk.
Since it has force_delay set SCTP_XMIT_DELAY is returned and no
message is built.
The second message isn't even looked at.

> What MSG_MORE flag actually does is ignore inflight == 0 and
> sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> it has to respect the original logic (like !chunk->msg->can_delay
> || !sctp_packet_empty(packet) || ...)
> 
> To delay the chunks with MSG_MORE set even when inflight is 0
> it especially important here for users.

I'm not too worried about that.
Sending the first message was a cheap way to ensure something got
sent if the application lied and didn't send a subsequent message.

The change has hit Linus's tree, I'll should be able to test that
and confirm what I think is going on.

	David


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-23 16:04           ` David Laight
  0 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-23 16:04 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

RnJvbTogWGluIExvbmcNCj4gU2VudDogMjMgRmVicnVhcnkgMjAxNyAwMzo0Ng0KPiBPbiBUdWUs
IEZlYiAyMSwgMjAxNyBhdCAxMDoyNyBQTSwgRGF2aWQgTGFpZ2h0IDxEYXZpZC5MYWlnaHRAYWN1
bGFiLmNvbT4gd3JvdGU6DQo+ID4gRnJvbTogWGluIExvbmcNCj4gPj4gU2VudDogMTggRmVicnVh
cnkgMjAxNyAxNzo1Mw0KPiA+PiBUaGlzIHBhdGNoIGlzIHRvIGFkZCBzdXBwb3J0IGZvciBNU0df
TU9SRSBvbiBzY3RwLg0KPiA+Pg0KPiA+PiBJdCBhZGRzIGZvcmNlX2RlbGF5IGluIHNjdHBfZGF0
YW1zZyB0byBzYXZlIE1TR19NT1JFLCBhbmQgc2V0cyBpdCBhZnRlcg0KPiA+PiBjcmVhdGluZyBk
YXRhbXNnIGFjY29yZGluZyB0byB0aGUgc2VuZCBmbGFnLiBzY3RwX3BhY2tldF9jYW5fYXBwZW5k
X2RhdGENCj4gPj4gdGhlbiB1c2VzIGl0IHRvIGRlY2lkZSBpZiB0aGUgY2h1bmtzIG9mIHRoaXMg
bXNnIHdpbGwgYmUgc2VudCBhdCBvbmNlIG9yDQo+ID4+IGRlbGF5IGl0Lg0KPiA+Pg0KPiA+PiBO
b3RlIHRoYXQgdW5saWtlIFsxXSwgdGhpcyBwYXRjaCBzYXZlcyBNU0dfTU9SRSBpbiBkYXRhbXNn
LCBpbnN0ZWFkIG9mDQo+ID4+IGluIGFzc29jLiBBcyBzY3RwIGVucXVldWVzIHRoZSBjaHVua3Mg
Zmlyc3QsIHRoZW4gZGVxdWV1ZSB0aGVtIG9uZSBieQ0KPiA+PiBvbmUuIElmIGl0J3Mgc2F2ZWQg
aW4gYXNzb2MsdGhlIGN1cnJlbnQgbXNnJ3Mgc2VuZCBmbGFnIChNU0dfTU9SRSkgbWF5DQo+ID4+
IGFmZmVjdCBvdGhlciBjaHVua3MnIGJ1bmRsaW5nLg0KPiA+DQo+ID4gSSB0aG91Z2h0IGFib3V0
IHRoYXQgYW5kIGRlY2lkZWQgdGhhdCB0aGUgTVNHX01PUkUgZmxhZyBvbiB0aGUgbGFzdCBkYXRh
DQo+ID4gY2h1bmsgd2FzIHRoZSBvbmx5IG9uZSB0aGF0IG1hdHRlcmVkLg0KPiA+IEluZGVlZCBs
b29raW5nIGF0IGFueSBvdGhlcnMgaXMgYnJva2VuLg0KPiA+DQo+ID4gQ29uc2lkZXIgd2hhdCBo
YXBwZW5zIGlmIHlvdSBoYXZlIHR3byBzbWFsbCBjaHVua3MgcXVldWVkLCB0aGUgZmlyc3QNCj4g
PiB3aXRoIE1TR19NT1JFIHNldCwgdGhlIHNlY29uZCB3aXRoIGl0IGNsZWFyLg0KPiA+DQo+ID4g
SSB0aGluayB0aGF0IHNjdHBfb3V0cV9mbHVzaCgpIHdpbGwgbG9vayBhdCB0aGUgZmlyc3QgY2h1
bmsgYW5kIGRlY2lkZSBpdA0KPiA+IGRvZXNuJ3QgbmVlZCB0byBkbyBhbnl0aGluZyBiZWNhdXNl
IHNjdHBfcGFja2V0X3RyYW5zbWl0X2NodW5rKCkNCj4gPiByZXR1cm5zIFNDVFBfWE1JVF9ERUxB
WS4NCj4gPiBUaGUgZGF0YSBjaHVuayB3aXRoIE1TR19NT1JFIGNsZWFyIHdvbid0IGV2ZW4gYmUg
bG9va2VkIGF0Lg0KPiA+IFNvIHRoZSBkYXRhIHdpbGwgbmV2ZXIgYmUgc2VudC4NCg0KPiBJdCdz
IG5vdCB0aGF0IGJhZCBhcyB5b3UgdGhvdWdodCwgaW4gc2N0cF9wYWNrZXRfY2FuX2FwcGVuZF9k
YXRhKCk6DQo+IHdoZW4gaW5mbGlnaHQgPT0gMCB8fCBzY3RwX3NrKGFzb2MtPmJhc2Uuc2spLT5u
b2RlbGF5LCB0aGUgY2h1bmtzDQo+IHdvdWxkIGJlIHN0aWxsIHNlbnQgb3V0Lg0KDQpPbmUgb2Yg
dXMgaXNuJ3QgdW5kZXJzdGFuZGluZyB0aGUgb3RoZXIgOi0pDQoNCklJUkMgc2N0cF9wYWNrZXRf
Y2FuX2FwcGVuZF9kYXRhKCkgaXMgY2FsbGVkIGZvciB0aGUgZmlyc3QgcXVldWVkDQpkYXRhIGNo
dW5rIGluIG9yZGVyIHRvIGRlY2lkZSB3aGV0aGVyIHRvIGdlbmVyYXRlIGEgbWVzc2FnZSB0aGF0
DQpjb25zaXN0cyBvbmx5IG9mIGRhdGEgY2h1bmtzLg0KSWYgaXQgcmV0dXJucyBTQ1RQX1hNSVRf
T0sgdGhlbiBhIG1lc3NhZ2UgaXMgYnVpbHQgY29sbGVjdGluZyB0aGUNCnJlc3Qgb2YgdGhlIHF1
ZXVlZCBkYXRhIGNodW5rcyAodW50aWwgdGhlIHdpbmRvdyBmaWxscykuDQoNClNvIGlmIEkgc2Vu
ZCBhIG1lc3NhZ2Ugd2l0aCBNU0dfTU9SRSBzZXQgKG9uIGFuIGlkbGUgY29ubmVjdGlvbikNClND
VFBfWE1JVF9ERUxBWSBpcyByZXR1cm5lZCBhbmQgYSBtZXNzYWdlIGlzbid0IHNlbnQuDQoNCkkg
bm93IHNlbmQgYSBzZWNvbmQgc21hbGwgbWVzc2FnZSwgdGhpcyB0aW1lIHdpdGggTVNHX01PUkUg
Y2xlYXIuDQpUaGUgbWVzc2FnZSBpcyBxdWV1ZWQsIHRoZW4gdGhlIGNvZGUgbG9va3MgdG8gc2Vl
IGlmIGl0IGNhbiBzZW5kIGFueXRoaW5nLg0KDQpzY3RwX3BhY2tldF9jYW5fYXBwZW5kX2RhdGEo
KSBpcyBjYWxsZWQgZm9yIHRoZSBmaXJzdCBxdWV1ZWQgY2h1bmsuDQpTaW5jZSBpdCBoYXMgZm9y
Y2VfZGVsYXkgc2V0IFNDVFBfWE1JVF9ERUxBWSBpcyByZXR1cm5lZCBhbmQgbm8NCm1lc3NhZ2Ug
aXMgYnVpbHQuDQpUaGUgc2Vjb25kIG1lc3NhZ2UgaXNuJ3QgZXZlbiBsb29rZWQgYXQuDQoNCj4g
V2hhdCBNU0dfTU9SRSBmbGFnIGFjdHVhbGx5IGRvZXMgaXMgaWdub3JlIGluZmxpZ2h0ID09IDAg
YW5kDQo+IHNjdHBfc2soYXNvYy0+YmFzZS5zayktPm5vZGVsYXkgdG8gZGVsYXkgdGhlIGNodW5r
cywgYnV0IHN0aWxsDQo+IGl0IGhhcyB0byByZXNwZWN0IHRoZSBvcmlnaW5hbCBsb2dpYyAobGlr
ZSAhY2h1bmstPm1zZy0+Y2FuX2RlbGF5DQo+IHx8ICFzY3RwX3BhY2tldF9lbXB0eShwYWNrZXQp
IHx8IC4uLikNCj4gDQo+IFRvIGRlbGF5IHRoZSBjaHVua3Mgd2l0aCBNU0dfTU9SRSBzZXQgZXZl
biB3aGVuIGluZmxpZ2h0IGlzIDANCj4gaXQgZXNwZWNpYWxseSBpbXBvcnRhbnQgaGVyZSBmb3Ig
dXNlcnMuDQoNCkknbSBub3QgdG9vIHdvcnJpZWQgYWJvdXQgdGhhdC4NClNlbmRpbmcgdGhlIGZp
cnN0IG1lc3NhZ2Ugd2FzIGEgY2hlYXAgd2F5IHRvIGVuc3VyZSBzb21ldGhpbmcgZ290DQpzZW50
IGlmIHRoZSBhcHBsaWNhdGlvbiBsaWVkIGFuZCBkaWRuJ3Qgc2VuZCBhIHN1YnNlcXVlbnQgbWVz
c2FnZS4NCg0KVGhlIGNoYW5nZSBoYXMgaGl0IExpbnVzJ3MgdHJlZSwgSSdsbCBzaG91bGQgYmUg
YWJsZSB0byB0ZXN0IHRoYXQNCmFuZCBjb25maXJtIHdoYXQgSSB0aGluayBpcyBnb2luZyBvbi4N
Cg0KCURhdmlkDQoNCg=

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-23 16:04           ` David Laight
@ 2017-02-23 17:40             ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-02-23 17:40 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 23 February 2017 03:46
> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> > > From: Xin Long
> > >> Sent: 18 February 2017 17:53
> > >> This patch is to add support for MSG_MORE on sctp.
> > >>
> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
> > >> then uses it to decide if the chunks of this msg will be sent at once or
> > >> delay it.
> > >>
> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> > >> affect other chunks' bundling.
> > >
> > > I thought about that and decided that the MSG_MORE flag on the last data
> > > chunk was the only one that mattered.
> > > Indeed looking at any others is broken.
> > >
> > > Consider what happens if you have two small chunks queued, the first
> > > with MSG_MORE set, the second with it clear.
> > >
> > > I think that sctp_outq_flush() will look at the first chunk and decide it
> > > doesn't need to do anything because sctp_packet_transmit_chunk()
> > > returns SCTP_XMIT_DELAY.
> > > The data chunk with MSG_MORE clear won't even be looked at.
> > > So the data will never be sent.
> 
> > It's not that bad as you thought, in sctp_packet_can_append_data():
> > when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> > would be still sent out.
> 
> One of us isn't understanding the other :-)
> 
> IIRC sctp_packet_can_append_data() is called for the first queued
> data chunk in order to decide whether to generate a message that

Perhaps here lies the source of the confusion?
sctp_packet_can_append_data() is called for all queued data chunks, and
not just the first one.

sctp_outq_flush
  (retransmissions here, omitted for simplicity)
  /* Finally, transmit new packets.  */
  while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
    sctp_packet_transmit_chunk
      sctp_packet_append_chunk
        sctp_packet_can_append_data
        __sctp_packet_append_chunk

So chunks are checked one by one.

> consists only of data chunks.

That's not really its purpose. It's to check if it can append a data
chunk to the packet being prepared, while respecting asoc state, cwnd,
etc.

HTH!

  Marcelo

> If it returns SCTP_XMIT_OK then a message is built collecting the
> rest of the queued data chunks (until the window fills).
> 
> So if I send a message with MSG_MORE set (on an idle connection)
> SCTP_XMIT_DELAY is returned and a message isn't sent.
> 
> I now send a second small message, this time with MSG_MORE clear.
> The message is queued, then the code looks to see if it can send anything.
> 
> sctp_packet_can_append_data() is called for the first queued chunk.
> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> message is built.
> The second message isn't even looked at.
> 
> > What MSG_MORE flag actually does is ignore inflight == 0 and
> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> > it has to respect the original logic (like !chunk->msg->can_delay
> > || !sctp_packet_empty(packet) || ...)
> > 
> > To delay the chunks with MSG_MORE set even when inflight is 0
> > it especially important here for users.
> 
> I'm not too worried about that.
> Sending the first message was a cheap way to ensure something got
> sent if the application lied and didn't send a subsequent message.
> 
> The change has hit Linus's tree, I'll should be able to test that
> and confirm what I think is going on.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-23 17:40             ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-02-23 17:40 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 23 February 2017 03:46
> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> > > From: Xin Long
> > >> Sent: 18 February 2017 17:53
> > >> This patch is to add support for MSG_MORE on sctp.
> > >>
> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
> > >> then uses it to decide if the chunks of this msg will be sent at once or
> > >> delay it.
> > >>
> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> > >> affect other chunks' bundling.
> > >
> > > I thought about that and decided that the MSG_MORE flag on the last data
> > > chunk was the only one that mattered.
> > > Indeed looking at any others is broken.
> > >
> > > Consider what happens if you have two small chunks queued, the first
> > > with MSG_MORE set, the second with it clear.
> > >
> > > I think that sctp_outq_flush() will look at the first chunk and decide it
> > > doesn't need to do anything because sctp_packet_transmit_chunk()
> > > returns SCTP_XMIT_DELAY.
> > > The data chunk with MSG_MORE clear won't even be looked at.
> > > So the data will never be sent.
> 
> > It's not that bad as you thought, in sctp_packet_can_append_data():
> > when inflight = 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> > would be still sent out.
> 
> One of us isn't understanding the other :-)
> 
> IIRC sctp_packet_can_append_data() is called for the first queued
> data chunk in order to decide whether to generate a message that

Perhaps here lies the source of the confusion?
sctp_packet_can_append_data() is called for all queued data chunks, and
not just the first one.

sctp_outq_flush
  (retransmissions here, omitted for simplicity)
  /* Finally, transmit new packets.  */
  while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
    sctp_packet_transmit_chunk
      sctp_packet_append_chunk
        sctp_packet_can_append_data
        __sctp_packet_append_chunk

So chunks are checked one by one.

> consists only of data chunks.

That's not really its purpose. It's to check if it can append a data
chunk to the packet being prepared, while respecting asoc state, cwnd,
etc.

HTH!

  Marcelo

> If it returns SCTP_XMIT_OK then a message is built collecting the
> rest of the queued data chunks (until the window fills).
> 
> So if I send a message with MSG_MORE set (on an idle connection)
> SCTP_XMIT_DELAY is returned and a message isn't sent.
> 
> I now send a second small message, this time with MSG_MORE clear.
> The message is queued, then the code looks to see if it can send anything.
> 
> sctp_packet_can_append_data() is called for the first queued chunk.
> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> message is built.
> The second message isn't even looked at.
> 
> > What MSG_MORE flag actually does is ignore inflight = 0 and
> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> > it has to respect the original logic (like !chunk->msg->can_delay
> > || !sctp_packet_empty(packet) || ...)
> > 
> > To delay the chunks with MSG_MORE set even when inflight is 0
> > it especially important here for users.
> 
> I'm not too worried about that.
> Sending the first message was a cheap way to ensure something got
> sent if the application lied and didn't send a subsequent message.
> 
> The change has hit Linus's tree, I'll should be able to test that
> and confirm what I think is going on.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-23 17:40             ` Marcelo Ricardo Leitner
@ 2017-02-23 18:16               ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-23 18:16 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 1:40 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
>> From: Xin Long
>> > Sent: 23 February 2017 03:46
>> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
>> > > From: Xin Long
>> > >> Sent: 18 February 2017 17:53
>> > >> This patch is to add support for MSG_MORE on sctp.
>> > >>
>> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
>> > >> then uses it to decide if the chunks of this msg will be sent at once or
>> > >> delay it.
>> > >>
>> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> > >> affect other chunks' bundling.
>> > >
>> > > I thought about that and decided that the MSG_MORE flag on the last data
>> > > chunk was the only one that mattered.
>> > > Indeed looking at any others is broken.
>> > >
>> > > Consider what happens if you have two small chunks queued, the first
>> > > with MSG_MORE set, the second with it clear.
>> > >
>> > > I think that sctp_outq_flush() will look at the first chunk and decide it
>> > > doesn't need to do anything because sctp_packet_transmit_chunk()
>> > > returns SCTP_XMIT_DELAY.
>> > > The data chunk with MSG_MORE clear won't even be looked at.
>> > > So the data will never be sent.
>>
>> > It's not that bad as you thought, in sctp_packet_can_append_data():
>> > when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
>> > would be still sent out.
>>
>> One of us isn't understanding the other :-)
>>
>> IIRC sctp_packet_can_append_data() is called for the first queued
>> data chunk in order to decide whether to generate a message that
>
> Perhaps here lies the source of the confusion?
> sctp_packet_can_append_data() is called for all queued data chunks, and
> not just the first one.
>
> sctp_outq_flush
>   (retransmissions here, omitted for simplicity)
>   /* Finally, transmit new packets.  */
>   while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
>     sctp_packet_transmit_chunk
>       sctp_packet_append_chunk
>         sctp_packet_can_append_data
>         __sctp_packet_append_chunk
>
> So chunks are checked one by one.
I think I got David's point.
like, the queue is:

chunk3[null]-->chunk2 [msg_more]-->chunk1 [msg_more]

it dequeue from chunk1, once it returns SCTP_XMIT_DELAY
chunk2, chunk3 will has no chance to dequeue, as it will
goto: sctpflush_out in sctp_outq_flush(), But we are expecting
to send all chunks.

>
>> consists only of data chunks.
>
> That's not really its purpose. It's to check if it can append a data
> chunk to the packet being prepared, while respecting asoc state, cwnd,
> etc.
>
> HTH!
>
>   Marcelo
>
>> If it returns SCTP_XMIT_OK then a message is built collecting the
>> rest of the queued data chunks (until the window fills).
>>
>> So if I send a message with MSG_MORE set (on an idle connection)
>> SCTP_XMIT_DELAY is returned and a message isn't sent.
>>
>> I now send a second small message, this time with MSG_MORE clear.
>> The message is queued, then the code looks to see if it can send anything.
>>
>> sctp_packet_can_append_data() is called for the first queued chunk.
>> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
>> message is built.
>> The second message isn't even looked at.
>>
>> > What MSG_MORE flag actually does is ignore inflight == 0 and
>> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
>> > it has to respect the original logic (like !chunk->msg->can_delay
>> > || !sctp_packet_empty(packet) || ...)
>> >
>> > To delay the chunks with MSG_MORE set even when inflight is 0
>> > it especially important here for users.
>>
>> I'm not too worried about that.
>> Sending the first message was a cheap way to ensure something got
>> sent if the application lied and didn't send a subsequent message.
>>
>> The change has hit Linus's tree, I'll should be able to test that
>> and confirm what I think is going on.
>>
>>       David
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-23 18:16               ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-23 18:16 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 1:40 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
>> From: Xin Long
>> > Sent: 23 February 2017 03:46
>> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
>> > > From: Xin Long
>> > >> Sent: 18 February 2017 17:53
>> > >> This patch is to add support for MSG_MORE on sctp.
>> > >>
>> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
>> > >> then uses it to decide if the chunks of this msg will be sent at once or
>> > >> delay it.
>> > >>
>> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> > >> affect other chunks' bundling.
>> > >
>> > > I thought about that and decided that the MSG_MORE flag on the last data
>> > > chunk was the only one that mattered.
>> > > Indeed looking at any others is broken.
>> > >
>> > > Consider what happens if you have two small chunks queued, the first
>> > > with MSG_MORE set, the second with it clear.
>> > >
>> > > I think that sctp_outq_flush() will look at the first chunk and decide it
>> > > doesn't need to do anything because sctp_packet_transmit_chunk()
>> > > returns SCTP_XMIT_DELAY.
>> > > The data chunk with MSG_MORE clear won't even be looked at.
>> > > So the data will never be sent.
>>
>> > It's not that bad as you thought, in sctp_packet_can_append_data():
>> > when inflight = 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
>> > would be still sent out.
>>
>> One of us isn't understanding the other :-)
>>
>> IIRC sctp_packet_can_append_data() is called for the first queued
>> data chunk in order to decide whether to generate a message that
>
> Perhaps here lies the source of the confusion?
> sctp_packet_can_append_data() is called for all queued data chunks, and
> not just the first one.
>
> sctp_outq_flush
>   (retransmissions here, omitted for simplicity)
>   /* Finally, transmit new packets.  */
>   while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
>     sctp_packet_transmit_chunk
>       sctp_packet_append_chunk
>         sctp_packet_can_append_data
>         __sctp_packet_append_chunk
>
> So chunks are checked one by one.
I think I got David's point.
like, the queue is:

chunk3[null]-->chunk2 [msg_more]-->chunk1 [msg_more]

it dequeue from chunk1, once it returns SCTP_XMIT_DELAY
chunk2, chunk3 will has no chance to dequeue, as it will
goto: sctpflush_out in sctp_outq_flush(), But we are expecting
to send all chunks.

>
>> consists only of data chunks.
>
> That's not really its purpose. It's to check if it can append a data
> chunk to the packet being prepared, while respecting asoc state, cwnd,
> etc.
>
> HTH!
>
>   Marcelo
>
>> If it returns SCTP_XMIT_OK then a message is built collecting the
>> rest of the queued data chunks (until the window fills).
>>
>> So if I send a message with MSG_MORE set (on an idle connection)
>> SCTP_XMIT_DELAY is returned and a message isn't sent.
>>
>> I now send a second small message, this time with MSG_MORE clear.
>> The message is queued, then the code looks to see if it can send anything.
>>
>> sctp_packet_can_append_data() is called for the first queued chunk.
>> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
>> message is built.
>> The second message isn't even looked at.
>>
>> > What MSG_MORE flag actually does is ignore inflight = 0 and
>> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
>> > it has to respect the original logic (like !chunk->msg->can_delay
>> > || !sctp_packet_empty(packet) || ...)
>> >
>> > To delay the chunks with MSG_MORE set even when inflight is 0
>> > it especially important here for users.
>>
>> I'm not too worried about that.
>> Sending the first message was a cheap way to ensure something got
>> sent if the application lied and didn't send a subsequent message.
>>
>> The change has hit Linus's tree, I'll should be able to test that
>> and confirm what I think is going on.
>>
>>       David
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-23 18:16               ` Xin Long
@ 2017-02-23 18:39                 ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-02-23 18:39 UTC (permalink / raw)
  To: Xin Long
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 02:16:02AM +0800, Xin Long wrote:
> On Fri, Feb 24, 2017 at 1:40 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
> >> From: Xin Long
> >> > Sent: 23 February 2017 03:46
> >> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> >> > > From: Xin Long
> >> > >> Sent: 18 February 2017 17:53
> >> > >> This patch is to add support for MSG_MORE on sctp.
> >> > >>
> >> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> >> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
> >> > >> then uses it to decide if the chunks of this msg will be sent at once or
> >> > >> delay it.
> >> > >>
> >> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> >> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> >> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> >> > >> affect other chunks' bundling.
> >> > >
> >> > > I thought about that and decided that the MSG_MORE flag on the last data
> >> > > chunk was the only one that mattered.
> >> > > Indeed looking at any others is broken.
> >> > >
> >> > > Consider what happens if you have two small chunks queued, the first
> >> > > with MSG_MORE set, the second with it clear.
> >> > >
> >> > > I think that sctp_outq_flush() will look at the first chunk and decide it
> >> > > doesn't need to do anything because sctp_packet_transmit_chunk()
> >> > > returns SCTP_XMIT_DELAY.
> >> > > The data chunk with MSG_MORE clear won't even be looked at.
> >> > > So the data will never be sent.
> >>
> >> > It's not that bad as you thought, in sctp_packet_can_append_data():
> >> > when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> >> > would be still sent out.
> >>
> >> One of us isn't understanding the other :-)
> >>
> >> IIRC sctp_packet_can_append_data() is called for the first queued
> >> data chunk in order to decide whether to generate a message that
> >
> > Perhaps here lies the source of the confusion?
> > sctp_packet_can_append_data() is called for all queued data chunks, and
> > not just the first one.
> >
> > sctp_outq_flush
> >   (retransmissions here, omitted for simplicity)
> >   /* Finally, transmit new packets.  */
> >   while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
> >     sctp_packet_transmit_chunk
> >       sctp_packet_append_chunk
> >         sctp_packet_can_append_data
> >         __sctp_packet_append_chunk
> >
> > So chunks are checked one by one.
> I think I got David's point.
> like, the queue is:
> 
> chunk3[null]-->chunk2 [msg_more]-->chunk1 [msg_more]
> 
> it dequeue from chunk1, once it returns SCTP_XMIT_DELAY
> chunk2, chunk3 will has no chance to dequeue, as it will
> goto: sctpflush_out in sctp_outq_flush(), But we are expecting
> to send all chunks.

Ahh yes, exactly.

> 
> >
> >> consists only of data chunks.
> >
> > That's not really its purpose. It's to check if it can append a data
> > chunk to the packet being prepared, while respecting asoc state, cwnd,
> > etc.
> >
> > HTH!
> >
> >   Marcelo
> >
> >> If it returns SCTP_XMIT_OK then a message is built collecting the
> >> rest of the queued data chunks (until the window fills).
> >>
> >> So if I send a message with MSG_MORE set (on an idle connection)
> >> SCTP_XMIT_DELAY is returned and a message isn't sent.
> >>
> >> I now send a second small message, this time with MSG_MORE clear.
> >> The message is queued, then the code looks to see if it can send anything.
> >>
> >> sctp_packet_can_append_data() is called for the first queued chunk.
> >> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> >> message is built.
> >> The second message isn't even looked at.
> >>
> >> > What MSG_MORE flag actually does is ignore inflight == 0 and
> >> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> >> > it has to respect the original logic (like !chunk->msg->can_delay
> >> > || !sctp_packet_empty(packet) || ...)
> >> >
> >> > To delay the chunks with MSG_MORE set even when inflight is 0
> >> > it especially important here for users.
> >>
> >> I'm not too worried about that.
> >> Sending the first message was a cheap way to ensure something got
> >> sent if the application lied and didn't send a subsequent message.
> >>
> >> The change has hit Linus's tree, I'll should be able to test that
> >> and confirm what I think is going on.
> >>
> >>       David
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-23 18:39                 ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-02-23 18:39 UTC (permalink / raw)
  To: Xin Long
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 02:16:02AM +0800, Xin Long wrote:
> On Fri, Feb 24, 2017 at 1:40 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Thu, Feb 23, 2017 at 04:04:10PM +0000, David Laight wrote:
> >> From: Xin Long
> >> > Sent: 23 February 2017 03:46
> >> > On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
> >> > > From: Xin Long
> >> > >> Sent: 18 February 2017 17:53
> >> > >> This patch is to add support for MSG_MORE on sctp.
> >> > >>
> >> > >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
> >> > >> creating datamsg according to the send flag. sctp_packet_can_append_data
> >> > >> then uses it to decide if the chunks of this msg will be sent at once or
> >> > >> delay it.
> >> > >>
> >> > >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
> >> > >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
> >> > >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
> >> > >> affect other chunks' bundling.
> >> > >
> >> > > I thought about that and decided that the MSG_MORE flag on the last data
> >> > > chunk was the only one that mattered.
> >> > > Indeed looking at any others is broken.
> >> > >
> >> > > Consider what happens if you have two small chunks queued, the first
> >> > > with MSG_MORE set, the second with it clear.
> >> > >
> >> > > I think that sctp_outq_flush() will look at the first chunk and decide it
> >> > > doesn't need to do anything because sctp_packet_transmit_chunk()
> >> > > returns SCTP_XMIT_DELAY.
> >> > > The data chunk with MSG_MORE clear won't even be looked at.
> >> > > So the data will never be sent.
> >>
> >> > It's not that bad as you thought, in sctp_packet_can_append_data():
> >> > when inflight = 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
> >> > would be still sent out.
> >>
> >> One of us isn't understanding the other :-)
> >>
> >> IIRC sctp_packet_can_append_data() is called for the first queued
> >> data chunk in order to decide whether to generate a message that
> >
> > Perhaps here lies the source of the confusion?
> > sctp_packet_can_append_data() is called for all queued data chunks, and
> > not just the first one.
> >
> > sctp_outq_flush
> >   (retransmissions here, omitted for simplicity)
> >   /* Finally, transmit new packets.  */
> >   while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
> >     sctp_packet_transmit_chunk
> >       sctp_packet_append_chunk
> >         sctp_packet_can_append_data
> >         __sctp_packet_append_chunk
> >
> > So chunks are checked one by one.
> I think I got David's point.
> like, the queue is:
> 
> chunk3[null]-->chunk2 [msg_more]-->chunk1 [msg_more]
> 
> it dequeue from chunk1, once it returns SCTP_XMIT_DELAY
> chunk2, chunk3 will has no chance to dequeue, as it will
> goto: sctpflush_out in sctp_outq_flush(), But we are expecting
> to send all chunks.

Ahh yes, exactly.

> 
> >
> >> consists only of data chunks.
> >
> > That's not really its purpose. It's to check if it can append a data
> > chunk to the packet being prepared, while respecting asoc state, cwnd,
> > etc.
> >
> > HTH!
> >
> >   Marcelo
> >
> >> If it returns SCTP_XMIT_OK then a message is built collecting the
> >> rest of the queued data chunks (until the window fills).
> >>
> >> So if I send a message with MSG_MORE set (on an idle connection)
> >> SCTP_XMIT_DELAY is returned and a message isn't sent.
> >>
> >> I now send a second small message, this time with MSG_MORE clear.
> >> The message is queued, then the code looks to see if it can send anything.
> >>
> >> sctp_packet_can_append_data() is called for the first queued chunk.
> >> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> >> message is built.
> >> The second message isn't even looked at.
> >>
> >> > What MSG_MORE flag actually does is ignore inflight = 0 and
> >> > sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
> >> > it has to respect the original logic (like !chunk->msg->can_delay
> >> > || !sctp_packet_empty(packet) || ...)
> >> >
> >> > To delay the chunks with MSG_MORE set even when inflight is 0
> >> > it especially important here for users.
> >>
> >> I'm not too worried about that.
> >> Sending the first message was a cheap way to ensure something got
> >> sent if the application lied and didn't send a subsequent message.
> >>
> >> The change has hit Linus's tree, I'll should be able to test that
> >> and confirm what I think is going on.
> >>
> >>       David
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-23 16:04           ` David Laight
@ 2017-02-24  6:43             ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-24  6:43 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 12:04 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 23 February 2017 03:46
>> On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
>> > From: Xin Long
>> >> Sent: 18 February 2017 17:53
>> >> This patch is to add support for MSG_MORE on sctp.
>> >>
>> >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> >> creating datamsg according to the send flag. sctp_packet_can_append_data
>> >> then uses it to decide if the chunks of this msg will be sent at once or
>> >> delay it.
>> >>
>> >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> >> affect other chunks' bundling.
>> >
>> > I thought about that and decided that the MSG_MORE flag on the last data
>> > chunk was the only one that mattered.
>> > Indeed looking at any others is broken.
>> >
>> > Consider what happens if you have two small chunks queued, the first
>> > with MSG_MORE set, the second with it clear.
>> >
>> > I think that sctp_outq_flush() will look at the first chunk and decide it
>> > doesn't need to do anything because sctp_packet_transmit_chunk()
>> > returns SCTP_XMIT_DELAY.
>> > The data chunk with MSG_MORE clear won't even be looked at.
>> > So the data will never be sent.
>
>> It's not that bad as you thought, in sctp_packet_can_append_data():
>> when inflight == 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
>> would be still sent out.
>
> One of us isn't understanding the other :-)
>
> IIRC sctp_packet_can_append_data() is called for the first queued
> data chunk in order to decide whether to generate a message that
> consists only of data chunks.
> If it returns SCTP_XMIT_OK then a message is built collecting the
> rest of the queued data chunks (until the window fills).
>
> So if I send a message with MSG_MORE set (on an idle connection)
> SCTP_XMIT_DELAY is returned and a message isn't sent.
>
> I now send a second small message, this time with MSG_MORE clear.
> The message is queued, then the code looks to see if it can send anything.
>
> sctp_packet_can_append_data() is called for the first queued chunk.
> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> message is built.
> The second message isn't even looked at.
You're right. I can see the problem now.

What I expected is it should work like:

1, send 3 small chunks with MSG_MORE set, the queue is:
  chk3 [set] -> chk2 [set] -> chk1 [set]
2. send 1 more chunk with MSG_MORE clear, the queue is:
  chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
3. then if user send more small chunks with MSG_MORE set,
the queue is like:
  chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2
[clear] -> chk1 [clear]
so that the new small chunks' flag will not affect the other chunks bundling.

is it ok to you to work like that ?
if yes, I propose the fix for this issue is:

@@ -303,6 +303,17 @@ void sctp_outq_tail(struct sctp_outq *q, struct
sctp_chunk *chunk, gfp_t gfp)
                         sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
                         "illegal chunk");

+               if (q->has_delay && !chunk->msg->force_delay) {
+                       struct sctp_chunk *chk;
+
+                       list_for_each_entry(chk, &q->out_chunk_list, list) {
+                               if (!chk->msg->force_delay)
+                                       break;
+                               chk->msg->force_delay = 0;
+                       }
+               }
+               q->has_delay = chunk->msg->force_delay;
+

Thanks.

>
>> What MSG_MORE flag actually does is ignore inflight == 0 and
>> sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
>> it has to respect the original logic (like !chunk->msg->can_delay
>> || !sctp_packet_empty(packet) || ...)
>>
>> To delay the chunks with MSG_MORE set even when inflight is 0
>> it especially important here for users.
>
> I'm not too worried about that.
> Sending the first message was a cheap way to ensure something got
> sent if the application lied and didn't send a subsequent message.
>
> The change has hit Linus's tree, I'll should be able to test that
> and confirm what I think is going on.
>
>         David
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-24  6:43             ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-24  6:43 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 12:04 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Xin Long
>> Sent: 23 February 2017 03:46
>> On Tue, Feb 21, 2017 at 10:27 PM, David Laight <David.Laight@aculab.com> wrote:
>> > From: Xin Long
>> >> Sent: 18 February 2017 17:53
>> >> This patch is to add support for MSG_MORE on sctp.
>> >>
>> >> It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
>> >> creating datamsg according to the send flag. sctp_packet_can_append_data
>> >> then uses it to decide if the chunks of this msg will be sent at once or
>> >> delay it.
>> >>
>> >> Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
>> >> in assoc. As sctp enqueues the chunks first, then dequeue them one by
>> >> one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
>> >> affect other chunks' bundling.
>> >
>> > I thought about that and decided that the MSG_MORE flag on the last data
>> > chunk was the only one that mattered.
>> > Indeed looking at any others is broken.
>> >
>> > Consider what happens if you have two small chunks queued, the first
>> > with MSG_MORE set, the second with it clear.
>> >
>> > I think that sctp_outq_flush() will look at the first chunk and decide it
>> > doesn't need to do anything because sctp_packet_transmit_chunk()
>> > returns SCTP_XMIT_DELAY.
>> > The data chunk with MSG_MORE clear won't even be looked at.
>> > So the data will never be sent.
>
>> It's not that bad as you thought, in sctp_packet_can_append_data():
>> when inflight = 0 || sctp_sk(asoc->base.sk)->nodelay, the chunks
>> would be still sent out.
>
> One of us isn't understanding the other :-)
>
> IIRC sctp_packet_can_append_data() is called for the first queued
> data chunk in order to decide whether to generate a message that
> consists only of data chunks.
> If it returns SCTP_XMIT_OK then a message is built collecting the
> rest of the queued data chunks (until the window fills).
>
> So if I send a message with MSG_MORE set (on an idle connection)
> SCTP_XMIT_DELAY is returned and a message isn't sent.
>
> I now send a second small message, this time with MSG_MORE clear.
> The message is queued, then the code looks to see if it can send anything.
>
> sctp_packet_can_append_data() is called for the first queued chunk.
> Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> message is built.
> The second message isn't even looked at.
You're right. I can see the problem now.

What I expected is it should work like:

1, send 3 small chunks with MSG_MORE set, the queue is:
  chk3 [set] -> chk2 [set] -> chk1 [set]
2. send 1 more chunk with MSG_MORE clear, the queue is:
  chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
3. then if user send more small chunks with MSG_MORE set,
the queue is like:
  chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2
[clear] -> chk1 [clear]
so that the new small chunks' flag will not affect the other chunks bundling.

is it ok to you to work like that ?
if yes, I propose the fix for this issue is:

@@ -303,6 +303,17 @@ void sctp_outq_tail(struct sctp_outq *q, struct
sctp_chunk *chunk, gfp_t gfp)
                         sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
                         "illegal chunk");

+               if (q->has_delay && !chunk->msg->force_delay) {
+                       struct sctp_chunk *chk;
+
+                       list_for_each_entry(chk, &q->out_chunk_list, list) {
+                               if (!chk->msg->force_delay)
+                                       break;
+                               chk->msg->force_delay = 0;
+                       }
+               }
+               q->has_delay = chunk->msg->force_delay;
+

Thanks.

>
>> What MSG_MORE flag actually does is ignore inflight = 0 and
>> sctp_sk(asoc->base.sk)->nodelay to delay the chunks, but still
>> it has to respect the original logic (like !chunk->msg->can_delay
>> || !sctp_packet_empty(packet) || ...)
>>
>> To delay the chunks with MSG_MORE set even when inflight is 0
>> it especially important here for users.
>
> I'm not too worried about that.
> Sending the first message was a cheap way to ensure something got
> sent if the application lied and didn't send a subsequent message.
>
> The change has hit Linus's tree, I'll should be able to test that
> and confirm what I think is going on.
>
>         David
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-24  6:43             ` Xin Long
@ 2017-02-24 10:14               ` David Laight
  -1 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-24 10:14 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

From: Xin Long
> Sent: 24 February 2017 06:44
...
> > IIRC sctp_packet_can_append_data() is called for the first queued
> > data chunk in order to decide whether to generate a message that
> > consists only of data chunks.
> > If it returns SCTP_XMIT_OK then a message is built collecting the
> > rest of the queued data chunks (until the window fills).
> >
> > So if I send a message with MSG_MORE set (on an idle connection)
> > SCTP_XMIT_DELAY is returned and a message isn't sent.
> >
> > I now send a second small message, this time with MSG_MORE clear.
> > The message is queued, then the code looks to see if it can send anything.
> >
> > sctp_packet_can_append_data() is called for the first queued chunk.
> > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > message is built.
> > The second message isn't even looked at.
> You're right. I can see the problem now.
> 
> What I expected is it should work like:
> 
> 1, send 3 small chunks with MSG_MORE set, the queue is:
>   chk3 [set] -> chk2 [set] -> chk1 [set]

Strange way to write a queue! chk1 points to chk2 :-)

> 2. send 1 more chunk with MSG_MORE clear, the queue is:
>   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]

I don't think processing the entire queue is a good idea.
Both from execution time and the effects on the data cache.
The SCTP code is horrid enough as it is.

> 3. then if user send more small chunks with MSG_MORE set,
> the queue is like:
>   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> so that the new small chunks' flag will not affect the other chunks bundling.

That isn't really necessary.
The user can't expect to have absolute control over which chunks get bundled
together.
If the above chunks still aren't big enough to fill a frame the code might
as well wait for the next chunk instead of building a packet that contains
chk1 through to chkB.

Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
queued chunks will be sent.

So immediately after your (3) the application is expected to send a chunk
with MSG_MORE clear - at that point all the queued chunks can be sent in
a single packet.

So just save the last MSG_MORE on the association as I did.

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-24 10:14               ` David Laight
  0 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-24 10:14 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

RnJvbTogWGluIExvbmcNCj4gU2VudDogMjQgRmVicnVhcnkgMjAxNyAwNjo0NA0KLi4uDQo+ID4g
SUlSQyBzY3RwX3BhY2tldF9jYW5fYXBwZW5kX2RhdGEoKSBpcyBjYWxsZWQgZm9yIHRoZSBmaXJz
dCBxdWV1ZWQNCj4gPiBkYXRhIGNodW5rIGluIG9yZGVyIHRvIGRlY2lkZSB3aGV0aGVyIHRvIGdl
bmVyYXRlIGEgbWVzc2FnZSB0aGF0DQo+ID4gY29uc2lzdHMgb25seSBvZiBkYXRhIGNodW5rcy4N
Cj4gPiBJZiBpdCByZXR1cm5zIFNDVFBfWE1JVF9PSyB0aGVuIGEgbWVzc2FnZSBpcyBidWlsdCBj
b2xsZWN0aW5nIHRoZQ0KPiA+IHJlc3Qgb2YgdGhlIHF1ZXVlZCBkYXRhIGNodW5rcyAodW50aWwg
dGhlIHdpbmRvdyBmaWxscykuDQo+ID4NCj4gPiBTbyBpZiBJIHNlbmQgYSBtZXNzYWdlIHdpdGgg
TVNHX01PUkUgc2V0IChvbiBhbiBpZGxlIGNvbm5lY3Rpb24pDQo+ID4gU0NUUF9YTUlUX0RFTEFZ
IGlzIHJldHVybmVkIGFuZCBhIG1lc3NhZ2UgaXNuJ3Qgc2VudC4NCj4gPg0KPiA+IEkgbm93IHNl
bmQgYSBzZWNvbmQgc21hbGwgbWVzc2FnZSwgdGhpcyB0aW1lIHdpdGggTVNHX01PUkUgY2xlYXIu
DQo+ID4gVGhlIG1lc3NhZ2UgaXMgcXVldWVkLCB0aGVuIHRoZSBjb2RlIGxvb2tzIHRvIHNlZSBp
ZiBpdCBjYW4gc2VuZCBhbnl0aGluZy4NCj4gPg0KPiA+IHNjdHBfcGFja2V0X2Nhbl9hcHBlbmRf
ZGF0YSgpIGlzIGNhbGxlZCBmb3IgdGhlIGZpcnN0IHF1ZXVlZCBjaHVuay4NCj4gPiBTaW5jZSBp
dCBoYXMgZm9yY2VfZGVsYXkgc2V0IFNDVFBfWE1JVF9ERUxBWSBpcyByZXR1cm5lZCBhbmQgbm8N
Cj4gPiBtZXNzYWdlIGlzIGJ1aWx0Lg0KPiA+IFRoZSBzZWNvbmQgbWVzc2FnZSBpc24ndCBldmVu
IGxvb2tlZCBhdC4NCj4gWW91J3JlIHJpZ2h0LiBJIGNhbiBzZWUgdGhlIHByb2JsZW0gbm93Lg0K
PiANCj4gV2hhdCBJIGV4cGVjdGVkIGlzIGl0IHNob3VsZCB3b3JrIGxpa2U6DQo+IA0KPiAxLCBz
ZW5kIDMgc21hbGwgY2h1bmtzIHdpdGggTVNHX01PUkUgc2V0LCB0aGUgcXVldWUgaXM6DQo+ICAg
Y2hrMyBbc2V0XSAtPiBjaGsyIFtzZXRdIC0+IGNoazEgW3NldF0NCg0KU3RyYW5nZSB3YXkgdG8g
d3JpdGUgYSBxdWV1ZSEgY2hrMSBwb2ludHMgdG8gY2hrMiA6LSkNCg0KPiAyLiBzZW5kIDEgbW9y
ZSBjaHVuayB3aXRoIE1TR19NT1JFIGNsZWFyLCB0aGUgcXVldWUgaXM6DQo+ICAgY2hrNFtjbGVh
cl0gLT4gY2hrMyBbY2xlYXJdIC0+IGNoazIgW2NsZWFyXSAtPiBjaGsxIFtjbGVhcl0NCg0KSSBk
b24ndCB0aGluayBwcm9jZXNzaW5nIHRoZSBlbnRpcmUgcXVldWUgaXMgYSBnb29kIGlkZWEuDQpC
b3RoIGZyb20gZXhlY3V0aW9uIHRpbWUgYW5kIHRoZSBlZmZlY3RzIG9uIHRoZSBkYXRhIGNhY2hl
Lg0KVGhlIFNDVFAgY29kZSBpcyBob3JyaWQgZW5vdWdoIGFzIGl0IGlzLg0KDQo+IDMuIHRoZW4g
aWYgdXNlciBzZW5kIG1vcmUgc21hbGwgY2h1bmtzIHdpdGggTVNHX01PUkUgc2V0LA0KPiB0aGUg
cXVldWUgaXMgbGlrZToNCj4gICBjaGtCW3NldF0gLT4gY2hrQVtzZXRdIC0+IGNoazRbY2xlYXJd
IC0+IGNoazMgW2NsZWFyXSAtPiBjaGsyIFtjbGVhcl0gLT4gY2hrMSBbY2xlYXJdDQo+IHNvIHRo
YXQgdGhlIG5ldyBzbWFsbCBjaHVua3MnIGZsYWcgd2lsbCBub3QgYWZmZWN0IHRoZSBvdGhlciBj
aHVua3MgYnVuZGxpbmcuDQoNClRoYXQgaXNuJ3QgcmVhbGx5IG5lY2Vzc2FyeS4NClRoZSB1c2Vy
IGNhbid0IGV4cGVjdCB0byBoYXZlIGFic29sdXRlIGNvbnRyb2wgb3ZlciB3aGljaCBjaHVua3Mg
Z2V0IGJ1bmRsZWQNCnRvZ2V0aGVyLg0KSWYgdGhlIGFib3ZlIGNodW5rcyBzdGlsbCBhcmVuJ3Qg
YmlnIGVub3VnaCB0byBmaWxsIGEgZnJhbWUgdGhlIGNvZGUgbWlnaHQNCmFzIHdlbGwgd2FpdCBm
b3IgdGhlIG5leHQgY2h1bmsgaW5zdGVhZCBvZiBidWlsZGluZyBhIHBhY2tldCB0aGF0IGNvbnRh
aW5zDQpjaGsxIHRocm91Z2ggdG8gY2hrQi4NCg0KUmVtZW1iZXIgeW91J2xsIG9ubHkgZ2V0IGEg
cXVldWVkIGNodW5rIHdpdGggTVNHX01PUkUgY2xlYXIgaWYgZGF0YSBjYW4ndCBiZSBzZW50Lg0K
QXMgc29vbiBhcyBkYXRhIGNhbiBiZSBzZW50LCBpZiB0aGUgZmlyc3QgY2h1bmsgaGFzIE1TR19N
T1JFIGNsZWFyIGFsbCBvZiB0aGUNCnF1ZXVlZCBjaHVua3Mgd2lsbCBiZSBzZW50Lg0KDQpTbyBp
bW1lZGlhdGVseSBhZnRlciB5b3VyICgzKSB0aGUgYXBwbGljYXRpb24gaXMgZXhwZWN0ZWQgdG8g
c2VuZCBhIGNodW5rDQp3aXRoIE1TR19NT1JFIGNsZWFyIC0gYXQgdGhhdCBwb2ludCBhbGwgdGhl
IHF1ZXVlZCBjaHVua3MgY2FuIGJlIHNlbnQgaW4NCmEgc2luZ2xlIHBhY2tldC4NCg0KU28ganVz
dCBzYXZlIHRoZSBsYXN0IE1TR19NT1JFIG9uIHRoZSBhc3NvY2lhdGlvbiBhcyBJIGRpZC4NCg0K
CURhdmlkDQo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-24 10:14               ` David Laight
@ 2017-02-25  8:41                 ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-25  8:41 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 6:14 PM, David Laight <David.Laight@aculab.com> wrote:
>
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> >
> > What I expected is it should work like:
> >
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
>
> Strange way to write a queue! chk1 points to chk2 :-)
haha, just  a model.

>
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.
> The SCTP code is horrid enough as it is.
you check the codes in last email, it's not processing the entire queue.

1). only when queue has delay chunk inside by checking queue->has_delay
    and current chunk has msg_more flag.

2). will break on the first chunk with clear in the queue.

but yes, in 2), extra work has to be done than before, but not much.

>
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks bundling.
>
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.
> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.
>
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.
>
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.
understand this.

what I'm worried about is if the msg_more is saved in assoc:
     chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
then when you send a small chkA with MSG_MORE,
the queue will be like:
     chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
because msg_more is saved in assoc, every chunk can look at it.
chk1 - chk4 are big enough to be packed into a packet, they were
not sent last time because a lot of chunks are in the retransmit
queue.

But now even if retransmit queue is null, chk1-chk4 are still blocked.

can you accept that chkA may block the old chunks ?

>
> So just save the last MSG_MORE on the association as I did.
I will think about it, thanks.

>
>         David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-25  8:41                 ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-25  8:41 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Fri, Feb 24, 2017 at 6:14 PM, David Laight <David.Laight@aculab.com> wrote:
>
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> >
> > What I expected is it should work like:
> >
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
>
> Strange way to write a queue! chk1 points to chk2 :-)
haha, just  a model.

>
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.
> The SCTP code is horrid enough as it is.
you check the codes in last email, it's not processing the entire queue.

1). only when queue has delay chunk inside by checking queue->has_delay
    and current chunk has msg_more flag.

2). will break on the first chunk with clear in the queue.

but yes, in 2), extra work has to be done than before, but not much.

>
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks bundling.
>
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.
> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.
>
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.
>
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.
understand this.

what I'm worried about is if the msg_more is saved in assoc:
     chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
then when you send a small chkA with MSG_MORE,
the queue will be like:
     chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
because msg_more is saved in assoc, every chunk can look at it.
chk1 - chk4 are big enough to be packed into a packet, they were
not sent last time because a lot of chunks are in the retransmit
queue.

But now even if retransmit queue is null, chk1-chk4 are still blocked.

can you accept that chkA may block the old chunks ?

>
> So just save the last MSG_MORE on the association as I did.
I will think about it, thanks.

>
>         David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-25  8:41                 ` Xin Long
@ 2017-02-27  4:49                   ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-27  4:49 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Sat, Feb 25, 2017 at 4:41 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Fri, Feb 24, 2017 at 6:14 PM, David Laight <David.Laight@aculab.com> wrote:
>>
>> From: Xin Long
>> > Sent: 24 February 2017 06:44
>> ...
>> > > IIRC sctp_packet_can_append_data() is called for the first queued
>> > > data chunk in order to decide whether to generate a message that
>> > > consists only of data chunks.
>> > > If it returns SCTP_XMIT_OK then a message is built collecting the
>> > > rest of the queued data chunks (until the window fills).
>> > >
>> > > So if I send a message with MSG_MORE set (on an idle connection)
>> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
>> > >
>> > > I now send a second small message, this time with MSG_MORE clear.
>> > > The message is queued, then the code looks to see if it can send anything.
>> > >
>> > > sctp_packet_can_append_data() is called for the first queued chunk.
>> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
>> > > message is built.
>> > > The second message isn't even looked at.
>> > You're right. I can see the problem now.
>> >
>> > What I expected is it should work like:
>> >
>> > 1, send 3 small chunks with MSG_MORE set, the queue is:
>> >   chk3 [set] -> chk2 [set] -> chk1 [set]
>>
>> Strange way to write a queue! chk1 points to chk2 :-)
> haha, just  a model.
>
>>
>> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
>> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>>
>> I don't think processing the entire queue is a good idea.
>> Both from execution time and the effects on the data cache.
>> The SCTP code is horrid enough as it is.
> you check the codes in last email, it's not processing the entire queue.
>
> 1). only when queue has delay chunk inside by checking queue->has_delay
>     and current chunk has msg_more flag.
>
> 2). will break on the first chunk with clear in the queue.
>
> but yes, in 2), extra work has to be done than before, but not much.
>
>>
>> > 3. then if user send more small chunks with MSG_MORE set,
>> > the queue is like:
>> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > so that the new small chunks' flag will not affect the other chunks bundling.
>>
>> That isn't really necessary.
>> The user can't expect to have absolute control over which chunks get bundled
>> together.
>> If the above chunks still aren't big enough to fill a frame the code might
>> as well wait for the next chunk instead of building a packet that contains
>> chk1 through to chkB.
>>
>> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
>> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
>> queued chunks will be sent.
>>
>> So immediately after your (3) the application is expected to send a chunk
>> with MSG_MORE clear - at that point all the queued chunks can be sent in
>> a single packet.
> understand this.
>
> what I'm worried about is if the msg_more is saved in assoc:
>      chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> then when you send a small chkA with MSG_MORE,
> the queue will be like:
>      chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
> because msg_more is saved in assoc, every chunk can look at it.
> chk1 - chk4 are big enough to be packed into a packet, they were
> not sent last time because a lot of chunks are in the retransmit
> queue.
>
> But now even if retransmit queue is null, chk1-chk4 are still blocked.
>
> can you accept that chkA may block the old chunks ?
even also block the retransmit chunks.

>
>>
>> So just save the last MSG_MORE on the association as I did.
> I will think about it, thanks.
>
>>
>>         David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-27  4:49                   ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-02-27  4:49 UTC (permalink / raw)
  To: David Laight
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

On Sat, Feb 25, 2017 at 4:41 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Fri, Feb 24, 2017 at 6:14 PM, David Laight <David.Laight@aculab.com> wrote:
>>
>> From: Xin Long
>> > Sent: 24 February 2017 06:44
>> ...
>> > > IIRC sctp_packet_can_append_data() is called for the first queued
>> > > data chunk in order to decide whether to generate a message that
>> > > consists only of data chunks.
>> > > If it returns SCTP_XMIT_OK then a message is built collecting the
>> > > rest of the queued data chunks (until the window fills).
>> > >
>> > > So if I send a message with MSG_MORE set (on an idle connection)
>> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
>> > >
>> > > I now send a second small message, this time with MSG_MORE clear.
>> > > The message is queued, then the code looks to see if it can send anything.
>> > >
>> > > sctp_packet_can_append_data() is called for the first queued chunk.
>> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
>> > > message is built.
>> > > The second message isn't even looked at.
>> > You're right. I can see the problem now.
>> >
>> > What I expected is it should work like:
>> >
>> > 1, send 3 small chunks with MSG_MORE set, the queue is:
>> >   chk3 [set] -> chk2 [set] -> chk1 [set]
>>
>> Strange way to write a queue! chk1 points to chk2 :-)
> haha, just  a model.
>
>>
>> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
>> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>>
>> I don't think processing the entire queue is a good idea.
>> Both from execution time and the effects on the data cache.
>> The SCTP code is horrid enough as it is.
> you check the codes in last email, it's not processing the entire queue.
>
> 1). only when queue has delay chunk inside by checking queue->has_delay
>     and current chunk has msg_more flag.
>
> 2). will break on the first chunk with clear in the queue.
>
> but yes, in 2), extra work has to be done than before, but not much.
>
>>
>> > 3. then if user send more small chunks with MSG_MORE set,
>> > the queue is like:
>> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > so that the new small chunks' flag will not affect the other chunks bundling.
>>
>> That isn't really necessary.
>> The user can't expect to have absolute control over which chunks get bundled
>> together.
>> If the above chunks still aren't big enough to fill a frame the code might
>> as well wait for the next chunk instead of building a packet that contains
>> chk1 through to chkB.
>>
>> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
>> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
>> queued chunks will be sent.
>>
>> So immediately after your (3) the application is expected to send a chunk
>> with MSG_MORE clear - at that point all the queued chunks can be sent in
>> a single packet.
> understand this.
>
> what I'm worried about is if the msg_more is saved in assoc:
>      chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> then when you send a small chkA with MSG_MORE,
> the queue will be like:
>      chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
> because msg_more is saved in assoc, every chunk can look at it.
> chk1 - chk4 are big enough to be packed into a packet, they were
> not sent last time because a lot of chunks are in the retransmit
> queue.
>
> But now even if retransmit queue is null, chk1-chk4 are still blocked.
>
> can you accept that chkA may block the old chunks ?
even also block the retransmit chunks.

>
>>
>> So just save the last MSG_MORE on the association as I did.
> I will think about it, thanks.
>
>>
>>         David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-27  4:49                   ` Xin Long
@ 2017-02-27 10:48                     ` David Laight
  -1 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-27 10:48 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

From: Xin Long
> Sent: 27 February 2017 04:49
...
> > what I'm worried about is if the msg_more is saved in assoc:
> >      chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > then when you send a small chkA with MSG_MORE,
> > the queue will be like:
> >      chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
> > because msg_more is saved in assoc, every chunk can look at it.
> > chk1 - chk4 are big enough to be packed into a packet, they were
> > not sent last time because a lot of chunks are in the retransmit
> > queue.

Or just waiting for transmit window space.

> > But now even if retransmit queue is null, chk1-chk4 are still blocked.
> >
> > can you accept that chkA may block the old chunks ?
> even also block the retransmit chunks.

I don't see a problem, the application is about to send another chunk.
It is likely that the whole lot will go in one packet.

If the connection is flow controlled off (even if due to packet loss)
the minor delay waiting for the application won't make any real difference.
Indeed reducing the number of ethernet frames may help increase
overall throughput.

Does a request for chunks to be retransmitted force a response packet be built?
If so it will pick up all the queued data chunks regardless of any saved
MSG_MORE bits.

	David



^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-02-27 10:48                     ` David Laight
  0 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-02-27 10:48 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Neil Horman,
	Vlad Yasevich, davem

RnJvbTogWGluIExvbmcNCj4gU2VudDogMjcgRmVicnVhcnkgMjAxNyAwNDo0OQ0KLi4uDQo+ID4g
d2hhdCBJJ20gd29ycmllZCBhYm91dCBpcyBpZiB0aGUgbXNnX21vcmUgaXMgc2F2ZWQgaW4gYXNz
b2M6DQo+ID4gICAgICBjaGs0W2NsZWFyXSAtPiBjaGszIFtjbGVhcl0gLT4gY2hrMiBbY2xlYXJd
IC0+IGNoazEgW2NsZWFyXQ0KPiA+IHRoZW4gd2hlbiB5b3Ugc2VuZCBhIHNtYWxsIGNoa0Egd2l0
aCBNU0dfTU9SRSwNCj4gPiB0aGUgcXVldWUgd2lsbCBiZSBsaWtlOg0KPiA+ICAgICAgY2hrQSBb
c2V0XSAtPiBjaGs0W3NldF0gLT4gY2hrMyBbc2V0XSAtPiBjaGsyIFtzZXRdIC0+IGNoazEgW3Nl
dF0NCj4gPiBiZWNhdXNlIG1zZ19tb3JlIGlzIHNhdmVkIGluIGFzc29jLCBldmVyeSBjaHVuayBj
YW4gbG9vayBhdCBpdC4NCj4gPiBjaGsxIC0gY2hrNCBhcmUgYmlnIGVub3VnaCB0byBiZSBwYWNr
ZWQgaW50byBhIHBhY2tldCwgdGhleSB3ZXJlDQo+ID4gbm90IHNlbnQgbGFzdCB0aW1lIGJlY2F1
c2UgYSBsb3Qgb2YgY2h1bmtzIGFyZSBpbiB0aGUgcmV0cmFuc21pdA0KPiA+IHF1ZXVlLg0KDQpP
ciBqdXN0IHdhaXRpbmcgZm9yIHRyYW5zbWl0IHdpbmRvdyBzcGFjZS4NCg0KPiA+IEJ1dCBub3cg
ZXZlbiBpZiByZXRyYW5zbWl0IHF1ZXVlIGlzIG51bGwsIGNoazEtY2hrNCBhcmUgc3RpbGwgYmxv
Y2tlZC4NCj4gPg0KPiA+IGNhbiB5b3UgYWNjZXB0IHRoYXQgY2hrQSBtYXkgYmxvY2sgdGhlIG9s
ZCBjaHVua3MgPw0KPiBldmVuIGFsc28gYmxvY2sgdGhlIHJldHJhbnNtaXQgY2h1bmtzLg0KDQpJ
IGRvbid0IHNlZSBhIHByb2JsZW0sIHRoZSBhcHBsaWNhdGlvbiBpcyBhYm91dCB0byBzZW5kIGFu
b3RoZXIgY2h1bmsuDQpJdCBpcyBsaWtlbHkgdGhhdCB0aGUgd2hvbGUgbG90IHdpbGwgZ28gaW4g
b25lIHBhY2tldC4NCg0KSWYgdGhlIGNvbm5lY3Rpb24gaXMgZmxvdyBjb250cm9sbGVkIG9mZiAo
ZXZlbiBpZiBkdWUgdG8gcGFja2V0IGxvc3MpDQp0aGUgbWlub3IgZGVsYXkgd2FpdGluZyBmb3Ig
dGhlIGFwcGxpY2F0aW9uIHdvbid0IG1ha2UgYW55IHJlYWwgZGlmZmVyZW5jZS4NCkluZGVlZCBy
ZWR1Y2luZyB0aGUgbnVtYmVyIG9mIGV0aGVybmV0IGZyYW1lcyBtYXkgaGVscCBpbmNyZWFzZQ0K
b3ZlcmFsbCB0aHJvdWdocHV0Lg0KDQpEb2VzIGEgcmVxdWVzdCBmb3IgY2h1bmtzIHRvIGJlIHJl
dHJhbnNtaXR0ZWQgZm9yY2UgYSByZXNwb25zZSBwYWNrZXQgYmUgYnVpbHQ/DQpJZiBzbyBpdCB3
aWxsIHBpY2sgdXAgYWxsIHRoZSBxdWV1ZWQgZGF0YSBjaHVua3MgcmVnYXJkbGVzcyBvZiBhbnkg
c2F2ZWQNCk1TR19NT1JFIGJpdHMuDQoNCglEYXZpZA0KDQoNCg=

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-02-24 10:14               ` David Laight
@ 2017-03-21 22:04                 ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-03-21 22:04 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

Hi,

On Fri, Feb 24, 2017 at 10:14:09AM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> > 
> > What I expected is it should work like:
> > 
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
> 
> Strange way to write a queue! chk1 points to chk2 :-)

Strictly speaking, it's actually both together, as it's a double-linked
list. :-)

> 
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> 
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.

It won't be processing the entire queue if not needed, and it will only
process it on the last sendmsg() call. As the list is double-linked, it
can walk backwards as necessary and stop just at the right point.  So
this doesn't imply on any quadratic or exponential factor here, but
linear and only if/when finishing the MSG_MORE block.

If the application is not using MSG_MORE, impact is zero.

> The SCTP code is horrid enough as it is.
> 
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks bundling.
> 
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.

So...?
I mean, I'm okay with that but that doesn't explain why we can't do as
Xin proposed on previous email here.

> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.

Our expectations are the same and that's what the proposed solution also
achieves, no?

> 
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.

With the fix proposed by Xin, this would be more like: ... all of the
_non-held_ chunks will be sent.
After all, application asked to hold them, for whatever reason it had.

> 
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.

Yes. Isn't that the idea?

> 
> So just save the last MSG_MORE on the association as I did.

I don't see the reason to change that. Your reply seem to reject the
idea but I cannot get the reason why. The solution proposed is more
complex, yes, allows more control, yes, but those aren't real issues
here.

  Marcelo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-21 22:04                 ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-03-21 22:04 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

Hi,

On Fri, Feb 24, 2017 at 10:14:09AM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> > 
> > What I expected is it should work like:
> > 
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
> 
> Strange way to write a queue! chk1 points to chk2 :-)

Strictly speaking, it's actually both together, as it's a double-linked
list. :-)

> 
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> 
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.

It won't be processing the entire queue if not needed, and it will only
process it on the last sendmsg() call. As the list is double-linked, it
can walk backwards as necessary and stop just at the right point.  So
this doesn't imply on any quadratic or exponential factor here, but
linear and only if/when finishing the MSG_MORE block.

If the application is not using MSG_MORE, impact is zero.

> The SCTP code is horrid enough as it is.
> 
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks bundling.
> 
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.

So...?
I mean, I'm okay with that but that doesn't explain why we can't do as
Xin proposed on previous email here.

> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.

Our expectations are the same and that's what the proposed solution also
achieves, no?

> 
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.

With the fix proposed by Xin, this would be more like: ... all of the
_non-held_ chunks will be sent.
After all, application asked to hold them, for whatever reason it had.

> 
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.

Yes. Isn't that the idea?

> 
> So just save the last MSG_MORE on the association as I did.

I don't see the reason to change that. Your reply seem to reject the
idea but I cannot get the reason why. The solution proposed is more
complex, yes, allows more control, yes, but those aren't real issues
here.

  Marcelo


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-21 22:04                 ` Marcelo Ricardo Leitner
  (?)
@ 2017-03-22 14:07                 ` David Laight
  2017-03-22 17:33                     ` 'Marcelo Ricardo Leitner'
  -1 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2017-03-22 14:07 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner'
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

From: Marcelo Ricardo Leitner [mailto:marcelo.leitner@gmail.com]
> Sent: 21 March 2017 22:04
> Hi,
...
> > > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> > >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> >
> > I don't think processing the entire queue is a good idea.
> > Both from execution time and the effects on the data cache.
> 
> It won't be processing the entire queue if not needed, and it will only
> process it on the last sendmsg() call. As the list is double-linked, it
> can walk backwards as necessary and stop just at the right point.  So
> this doesn't imply on any quadratic or exponential factor here, but
> linear and only if/when finishing the MSG_MORE block.
> 
> If the application is not using MSG_MORE, impact is zero.
> 
> > The SCTP code is horrid enough as it is.
> >
> > > 3. then if user send more small chunks with MSG_MORE set,
> > > the queue is like:
> > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > so that the new small chunks' flag will not affect the other chunks bundling.
> >
> > That isn't really necessary.
> > The user can't expect to have absolute control over which chunks get bundled
> > together.
> 
> So...?
> I mean, I'm okay with that but that doesn't explain why we can't do as
> Xin proposed on previous email here.
> 
> > If the above chunks still aren't big enough to fill a frame the code might
> > as well wait for the next chunk instead of building a packet that contains
> > chk1 through to chkB.
> 
> Our expectations are the same and that's what the proposed solution also
> achieves, no?

Not really.

> > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > queued chunks will be sent.
> 
> With the fix proposed by Xin, this would be more like: ... all of the
> _non-held_ chunks will be sent.
> After all, application asked to hold them, for whatever reason it had.

You are mis-understanding what I think MSG_MORE is for.
It isn't the application saying 'don't send this packet', but rather
'there is no point sending ANY data because I've more data to send'.
There is also the inference that the application will immediately
send the next piece of data.

So it isn't a property of the queued chunk, it is an indication that
the application is going to send more data immediately.

> > So immediately after your (3) the application is expected to send a chunk
> > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > a single packet.
> 
> Yes. Isn't that the idea?
> 
> >
> > So just save the last MSG_MORE on the association as I did.
> 
> I don't see the reason to change that. Your reply seem to reject the
> idea but I cannot get the reason why. The solution proposed is more
> complex, yes, allows more control, yes, but those aren't real issues
> here.

I think you are trying to provide control of chunking that is neither
necessary nor desirable and may give a false indication of what it
might be sensible for the application to have control over.

Regardless of the MSG_MORE flags associated with any specific send()
request there will always be protocol effects (like retransmissions
or flow control 'on') that will generate different 'chunking'.

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-22 14:07                 ` David Laight
@ 2017-03-22 17:33                     ` 'Marcelo Ricardo Leitner'
  0 siblings, 0 replies; 44+ messages in thread
From: 'Marcelo Ricardo Leitner' @ 2017-03-22 17:33 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner [mailto:marcelo.leitner@gmail.com]
> > Sent: 21 March 2017 22:04
> > Hi,
> ...
> > > > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> > > >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > >
> > > I don't think processing the entire queue is a good idea.
> > > Both from execution time and the effects on the data cache.
> > 
> > It won't be processing the entire queue if not needed, and it will only
> > process it on the last sendmsg() call. As the list is double-linked, it
> > can walk backwards as necessary and stop just at the right point.  So
> > this doesn't imply on any quadratic or exponential factor here, but
> > linear and only if/when finishing the MSG_MORE block.
> > 
> > If the application is not using MSG_MORE, impact is zero.
> > 
> > > The SCTP code is horrid enough as it is.
> > >
> > > > 3. then if user send more small chunks with MSG_MORE set,
> > > > the queue is like:
> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > > so that the new small chunks' flag will not affect the other chunks bundling.
> > >
> > > That isn't really necessary.
> > > The user can't expect to have absolute control over which chunks get bundled
> > > together.
> > 
> > So...?
> > I mean, I'm okay with that but that doesn't explain why we can't do as
> > Xin proposed on previous email here.
> > 
> > > If the above chunks still aren't big enough to fill a frame the code might
> > > as well wait for the next chunk instead of building a packet that contains
> > > chk1 through to chkB.
> > 
> > Our expectations are the same and that's what the proposed solution also
> > achieves, no?
> 
> Not really.
> 
> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > > queued chunks will be sent.
> > 
> > With the fix proposed by Xin, this would be more like: ... all of the
> > _non-held_ chunks will be sent.
> > After all, application asked to hold them, for whatever reason it had.
> 
> You are mis-understanding what I think MSG_MORE is for.
> It isn't the application saying 'don't send this packet', but rather
> 'there is no point sending ANY data because I've more data to send'.

I see the difference now, thanks.

> There is also the inference that the application will immediately
> send the next piece of data.
> 
> So it isn't a property of the queued chunk, it is an indication that
> the application is going to send more data immediately.

It would basically avoid sending the chunk immediately when inflight==0
and the rest could stay as is then.
As the application is supposed to send more data immediately and clear
the flag, we can send packets as they become full, and also let normal
Nagle processing be. TCP also has a ceiling on this waiting, 200ms
according to man 7 tcp/TCP_CORK.

> 
> > > So immediately after your (3) the application is expected to send a chunk
> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > > a single packet.
> > 
> > Yes. Isn't that the idea?
> > 
> > >
> > > So just save the last MSG_MORE on the association as I did.
> > 
> > I don't see the reason to change that. Your reply seem to reject the
> > idea but I cannot get the reason why. The solution proposed is more
> > complex, yes, allows more control, yes, but those aren't real issues
> > here.
> 
> I think you are trying to provide control of chunking that is neither
> necessary nor desirable and may give a false indication of what it
> might be sensible for the application to have control over.

Agreed.

> 
> Regardless of the MSG_MORE flags associated with any specific send()
> request there will always be protocol effects (like retransmissions
> or flow control 'on') that will generate different 'chunking'.

Yes, those are the ones that may lead to some confusion on how it
actually works, and mangling them is not really desired for the
sideeffects that it might have.

Sooner or later we could have bug reports like "hey this chunk shouldn't
have been packed with that." if we stick with the initial proposition,
while with David's view, we are only promising to not send packets with
a single chunk and as long as the application send more data fast enough.

David, are we on the same page now? ;-)

Xin, what do you think?

  Marcelo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-22 17:33                     ` 'Marcelo Ricardo Leitner'
  0 siblings, 0 replies; 44+ messages in thread
From: 'Marcelo Ricardo Leitner' @ 2017-03-22 17:33 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner [mailto:marcelo.leitner@gmail.com]
> > Sent: 21 March 2017 22:04
> > Hi,
> ...
> > > > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> > > >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > >
> > > I don't think processing the entire queue is a good idea.
> > > Both from execution time and the effects on the data cache.
> > 
> > It won't be processing the entire queue if not needed, and it will only
> > process it on the last sendmsg() call. As the list is double-linked, it
> > can walk backwards as necessary and stop just at the right point.  So
> > this doesn't imply on any quadratic or exponential factor here, but
> > linear and only if/when finishing the MSG_MORE block.
> > 
> > If the application is not using MSG_MORE, impact is zero.
> > 
> > > The SCTP code is horrid enough as it is.
> > >
> > > > 3. then if user send more small chunks with MSG_MORE set,
> > > > the queue is like:
> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > > so that the new small chunks' flag will not affect the other chunks bundling.
> > >
> > > That isn't really necessary.
> > > The user can't expect to have absolute control over which chunks get bundled
> > > together.
> > 
> > So...?
> > I mean, I'm okay with that but that doesn't explain why we can't do as
> > Xin proposed on previous email here.
> > 
> > > If the above chunks still aren't big enough to fill a frame the code might
> > > as well wait for the next chunk instead of building a packet that contains
> > > chk1 through to chkB.
> > 
> > Our expectations are the same and that's what the proposed solution also
> > achieves, no?
> 
> Not really.
> 
> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > > queued chunks will be sent.
> > 
> > With the fix proposed by Xin, this would be more like: ... all of the
> > _non-held_ chunks will be sent.
> > After all, application asked to hold them, for whatever reason it had.
> 
> You are mis-understanding what I think MSG_MORE is for.
> It isn't the application saying 'don't send this packet', but rather
> 'there is no point sending ANY data because I've more data to send'.

I see the difference now, thanks.

> There is also the inference that the application will immediately
> send the next piece of data.
> 
> So it isn't a property of the queued chunk, it is an indication that
> the application is going to send more data immediately.

It would basically avoid sending the chunk immediately when inflight=0
and the rest could stay as is then.
As the application is supposed to send more data immediately and clear
the flag, we can send packets as they become full, and also let normal
Nagle processing be. TCP also has a ceiling on this waiting, 200ms
according to man 7 tcp/TCP_CORK.

> 
> > > So immediately after your (3) the application is expected to send a chunk
> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > > a single packet.
> > 
> > Yes. Isn't that the idea?
> > 
> > >
> > > So just save the last MSG_MORE on the association as I did.
> > 
> > I don't see the reason to change that. Your reply seem to reject the
> > idea but I cannot get the reason why. The solution proposed is more
> > complex, yes, allows more control, yes, but those aren't real issues
> > here.
> 
> I think you are trying to provide control of chunking that is neither
> necessary nor desirable and may give a false indication of what it
> might be sensible for the application to have control over.

Agreed.

> 
> Regardless of the MSG_MORE flags associated with any specific send()
> request there will always be protocol effects (like retransmissions
> or flow control 'on') that will generate different 'chunking'.

Yes, those are the ones that may lead to some confusion on how it
actually works, and mangling them is not really desired for the
sideeffects that it might have.

Sooner or later we could have bug reports like "hey this chunk shouldn't
have been packed with that." if we stick with the initial proposition,
while with David's view, we are only promising to not send packets with
a single chunk and as long as the application send more data fast enough.

David, are we on the same page now? ;-)

Xin, what do you think?

  Marcelo


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-22 17:33                     ` 'Marcelo Ricardo Leitner'
@ 2017-03-23  4:35                       ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-03-23  4:35 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
>> From: Marcelo Ricardo Leitner [mailto:marcelo.leitner@gmail.com]
>> > Sent: 21 March 2017 22:04
>> > Hi,
>> ...
>> > > > 2. send 1 more chunk with MSG_MORE clear, the queue is:
>> > > >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > >
>> > > I don't think processing the entire queue is a good idea.
>> > > Both from execution time and the effects on the data cache.
>> >
>> > It won't be processing the entire queue if not needed, and it will only
>> > process it on the last sendmsg() call. As the list is double-linked, it
>> > can walk backwards as necessary and stop just at the right point.  So
>> > this doesn't imply on any quadratic or exponential factor here, but
>> > linear and only if/when finishing the MSG_MORE block.
>> >
>> > If the application is not using MSG_MORE, impact is zero.
>> >
>> > > The SCTP code is horrid enough as it is.
>> > >
>> > > > 3. then if user send more small chunks with MSG_MORE set,
>> > > > the queue is like:
>> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > > > so that the new small chunks' flag will not affect the other chunks bundling.
>> > >
>> > > That isn't really necessary.
>> > > The user can't expect to have absolute control over which chunks get bundled
>> > > together.
>> >
>> > So...?
>> > I mean, I'm okay with that but that doesn't explain why we can't do as
>> > Xin proposed on previous email here.
>> >
>> > > If the above chunks still aren't big enough to fill a frame the code might
>> > > as well wait for the next chunk instead of building a packet that contains
>> > > chk1 through to chkB.
>> >
>> > Our expectations are the same and that's what the proposed solution also
>> > achieves, no?
>>
>> Not really.
>>
>> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
>> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
>> > > queued chunks will be sent.
>> >
>> > With the fix proposed by Xin, this would be more like: ... all of the
>> > _non-held_ chunks will be sent.
>> > After all, application asked to hold them, for whatever reason it had.
>>
>> You are mis-understanding what I think MSG_MORE is for.
>> It isn't the application saying 'don't send this packet', but rather
>> 'there is no point sending ANY data because I've more data to send'.
>
> I see the difference now, thanks.
>
>> There is also the inference that the application will immediately
>> send the next piece of data.
>>
>> So it isn't a property of the queued chunk, it is an indication that
>> the application is going to send more data immediately.
>
> It would basically avoid sending the chunk immediately when inflight==0
> and the rest could stay as is then.
> As the application is supposed to send more data immediately and clear
> the flag, we can send packets as they become full, and also let normal
> Nagle processing be. TCP also has a ceiling on this waiting, 200ms
> according to man 7 tcp/TCP_CORK.
>
>>
>> > > So immediately after your (3) the application is expected to send a chunk
>> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
>> > > a single packet.
>> >
>> > Yes. Isn't that the idea?
>> >
>> > >
>> > > So just save the last MSG_MORE on the association as I did.
>> >
>> > I don't see the reason to change that. Your reply seem to reject the
>> > idea but I cannot get the reason why. The solution proposed is more
>> > complex, yes, allows more control, yes, but those aren't real issues
>> > here.
>>
>> I think you are trying to provide control of chunking that is neither
>> necessary nor desirable and may give a false indication of what it
>> might be sensible for the application to have control over.
>
> Agreed.
>
>>
>> Regardless of the MSG_MORE flags associated with any specific send()
>> request there will always be protocol effects (like retransmissions
>> or flow control 'on') that will generate different 'chunking'.
>
> Yes, those are the ones that may lead to some confusion on how it
> actually works, and mangling them is not really desired for the
> sideeffects that it might have.
>
> Sooner or later we could have bug reports like "hey this chunk shouldn't
> have been packed with that." if we stick with the initial proposition,
> while with David's view, we are only promising to not send packets with
> a single chunk and as long as the application send more data fast enough.
>
> David, are we on the same page now? ;-)
>
> Xin, what do you think?
If we insist that MSG_MORE means not to send  ANY data, I compromise.
does ANY include retransmission DATA? should MSG_MORE block
retransmission ?

>
>   Marcelo
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-23  4:35                       ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-03-23  4:35 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
>> From: Marcelo Ricardo Leitner [mailto:marcelo.leitner@gmail.com]
>> > Sent: 21 March 2017 22:04
>> > Hi,
>> ...
>> > > > 2. send 1 more chunk with MSG_MORE clear, the queue is:
>> > > >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > >
>> > > I don't think processing the entire queue is a good idea.
>> > > Both from execution time and the effects on the data cache.
>> >
>> > It won't be processing the entire queue if not needed, and it will only
>> > process it on the last sendmsg() call. As the list is double-linked, it
>> > can walk backwards as necessary and stop just at the right point.  So
>> > this doesn't imply on any quadratic or exponential factor here, but
>> > linear and only if/when finishing the MSG_MORE block.
>> >
>> > If the application is not using MSG_MORE, impact is zero.
>> >
>> > > The SCTP code is horrid enough as it is.
>> > >
>> > > > 3. then if user send more small chunks with MSG_MORE set,
>> > > > the queue is like:
>> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>> > > > so that the new small chunks' flag will not affect the other chunks bundling.
>> > >
>> > > That isn't really necessary.
>> > > The user can't expect to have absolute control over which chunks get bundled
>> > > together.
>> >
>> > So...?
>> > I mean, I'm okay with that but that doesn't explain why we can't do as
>> > Xin proposed on previous email here.
>> >
>> > > If the above chunks still aren't big enough to fill a frame the code might
>> > > as well wait for the next chunk instead of building a packet that contains
>> > > chk1 through to chkB.
>> >
>> > Our expectations are the same and that's what the proposed solution also
>> > achieves, no?
>>
>> Not really.
>>
>> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
>> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
>> > > queued chunks will be sent.
>> >
>> > With the fix proposed by Xin, this would be more like: ... all of the
>> > _non-held_ chunks will be sent.
>> > After all, application asked to hold them, for whatever reason it had.
>>
>> You are mis-understanding what I think MSG_MORE is for.
>> It isn't the application saying 'don't send this packet', but rather
>> 'there is no point sending ANY data because I've more data to send'.
>
> I see the difference now, thanks.
>
>> There is also the inference that the application will immediately
>> send the next piece of data.
>>
>> So it isn't a property of the queued chunk, it is an indication that
>> the application is going to send more data immediately.
>
> It would basically avoid sending the chunk immediately when inflight=0
> and the rest could stay as is then.
> As the application is supposed to send more data immediately and clear
> the flag, we can send packets as they become full, and also let normal
> Nagle processing be. TCP also has a ceiling on this waiting, 200ms
> according to man 7 tcp/TCP_CORK.
>
>>
>> > > So immediately after your (3) the application is expected to send a chunk
>> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
>> > > a single packet.
>> >
>> > Yes. Isn't that the idea?
>> >
>> > >
>> > > So just save the last MSG_MORE on the association as I did.
>> >
>> > I don't see the reason to change that. Your reply seem to reject the
>> > idea but I cannot get the reason why. The solution proposed is more
>> > complex, yes, allows more control, yes, but those aren't real issues
>> > here.
>>
>> I think you are trying to provide control of chunking that is neither
>> necessary nor desirable and may give a false indication of what it
>> might be sensible for the application to have control over.
>
> Agreed.
>
>>
>> Regardless of the MSG_MORE flags associated with any specific send()
>> request there will always be protocol effects (like retransmissions
>> or flow control 'on') that will generate different 'chunking'.
>
> Yes, those are the ones that may lead to some confusion on how it
> actually works, and mangling them is not really desired for the
> sideeffects that it might have.
>
> Sooner or later we could have bug reports like "hey this chunk shouldn't
> have been packed with that." if we stick with the initial proposition,
> while with David's view, we are only promising to not send packets with
> a single chunk and as long as the application send more data fast enough.
>
> David, are we on the same page now? ;-)
>
> Xin, what do you think?
If we insist that MSG_MORE means not to send  ANY data, I compromise.
does ANY include retransmission DATA? should MSG_MORE block
retransmission ?

>
>   Marcelo
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-23  4:35                       ` Xin Long
@ 2017-03-23 16:42                         ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-03-23 16:42 UTC (permalink / raw)
  To: Xin Long
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Mar 23, 2017 at 12:35:46PM +0800, Xin Long wrote:
> On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
> >> Regardless of the MSG_MORE flags associated with any specific send()
> >> request there will always be protocol effects (like retransmissions
> >> or flow control 'on') that will generate different 'chunking'.
> >
> > Yes, those are the ones that may lead to some confusion on how it
> > actually works, and mangling them is not really desired for the
> > sideeffects that it might have.
> >
> > Sooner or later we could have bug reports like "hey this chunk shouldn't
> > have been packed with that." if we stick with the initial proposition,
> > while with David's view, we are only promising to not send packets with
> > a single chunk and as long as the application send more data fast enough.
> >
> > David, are we on the same page now? ;-)
> >
> > Xin, what do you think?
> If we insist that MSG_MORE means not to send  ANY data, I compromise.
> does ANY include retransmission DATA? should MSG_MORE block
> retransmission ?

That's not really what he meant by that, I think. That "ANY" in there is
a way to refer to the entire buf and not that msg sendmsg is sending.
Later I explained what I got from his explanation, which should be more
like:
"If MSG_MORE was used, and there are no packets in flight, do not send a
packet right away because the application is going to send more data."
Would have to handle the (Not-)Nagle situation too:
"If not using Nagle and using MSG_MORE, try to not generate a packet
right away." (because this may send packets with a single chunk even if
in_flight != 0)
In both cases, if the flush is generated by other triggers, it's okay.

Because if there are chunks already queued, they will be sent as soon as
in_flight reaches 0 or some other break is lifted (flow control).
Holding the chunk that was queued with MSG_MORE and sending a partial
packet in this case because of MSG_MORE is not good, it's possibly not
saving anything.

  Marcelo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-23 16:42                         ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 44+ messages in thread
From: Marcelo Ricardo Leitner @ 2017-03-23 16:42 UTC (permalink / raw)
  To: Xin Long
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Thu, Mar 23, 2017 at 12:35:46PM +0800, Xin Long wrote:
> On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
> >> Regardless of the MSG_MORE flags associated with any specific send()
> >> request there will always be protocol effects (like retransmissions
> >> or flow control 'on') that will generate different 'chunking'.
> >
> > Yes, those are the ones that may lead to some confusion on how it
> > actually works, and mangling them is not really desired for the
> > sideeffects that it might have.
> >
> > Sooner or later we could have bug reports like "hey this chunk shouldn't
> > have been packed with that." if we stick with the initial proposition,
> > while with David's view, we are only promising to not send packets with
> > a single chunk and as long as the application send more data fast enough.
> >
> > David, are we on the same page now? ;-)
> >
> > Xin, what do you think?
> If we insist that MSG_MORE means not to send  ANY data, I compromise.
> does ANY include retransmission DATA? should MSG_MORE block
> retransmission ?

That's not really what he meant by that, I think. That "ANY" in there is
a way to refer to the entire buf and not that msg sendmsg is sending.
Later I explained what I got from his explanation, which should be more
like:
"If MSG_MORE was used, and there are no packets in flight, do not send a
packet right away because the application is going to send more data."
Would have to handle the (Not-)Nagle situation too:
"If not using Nagle and using MSG_MORE, try to not generate a packet
right away." (because this may send packets with a single chunk even if
in_flight != 0)
In both cases, if the flush is generated by other triggers, it's okay.

Because if there are chunks already queued, they will be sent as soon as
in_flight reaches 0 or some other break is lifted (flow control).
Holding the chunk that was queued with MSG_MORE and sending a partial
packet in this case because of MSG_MORE is not good, it's possibly not
saving anything.

  Marcelo


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-23 16:42                         ` Marcelo Ricardo Leitner
@ 2017-03-24 16:09                           ` Xin Long
  -1 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-03-24 16:09 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Mar 24, 2017 at 12:42 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 12:35:46PM +0800, Xin Long wrote:
>> On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
>> >> Regardless of the MSG_MORE flags associated with any specific send()
>> >> request there will always be protocol effects (like retransmissions
>> >> or flow control 'on') that will generate different 'chunking'.
>> >
>> > Yes, those are the ones that may lead to some confusion on how it
>> > actually works, and mangling them is not really desired for the
>> > sideeffects that it might have.
>> >
>> > Sooner or later we could have bug reports like "hey this chunk shouldn't
>> > have been packed with that." if we stick with the initial proposition,
>> > while with David's view, we are only promising to not send packets with
>> > a single chunk and as long as the application send more data fast enough.
>> >
>> > David, are we on the same page now? ;-)
>> >
>> > Xin, what do you think?
>> If we insist that MSG_MORE means not to send  ANY data, I compromise.
>> does ANY include retransmission DATA? should MSG_MORE block
>> retransmission ?
>
> That's not really what he meant by that, I think. That "ANY" in there is
> a way to refer to the entire buf and not that msg sendmsg is sending.
> Later I explained what I got from his explanation, which should be more
> like:
> "If MSG_MORE was used, and there are no packets in flight, do not send a
> packet right away because the application is going to send more data."
> Would have to handle the (Not-)Nagle situation too:
> "If not using Nagle and using MSG_MORE, try to not generate a packet
> right away." (because this may send packets with a single chunk even if
> in_flight != 0)
> In both cases, if the flush is generated by other triggers, it's okay.
>
> Because if there are chunks already queued, they will be sent as soon as
> in_flight reaches 0 or some other break is lifted (flow control).
> Holding the chunk that was queued with MSG_MORE and sending a partial
> packet in this case because of MSG_MORE is not good, it's possibly not
> saving anything.

Makes sence, thanks for making this clear, will post a new fix.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-24 16:09                           ` Xin Long
  0 siblings, 0 replies; 44+ messages in thread
From: Xin Long @ 2017-03-24 16:09 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Fri, Mar 24, 2017 at 12:42 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 12:35:46PM +0800, Xin Long wrote:
>> On Thu, Mar 23, 2017 at 1:33 AM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Wed, Mar 22, 2017 at 02:07:37PM +0000, David Laight wrote:
>> >> Regardless of the MSG_MORE flags associated with any specific send()
>> >> request there will always be protocol effects (like retransmissions
>> >> or flow control 'on') that will generate different 'chunking'.
>> >
>> > Yes, those are the ones that may lead to some confusion on how it
>> > actually works, and mangling them is not really desired for the
>> > sideeffects that it might have.
>> >
>> > Sooner or later we could have bug reports like "hey this chunk shouldn't
>> > have been packed with that." if we stick with the initial proposition,
>> > while with David's view, we are only promising to not send packets with
>> > a single chunk and as long as the application send more data fast enough.
>> >
>> > David, are we on the same page now? ;-)
>> >
>> > Xin, what do you think?
>> If we insist that MSG_MORE means not to send  ANY data, I compromise.
>> does ANY include retransmission DATA? should MSG_MORE block
>> retransmission ?
>
> That's not really what he meant by that, I think. That "ANY" in there is
> a way to refer to the entire buf and not that msg sendmsg is sending.
> Later I explained what I got from his explanation, which should be more
> like:
> "If MSG_MORE was used, and there are no packets in flight, do not send a
> packet right away because the application is going to send more data."
> Would have to handle the (Not-)Nagle situation too:
> "If not using Nagle and using MSG_MORE, try to not generate a packet
> right away." (because this may send packets with a single chunk even if
> in_flight != 0)
> In both cases, if the flush is generated by other triggers, it's okay.
>
> Because if there are chunks already queued, they will be sent as soon as
> in_flight reaches 0 or some other break is lifted (flow control).
> Holding the chunk that was queued with MSG_MORE and sending a partial
> packet in this case because of MSG_MORE is not good, it's possibly not
> saving anything.

Makes sence, thanks for making this clear, will post a new fix.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-23 16:42                         ` Marcelo Ricardo Leitner
  (?)
  (?)
@ 2017-03-24 17:38                         ` David Laight
  -1 siblings, 0 replies; 44+ messages in thread
From: David Laight @ 2017-03-24 17:38 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner', Xin Long
  Cc: network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

From: Marcelo Ricardo Leitner
> Sent: 23 March 2017 16:42
...
> > > David, are we on the same page now? ;-)

Probably...

> > > Xin, what do you think?
> > If we insist that MSG_MORE means not to send  ANY data, I compromise.
> > does ANY include retransmission DATA? should MSG_MORE block
> > retransmission ?
> 
> That's not really what he meant by that, I think. That "ANY" in there is
> a way to refer to the entire buf and not that msg sendmsg is sending.
> Later I explained what I got from his explanation, which should be more
> like:
> "If MSG_MORE was used, and there are no packets in flight, do not send a
> packet right away because the application is going to send more data."
> Would have to handle the (Not-)Nagle situation too:
> "If not using Nagle and using MSG_MORE, try to not generate a packet
> right away." (because this may send packets with a single chunk even if
> in_flight != 0)
> In both cases, if the flush is generated by other triggers, it's okay.
> 
> Because if there are chunks already queued, they will be sent as soon as
> in_flight reaches 0 or some other break is lifted (flow control).
> Holding the chunk that was queued with MSG_MORE and sending a partial
> packet in this case because of MSG_MORE is not good, it's possibly not
> saving anything.

It is also worth remembering that this is all about whether the first
chunk in an ethernet frame is a data chunk.
If a frame is being sent for some other reason (eg ack or heartbeat)
then it will collect queued data chunks.

I've seen hearbeats collect data chunks, I've not checked that this
doesn't happen for heartbeats that are probing IP addresses.

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-21 22:04                 ` Marcelo Ricardo Leitner
  (?)
  (?)
@ 2017-03-28 10:29                 ` David Laight
  2017-03-28 18:12                     ` 'Marcelo Ricardo Leitner'
  -1 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2017-03-28 10:29 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner'
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

From: Marcelo Ricardo Leitner
> Sent: 21 March 2017 22:04
...
> > > 3. then if user send more small chunks with MSG_MORE set,
> > > the queue is like:
> > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > so that the new small chunks' flag will not affect the other chunks bundling.
> >
> > That isn't really necessary.
> > The user can't expect to have absolute control over which chunks get bundled
> > together.
> 
> So...?
> I mean, I'm okay with that but that doesn't explain why we can't do as
> Xin proposed on previous email here.
> 
> > If the above chunks still aren't big enough to fill a frame the code might
> > as well wait for the next chunk instead of building a packet that contains
> > chk1 through to chkB.
> 
> Our expectations are the same and that's what the proposed solution also
> achieves, no?
> 
> >
> > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > queued chunks will be sent.
> 
> With the fix proposed by Xin, this would be more like: ... all of the
> _non-held_ chunks will be sent.
> After all, application asked to hold them, for whatever reason it had.
> 
> >
> > So immediately after your (3) the application is expected to send a chunk
> > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > a single packet.
> 
> Yes. Isn't that the idea?
> 
> >
> > So just save the last MSG_MORE on the association as I did.
> 
> I don't see the reason to change that. Your reply seem to reject the
> idea but I cannot get the reason why. The solution proposed is more
> complex, yes, allows more control, yes, but those aren't real issues
> here.

I think I've realised why we were disagreeing....

For TCP MSG_MORE was implemented as an alternative to 'corking' for
allocations that used multiple send() calls to send a single application
level message (eg those that send a length and data separately).
The real solution is to use sendv() (or writev() if sendv() doesn't exist).

SCTP is different, every send() generates a separate data chunk.
The problem in SCTP is that if an application has more than one data
chunk to send it needs to stop the kernel sending anything until
it has sent all the data chunks - otherwise they all go out in
separate ethernet packets.

A TCP application can avoid this by doing a single send() for all the
data. This doesn't work for SCTP because it would generate a single
data chunk.

	David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
  2017-03-28 10:29                 ` David Laight
@ 2017-03-28 18:12                     ` 'Marcelo Ricardo Leitner'
  0 siblings, 0 replies; 44+ messages in thread
From: 'Marcelo Ricardo Leitner' @ 2017-03-28 18:12 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Tue, Mar 28, 2017 at 10:29:27AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 21 March 2017 22:04
> ...
> > > > 3. then if user send more small chunks with MSG_MORE set,
> > > > the queue is like:
> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > > so that the new small chunks' flag will not affect the other chunks bundling.
> > >
> > > That isn't really necessary.
> > > The user can't expect to have absolute control over which chunks get bundled
> > > together.
> > 
> > So...?
> > I mean, I'm okay with that but that doesn't explain why we can't do as
> > Xin proposed on previous email here.
> > 
> > > If the above chunks still aren't big enough to fill a frame the code might
> > > as well wait for the next chunk instead of building a packet that contains
> > > chk1 through to chkB.
> > 
> > Our expectations are the same and that's what the proposed solution also
> > achieves, no?
> > 
> > >
> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > > queued chunks will be sent.
> > 
> > With the fix proposed by Xin, this would be more like: ... all of the
> > _non-held_ chunks will be sent.
> > After all, application asked to hold them, for whatever reason it had.
> > 
> > >
> > > So immediately after your (3) the application is expected to send a chunk
> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > > a single packet.
> > 
> > Yes. Isn't that the idea?
> > 
> > >
> > > So just save the last MSG_MORE on the association as I did.
> > 
> > I don't see the reason to change that. Your reply seem to reject the
> > idea but I cannot get the reason why. The solution proposed is more
> > complex, yes, allows more control, yes, but those aren't real issues
> > here.
> 
> I think I've realised why we were disagreeing....
> 
> For TCP MSG_MORE was implemented as an alternative to 'corking' for
> allocations that used multiple send() calls to send a single application
> level message (eg those that send a length and data separately).
> The real solution is to use sendv() (or writev() if sendv() doesn't exist).
> 
> SCTP is different, every send() generates a separate data chunk.
> The problem in SCTP is that if an application has more than one data
> chunk to send it needs to stop the kernel sending anything until
> it has sent all the data chunks - otherwise they all go out in
> separate ethernet packets.
> 
> A TCP application can avoid this by doing a single send() for all the
> data. This doesn't work for SCTP because it would generate a single
> data chunk.

Yep, it was based on TCP one since day 0. There is also the way MSG_MORE
works for UDP, in which it appends the subsequent write to the same
datagram. But then, for SCTP, we should use EOR flag, as Michael Tuexen
had indicated here on the list a few months ago:
"""
What about adding support for the explicit EOR mode as specified in
https://tools.ietf.org/html/rfc6458#section-8.1.26
"""
Thus why it was very based on TCP implementation, so maybe in the future
we could have both options available for SCTP.

  Marcelo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE
@ 2017-03-28 18:12                     ` 'Marcelo Ricardo Leitner'
  0 siblings, 0 replies; 44+ messages in thread
From: 'Marcelo Ricardo Leitner' @ 2017-03-28 18:12 UTC (permalink / raw)
  To: David Laight
  Cc: 'Xin Long',
	network dev, linux-sctp, Neil Horman, Vlad Yasevich, davem

On Tue, Mar 28, 2017 at 10:29:27AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 21 March 2017 22:04
> ...
> > > > 3. then if user send more small chunks with MSG_MORE set,
> > > > the queue is like:
> > > >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> > > > so that the new small chunks' flag will not affect the other chunks bundling.
> > >
> > > That isn't really necessary.
> > > The user can't expect to have absolute control over which chunks get bundled
> > > together.
> > 
> > So...?
> > I mean, I'm okay with that but that doesn't explain why we can't do as
> > Xin proposed on previous email here.
> > 
> > > If the above chunks still aren't big enough to fill a frame the code might
> > > as well wait for the next chunk instead of building a packet that contains
> > > chk1 through to chkB.
> > 
> > Our expectations are the same and that's what the proposed solution also
> > achieves, no?
> > 
> > >
> > > Remember you'll only get a queued chunk with MSG_MORE clear if data can't be sent.
> > > As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> > > queued chunks will be sent.
> > 
> > With the fix proposed by Xin, this would be more like: ... all of the
> > _non-held_ chunks will be sent.
> > After all, application asked to hold them, for whatever reason it had.
> > 
> > >
> > > So immediately after your (3) the application is expected to send a chunk
> > > with MSG_MORE clear - at that point all the queued chunks can be sent in
> > > a single packet.
> > 
> > Yes. Isn't that the idea?
> > 
> > >
> > > So just save the last MSG_MORE on the association as I did.
> > 
> > I don't see the reason to change that. Your reply seem to reject the
> > idea but I cannot get the reason why. The solution proposed is more
> > complex, yes, allows more control, yes, but those aren't real issues
> > here.
> 
> I think I've realised why we were disagreeing....
> 
> For TCP MSG_MORE was implemented as an alternative to 'corking' for
> allocations that used multiple send() calls to send a single application
> level message (eg those that send a length and data separately).
> The real solution is to use sendv() (or writev() if sendv() doesn't exist).
> 
> SCTP is different, every send() generates a separate data chunk.
> The problem in SCTP is that if an application has more than one data
> chunk to send it needs to stop the kernel sending anything until
> it has sent all the data chunks - otherwise they all go out in
> separate ethernet packets.
> 
> A TCP application can avoid this by doing a single send() for all the
> data. This doesn't work for SCTP because it would generate a single
> data chunk.

Yep, it was based on TCP one since day 0. There is also the way MSG_MORE
works for UDP, in which it appends the subsequent write to the same
datagram. But then, for SCTP, we should use EOR flag, as Michael Tuexen
had indicated here on the list a few months ago:
"""
What about adding support for the explicit EOR mode as specified in
https://tools.ietf.org/html/rfc6458#section-8.1.26
"""
Thus why it was very based on TCP implementation, so maybe in the future
we could have both options available for SCTP.

  Marcelo


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-03-28 18:13 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-18 17:52 [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg Xin Long
2017-02-18 17:52 ` Xin Long
2017-02-18 17:52 ` [PATCH net-next 1/2] sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING Xin Long
2017-02-18 17:52   ` Xin Long
2017-02-18 17:52   ` [PATCH net-next 2/2] sctp: add support for MSG_MORE Xin Long
2017-02-18 17:52     ` Xin Long
2017-02-21 14:27     ` David Laight
2017-02-23  3:45       ` Xin Long
2017-02-23  3:45         ` Xin Long
2017-02-23 16:04         ` David Laight
2017-02-23 16:04           ` David Laight
2017-02-23 17:40           ` Marcelo Ricardo Leitner
2017-02-23 17:40             ` Marcelo Ricardo Leitner
2017-02-23 18:16             ` Xin Long
2017-02-23 18:16               ` Xin Long
2017-02-23 18:39               ` Marcelo Ricardo Leitner
2017-02-23 18:39                 ` Marcelo Ricardo Leitner
2017-02-24  6:43           ` Xin Long
2017-02-24  6:43             ` Xin Long
2017-02-24 10:14             ` David Laight
2017-02-24 10:14               ` David Laight
2017-02-25  8:41               ` Xin Long
2017-02-25  8:41                 ` Xin Long
2017-02-27  4:49                 ` Xin Long
2017-02-27  4:49                   ` Xin Long
2017-02-27 10:48                   ` David Laight
2017-02-27 10:48                     ` David Laight
2017-03-21 22:04               ` Marcelo Ricardo Leitner
2017-03-21 22:04                 ` Marcelo Ricardo Leitner
2017-03-22 14:07                 ` David Laight
2017-03-22 17:33                   ` 'Marcelo Ricardo Leitner'
2017-03-22 17:33                     ` 'Marcelo Ricardo Leitner'
2017-03-23  4:35                     ` Xin Long
2017-03-23  4:35                       ` Xin Long
2017-03-23 16:42                       ` Marcelo Ricardo Leitner
2017-03-23 16:42                         ` Marcelo Ricardo Leitner
2017-03-24 16:09                         ` Xin Long
2017-03-24 16:09                           ` Xin Long
2017-03-24 17:38                         ` David Laight
2017-03-28 10:29                 ` David Laight
2017-03-28 18:12                   ` 'Marcelo Ricardo Leitner'
2017-03-28 18:12                     ` 'Marcelo Ricardo Leitner'
2017-02-20 15:26 ` [PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg David Miller
2017-02-20 15:26   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.