All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] Re: [RFC 4/5] mptcp: update mptcp ack sequence from work queue
@ 2020-02-13 14:37 Paolo Abeni
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2020-02-13 14:37 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

On Thu, 2020-02-13 at 11:27 +0100, Florian Westphal wrote:
> +static void __mptcp_move_skbs(struct mptcp_sock *msk)
> +{
> +	struct mptcp_steal_arg arg = {
> +		.msk = msk,
> +		.fin = false,
> +	};
> +	read_descriptor_t desc = {
> +		.arg.data = &arg,
> +	};
> +
> +	for (;;) {
> +		struct mptcp_subflow_context *subflow;
> +		bool more_data_avail;
> +
> +		arg.ssk = mptcp_subflow_recv_lookup(msk);
> +		if (!arg.ssk)
> +			break;
> +
> +		subflow = mptcp_subflow_ctx(arg.ssk);
> +
> +		lock_sock(arg.ssk);
> +
> +		do {
> +			u32 map_remaining;
> +			int bytes_read;
> +
> +			/* try to read as much data as available */
> +			map_remaining = subflow->map_data_len -
> +					mptcp_subflow_get_map_offset(subflow);
> +			desc.count = map_remaining;
> +
> +			bytes_read = tcp_read_sock(arg.ssk, &desc, mptcp_steal_actor);
> +			if (bytes_read <= 0) {
> +				release_sock(arg.ssk);
> +				return;
> +			}
> +
> +			if (arg.fin) {
> +				struct tcp_sock *tp = tcp_sk(arg.ssk);
> +				u32 seq = READ_ONCE(tp->copied_seq);
> +
> +				WRITE_ONCE(tp->copied_seq, seq + 1);
> +			}
> +
> +			more_data_avail = mptcp_subflow_data_available(arg.ssk);
> +		} while (more_data_avail);
> +
> +		release_sock(arg.ssk);
> +	}
> +}

Overall the above move the pending skbs to the msk receive queue,
increment ssk copied_seq and eventually let TCP send ack via
tcp_cleanup_rbuf().

I think we could avoid touching tcp_read_sock() in patch 2/5 moving the
skb directly from __mptcp_move_skbs() - no call to tcp_read_sock(),
just queue manipulation and copied_seq accounting.

Than __mptcp_move_skbs() will call directly, as needed - need to export
it.

We could additionally clean-up the code calling __mptcp_move_skbs()
from mptcp_recvmsg() before looking into the msk receive queue. Than no
need for the current recvmsg inner loop.

I fear we need some tweek to handle TCP fallback. If I read the RFC
correctly the msk can fallback to TCP even after receiving some DSS
data (e.g. msk receive queue contains some skbs and the msk fallback to
TCP. We can't call anymore sock_recvmsg())

WDYT?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [RFC 4/5] mptcp: update mptcp ack sequence from work queue
@ 2020-02-17 18:27 Christoph Paasch
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Paasch @ 2020-02-17 18:27 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6438 bytes --]

Hello,

On 17/02/20 - 09:40:10, Paolo Abeni wrote:
> On Fri, 2020-02-14 at 17:39 -0800, Mat Martineau wrote:
> > On Thu, 13 Feb 2020, Paolo Abeni wrote:
> > 
> > > On Thu, 2020-02-13 at 11:27 +0100, Florian Westphal wrote:
> > > > +static void __mptcp_move_skbs(struct mptcp_sock *msk)
> > > > +{
> > > > +	struct mptcp_steal_arg arg = {
> > > > +		.msk = msk,
> > > > +		.fin = false,
> > > > +	};
> > > > +	read_descriptor_t desc = {
> > > > +		.arg.data = &arg,
> > > > +	};
> > > > +
> > > > +	for (;;) {
> > > > +		struct mptcp_subflow_context *subflow;
> > > > +		bool more_data_avail;
> > > > +
> > > > +		arg.ssk = mptcp_subflow_recv_lookup(msk);
> > > > +		if (!arg.ssk)
> > > > +			break;
> > > > +
> > > > +		subflow = mptcp_subflow_ctx(arg.ssk);
> > > > +
> > > > +		lock_sock(arg.ssk);
> > > > +
> > > > +		do {
> > > > +			u32 map_remaining;
> > > > +			int bytes_read;
> > > > +
> > > > +			/* try to read as much data as available */
> > > > +			map_remaining = subflow->map_data_len -
> > > > +					mptcp_subflow_get_map_offset(subflow);
> > > > +			desc.count = map_remaining;
> > > > +
> > > > +			bytes_read = tcp_read_sock(arg.ssk, &desc, mptcp_steal_actor);
> > > > +			if (bytes_read <= 0) {
> > > > +				release_sock(arg.ssk);
> > > > +				return;
> > > > +			}
> > > > +
> > > > +			if (arg.fin) {
> > > > +				struct tcp_sock *tp = tcp_sk(arg.ssk);
> > > > +				u32 seq = READ_ONCE(tp->copied_seq);
> > > > +
> > > > +				WRITE_ONCE(tp->copied_seq, seq + 1);
> > > > +			}
> > > > +
> > > > +			more_data_avail = mptcp_subflow_data_available(arg.ssk);
> > > > +		} while (more_data_avail);
> > > > +
> > > > +		release_sock(arg.ssk);
> > > > +	}
> > > > +}
> > > 
> > > Overall the above move the pending skbs to the msk receive queue,
> > > increment ssk copied_seq and eventually let TCP send ack via
> > > tcp_cleanup_rbuf().
> > > 
> > 
> > It's only moving in-order skbs, right?
> > 
> > > I think we could avoid touching tcp_read_sock() in patch 2/5 moving the
> > > skb directly from __mptcp_move_skbs() - no call to tcp_read_sock(),
> > > just queue manipulation and copied_seq accounting.
> > > 
> > > Than __mptcp_move_skbs() will call directly, as needed - need to export
> > > it.
> > 
> > This makes sense to me. We have to do another layer of reassembly for 
> > MPTCP and we already know the skbs are in-order for the subflow stream, 
> > and there's no need to handle partial skb reads like tcp_read_sock does.
> > 
> > > We could additionally clean-up the code calling __mptcp_move_skbs()
> > > from mptcp_recvmsg() before looking into the msk receive queue. Than no
> > > need for the current recvmsg inner loop.
> > > 
> > > I fear we need some tweek to handle TCP fallback. If I read the RFC
> > > correctly the msk can fallback to TCP even after receiving some DSS
> > > data (e.g. msk receive queue contains some skbs and the msk fallback to
> > > TCP. We can't call anymore sock_recvmsg())
> > 
> > RFC 8684 section 3.7 (Fallback) says that "If a subflow breaks during 
> > operation... the subflow SHOULD be treated as broken and closed with a 
> > RST". Does that cover your concern, or are you referring to a fallback at 
> > connection time?
> 
> uhm... it looks like fallback scenarios are quite rich/complex.
> 
> @Florian, I think this discussion in important, but does not relate
> strictly to this series; the problem is independent, and could be
> addresses separatelly - if we need to address anything ;)
> 
> The scenario I was thinking about is (e.g.):
> 

I'm numbering the steps for easier reference:

> 1. the client send a data segment + DSS mapping
> 2. the server do not send DACKs, just plain TCP acks (because e.g. MPTCP
> options are allowed by middle-box after syn)
> 3. the client should fall-back (we currently don't do any check there,
> so we don't) and sends some additional plain TCP data segments
> 4. the server sends some data seg, and should fall-back for the same
> resons

*if* in Step 2 the server received data without a DSS-option he already fell
back to regular TCP right away. Because at the beginning a sender MUST
include the DSS-options in each segment:

"A sender MUST include a DSS option with data sequence mapping in
 every segment until one of the sent segments has been acknowledged
 with a DSS option containing a Data ACK."

 Nevertheless, the following steps can still happen if the middlebox is just
 in one direction (aka., server to client and thus the DATA_ACK gets
 removed).

> 5. the server tries to read from the msk socket. The older data is in
> the msk receive queue, the newer in the subflow receive queue, if we
> call sock_recvmsg(), that will corrupt the stream.
> 
> I'm unsure if the above wording from 3.7 applies here: it think that
> the sentence "If a subflow breaks during operation" applies to the msk
> socket after that the first data segment is DACK-ed other, otherwise
> the 2 paragraph just before that[1] would be unneeded, right?

Yes, this scenario here is about the start of the connection. and the
paragraph [1] applies, as well as "In the case of such an ACK being received
on the first subflow...".

Thus, once the sender receives the ACK without DATA_ACK, he sends an
infinite mapping option (DSS With data-len == 0) to the server.
That is the indication that it has to fallback. It is guaranteed that all
data is in-order because the client should not yet have established a second
subflow or sent data in an out-of-order way on the subflow.

> @Christoph, could you please assert the correct interpretation here?
> And what about the other critical fallback scenario discussed during
> the last mtg? (fallback due to out-of-order DACK / plain TCP ACK)

If that happens, the sender will receive the plain TCP ACK and fallback to
regular TCP. It will indicate that to the receiver with an infinite mapping.
(Cfr.: "The sender will send one final data sequence mapping...").

Once the receiver receives that, he knows that he also has to fallback to
regular TCP.


In general, it is best for a host to always include a DATA_ACK option in
every segment that it sends. That avoids these kind of issues :-)


Christoph


> 
> Thanks,
> 
> Paolo
> 
> [1] "If, however, an ACK is received for data (not just for the
> SYN)"...
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [RFC 4/5] mptcp: update mptcp ack sequence from work queue
@ 2020-02-17  8:40 Paolo Abeni
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2020-02-17  8:40 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4532 bytes --]

On Fri, 2020-02-14 at 17:39 -0800, Mat Martineau wrote:
> On Thu, 13 Feb 2020, Paolo Abeni wrote:
> 
> > On Thu, 2020-02-13 at 11:27 +0100, Florian Westphal wrote:
> > > +static void __mptcp_move_skbs(struct mptcp_sock *msk)
> > > +{
> > > +	struct mptcp_steal_arg arg = {
> > > +		.msk = msk,
> > > +		.fin = false,
> > > +	};
> > > +	read_descriptor_t desc = {
> > > +		.arg.data = &arg,
> > > +	};
> > > +
> > > +	for (;;) {
> > > +		struct mptcp_subflow_context *subflow;
> > > +		bool more_data_avail;
> > > +
> > > +		arg.ssk = mptcp_subflow_recv_lookup(msk);
> > > +		if (!arg.ssk)
> > > +			break;
> > > +
> > > +		subflow = mptcp_subflow_ctx(arg.ssk);
> > > +
> > > +		lock_sock(arg.ssk);
> > > +
> > > +		do {
> > > +			u32 map_remaining;
> > > +			int bytes_read;
> > > +
> > > +			/* try to read as much data as available */
> > > +			map_remaining = subflow->map_data_len -
> > > +					mptcp_subflow_get_map_offset(subflow);
> > > +			desc.count = map_remaining;
> > > +
> > > +			bytes_read = tcp_read_sock(arg.ssk, &desc, mptcp_steal_actor);
> > > +			if (bytes_read <= 0) {
> > > +				release_sock(arg.ssk);
> > > +				return;
> > > +			}
> > > +
> > > +			if (arg.fin) {
> > > +				struct tcp_sock *tp = tcp_sk(arg.ssk);
> > > +				u32 seq = READ_ONCE(tp->copied_seq);
> > > +
> > > +				WRITE_ONCE(tp->copied_seq, seq + 1);
> > > +			}
> > > +
> > > +			more_data_avail = mptcp_subflow_data_available(arg.ssk);
> > > +		} while (more_data_avail);
> > > +
> > > +		release_sock(arg.ssk);
> > > +	}
> > > +}
> > 
> > Overall the above move the pending skbs to the msk receive queue,
> > increment ssk copied_seq and eventually let TCP send ack via
> > tcp_cleanup_rbuf().
> > 
> 
> It's only moving in-order skbs, right?
> 
> > I think we could avoid touching tcp_read_sock() in patch 2/5 moving the
> > skb directly from __mptcp_move_skbs() - no call to tcp_read_sock(),
> > just queue manipulation and copied_seq accounting.
> > 
> > Than __mptcp_move_skbs() will call directly, as needed - need to export
> > it.
> 
> This makes sense to me. We have to do another layer of reassembly for 
> MPTCP and we already know the skbs are in-order for the subflow stream, 
> and there's no need to handle partial skb reads like tcp_read_sock does.
> 
> > We could additionally clean-up the code calling __mptcp_move_skbs()
> > from mptcp_recvmsg() before looking into the msk receive queue. Than no
> > need for the current recvmsg inner loop.
> > 
> > I fear we need some tweek to handle TCP fallback. If I read the RFC
> > correctly the msk can fallback to TCP even after receiving some DSS
> > data (e.g. msk receive queue contains some skbs and the msk fallback to
> > TCP. We can't call anymore sock_recvmsg())
> 
> RFC 8684 section 3.7 (Fallback) says that "If a subflow breaks during 
> operation... the subflow SHOULD be treated as broken and closed with a 
> RST". Does that cover your concern, or are you referring to a fallback at 
> connection time?

uhm... it looks like fallback scenarios are quite rich/complex.

@Florian, I think this discussion in important, but does not relate
strictly to this series; the problem is independent, and could be
addresses separatelly - if we need to address anything ;)

The scenario I was thinking about is (e.g.):

- the client send a data segment + DSS mapping
- the server do not send DACKs, just plain TCP acks (because e.g. MPTCP
options are allowed by middle-box after syn)
- the client should fall-back (we currently don't do any check there,
so we don't) and sends some additional plain TCP data segments
- the server sends some data seg, and should fall-back for the same
resons
- the server tries to read from the msk socket. The older data is in
the msk receive queue, the newer in the subflow receive queue, if we
call sock_recvmsg(), that will corrupt the stream.

I'm unsure if the above wording from 3.7 applies here: it think that
the sentence "If a subflow breaks during operation" applies to the msk
socket after that the first data segment is DACK-ed other, otherwise
the 2 paragraph just before that[1] would be unneeded, right?

@Christoph, could you please assert the correct interpretation here?
And what about the other critical fallback scenario discussed during
the last mtg? (fallback due to out-of-order DACK / plain TCP ACK)

Thanks,

Paolo

[1] "If, however, an ACK is received for data (not just for the
SYN)"...




^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [RFC 4/5] mptcp: update mptcp ack sequence from work queue
@ 2020-02-15  1:39 Mat Martineau
  0 siblings, 0 replies; 4+ messages in thread
From: Mat Martineau @ 2020-02-15  1:39 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2810 bytes --]


On Thu, 13 Feb 2020, Paolo Abeni wrote:

> On Thu, 2020-02-13 at 11:27 +0100, Florian Westphal wrote:
>> +static void __mptcp_move_skbs(struct mptcp_sock *msk)
>> +{
>> +	struct mptcp_steal_arg arg = {
>> +		.msk = msk,
>> +		.fin = false,
>> +	};
>> +	read_descriptor_t desc = {
>> +		.arg.data = &arg,
>> +	};
>> +
>> +	for (;;) {
>> +		struct mptcp_subflow_context *subflow;
>> +		bool more_data_avail;
>> +
>> +		arg.ssk = mptcp_subflow_recv_lookup(msk);
>> +		if (!arg.ssk)
>> +			break;
>> +
>> +		subflow = mptcp_subflow_ctx(arg.ssk);
>> +
>> +		lock_sock(arg.ssk);
>> +
>> +		do {
>> +			u32 map_remaining;
>> +			int bytes_read;
>> +
>> +			/* try to read as much data as available */
>> +			map_remaining = subflow->map_data_len -
>> +					mptcp_subflow_get_map_offset(subflow);
>> +			desc.count = map_remaining;
>> +
>> +			bytes_read = tcp_read_sock(arg.ssk, &desc, mptcp_steal_actor);
>> +			if (bytes_read <= 0) {
>> +				release_sock(arg.ssk);
>> +				return;
>> +			}
>> +
>> +			if (arg.fin) {
>> +				struct tcp_sock *tp = tcp_sk(arg.ssk);
>> +				u32 seq = READ_ONCE(tp->copied_seq);
>> +
>> +				WRITE_ONCE(tp->copied_seq, seq + 1);
>> +			}
>> +
>> +			more_data_avail = mptcp_subflow_data_available(arg.ssk);
>> +		} while (more_data_avail);
>> +
>> +		release_sock(arg.ssk);
>> +	}
>> +}
>
> Overall the above move the pending skbs to the msk receive queue,
> increment ssk copied_seq and eventually let TCP send ack via
> tcp_cleanup_rbuf().
>

It's only moving in-order skbs, right?

> I think we could avoid touching tcp_read_sock() in patch 2/5 moving the
> skb directly from __mptcp_move_skbs() - no call to tcp_read_sock(),
> just queue manipulation and copied_seq accounting.
>
> Than __mptcp_move_skbs() will call directly, as needed - need to export
> it.

This makes sense to me. We have to do another layer of reassembly for 
MPTCP and we already know the skbs are in-order for the subflow stream, 
and there's no need to handle partial skb reads like tcp_read_sock does.

>
> We could additionally clean-up the code calling __mptcp_move_skbs()
> from mptcp_recvmsg() before looking into the msk receive queue. Than no
> need for the current recvmsg inner loop.
>
> I fear we need some tweek to handle TCP fallback. If I read the RFC
> correctly the msk can fallback to TCP even after receiving some DSS
> data (e.g. msk receive queue contains some skbs and the msk fallback to
> TCP. We can't call anymore sock_recvmsg())

RFC 8684 section 3.7 (Fallback) says that "If a subflow breaks during 
operation... the subflow SHOULD be treated as broken and closed with a 
RST". Does that cover your concern, or are you referring to a fallback at 
connection time?


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-17 18:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13 14:37 [MPTCP] Re: [RFC 4/5] mptcp: update mptcp ack sequence from work queue Paolo Abeni
2020-02-15  1:39 Mat Martineau
2020-02-17  8:40 Paolo Abeni
2020-02-17 18:27 Christoph Paasch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.