All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-05 18:31 Christoph Paasch
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Paasch @ 2020-05-05 18:31 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 5614 bytes --]

Hello,

On 05/05/20 - 12:42:16, Paolo Abeni wrote:
> On Mon, 2020-05-04 at 09:36 -0700, Christoph Paasch wrote:
> > On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> > > On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> [...]
> > > > Thoughts?
> > > > 
> > > > This is not going to trigger a RST from the server (see comment in
> > > > subflow_init_req, because we are not passing up a return-value from init_req
> > > > to tcp_conn_request - but that should be easy to change).
> > > 
> > > Thank you for the detailed analisys! LGTM, with a couple of notes:
> > > 
> > > - I think it would be nicer if we additionally decreases the subflows
> > > if the mp_join request sk 'fails'/fallback before completing the 3WHS
> > 
> > regarding decrease - I saw that and actually was surprised that we *never*
> > decrease the subflow-counter (at least, I couldn't find where it is being
> > decremented).
> > 
> > Was that intended? 
> 
> So far, yes. The idea is that we currently increment the subflows
> counter only after the peer is completely validated and we close the
> subflow only at msk shutdown. Even if the peer closes the subflow, we -
> currently - still keep the 'struct sock' around. So the subflow count
> never decreases. 
> 
> I plan to revisit the above within the scope of:
> 
> https://github.com/multipath-tcp/mptcp_net-next/issues/19
> 
> 
> > Because, that means if I set max_subflows to 3 and even if
> > I only ever have 2 subflows in parallel, after the third subflow I am blocked.
> 
> Which is expected, right ?!? Can you please re-phrase?!? 

The scenario I have in mind is this:

One configures the server-side with X subflows (with X being a reasonable
number, like 8) to prevent an attacker from creating an endless list of
subflows which makes the kernel iterate over that list for a long time.

Now, the client is walking from one WiFi access-point to another, switching
from WiFi, back to cell, back to WiFi,... each time with a new subflow
(e.g., because it's a different WiFi AP). So, at any time the number of
"parallel" subflows should never be higher than 2, but the total number of
subflows this connection has seen can be very large.

It seems like this is being addressed with Issue #19 (I would add to it that
subflows should also be closed if they receive a TCP-RST).

> 
> > > - we should cope better with failures when processing incoming mp_join
> > > syn-ack - e.g. dropping the req socket. 
> > 
> > These failures should be fairly easy to repro with packetdrill.
> > E.g., the state-check in finish_join.
> 
> uhm... a pity we don't have mp_join support yet for the pktdrill
> version in use with the export-branch :(

Ah, I see.

> > But, looking at removing the request-sock upon failure in
> > subflow_syn_recv_sock(). That looks tricky to me because we would need to
> > somehow tell tcp_check_req() whether or not to continue processing.
> 
> Just return NULL (as we currently do) and properly get rid of the req
> socket (as we currently _don't_ do) !?!
> 
> The req is disposed by tcp_check_req() iff 'tcp_abort_on_overflow' !=
> 0, which is not the default.

Yes, so the problem is to pass back the indication to tcp_check_req to
dispose the req. (trying to avoid the hooks in tcp_check_req ;-) )

Could we change the syn_recv_sock callback to return an int which indicates to
tcp_check_req whether or not it should dispose of the request-sock?

Looking at quite a few of the possible failures in those
callbacks, I don't see any (besides maybe DCCP and MPTCP) that would make sense to
destroy the request-socket. All the other failures are ENOMEM-related.


What about something like this:
@@ -769,12 +769,16 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
         * ESTABLISHED STATE. If it will be dropped after
         * socket is created, wait for troubles.
         */
-       child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
-                                                        req, &own_req);
-       if (!child)
+       ret = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
+                                                      req, &own_req, &child);
+       if (ret < 0) {
+               if (ret != -ENOMEM) {
+                       goto embryonic_reset;
+               }
                goto listen_overflow;
+       }

-       if (own_req && sk_is_mptcp(child) && mptcp_sk_is_subflow(child)) {
+       if (ret > 0)
                reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
                inet_csk_reqsk_queue_drop_and_put(sk, req);
                return child;


Yes, it does not get rid of the branching in tcp_check_req, but at least it
makes it less MPTCP-specific.

We then just need to make sure that MPTCP returns 1 in syn_recv_sock if it
is a subflow.

Thoughts?


Christoph

> Additionally we currently account syn-ack join checks as listener
> overflow.
> 
> 
> > Looking at DCCP, which seems to have some kind of activation of features
> > requested during the handshake - it also just seems to return NULL without
> > killing the request-sock.
> > 
> > Would it make sense to rather do the validity-checks (e.g., correctness of
> > hmac) in tcp_check_req where it is easy to fail the request-sock ?
> 
> uhm... more hooks in tcp_check_req :(, see:
> 
> https://github.com/multipath-tcp/mptcp_net-next/issues/16
> 
> I'd like to avoid them if possible - but I can't find how :(
> 
> Cheers,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-07 16:50 Christoph Paasch
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Paasch @ 2020-05-07 16:50 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]

On 06/05/20 - 19:08:24, Paolo Abeni wrote:
> On Tue, 2020-05-05 at 11:31 -0700, Christoph Paasch wrote:
> > On 05/05/20 - 12:42:16, Paolo Abeni wrote:
> > > On Mon, 2020-05-04 at 09:36 -0700, Christoph Paasch wrote:
> > > > On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> > > > > On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> > > [...]
> > > > > > Thoughts?
> > > > > > 
> > > > > > This is not going to trigger a RST from the server (see comment in
> > > > > > subflow_init_req, because we are not passing up a return-value from init_req
> > > > > > to tcp_conn_request - but that should be easy to change).
> > > > > 
> > > > > Thank you for the detailed analisys! LGTM, with a couple of notes:
> > > > > 
> > > > > - I think it would be nicer if we additionally decreases the subflows
> > > > > if the mp_join request sk 'fails'/fallback before completing the 3WHS
> > > > 
> > > > regarding decrease - I saw that and actually was surprised that we *never*
> > > > decrease the subflow-counter (at least, I couldn't find where it is being
> > > > decremented).
> > > > 
> > > > Was that intended? 
> > > 
> > > So far, yes. The idea is that we currently increment the subflows
> > > counter only after the peer is completely validated and we close the
> > > subflow only at msk shutdown. Even if the peer closes the subflow, we -
> > > currently - still keep the 'struct sock' around. So the subflow count
> > > never decreases. 
> > > 
> > > I plan to revisit the above within the scope of:
> > > 
> > > https://github.com/multipath-tcp/mptcp_net-next/issues/19
> > > 
> > > 
> > > > Because, that means if I set max_subflows to 3 and even if
> > > > I only ever have 2 subflows in parallel, after the third subflow I am blocked.
> > > 
> > > Which is expected, right ?!? Can you please re-phrase?!? 
> > 
> > The scenario I have in mind is this:
> > 
> > One configures the server-side with X subflows (with X being a reasonable
> > number, like 8) to prevent an attacker from creating an endless list of
> > subflows which makes the kernel iterate over that list for a long time.
> > 
> > Now, the client is walking from one WiFi access-point to another, switching
> > from WiFi, back to cell, back to WiFi,... each time with a new subflow
> > (e.g., because it's a different WiFi AP). So, at any time the number of
> > "parallel" subflows should never be higher than 2, but the total number of
> > subflows this connection has seen can be very large.
> > 
> > It seems like this is being addressed with Issue #19 (I would add to it that
> > subflows should also be closed if they receive a TCP-RST).
> 
> Yep, that was the idea...
> 
> Follow-up question: the above scenario will finally clash with the 8
> bit limit for address id, right?

I think it is fine to wrap-around the 8-bit counter then and use the
address-IDs that are currently unused.


Christoph

> > Could we change the syn_recv_sock callback to return an int which indicates to
> > tcp_check_req whether or not it should dispose of the request-sock?
> 
> Ok, that would work, I think.
> My *personal* taste would be to try to avoid additional modification to
> TCP.
> 
> I *think* we can cope with this scenario in the following way:
> 
> * if MP_JOIN checks fail, subflow_syn_recv_sock() instead of freeing
> the child will call:
>         tcp_set_state(ssk, TCP_CLOSE);
>         tcp_send_active_reset(ssk, GFP_ATOMIC);
>         sock_put(ssk); // ssk ref cnt should be 2 before this call
>   and return the subflow socket.
> * the TCP stack will call tcp_child_process() on the child process,
> which will release the last reference to subflow.
> 
> Let me try to cook this one.
> 
> /P
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-06 17:48 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-06 17:48 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1647 bytes --]

On Tue, 2020-05-05 at 11:31 -0700, Christoph Paasch wrote:
> What about something like this:
> @@ -769,12 +769,16 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>          * ESTABLISHED STATE. If it will be dropped after
>          * socket is created, wait for troubles.
>          */
> -       child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
> -                                                        req, &own_req);
> -       if (!child)
> +       ret = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
> +                                                      req, &own_req, &child);
> +       if (ret < 0) {
> +               if (ret != -ENOMEM) {
> +                       goto embryonic_reset;
> +               }
>                 goto listen_overflow;
> +       }
> 
> -       if (own_req && sk_is_mptcp(child) && mptcp_sk_is_subflow(child)) {
> +       if (ret > 0)
>                 reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
>                 inet_csk_reqsk_queue_drop_and_put(sk, req);
>                 return child;
> 
> 
> Yes, it does not get rid of the branching in tcp_check_req, but at least it
> makes it less MPTCP-specific.

I'm sorry, I forgot to mention in my previous email the following: one
thing that I fear about the above change is that introduces some
overhead for

# CONFIG_MPTCP is not set

builds, while the current code compiles to nothing. We will have a
couple of additional conditionals - one in fast path - and an
additional argument in the stack. Nothing drammatic, but still...

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-06 17:08 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-06 17:08 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3434 bytes --]

On Tue, 2020-05-05 at 11:31 -0700, Christoph Paasch wrote:
> On 05/05/20 - 12:42:16, Paolo Abeni wrote:
> > On Mon, 2020-05-04 at 09:36 -0700, Christoph Paasch wrote:
> > > On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> > > > On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> > [...]
> > > > > Thoughts?
> > > > > 
> > > > > This is not going to trigger a RST from the server (see comment in
> > > > > subflow_init_req, because we are not passing up a return-value from init_req
> > > > > to tcp_conn_request - but that should be easy to change).
> > > > 
> > > > Thank you for the detailed analisys! LGTM, with a couple of notes:
> > > > 
> > > > - I think it would be nicer if we additionally decreases the subflows
> > > > if the mp_join request sk 'fails'/fallback before completing the 3WHS
> > > 
> > > regarding decrease - I saw that and actually was surprised that we *never*
> > > decrease the subflow-counter (at least, I couldn't find where it is being
> > > decremented).
> > > 
> > > Was that intended? 
> > 
> > So far, yes. The idea is that we currently increment the subflows
> > counter only after the peer is completely validated and we close the
> > subflow only at msk shutdown. Even if the peer closes the subflow, we -
> > currently - still keep the 'struct sock' around. So the subflow count
> > never decreases. 
> > 
> > I plan to revisit the above within the scope of:
> > 
> > https://github.com/multipath-tcp/mptcp_net-next/issues/19
> > 
> > 
> > > Because, that means if I set max_subflows to 3 and even if
> > > I only ever have 2 subflows in parallel, after the third subflow I am blocked.
> > 
> > Which is expected, right ?!? Can you please re-phrase?!? 
> 
> The scenario I have in mind is this:
> 
> One configures the server-side with X subflows (with X being a reasonable
> number, like 8) to prevent an attacker from creating an endless list of
> subflows which makes the kernel iterate over that list for a long time.
> 
> Now, the client is walking from one WiFi access-point to another, switching
> from WiFi, back to cell, back to WiFi,... each time with a new subflow
> (e.g., because it's a different WiFi AP). So, at any time the number of
> "parallel" subflows should never be higher than 2, but the total number of
> subflows this connection has seen can be very large.
> 
> It seems like this is being addressed with Issue #19 (I would add to it that
> subflows should also be closed if they receive a TCP-RST).

Yep, that was the idea...

Follow-up question: the above scenario will finally clash with the 8
bit limit for address id, right?

> Could we change the syn_recv_sock callback to return an int which indicates to
> tcp_check_req whether or not it should dispose of the request-sock?

Ok, that would work, I think.
My *personal* taste would be to try to avoid additional modification to
TCP.

I *think* we can cope with this scenario in the following way:

* if MP_JOIN checks fail, subflow_syn_recv_sock() instead of freeing
the child will call:
        tcp_set_state(ssk, TCP_CLOSE);
        tcp_send_active_reset(ssk, GFP_ATOMIC);
        sock_put(ssk); // ssk ref cnt should be 2 before this call
  and return the subflow socket.
* the TCP stack will call tcp_child_process() on the child process,
which will release the last reference to subflow.

Let me try to cook this one.

/P

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-05 11:13 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-05 11:13 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2871 bytes --]

On Mon, 2020-05-04 at 09:36 -0700, Christoph Paasch wrote:
> On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> > On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
[...]
> > > Thoughts?
> > > 
> > > This is not going to trigger a RST from the server (see comment in
> > > subflow_init_req, because we are not passing up a return-value from init_req
> > > to tcp_conn_request - but that should be easy to change).
> > 
> > Thank you for the detailed analisys! LGTM, with a couple of notes:
> > 
> > - I think it would be nicer if we additionally decreases the subflows
> > if the mp_join request sk 'fails'/fallback before completing the 3WHS
> 
> regarding decrease - I saw that and actually was surprised that we *never*
> decrease the subflow-counter (at least, I couldn't find where it is being
> decremented).
> 
> Was that intended? 

So far, yes. The idea is that we currently increment the subflows
counter only after the peer is completely validated and we close the
subflow only at msk shutdown. Even if the peer closes the subflow, we -
currently - still keep the 'struct sock' around. So the subflow count
never decreases. 

I plan to revisit the above within the scope of:

https://github.com/multipath-tcp/mptcp_net-next/issues/19


> Because, that means if I set max_subflows to 3 and even if
> I only ever have 2 subflows in parallel, after the third subflow I am blocked.

Which is expected, right ?!? Can you please re-phrase?!? 

> > - we should cope better with failures when processing incoming mp_join
> > syn-ack - e.g. dropping the req socket. 
> 
> These failures should be fairly easy to repro with packetdrill.
> E.g., the state-check in finish_join.

uhm... a pity we don't have mp_join support yet for the pktdrill
version in use with the export-branch :(

> But, looking at removing the request-sock upon failure in
> subflow_syn_recv_sock(). That looks tricky to me because we would need to
> somehow tell tcp_check_req() whether or not to continue processing.

Just return NULL (as we currently do) and properly get rid of the req
socket (as we currently _don't_ do) !?!

The req is disposed by tcp_check_req() iff 'tcp_abort_on_overflow' !=
0, which is not the default.

Additionally we currently account syn-ack join checks as listener
overflow.

> Looking at DCCP, which seems to have some kind of activation of features
> requested during the handshake - it also just seems to return NULL without
> killing the request-sock.
> 
> Would it make sense to rather do the validity-checks (e.g., correctness of
> hmac) in tcp_check_req where it is easy to fail the request-sock ?

uhm... more hooks in tcp_check_req :(, see:

https://github.com/multipath-tcp/mptcp_net-next/issues/16

I'd like to avoid them if possible - but I can't find how :(

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-05 10:42 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-05 10:42 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2869 bytes --]

On Mon, 2020-05-04 at 09:36 -0700, Christoph Paasch wrote:
> On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> > On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
[...]
> > > Thoughts?
> > > 
> > > This is not going to trigger a RST from the server (see comment in
> > > subflow_init_req, because we are not passing up a return-value from init_req
> > > to tcp_conn_request - but that should be easy to change).
> > 
> > Thank you for the detailed analisys! LGTM, with a couple of notes:
> > 
> > - I think it would be nicer if we additionally decreases the subflows
> > if the mp_join request sk 'fails'/fallback before completing the 3WHS
> 
> regarding decrease - I saw that and actually was surprised that we *never*
> decrease the subflow-counter (at least, I couldn't find where it is being
> decremented).
> 
> Was that intended? 

So far, yes. The idea is that we currently increment the subflows
counter only after the peer is completely validated and we close the
subflow only at msk shutdown. Even if the peer closes the subflow, we -
currently - still keep the 'struct sock' around. So the subflow count
never decreases. 

I plan to revisit the above within the scope of:

https://github.com/multipath-tcp/mptcp_net-next/issues/19


> Because, that means if I set max_subflows to 3 and even if
> I only ever have 2 subflows in parallel, after the third subflow I am blocked.

Which is expected, right ?!? Can you please re-phrase?!? 

> > - we should cope better with failures when processing incoming mp_join
> > syn-ack - e.g. dropping the req socket. 
> 
> These failures should be fairly easy to repro with packetdrill.
> E.g., the state-check in finish_join.

uhm... a pity we don't have mp_join support yet for the pktdrill
version in use with the export-branch :(

> But, looking at removing the request-sock upon failure in
> subflow_syn_recv_sock(). That looks tricky to me because we would need to
> somehow tell tcp_check_req() whether or not to continue processing.

Just return NULL (as we currently do) and properly get rid of the req
socket (as we currently _don't_ do) !?!

The req is disposed by tcp_check_req() iff 'tcp_abort_on_overflow' !=
0, which is not the default.

Additionally we currently account syn-ack join checks as listener
overflow.

> Looking at DCCP, which seems to have some kind of activation of features
> requested during the handshake - it also just seems to return NULL without
> killing the request-sock.
> 
> Would it make sense to rather do the validity-checks (e.g., correctness of
> hmac) in tcp_check_req where it is easy to fail the request-sock ?

uhm... more hooks in tcp_check_req :(, see:

https://github.com/multipath-tcp/mptcp_net-next/issues/16

I'd like to avoid them if possible - but I can't find how :(

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-04 16:38 Christoph Paasch
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Paasch @ 2020-05-04 16:38 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]

On 02/05/20 - 09:33:25, Paolo Abeni wrote:
> On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> > while testing forked MPTCP as client and upstream MPTCP as server and one
> > forgets to set the max_subflow limit with iproute, I end up having a
> > scenario where SYN+MP_JOIN is followed by SYN/ACK+MP_JOIN followed by the
> > ACK+MP_JOIN, which then gets RST by the server.
> > 
> > This makes for a weird packet-trace:
> > 16:48:01.626118 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [S], seq 3267701968, win 42340, options [mss 1460,sackOK,TS val 1457348237 ecr 0,nop,wscale 7,mptcp join id 4 token 0x8e133996 nonce 0xee4ce80a], length 0
> > 16:48:01.626163 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916703437 ecr 1457348237,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> > 16:48:01.626411 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [.], ack 1, win 991, options [nop,nop,TS val 1457348238 ecr 3916703437,mptcp join hmac 0x5c9f67595a7c0f15e72b8d45d2abe73de3a86bf8], length 0
> > 16:48:01.626427 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [R.], seq 1, ack 1, win 340, options [nop,nop,TS val 3916703438 ecr 1457348238,mptcp dss ack 7475064947213107918], length 0
> 
> [...]
> 
> 
> Almost forgot! does the above mean that with a correct configuration
> the two implementations interoperate nicely ?

Depends on the definition of "nicely" ;-)


MP_CAPABLE & MP_JOIN exchanges work now.

ADD_ADDR still has the old format in out-of-tree MPTCP, so that doesn't work
yet.

When there is a single subflow, data-exchange works fine. Once there are two
subflows, things become weird. Out-of-tree is retransmitting a lot... I need
to look into this.


Christoph

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-04 16:36 Christoph Paasch
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Paasch @ 2020-05-04 16:36 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 6456 bytes --]

Hello,

On 02/05/20 - 09:30:43, Paolo Abeni wrote:
> Hi,
> 
> On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> > while testing forked MPTCP as client and upstream MPTCP as server and one
> > forgets to set the max_subflow limit with iproute, I end up having a
> > scenario where SYN+MP_JOIN is followed by SYN/ACK+MP_JOIN followed by the
> > ACK+MP_JOIN, which then gets RST by the server.
> > 
> > This makes for a weird packet-trace:
> > 16:48:01.626118 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [S], seq 3267701968, win 42340, options [mss 1460,sackOK,TS val 1457348237 ecr 0,nop,wscale 7,mptcp join id 4 token 0x8e133996 nonce 0xee4ce80a], length 0
> > 16:48:01.626163 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916703437 ecr 1457348237,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> > 16:48:01.626411 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [.], ack 1, win 991, options [nop,nop,TS val 1457348238 ecr 3916703437,mptcp join hmac 0x5c9f67595a7c0f15e72b8d45d2abe73de3a86bf8], length 0
> > 16:48:01.626427 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [R.], seq 1, ack 1, win 340, options [nop,nop,TS val 3916703438 ecr 1457348238,mptcp dss ack 7475064947213107918], length 0
> > ^^^
> > Server kills the connection -> this is because subflow_syn_recv_sock does
> > the goto close_child and will return NULL to tcp_check_req
> > 
> > 16:48:02.653269 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916704464 ecr 1457348238,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> > ^^^
> > But, server keeps on retransmitting the SYN/ACK, which happens because
> > tcp_check_req did not remove the request-socket after subflow_syn_recv_sock
> > returned NULL
> > 
> > 16:48:02.653692 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [R], seq 3267701969, win 0, length 0
> > ^^^
> > This time, the client will send the RST because of the previous RST from the
> > server
> > 
> > 
> > 16:48:04.701270 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916706512 ecr 1457348238,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> > 16:48:04.701707 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [R], seq 3267701969, win 0, length 0
> > ^^^
> > And server keeps on retransmitting the SYN/ACK until things time out.
> > 
> > 
> > I think we should move the subflow-count check to the SYN+MP_JOIN reception
> > à la:
> > 
> > -------
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index e1f23016ed3f..adddf2da5dcc 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -1611,7 +1611,6 @@ bool mptcp_finish_join(struct sock *sk)
> >  	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
> >  	struct sock *parent = (void *)msk;
> >  	struct socket *parent_sock;
> > -	bool ret;
> > 
> >  	pr_debug("msk=%p, subflow=%p", msk, subflow);
> > 
> > @@ -1627,15 +1626,12 @@ bool mptcp_finish_join(struct sock *sk)
> >  	if (parent_sock && !sk->sk_socket)
> >  		mptcp_sock_graft(sk, parent_sock);
> > 
> > -	ret = mptcp_pm_allow_new_subflow(msk);
> > -	if (ret) {
> > -		/* active connections are already on conn_list */
> > -		spin_lock_bh(&msk->join_list_lock);
> > -		if (!WARN_ON_ONCE(!list_empty(&subflow->node)))
> > -			list_add_tail(&subflow->node, &msk->join_list);
> > -		spin_unlock_bh(&msk->join_list_lock);
> > -	}
> > -	return ret;
> > +	/* active connections are already on conn_list */
> > +	spin_lock_bh(&msk->join_list_lock);
> > +	if (!WARN_ON_ONCE(!list_empty(&subflow->node)))
> > +		list_add_tail(&subflow->node, &msk->join_list);
> > +	spin_unlock_bh(&msk->join_list_lock);
> > +	return true;
> >  }
> > 
> >  bool mptcp_sk_is_subflow(const struct sock *sk)
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index 009d5c478062..0684bfc42f88 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -99,6 +99,10 @@ static bool subflow_token_join_request(struct request_sock *req,
> >  		return false;
> >  	}
> > 
> > +	if (!mptcp_pm_allow_new_subflow(msk)) {
> > +		return false;
> > +	}
> > +
> >  	local_id = mptcp_pm_get_local_id(msk, (struct sock_common *)req);
> >  	if (local_id < 0) {
> >  		sock_put((struct sock *)msk);
> > 
> > 
> > 
> > Thoughts?
> > 
> > This is not going to trigger a RST from the server (see comment in
> > subflow_init_req, because we are not passing up a return-value from init_req
> > to tcp_conn_request - but that should be easy to change).
> 
> Thank you for the detailed analisys! LGTM, with a couple of notes:
> 
> - I think it would be nicer if we additionally decreases the subflows
> if the mp_join request sk 'fails'/fallback before completing the 3WHS

regarding decrease - I saw that and actually was surprised that we *never*
decrease the subflow-counter (at least, I couldn't find where it is being
decremented).

Was that intended? Because, that means if I set max_subflows to 3 and even if
I only ever have 2 subflows in parallel, after the third subflow I am blocked.

> - we should cope better with failures when processing incoming mp_join
> syn-ack - e.g. dropping the req socket. 

These failures should be fairly easy to repro with packetdrill.
E.g., the state-check in finish_join.

But, looking at removing the request-sock upon failure in
subflow_syn_recv_sock(). That looks tricky to me because we would need to
somehow tell tcp_check_req() whether or not to continue processing.

Looking at DCCP, which seems to have some kind of activation of features
requested during the handshake - it also just seems to return NULL without
killing the request-sock.


Would it make sense to rather do the validity-checks (e.g., correctness of
hmac) in tcp_check_req where it is easy to fail the request-sock ?

> The latter scenario will be IMHO very difficult to test without the bug
> you outline here, so perhaps we can [try to] fix them in reverse order?
> 
> Thanks!
> 
> Paolo
> 
> p.s. could you please share the out-of-tree v1 code - even on a
> devel/test/experimental branch?

Yes, I just published them on mptcp-dev. They apply on top of mptcp_trunk.


Christoph

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-02  7:33 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-02  7:33 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> while testing forked MPTCP as client and upstream MPTCP as server and one
> forgets to set the max_subflow limit with iproute, I end up having a
> scenario where SYN+MP_JOIN is followed by SYN/ACK+MP_JOIN followed by the
> ACK+MP_JOIN, which then gets RST by the server.
> 
> This makes for a weird packet-trace:
> 16:48:01.626118 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [S], seq 3267701968, win 42340, options [mss 1460,sackOK,TS val 1457348237 ecr 0,nop,wscale 7,mptcp join id 4 token 0x8e133996 nonce 0xee4ce80a], length 0
> 16:48:01.626163 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916703437 ecr 1457348237,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> 16:48:01.626411 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [.], ack 1, win 991, options [nop,nop,TS val 1457348238 ecr 3916703437,mptcp join hmac 0x5c9f67595a7c0f15e72b8d45d2abe73de3a86bf8], length 0
> 16:48:01.626427 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [R.], seq 1, ack 1, win 340, options [nop,nop,TS val 3916703438 ecr 1457348238,mptcp dss ack 7475064947213107918], length 0

[...]


Almost forgot! does the above mean that with a correct configuration
the two implementations interoperate nicely ?

Thanks

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted
@ 2020-05-02  7:30 Paolo Abeni
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-05-02  7:30 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 5116 bytes --]

Hi,

On Fri, 2020-05-01 at 11:09 -0700, Christoph Paasch wrote:
> while testing forked MPTCP as client and upstream MPTCP as server and one
> forgets to set the max_subflow limit with iproute, I end up having a
> scenario where SYN+MP_JOIN is followed by SYN/ACK+MP_JOIN followed by the
> ACK+MP_JOIN, which then gets RST by the server.
> 
> This makes for a weird packet-trace:
> 16:48:01.626118 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [S], seq 3267701968, win 42340, options [mss 1460,sackOK,TS val 1457348237 ecr 0,nop,wscale 7,mptcp join id 4 token 0x8e133996 nonce 0xee4ce80a], length 0
> 16:48:01.626163 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916703437 ecr 1457348237,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> 16:48:01.626411 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [.], ack 1, win 991, options [nop,nop,TS val 1457348238 ecr 3916703437,mptcp join hmac 0x5c9f67595a7c0f15e72b8d45d2abe73de3a86bf8], length 0
> 16:48:01.626427 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [R.], seq 1, ack 1, win 340, options [nop,nop,TS val 3916703438 ecr 1457348238,mptcp dss ack 7475064947213107918], length 0
> ^^^
> Server kills the connection -> this is because subflow_syn_recv_sock does
> the goto close_child and will return NULL to tcp_check_req
> 
> 16:48:02.653269 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916704464 ecr 1457348238,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> ^^^
> But, server keeps on retransmitting the SYN/ACK, which happens because
> tcp_check_req did not remove the request-socket after subflow_syn_recv_sock
> returned NULL
> 
> 16:48:02.653692 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [R], seq 3267701969, win 0, length 0
> ^^^
> This time, the client will send the RST because of the previous RST from the
> server
> 
> 
> 16:48:04.701270 IP 10.2.1.1.5001 > 10.1.3.1.35561: Flags [S.], seq 260310350, ack 3267701969, win 43440, options [mss 1460,sackOK,TS val 3916706512 ecr 1457348238,nop,wscale 7,mptcp join id 0 hmac 0x30e5a618ce852835 nonce 0x3238ac63], length 0
> 16:48:04.701707 IP 10.1.3.1.35561 > 10.2.1.1.5001: Flags [R], seq 3267701969, win 0, length 0
> ^^^
> And server keeps on retransmitting the SYN/ACK until things time out.
> 
> 
> I think we should move the subflow-count check to the SYN+MP_JOIN reception
> à la:
> 
> -------
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index e1f23016ed3f..adddf2da5dcc 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1611,7 +1611,6 @@ bool mptcp_finish_join(struct sock *sk)
>  	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>  	struct sock *parent = (void *)msk;
>  	struct socket *parent_sock;
> -	bool ret;
> 
>  	pr_debug("msk=%p, subflow=%p", msk, subflow);
> 
> @@ -1627,15 +1626,12 @@ bool mptcp_finish_join(struct sock *sk)
>  	if (parent_sock && !sk->sk_socket)
>  		mptcp_sock_graft(sk, parent_sock);
> 
> -	ret = mptcp_pm_allow_new_subflow(msk);
> -	if (ret) {
> -		/* active connections are already on conn_list */
> -		spin_lock_bh(&msk->join_list_lock);
> -		if (!WARN_ON_ONCE(!list_empty(&subflow->node)))
> -			list_add_tail(&subflow->node, &msk->join_list);
> -		spin_unlock_bh(&msk->join_list_lock);
> -	}
> -	return ret;
> +	/* active connections are already on conn_list */
> +	spin_lock_bh(&msk->join_list_lock);
> +	if (!WARN_ON_ONCE(!list_empty(&subflow->node)))
> +		list_add_tail(&subflow->node, &msk->join_list);
> +	spin_unlock_bh(&msk->join_list_lock);
> +	return true;
>  }
> 
>  bool mptcp_sk_is_subflow(const struct sock *sk)
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 009d5c478062..0684bfc42f88 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -99,6 +99,10 @@ static bool subflow_token_join_request(struct request_sock *req,
>  		return false;
>  	}
> 
> +	if (!mptcp_pm_allow_new_subflow(msk)) {
> +		return false;
> +	}
> +
>  	local_id = mptcp_pm_get_local_id(msk, (struct sock_common *)req);
>  	if (local_id < 0) {
>  		sock_put((struct sock *)msk);
> 
> 
> 
> Thoughts?
> 
> This is not going to trigger a RST from the server (see comment in
> subflow_init_req, because we are not passing up a return-value from init_req
> to tcp_conn_request - but that should be easy to change).

Thank you for the detailed analisys! LGTM, with a couple of notes:

- I think it would be nicer if we additionally decreases the subflows
if the mp_join request sk 'fails'/fallback before completing the 3WHS
- we should cope better with failures when processing incoming mp_join
syn-ack - e.g. dropping the req socket. 

The latter scenario will be IMHO very difficult to test without the bug
you outline here, so perhaps we can [try to] fix them in reverse order?

Thanks!

Paolo

p.s. could you please share the out-of-tree v1 code - even on a
devel/test/experimental branch?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-05-07 16:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 18:31 [MPTCP] Re: SYN/ACK+MP_JOIN keeps getting retransmitted Christoph Paasch
  -- strict thread matches above, loose matches on Subject: below --
2020-05-07 16:50 Christoph Paasch
2020-05-06 17:48 Paolo Abeni
2020-05-06 17:08 Paolo Abeni
2020-05-05 11:13 Paolo Abeni
2020-05-05 10:42 Paolo Abeni
2020-05-04 16:38 Christoph Paasch
2020-05-04 16:36 Christoph Paasch
2020-05-02  7:33 Paolo Abeni
2020-05-02  7:30 Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.