* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 17:02 Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-25 17:02 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 3929 bytes --]
On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> We must not close the subflows before all the MPTCP level
> data, comprising the DATA_FIN has been acked at the MPTCP
> level, otherwise we could be unable to retransmit as needed.
>
> __mptcp_wr_shutdown() shutdown is responsible to check for the
> correct status and close all subflows. Is called by the output
> path after spooling any data and at shutdown/close time.
>
> In a similar way, __mptcp_destroy_sock() is responsible to clean-up
> the MPTCP level status, and is called when the msk transition
> to TCP_CLOSE.
>
> The protocol level close() does not force anymore the TCP_CLOSE
> status, but orphan the msk socket and all the subflows.
> Orphaned msk sockets are forciby closed after a timeout or
> when all MPTCP-level data is acked.
>
> There is a caveat about keeping the orphaned subflows around:
> the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
> tcp_close(). To prevent accessing freed memory on later MPTCP
> level operations, the msk acquires a reference to each subflow
> socket and prevent subflow_ulp_release() from releasing the
> subflow context before __mptcp_destroy_sock().
>
> The additional subflow references are released by __mptcp_done()
> and the async ULP release is detected checking ULP ops. If such
> field has been already cleared by the ULP release path, the
> dangling context is freed directly by __mptcp_done().
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> Note:
> The convoluted ctx deallocation schema fixes the UaF described in:
>
> https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
>
> for me.
> ---
> net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-----------
> --
> net/mptcp/protocol.h | 4 +-
> net/mptcp/subflow.c | 21 +++-
> 3 files changed, 203 insertions(+), 86 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2b6c704d4505..dec703103398 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -21,6 +21,7 @@
> #include <net/transp_v6.h>
> #endif
> #include <net/mptcp.h>
> +#include <net/xfrm.h>
> #include "protocol.h"
> #include "mib.h"
>
> @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
>
> static struct percpu_counter mptcp_sockets_allocated;
>
> +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
> +
> /* If msk has an initial subflow socket, and the MP_CAPABLE
> handshake has not
> * completed yet or has failed, return the subflow socket.
> * Otherwise return NULL.
> @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
> mptcp_sock *msk)
> msk->subflow = ssock;
> subflow = mptcp_subflow_ctx(ssock->sk);
> list_add(&subflow->node, &msk->conn_list);
> + sock_hold(ssock->sk);
> subflow->request_mptcp = 1;
>
> /* accept() will wait on first subflow sk_wq, and we always
> wakes up
> @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock *sk)
> mptcp_sk(sk)->timer_ival = 0;
> }
>
> +static bool mptcp_wr_eof(const struct sock *sk)
> +{
> + return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
> TCP_TIME_WAIT ||
> + sk->sk_state == TCP_CLOSE;
> +}
> +
> +/* return true when transitioning to CLOSE state */
> static void mptcp_check_data_fin_ack(struct sock *sk)
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -331,15 +342,22 @@ static void mptcp_check_data_fin_ack(struct
> sock *sk)
> switch (sk->sk_state) {
> case TCP_FIN_WAIT1:
> inet_sk_state_store(sk, TCP_FIN_WAIT2);
> - sk->sk_state_change(sk);
> break;
> case TCP_CLOSING:
> + inet_sk_state_store(sk, TCP_TIME_WAIT);
> + break;
I can't use TIME_WAIT here nor later, otherwise sk_fullsock() will be
fooled. Not sure why I did not notice before. I'll use directly
TCP_CLOSE.
Paolo
^ permalink raw reply [flat|nested] 5+ messages in thread
* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-28 14:47 Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-28 14:47 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 614 bytes --]
On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> @@ -417,9 +435,8 @@ static void mptcp_check_data_fin(struct sock *sk)
> inet_sk_state_store(sk, TCP_CLOSING);
> break;
> case TCP_FIN_WAIT2:
> - inet_sk_state_store(sk, TCP_CLOSE);
> - // @@ Close subflows now?
> - break;
> + inet_sk_state_store(sk, TCP_TIME_WAIT);
> + return;
the 'return;' statement slipped-in unintentionally here: this does not
send data_fin ack, if the fin in received in fin_wait2 state.
I have quite a few pending fixes, so I'll send a v1 soon, even if there
are likely more bugs pending.
/P
^ permalink raw reply [flat|nested] 5+ messages in thread
* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-28 9:39 Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-28 9:39 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]
On Fri, 2020-09-25 at 14:34 -0700, Mat Martineau wrote:
> On Fri, 25 Sep 2020, Paolo Abeni wrote:
>
> > On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> > > We must not close the subflows before all the MPTCP level
> > > data, comprising the DATA_FIN has been acked at the MPTCP
> > > level, otherwise we could be unable to retransmit as needed.
> > >
> > > __mptcp_wr_shutdown() shutdown is responsible to check for the
> > > correct status and close all subflows. Is called by the output
> > > path after spooling any data and at shutdown/close time.
> > >
> > > In a similar way, __mptcp_destroy_sock() is responsible to clean-
> > > up
> > > the MPTCP level status, and is called when the msk transition
> > > to TCP_CLOSE.
> > >
> > > The protocol level close() does not force anymore the TCP_CLOSE
> > > status, but orphan the msk socket and all the subflows.
> > > Orphaned msk sockets are forciby closed after a timeout or
> > > when all MPTCP-level data is acked.
> > >
> > > There is a caveat about keeping the orphaned subflows around:
> > > the TCP stack can asynchronusly call tcp_cleanup_ulp() on them
> > > via
> > > tcp_close(). To prevent accessing freed memory on later MPTCP
> > > level operations, the msk acquires a reference to each subflow
> > > socket and prevent subflow_ulp_release() from releasing the
> > > subflow context before __mptcp_destroy_sock().
> > >
> > > The additional subflow references are released by __mptcp_done()
> > > and the async ULP release is detected checking ULP ops. If such
> > > field has been already cleared by the ULP release path, the
> > > dangling context is freed directly by __mptcp_done().
> > >
> > > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> > > ---
> > > Note:
> > > The convoluted ctx deallocation schema fixes the UaF described
> > > in:
> > >
> > > https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
> > >
> > > for me.
> > > ---
> > > net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-------
> > > ----
> > > --
> > > net/mptcp/protocol.h | 4 +-
> > > net/mptcp/subflow.c | 21 +++-
> > > 3 files changed, 203 insertions(+), 86 deletions(-)
> > >
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index 2b6c704d4505..dec703103398 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -21,6 +21,7 @@
> > > #include <net/transp_v6.h>
> > > #endif
> > > #include <net/mptcp.h>
> > > +#include <net/xfrm.h>
> > > #include "protocol.h"
> > > #include "mib.h"
> > >
> > > @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
> > >
> > > static struct percpu_counter mptcp_sockets_allocated;
> > >
> > > +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
> > > +
> > > /* If msk has an initial subflow socket, and the MP_CAPABLE
> > > handshake has not
> > > * completed yet or has failed, return the subflow socket.
> > > * Otherwise return NULL.
> > > @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
> > > mptcp_sock *msk)
> > > msk->subflow = ssock;
> > > subflow = mptcp_subflow_ctx(ssock->sk);
> > > list_add(&subflow->node, &msk->conn_list);
> > > + sock_hold(ssock->sk);
> > > subflow->request_mptcp = 1;
> > >
> > > /* accept() will wait on first subflow sk_wq, and we always
> > > wakes up
> > > @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock
> > > *sk)
> > > mptcp_sk(sk)->timer_ival = 0;
> > > }
> > >
> > > +static bool mptcp_wr_eof(const struct sock *sk)
> > > +{
> > > + return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
> > > TCP_TIME_WAIT ||
> > > + sk->sk_state == TCP_CLOSE;
>
> Need to remove TCP_TIME_WAIT here too, the state isn't used (yet?).
Indeed, TIME_WAIT should be dropped from here. It's use is introduced
by this version of this patch, but I removed in in later version
(shared only on IRC so far ;)
> I looked at how this interoperated with the mptcp_trunk kernel (with
> MPTCPv1). With TCP_FIN_WAIT2 included in the mptcp_wr_eof() check,
> __mptcp_destroy_sock() gets called before we can receive and
> acknowledge
> the peer's DATA_FIN.
> I ran an experiment removing TCP_FIN_WAIT2 and TCP_TIME_WAIT from
> the
> above, and it did keep the socket alive to receive DATA_FIN...
Uhm... thinking again about it, I agree mptcp_wr_eof() is too
aggressive: we still need to have the subflows around to cope with the
scenario you describe.
> but then
> exposed a bug matching an incoming DATA_FIN with 32-bit sequence
> number.
Lucky me, that seams an unrelated issue ;)
/P
^ permalink raw reply [flat|nested] 5+ messages in thread
* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 21:34 Mat Martineau
0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2020-09-25 21:34 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 4614 bytes --]
On Fri, 25 Sep 2020, Paolo Abeni wrote:
> On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
>> We must not close the subflows before all the MPTCP level
>> data, comprising the DATA_FIN has been acked at the MPTCP
>> level, otherwise we could be unable to retransmit as needed.
>>
>> __mptcp_wr_shutdown() shutdown is responsible to check for the
>> correct status and close all subflows. Is called by the output
>> path after spooling any data and at shutdown/close time.
>>
>> In a similar way, __mptcp_destroy_sock() is responsible to clean-up
>> the MPTCP level status, and is called when the msk transition
>> to TCP_CLOSE.
>>
>> The protocol level close() does not force anymore the TCP_CLOSE
>> status, but orphan the msk socket and all the subflows.
>> Orphaned msk sockets are forciby closed after a timeout or
>> when all MPTCP-level data is acked.
>>
>> There is a caveat about keeping the orphaned subflows around:
>> the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
>> tcp_close(). To prevent accessing freed memory on later MPTCP
>> level operations, the msk acquires a reference to each subflow
>> socket and prevent subflow_ulp_release() from releasing the
>> subflow context before __mptcp_destroy_sock().
>>
>> The additional subflow references are released by __mptcp_done()
>> and the async ULP release is detected checking ULP ops. If such
>> field has been already cleared by the ULP release path, the
>> dangling context is freed directly by __mptcp_done().
>>
>> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
>> ---
>> Note:
>> The convoluted ctx deallocation schema fixes the UaF described in:
>>
>> https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
>>
>> for me.
>> ---
>> net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-----------
>> --
>> net/mptcp/protocol.h | 4 +-
>> net/mptcp/subflow.c | 21 +++-
>> 3 files changed, 203 insertions(+), 86 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 2b6c704d4505..dec703103398 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -21,6 +21,7 @@
>> #include <net/transp_v6.h>
>> #endif
>> #include <net/mptcp.h>
>> +#include <net/xfrm.h>
>> #include "protocol.h"
>> #include "mib.h"
>>
>> @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
>>
>> static struct percpu_counter mptcp_sockets_allocated;
>>
>> +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
>> +
>> /* If msk has an initial subflow socket, and the MP_CAPABLE
>> handshake has not
>> * completed yet or has failed, return the subflow socket.
>> * Otherwise return NULL.
>> @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
>> mptcp_sock *msk)
>> msk->subflow = ssock;
>> subflow = mptcp_subflow_ctx(ssock->sk);
>> list_add(&subflow->node, &msk->conn_list);
>> + sock_hold(ssock->sk);
>> subflow->request_mptcp = 1;
>>
>> /* accept() will wait on first subflow sk_wq, and we always
>> wakes up
>> @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock *sk)
>> mptcp_sk(sk)->timer_ival = 0;
>> }
>>
>> +static bool mptcp_wr_eof(const struct sock *sk)
>> +{
>> + return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
>> TCP_TIME_WAIT ||
>> + sk->sk_state == TCP_CLOSE;
Need to remove TCP_TIME_WAIT here too, the state isn't used (yet?).
I looked at how this interoperated with the mptcp_trunk kernel (with
MPTCPv1). With TCP_FIN_WAIT2 included in the mptcp_wr_eof() check,
__mptcp_destroy_sock() gets called before we can receive and acknowledge
the peer's DATA_FIN.
I ran an experiment removing TCP_FIN_WAIT2 and TCP_TIME_WAIT from the
above, and it did keep the socket alive to receive DATA_FIN... but then
exposed a bug matching an incoming DATA_FIN with 32-bit sequence number.
>> +}
>> +
>> +/* return true when transitioning to CLOSE state */
>> static void mptcp_check_data_fin_ack(struct sock *sk)
>> {
>> struct mptcp_sock *msk = mptcp_sk(sk);
>> @@ -331,15 +342,22 @@ static void mptcp_check_data_fin_ack(struct
>> sock *sk)
>> switch (sk->sk_state) {
>> case TCP_FIN_WAIT1:
>> inet_sk_state_store(sk, TCP_FIN_WAIT2);
>> - sk->sk_state_change(sk);
>> break;
>> case TCP_CLOSING:
>> + inet_sk_state_store(sk, TCP_TIME_WAIT);
>> + break;
>
> I can't use TIME_WAIT here nor later, otherwise sk_fullsock() will be
> fooled. Not sure why I did not notice before. I'll use directly
> TCP_CLOSE.
--
Mat Martineau
Intel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 17:07 Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-25 17:07 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 1814 bytes --]
On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> @@ -1946,42 +2024,43 @@ static int mptcp_close_state(struct sock *sk)
> return next & TCP_ACTION_FIN;
> }
>
> -static void mptcp_close(struct sock *sk, long timeout)
> +static void __mptcp_wr_shutdown(struct sock *sk)
> {
> - struct mptcp_subflow_context *subflow, *tmp;
> + struct mptcp_subflow_context *subflow;
> struct mptcp_sock *msk = mptcp_sk(sk);
> - LIST_HEAD(conn_list);
>
> - lock_sock(sk);
> - sk->sk_shutdown = SHUTDOWN_MASK;
> + pr_debug("msk=%p snd_data_fin_enable=%d shutdown=%x state=%d pending=%d",
> + msk, msk->snd_data_fin_enable, sk->sk_shutdown, sk->sk_state,
> + !!mptcp_send_head(sk));
>
> - if (sk->sk_state == TCP_LISTEN) {
> - inet_sk_state_store(sk, TCP_CLOSE);
> - goto cleanup;
> - } else if (sk->sk_state == TCP_CLOSE) {
> - goto cleanup;
> - }
> + /* If we've already sent a FIN, we still need to enqueue data to
> + * subflows or not really shutting down, skip this
> + */
> + if (msk->snd_data_fin_enable ||
> + !(sk->sk_shutdown & SEND_SHUTDOWN) || mptcp_send_head(sk) ||
> + (sk->sk_state != TCP_FIN_WAIT1 && sk->sk_state != TCP_LAST_ACK))
> + return;
>
> - if (__mptcp_check_fallback(msk)) {
> - goto update_state;
> - } else if (mptcp_close_state(sk)) {
> - pr_debug("Sending DATA_FIN sk=%p", sk);
> - WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
> - WRITE_ONCE(msk->snd_data_fin_enable, 1);
> + /* will be ignored by fallback sockets */
> + WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
> + WRITE_ONCE(msk->snd_data_fin_enable, 1);
Here I need to increment also snd_nxt, or we will drop DATA FIN ack
in ack_update_msk(). Technically this last change could land in a
previous patch, but will be irrelevant till the next one.
/P
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-09-28 14:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 17:02 [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close Paolo Abeni
2020-09-25 17:07 Paolo Abeni
2020-09-25 21:34 Mat Martineau
2020-09-28 9:39 Paolo Abeni
2020-09-28 14:47 Paolo Abeni
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.