* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 23:58 Krystad, Peter
From: Krystad, Peter @ 2018-12-12 23:58 UTC
To: mptcp
On Wed, 2018-12-12 at 13:59 -0800, cpaasch(a)apple.com wrote:
> On 12/12/18 - 21:45:58, Krystad, Peter wrote:
> > On Wed, 2018-12-12 at 13:07 -0800, cpaasch(a)apple.com wrote:
> > > On 12/12/18 - 19:25:09, Krystad, Peter wrote:
> > > > On Tue, 2018-12-11 at 22:08 -0800, Christoph Paasch wrote:
> > > > > Hello,
> > > > >
> > > > > On 30/11/18 - 12:11:03, Mat Martineau wrote:
> > > > > > From: Peter Krystad <peter.krystad(a)intel.com>
> > > > > >
> > > > > > Add subflow_request_sock type that extends tcp_request_sock
> > > > > > > and add an is_mptcp flag to tcp_request_sock to distinguish them.
> > > > > >
> > > > > > Override the listen() and accept() methods of the MPTCP
> > > > > > socket proto_ops so they may act on the subflow socket.
> > > > > >
> > > > > > Override the conn_request() and syn_recv_sock() handlers
> > > > > > in the inet_connection_sock to handle incoming MPTCP
> > > > > > SYNs and the ACK to the response SYN.
> > > > >
> > > > > > I'm having quite a hard time understanding how it works. Can you give some
> > > > > more details?
> > > > >
> > > > > > Because the difficult part about MPTCP is that incoming subflows no
> > > > > > longer match on a listener but rather on an "established" MPTCP socket,
> > > > > > based on the token that is present in the TCP options.
> > > > > And, I don't see how this is being taken care of here.
> > > > >
> > > > > Is the expectation that the app will call "listen()" and "accept()" on the
> > > > > MPTCP-socket ?
> > > >
> > > > Yes, the application will call listen() and accept() with the socket it
> > > > got when it called socket(..., IPPROTO_MPTCP), the normal call sequence
> > > > for a server application is preserved. In the kernel this socket is
> > > > represented by struct mptcp_sock.
> > > >
> > > > Key generation and token-tracking data structure is added in the next
> > > > patch, #9.
> > > >
> > > > How this works is that underneath the MPTCP socket is a subflow socket
> > > > that is a struct subflow_sock. This is an extended tcp_sock structure
> > > > with subflow-specific fields. mptcp_stream_listen() and
> > > > mptcp_stream_accept() routines in this patch call inet_listen() and
> > > > inet_accept() on the subflow socket just as if it were a listening TCP
> > > > socket.
> > > >
> > > > When an initial MPTCP connection completes on the subflow_socket
> > > > mptcp_accept() creates a new mptcp_sock, attaches the child tcp_sock
> > > > (returned by kernel_accept()) to it and this new MPTCP socket is
> > > > returned to the application. In patch #9 you can see the token and new
> > > > mptcp_sock are stored in the token tree, so we can find it when
> > > > additional subflows are created.
> > >
> > > When these additional subflows are coming in, normally these SYNs are
> > > matching on the default listener. But, they should be matching on the
> > > mptcp-socket that belongs to the token inside the mp_join.
> > >
> > > Otherwise said, my question is: where is rx_opt.mp_join set to 1 ? I can't
> > > seem to find it.
> > >
> >
> > Join is not supported in this patchset, see 0/0 patch message. But you
> > are right additional SYNs will match on the listener subflow socket and
> > the MPTCP socket will be located by token-tree lookup.
> >
> > I have some in-progress commits that implement this but it's not
> > completely working yet.
>
> Ok, that's what I was looking for. What's your plan on how to do that?
> Because, it probably involves some changes in tcp_v4_rcv's fast-path.

I plan to extend subflow_init_req() [the existing callout in
tcp_request_sock_ops] to handle the MP_JOIN option. It would do a
token-tree lookup to validate the token in the incoming SYN w/ MP_JOIN,
and it is also where the local address ID would be allocated. Currently,
for an incoming SYN w/ MP_CAPABLE, this is where token generation happens.
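
To make the first step concrete, here is a rough userspace model of it, not
kernel code. Every sketch_* name is hypothetical, the "token tree" is
replaced by a linear scan over a small table, and the address ID is a plain
per-connection counter starting at 1 (ID 0 being implicit for the initial
subflow). It only shows the shape of "validate the token, then allocate a
local address ID":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the patch. */
#define SKETCH_MAX_CONNS 4

struct sketch_conn {
	uint32_t token;        /* 32-bit MPTCP connection token */
	uint8_t next_local_id; /* next address ID to hand out (0 = initial subflow) */
	int in_use;
};

struct sketch_join_req {
	unsigned int mp_join : 1; /* set once the token has been validated */
	uint8_t local_id;
	uint32_t token;
};

static struct sketch_conn sketch_conns[SKETCH_MAX_CONNS];

/* Stand-in for the token-tree lookup: a linear scan over a small table. */
static struct sketch_conn *sketch_token_lookup(uint32_t token)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_CONNS; i++)
		if (sketch_conns[i].in_use && sketch_conns[i].token == token)
			return &sketch_conns[i];
	return NULL;
}

/* Registers an established connection; returns NULL if the table is full. */
static struct sketch_conn *sketch_token_insert(uint32_t token)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_CONNS; i++) {
		if (!sketch_conns[i].in_use) {
			sketch_conns[i].token = token;
			sketch_conns[i].next_local_id = 1;
			sketch_conns[i].in_use = 1;
			return &sketch_conns[i];
		}
	}
	return NULL;
}

/*
 * Models the MP_JOIN branch of an init_req-style callout.
 * Returns 0 if the token is known, -1 otherwise.
 */
static int sketch_init_req_join(struct sketch_join_req *req, uint32_t token)
{
	struct sketch_conn *conn;

	assert(req != NULL);
	req->mp_join = 0;
	req->local_id = 0;
	req->token = token;

	conn = sketch_token_lookup(token);
	if (!conn)
		return -1; /* unknown token: do not treat the SYN as a join */

	req->mp_join = 1;
	req->local_id = conn->next_local_id++;
	return 0;
}
```

In this model the request only remembers the token, not a pointer to the
connection, which is why a later step has to look the connection up again.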

Then I would also extend subflow_syn_recv_sock() [the existing callout
in inet_connection_sock_af_ops, called from tcp_check_req()/tcp_v4_rcv()]
to handle the MP_JOIN option as well. There would be another token-tree
lookup to find the MPTCP socket, which would then be linked to the
newly created subflow TCP socket.
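
The second step could look roughly like the following userspace model. Again
this is only a sketch under assumptions: all model_* names are hypothetical,
the lookup is a linear scan rather than a tree, and the only thing borrowed
from the patch is the idea of a conn back-pointer on the subflow. The child
is assumed to have been created already by the regular TCP path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the patch. */
#define MODEL_MAX_CONNS    4
#define MODEL_MAX_SUBFLOWS 4

struct model_msk;

struct model_subflow {
	struct model_msk *conn;   /* back-pointer, like subflow->conn in the patch */
	unsigned int mp_join : 1;
};

struct model_msk {
	uint32_t token;
	int in_use;
	struct model_subflow *subflows[MODEL_MAX_SUBFLOWS];
	int nr_subflows;
};

struct model_req {
	uint32_t token;           /* token carried by the MP_JOIN SYN */
	unsigned int mp_join : 1;
};

static struct model_msk model_table[MODEL_MAX_CONNS];

/* Stand-in for the token-tree lookup. */
static struct model_msk *model_token_lookup(uint32_t token)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_CONNS; i++)
		if (model_table[i].in_use && model_table[i].token == token)
			return &model_table[i];
	return NULL;
}

/* Registers an established connection; returns NULL if the table is full. */
static struct model_msk *model_token_insert(uint32_t token)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_CONNS; i++) {
		if (!model_table[i].in_use) {
			model_table[i].token = token;
			model_table[i].nr_subflows = 0;
			model_table[i].in_use = 1;
			return &model_table[i];
		}
	}
	return NULL;
}

/*
 * Models the MP_JOIN branch of a syn_recv_sock-style callout.
 * Returns 0 on success (or when there is nothing to link), -1 if the
 * connection cannot be found or has no room for another subflow.
 */
static int model_link_child(const struct model_req *req,
			    struct model_subflow *child)
{
	struct model_msk *msk;

	assert(req != NULL && child != NULL);
	child->conn = NULL;
	child->mp_join = 0;

	if (!req->mp_join)
		return 0; /* not a join: nothing to link here */

	msk = model_token_lookup(req->token); /* the second lookup */
	if (!msk || msk->nr_subflows >= MODEL_MAX_SUBFLOWS)
		return -1;

	msk->subflows[msk->nr_subflows++] = child;
	child->conn = msk;
	child->mp_join = 1;
	return 0;
}
```

The lookup is repeated here because the connection may have gone away
between the SYN and the final ACK; in this model that case simply returns -1.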

I'm not sure how the second part can be avoided for MPTCP.

Peter.
>
> Christoph
>
> >
> > Peter.
> >
> > > Christoph
> > >
> > > >
> > > > Let me know if you have more questions.
> > > >
> > > > Peter.
> > > >
> > > >
> > > > > Thanks,
> > > > > Christoph
> > > > >
> > > > > >
> > > > > > Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> > > > > > SYN-ACK response for a subflow_request_sock.
> > > > > >
> > > > > > Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> > > > > > ---
> > > > > > include/linux/tcp.h | 1 +
> > > > > > include/net/mptcp.h | 26 ++++++++++
> > > > > > include/net/tcp.h | 1 +
> > > > > > net/ipv4/tcp_input.c | 1 +
> > > > > > net/ipv4/tcp_output.c | 21 +++++++-
> > > > > > net/mptcp/options.c | 15 ++++++
> > > > > > net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> > > > > > net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> > > > > > 8 files changed, 271 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > > > > > index 2622817ecd6b..b54ab3b5546a 100644
> > > > > > --- a/include/linux/tcp.h
> > > > > > +++ b/include/linux/tcp.h
> > > > > > @@ -148,6 +148,7 @@ struct tcp_request_sock {
> > > > > > * FastOpen it's the seq#
> > > > > > * after data-in-SYN.
> > > > > > */
> > > > > > + bool is_mptcp;
> > > > > > };
> > > > > >
> > > > > > static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> > > > > > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > > > > > index a5c2baeb688f..ced33f1c529e 100644
> > > > > > --- a/include/net/mptcp.h
> > > > > > +++ b/include/net/mptcp.h
> > > > > > @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> > > > > > return (struct subflow_sock *)sk;
> > > > > > }
> > > > > >
> > > > > > +struct subflow_request_sock {
> > > > > > + struct tcp_request_sock sk;
> > > > > > + u8 mp_capable : 1,
> > > > > > + mp_join : 1,
> > > > > > + checksum : 1,
> > > > > > + backup : 1,
> > > > > > + version : 4;
> > > > > > + u64 local_key;
> > > > > > + u64 remote_key;
> > > > > > +};
> > > > > > +
> > > > > > +static inline
> > > > > > +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> > > > > > +{
> > > > > > + return (struct subflow_request_sock *)rsk;
> > > > > > +}
> > > > > > +
> > > > > > #ifdef CONFIG_MPTCP
> > > > > >
> > > > > > void mptcp_parse_option(const unsigned char *ptr, int opsize,
> > > > > > @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> > > > > > void mptcp_rcv_synsent(struct sock *sk);
> > > > > > unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > > > u64 *remote_key);
> > > > > > +unsigned int mptcp_synack_options(struct request_sock *req,
> > > > > > + u64 *local_key, u64 *remote_key);
> > > > > >
> > > > > > void mptcp_finish_connect(struct sock *sk, int mp_capable);
> > > > > >
> > > > > > @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> > > > > > {
> > > > > > }
> > > > > >
> > > > > > +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> > > > > > + u64 *local_key,
> > > > > > + u64 *remote_key)
> > > > > > +{
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > static inline unsigned int mptcp_established_options(struct sock *sk,
> > > > > > u64 *local_key,
> > > > > > u64 *remote_key)
> > > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > > > > index 254cf82e2ec6..1fc6362fa778 100644
> > > > > > --- a/include/net/tcp.h
> > > > > > +++ b/include/net/tcp.h
> > > > > > @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> > > > > > #define TCPOLEN_MSS_ALIGNED 4
> > > > > > #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> > > > > > #define TCPOLEN_MPTCP_MPC_SYN 12
> > > > > > +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> > > > > > #define TCPOLEN_MPTCP_MPC_ACK 20
> > > > > >
> > > > > > /* Flags in tp->nonagle */
> > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > > index eda515b141fb..00f7a3d88d66 100644
> > > > > > --- a/net/ipv4/tcp_input.c
> > > > > > +++ b/net/ipv4/tcp_input.c
> > > > > > @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> > > > > >
> > > > > > tcp_rsk(req)->af_specific = af_ops;
> > > > > > tcp_rsk(req)->ts_off = 0;
> > > > > > + tcp_rsk(req)->is_mptcp = 0;
> > > > > >
> > > > > > tcp_clear_options(&tmp_opt);
> > > > > > tmp_opt.mss_clamp = af_ops->mss_clamp;
> > > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > > > index 4f284ed879ba..6f723cdb5c8e 100644
> > > > > > --- a/net/ipv4/tcp_output.c
> > > > > > +++ b/net/ipv4/tcp_output.c
> > > > > > @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> > > > > >
> > > > > > /* MPTCP option subtypes */
> > > > > > #define OPTION_MPTCP_MPC_SYN (1 << 0)
> > > > > > +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> > > > > > #define OPTION_MPTCP_MPC_ACK (1 << 2)
> > > > > >
> > > > > > struct tcp_out_options {
> > > > > > @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > > > return;
> > > > > >
> > > > > > if ((OPTION_MPTCP_MPC_SYN |
> > > > > > + OPTION_MPTCP_MPC_SYNACK |
> > > > > > OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > > > u8 len;
> > > > > > __be64 key;
> > > > > >
> > > > > > if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> > > > > > len = TCPOLEN_MPTCP_MPC_SYN;
> > > > > > + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> > > > > > + len = TCPOLEN_MPTCP_MPC_SYNACK;
> > > > > > else
> > > > > > len = TCPOLEN_MPTCP_MPC_ACK;
> > > > > >
> > > > > > @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > > > key = cpu_to_be64(opts->sndr_key);
> > > > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > > > ptr += 2;
> > > > > > - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> > > > > > + if ((OPTION_MPTCP_MPC_SYNACK |
> > > > > > + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > > > key = cpu_to_be64(opts->rcvr_key);
> > > > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > > > ptr += 2;
> > > > > > @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> > > > > > remaining -= need;
> > > > > > }
> > > > > > }
> > > > > > + if (tcp_rsk(req)->is_mptcp) {
> > > > > > + u64 local_key;
> > > > > > + u64 remote_key;
> > > > > > + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> > > > > > + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> > > > > > + opts->options |= OPTION_MPTCP;
> > > > > > + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> > > > > > + opts->sndr_key = local_key;
> > > > > > + opts->rcvr_key = remote_key;
> > > > > > + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> > > > > > + }
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
> > > > > >
> > > > > > return MAX_TCP_OPTION_SPACE - remaining;
> > > > > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > > > > > index b0616f520da0..266a9f7fed0d 100644
> > > > > > --- a/net/mptcp/options.c
> > > > > > +++ b/net/mptcp/options.c
> > > > > > @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > > > }
> > > > > > return 0;
> > > > > > }
> > > > > > +
> > > > > > +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> > > > > > + u64 *remote_key)
> > > > > > +{
> > > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > > +
> > > > > > + pr_debug("subflow_req=%p", subflow_req);
> > > > > > + if (subflow_req->mp_capable) {
> > > > > > + *local_key = subflow_req->local_key;
> > > > > > + *remote_key = subflow_req->remote_key;
> > > > > > + pr_debug("local_key=%llu", *local_key);
> > > > > > + pr_debug("remote_key=%llu", *remote_key);
> > > > > > + }
> > > > > > + return subflow_req->mp_capable;
> > > > > > +}
> > > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > > > > index 1a3412a742ea..9f802f69a528 100644
> > > > > > --- a/net/mptcp/protocol.c
> > > > > > +++ b/net/mptcp/protocol.c
> > > > > > @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > > > > > + bool kern)
> > > > > > +{
> > > > > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > > > > + struct socket *listener = msk->subflow;
> > > > > > + struct socket *new_sock;
> > > > > > + struct socket *mp;
> > > > > > + struct subflow_sock *subflow;
> > > > > > +
> > > > > > + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> > > > > > + *err = kernel_accept(listener, &new_sock, flags);
> > > > > > + if (*err < 0)
> > > > > > + return NULL;
> > > > > > +
> > > > > > + subflow = subflow_sk(new_sock->sk);
> > > > > > + pr_debug("new_sock=%p", subflow);
> > > > > > +
> > > > > > + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> > > > > > + if (*err < 0) {
> > > > > > + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> > > > > > + sock_release(new_sock);
> > > > > > + return NULL;
> > > > > > + }
> > > > > > +
> > > > > > + msk = mptcp_sk(mp->sk);
> > > > > > + pr_debug("msk=%p", msk);
> > > > > > + subflow->conn = mp->sk;
> > > > > > +
> > > > > > + if (subflow->mp_capable) {
> > > > > > + msk->remote_key = subflow->remote_key;
> > > > > > + msk->local_key = subflow->local_key;
> > > > > > + msk->connection_list = new_sock;
> > > > > > + } else {
> > > > > > + msk->subflow = new_sock;
> > > > > > + }
> > > > > > +
> > > > > > + return mp->sk;
> > > > > > +}
> > > > > > +
> > > > > > static int mptcp_get_port(struct sock *sk, unsigned short snum)
> > > > > > {
> > > > > > struct mptcp_sock *msk = mptcp_sk(sk);
> > > > > > @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> > > > > > int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > > > > > {
> > > > > > struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > > - struct socket *subflow = msk->subflow;
> > > > > > + int err;
> > > > > >
> > > > > > - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> > > > > > + pr_debug("msk=%p", msk);
> > > > > >
> > > > > > - return inet_bind(subflow, uaddr, addr_len);
> > > > > > + if (msk->subflow == NULL) {
> > > > > > + err = subflow_create(sock->sk);
> > > > > > + if (err)
> > > > > > + return err;
> > > > > > + }
> > > > > > + return inet_bind(msk->subflow, uaddr, addr_len);
> > > > > > }
> > > > > >
> > > > > > int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > > > @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > > > return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> > > > > > }
> > > > > >
> > > > > > +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> > > > > > +{
> > > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > > + struct socket *subflow;
> > > > > > + int err = -EPERM;
> > > > > > +
> > > > > > + if (msk->connection_list)
> > > > > > + subflow = msk->connection_list;
> > > > > > + else
> > > > > > + subflow = msk->subflow;
> > > > > > +
> > > > > > + err = inet_getname(subflow, uaddr, peer);
> > > > > > +
> > > > > > + return err;
> > > > > > +}
> > > > > > +
> > > > > > +int mptcp_stream_listen(struct socket *sock, int backlog)
> > > > > > +{
> > > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > > + int err;
> > > > > > +
> > > > > > + pr_debug("msk=%p", msk);
> > > > > > +
> > > > > > + if (msk->subflow == NULL) {
> > > > > > + err = subflow_create(sock->sk);
> > > > > > + if (err)
> > > > > > + return err;
> > > > > > + }
> > > > > > + return inet_listen(msk->subflow, backlog);
> > > > > > +}
> > > > > > +
> > > > > > +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> > > > > > + bool kern)
> > > > > > +{
> > > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > > +
> > > > > > + pr_debug("msk=%p", msk);
> > > > > > +
> > > > > > + if (msk->subflow == NULL) {
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > + return inet_accept(sock, newsock, flags, kern);
> > > > > > +}
> > > > > > +
> > > > > > static struct proto mptcp_prot = {
> > > > > > .name = "MPTCP",
> > > > > > .owner = THIS_MODULE,
> > > > > > .init = mptcp_init_sock,
> > > > > > .close = mptcp_close,
> > > > > > - .accept = inet_csk_accept,
> > > > > > + .accept = mptcp_accept,
> > > > > > .shutdown = tcp_shutdown,
> > > > > > .sendmsg = mptcp_sendmsg,
> > > > > > .recvmsg = mptcp_recvmsg,
> > > > > > @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> > > > > > .bind = mptcp_stream_bind,
> > > > > > .connect = mptcp_stream_connect,
> > > > > > .socketpair = sock_no_socketpair,
> > > > > > - .accept = inet_accept,
> > > > > > - .getname = inet_getname,
> > > > > > + .accept = mptcp_stream_accept,
> > > > > > + .getname = mptcp_stream_getname,
> > > > > > .poll = tcp_poll,
> > > > > > .ioctl = inet_ioctl,
> > > > > > - .listen = inet_listen,
> > > > > > + .listen = mptcp_stream_listen,
> > > > > > .shutdown = inet_shutdown,
> > > > > > .setsockopt = sock_common_setsockopt,
> > > > > > .getsockopt = sock_common_getsockopt,
> > > > > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > > > > index 5e5fdcb3175f..89fcc3b746eb 100644
> > > > > > --- a/net/mptcp/subflow.c
> > > > > > +++ b/net/mptcp/subflow.c
> > > > > > @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > > return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > > > > > }
> > > > > >
> > > > > > +static void subflow_v4_init_req(struct request_sock *req,
> > > > > > + const struct sock *sk_listener,
> > > > > > + struct sk_buff *skb)
> > > > > > +{
> > > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > > + struct subflow_sock *listener = subflow_sk(sk_listener);
> > > > > > + struct tcp_options_received rx_opt;
> > > > > > +
> > > > > > + tcp_rsk(req)->is_mptcp = 1;
> > > > > > + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> > > > > > +
> > > > > > + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> > > > > > +
> > > > > > + rx_opt.mptcp.flags = 0;
> > > > > > + rx_opt.mptcp.mp_capable = 0;
> > > > > > + rx_opt.mptcp.mp_join = 0;
> > > > > > + rx_opt.mptcp.dss = 0;
> > > > > > + mptcp_get_options(skb, &rx_opt);
> > > > > > +
> > > > > > + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> > > > > > + subflow_req->mp_capable = 1;
> > > > > > + if (rx_opt.mptcp.version >= listener->version)
> > > > > > + subflow_req->version = listener->version;
> > > > > > + else
> > > > > > + subflow_req->version = rx_opt.mptcp.version;
> > > > > > + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> > > > > > + listener->checksum)
> > > > > > + subflow_req->checksum = 1;
> > > > > > + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> > > > > > + } else {
> > > > > > + subflow_req->mp_capable = 0;
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > > > {
> > > > > > struct subflow_sock *subflow = subflow_sk(sk);
> > > > > > @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +static struct request_sock_ops subflow_request_sock_ops;
> > > > > > +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> > > > > > +
> > > > > > +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> > > > > > +{
> > > > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > > > +
> > > > > > + pr_debug("subflow=%p", subflow);
> > > > > > +
> > > > > > + /* Never answer to SYNs sent to broadcast or multicast */
> > > > > > + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> > > > > > + goto drop;
> > > > > > +
> > > > > > + return tcp_conn_request(&subflow_request_sock_ops,
> > > > > > + &subflow_request_sock_ipv4_ops,
> > > > > > + sk, skb);
> > > > > > +drop:
> > > > > > + tcp_listendrop(sk);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> > > > > > + struct sk_buff *skb,
> > > > > > + struct request_sock *req,
> > > > > > + struct dst_entry *dst,
> > > > > > + struct request_sock *req_unhash,
> > > > > > + bool *own_req)
> > > > > > +{
> > > > > > + struct subflow_sock *listener = subflow_sk(sk);
> > > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > > + struct sock *child;
> > > > > > +
> > > > > > + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> > > > > > +
> > > > > > + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> > > > > > +
> > > > > > + if (child) {
> > > > > > + struct subflow_sock *subflow = subflow_sk(child);
> > > > > > +
> > > > > > + pr_debug("child=%p", child);
> > > > > > + if (subflow_req->mp_capable) {
> > > > > > + subflow->mp_capable = 1;
> > > > > > + subflow->fourth_ack = 1;
> > > > > > + subflow->remote_key = subflow_req->remote_key;
> > > > > > + subflow->local_key = subflow_req->local_key;
> > > > > > + } else {
> > > > > > + subflow->mp_capable = 0;
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + return child;
> > > > > > +}
> > > > > > +
> > > > > > const struct inet_connection_sock_af_ops subflow_specific = {
> > > > > > .queue_xmit = ip_queue_xmit,
> > > > > > .send_check = tcp_v4_send_check,
> > > > > > .rebuild_header = inet_sk_rebuild_header,
> > > > > > .sk_rx_dst_set = subflow_finish_connect,
> > > > > > - .conn_request = tcp_v4_conn_request,
> > > > > > - .syn_recv_sock = tcp_v4_syn_recv_sock,
> > > > > > + .conn_request = subflow_conn_request,
> > > > > > + .syn_recv_sock = subflow_syn_recv_sock,
> > > > > > .net_header_len = sizeof(struct iphdr),
> > > > > > .setsockopt = ip_setsockopt,
> > > > > > .getsockopt = ip_getsockopt,
> > > > > > @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> > > > > > tcp_close(sk, timeout);
> > > > > > }
> > > > > >
> > > > > > +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> > > > > > + bool kern)
> > > > > > +{
> > > > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > > > + struct sock *child;
> > > > > > +
> > > > > > + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> > > > > > +
> > > > > > + child = inet_csk_accept(sk, flags, err, kern);
> > > > > > +
> > > > > > + pr_debug("child=%p", child);
> > > > > > +
> > > > > > + return child;
> > > > > > +}
> > > > > > +
> > > > > > static void subflow_destroy(struct sock *sk)
> > > > > > {
> > > > > > pr_debug("subflow=%p", sk);
> > > > > > @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> > > > > > .close = subflow_close,
> > > > > > .connect = subflow_connect,
> > > > > > .disconnect = tcp_disconnect,
> > > > > > - .accept = inet_csk_accept,
> > > > > > + .accept = subflow_accept,
> > > > > > .ioctl = tcp_ioctl,
> > > > > > .init = subflow_init_sock,
> > > > > > .destroy = subflow_destroy,
> > > > > > @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
> > > > > >
> > > > > > /* TODO: Register path manager callbacks. */
> > > > > >
> > > > > > + subflow_request_sock_ops = tcp_request_sock_ops;
> > > > > > > + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock);
> > > > > > +
> > > > > > + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> > > > > > + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> > > > > > +
> > > > > > subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> > > > > > + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> > > > > > subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> > > > > > err = proto_register(&subflow_prot, 1);
> > > > > > if (err)
> > > > > > --
> > > > > > 2.19.1
> > > > > >
> > > > > > _______________________________________________
> > > > > > mptcp mailing list
> > > > > > mptcp(a)lists.01.org
> > > > > > https://lists.01.org/mailman/listinfo/mptcp
> > > > >
* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 21:59 cpaasch
From: cpaasch @ 2018-12-12 21:59 UTC
To: mptcp
On 12/12/18 - 21:45:58, Krystad, Peter wrote:
> On Wed, 2018-12-12 at 13:07 -0800, cpaasch(a)apple.com wrote:
> > On 12/12/18 - 19:25:09, Krystad, Peter wrote:
> > > On Tue, 2018-12-11 at 22:08 -0800, Christoph Paasch wrote:
> > > > Hello,
> > > >
> > > > On 30/11/18 - 12:11:03, Mat Martineau wrote:
> > > > > From: Peter Krystad <peter.krystad(a)intel.com>
> > > > >
> > > > > Add subflow_request_sock type that extends tcp_request_sock
> > > > > and add an is_mptcp flag to tcp_request_sock to distinguish them.
> > > > >
> > > > > Override the listen() and accept() methods of the MPTCP
> > > > > socket proto_ops so they may act on the subflow socket.
> > > > >
> > > > > Override the conn_request() and syn_recv_sock() handlers
> > > > > in the inet_connection_sock to handle incoming MPTCP
> > > > > SYNs and the ACK to the response SYN.
> > > >
> > > > I'm having quite a hard time understanding how it works. Can you give some
> > > > more details?
> > > >
> > > > Because the difficult part about MPTCP is that incoming subflows no
> > > > longer match on a listener but rather on an "established" MPTCP socket,
> > > > based on the token that is present in the TCP options.
> > > > And, I don't see how this is being taken care of here.
> > > >
> > > > Is the expectation that the app will call "listen()" and "accept()" on the
> > > > MPTCP-socket ?
> > >
> > > Yes, the application will call listen() and accept() with the socket it
> > > got when it called socket(..., IPPROTO_MPTCP), the normal call sequence
> > > for a server application is preserved. In the kernel this socket is
> > > represented by struct mptcp_sock.
> > >
> > > Key generation and token-tracking data structure is added in the next
> > > patch, #9.
> > >
> > > How this works is that underneath the MPTCP socket is a subflow socket
> > > that is a struct subflow_sock. This is an extended tcp_sock structure
> > > with subflow-specific fields. mptcp_stream_listen() and
> > > mptcp_stream_accept() routines in this patch call inet_listen() and
> > > inet_accept() on the subflow socket just as if it were a listening TCP
> > > socket.
> > >
> > > When an initial MPTCP connection completes on the subflow_socket
> > > mptcp_accept() creates a new mptcp_sock, attaches the child tcp_sock
> > > (returned by kernel_accept()) to it and this new MPTCP socket is
> > > returned to the application. In patch #9 you can see the token and new
> > > mptcp_sock are stored in the token tree, so we can find it when
> > > additional subflows are created.
> >
> > When these additional subflows are coming in, normally these SYNs are
> > matching on the default listener. But, they should be matching on the
> > mptcp-socket that belongs to the token inside the mp_join.
> >
> > Otherwise said, my question is: where is rx_opt.mp_join set to 1 ? I can't
> > seem to find it.
> >
> Join is not supported in this patchset, see 0/0 patch message. But you
> are right additional SYNs will match on the listener subflow socket and
> the MPTCP socket will be located by token-tree lookup.
>
> I have some in-progress commits that implement this but it's not
> completely working yet.

Ok, that's what I was looking for. What's your plan for doing that?
It probably involves some changes in tcp_v4_rcv's fast path.

Christoph
>
> Peter.
>
> > Christoph
> >
> > >
> > > Let me know if you have more questions.
> > >
> > > Peter.
> > >
> > >
> > > > Thanks,
> > > > Christoph
> > > >
> > > > >
> > > > > Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> > > > > SYN-ACK response for a subflow_request_sock.
> > > > >
> > > > > Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> > > > > ---
> > > > > include/linux/tcp.h | 1 +
> > > > > include/net/mptcp.h | 26 ++++++++++
> > > > > include/net/tcp.h | 1 +
> > > > > net/ipv4/tcp_input.c | 1 +
> > > > > net/ipv4/tcp_output.c | 21 +++++++-
> > > > > net/mptcp/options.c | 15 ++++++
> > > > > net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> > > > > net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> > > > > 8 files changed, 271 insertions(+), 11 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > > > > index 2622817ecd6b..b54ab3b5546a 100644
> > > > > --- a/include/linux/tcp.h
> > > > > +++ b/include/linux/tcp.h
> > > > > @@ -148,6 +148,7 @@ struct tcp_request_sock {
> > > > > * FastOpen it's the seq#
> > > > > * after data-in-SYN.
> > > > > */
> > > > > + bool is_mptcp;
> > > > > };
> > > > >
> > > > > static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> > > > > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > > > > index a5c2baeb688f..ced33f1c529e 100644
> > > > > --- a/include/net/mptcp.h
> > > > > +++ b/include/net/mptcp.h
> > > > > @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> > > > > return (struct subflow_sock *)sk;
> > > > > }
> > > > >
> > > > > +struct subflow_request_sock {
> > > > > + struct tcp_request_sock sk;
> > > > > + u8 mp_capable : 1,
> > > > > + mp_join : 1,
> > > > > + checksum : 1,
> > > > > + backup : 1,
> > > > > + version : 4;
> > > > > + u64 local_key;
> > > > > + u64 remote_key;
> > > > > +};
> > > > > +
> > > > > +static inline
> > > > > +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> > > > > +{
> > > > > + return (struct subflow_request_sock *)rsk;
> > > > > +}
> > > > > +
> > > > > #ifdef CONFIG_MPTCP
> > > > >
> > > > > void mptcp_parse_option(const unsigned char *ptr, int opsize,
> > > > > @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> > > > > void mptcp_rcv_synsent(struct sock *sk);
> > > > > unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > > u64 *remote_key);
> > > > > +unsigned int mptcp_synack_options(struct request_sock *req,
> > > > > + u64 *local_key, u64 *remote_key);
> > > > >
> > > > > void mptcp_finish_connect(struct sock *sk, int mp_capable);
> > > > >
> > > > > @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> > > > > {
> > > > > }
> > > > >
> > > > > +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> > > > > + u64 *local_key,
> > > > > + u64 *remote_key)
> > > > > +{
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static inline unsigned int mptcp_established_options(struct sock *sk,
> > > > > u64 *local_key,
> > > > > u64 *remote_key)
> > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > > > index 254cf82e2ec6..1fc6362fa778 100644
> > > > > --- a/include/net/tcp.h
> > > > > +++ b/include/net/tcp.h
> > > > > @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> > > > > #define TCPOLEN_MSS_ALIGNED 4
> > > > > #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> > > > > #define TCPOLEN_MPTCP_MPC_SYN 12
> > > > > +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> > > > > #define TCPOLEN_MPTCP_MPC_ACK 20
> > > > >
> > > > > /* Flags in tp->nonagle */
> > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > index eda515b141fb..00f7a3d88d66 100644
> > > > > --- a/net/ipv4/tcp_input.c
> > > > > +++ b/net/ipv4/tcp_input.c
> > > > > @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> > > > >
> > > > > tcp_rsk(req)->af_specific = af_ops;
> > > > > tcp_rsk(req)->ts_off = 0;
> > > > > + tcp_rsk(req)->is_mptcp = 0;
> > > > >
> > > > > tcp_clear_options(&tmp_opt);
> > > > > tmp_opt.mss_clamp = af_ops->mss_clamp;
> > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > > index 4f284ed879ba..6f723cdb5c8e 100644
> > > > > --- a/net/ipv4/tcp_output.c
> > > > > +++ b/net/ipv4/tcp_output.c
> > > > > @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> > > > >
> > > > > /* MPTCP option subtypes */
> > > > > #define OPTION_MPTCP_MPC_SYN (1 << 0)
> > > > > +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> > > > > #define OPTION_MPTCP_MPC_ACK (1 << 2)
> > > > >
> > > > > struct tcp_out_options {
> > > > > @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > > return;
> > > > >
> > > > > if ((OPTION_MPTCP_MPC_SYN |
> > > > > + OPTION_MPTCP_MPC_SYNACK |
> > > > > OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > > u8 len;
> > > > > __be64 key;
> > > > >
> > > > > if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> > > > > len = TCPOLEN_MPTCP_MPC_SYN;
> > > > > + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> > > > > + len = TCPOLEN_MPTCP_MPC_SYNACK;
> > > > > else
> > > > > len = TCPOLEN_MPTCP_MPC_ACK;
> > > > >
> > > > > @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > > key = cpu_to_be64(opts->sndr_key);
> > > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > > ptr += 2;
> > > > > - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> > > > > + if ((OPTION_MPTCP_MPC_SYNACK |
> > > > > + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > > key = cpu_to_be64(opts->rcvr_key);
> > > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > > ptr += 2;
> > > > > @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> > > > > remaining -= need;
> > > > > }
> > > > > }
> > > > > + if (tcp_rsk(req)->is_mptcp) {
> > > > > + u64 local_key;
> > > > > + u64 remote_key;
> > > > > + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> > > > > + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> > > > > + opts->options |= OPTION_MPTCP;
> > > > > + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> > > > > + opts->sndr_key = local_key;
> > > > > + opts->rcvr_key = remote_key;
> > > > > + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> > > > > + }
> > > > > + }
> > > > > + }
> > > > > +
> > > > > smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
> > > > >
> > > > > return MAX_TCP_OPTION_SPACE - remaining;
> > > > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > > > > index b0616f520da0..266a9f7fed0d 100644
> > > > > --- a/net/mptcp/options.c
> > > > > +++ b/net/mptcp/options.c
> > > > > @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > > }
> > > > > return 0;
> > > > > }
> > > > > +
> > > > > +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> > > > > + u64 *remote_key)
> > > > > +{
> > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > +
> > > > > + pr_debug("subflow_req=%p", subflow_req);
> > > > > + if (subflow_req->mp_capable) {
> > > > > + *local_key = subflow_req->local_key;
> > > > > + *remote_key = subflow_req->remote_key;
> > > > > + pr_debug("local_key=%llu", *local_key);
> > > > > + pr_debug("remote_key=%llu", *remote_key);
> > > > > + }
> > > > > + return subflow_req->mp_capable;
> > > > > +}
> > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > > > index 1a3412a742ea..9f802f69a528 100644
> > > > > --- a/net/mptcp/protocol.c
> > > > > +++ b/net/mptcp/protocol.c
> > > > > @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> > > > > }
> > > > > }
> > > > >
> > > > > +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > > > > + bool kern)
> > > > > +{
> > > > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > > > + struct socket *listener = msk->subflow;
> > > > > + struct socket *new_sock;
> > > > > + struct socket *mp;
> > > > > + struct subflow_sock *subflow;
> > > > > +
> > > > > + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> > > > > + *err = kernel_accept(listener, &new_sock, flags);
> > > > > + if (*err < 0)
> > > > > + return NULL;
> > > > > +
> > > > > + subflow = subflow_sk(new_sock->sk);
> > > > > + pr_debug("new_sock=%p", subflow);
> > > > > +
> > > > > + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> > > > > + if (*err < 0) {
> > > > > + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> > > > > + sock_release(new_sock);
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > + msk = mptcp_sk(mp->sk);
> > > > > + pr_debug("msk=%p", msk);
> > > > > + subflow->conn = mp->sk;
> > > > > +
> > > > > + if (subflow->mp_capable) {
> > > > > + msk->remote_key = subflow->remote_key;
> > > > > + msk->local_key = subflow->local_key;
> > > > > + msk->connection_list = new_sock;
> > > > > + } else {
> > > > > + msk->subflow = new_sock;
> > > > > + }
> > > > > +
> > > > > + return mp->sk;
> > > > > +}
> > > > > +
> > > > > static int mptcp_get_port(struct sock *sk, unsigned short snum)
> > > > > {
> > > > > struct mptcp_sock *msk = mptcp_sk(sk);
> > > > > @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> > > > > int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > > > > {
> > > > > struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > - struct socket *subflow = msk->subflow;
> > > > > + int err;
> > > > >
> > > > > - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> > > > > + pr_debug("msk=%p", msk);
> > > > >
> > > > > - return inet_bind(subflow, uaddr, addr_len);
> > > > > + if (msk->subflow == NULL) {
> > > > > + err = subflow_create(sock->sk);
> > > > > + if (err)
> > > > > + return err;
> > > > > + }
> > > > > + return inet_bind(msk->subflow, uaddr, addr_len);
> > > > > }
> > > > >
> > > > > int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > > @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > > return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> > > > > }
> > > > >
> > > > > +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> > > > > +{
> > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > + struct socket *subflow;
> > > > > + int err = -EPERM;
> > > > > +
> > > > > + if (msk->connection_list)
> > > > > + subflow = msk->connection_list;
> > > > > + else
> > > > > + subflow = msk->subflow;
> > > > > +
> > > > > + err = inet_getname(subflow, uaddr, peer);
> > > > > +
> > > > > + return err;
> > > > > +}
> > > > > +
> > > > > +int mptcp_stream_listen(struct socket *sock, int backlog)
> > > > > +{
> > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > + int err;
> > > > > +
> > > > > + pr_debug("msk=%p", msk);
> > > > > +
> > > > > + if (msk->subflow == NULL) {
> > > > > + err = subflow_create(sock->sk);
> > > > > + if (err)
> > > > > + return err;
> > > > > + }
> > > > > + return inet_listen(msk->subflow, backlog);
> > > > > +}
> > > > > +
> > > > > +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> > > > > + bool kern)
> > > > > +{
> > > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > > +
> > > > > + pr_debug("msk=%p", msk);
> > > > > +
> > > > > + if (msk->subflow == NULL) {
> > > > > + return -EINVAL;
> > > > > + }
> > > > > + return inet_accept(sock, newsock, flags, kern);
> > > > > +}
> > > > > +
> > > > > static struct proto mptcp_prot = {
> > > > > .name = "MPTCP",
> > > > > .owner = THIS_MODULE,
> > > > > .init = mptcp_init_sock,
> > > > > .close = mptcp_close,
> > > > > - .accept = inet_csk_accept,
> > > > > + .accept = mptcp_accept,
> > > > > .shutdown = tcp_shutdown,
> > > > > .sendmsg = mptcp_sendmsg,
> > > > > .recvmsg = mptcp_recvmsg,
> > > > > @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> > > > > .bind = mptcp_stream_bind,
> > > > > .connect = mptcp_stream_connect,
> > > > > .socketpair = sock_no_socketpair,
> > > > > - .accept = inet_accept,
> > > > > - .getname = inet_getname,
> > > > > + .accept = mptcp_stream_accept,
> > > > > + .getname = mptcp_stream_getname,
> > > > > .poll = tcp_poll,
> > > > > .ioctl = inet_ioctl,
> > > > > - .listen = inet_listen,
> > > > > + .listen = mptcp_stream_listen,
> > > > > .shutdown = inet_shutdown,
> > > > > .setsockopt = sock_common_setsockopt,
> > > > > .getsockopt = sock_common_getsockopt,
> > > > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > > > index 5e5fdcb3175f..89fcc3b746eb 100644
> > > > > --- a/net/mptcp/subflow.c
> > > > > +++ b/net/mptcp/subflow.c
> > > > > @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> > > > > return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > > > > }
> > > > >
> > > > > +static void subflow_v4_init_req(struct request_sock *req,
> > > > > + const struct sock *sk_listener,
> > > > > + struct sk_buff *skb)
> > > > > +{
> > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > + struct subflow_sock *listener = subflow_sk(sk_listener);
> > > > > + struct tcp_options_received rx_opt;
> > > > > +
> > > > > + tcp_rsk(req)->is_mptcp = 1;
> > > > > + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> > > > > +
> > > > > + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> > > > > +
> > > > > + rx_opt.mptcp.flags = 0;
> > > > > + rx_opt.mptcp.mp_capable = 0;
> > > > > + rx_opt.mptcp.mp_join = 0;
> > > > > + rx_opt.mptcp.dss = 0;
> > > > > + mptcp_get_options(skb, &rx_opt);
> > > > > +
> > > > > + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> > > > > + subflow_req->mp_capable = 1;
> > > > > + if (rx_opt.mptcp.version >= listener->version)
> > > > > + subflow_req->version = listener->version;
> > > > > + else
> > > > > + subflow_req->version = rx_opt.mptcp.version;
> > > > > + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> > > > > + listener->checksum)
> > > > > + subflow_req->checksum = 1;
> > > > > + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> > > > > + } else {
> > > > > + subflow_req->mp_capable = 0;
> > > > > + }
> > > > > +}
> > > > > +
> > > > > static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > > {
> > > > > struct subflow_sock *subflow = subflow_sk(sk);
> > > > > @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > > }
> > > > > }
> > > > >
> > > > > +static struct request_sock_ops subflow_request_sock_ops;
> > > > > +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> > > > > +
> > > > > +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> > > > > +{
> > > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > > +
> > > > > + pr_debug("subflow=%p", subflow);
> > > > > +
> > > > > + /* Never answer to SYNs sent to broadcast or multicast */
> > > > > + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> > > > > + goto drop;
> > > > > +
> > > > > + return tcp_conn_request(&subflow_request_sock_ops,
> > > > > + &subflow_request_sock_ipv4_ops,
> > > > > + sk, skb);
> > > > > +drop:
> > > > > + tcp_listendrop(sk);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> > > > > + struct sk_buff *skb,
> > > > > + struct request_sock *req,
> > > > > + struct dst_entry *dst,
> > > > > + struct request_sock *req_unhash,
> > > > > + bool *own_req)
> > > > > +{
> > > > > + struct subflow_sock *listener = subflow_sk(sk);
> > > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > > + struct sock *child;
> > > > > +
> > > > > + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> > > > > +
> > > > > + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> > > > > +
> > > > > + if (child) {
> > > > > + struct subflow_sock *subflow = subflow_sk(child);
> > > > > +
> > > > > + pr_debug("child=%p", child);
> > > > > + if (subflow_req->mp_capable) {
> > > > > + subflow->mp_capable = 1;
> > > > > + subflow->fourth_ack = 1;
> > > > > + subflow->remote_key = subflow_req->remote_key;
> > > > > + subflow->local_key = subflow_req->local_key;
> > > > > + } else {
> > > > > + subflow->mp_capable = 0;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + return child;
> > > > > +}
> > > > > +
> > > > > const struct inet_connection_sock_af_ops subflow_specific = {
> > > > > .queue_xmit = ip_queue_xmit,
> > > > > .send_check = tcp_v4_send_check,
> > > > > .rebuild_header = inet_sk_rebuild_header,
> > > > > .sk_rx_dst_set = subflow_finish_connect,
> > > > > - .conn_request = tcp_v4_conn_request,
> > > > > - .syn_recv_sock = tcp_v4_syn_recv_sock,
> > > > > + .conn_request = subflow_conn_request,
> > > > > + .syn_recv_sock = subflow_syn_recv_sock,
> > > > > .net_header_len = sizeof(struct iphdr),
> > > > > .setsockopt = ip_setsockopt,
> > > > > .getsockopt = ip_getsockopt,
> > > > > @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> > > > > tcp_close(sk, timeout);
> > > > > }
> > > > >
> > > > > +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> > > > > + bool kern)
> > > > > +{
> > > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > > + struct sock *child;
> > > > > +
> > > > > + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> > > > > +
> > > > > + child = inet_csk_accept(sk, flags, err, kern);
> > > > > +
> > > > > + pr_debug("child=%p", child);
> > > > > +
> > > > > + return child;
> > > > > +}
> > > > > +
> > > > > static void subflow_destroy(struct sock *sk)
> > > > > {
> > > > > pr_debug("subflow=%p", sk);
> > > > > @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> > > > > .close = subflow_close,
> > > > > .connect = subflow_connect,
> > > > > .disconnect = tcp_disconnect,
> > > > > - .accept = inet_csk_accept,
> > > > > + .accept = subflow_accept,
> > > > > .ioctl = tcp_ioctl,
> > > > > .init = subflow_init_sock,
> > > > > .destroy = subflow_destroy,
> > > > > @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
> > > > >
> > > > > /* TODO: Register path manager callbacks. */
> > > > >
> > > > > + subflow_request_sock_ops = tcp_request_sock_ops;
> > > > > > + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock);
> > > > > +
> > > > > + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> > > > > + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> > > > > +
> > > > > subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> > > > > + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> > > > > subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> > > > > err = proto_register(&subflow_prot, 1);
> > > > > if (err)
> > > > > --
> > > > > 2.19.1
> > > > >
> > > > > _______________________________________________
> > > > > mptcp mailing list
> > > > > mptcp(a)lists.01.org
> > > > > https://lists.01.org/mailman/listinfo/mptcp
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 21:45 Krystad, Peter
0 siblings, 0 replies; 7+ messages in thread
From: Krystad, Peter @ 2018-12-12 21:45 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 22567 bytes --]
On Wed, 2018-12-12 at 13:07 -0800, cpaasch(a)apple.com wrote:
> On 12/12/18 - 19:25:09, Krystad, Peter wrote:
> > On Tue, 2018-12-11 at 22:08 -0800, Christoph Paasch wrote:
> > > Hello,
> > >
> > > On 30/11/18 - 12:11:03, Mat Martineau wrote:
> > > > From: Peter Krystad <peter.krystad(a)intel.com>
> > > >
> > > > Add a subflow_request_sock type that extends tcp_request_sock
> > > > and add an is_mptcp flag to tcp_request_sock to distinguish them.
> > > >
> > > > Override the listen() and accept() methods of the MPTCP
> > > > socket proto_ops so they may act on the subflow socket.
> > > >
> > > > Override the conn_request() and syn_recv_sock() handlers
> > > > in the inet_connection_sock to handle incoming MPTCP
> > > > SYNs and the ACK to the response SYN.
> > >
> > > I'm having quite a hard time understanding how it works. Can you give some
> > > more details?
> > >
> > > The difficult part about MPTCP is that incoming subflows no longer
> > > match on a listener but rather on an "established" MPTCP socket, based
> > > on the token that is present in the TCP options.
> > > I don't see how this is being taken care of here.
> > >
> > > Is the expectation that the app will call "listen()" and "accept()" on the
> > > MPTCP-socket ?
> >
> > Yes, the application will call listen() and accept() on the socket it
> > got when it called socket(..., IPPROTO_MPTCP); the normal call sequence
> > for a server application is preserved. In the kernel this socket is
> > represented by struct mptcp_sock.
> >
> > Key generation and the token-tracking data structure are added in the
> > next patch, #9.
> >
> > Underneath the MPTCP socket is a subflow socket, a struct subflow_sock.
> > This is an extended tcp_sock structure with subflow-specific fields.
> > The mptcp_stream_listen() and mptcp_stream_accept() routines in this
> > patch call inet_listen() and inet_accept() on the subflow socket just
> > as if it were a listening TCP socket.
> >
> > When an initial MPTCP connection completes on the subflow socket,
> > mptcp_accept() creates a new mptcp_sock, attaches the child tcp_sock
> > (returned by kernel_accept()) to it, and returns this new MPTCP socket
> > to the application. In patch #9 you can see that the token and the new
> > mptcp_sock are stored in the token tree, so we can find it when
> > additional subflows are created.
>
> When these additional subflows come in, their SYNs normally match on the
> default listener, but they should match on the MPTCP socket that belongs
> to the token inside the MP_JOIN.
>
> In other words, my question is: where is rx_opt.mp_join set to 1? I can't
> seem to find it.
>
Join is not supported in this patchset; see the 0/0 patch message. But you
are right: additional SYNs will match on the listener subflow socket, and
the MPTCP socket will be located by token-tree lookup.
I have some in-progress commits that implement this, but they are not
completely working yet.
Peter.
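
To make the lookup step above concrete, here is a minimal userspace sketch
of routing an incoming SYN by token. It is not code from this patchset or
from patch #9: every name in it (demo_conn, demo_token_table,
demo_token_insert, demo_token_lookup, demo_route_syn) is hypothetical, the
token is taken as an already-parsed 32-bit value, and a small array with a
linear search stands in for the token tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an established MPTCP connection. */
struct demo_conn {
	uint32_t token;	/* token carried by a joining SYN */
	int id;		/* illustrative payload only */
};

#define DEMO_MAX_CONNS 8

/* Hypothetical token table; a flat array replaces the token tree. */
struct demo_token_table {
	struct demo_conn *slots[DEMO_MAX_CONNS];
	size_t count;
};

/* Look up an established connection by token; NULL if unknown. */
struct demo_conn *demo_token_lookup(const struct demo_token_table *tbl,
				    uint32_t token)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < tbl->count; i++)
		if (tbl->slots[i]->token == token)
			return tbl->slots[i];
	return NULL;
}

/* Register a connection; returns 0, or -1 if full or token already used. */
int demo_token_insert(struct demo_token_table *tbl, struct demo_conn *conn)
{
	assert(tbl != NULL && conn != NULL);
	if (tbl->count == DEMO_MAX_CONNS)
		return -1;
	if (demo_token_lookup(tbl, conn->token) != NULL)
		return -1;
	tbl->slots[tbl->count++] = conn;
	return 0;
}

/*
 * Decide where a SYN that matched the listener belongs.
 * Without a join option it stays with the listener (NULL is returned).
 * With a join option the token selects the established connection;
 * NULL then means the token is unknown and the SYN should be rejected.
 */
struct demo_conn *demo_route_syn(const struct demo_token_table *tbl,
				 int has_join, uint32_t token)
{
	if (!has_join)
		return NULL;
	return demo_token_lookup(tbl, token);
}
```

The listener still receives every SYN in this sketch; only the join case
consults the table, which mirrors the behaviour described above where the
SYN matches on the listener subflow socket first.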
> Christoph
>
> >
> > Let me know if you have more questions.
> >
> > Peter.
> >
> >
> > > Thanks,
> > > Christoph
> > >
> > > >
> > > > Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> > > > SYN-ACK response for a subflow_request_sock.
> > > >
> > > > Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> > > > ---
> > > > include/linux/tcp.h | 1 +
> > > > include/net/mptcp.h | 26 ++++++++++
> > > > include/net/tcp.h | 1 +
> > > > net/ipv4/tcp_input.c | 1 +
> > > > net/ipv4/tcp_output.c | 21 +++++++-
> > > > net/mptcp/options.c | 15 ++++++
> > > > net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> > > > net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> > > > 8 files changed, 271 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > > > index 2622817ecd6b..b54ab3b5546a 100644
> > > > --- a/include/linux/tcp.h
> > > > +++ b/include/linux/tcp.h
> > > > @@ -148,6 +148,7 @@ struct tcp_request_sock {
> > > > * FastOpen it's the seq#
> > > > * after data-in-SYN.
> > > > */
> > > > + bool is_mptcp;
> > > > };
> > > >
> > > > static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> > > > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > > > index a5c2baeb688f..ced33f1c529e 100644
> > > > --- a/include/net/mptcp.h
> > > > +++ b/include/net/mptcp.h
> > > > @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> > > > return (struct subflow_sock *)sk;
> > > > }
> > > >
> > > > +struct subflow_request_sock {
> > > > + struct tcp_request_sock sk;
> > > > + u8 mp_capable : 1,
> > > > + mp_join : 1,
> > > > + checksum : 1,
> > > > + backup : 1,
> > > > + version : 4;
> > > > + u64 local_key;
> > > > + u64 remote_key;
> > > > +};
> > > > +
> > > > +static inline
> > > > +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> > > > +{
> > > > + return (struct subflow_request_sock *)rsk;
> > > > +}
> > > > +
> > > > #ifdef CONFIG_MPTCP
> > > >
> > > > void mptcp_parse_option(const unsigned char *ptr, int opsize,
> > > > @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> > > > void mptcp_rcv_synsent(struct sock *sk);
> > > > unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > u64 *remote_key);
> > > > +unsigned int mptcp_synack_options(struct request_sock *req,
> > > > + u64 *local_key, u64 *remote_key);
> > > >
> > > > void mptcp_finish_connect(struct sock *sk, int mp_capable);
> > > >
> > > > @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> > > > {
> > > > }
> > > >
> > > > +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> > > > + u64 *local_key,
> > > > + u64 *remote_key)
> > > > +{
> > > > + return 0;
> > > > +}
> > > > +
> > > > static inline unsigned int mptcp_established_options(struct sock *sk,
> > > > u64 *local_key,
> > > > u64 *remote_key)
> > > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > > index 254cf82e2ec6..1fc6362fa778 100644
> > > > --- a/include/net/tcp.h
> > > > +++ b/include/net/tcp.h
> > > > @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> > > > #define TCPOLEN_MSS_ALIGNED 4
> > > > #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> > > > #define TCPOLEN_MPTCP_MPC_SYN 12
> > > > +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> > > > #define TCPOLEN_MPTCP_MPC_ACK 20
> > > >
> > > > /* Flags in tp->nonagle */
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index eda515b141fb..00f7a3d88d66 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> > > >
> > > > tcp_rsk(req)->af_specific = af_ops;
> > > > tcp_rsk(req)->ts_off = 0;
> > > > + tcp_rsk(req)->is_mptcp = 0;
> > > >
> > > > tcp_clear_options(&tmp_opt);
> > > > tmp_opt.mss_clamp = af_ops->mss_clamp;
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index 4f284ed879ba..6f723cdb5c8e 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> > > >
> > > > /* MPTCP option subtypes */
> > > > #define OPTION_MPTCP_MPC_SYN (1 << 0)
> > > > +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> > > > #define OPTION_MPTCP_MPC_ACK (1 << 2)
> > > >
> > > > struct tcp_out_options {
> > > > @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > return;
> > > >
> > > > if ((OPTION_MPTCP_MPC_SYN |
> > > > + OPTION_MPTCP_MPC_SYNACK |
> > > > OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > u8 len;
> > > > __be64 key;
> > > >
> > > > if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> > > > len = TCPOLEN_MPTCP_MPC_SYN;
> > > > + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> > > > + len = TCPOLEN_MPTCP_MPC_SYNACK;
> > > > else
> > > > len = TCPOLEN_MPTCP_MPC_ACK;
> > > >
> > > > @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > > key = cpu_to_be64(opts->sndr_key);
> > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > ptr += 2;
> > > > - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> > > > + if ((OPTION_MPTCP_MPC_SYNACK |
> > > > + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > > key = cpu_to_be64(opts->rcvr_key);
> > > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > > ptr += 2;
> > > > @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> > > > remaining -= need;
> > > > }
> > > > }
> > > > + if (tcp_rsk(req)->is_mptcp) {
> > > > + u64 local_key;
> > > > + u64 remote_key;
> > > > + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> > > > + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> > > > + opts->options |= OPTION_MPTCP;
> > > > + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> > > > + opts->sndr_key = local_key;
> > > > + opts->rcvr_key = remote_key;
> > > > + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > > smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
> > > >
> > > > return MAX_TCP_OPTION_SPACE - remaining;
> > > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > > > index b0616f520da0..266a9f7fed0d 100644
> > > > --- a/net/mptcp/options.c
> > > > +++ b/net/mptcp/options.c
> > > > @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > > }
> > > > return 0;
> > > > }
> > > > +
> > > > +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> > > > + u64 *remote_key)
> > > > +{
> > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > +
> > > > + pr_debug("subflow_req=%p", subflow_req);
> > > > + if (subflow_req->mp_capable) {
> > > > + *local_key = subflow_req->local_key;
> > > > + *remote_key = subflow_req->remote_key;
> > > > + pr_debug("local_key=%llu", *local_key);
> > > > + pr_debug("remote_key=%llu", *remote_key);
> > > > + }
> > > > + return subflow_req->mp_capable;
> > > > +}
> > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > > index 1a3412a742ea..9f802f69a528 100644
> > > > --- a/net/mptcp/protocol.c
> > > > +++ b/net/mptcp/protocol.c
> > > > @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> > > > }
> > > > }
> > > >
> > > > +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > > > + bool kern)
> > > > +{
> > > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > > + struct socket *listener = msk->subflow;
> > > > + struct socket *new_sock;
> > > > + struct socket *mp;
> > > > + struct subflow_sock *subflow;
> > > > +
> > > > + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> > > > + *err = kernel_accept(listener, &new_sock, flags);
> > > > + if (*err < 0)
> > > > + return NULL;
> > > > +
> > > > + subflow = subflow_sk(new_sock->sk);
> > > > + pr_debug("new_sock=%p", subflow);
> > > > +
> > > > + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> > > > + if (*err < 0) {
> > > > + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> > > > + sock_release(new_sock);
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + msk = mptcp_sk(mp->sk);
> > > > + pr_debug("msk=%p", msk);
> > > > + subflow->conn = mp->sk;
> > > > +
> > > > + if (subflow->mp_capable) {
> > > > + msk->remote_key = subflow->remote_key;
> > > > + msk->local_key = subflow->local_key;
> > > > + msk->connection_list = new_sock;
> > > > + } else {
> > > > + msk->subflow = new_sock;
> > > > + }
> > > > +
> > > > + return mp->sk;
> > > > +}
> > > > +
> > > > static int mptcp_get_port(struct sock *sk, unsigned short snum)
> > > > {
> > > > struct mptcp_sock *msk = mptcp_sk(sk);
> > > > @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> > > > int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > > > {
> > > > struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > - struct socket *subflow = msk->subflow;
> > > > + int err;
> > > >
> > > > - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> > > > + pr_debug("msk=%p", msk);
> > > >
> > > > - return inet_bind(subflow, uaddr, addr_len);
> > > > + if (msk->subflow == NULL) {
> > > > + err = subflow_create(sock->sk);
> > > > + if (err)
> > > > + return err;
> > > > + }
> > > > + return inet_bind(msk->subflow, uaddr, addr_len);
> > > > }
> > > >
> > > > int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > > return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> > > > }
> > > >
> > > > +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> > > > +{
> > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > + struct socket *subflow;
> > > > + int err = -EPERM;
> > > > +
> > > > + if (msk->connection_list)
> > > > + subflow = msk->connection_list;
> > > > + else
> > > > + subflow = msk->subflow;
> > > > +
> > > > + err = inet_getname(subflow, uaddr, peer);
> > > > +
> > > > + return err;
> > > > +}
> > > > +
> > > > +int mptcp_stream_listen(struct socket *sock, int backlog)
> > > > +{
> > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > + int err;
> > > > +
> > > > + pr_debug("msk=%p", msk);
> > > > +
> > > > + if (msk->subflow == NULL) {
> > > > + err = subflow_create(sock->sk);
> > > > + if (err)
> > > > + return err;
> > > > + }
> > > > + return inet_listen(msk->subflow, backlog);
> > > > +}
> > > > +
> > > > +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> > > > + bool kern)
> > > > +{
> > > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > > +
> > > > + pr_debug("msk=%p", msk);
> > > > +
> > > > + if (msk->subflow == NULL) {
> > > > + return -EINVAL;
> > > > + }
> > > > + return inet_accept(sock, newsock, flags, kern);
> > > > +}
> > > > +
> > > > static struct proto mptcp_prot = {
> > > > .name = "MPTCP",
> > > > .owner = THIS_MODULE,
> > > > .init = mptcp_init_sock,
> > > > .close = mptcp_close,
> > > > - .accept = inet_csk_accept,
> > > > + .accept = mptcp_accept,
> > > > .shutdown = tcp_shutdown,
> > > > .sendmsg = mptcp_sendmsg,
> > > > .recvmsg = mptcp_recvmsg,
> > > > @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> > > > .bind = mptcp_stream_bind,
> > > > .connect = mptcp_stream_connect,
> > > > .socketpair = sock_no_socketpair,
> > > > - .accept = inet_accept,
> > > > - .getname = inet_getname,
> > > > + .accept = mptcp_stream_accept,
> > > > + .getname = mptcp_stream_getname,
> > > > .poll = tcp_poll,
> > > > .ioctl = inet_ioctl,
> > > > - .listen = inet_listen,
> > > > + .listen = mptcp_stream_listen,
> > > > .shutdown = inet_shutdown,
> > > > .setsockopt = sock_common_setsockopt,
> > > > .getsockopt = sock_common_getsockopt,
> > > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > > index 5e5fdcb3175f..89fcc3b746eb 100644
> > > > --- a/net/mptcp/subflow.c
> > > > +++ b/net/mptcp/subflow.c
> > > > @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> > > > return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > > > }
> > > >
> > > > +static void subflow_v4_init_req(struct request_sock *req,
> > > > + const struct sock *sk_listener,
> > > > + struct sk_buff *skb)
> > > > +{
> > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > + struct subflow_sock *listener = subflow_sk(sk_listener);
> > > > + struct tcp_options_received rx_opt;
> > > > +
> > > > + tcp_rsk(req)->is_mptcp = 1;
> > > > + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> > > > +
> > > > + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> > > > +
> > > > + rx_opt.mptcp.flags = 0;
> > > > + rx_opt.mptcp.mp_capable = 0;
> > > > + rx_opt.mptcp.mp_join = 0;
> > > > + rx_opt.mptcp.dss = 0;
> > > > + mptcp_get_options(skb, &rx_opt);
> > > > +
> > > > + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> > > > + subflow_req->mp_capable = 1;
> > > > + if (rx_opt.mptcp.version >= listener->version)
> > > > + subflow_req->version = listener->version;
> > > > + else
> > > > + subflow_req->version = rx_opt.mptcp.version;
> > > > + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> > > > + listener->checksum)
> > > > + subflow_req->checksum = 1;
> > > > + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> > > > + } else {
> > > > + subflow_req->mp_capable = 0;
> > > > + }
> > > > +}
> > > > +
> > > > static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > {
> > > > struct subflow_sock *subflow = subflow_sk(sk);
> > > > @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > > }
> > > > }
> > > >
> > > > +static struct request_sock_ops subflow_request_sock_ops;
> > > > +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> > > > +
> > > > +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> > > > +{
> > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > +
> > > > + pr_debug("subflow=%p", subflow);
> > > > +
> > > > + /* Never answer to SYNs sent to broadcast or multicast */
> > > > + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> > > > + goto drop;
> > > > +
> > > > + return tcp_conn_request(&subflow_request_sock_ops,
> > > > + &subflow_request_sock_ipv4_ops,
> > > > + sk, skb);
> > > > +drop:
> > > > + tcp_listendrop(sk);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> > > > + struct sk_buff *skb,
> > > > + struct request_sock *req,
> > > > + struct dst_entry *dst,
> > > > + struct request_sock *req_unhash,
> > > > + bool *own_req)
> > > > +{
> > > > + struct subflow_sock *listener = subflow_sk(sk);
> > > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > > + struct sock *child;
> > > > +
> > > > + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> > > > +
> > > > + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> > > > +
> > > > + if (child) {
> > > > + struct subflow_sock *subflow = subflow_sk(child);
> > > > +
> > > > + pr_debug("child=%p", child);
> > > > + if (subflow_req->mp_capable) {
> > > > + subflow->mp_capable = 1;
> > > > + subflow->fourth_ack = 1;
> > > > + subflow->remote_key = subflow_req->remote_key;
> > > > + subflow->local_key = subflow_req->local_key;
> > > > + } else {
> > > > + subflow->mp_capable = 0;
> > > > + }
> > > > + }
> > > > +
> > > > + return child;
> > > > +}
> > > > +
> > > > const struct inet_connection_sock_af_ops subflow_specific = {
> > > > .queue_xmit = ip_queue_xmit,
> > > > .send_check = tcp_v4_send_check,
> > > > .rebuild_header = inet_sk_rebuild_header,
> > > > .sk_rx_dst_set = subflow_finish_connect,
> > > > - .conn_request = tcp_v4_conn_request,
> > > > - .syn_recv_sock = tcp_v4_syn_recv_sock,
> > > > + .conn_request = subflow_conn_request,
> > > > + .syn_recv_sock = subflow_syn_recv_sock,
> > > > .net_header_len = sizeof(struct iphdr),
> > > > .setsockopt = ip_setsockopt,
> > > > .getsockopt = ip_getsockopt,
> > > > @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> > > > tcp_close(sk, timeout);
> > > > }
> > > >
> > > > +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> > > > + bool kern)
> > > > +{
> > > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > > + struct sock *child;
> > > > +
> > > > + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> > > > +
> > > > + child = inet_csk_accept(sk, flags, err, kern);
> > > > +
> > > > + pr_debug("child=%p", child);
> > > > +
> > > > + return child;
> > > > +}
> > > > +
> > > > static void subflow_destroy(struct sock *sk)
> > > > {
> > > > pr_debug("subflow=%p", sk);
> > > > @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> > > > .close = subflow_close,
> > > > .connect = subflow_connect,
> > > > .disconnect = tcp_disconnect,
> > > > - .accept = inet_csk_accept,
> > > > + .accept = subflow_accept,
> > > > .ioctl = tcp_ioctl,
> > > > .init = subflow_init_sock,
> > > > .destroy = subflow_destroy,
> > > > @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
> > > >
> > > > /* TODO: Register path manager callbacks. */
> > > >
> > > > + subflow_request_sock_ops = tcp_request_sock_ops;
> > > > + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock),
> > > > +
> > > > + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> > > > + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> > > > +
> > > > subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> > > > + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> > > > subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> > > > err = proto_register(&subflow_prot, 1);
> > > > if (err)
> > > > --
> > > > 2.19.1
> > > >
> > > > _______________________________________________
> > > > mptcp mailing list
> > > > mptcp(a)lists.01.org
> > > > https://lists.01.org/mailman/listinfo/mptcp
> > >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 21:07 cpaasch
0 siblings, 0 replies; 7+ messages in thread
From: cpaasch @ 2018-12-12 21:07 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 21066 bytes --]
On 12/12/18 - 19:25:09, Krystad, Peter wrote:
> On Tue, 2018-12-11 at 22:08 -0800, Christoph Paasch wrote:
> > Hello,
> >
> > On 30/11/18 - 12:11:03, Mat Martineau wrote:
> > > From: Peter Krystad <peter.krystad(a)intel.com>
> > >
> > > Add subflow_request_sock type that extends tcp_request_sock
> > > and add an is_mptcp flag to tcp_request_sock distinguish them.
> > >
> > > Override the listen() and accept() methods of the MPTCP
> > > socket proto_ops so they may act on the subflow socket.
> > >
> > > Override the conn_request() and syn_recv_sock() handlers
> > > in the inet_connection_sock to handle incoming MPTCP
> > > SYNs and the ACK to the response SYN.
> >
> > I'm having quite a hard time to understand how it works. Can you give some
> > more details?
> >
> > Because, the difficult part about MPTCP is that incoming subflows are no
> > more matching on a listener but rather on a "established" MPTCP-socket based
> > on the token that is present in the TCP-options.
> > And, I don't see how this is being taken care of here.
> >
> > Is the expectation that the app will call "listen()" and "accept()" on the
> > MPTCP-socket ?
>
> Yes, the application will call listen() and accept() with the socket it
> got when it called socket(..., IPPROTO_MPTCP), the normal call sequence
> for a server application is preserved. In the kernel this socket is
> represented by struct mptcp_sock.
>
> Key generation and token-tracking data structure is added in the next
> patch, #9.
>
> How this works is that underneath the MPTCP socket is a subflow socket
> that is a struct subflow_sock. This is an extended tcp_sock structure
> with subflow-specific fields. mptcp_stream_listen() and
> mptcp_stream_accept() routines in this patch call inet_listen() and
> inet_accept() on the subflow socket just like it were a listening TCP
> socket.
>
> When an initial MPTCP connection completes on the subflow_socket
> mptcp_accept() creates a new mptcp_sock, attaches the child tcp_sock
> (returned by kernel_accept()) to it and this new MPTCP socket is
> returned to the application. In patch #9 you can see the token and new
> mptcp_sock are stored in the token tree, so we can find it when
> additional subflows are created.

When these additional subflows come in, their SYNs would normally match
the default listener. But they should match the MPTCP socket that
belongs to the token inside the MP_JOIN option.

In other words, my question is: where is rx_opt.mp_join set to 1? I can't
seem to find it.
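
As a rough user-space illustration of the token-based matching described
above (this is not code from the series; struct conn, token_insert(),
token_lookup() and the fixed-size table are hypothetical stand-ins for
an established mptcp_sock and the token tracking that patch #9 adds):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN_SLOTS 16	/* hypothetical table size */

struct conn {		/* stands in for an established mptcp_sock */
	uint32_t token;
	int in_use;
};

static struct conn token_table[TOKEN_SLOTS];

/* Register an established connection under its token. */
static struct conn *token_insert(uint32_t token)
{
	for (size_t i = 0; i < TOKEN_SLOTS; i++) {
		if (!token_table[i].in_use) {
			token_table[i].token = token;
			token_table[i].in_use = 1;
			return &token_table[i];
		}
	}
	return NULL;	/* table full */
}

/* MP_JOIN SYN path: find the connection by token, not by listener. */
static struct conn *token_lookup(uint32_t token)
{
	for (size_t i = 0; i < TOKEN_SLOTS; i++)
		if (token_table[i].in_use && token_table[i].token == token)
			return &token_table[i];
	return NULL;	/* unknown token: no connection to join */
}
```

The point of the sketch is only that the lookup key for a joining SYN is
the token carried in the option, so the result is an existing connection
rather than the listener that the 4-tuple would select.
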
Christoph
>
> Let me know if you have more questions.
>
> Peter.
>
>
> > Thanks,
> > Christoph
> >
> > >
> > > Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> > > SYN-ACK response for a subflow_request_sock.
> > >
> > > Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> > > ---
> > > include/linux/tcp.h | 1 +
> > > include/net/mptcp.h | 26 ++++++++++
> > > include/net/tcp.h | 1 +
> > > net/ipv4/tcp_input.c | 1 +
> > > net/ipv4/tcp_output.c | 21 +++++++-
> > > net/mptcp/options.c | 15 ++++++
> > > net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> > > net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> > > 8 files changed, 271 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > > index 2622817ecd6b..b54ab3b5546a 100644
> > > --- a/include/linux/tcp.h
> > > +++ b/include/linux/tcp.h
> > > @@ -148,6 +148,7 @@ struct tcp_request_sock {
> > > * FastOpen it's the seq#
> > > * after data-in-SYN.
> > > */
> > > + bool is_mptcp;
> > > };
> > >
> > > static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> > > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > > index a5c2baeb688f..ced33f1c529e 100644
> > > --- a/include/net/mptcp.h
> > > +++ b/include/net/mptcp.h
> > > @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> > > return (struct subflow_sock *)sk;
> > > }
> > >
> > > +struct subflow_request_sock {
> > > + struct tcp_request_sock sk;
> > > + u8 mp_capable : 1,
> > > + mp_join : 1,
> > > + checksum : 1,
> > > + backup : 1,
> > > + version : 4;
> > > + u64 local_key;
> > > + u64 remote_key;
> > > +};
> > > +
> > > +static inline
> > > +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> > > +{
> > > + return (struct subflow_request_sock *)rsk;
> > > +}
> > > +
> > > #ifdef CONFIG_MPTCP
> > >
> > > void mptcp_parse_option(const unsigned char *ptr, int opsize,
> > > @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> > > void mptcp_rcv_synsent(struct sock *sk);
> > > unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > u64 *remote_key);
> > > +unsigned int mptcp_synack_options(struct request_sock *req,
> > > + u64 *local_key, u64 *remote_key);
> > >
> > > void mptcp_finish_connect(struct sock *sk, int mp_capable);
> > >
> > > @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> > > {
> > > }
> > >
> > > +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> > > + u64 *local_key,
> > > + u64 *remote_key)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > > static inline unsigned int mptcp_established_options(struct sock *sk,
> > > u64 *local_key,
> > > u64 *remote_key)
> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > index 254cf82e2ec6..1fc6362fa778 100644
> > > --- a/include/net/tcp.h
> > > +++ b/include/net/tcp.h
> > > @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> > > #define TCPOLEN_MSS_ALIGNED 4
> > > #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> > > #define TCPOLEN_MPTCP_MPC_SYN 12
> > > +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> > > #define TCPOLEN_MPTCP_MPC_ACK 20
> > >
> > > /* Flags in tp->nonagle */
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index eda515b141fb..00f7a3d88d66 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> > >
> > > tcp_rsk(req)->af_specific = af_ops;
> > > tcp_rsk(req)->ts_off = 0;
> > > + tcp_rsk(req)->is_mptcp = 0;
> > >
> > > tcp_clear_options(&tmp_opt);
> > > tmp_opt.mss_clamp = af_ops->mss_clamp;
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 4f284ed879ba..6f723cdb5c8e 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> > >
> > > /* MPTCP option subtypes */
> > > #define OPTION_MPTCP_MPC_SYN (1 << 0)
> > > +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> > > #define OPTION_MPTCP_MPC_ACK (1 << 2)
> > >
> > > struct tcp_out_options {
> > > @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > return;
> > >
> > > if ((OPTION_MPTCP_MPC_SYN |
> > > + OPTION_MPTCP_MPC_SYNACK |
> > > OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > u8 len;
> > > __be64 key;
> > >
> > > if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> > > len = TCPOLEN_MPTCP_MPC_SYN;
> > > + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> > > + len = TCPOLEN_MPTCP_MPC_SYNACK;
> > > else
> > > len = TCPOLEN_MPTCP_MPC_ACK;
> > >
> > > @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > > key = cpu_to_be64(opts->sndr_key);
> > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > ptr += 2;
> > > - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> > > + if ((OPTION_MPTCP_MPC_SYNACK |
> > > + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > > key = cpu_to_be64(opts->rcvr_key);
> > > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > > ptr += 2;
> > > @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> > > remaining -= need;
> > > }
> > > }
> > > + if (tcp_rsk(req)->is_mptcp) {
> > > + u64 local_key;
> > > + u64 remote_key;
> > > + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> > > + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> > > + opts->options |= OPTION_MPTCP;
> > > + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> > > + opts->sndr_key = local_key;
> > > + opts->rcvr_key = remote_key;
> > > + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> > > + }
> > > + }
> > > + }
> > > +
> > > smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
> > >
> > > return MAX_TCP_OPTION_SPACE - remaining;
> > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > > index b0616f520da0..266a9f7fed0d 100644
> > > --- a/net/mptcp/options.c
> > > +++ b/net/mptcp/options.c
> > > @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > > }
> > > return 0;
> > > }
> > > +
> > > +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> > > + u64 *remote_key)
> > > +{
> > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > +
> > > + pr_debug("subflow_req=%p", subflow_req);
> > > + if (subflow_req->mp_capable) {
> > > + *local_key = subflow_req->local_key;
> > > + *remote_key = subflow_req->remote_key;
> > > + pr_debug("local_key=%llu", *local_key);
> > > + pr_debug("remote_key=%llu", *remote_key);
> > > + }
> > > + return subflow_req->mp_capable;
> > > +}
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index 1a3412a742ea..9f802f69a528 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> > > }
> > > }
> > >
> > > +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > > + bool kern)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sk);
> > > + struct socket *listener = msk->subflow;
> > > + struct socket *new_sock;
> > > + struct socket *mp;
> > > + struct subflow_sock *subflow;
> > > +
> > > + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> > > + *err = kernel_accept(listener, &new_sock, flags);
> > > + if (*err < 0)
> > > + return NULL;
> > > +
> > > + subflow = subflow_sk(new_sock->sk);
> > > + pr_debug("new_sock=%p", subflow);
> > > +
> > > + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> > > + if (*err < 0) {
> > > + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> > > + sock_release(new_sock);
> > > + return NULL;
> > > + }
> > > +
> > > + msk = mptcp_sk(mp->sk);
> > > + pr_debug("msk=%p", msk);
> > > + subflow->conn = mp->sk;
> > > +
> > > + if (subflow->mp_capable) {
> > > + msk->remote_key = subflow->remote_key;
> > > + msk->local_key = subflow->local_key;
> > > + msk->connection_list = new_sock;
> > > + } else {
> > > + msk->subflow = new_sock;
> > > + }
> > > +
> > > + return mp->sk;
> > > +}
> > > +
> > > static int mptcp_get_port(struct sock *sk, unsigned short snum)
> > > {
> > > struct mptcp_sock *msk = mptcp_sk(sk);
> > > @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> > > int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > > {
> > > struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > - struct socket *subflow = msk->subflow;
> > > + int err;
> > >
> > > - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> > > + pr_debug("msk=%p", msk);
> > >
> > > - return inet_bind(subflow, uaddr, addr_len);
> > > + if (msk->subflow == NULL) {
> > > + err = subflow_create(sock->sk);
> > > + if (err)
> > > + return err;
> > > + }
> > > + return inet_bind(msk->subflow, uaddr, addr_len);
> > > }
> > >
> > > int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > > return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> > > }
> > >
> > > +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > + struct socket *subflow;
> > > + int err = -EPERM;
> > > +
> > > + if (msk->connection_list)
> > > + subflow = msk->connection_list;
> > > + else
> > > + subflow = msk->subflow;
> > > +
> > > + err = inet_getname(subflow, uaddr, peer);
> > > +
> > > + return err;
> > > +}
> > > +
> > > +int mptcp_stream_listen(struct socket *sock, int backlog)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > + int err;
> > > +
> > > + pr_debug("msk=%p", msk);
> > > +
> > > + if (msk->subflow == NULL) {
> > > + err = subflow_create(sock->sk);
> > > + if (err)
> > > + return err;
> > > + }
> > > + return inet_listen(msk->subflow, backlog);
> > > +}
> > > +
> > > +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> > > + bool kern)
> > > +{
> > > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > > +
> > > + pr_debug("msk=%p", msk);
> > > +
> > > + if (msk->subflow == NULL) {
> > > + return -EINVAL;
> > > + }
> > > + return inet_accept(sock, newsock, flags, kern);
> > > +}
> > > +
> > > static struct proto mptcp_prot = {
> > > .name = "MPTCP",
> > > .owner = THIS_MODULE,
> > > .init = mptcp_init_sock,
> > > .close = mptcp_close,
> > > - .accept = inet_csk_accept,
> > > + .accept = mptcp_accept,
> > > .shutdown = tcp_shutdown,
> > > .sendmsg = mptcp_sendmsg,
> > > .recvmsg = mptcp_recvmsg,
> > > @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> > > .bind = mptcp_stream_bind,
> > > .connect = mptcp_stream_connect,
> > > .socketpair = sock_no_socketpair,
> > > - .accept = inet_accept,
> > > - .getname = inet_getname,
> > > + .accept = mptcp_stream_accept,
> > > + .getname = mptcp_stream_getname,
> > > .poll = tcp_poll,
> > > .ioctl = inet_ioctl,
> > > - .listen = inet_listen,
> > > + .listen = mptcp_stream_listen,
> > > .shutdown = inet_shutdown,
> > > .setsockopt = sock_common_setsockopt,
> > > .getsockopt = sock_common_getsockopt,
> > > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > > index 5e5fdcb3175f..89fcc3b746eb 100644
> > > --- a/net/mptcp/subflow.c
> > > +++ b/net/mptcp/subflow.c
> > > @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> > > return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > > }
> > >
> > > +static void subflow_v4_init_req(struct request_sock *req,
> > > + const struct sock *sk_listener,
> > > + struct sk_buff *skb)
> > > +{
> > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > + struct subflow_sock *listener = subflow_sk(sk_listener);
> > > + struct tcp_options_received rx_opt;
> > > +
> > > + tcp_rsk(req)->is_mptcp = 1;
> > > + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> > > +
> > > + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> > > +
> > > + rx_opt.mptcp.flags = 0;
> > > + rx_opt.mptcp.mp_capable = 0;
> > > + rx_opt.mptcp.mp_join = 0;
> > > + rx_opt.mptcp.dss = 0;
> > > + mptcp_get_options(skb, &rx_opt);
> > > +
> > > + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> > > + subflow_req->mp_capable = 1;
> > > + if (rx_opt.mptcp.version >= listener->version)
> > > + subflow_req->version = listener->version;
> > > + else
> > > + subflow_req->version = rx_opt.mptcp.version;
> > > + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> > > + listener->checksum)
> > > + subflow_req->checksum = 1;
> > > + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> > > + } else {
> > > + subflow_req->mp_capable = 0;
> > > + }
> > > +}
> > > +
> > > static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > {
> > > struct subflow_sock *subflow = subflow_sk(sk);
> > > @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > > }
> > > }
> > >
> > > +static struct request_sock_ops subflow_request_sock_ops;
> > > +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> > > +
> > > +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> > > +{
> > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > +
> > > + pr_debug("subflow=%p", subflow);
> > > +
> > > + /* Never answer to SYNs sent to broadcast or multicast */
> > > + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> > > + goto drop;
> > > +
> > > + return tcp_conn_request(&subflow_request_sock_ops,
> > > + &subflow_request_sock_ipv4_ops,
> > > + sk, skb);
> > > +drop:
> > > + tcp_listendrop(sk);
> > > + return 0;
> > > +}
> > > +
> > > +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> > > + struct sk_buff *skb,
> > > + struct request_sock *req,
> > > + struct dst_entry *dst,
> > > + struct request_sock *req_unhash,
> > > + bool *own_req)
> > > +{
> > > + struct subflow_sock *listener = subflow_sk(sk);
> > > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > > + struct sock *child;
> > > +
> > > + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> > > +
> > > + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> > > +
> > > + if (child) {
> > > + struct subflow_sock *subflow = subflow_sk(child);
> > > +
> > > + pr_debug("child=%p", child);
> > > + if (subflow_req->mp_capable) {
> > > + subflow->mp_capable = 1;
> > > + subflow->fourth_ack = 1;
> > > + subflow->remote_key = subflow_req->remote_key;
> > > + subflow->local_key = subflow_req->local_key;
> > > + } else {
> > > + subflow->mp_capable = 0;
> > > + }
> > > + }
> > > +
> > > + return child;
> > > +}
> > > +
> > > const struct inet_connection_sock_af_ops subflow_specific = {
> > > .queue_xmit = ip_queue_xmit,
> > > .send_check = tcp_v4_send_check,
> > > .rebuild_header = inet_sk_rebuild_header,
> > > .sk_rx_dst_set = subflow_finish_connect,
> > > - .conn_request = tcp_v4_conn_request,
> > > - .syn_recv_sock = tcp_v4_syn_recv_sock,
> > > + .conn_request = subflow_conn_request,
> > > + .syn_recv_sock = subflow_syn_recv_sock,
> > > .net_header_len = sizeof(struct iphdr),
> > > .setsockopt = ip_setsockopt,
> > > .getsockopt = ip_getsockopt,
> > > @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> > > tcp_close(sk, timeout);
> > > }
> > >
> > > +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> > > + bool kern)
> > > +{
> > > + struct subflow_sock *subflow = subflow_sk(sk);
> > > + struct sock *child;
> > > +
> > > + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> > > +
> > > + child = inet_csk_accept(sk, flags, err, kern);
> > > +
> > > + pr_debug("child=%p", child);
> > > +
> > > + return child;
> > > +}
> > > +
> > > static void subflow_destroy(struct sock *sk)
> > > {
> > > pr_debug("subflow=%p", sk);
> > > @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> > > .close = subflow_close,
> > > .connect = subflow_connect,
> > > .disconnect = tcp_disconnect,
> > > - .accept = inet_csk_accept,
> > > + .accept = subflow_accept,
> > > .ioctl = tcp_ioctl,
> > > .init = subflow_init_sock,
> > > .destroy = subflow_destroy,
> > > @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
> > >
> > > /* TODO: Register path manager callbacks. */
> > >
> > > + subflow_request_sock_ops = tcp_request_sock_ops;
> > > + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock),
> > > +
> > > + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> > > + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> > > +
> > > subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> > > + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> > > subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> > > err = proto_register(&subflow_prot, 1);
> > > if (err)
> > > --
> > > 2.19.1
> > >
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 19:25 Krystad, Peter
0 siblings, 0 replies; 7+ messages in thread
From: Krystad, Peter @ 2018-12-12 19:25 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 19586 bytes --]
On Tue, 2018-12-11 at 22:08 -0800, Christoph Paasch wrote:
> Hello,
>
> On 30/11/18 - 12:11:03, Mat Martineau wrote:
> > From: Peter Krystad <peter.krystad(a)intel.com>
> >
> > Add subflow_request_sock type that extends tcp_request_sock
> > and add an is_mptcp flag to tcp_request_sock distinguish them.
> >
> > Override the listen() and accept() methods of the MPTCP
> > socket proto_ops so they may act on the subflow socket.
> >
> > Override the conn_request() and syn_recv_sock() handlers
> > in the inet_connection_sock to handle incoming MPTCP
> > SYNs and the ACK to the response SYN.
>
> I'm having quite a hard time to understand how it works. Can you give some
> more details?
>
> Because, the difficult part about MPTCP is that incoming subflows are no
> more matching on a listener but rather on a "established" MPTCP-socket based
> on the token that is present in the TCP-options.
> And, I don't see how this is being taken care of here.
>
> Is the expectation that the app will call "listen()" and "accept()" on the
> MPTCP-socket ?

Yes. The application calls listen() and accept() on the socket it got
from socket(..., IPPROTO_MPTCP), so the normal call sequence for a
server application is preserved. In the kernel this socket is
represented by struct mptcp_sock.
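
A minimal user-space sketch of that call sequence, under some
assumptions: open_mptcp_listener() is a hypothetical helper, the
IPPROTO_MPTCP value of 262 is the one used by upstream Linux and may
differ from the value in this RFC series, and the fallback to plain TCP
is only there so the sketch also runs on kernels without MPTCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: upstream Linux value; this RFC series may use another. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical helper: returns a listening fd, or -1 on error. */
static int open_mptcp_listener(unsigned short port, int backlog)
{
	struct sockaddr_in addr;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
	if (fd < 0)	/* kernel without MPTCP: fall back to plain TCP */
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	/* Same bind()/listen() sequence as for a TCP server. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, backlog) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* accept(fd, NULL, NULL) is then used as usual */
}
```

Apart from the protocol argument to socket(), nothing in the sequence is
specific to MPTCP, which is the property being described here.
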

Key generation and the token-tracking data structure are added in the
next patch, #9.

Underneath the MPTCP socket is a subflow socket, represented by
struct subflow_sock. This is an extended tcp_sock structure with
subflow-specific fields. The mptcp_stream_listen() and
mptcp_stream_accept() routines in this patch call inet_listen() and
inet_accept() on the subflow socket just as if it were a listening TCP
socket.
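
The delegation can be modelled in user space roughly as follows. This is
a simplified sketch, not the kernel code: the mock_* types and functions
are hypothetical, and the comments mark where the real routines from the
patch would be called.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mock_subflow {		/* stands in for the subflow socket */
	int listening;
	int backlog;
};

struct mock_msk {		/* stands in for struct mptcp_sock */
	struct mock_subflow *subflow;
};

/* Mirrors the on-demand subflow_create() call in the patch. */
static int mock_subflow_create(struct mock_msk *msk)
{
	msk->subflow = calloc(1, sizeof(*msk->subflow));
	return msk->subflow ? 0 : -ENOMEM;
}

/* listen(): create the subflow if needed, then listen on it. */
static int mock_stream_listen(struct mock_msk *msk, int backlog)
{
	if (!msk->subflow) {
		int err = mock_subflow_create(msk);

		if (err)
			return err;
	}
	msk->subflow->listening = 1;	/* stands in for inet_listen() */
	msk->subflow->backlog = backlog;
	return 0;
}

/* accept(): refuse when there is no subflow to accept on. */
static int mock_stream_accept(const struct mock_msk *msk)
{
	if (!msk->subflow)
		return -EINVAL;
	/* Stands in for the accept path, which in the patch ends in
	 * kernel_accept() on the subflow socket.
	 */
	return 0;
}
```
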

When an initial MPTCP connection completes on the subflow socket,
mptcp_accept() creates a new mptcp_sock and attaches the child tcp_sock
(returned by kernel_accept()) to it. This new MPTCP socket is returned
to the application. In patch #9 you can see that the token and the new
mptcp_sock are stored in the token tree, so we can find the mptcp_sock
when additional subflows are created.
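
The attach step at the end of mptcp_accept() can be sketched as a small
user-space model. The model_* names are hypothetical; the field names
follow the patch, and socket creation and error handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_msk;

struct model_subflow {		/* stands in for the accepted child */
	int mp_capable;
	uint64_t local_key;
	uint64_t remote_key;
	struct model_msk *conn;
};

struct model_msk {		/* stands in for the new mptcp_sock */
	uint64_t local_key;
	uint64_t remote_key;
	struct model_subflow *connection_list;
	struct model_subflow *subflow;
};

/* Attach an accepted child to a freshly created MPTCP socket. */
static void model_attach_child(struct model_msk *msk,
			       struct model_subflow *child)
{
	child->conn = msk;
	if (child->mp_capable) {
		/* MP_CAPABLE was negotiated: copy the keys over. */
		msk->remote_key = child->remote_key;
		msk->local_key = child->local_key;
		msk->connection_list = child;
	} else {
		/* No MP_CAPABLE: the child is kept as the only subflow. */
		msk->subflow = child;
	}
}
```
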
Let me know if you have more questions.
Peter.
> Thanks,
> Christoph
>
> >
> > Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> > SYN-ACK response for a subflow_request_sock.
> >
> > Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> > ---
> > include/linux/tcp.h | 1 +
> > include/net/mptcp.h | 26 ++++++++++
> > include/net/tcp.h | 1 +
> > net/ipv4/tcp_input.c | 1 +
> > net/ipv4/tcp_output.c | 21 +++++++-
> > net/mptcp/options.c | 15 ++++++
> > net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> > net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> > 8 files changed, 271 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index 2622817ecd6b..b54ab3b5546a 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -148,6 +148,7 @@ struct tcp_request_sock {
> > * FastOpen it's the seq#
> > * after data-in-SYN.
> > */
> > + bool is_mptcp;
> > };
> >
> > static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > index a5c2baeb688f..ced33f1c529e 100644
> > --- a/include/net/mptcp.h
> > +++ b/include/net/mptcp.h
> > @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> > return (struct subflow_sock *)sk;
> > }
> >
> > +struct subflow_request_sock {
> > + struct tcp_request_sock sk;
> > + u8 mp_capable : 1,
> > + mp_join : 1,
> > + checksum : 1,
> > + backup : 1,
> > + version : 4;
> > + u64 local_key;
> > + u64 remote_key;
> > +};
> > +
> > +static inline
> > +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> > +{
> > + return (struct subflow_request_sock *)rsk;
> > +}
> > +
> > #ifdef CONFIG_MPTCP
> >
> > void mptcp_parse_option(const unsigned char *ptr, int opsize,
> > @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> > void mptcp_rcv_synsent(struct sock *sk);
> > unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > u64 *remote_key);
> > +unsigned int mptcp_synack_options(struct request_sock *req,
> > + u64 *local_key, u64 *remote_key);
> >
> > void mptcp_finish_connect(struct sock *sk, int mp_capable);
> >
> > @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> > {
> > }
> >
> > +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> > + u64 *local_key,
> > + u64 *remote_key)
> > +{
> > + return 0;
> > +}
> > +
> > static inline unsigned int mptcp_established_options(struct sock *sk,
> > u64 *local_key,
> > u64 *remote_key)
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 254cf82e2ec6..1fc6362fa778 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> > #define TCPOLEN_MSS_ALIGNED 4
> > #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> > #define TCPOLEN_MPTCP_MPC_SYN 12
> > +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> > #define TCPOLEN_MPTCP_MPC_ACK 20
> >
> > /* Flags in tp->nonagle */
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index eda515b141fb..00f7a3d88d66 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> >
> > tcp_rsk(req)->af_specific = af_ops;
> > tcp_rsk(req)->ts_off = 0;
> > + tcp_rsk(req)->is_mptcp = 0;
> >
> > tcp_clear_options(&tmp_opt);
> > tmp_opt.mss_clamp = af_ops->mss_clamp;
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 4f284ed879ba..6f723cdb5c8e 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> >
> > /* MPTCP option subtypes */
> > #define OPTION_MPTCP_MPC_SYN (1 << 0)
> > +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> > #define OPTION_MPTCP_MPC_ACK (1 << 2)
> >
> > struct tcp_out_options {
> > @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > return;
> >
> > if ((OPTION_MPTCP_MPC_SYN |
> > + OPTION_MPTCP_MPC_SYNACK |
> > OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > u8 len;
> > __be64 key;
> >
> > if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> > len = TCPOLEN_MPTCP_MPC_SYN;
> > + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> > + len = TCPOLEN_MPTCP_MPC_SYNACK;
> > else
> > len = TCPOLEN_MPTCP_MPC_ACK;
> >
> > @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> > key = cpu_to_be64(opts->sndr_key);
> > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > ptr += 2;
> > - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> > + if ((OPTION_MPTCP_MPC_SYNACK |
> > + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> > key = cpu_to_be64(opts->rcvr_key);
> > memcpy((u8 *) ptr, (u8 *) &key, 8);
> > ptr += 2;
> > @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> > remaining -= need;
> > }
> > }
> > + if (tcp_rsk(req)->is_mptcp) {
> > + u64 local_key;
> > + u64 remote_key;
> > + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> > + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> > + opts->options |= OPTION_MPTCP;
> > + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> > + opts->sndr_key = local_key;
> > + opts->rcvr_key = remote_key;
> > + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> > + }
> > + }
> > + }
> > +
> > smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
> >
> > return MAX_TCP_OPTION_SPACE - remaining;
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index b0616f520da0..266a9f7fed0d 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> > @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> > }
> > return 0;
> > }
> > +
> > +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> > + u64 *remote_key)
> > +{
> > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > +
> > + pr_debug("subflow_req=%p", subflow_req);
> > + if (subflow_req->mp_capable) {
> > + *local_key = subflow_req->local_key;
> > + *remote_key = subflow_req->remote_key;
> > + pr_debug("local_key=%llu", *local_key);
> > + pr_debug("remote_key=%llu", *remote_key);
> > + }
> > + return subflow_req->mp_capable;
> > +}
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 1a3412a742ea..9f802f69a528 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> > }
> > }
> >
> > +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> > + bool kern)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sk);
> > + struct socket *listener = msk->subflow;
> > + struct socket *new_sock;
> > + struct socket *mp;
> > + struct subflow_sock *subflow;
> > +
> > + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> > + *err = kernel_accept(listener, &new_sock, flags);
> > + if (*err < 0)
> > + return NULL;
> > +
> > + subflow = subflow_sk(new_sock->sk);
> > + pr_debug("new_sock=%p", subflow);
> > +
> > + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> > + if (*err < 0) {
> > + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> > + sock_release(new_sock);
> > + return NULL;
> > + }
> > +
> > + msk = mptcp_sk(mp->sk);
> > + pr_debug("msk=%p", msk);
> > + subflow->conn = mp->sk;
> > +
> > + if (subflow->mp_capable) {
> > + msk->remote_key = subflow->remote_key;
> > + msk->local_key = subflow->local_key;
> > + msk->connection_list = new_sock;
> > + } else {
> > + msk->subflow = new_sock;
> > + }
> > +
> > + return mp->sk;
> > +}
> > +
> > static int mptcp_get_port(struct sock *sk, unsigned short snum)
> > {
> > struct mptcp_sock *msk = mptcp_sk(sk);
> > @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> > int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > {
> > struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > - struct socket *subflow = msk->subflow;
> > + int err;
> >
> > - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> > + pr_debug("msk=%p", msk);
> >
> > - return inet_bind(subflow, uaddr, addr_len);
> > + if (msk->subflow == NULL) {
> > + err = subflow_create(sock->sk);
> > + if (err)
> > + return err;
> > + }
> > + return inet_bind(msk->subflow, uaddr, addr_len);
> > }
> >
> > int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> > return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> > }
> >
> > +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > + struct socket *subflow;
> > + int err = -EPERM;
> > +
> > + if (msk->connection_list)
> > + subflow = msk->connection_list;
> > + else
> > + subflow = msk->subflow;
> > +
> > + err = inet_getname(subflow, uaddr, peer);
> > +
> > + return err;
> > +}
> > +
> > +int mptcp_stream_listen(struct socket *sock, int backlog)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > + int err;
> > +
> > + pr_debug("msk=%p", msk);
> > +
> > + if (msk->subflow == NULL) {
> > + err = subflow_create(sock->sk);
> > + if (err)
> > + return err;
> > + }
> > + return inet_listen(msk->subflow, backlog);
> > +}
> > +
> > +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> > + bool kern)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> > +
> > + pr_debug("msk=%p", msk);
> > +
> > + if (msk->subflow == NULL) {
> > + return -EINVAL;
> > + }
> > + return inet_accept(sock, newsock, flags, kern);
> > +}
> > +
> > static struct proto mptcp_prot = {
> > .name = "MPTCP",
> > .owner = THIS_MODULE,
> > .init = mptcp_init_sock,
> > .close = mptcp_close,
> > - .accept = inet_csk_accept,
> > + .accept = mptcp_accept,
> > .shutdown = tcp_shutdown,
> > .sendmsg = mptcp_sendmsg,
> > .recvmsg = mptcp_recvmsg,
> > @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> > .bind = mptcp_stream_bind,
> > .connect = mptcp_stream_connect,
> > .socketpair = sock_no_socketpair,
> > - .accept = inet_accept,
> > - .getname = inet_getname,
> > + .accept = mptcp_stream_accept,
> > + .getname = mptcp_stream_getname,
> > .poll = tcp_poll,
> > .ioctl = inet_ioctl,
> > - .listen = inet_listen,
> > + .listen = mptcp_stream_listen,
> > .shutdown = inet_shutdown,
> > .setsockopt = sock_common_setsockopt,
> > .getsockopt = sock_common_getsockopt,
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index 5e5fdcb3175f..89fcc3b746eb 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> > return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > }
> >
> > +static void subflow_v4_init_req(struct request_sock *req,
> > + const struct sock *sk_listener,
> > + struct sk_buff *skb)
> > +{
> > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > + struct subflow_sock *listener = subflow_sk(sk_listener);
> > + struct tcp_options_received rx_opt;
> > +
> > + tcp_rsk(req)->is_mptcp = 1;
> > + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> > +
> > + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> > +
> > + rx_opt.mptcp.flags = 0;
> > + rx_opt.mptcp.mp_capable = 0;
> > + rx_opt.mptcp.mp_join = 0;
> > + rx_opt.mptcp.dss = 0;
> > + mptcp_get_options(skb, &rx_opt);
> > +
> > + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> > + subflow_req->mp_capable = 1;
> > + if (rx_opt.mptcp.version >= listener->version)
> > + subflow_req->version = listener->version;
> > + else
> > + subflow_req->version = rx_opt.mptcp.version;
> > + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> > + listener->checksum)
> > + subflow_req->checksum = 1;
> > + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> > + } else {
> > + subflow_req->mp_capable = 0;
> > + }
> > +}
> > +
> > static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > {
> > struct subflow_sock *subflow = subflow_sk(sk);
> > @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> > }
> > }
> >
> > +static struct request_sock_ops subflow_request_sock_ops;
> > +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> > +
> > +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> > +{
> > + struct subflow_sock *subflow = subflow_sk(sk);
> > +
> > + pr_debug("subflow=%p", subflow);
> > +
> > + /* Never answer to SYNs sent to broadcast or multicast */
> > + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> > + goto drop;
> > +
> > + return tcp_conn_request(&subflow_request_sock_ops,
> > + &subflow_request_sock_ipv4_ops,
> > + sk, skb);
> > +drop:
> > + tcp_listendrop(sk);
> > + return 0;
> > +}
> > +
> > +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> > + struct sk_buff *skb,
> > + struct request_sock *req,
> > + struct dst_entry *dst,
> > + struct request_sock *req_unhash,
> > + bool *own_req)
> > +{
> > + struct subflow_sock *listener = subflow_sk(sk);
> > + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> > + struct sock *child;
> > +
> > + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> > +
> > + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> > +
> > + if (child) {
> > + struct subflow_sock *subflow = subflow_sk(child);
> > +
> > + pr_debug("child=%p", child);
> > + if (subflow_req->mp_capable) {
> > + subflow->mp_capable = 1;
> > + subflow->fourth_ack = 1;
> > + subflow->remote_key = subflow_req->remote_key;
> > + subflow->local_key = subflow_req->local_key;
> > + } else {
> > + subflow->mp_capable = 0;
> > + }
> > + }
> > +
> > + return child;
> > +}
> > +
> > const struct inet_connection_sock_af_ops subflow_specific = {
> > .queue_xmit = ip_queue_xmit,
> > .send_check = tcp_v4_send_check,
> > .rebuild_header = inet_sk_rebuild_header,
> > .sk_rx_dst_set = subflow_finish_connect,
> > - .conn_request = tcp_v4_conn_request,
> > - .syn_recv_sock = tcp_v4_syn_recv_sock,
> > + .conn_request = subflow_conn_request,
> > + .syn_recv_sock = subflow_syn_recv_sock,
> > .net_header_len = sizeof(struct iphdr),
> > .setsockopt = ip_setsockopt,
> > .getsockopt = ip_getsockopt,
> > @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> > tcp_close(sk, timeout);
> > }
> >
> > +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> > + bool kern)
> > +{
> > + struct subflow_sock *subflow = subflow_sk(sk);
> > + struct sock *child;
> > +
> > + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> > +
> > + child = inet_csk_accept(sk, flags, err, kern);
> > +
> > + pr_debug("child=%p", child);
> > +
> > + return child;
> > +}
> > +
> > static void subflow_destroy(struct sock *sk)
> > {
> > pr_debug("subflow=%p", sk);
> > @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> > .close = subflow_close,
> > .connect = subflow_connect,
> > .disconnect = tcp_disconnect,
> > - .accept = inet_csk_accept,
> > + .accept = subflow_accept,
> > .ioctl = tcp_ioctl,
> > .init = subflow_init_sock,
> > .destroy = subflow_destroy,
> > @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
> >
> > /* TODO: Register path manager callbacks. */
> >
> > + subflow_request_sock_ops = tcp_request_sock_ops;
> > + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock);
> > +
> > + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> > + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> > +
> > subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> > + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> > subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> > err = proto_register(&subflow_prot, 1);
> > if (err)
> > --
> > 2.19.1
> >
> > _______________________________________________
> > mptcp mailing list
> > mptcp(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/mptcp
>
^ permalink raw reply [flat|nested] 7+ messages in thread
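For readers following the tcp_output.c hunks quoted above, here is a minimal userspace sketch of how the MP_CAPABLE length and key layout are chosen. It is only a model of the quoted logic, not the kernel code: mpc_write() and put_be64() are hypothetical names, the option kind 30 is taken from RFC 6824, and the subtype/version/flags bytes are left zeroed because the quoted hunks do not show how the header word is built.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lengths copied from the include/net/tcp.h hunk. */
#define TCPOLEN_MPTCP_MPC_SYN     12
#define TCPOLEN_MPTCP_MPC_SYNACK  20
#define TCPOLEN_MPTCP_MPC_ACK     20

/* Suboption bits copied from the net/ipv4/tcp_output.c hunk. */
#define OPTION_MPTCP_MPC_SYN     (1 << 0)
#define OPTION_MPTCP_MPC_SYNACK  (1 << 1)
#define OPTION_MPTCP_MPC_ACK     (1 << 2)

/* Hypothetical helper: store a 64-bit key in network byte order. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		dst[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Hypothetical model of the MP_CAPABLE branch of mptcp_options_write().
 * buf must hold at least 20 bytes. Returns the option length, or 0 if
 * no MP_CAPABLE suboption bit is set.
 */
static size_t mpc_write(uint8_t *buf, unsigned int suboptions,
			uint64_t sndr_key, uint64_t rcvr_key)
{
	size_t len;

	if (!(suboptions & (OPTION_MPTCP_MPC_SYN |
			    OPTION_MPTCP_MPC_SYNACK |
			    OPTION_MPTCP_MPC_ACK)))
		return 0;

	if (suboptions & OPTION_MPTCP_MPC_SYN)
		len = TCPOLEN_MPTCP_MPC_SYN;
	else if (suboptions & OPTION_MPTCP_MPC_SYNACK)
		len = TCPOLEN_MPTCP_MPC_SYNACK;
	else
		len = TCPOLEN_MPTCP_MPC_ACK;

	memset(buf, 0, len);
	buf[0] = 30;			/* option kind (RFC 6824) */
	buf[1] = (uint8_t)len;		/* option length */
	/* bytes 2-3 (subtype, version, flags) are not modeled here */

	put_be64(buf + 4, sndr_key);	/* sender key follows the header word */
	if (suboptions & (OPTION_MPTCP_MPC_SYNACK | OPTION_MPTCP_MPC_ACK))
		put_be64(buf + 12, rcvr_key);	/* receiver key on SYN-ACK and ACK */

	return len;
}
```

The point of the sketch is that, with this patch, the SYN-ACK takes the same 20-byte, two-key layout as the ACK, while the SYN carries only the sender key in 12 bytes.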
* Re: [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-12-12 6:08 Christoph Paasch
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Paasch @ 2018-12-12 6:08 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 17218 bytes --]
Hello,
On 30/11/18 - 12:11:03, Mat Martineau wrote:
> From: Peter Krystad <peter.krystad(a)intel.com>
>
> Add subflow_request_sock type that extends tcp_request_sock
> and add an is_mptcp flag to tcp_request_sock to distinguish them.
>
> Override the listen() and accept() methods of the MPTCP
> socket proto_ops so they may act on the subflow socket.
>
> Override the conn_request() and syn_recv_sock() handlers
> in the inet_connection_sock to handle incoming MPTCP
> SYNs and the ACK to the response SYN.
I'm having a hard time understanding how this works. Can you give some
more details?
The difficult part about MPTCP is that incoming subflows no longer match
on a listener, but rather on an "established" MPTCP socket, based
on the token that is present in the TCP options.
I don't see how that is being taken care of here.
Is the expectation that the app will call "listen()" and "accept()" on the
MPTCP socket?
Thanks,
Christoph
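To make the token matching described above concrete, here is a minimal userspace sketch. It is not part of the patch under review and makes several assumptions: the fixed-size table, struct mp_conn, conn_add(), conn_lookup() and join_request() are all hypothetical names used only for illustration. The idea is that a SYN for an additional subflow is matched against an already established MPTCP connection by its token, not against a listening socket.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 8	/* hypothetical table size */

/* Hypothetical stand-in for an established MPTCP connection. */
struct mp_conn {
	uint32_t token;		/* identifies the MPTCP connection */
	int in_use;
	int nr_subflows;
};

static struct mp_conn conns[MAX_CONNS];

/* Record a new connection once its first subflow is established. */
static struct mp_conn *conn_add(uint32_t token)
{
	for (size_t i = 0; i < MAX_CONNS; i++) {
		if (!conns[i].in_use) {
			conns[i].token = token;
			conns[i].in_use = 1;
			conns[i].nr_subflows = 1;
			return &conns[i];
		}
	}
	return NULL;	/* table full */
}

/* Find the established connection that owns this token, if any. */
static struct mp_conn *conn_lookup(uint32_t token)
{
	for (size_t i = 0; i < MAX_CONNS; i++) {
		if (conns[i].in_use && conns[i].token == token)
			return &conns[i];
	}
	return NULL;
}

/*
 * Handle a SYN that carries a token: attach the new subflow to the
 * matching connection, or reject it when no connection has that token.
 */
static int join_request(uint32_t token)
{
	struct mp_conn *c = conn_lookup(token);

	if (!c)
		return -1;
	c->nr_subflows++;
	return 0;
}
```

A linear scan keeps the sketch short; a real implementation would presumably use a hash table keyed on the token.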
>
> Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
> SYN-ACK response for a subflow_request_sock.
>
> Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
> ---
> include/linux/tcp.h | 1 +
> include/net/mptcp.h | 26 ++++++++++
> include/net/tcp.h | 1 +
> net/ipv4/tcp_input.c | 1 +
> net/ipv4/tcp_output.c | 21 +++++++-
> net/mptcp/options.c | 15 ++++++
> net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
> net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
> 8 files changed, 271 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 2622817ecd6b..b54ab3b5546a 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -148,6 +148,7 @@ struct tcp_request_sock {
> * FastOpen it's the seq#
> * after data-in-SYN.
> */
> + bool is_mptcp;
> };
>
> static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index a5c2baeb688f..ced33f1c529e 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
> return (struct subflow_sock *)sk;
> }
>
> +struct subflow_request_sock {
> + struct tcp_request_sock sk;
> + u8 mp_capable : 1,
> + mp_join : 1,
> + checksum : 1,
> + backup : 1,
> + version : 4;
> + u64 local_key;
> + u64 remote_key;
> +};
> +
> +static inline
> +struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
> +{
> + return (struct subflow_request_sock *)rsk;
> +}
> +
> #ifdef CONFIG_MPTCP
>
> void mptcp_parse_option(const unsigned char *ptr, int opsize,
> @@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
> void mptcp_rcv_synsent(struct sock *sk);
> unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> u64 *remote_key);
> +unsigned int mptcp_synack_options(struct request_sock *req,
> + u64 *local_key, u64 *remote_key);
>
> void mptcp_finish_connect(struct sock *sk, int mp_capable);
>
> @@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
> {
> }
>
> +static inline unsigned int mptcp_synack_options(struct request_sock *sk,
> + u64 *local_key,
> + u64 *remote_key)
> +{
> + return 0;
> +}
> +
> static inline unsigned int mptcp_established_options(struct sock *sk,
> u64 *local_key,
> u64 *remote_key)
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 254cf82e2ec6..1fc6362fa778 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> #define TCPOLEN_MSS_ALIGNED 4
> #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
> #define TCPOLEN_MPTCP_MPC_SYN 12
> +#define TCPOLEN_MPTCP_MPC_SYNACK 20
> #define TCPOLEN_MPTCP_MPC_ACK 20
>
> /* Flags in tp->nonagle */
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index eda515b141fb..00f7a3d88d66 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>
> tcp_rsk(req)->af_specific = af_ops;
> tcp_rsk(req)->ts_off = 0;
> + tcp_rsk(req)->is_mptcp = 0;
>
> tcp_clear_options(&tmp_opt);
> tmp_opt.mss_clamp = af_ops->mss_clamp;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 4f284ed879ba..6f723cdb5c8e 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
>
> /* MPTCP option subtypes */
> #define OPTION_MPTCP_MPC_SYN (1 << 0)
> +#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
> #define OPTION_MPTCP_MPC_ACK (1 << 2)
>
> struct tcp_out_options {
> @@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> return;
>
> if ((OPTION_MPTCP_MPC_SYN |
> + OPTION_MPTCP_MPC_SYNACK |
> OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> u8 len;
> __be64 key;
>
> if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
> len = TCPOLEN_MPTCP_MPC_SYN;
> + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
> + len = TCPOLEN_MPTCP_MPC_SYNACK;
> else
> len = TCPOLEN_MPTCP_MPC_ACK;
>
> @@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
> key = cpu_to_be64(opts->sndr_key);
> memcpy((u8 *) ptr, (u8 *) &key, 8);
> ptr += 2;
> - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
> + if ((OPTION_MPTCP_MPC_SYNACK |
> + OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
> key = cpu_to_be64(opts->rcvr_key);
> memcpy((u8 *) ptr, (u8 *) &key, 8);
> ptr += 2;
> @@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
> remaining -= need;
> }
> }
> + if (tcp_rsk(req)->is_mptcp) {
> + u64 local_key;
> + u64 remote_key;
> + if (mptcp_synack_options(req, &local_key, &remote_key)) {
> + if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
> + opts->options |= OPTION_MPTCP;
> + opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
> + opts->sndr_key = local_key;
> + opts->rcvr_key = remote_key;
> + remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
> + }
> + }
> + }
> +
> smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
>
> return MAX_TCP_OPTION_SPACE - remaining;
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index b0616f520da0..266a9f7fed0d 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
> }
> return 0;
> }
> +
> +unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
> + u64 *remote_key)
> +{
> + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> +
> + pr_debug("subflow_req=%p", subflow_req);
> + if (subflow_req->mp_capable) {
> + *local_key = subflow_req->local_key;
> + *remote_key = subflow_req->remote_key;
> + pr_debug("local_key=%llu", *local_key);
> + pr_debug("remote_key=%llu", *remote_key);
> + }
> + return subflow_req->mp_capable;
> +}
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 1a3412a742ea..9f802f69a528 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
> }
> }
>
> +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
> + bool kern)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sk);
> + struct socket *listener = msk->subflow;
> + struct socket *new_sock;
> + struct socket *mp;
> + struct subflow_sock *subflow;
> +
> + pr_debug("msk=%p, listener=%p", msk, listener->sk);
> + *err = kernel_accept(listener, &new_sock, flags);
> + if (*err < 0)
> + return NULL;
> +
> + subflow = subflow_sk(new_sock->sk);
> + pr_debug("new_sock=%p", subflow);
> +
> + *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
> + if (*err < 0) {
> + kernel_sock_shutdown(new_sock, SHUT_RDWR);
> + sock_release(new_sock);
> + return NULL;
> + }
> +
> + msk = mptcp_sk(mp->sk);
> + pr_debug("msk=%p", msk);
> + subflow->conn = mp->sk;
> +
> + if (subflow->mp_capable) {
> + msk->remote_key = subflow->remote_key;
> + msk->local_key = subflow->local_key;
> + msk->connection_list = new_sock;
> + } else {
> + msk->subflow = new_sock;
> + }
> +
> + return mp->sk;
> +}
> +
> static int mptcp_get_port(struct sock *sk, unsigned short snum)
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
> int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> {
> struct mptcp_sock *msk = mptcp_sk(sock->sk);
> - struct socket *subflow = msk->subflow;
> + int err;
>
> - pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
> + pr_debug("msk=%p", msk);
>
> - return inet_bind(subflow, uaddr, addr_len);
> + if (msk->subflow == NULL) {
> + err = subflow_create(sock->sk);
> + if (err)
> + return err;
> + }
> + return inet_bind(msk->subflow, uaddr, addr_len);
> }
>
> int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> @@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
> }
>
> +int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> + struct socket *subflow;
> + int err = -EPERM;
> +
> + if (msk->connection_list)
> + subflow = msk->connection_list;
> + else
> + subflow = msk->subflow;
> +
> + err = inet_getname(subflow, uaddr, peer);
> +
> + return err;
> +}
> +
> +int mptcp_stream_listen(struct socket *sock, int backlog)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> + int err;
> +
> + pr_debug("msk=%p", msk);
> +
> + if (msk->subflow == NULL) {
> + err = subflow_create(sock->sk);
> + if (err)
> + return err;
> + }
> + return inet_listen(msk->subflow, backlog);
> +}
> +
> +int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
> + bool kern)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sock->sk);
> +
> + pr_debug("msk=%p", msk);
> +
> + if (msk->subflow == NULL) {
> + return -EINVAL;
> + }
> + return inet_accept(sock, newsock, flags, kern);
> +}
> +
> static struct proto mptcp_prot = {
> .name = "MPTCP",
> .owner = THIS_MODULE,
> .init = mptcp_init_sock,
> .close = mptcp_close,
> - .accept = inet_csk_accept,
> + .accept = mptcp_accept,
> .shutdown = tcp_shutdown,
> .sendmsg = mptcp_sendmsg,
> .recvmsg = mptcp_recvmsg,
> @@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
> .bind = mptcp_stream_bind,
> .connect = mptcp_stream_connect,
> .socketpair = sock_no_socketpair,
> - .accept = inet_accept,
> - .getname = inet_getname,
> + .accept = mptcp_stream_accept,
> + .getname = mptcp_stream_getname,
> .poll = tcp_poll,
> .ioctl = inet_ioctl,
> - .listen = inet_listen,
> + .listen = mptcp_stream_listen,
> .shutdown = inet_shutdown,
> .setsockopt = sock_common_setsockopt,
> .getsockopt = sock_common_getsockopt,
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 5e5fdcb3175f..89fcc3b746eb 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> }
>
> +static void subflow_v4_init_req(struct request_sock *req,
> + const struct sock *sk_listener,
> + struct sk_buff *skb)
> +{
> + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> + struct subflow_sock *listener = subflow_sk(sk_listener);
> + struct tcp_options_received rx_opt;
> +
> + tcp_rsk(req)->is_mptcp = 1;
> + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
> +
> + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
> +
> + rx_opt.mptcp.flags = 0;
> + rx_opt.mptcp.mp_capable = 0;
> + rx_opt.mptcp.mp_join = 0;
> + rx_opt.mptcp.dss = 0;
> + mptcp_get_options(skb, &rx_opt);
> +
> + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
> + subflow_req->mp_capable = 1;
> + if (rx_opt.mptcp.version >= listener->version)
> + subflow_req->version = listener->version;
> + else
> + subflow_req->version = rx_opt.mptcp.version;
> + if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
> + listener->checksum)
> + subflow_req->checksum = 1;
> + subflow_req->remote_key = rx_opt.mptcp.sndr_key;
> + } else {
> + subflow_req->mp_capable = 0;
> + }
> +}
> +
> static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> {
> struct subflow_sock *subflow = subflow_sk(sk);
> @@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> }
> }
>
> +static struct request_sock_ops subflow_request_sock_ops;
> +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
> +
> +static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
> +{
> + struct subflow_sock *subflow = subflow_sk(sk);
> +
> + pr_debug("subflow=%p", subflow);
> +
> + /* Never answer to SYNs sent to broadcast or multicast */
> + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
> + goto drop;
> +
> + return tcp_conn_request(&subflow_request_sock_ops,
> + &subflow_request_sock_ipv4_ops,
> + sk, skb);
> +drop:
> + tcp_listendrop(sk);
> + return 0;
> +}
> +
> +static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> + struct sk_buff *skb,
> + struct request_sock *req,
> + struct dst_entry *dst,
> + struct request_sock *req_unhash,
> + bool *own_req)
> +{
> + struct subflow_sock *listener = subflow_sk(sk);
> + struct subflow_request_sock *subflow_req = subflow_rsk(req);
> + struct sock *child;
> +
> + pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
> +
> + child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
> +
> + if (child) {
> + struct subflow_sock *subflow = subflow_sk(child);
> +
> + pr_debug("child=%p", child);
> + if (subflow_req->mp_capable) {
> + subflow->mp_capable = 1;
> + subflow->fourth_ack = 1;
> + subflow->remote_key = subflow_req->remote_key;
> + subflow->local_key = subflow_req->local_key;
> + } else {
> + subflow->mp_capable = 0;
> + }
> + }
> +
> + return child;
> +}
> +
> const struct inet_connection_sock_af_ops subflow_specific = {
> .queue_xmit = ip_queue_xmit,
> .send_check = tcp_v4_send_check,
> .rebuild_header = inet_sk_rebuild_header,
> .sk_rx_dst_set = subflow_finish_connect,
> - .conn_request = tcp_v4_conn_request,
> - .syn_recv_sock = tcp_v4_syn_recv_sock,
> + .conn_request = subflow_conn_request,
> + .syn_recv_sock = subflow_syn_recv_sock,
> .net_header_len = sizeof(struct iphdr),
> .setsockopt = ip_setsockopt,
> .getsockopt = ip_getsockopt,
> @@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
> tcp_close(sk, timeout);
> }
>
> +static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
> + bool kern)
> +{
> + struct subflow_sock *subflow = subflow_sk(sk);
> + struct sock *child;
> +
> + pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
> +
> + child = inet_csk_accept(sk, flags, err, kern);
> +
> + pr_debug("child=%p", child);
> +
> + return child;
> +}
> +
> static void subflow_destroy(struct sock *sk)
> {
> pr_debug("subflow=%p", sk);
> @@ -125,7 +227,7 @@ static struct proto subflow_prot = {
> .close = subflow_close,
> .connect = subflow_connect,
> .disconnect = tcp_disconnect,
> - .accept = inet_csk_accept,
> + .accept = subflow_accept,
> .ioctl = tcp_ioctl,
> .init = subflow_init_sock,
> .destroy = subflow_destroy,
> @@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
>
> /* TODO: Register path manager callbacks. */
>
> + subflow_request_sock_ops = tcp_request_sock_ops;
> + subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock);
> +
> + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
> + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
> +
> subflow_prot.twsk_prot = tcp_prot.twsk_prot;
> + subflow_prot.rsk_prot = &subflow_request_sock_ops;
> subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
> err = proto_register(&subflow_prot, 1);
> if (err)
> --
> 2.19.1
>
^ permalink raw reply [flat|nested] 7+ messages in thread
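As a reading aid for the subflow_v4_init_req() hunk quoted above, here is a minimal userspace sketch of the negotiation it performs on an incoming SYN. The struct and function names (peer_opts, listener_cfg, req_state, negotiate) are hypothetical, and the value used for MPTCP_CAP_CHECKSUM_REQD is an assumption, since its definition is not shown in the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; the real definition is not part of the quoted hunks. */
#define MPTCP_CAP_CHECKSUM_REQD 0x80

/* What the peer sent in its MP_CAPABLE option (hypothetical struct). */
struct peer_opts {
	bool mp_capable;
	uint8_t version;
	uint8_t flags;
	uint64_t sndr_key;
};

/* Listener-side settings consulted by the patch (hypothetical struct). */
struct listener_cfg {
	bool request_mptcp;
	uint8_t version;
	bool checksum;
};

/* Result stored on the request socket (hypothetical struct). */
struct req_state {
	bool mp_capable;
	uint8_t version;
	bool checksum;
	uint64_t remote_key;
};

static struct req_state negotiate(const struct peer_opts *rx,
				  const struct listener_cfg *l)
{
	struct req_state req = { 0 };

	/* MPTCP is used only if the peer offered it and the listener wants it. */
	if (rx->mp_capable && l->request_mptcp) {
		req.mp_capable = true;
		/* Settle on the lower of the two versions. */
		req.version = rx->version >= l->version ? l->version
							: rx->version;
		/* Checksums are enabled if either side asks for them. */
		req.checksum = (rx->flags & MPTCP_CAP_CHECKSUM_REQD) ||
			       l->checksum;
		req.remote_key = rx->sndr_key;
	}
	return req;
}
```

One difference from the quoted code: the sketch zero-initializes the result, whereas the patch assigns checksum only when it should be set.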
* [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections
@ 2018-11-30 20:11 Mat Martineau
0 siblings, 0 replies; 7+ messages in thread
From: Mat Martineau @ 2018-11-30 20:11 UTC (permalink / raw)
To: mptcp
[-- Attachment #1: Type: text/plain, Size: 15541 bytes --]
From: Peter Krystad <peter.krystad(a)intel.com>
Add subflow_request_sock type that extends tcp_request_sock
and add an is_mptcp flag to tcp_request_sock to distinguish them.
Override the listen() and accept() methods of the MPTCP
socket proto_ops so they may act on the subflow socket.
Override the conn_request() and syn_recv_sock() handlers
in the inet_connection_sock to handle incoming MPTCP
SYNs and the ACK to the response SYN.
Add handling in tcp_output.c to add MP_CAPABLE to an outgoing
SYN-ACK response for a subflow_request_sock.
Signed-off-by: Peter Krystad <peter.krystad(a)intel.com>
---
include/linux/tcp.h | 1 +
include/net/mptcp.h | 26 ++++++++++
include/net/tcp.h | 1 +
net/ipv4/tcp_input.c | 1 +
net/ipv4/tcp_output.c | 21 +++++++-
net/mptcp/options.c | 15 ++++++
net/mptcp/protocol.c | 102 ++++++++++++++++++++++++++++++++++---
net/mptcp/subflow.c | 115 ++++++++++++++++++++++++++++++++++++++++--
8 files changed, 271 insertions(+), 11 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 2622817ecd6b..b54ab3b5546a 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -148,6 +148,7 @@ struct tcp_request_sock {
* FastOpen it's the seq#
* after data-in-SYN.
*/
+ bool is_mptcp;
};
static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index a5c2baeb688f..ced33f1c529e 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -69,6 +69,23 @@ static inline struct subflow_sock *subflow_sk(const struct sock *sk)
return (struct subflow_sock *)sk;
}
+struct subflow_request_sock {
+ struct tcp_request_sock sk;
+ u8 mp_capable : 1,
+ mp_join : 1,
+ checksum : 1,
+ backup : 1,
+ version : 4;
+ u64 local_key;
+ u64 remote_key;
+};
+
+static inline
+struct subflow_request_sock *subflow_rsk(const struct request_sock *rsk)
+{
+ return (struct subflow_request_sock *)rsk;
+}
+
#ifdef CONFIG_MPTCP
void mptcp_parse_option(const unsigned char *ptr, int opsize,
@@ -77,6 +94,8 @@ unsigned int mptcp_syn_options(struct sock *sk, u64 *local_key);
void mptcp_rcv_synsent(struct sock *sk);
unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
u64 *remote_key);
+unsigned int mptcp_synack_options(struct request_sock *req,
+ u64 *local_key, u64 *remote_key);
void mptcp_finish_connect(struct sock *sk, int mp_capable);
@@ -104,6 +123,13 @@ static inline void mptcp_rcv_synsent(struct sock *sk)
{
}
+static inline unsigned int mptcp_synack_options(struct request_sock *sk,
+ u64 *local_key,
+ u64 *remote_key)
+{
+ return 0;
+}
+
static inline unsigned int mptcp_established_options(struct sock *sk,
u64 *local_key,
u64 *remote_key)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 254cf82e2ec6..1fc6362fa778 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -216,6 +216,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCPOLEN_MSS_ALIGNED 4
#define TCPOLEN_EXP_SMC_BASE_ALIGNED 8
#define TCPOLEN_MPTCP_MPC_SYN 12
+#define TCPOLEN_MPTCP_MPC_SYNACK 20
#define TCPOLEN_MPTCP_MPC_ACK 20
/* Flags in tp->nonagle */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eda515b141fb..00f7a3d88d66 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6445,6 +6445,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
tcp_rsk(req)->af_specific = af_ops;
tcp_rsk(req)->ts_off = 0;
+ tcp_rsk(req)->is_mptcp = 0;
tcp_clear_options(&tmp_opt);
tmp_opt.mss_clamp = af_ops->mss_clamp;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4f284ed879ba..6f723cdb5c8e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -416,6 +416,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
/* MPTCP option subtypes */
#define OPTION_MPTCP_MPC_SYN (1 << 0)
+#define OPTION_MPTCP_MPC_SYNACK (1 << 1)
#define OPTION_MPTCP_MPC_ACK (1 << 2)
struct tcp_out_options {
@@ -439,12 +440,15 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
return;
if ((OPTION_MPTCP_MPC_SYN |
+ OPTION_MPTCP_MPC_SYNACK |
OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
u8 len;
__be64 key;
if (OPTION_MPTCP_MPC_SYN & opts->suboptions)
len = TCPOLEN_MPTCP_MPC_SYN;
+ else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions)
+ len = TCPOLEN_MPTCP_MPC_SYNACK;
else
len = TCPOLEN_MPTCP_MPC_ACK;
@@ -455,7 +459,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
key = cpu_to_be64(opts->sndr_key);
memcpy((u8 *) ptr, (u8 *) &key, 8);
ptr += 2;
- if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
+ if ((OPTION_MPTCP_MPC_SYNACK |
+ OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
key = cpu_to_be64(opts->rcvr_key);
memcpy((u8 *) ptr, (u8 *) &key, 8);
ptr += 2;
@@ -762,6 +767,20 @@ static unsigned int tcp_synack_options(const struct sock *sk,
remaining -= need;
}
}
+ if (tcp_rsk(req)->is_mptcp) {
+ u64 local_key;
+ u64 remote_key;
+ if (mptcp_synack_options(req, &local_key, &remote_key)) {
+ if (remaining >= TCPOLEN_MPTCP_MPC_SYNACK) {
+ opts->options |= OPTION_MPTCP;
+ opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
+ opts->sndr_key = local_key;
+ opts->rcvr_key = remote_key;
+ remaining -= TCPOLEN_MPTCP_MPC_SYNACK;
+ }
+ }
+ }
+
smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
return MAX_TCP_OPTION_SPACE - remaining;
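
The tcp_synack_options() hunk above only adds the MP_CAPABLE option when TCPOLEN_MPTCP_MPC_SYNACK bytes are still free, and mptcp_options_write() then emits both 64-bit keys in network byte order. A rough userspace sketch of those two steps, under stated assumptions: struct out_opts, add_mpc_synack() and put_key_be64() are hypothetical names, and the byte loop is a portable substitute for the cpu_to_be64() plus memcpy() used in the patch.

```c
#include <stdint.h>

#define OPT_SPACE	40	/* MAX_TCP_OPTION_SPACE in the kernel */
#define MPC_SYNACK_LEN	20	/* TCPOLEN_MPTCP_MPC_SYNACK in this patch */

/* Hypothetical, reduced version of struct tcp_out_options. */
struct out_opts {
	unsigned int has_mptcp;
	uint64_t sndr_key;
	uint64_t rcvr_key;
};

/*
 * Reserve room for the option if it fits and return the space left.
 * When it does not fit, the option is left out and 'remaining' is
 * returned unchanged.
 */
static unsigned int add_mpc_synack(struct out_opts *opts,
				   unsigned int remaining,
				   uint64_t local_key, uint64_t remote_key)
{
	if (remaining < MPC_SYNACK_LEN)
		return remaining;

	opts->has_mptcp = 1;
	opts->sndr_key = local_key;
	opts->rcvr_key = remote_key;
	return remaining - MPC_SYNACK_LEN;
}

/* Store a 64-bit key most significant byte first; returns the next slot. */
static unsigned char *put_key_be64(unsigned char *p, uint64_t key)
{
	int i;

	for (i = 7; i >= 0; i--)
		*p++ = (unsigned char)(key >> (8 * i));
	return p;
}
```

As in the patch, the sketch drops the option silently when space runs out. A caller that needs to know whether the option went out would have to check has_mptcp afterwards.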
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index b0616f520da0..266a9f7fed0d 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -189,3 +189,18 @@ unsigned int mptcp_established_options(struct sock *sk, u64 *local_key,
}
return 0;
}
+
+unsigned int mptcp_synack_options(struct request_sock *req, u64 *local_key,
+ u64 *remote_key)
+{
+ struct subflow_request_sock *subflow_req = subflow_rsk(req);
+
+ pr_debug("subflow_req=%p", subflow_req);
+ if (subflow_req->mp_capable) {
+ *local_key = subflow_req->local_key;
+ *remote_key = subflow_req->remote_key;
+ pr_debug("local_key=%llu", *local_key);
+ pr_debug("remote_key=%llu", *remote_key);
+ }
+ return subflow_req->mp_capable;
+}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1a3412a742ea..9f802f69a528 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -80,6 +80,45 @@ static void mptcp_close(struct sock *sk, long timeout)
}
}
+static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
+ bool kern)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct socket *listener = msk->subflow;
+ struct socket *new_sock;
+ struct socket *mp;
+ struct subflow_sock *subflow;
+
+ pr_debug("msk=%p, listener=%p", msk, listener->sk);
+ *err = kernel_accept(listener, &new_sock, flags);
+ if (*err < 0)
+ return NULL;
+
+ subflow = subflow_sk(new_sock->sk);
+ pr_debug("new_sock=%p", subflow);
+
+ *err = sock_create(PF_INET, SOCK_STREAM, IPPROTO_MPTCP, &mp);
+ if (*err < 0) {
+ kernel_sock_shutdown(new_sock, SHUT_RDWR);
+ sock_release(new_sock);
+ return NULL;
+ }
+
+ msk = mptcp_sk(mp->sk);
+ pr_debug("msk=%p", msk);
+ subflow->conn = mp->sk;
+
+ if (subflow->mp_capable) {
+ msk->remote_key = subflow->remote_key;
+ msk->local_key = subflow->local_key;
+ msk->connection_list = new_sock;
+ } else {
+ msk->subflow = new_sock;
+ }
+
+ return mp->sk;
+}
+
static int mptcp_get_port(struct sock *sk, unsigned short snum)
{
struct mptcp_sock *msk = mptcp_sk(sk);
@@ -129,11 +168,16 @@ static int subflow_create(struct sock *sock)
int mptcp_stream_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
struct mptcp_sock *msk = mptcp_sk(sock->sk);
- struct socket *subflow = msk->subflow;
+ int err;
- pr_debug("msk=%p, subflow=%p", msk, subflow->sk);
+ pr_debug("msk=%p", msk);
- return inet_bind(subflow, uaddr, addr_len);
+ if (msk->subflow == NULL) {
+ err = subflow_create(sock->sk);
+ if (err)
+ return err;
+ }
+ return inet_bind(msk->subflow, uaddr, addr_len);
}
int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
@@ -153,12 +197,56 @@ int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
return inet_stream_connect(msk->subflow, uaddr, addr_len, flags);
}
+int mptcp_stream_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
+{
+ struct mptcp_sock *msk = mptcp_sk(sock->sk);
+ struct socket *subflow;
+ int err = -EPERM;
+
+ if (msk->connection_list)
+ subflow = msk->connection_list;
+ else
+ subflow = msk->subflow;
+
+ err = inet_getname(subflow, uaddr, peer);
+
+ return err;
+}
+
+int mptcp_stream_listen(struct socket *sock, int backlog)
+{
+ struct mptcp_sock *msk = mptcp_sk(sock->sk);
+ int err;
+
+ pr_debug("msk=%p", msk);
+
+ if (msk->subflow == NULL) {
+ err = subflow_create(sock->sk);
+ if (err)
+ return err;
+ }
+ return inet_listen(msk->subflow, backlog);
+}
+
+int mptcp_stream_accept(struct socket *sock, struct socket *newsock, int flags,
+ bool kern)
+{
+ struct mptcp_sock *msk = mptcp_sk(sock->sk);
+
+ pr_debug("msk=%p", msk);
+
+ if (msk->subflow == NULL) {
+ return -EINVAL;
+ }
+ return inet_accept(sock, newsock, flags, kern);
+}
+
static struct proto mptcp_prot = {
.name = "MPTCP",
.owner = THIS_MODULE,
.init = mptcp_init_sock,
.close = mptcp_close,
- .accept = inet_csk_accept,
+ .accept = mptcp_accept,
.shutdown = tcp_shutdown,
.sendmsg = mptcp_sendmsg,
.recvmsg = mptcp_recvmsg,
@@ -176,11 +264,11 @@ const struct proto_ops mptcp_stream_ops = {
.bind = mptcp_stream_bind,
.connect = mptcp_stream_connect,
.socketpair = sock_no_socketpair,
- .accept = inet_accept,
- .getname = inet_getname,
+ .accept = mptcp_stream_accept,
+ .getname = mptcp_stream_getname,
.poll = tcp_poll,
.ioctl = inet_ioctl,
- .listen = inet_listen,
+ .listen = mptcp_stream_listen,
.shutdown = inet_shutdown,
.setsockopt = sock_common_setsockopt,
.getsockopt = sock_common_getsockopt,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 5e5fdcb3175f..89fcc3b746eb 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -53,6 +53,40 @@ static int subflow_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
}
+static void subflow_v4_init_req(struct request_sock *req,
+ const struct sock *sk_listener,
+ struct sk_buff *skb)
+{
+ struct subflow_request_sock *subflow_req = subflow_rsk(req);
+ struct subflow_sock *listener = subflow_sk(sk_listener);
+ struct tcp_options_received rx_opt;
+
+ tcp_rsk(req)->is_mptcp = 1;
+ pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
+
+ tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb);
+
+ rx_opt.mptcp.flags = 0;
+ rx_opt.mptcp.mp_capable = 0;
+ rx_opt.mptcp.mp_join = 0;
+ rx_opt.mptcp.dss = 0;
+ mptcp_get_options(skb, &rx_opt);
+
+ if (rx_opt.mptcp.mp_capable && listener->request_mptcp) {
+ subflow_req->mp_capable = 1;
+ if (rx_opt.mptcp.version >= listener->version)
+ subflow_req->version = listener->version;
+ else
+ subflow_req->version = rx_opt.mptcp.version;
+ if ((rx_opt.mptcp.flags & MPTCP_CAP_CHECKSUM_REQD) ||
+ listener->checksum)
+ subflow_req->checksum = 1;
+ subflow_req->remote_key = rx_opt.mptcp.sndr_key;
+ } else {
+ subflow_req->mp_capable = 0;
+ }
+}
+
static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
{
struct subflow_sock *subflow = subflow_sk(sk);
@@ -68,13 +102,66 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
}
}
+static struct request_sock_ops subflow_request_sock_ops;
+static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops;
+
+static int subflow_conn_request(struct sock *sk, struct sk_buff *skb)
+{
+ struct subflow_sock *subflow = subflow_sk(sk);
+
+ pr_debug("subflow=%p", subflow);
+
+ /* Never answer to SYNs sent to broadcast or multicast */
+ if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
+ goto drop;
+
+ return tcp_conn_request(&subflow_request_sock_ops,
+ &subflow_request_sock_ipv4_ops,
+ sk, skb);
+drop:
+ tcp_listendrop(sk);
+ return 0;
+}
+
+static struct sock *subflow_syn_recv_sock(const struct sock *sk,
+ struct sk_buff *skb,
+ struct request_sock *req,
+ struct dst_entry *dst,
+ struct request_sock *req_unhash,
+ bool *own_req)
+{
+ struct subflow_sock *listener = subflow_sk(sk);
+ struct subflow_request_sock *subflow_req = subflow_rsk(req);
+ struct sock *child;
+
+ pr_debug("listener=%p, req=%p, conn=%p", sk, req, listener->conn);
+
+ child = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req);
+
+ if (child) {
+ struct subflow_sock *subflow = subflow_sk(child);
+
+ pr_debug("child=%p", child);
+ if (subflow_req->mp_capable) {
+ subflow->mp_capable = 1;
+ subflow->fourth_ack = 1;
+ subflow->remote_key = subflow_req->remote_key;
+ subflow->local_key = subflow_req->local_key;
+ } else {
+ subflow->mp_capable = 0;
+ }
+ }
+
+ return child;
+}
+
const struct inet_connection_sock_af_ops subflow_specific = {
.queue_xmit = ip_queue_xmit,
.send_check = tcp_v4_send_check,
.rebuild_header = inet_sk_rebuild_header,
.sk_rx_dst_set = subflow_finish_connect,
- .conn_request = tcp_v4_conn_request,
- .syn_recv_sock = tcp_v4_syn_recv_sock,
+ .conn_request = subflow_conn_request,
+ .syn_recv_sock = subflow_syn_recv_sock,
.net_header_len = sizeof(struct iphdr),
.setsockopt = ip_setsockopt,
.getsockopt = ip_getsockopt,
@@ -112,6 +199,21 @@ static void subflow_close(struct sock *sk, long timeout)
tcp_close(sk, timeout);
}
+static struct sock *subflow_accept(struct sock *sk, int flags, int *err,
+ bool kern)
+{
+ struct subflow_sock *subflow = subflow_sk(sk);
+ struct sock *child;
+
+ pr_debug("subflow=%p, conn=%p", subflow, subflow->conn);
+
+ child = inet_csk_accept(sk, flags, err, kern);
+
+ pr_debug("child=%p", child);
+
+ return child;
+}
+
static void subflow_destroy(struct sock *sk)
{
pr_debug("subflow=%p", sk);
@@ -125,7 +227,7 @@ static struct proto subflow_prot = {
.close = subflow_close,
.connect = subflow_connect,
.disconnect = tcp_disconnect,
- .accept = inet_csk_accept,
+ .accept = subflow_accept,
.ioctl = tcp_ioctl,
.init = subflow_init_sock,
.destroy = subflow_destroy,
@@ -169,7 +271,14 @@ int mptcp_subflow_init(void)
/* TODO: Register path manager callbacks. */
+ subflow_request_sock_ops = tcp_request_sock_ops;
+ subflow_request_sock_ops.obj_size = sizeof(struct subflow_request_sock);
+
+ subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops;
+ subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req;
+
subflow_prot.twsk_prot = tcp_prot.twsk_prot;
+ subflow_prot.rsk_prot = &subflow_request_sock_ops;
subflow_prot.h.hashinfo = tcp_prot.h.hashinfo;
err = proto_register(&subflow_prot, 1);
if (err)
--
2.19.1
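
The negotiation in subflow_v4_init_req() in the patch above reduces to three rules: MPTCP is used only if both the SYN and the listener ask for it, the lower of the two versions wins, and checksums are enabled if either side wants them. A minimal sketch of that logic, assuming simplified inputs: listener_cfg, syn_opts, negotiated and negotiate() are hypothetical names, and checksum_reqd stands in for the MPTCP_CAP_CHECKSUM_REQD flag test.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the listening subflow's settings. */
struct listener_cfg {
	bool request_mptcp;
	bool checksum;
	unsigned int version;
};

/* Hypothetical, simplified view of the MPTCP option parsed from the SYN. */
struct syn_opts {
	bool mp_capable;
	bool checksum_reqd;	/* stands in for MPTCP_CAP_CHECKSUM_REQD */
	unsigned int version;
};

struct negotiated {
	bool mp_capable;
	bool checksum;
	unsigned int version;
};

static struct negotiated negotiate(const struct listener_cfg *l,
				   const struct syn_opts *syn)
{
	struct negotiated n = { false, false, 0 };

	/* Either side not asking for MPTCP means plain TCP. */
	if (!syn->mp_capable || !l->request_mptcp)
		return n;

	n.mp_capable = true;
	/* Use the lower of the two advertised versions. */
	n.version = syn->version < l->version ? syn->version : l->version;
	/* Checksums are on if either side asks for them. */
	n.checksum = syn->checksum_reqd || l->checksum;
	return n;
}
```

Returning the result by value keeps the fallback case explicit: a request that is not MPTCP capable carries no stale version or checksum state.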
end of thread, other threads:[~2018-12-12 23:58 UTC | newest]
Thread overview: 7+ messages
2018-12-12 23:58 [MPTCP] [RFC PATCH v4 08/17] mptcp: Create SUBFLOW socket for incoming connections Krystad, Peter
-- strict thread matches above, loose matches on Subject: below --
2018-12-12 21:59 cpaasch
2018-12-12 21:45 Krystad, Peter
2018-12-12 21:07 cpaasch
2018-12-12 19:25 Krystad, Peter
2018-12-12 6:08 Christoph Paasch
2018-11-30 20:11 Mat Martineau