Subject: [MPTCP] Re: [PATCH v2 mptcp-next] mptcp: implement delegated actions
From: Matthieu Baerts @ 2021-01-16 10:09 UTC
  To: mptcp


Hi Paolo, Mat,

On 15/01/2021 18:17, Paolo Abeni wrote:
> On MPTCP-level ack reception, the packet scheduler
> may select a subflow other than the current one.
>
> Prior to this commit, we relied on the workqueue to trigger
> actions on such a subflow.
>
> This changeset introduces an infrastructure that allows
> any MPTCP subflow to schedule actions (MPTCP xmit) on
> other subflows without resorting to (multiple) process
> reschedules.
>
> A dummy NAPI instance is used instead. When MPTCP needs to
> trigger an action on a different subflow, it enqueues the target
> subflow on the NAPI backlog and schedules that instance as needed.
>
> The dummy NAPI poll method walks the sockets backlog and tries
> to acquire the (BH) socket lock on each of them. If the socket
> is owned by user space, the action will be completed by
> the sock release cb; otherwise, push is started.
>
> This change leverages the delegated action infrastructure
> to avoid invoking the MPTCP worker to spool the pending data
> when the packet scheduler picks a subflow other than the one
> currently processing the incoming MPTCP-level ack.
>
> Additionally, we further refine the subflow selection,
> invoking the packet scheduler for each chunk of data
> even inside __mptcp_subflow_push_pending().
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
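
For anyone following along, the caller-side pattern condenses to this
sketch of __mptcp_check_push() from the patch (the sock_owned_by_user()
fallback to the MPTCP_PUSH_PENDING flag is omitted, and the function
name is made up for illustration):

static void check_push_sketch(struct sock *sk, struct sock *ssk)
{
	/* ask the packet scheduler which subflow should transmit next */
	struct sock *xmit_ssk = mptcp_subflow_get_send(mptcp_sk(sk));

	if (xmit_ssk == ssk)
		/* the current subflow was picked: push directly */
		__mptcp_subflow_push_pending(sk, ssk);
	else if (xmit_ssk)
		/* another subflow was picked: enqueue it on the per-CPU
		 * delegated list; the dummy NAPI instance will run the
		 * xmit in BH context, with no process reschedule
		 */
		mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
}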

Thank you for this nice patch and reviews!

I just added it to our tree, after other patches from Paolo, with Mat's 
RvB tag, and fixed a few typos in the commit message. There was a small 
conflict, but only a detail.

- ac480864a68a: mptcp: implement delegated actions
- da565da41ddc: conflict in t/DO-NOT-MERGE-mptcp-use-kmalloc-on-kasan-build
- Results: dd65d1dcd819..46973a057ea7

Tests + export are in progress!

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net


Subject: [MPTCP] Re: [PATCH v2 mptcp-next] mptcp: implement delegated actions
From: Mat Martineau @ 2021-01-16  0:16 UTC
  To: mptcp


On Fri, 15 Jan 2021, Paolo Abeni wrote:

> On MPTCP-level ack reception, the packet scheduler
> may select a subflow other than the current one.
>
> Prior to this commit, we relied on the workqueue to trigger
> actions on such a subflow.
>
> This changeset introduces an infrastructure that allows
> any MPTCP subflow to schedule actions (MPTCP xmit) on
> other subflows without resorting to (multiple) process
> reschedules.
>
> A dummy NAPI instance is used instead. When MPTCP needs to
> trigger an action on a different subflow, it enqueues the target
> subflow on the NAPI backlog and schedules that instance as needed.
>
> The dummy NAPI poll method walks the sockets backlog and tries
> to acquire the (BH) socket lock on each of them. If the socket
> is owned by user space, the action will be completed by
> the sock release cb; otherwise, push is started.
>
> This change leverages the delegated action infrastructure
> to avoid invoking the MPTCP worker to spool the pending data
> when the packet scheduler picks a subflow other than the one
> currently processing the incoming MPTCP-level ack.
>
> Additionally, we further refine the subflow selection,
> invoking the packet scheduler for each chunk of data
> even inside __mptcp_subflow_push_pending().
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
> - squash the 2 patches together

Thanks Paolo - I think it would be good to have this in the export branch 
for more test exposure.


Mat


> ---
> net/mptcp/protocol.c | 83 +++++++++++++++++++++++++++++++++++++++++---
> net/mptcp/protocol.h | 52 +++++++++++++++++++++++++++
> net/mptcp/subflow.c  | 39 +++++++++++++++++++++
> 3 files changed, 170 insertions(+), 4 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index d538f5a6810c..583154c1f72d 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -45,6 +45,9 @@ static struct percpu_counter mptcp_sockets_allocated;
> static void __mptcp_destroy_sock(struct sock *sk);
> static void __mptcp_check_send_data_fin(struct sock *sk);
>
> +DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
> +static struct net_device mptcp_napi_dev;
> +
> /* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not
>  * completed yet or has failed, return the subflow socket.
>  * Otherwise return NULL.
> @@ -1507,7 +1510,9 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct mptcp_sendmsg_info info;
> 	struct mptcp_data_frag *dfrag;
> +	struct sock *xmit_ssk;
> 	int len, copied = 0;
> +	bool first = true;
>
> 	info.flags = 0;
> 	while ((dfrag = mptcp_send_head(sk))) {
> @@ -1517,6 +1522,18 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
> 		while (len > 0) {
> 			int ret = 0;
>
> +			/* the caller already invoked the packet scheduler,
> +			 * check for a different subflow usage only after
> +			 * spooling the first chunk of data
> +			 */
> +			xmit_ssk = first ? ssk : mptcp_subflow_get_send(mptcp_sk(sk));
> +			if (!xmit_ssk)
> +				goto out;
> +			if (xmit_ssk != ssk) {
> +				mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
> +				goto out;
> +			}
> +
> 			if (unlikely(mptcp_must_reclaim_memory(sk, ssk))) {
> 				__mptcp_update_wmem(sk);
> 				sk_mem_reclaim_partial(sk);
> @@ -1535,6 +1552,7 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
> 			msk->tx_pending_data -= ret;
> 			copied += ret;
> 			len -= ret;
> +			first = false;
> 		}
> 		WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
> 	}
> @@ -2243,7 +2261,6 @@ static void mptcp_worker(struct work_struct *work)
> 	if (unlikely(state == TCP_CLOSE))
> 		goto unlock;
>
> -	mptcp_push_pending(sk, 0);
> 	mptcp_check_data_fin_ack(sk);
> 	__mptcp_flush_join_list(msk);
>
> @@ -2903,10 +2920,12 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
> 		return;
>
> 	if (!sock_owned_by_user(sk)) {
> -		if (mptcp_subflow_get_send(mptcp_sk(sk)) == ssk)
> +		struct sock *xmit_ssk = mptcp_subflow_get_send(mptcp_sk(sk));
> +
> +		if (xmit_ssk == ssk)
> 			__mptcp_subflow_push_pending(sk, ssk);
> -		else
> -			mptcp_schedule_work(sk);
> +		else if (xmit_ssk)
> +			mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
> 	} else {
> 		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
> 	}
> @@ -2957,6 +2976,20 @@ static void mptcp_release_cb(struct sock *sk)
> 	}
> }
>
> +void mptcp_subflow_process_delegated(struct sock *ssk)
> +{
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +	struct sock *sk = subflow->conn;
> +
> +	mptcp_data_lock(sk);
> +	if (!sock_owned_by_user(sk))
> +		__mptcp_subflow_push_pending(sk, ssk);
> +	else
> +		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
> +	mptcp_data_unlock(sk);
> +	mptcp_subflow_delegated_done(subflow);
> +}
> +
> static int mptcp_hash(struct sock *sk)
> {
> 	/* should never be called,
> @@ -3373,13 +3406,55 @@ static struct inet_protosw mptcp_protosw = {
> #define MPTCP_USE_SLAB		1
> #endif
>
> +static int mptcp_napi_poll(struct napi_struct *napi, int budget)
> +{
> +	struct mptcp_delegated_action *delegated;
> +	struct mptcp_subflow_context *subflow;
> +	int work_done = 0;
> +
> +	delegated = container_of(napi, struct mptcp_delegated_action, napi);
> +	while ((subflow = mptcp_subflow_delegated_next(delegated)) != NULL) {
> +		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> +		bh_lock_sock_nested(ssk);
> +		if (!sock_owned_by_user(ssk))
> +			mptcp_subflow_process_delegated(ssk);
> +
> +		/* if the sock is locked the delegated status will be cleared
> +		 * by tcp_release_cb_override
> +		 */
> +		bh_unlock_sock(ssk);
> +
> +		if (++work_done == budget)
> +			return budget;
> +	}
> +
> +	/* always provide a 0 'work_done' argument, so that napi_complete_done
> +	 * will not try accessing the NULL napi->dev ptr
> +	 */
> +	napi_complete_done(napi, 0);
> +	return work_done;
> +}
> +
> void __init mptcp_proto_init(void)
> {
> +	struct mptcp_delegated_action *delegated;
> +	int cpu;
> +
> 	mptcp_prot.h.hashinfo = tcp_prot.h.hashinfo;
>
> 	if (percpu_counter_init(&mptcp_sockets_allocated, 0, GFP_KERNEL))
> 		panic("Failed to allocate MPTCP pcpu counter\n");
>
> +	init_dummy_netdev(&mptcp_napi_dev);
> +	for_each_possible_cpu(cpu) {
> +		delegated = per_cpu_ptr(&mptcp_delegated_actions, cpu);
> +		INIT_LIST_HEAD(&delegated->head);
> +		netif_tx_napi_add(&mptcp_napi_dev, &delegated->napi, mptcp_napi_poll,
> +				  NAPI_POLL_WEIGHT);
> +		napi_enable(&delegated->napi);
> +	}
> +
> 	mptcp_subflow_init();
> 	mptcp_pm_init();
> 	mptcp_token_init();
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 871534df6140..7fa2ac3f2f4e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -378,6 +378,13 @@ enum mptcp_data_avail {
> 	MPTCP_SUBFLOW_OOO_DATA
> };
>
> +struct mptcp_delegated_action {
> +	struct napi_struct napi;
> +	struct list_head head;
> +};
> +
> +DECLARE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
> +
> /* MPTCP subflow context */
> struct mptcp_subflow_context {
> 	struct	list_head node;/* conn_list of subflows */
> @@ -415,6 +422,9 @@ struct mptcp_subflow_context {
> 	u8	local_id;
> 	u8	remote_id;
>
> +	long	delegated_status;
> +	struct	list_head delegated_node;
> +
> 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
> 	struct	sock *conn;	    /* parent mptcp_sock */
> 	const	struct inet_connection_sock_af_ops *icsk_af_ops;
> @@ -463,6 +473,48 @@ static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
> 	spin_unlock_bh(&msk->join_list_lock);
> }
>
> +void mptcp_subflow_process_delegated(struct sock *ssk);
> +
> +static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow)
> +{
> +	struct mptcp_delegated_action *delegated;
> +	bool schedule;
> +
> +	if (!test_and_set_bit(1, &subflow->delegated_status)) {
> +		local_bh_disable();
> +		delegated = this_cpu_ptr(&mptcp_delegated_actions);
> +		schedule = list_empty(&delegated->head);
> +		list_add_tail(&subflow->delegated_node, &delegated->head);
> +		if (schedule)
> +			napi_schedule(&delegated->napi);
> +		local_bh_enable();
> +	}
> +}
> +
> +static inline struct mptcp_subflow_context *
> +mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
> +{
> +	struct mptcp_subflow_context *ret;
> +
> +	if (list_empty(&delegated->head))
> +		return NULL;
> +
> +	ret = list_first_entry(&delegated->head, struct mptcp_subflow_context, delegated_node);
> +	list_del_init(&ret->delegated_node);
> +	return ret;
> +}
> +
> +static inline bool mptcp_subflow_has_delegated_action(const struct mptcp_subflow_context *subflow)
> +{
> +	return test_bit(1, &subflow->delegated_status);
> +}
> +
> +static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow)
> +{
> +	clear_bit(1, &subflow->delegated_status);
> +	list_del_init(&subflow->delegated_node);
> +}
> +
> int mptcp_is_enabled(struct net *net);
> unsigned int mptcp_get_add_addr_timeout(struct net *net);
> void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 31cc362a4638..0a34d21e4d1f 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -18,6 +18,7 @@
> #include <net/tcp.h>
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> #include <net/ip6_route.h>
> +#include <net/transp_v6.h>
> #endif
> #include <net/mptcp.h>
> #include <uapi/linux/mptcp.h>
> @@ -428,6 +429,7 @@ static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops;
> static struct inet_connection_sock_af_ops subflow_v6_specific;
> static struct inet_connection_sock_af_ops subflow_v6m_specific;
> +static struct proto tcpv6_prot_override;
>
> static int subflow_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> {
> @@ -509,6 +511,14 @@ static void subflow_ulp_fallback(struct sock *sk,
> 	icsk->icsk_ulp_ops = NULL;
> 	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
> 	tcp_sk(sk)->is_mptcp = 0;
> +
> +	/* undo override */
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	if (sk->sk_prot == &tcpv6_prot_override)
> +		sk->sk_prot = &tcpv6_prot;
> +	else
> +#endif
> +		sk->sk_prot = &tcp_prot;
> }
>
> static void subflow_drop_ctx(struct sock *ssk)
> @@ -682,6 +692,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
> }
>
> static struct inet_connection_sock_af_ops subflow_specific;
> +static struct proto tcp_prot_override;
>
> enum mapping_status {
> 	MAPPING_OK,
> @@ -1206,6 +1217,16 @@ static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
> #endif /* CONFIG_SOCK_CGROUP_DATA */
> }
>
> +static void mptcp_subflow_ops_override(struct sock *ssk)
> +{
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	if (ssk->sk_prot == &tcpv6_prot)
> +		ssk->sk_prot = &tcpv6_prot_override;
> +	else
> +#endif
> +		ssk->sk_prot = &tcp_prot_override;
> +}
> +
> int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
> {
> 	struct mptcp_subflow_context *subflow;
> @@ -1261,6 +1282,7 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
> 	*new_sock = sf;
> 	sock_hold(sk);
> 	subflow->conn = sk;
> +	mptcp_subflow_ops_override(sf->sk);
>
> 	return 0;
> }
> @@ -1277,6 +1299,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk,
>
> 	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
> 	INIT_LIST_HEAD(&ctx->node);
> +	INIT_LIST_HEAD(&ctx->delegated_node);
>
> 	pr_debug("subflow=%p", ctx);
>
> @@ -1442,6 +1465,16 @@ static void subflow_ulp_clone(const struct request_sock *req,
> 	}
> }
>
> +static void tcp_release_cb_override(struct sock *ssk)
> +{
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +
> +	if (mptcp_subflow_has_delegated_action(subflow))
> +		mptcp_subflow_process_delegated(ssk);
> +
> +	tcp_release_cb(ssk);
> +}
> +
> static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = {
> 	.name		= "mptcp",
> 	.owner		= THIS_MODULE,
> @@ -1482,6 +1515,9 @@ void __init mptcp_subflow_init(void)
> 	subflow_specific.syn_recv_sock = subflow_syn_recv_sock;
> 	subflow_specific.sk_rx_dst_set = subflow_finish_connect;
>
> +	tcp_prot_override = tcp_prot;
> +	tcp_prot_override.release_cb = tcp_release_cb_override;
> +
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> 	subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops;
> 	subflow_request_sock_ipv6_ops.route_req = subflow_v6_route_req;
> @@ -1497,6 +1533,9 @@ void __init mptcp_subflow_init(void)
> 	subflow_v6m_specific.net_header_len = ipv4_specific.net_header_len;
> 	subflow_v6m_specific.mtu_reduced = ipv4_specific.mtu_reduced;
> 	subflow_v6m_specific.net_frag_header_len = 0;
> +
> +	tcpv6_prot_override = tcpv6_prot;
> +	tcpv6_prot_override.release_cb = tcp_release_cb_override;
> #endif
>
> 	mptcp_diag_subflow_init(&subflow_ulp_ops);
> -- 
> 2.26.2
> _______________________________________________
> mptcp mailing list -- mptcp@lists.01.org
> To unsubscribe send an email to mptcp-leave@lists.01.org
>
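
For reviewers skimming the diff, the drain side of the mechanism is
mptcp_napi_poll() above; restated here with the two completion paths
called out (same code as the patch, comments added for illustration
only):

static int mptcp_napi_poll(struct napi_struct *napi, int budget)
{
	struct mptcp_delegated_action *delegated;
	struct mptcp_subflow_context *subflow;
	int work_done = 0;

	delegated = container_of(napi, struct mptcp_delegated_action, napi);
	while ((subflow = mptcp_subflow_delegated_next(delegated)) != NULL) {
		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);

		bh_lock_sock_nested(ssk);
		if (!sock_owned_by_user(ssk))
			/* path 1: the subflow socket is free; push the
			 * pending data now and clear the delegated bit
			 * via mptcp_subflow_delegated_done()
			 */
			mptcp_subflow_process_delegated(ssk);
		/* path 2: user space owns the socket; the bit stays set
		 * and tcp_release_cb_override() completes the action when
		 * the socket lock is released
		 */
		bh_unlock_sock(ssk);

		if (++work_done == budget)
			return budget;
	}

	/* always report 0 'work_done' so that napi_complete_done() does
	 * not touch the dummy device state
	 */
	napi_complete_done(napi, 0);
	return work_done;
}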

--
Mat Martineau
Intel

