* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-22 16:38 Matthieu Baerts
  0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2020-02-22 16:38 UTC (permalink / raw)
  To: mptcp


Hi Paolo,

As always, it is very good! Sorry for the delay, but here are various 
questions and comments.

Note that I didn't really pay attention to how the locks are used; I 
would be glad if someone could look at this specific point. I focused my 
review more on the exposed API, the general behaviour, the Netlink usage and typos.

On 21/02/2020 17:48, Paolo Abeni wrote:
> Expose to U/S a netlink family to control the PM, setting:
> 
>   - list of local addresses to be signaled.
>   - list of local addresses used to created subflows.
>   - maximum number of add_addr option to react

It looks a bit strange to me to express the limit in terms of ADD_ADDR. I 
see that all commands are linked to addresses, which seems to make sense, 
but I don't know whether it is good for a generic PM.

Depending on the strategy, you might create and accept more or less 
subflows using the same list of local and remote addresses:

  - a client with a fullmesh strategy will create a subflow from each 
local address to each remote one having the same IP family. So if the 
client and the server each have 2 (reachable) IPv4 addresses, A and B, 
there will be 4 subflows: A-A, A-B, B-A, B-B.

  - another strategy could be like the fullmesh one but using each IP 
only once: A-A, B-B

  - slightly different: create one subflow per network interface (from 
the client, it is hard to know what the server has): both client and 
server can have 2 NICs with one IPv4 and one IPv6 per NIC. The server will 
announce the v4 and v6 of the other NIC and the client will use one IP 
per interface: A4-A4, B4/6-B4/6 but not A4-A4, A6-A/B6, B4-B4, B6-B/A6 
like we would have with the previous strategy.

  - there are plenty of strategies for the client, e.g. ndiffport (more 
subflows using the same IPs but different ports), etc. Many strategies 
will be dedicated to one specific use-case, thus requiring a more 
advanced Netlink PM like the one in mptcp.org with mptcpd.

  - on the server side, I think there are fewer strategies. Of course, 
some might require a more advanced Netlink PM but most of the time, the 
server's job is to announce addresses and accept (or reject) subflows.

 > +}
 > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
 > index 5111b191a726..ea916aa22ae4 100644
 > --- a/net/mptcp/protocol.h
 > +++ b/net/mptcp/protocol.h
 > @@ -391,6 +391,13 @@ bool mptcp_pm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
 >   			  struct mptcp_addr_info *saddr);
 >   int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
 >
 > +void mptcp_pm_nl_init(void);
 > +void mptcp_pm_nl_data_init(struct mptcp_sock *msk);
 > +void mptcp_pm_nl_fully_established(struct mptcp_sock *msk);
 > +void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk);
 > +void mptcp_pm_nl_add_addr(struct mptcp_sock *msk);
 > +int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
 > +
 >   static inline struct mptcp_ext *mptcp_get_ext(struct sk_buff *skb)
 >   {
 >   	return (struct mptcp_ext *)skb_ext_find(skb, SKB_EXT_MPTCP);

Also, another point: the client doesn't really have to send ADD_ADDR to 
the server. The server could then receive MP_JOIN from IPs that are new 
to the server, i.e. not announced by the client with ADD_ADDR.


Now, after having mentioned all these points, I think it can be 
interesting to:

  - continue to manage addresses that can be announced / used to create 
subflows.

  - continue to pick a single strategy for the moment. Maybe best to 
start with fullmesh and if needed, the userspace can block some 
combinations using NFTables (I almost wrote IPTables :-o). This can be 
extended later by adding more strategies but I think we can already do 
many things if the userspace selects which additional addresses to use.

  - instead of having a maximum number of received ADD_ADDR that we can 
use, it might be more generic to limit the number of created/accepted 
subflows per MPTCP connection (see the sketch right after this list). If 
needed, this can be extended later with other limits: number of subflows 
per interface or IP and per MPTCP connection.
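
For instance, the limit attribute could become something like this 
(purely hypothetical name, only to illustrate the idea):

	MPTCP_PM_ATTR_SUBFLOWS,		/* u32: max additional subflows per MPTCP connection */

instead of (or in addition to) MPTCP_PM_ATTR_RCV_ADD_ADDRS.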

In other words, compared to this patch, it would only mean modifying how 
the limits are managed to make them more generic (I think).

What do you think about that?


Maybe you still want to keep a limit for the received ADD_ADDR? I don't 
think we store them for the moment? I guess MPTCP_PM_ADDR_MAX is enough 
for now for the limit.

> When the msk is fully established, the PM netlink attempts to
> create subflow for each addr in 'local' list, waiting for each
> connection to be completed before attempting the next one.

I guess it does that only for the client (connect, not listen).

> After exausting the 'local' list, the PM tries to announce the
> 'signal' list via the ADD_ADDR option. Since we currenlty lack

(detail, we can fix that later but: s/exausting/exhausting/ and 
s/currenlty/currently/ and s/signaled/signalled/ (or announced))

> the ADD_ADDR echo (and related event) only the first addr is sent.

It is not clear to me why the client has to announce addresses. But if 
it does, it might be better to do it before creating additional 
subflows? The info in the ADD_ADDR sent by the client could maybe be 
used by firewalls. Better to send it first in that case.

I guess we can change that later, it's not fixed with the API.

> Idea is to add an additional PM hook for ADD_ADDR echo, to allow
> the PM netlink announcing multiple addresses, in sequence.

And later in parallel to avoid delays :-)

> v1  -> v2:
>   - explicitly reset pm status to idle after processing a command
>   - fix msk pm initialization
>   - fix mptcp_pm_nl_add_addr bugs
>   - added ifindex support
>   - hopefully clearer local function names
>   - always use  IS_ENABLED(CONFIG_MPTCP_IPV6) instead of
>      IS_ENABLED(CONFIG_IPV6)
> 
> RFC -> v1:
>   - simplified NL API
>   - reduced {WRITE,READ}_ONCE boilerplate due to PM changes
>   - add check for duplicate addresses
> 
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
>   include/uapi/linux/mptcp.h |  53 +++
>   net/mptcp/Makefile         |   3 +-
>   net/mptcp/pm.c             |  18 +-
>   net/mptcp/pm_netlink.c     | 828 +++++++++++++++++++++++++++++++++++++
>   net/mptcp/protocol.h       |   7 +
>   5 files changed, 907 insertions(+), 2 deletions(-)
>   create mode 100644 net/mptcp/pm_netlink.c
> 
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index 3912a9808fa2..8990c564680d 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -31,4 +31,57 @@ enum {
>   };
>   
>   #define MPTCP_SUBFLOW_MAX (__MPTCP_SUBFLOW_MAX - 1)

Sorry, it's not linked to this patch but we should probably add "_ATTR" 
here (MPTCP_SUBFLOW_ATTR_MAX) and to all entries in the enum, because 
one could otherwise think it is the maximum number of subflows we can have.

I can do the modification if it is not too late.

*Davide*: I guess no upstreamed version of ss is already using this, 
right? Anyway, because this kernel code is not upstreamed yet, it's not 
too late to change, right? :)

> +
> +/* netlink interface */
> +#define MPTCP_PM_NAME		"mptcp_pm"
> +#define MPTCP_PM_CMD_GRP_NAME	"mptcp_pm_cmds"
> +#define MPTCP_PM_VER		0x1

Should we add "_NETLINK" or "_NL" here above, just in case we have 
additional PMs later?

   MPTCP_PM_NL_NAME "mptcp_pm_netlink"

And maybe we should do the same below? e.g. "mptcp_pm_nl_attrs"

> +
> +/*
> + * ATTR types defined for MPTCP
> + */
> +enum mptcp_pm_attrs {
> +	MPTCP_PM_ATTR_UNSPEC,

Because this enum will be exposed to userspace, should we force the 
first item to be 0? (same below)
I guess most compilers will set it to 0 by default but I think the specs 
don't force them to do that.

(if yes, we should also do it for "MPTCP_SUBFLOW(_ATTR)_UNSPEC")

> +
> +	MPTCP_PM_ATTR_ADDR,				/* nested address */
> +	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
> +
> +	__MPTCP_PM_ATTR_MAX
> +};
> +
> +#define MPTCP_PM_ATTR_MAX (__MPTCP_PM_ATTR_MAX - 1)
> +
> +enum mptcp_pm_addr_addrs {
> +	MPTCP_PM_ADDR_ATTR_UNSPEC,
> +
> +	MPTCP_PM_ADDR_ATTR_FAMILY,			/* u16 */
> +	MPTCP_PM_ADDR_ATTR_ID,				/* u8 */
> +	MPTCP_PM_ADDR_ATTR_ADDR4,			/* struct in_addr */
> +	MPTCP_PM_ADDR_ATTR_ADDR6,			/* struct in6_addr */
> +	MPTCP_PM_ADDR_ATTR_PORT,			/* u16 */

(It seems you are not using it)

> +	MPTCP_PM_ADDR_ATTR_FLAGS,			/* u32 */
> +	MPTCP_PM_ADDR_ATTR_IF_IDX,			/* s32 */
> +
> +	__MPTCP_PM_ADDR_ATTR_MAX
> +};
> +
> +#define MPTCP_PM_ADDR_ATTR_MAX (__MPTCP_PM_ADDR_ATTR_MAX - 1)
> +
> +#define MPTCP_PM_ADDR_FLAG_SIGNAL			(1 << 0)
> +#define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
> +#define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)

detail and we can fix it later if needed: I guess checkpatch will ask us 
to use BIT(0), etc.

> +
> +enum {
> +	MPTCP_CMD_UNSPEC,
> +
> +	MPTCP_CMD_ADD_ADDR,
> +	MPTCP_CMD_DEL_ADDR,
> +	MPTCP_CMD_GET_ADDR,
> +	MPTCP_CMD_FLUSH_ADDRS,
> +	MPTCP_CMD_SET_RCV_ADD_ADDRS,
> +	MPTCP_CMD_GET_RCV_ADD_ADDRS,
> +
> +	__MPTCP_CMD_AFTER_LAST
> +};

A detail but maybe important before it's too late: should we add "_PM" 
or "_PM_NL" in the list of commands?

Also it could be nice to add (later?) comments about the API: what does 
each command do, which (optional) parameters to use and what reply you 
will get (if any).
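
To make it concrete, I was thinking about something like this, e.g. with 
the "_PM" variant (same commands, only renamed):

   enum {
	MPTCP_PM_CMD_UNSPEC,

	MPTCP_PM_CMD_ADD_ADDR,
	MPTCP_PM_CMD_DEL_ADDR,
	MPTCP_PM_CMD_GET_ADDR,
	MPTCP_PM_CMD_FLUSH_ADDRS,
	MPTCP_PM_CMD_SET_RCV_ADD_ADDRS,
	MPTCP_PM_CMD_GET_RCV_ADD_ADDRS,

	__MPTCP_PM_CMD_AFTER_LAST
   };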

> +
>   #endif /* _UAPI_MPTCP_H */
> diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile
> index faebe8ec9f73..baa0640527c7 100644
> --- a/net/mptcp/Makefile
> +++ b/net/mptcp/Makefile
> @@ -1,4 +1,5 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_MPTCP) += mptcp.o
>   
> -mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o pm.o diag.o mib.o
> +mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o pm.o diag.o \
> +	   mib.o pm_netlink.o

(detail: because we might want to do other modifications around mib.c, 
see above, should the move of mib.o to a new line be done in that commit 
instead of this one?)

[...]

> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> new file mode 100644
> index 000000000000..56847e16d206
> --- /dev/null
> +++ b/net/mptcp/pm_netlink.c
> @@ -0,0 +1,828 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Multipath TCP
> + *
> + * Copyright (c) 2020, Red Hat, Inc.
> + */
> +
> +#include <linux/inet.h>
> +#include <linux/kernel.h>
> +#include <net/tcp.h>
> +#include <net/netns/generic.h>
> +#include <net/mptcp.h>
> +#include <net/genetlink.h>
> +#include <uapi/linux/mptcp.h>
> +
> +#include "protocol.h"
> +
> +/* forward declaration */
> +static struct genl_family mptcp_genl_family;
> +
> +static int pm_nl_pernet_id;
> +
> +struct mptcp_pm_addr_entry {
> +	struct list_head	list;
> +	unsigned int		flags;
> +	int			ifindex;
> +	struct mptcp_addr_info	addr;
> +	struct rcu_head		rcu;
> +};
> +
> +struct pm_nl_pernet {
> +	/* protects pernet updates */
> +	spinlock_t		lock;
> +	struct list_head	addr_list;

detail: maybe clearer with local_addr_list?

Should we not maintain a list of remote_add_addr?

> +	unsigned int		addrs;
> +	unsigned int		add_addr_signal_max;
> +	unsigned int		add_addr_accept_max;
> +	unsigned int		local_addr_max;
> +	unsigned int		next_id;
> +};
> +
> +#define MPTCP_PM_ADDR_MAX	8
> +
> +static bool addresses_equal(const struct mptcp_addr_info *a,
> +			    struct mptcp_addr_info *b, bool use_port)
> +{
> +	bool addr_equals;
> +
> +	if (a->family != b->family)
> +		return false;
> +
> +	if (a->family == AF_INET)
> +		addr_equals = !memcmp(&a->addr, &b->addr, sizeof(b->addr));
> +	else

You need to surround this block with:

   #if IS_ENABLED(CONFIG_MPTCP_IPV6)

Out of curiosity, what's the recommended way to deal with IPv4 and 
IPv6: should we always have "else if (a->family == AF_INET6)" (or a 
switch/case) or can we assume that if it is not v4, it is v6?

> +		addr_equals = !memcmp(&a->addr6, &b->addr6, sizeof(b->addr6));

detail: you can also use ipv6_addr_cmp(). (and not memcmp for v4?)
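
To illustrate both points, something like this (untested, and assuming 
the family check above guarantees we never see an AF_INET6 address here 
when CONFIG_MPTCP_IPV6 is disabled):

	if (a->family == AF_INET)
		addr_equals = a->addr.s_addr == b->addr.s_addr;
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
	else
		addr_equals = !ipv6_addr_cmp(&a->addr6, &b->addr6);
#endif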

> +
> +	if (!addr_equals)
> +		return false;
> +	if (!use_port)
> +		return true;
> +
> +	return a->port == b->port;
> +}
> +
> +static void local_address(const struct sock_common *skc,
> +			  struct mptcp_addr_info *addr)
> +{
> +	addr->family = skc->skc_family;

Should we copy the source port, just in case?
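
Something like this maybe (just the idea, I am not sure which byte order 
we want to keep in mptcp_addr_info):

	addr->port = htons(skc->skc_num);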

> +	if (addr->family == AF_INET)
> +		addr->addr.s_addr = skc->skc_rcv_saddr;
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	else if (addr->family == AF_INET6)
> +		addr->addr6 = skc->skc_v6_rcv_saddr;
> +#endif
> +}
> +
> +static void remote_address(const struct sock_common *skc,
> +			   struct mptcp_addr_info *addr)
> +{
> +	addr->family = skc->skc_family;
> +	addr->port = skc->skc_dport;
> +	if (addr->family == AF_INET)
> +		addr->addr.s_addr = skc->skc_daddr;
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	else if (addr->family == AF_INET6)
> +		addr->addr6 = skc->skc_v6_daddr;
> +#endif
> +}
> +
> +static bool lookup_subflow_by_saddr(const struct mptcp_sock *msk,
> +				    struct mptcp_addr_info *saddr)
> +{
> +	struct mptcp_subflow_context *subflow;
> +	struct mptcp_addr_info cur;
> +	struct sock_common *skc;
> +
> +	list_for_each_entry(subflow, &msk->conn_list, node) {
> +		skc = (struct sock_common *)mptcp_subflow_tcp_sock(subflow);
> +
> +		local_address(skc, &cur);

I guess it's OK to do the copy to ease the comparison here below, right?
(we are going to do that only when creating subflows so it seems fine. 
Maybe we should add a comment in case we reuse this function later?)

> +		if (addresses_equal(&cur, saddr, false))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static struct mptcp_pm_addr_entry *
> +select_local_address(const struct pm_nl_pernet *pernet,
> +		     const struct mptcp_sock *msk)
> +{
> +	struct mptcp_pm_addr_entry *entry, *ret = NULL;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(entry, &pernet->addr_list, list) {
> +		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
> +			continue;
> +		if (entry->addr.family == ((struct sock *)msk)->sk_family &&
> +		    !lookup_subflow_by_saddr(msk, &entry->addr)) {
> +			ret = entry;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +static struct mptcp_pm_addr_entry *
> +select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos)
> +{
> +	struct mptcp_pm_addr_entry *entry, *ret = NULL;
> +	int i = 0;
> +
> +	rcu_read_lock();
> +	/* do not keep any additional per socket state, just signal
> +	 * the address list in order.
> +	 * Note: removal from the local address list during the msk life-cycle
> +	 * can lead to additional addresses not being announced.
> +	 */
> +	list_for_each_entry_rcu(entry, &pernet->addr_list, list) {
> +		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
> +			continue;
> +		if (i++ == pos) {
> +			ret = entry;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +static void check_work_pending(struct mptcp_sock *msk)
> +{
> +	if (msk->pm.add_addr_signaled == msk->pm.add_addr_signal_max &&
> +	    msk->pm.local_addr_used == msk->pm.local_addr_max)
> +		WRITE_ONCE(msk->pm.work_pending, false);
> +}
> +
> +static void mptcp_pm_create_subflow_or_signal(struct mptcp_sock *msk)

detail: I think we should avoid using "signal" alone because MP_FAIL, 
REM_ADDR, MP_BACKUP, etc. are signals as well, no?
It's maybe just me but I find it clearer, when it is linked to ADD_ADDR, 
to use "announce" instead of the generic term "signal". But that's a 
detail. As long as it is clear for everybody :-)

> +{
> +	struct sock *sk = (struct sock *)msk;
> +	struct mptcp_pm_addr_entry *local;
> +	struct mptcp_addr_info remote;
> +	struct pm_nl_pernet *pernet;
> +
> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> +
> +	lock_sock(sk);
> +
> +	spin_lock_bh(&msk->pm.lock);
> +	msk->pm.status = MPTCP_PM_IDLE;
> +	pr_debug("local %d:%d signal %d:%d\n",
> +		 msk->pm.local_addr_used, msk->pm.local_addr_max,
> +		 msk->pm.add_addr_signaled, msk->pm.add_addr_signal_max);
> +
> +	/* check first if should create a new subflow */
> +	if (msk->pm.local_addr_used < msk->pm.local_addr_max) {
> +		remote_address((struct sock_common *)sk, &remote);
> +
> +		local = select_local_address(pernet, msk);
> +		if (local) {
> +			msk->pm.local_addr_used++;
> +			check_work_pending(msk);
> +			spin_unlock_bh(&msk->pm.lock);
> +			__mptcp_subflow_connect(sk, local->ifindex,
> +						&local->addr, &remote);

Should we do anything special if this fails?
(decrement local_addr_used? retry later?)
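
Completely untested, and I am not sure retrying is the right policy, but 
just to illustrate the question (assuming an "int err" declared above):

			err = __mptcp_subflow_connect(sk, local->ifindex,
						      &local->addr, &remote);
			if (err) {
				/* connect failed: give this address a chance
				 * to be picked again later?
				 */
				spin_lock_bh(&msk->pm.lock);
				msk->pm.local_addr_used--;
				spin_unlock_bh(&msk->pm.lock);
			}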

> +			release_sock(sk);
> +			return;
> +		}
> +
> +		/* lookup failed, avoid fourther attempts later */
> +		msk->pm.local_addr_used = msk->pm.local_addr_max;
> +		check_work_pending(msk);
> +	}
> +
> +	/* check for announce */
> +	if (msk->pm.add_addr_signaled < msk->pm.add_addr_signal_max) {
> +		local = select_signal_address(pernet,
> +					      msk->pm.add_addr_signaled);
> +
> +		if (local) {
> +			msk->pm.local_addr_used++;
> +			mptcp_pm_announce_addr(msk, &local->addr);
> +		} else {
> +			/* pick failed, avoid fourther attempts later */
> +			msk->pm.local_addr_used = msk->pm.add_addr_signal_max;
> +		}
> +
> +		check_work_pending(msk);
> +	}
> +	spin_unlock_bh(&msk->pm.lock);
> +	release_sock(sk);
> +}
> +
> +void mptcp_pm_nl_fully_established(struct mptcp_sock *msk)
> +{
> +	mptcp_pm_create_subflow_or_signal(msk);
> +}
> +
> +void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk)
> +{
> +	mptcp_pm_create_subflow_or_signal(msk);
> +}
> +
> +void mptcp_pm_nl_add_addr(struct mptcp_sock *msk)

(it's maybe just me but should we rename this by adding "_recv"? Same 
for the caller and related functions, because it was not clear that we 
are reacting to a received ADD_ADDR here, rather than adding an address 
or producing an ADD_ADDR ourselves)

> +{
> +	struct sock *sk = (struct sock *)msk;
> +	struct mptcp_addr_info remote;
> +	struct mptcp_addr_info local;
> +	struct pm_nl_pernet *pernet;
> +
> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> +
> +	spin_lock_bh(&msk->pm.lock);
> +	msk->pm.status = MPTCP_PM_IDLE;
> +	pr_debug("accepted %d:%d remote family %d",
> +		 msk->pm.add_addr_accepted, msk->pm.add_addr_accept_max,
> +		 msk->pm.remote.family);
> +	if (++msk->pm.add_addr_accepted >= msk->pm.add_addr_accept_max)
> +		WRITE_ONCE(msk->pm.accept_addr, false);
> +
> +	/* connect to the specified remote address, using whatever
> +	 * local address the routing configuration will pick.
> +	 */

Should we try with all the local IPs the userspace added (+ the original 
one)? → to have a fullmesh (see the sketch below)
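
A very rough sketch of what I have in mind, untested, ignoring locking 
and error handling, and reusing the pernet list with a "struct 
mptcp_pm_addr_entry *entry":

	/* one subflow from each configured local address towards the
	 * announced remote address
	 */
	list_for_each_entry(entry, &pernet->addr_list, list) {
		if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW))
			continue;
		if (entry->addr.family != remote.family)
			continue;
		__mptcp_subflow_connect(sk, entry->ifindex,
					&entry->addr, &remote);
	}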

> +	remote = msk->pm.remote;
> +	if (!remote.port)
> +		remote.port = sk->sk_dport;
> +	memset(&local, 0, sizeof(local));
> +	local.family = remote.family;
> +	spin_unlock_bh(&msk->pm.lock);
> +
> +	lock_sock(sk);
> +	__mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote);

Should we do anything special if this fails?

> +	release_sock(sk);

Do we need to clear "pm.remote" somewhere? How do we deal with multiple 
received ADD_ADDR? Even with the "echo", I guess we might receive a 
new one before the workqueue has finished dealing with the previous one, no?

> +}
> +
> +static bool address_use_port(struct mptcp_pm_addr_entry *entry)
> +{
> +	return (entry->flags &
> +		(MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) ==
> +		MPTCP_PM_ADDR_FLAG_SIGNAL;
> +}
> +
> +static int mptcp_pm_nl_append_new_addr(struct pm_nl_pernet *pernet,

detail: Here as well, should we add _local → new_local_addr?

> +				       struct mptcp_pm_addr_entry *entry)
> +{
> +	struct mptcp_pm_addr_entry *cur;
> +	int ret = -EINVAL;
> +
> +	spin_lock_bh(&pernet->lock);
> +	/* to keep the code simple, don't do IDR-like allocation for address ID,
> +	 * just bail when we exceed limits
> +	 */

And we also need to support REM_ADDR here I guess.

> +	if (pernet->next_id > 255)
> +		goto out;
> +	if (pernet->addrs >= MPTCP_PM_ADDR_MAX)
> +		goto out;
> +
> +	/* do not insert duplicate address, differentiate on port only
> +	 * singled addresses
> +	 */
> +	list_for_each_entry(cur, &pernet->addr_list, list) {
> +		if (addresses_equal(&cur->addr, &entry->addr,
> +				    address_use_port(entry) &&
> +				    address_use_port(cur)))
> +			goto out;
> +	}
> +
> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
> +		pernet->add_addr_signal_max++;
> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
> +		pernet->local_addr_max++;
> +
> +	entry->addr.id = pernet->next_id++;
> +	pernet->addrs++;
> +	list_add_tail_rcu(&entry->list, &pernet->addr_list);
> +	ret = entry->addr.id;
> +
> +out:
> +	spin_unlock_bh(&pernet->lock);
> +	return ret;
> +}
> +
> +int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
> +{
> +	struct mptcp_pm_addr_entry *entry;
> +	struct mptcp_addr_info skc_local;
> +	struct mptcp_addr_info msk_local;
> +	struct pm_nl_pernet *pernet;
> +	int ret = -1;
> +
> +	if (WARN_ON_ONCE(!msk))
> +		return -1;
> +
> +	/* The 0 ID mapping is defined by the first subflow, copied into the msk
> +	 * addr
> +	 */

Is it still OK to do that if the first subflow is down (but other 
subflows exist)? I mean, no need to have an additional check to see if the 
subflow 0 is still alive? (I didn't check what we do with the msk for 
this specific case)

> +	local_address((struct sock_common *)msk, &msk_local);
> +	local_address((struct sock_common *)msk, &skc_local);
> +	if (addresses_equal(&msk_local, &skc_local, false))
> +		return 0;
> +
> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(entry, &pernet->addr_list, list) {
> +		if (addresses_equal(&entry->addr, &skc_local, false)) {
> +			ret = entry->addr.id;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	if (ret >= 0)
> +		return ret;
> +
> +	/* address not found, add to local list */
> +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->flags = 0;
> +	entry->addr = skc_local;
> +	ret = mptcp_pm_nl_append_new_addr(pernet, entry);
> +	if (ret < 0)
> +		kfree(entry);
> +
> +	return ret;
> +}
> +
> +void mptcp_pm_nl_data_init(struct mptcp_sock *msk)
> +{
> +	struct pm_nl_pernet *pernet;
> +
> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> +
> +	msk->pm.add_addr_signal_max = READ_ONCE(pernet->add_addr_signal_max);
> +	msk->pm.add_addr_accept_max = READ_ONCE(pernet->add_addr_accept_max);
> +	msk->pm.local_addr_max = READ_ONCE(pernet->local_addr_max);
> +	WRITE_ONCE(msk->pm.work_pending,
> +		   !!msk->pm.local_addr_max || !!msk->pm.add_addr_signal_max);
> +	WRITE_ONCE(msk->pm.accept_addr, !!msk->pm.add_addr_accept_max);
> +}
> +
> +#define MPTCP_PM_CMD_GRP_OFFSET	0
> +
> +static const struct genl_multicast_group mptcp_pm_mcgrps[] = {
> +	[MPTCP_PM_CMD_GRP_OFFSET]	= { .name = MPTCP_PM_CMD_GRP_NAME, },
> +};
> +
> +static const struct nla_policy
> +mptcp_pm_addr_policy[MPTCP_PM_ADDR_ATTR_MAX + 1] = {
> +	[MPTCP_PM_ADDR_ATTR_FAMILY]	= { .type	= NLA_U16,	},
> +	[MPTCP_PM_ADDR_ATTR_ID]		= { .type	= NLA_U8,	},
> +	[MPTCP_PM_ADDR_ATTR_ADDR4]	= { .type	= NLA_U32,	},
> +	[MPTCP_PM_ADDR_ATTR_ADDR6]	= { .type	= NLA_EXACT_LEN,
> +					    .len   = sizeof(struct in6_addr), },
> +	[MPTCP_PM_ADDR_ATTR_PORT]	= { .type	= NLA_U16	},
> +	[MPTCP_PM_ADDR_ATTR_FLAGS]	= { .type	= NLA_U32	},
> +	[MPTCP_PM_ADDR_ATTR_IF_IDX]     = { .type	= NLA_S32	},
> +};
> +
> +static const struct nla_policy mptcp_pm_policy[MPTCP_PM_ATTR_MAX + 1] = {
> +	[MPTCP_PM_ATTR_ADDR]		=
> +					NLA_POLICY_NESTED(mptcp_pm_addr_policy),
> +	[MPTCP_PM_ATTR_RCV_ADD_ADDRS]	= { .type	= NLA_U32,	},
> +};
> +
> +static int mptcp_pm_family_to_addr(int family)
> +{
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	if (family == AF_INET6)
> +		return MPTCP_PM_ADDR_ATTR_ADDR6;
> +#endif
> +	return MPTCP_PM_ADDR_ATTR_ADDR4;
> +}
> +
> +static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info,
> +			       bool require_family,
> +			       struct mptcp_pm_addr_entry *entry)
> +{
> +	struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
> +	int err, addr_addr;
> +
> +	if (!attr) {
> +		GENL_SET_ERR_MSG(info, "missing address info");
> +		return -EINVAL;
> +	}
> +
> +	/* no validation needed - was already done via nested policy */
> +	err = nla_parse_nested_deprecated(tb, MPTCP_PM_ADDR_ATTR_MAX, attr,
> +					  mptcp_pm_addr_policy, info->extack);
> +	if (err)
> +		return err;
> +
> +	memset(entry, 0, sizeof(*entry));
> +	if (!tb[MPTCP_PM_ADDR_ATTR_FAMILY]) {
> +		if (!require_family)
> +			goto skip_family;
> +
> +		NL_SET_ERR_MSG_ATTR(info->extack, attr,
> +				    "missing family");
> +		return -EINVAL;
> +	}
> +
> +	entry->addr.family = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_FAMILY]);
> +	if (entry->addr.family != AF_INET
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	    && entry->addr.family != AF_INET6
> +#endif
> +	    ) {
> +		NL_SET_ERR_MSG_ATTR(info->extack, attr,
> +				    "unknown address family");
> +		return -EINVAL;
> +	}
> +	addr_addr = mptcp_pm_family_to_addr(entry->addr.family);
> +	if (!tb[addr_addr]) {
> +		NL_SET_ERR_MSG_ATTR(info->extack, attr,
> +				    "missing address data");
> +		return -EINVAL;
> +	}
> +
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	if (entry->addr.family == AF_INET6)
> +		entry->addr.addr6 = nla_get_in6_addr(tb[addr_addr]);
> +	else
> +#endif
> +		entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
> +
> +skip_family:
> +	if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX])
> +		entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
> +	else
> +		entry->ifindex = 0;

detail: "entry" is already set to 0 by the memset at the beginning of 
this function, so this "else" is not needed. Same below for the ID. On 
the other hand, if you remove the memset, the flags are never set to 0.

> +
> +	if (tb[MPTCP_PM_ADDR_ATTR_ID])
> +		entry->addr.id = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_ID]);

Is it not a u8?

> +	else
> +		entry->addr.id = 0;
> +
> +	if (tb[MPTCP_PM_ADDR_ATTR_FLAGS])
> +		entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);

(detail, if you have to modify this: please add a blank line here 
because, most of the time, there is an empty line after an if-statement 
block)

> +	return 0;
> +}
> +
> +static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info)
> +{
> +	return net_generic(genl_info_net(info), pm_nl_pernet_id);
> +}
> +
> +static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	struct mptcp_pm_addr_entry addr, *entry;
> +	int ret;
> +
> +	ret = mptcp_pm_parse_addr(attr, info, true, &addr);
> +	if (ret)

Out of curiosity, is there a recommendation on what to check with 
"ret"? I mean "ret" is vague and we often see "if (ret < 0)", like 
below (also because it is needed in that case), which seems clearer to 
me. Sometimes "err" is used and, in that case, it is clear to read: "if 
(err)".
No need to change anything, it's just to know what's recommended but not 
mandatory I guess.

> +		return ret;
> +
> +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry) {
> +		GENL_SET_ERR_MSG(info, "can't allocate addr");
> +		return -ENOMEM;
> +	}
> +
> +	*entry = addr;
> +	ret = mptcp_pm_nl_append_new_addr(pernet, entry);
> +	if (ret < 0) {
> +		GENL_SET_ERR_MSG(info, "too many addresses");

detail: Or duplicated one.

> +		kfree(entry);
> +		return ret;
> +	}

Should we trigger here the establishment of new subflows and/or the 
sending of ADD_ADDR for all existing MPTCP connections?

> +	return 0;
> +}
> +
> +static struct mptcp_pm_addr_entry *
> +__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id)
> +{
> +	struct mptcp_pm_addr_entry *entry;
> +
> +	list_for_each_entry(entry, &pernet->addr_list, list) {
> +		if (entry->addr.id == id)
> +			return entry;
> +	}
> +	return NULL;
> +}
> +
> +static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	struct mptcp_pm_addr_entry addr, *entry;
> +	int ret;
> +
> +	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
> +	if (ret)
> +		return ret;
> +
> +	spin_lock_bh(&pernet->lock);
> +	entry = __lookup_addr_by_id(pernet, addr.addr.id);
> +	if (!entry) {
> +		GENL_SET_ERR_MSG(info, "address not found");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
> +		pernet->add_addr_signal_max--;
> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
> +		pernet->local_addr_max--;
> +
> +	pernet->addrs--;
> +	list_del_rcu(&entry->list);
> +	kfree_rcu(entry, rcu);
> +out:
> +	spin_unlock_bh(&pernet->lock);
> +	return ret;
> +}
> +
> +static void __flush_addrs(struct pm_nl_pernet *pernet)
> +{
> +	while (!list_empty(&pernet->addr_list)) {
> +		struct mptcp_pm_addr_entry *cur;
> +
> +		cur = list_entry(pernet->addr_list.next,
> +				 struct mptcp_pm_addr_entry, list);
> +		list_del_rcu(&cur->list);
> +		kfree_rcu(cur, rcu);
> +	}
> +}
> +
> +static void __reset_counters(struct pm_nl_pernet *pernet)
> +{
> +	pernet->add_addr_signal_max = 0;
> +	pernet->add_addr_accept_max = 0;
> +	pernet->local_addr_max = 0;
> +	pernet->addrs = 0;
> +}
> +
> +static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +
> +	spin_lock_bh(&pernet->lock);
> +	__flush_addrs(pernet);
> +	__reset_counters(pernet);
> +	spin_unlock_bh(&pernet->lock);
> +	return 0;
> +}
> +
> +static int mptcp_nl_fill_addr(struct sk_buff *skb,
> +			      struct mptcp_pm_addr_entry *entry)
> +{
> +	struct mptcp_addr_info *addr = &entry->addr;
> +	struct nlattr *attr;
> +
> +	attr = nla_nest_start(skb, MPTCP_PM_ATTR_ADDR);
> +	if (!attr)
> +		return -EMSGSIZE;
> +
> +	if (nla_put_u16(skb, MPTCP_PM_ADDR_ATTR_FAMILY, addr->family))
> +		goto nla_put_failure;
> +	if (nla_put_u8(skb, MPTCP_PM_ADDR_ATTR_ID, addr->id))
> +		goto nla_put_failure;
> +	if (nla_put_u32(skb, MPTCP_PM_ADDR_ATTR_FLAGS, entry->flags))
> +		goto nla_put_failure;
> +	if (entry->ifindex &&
> +	    nla_put_s32(skb, MPTCP_PM_ADDR_ATTR_IF_IDX, entry->ifindex))
> +		goto nla_put_failure;
> +
> +	if (addr->family == AF_INET)
> +		nla_put_in_addr(skb, MPTCP_PM_ADDR_ATTR_ADDR4,
> +				addr->addr.s_addr);
> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +	else if (addr->family == AF_INET6)
> +		nla_put_in6_addr(skb, MPTCP_PM_ADDR_ATTR_ADDR6, &addr->addr6);
> +#endif
> +	nla_nest_end(skb, attr);
> +	return 0;
> +
> +nla_put_failure:
> +	nla_nest_cancel(skb, attr);
> +	return -EMSGSIZE;
> +}
> +
> +static int mptcp_nl_cmd_get_addr(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	struct mptcp_pm_addr_entry addr, *entry;
> +	struct sk_buff *msg;
> +	void *reply;
> +	int ret;
> +
> +	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
> +	if (ret)
> +		return ret;
> +
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
> +				  MPTCP_CMD_ADD_ADDR);

Should it not be MPTCP_CMD_GET_ADDR?

Can you not use info->genlhdr->cmd? (Same for the other genlmsg_put_reply())
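
i.e. something like:

	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
				  info->genlhdr->cmd);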

> +	if (!reply) {
> +		GENL_SET_ERR_MSG(info, "not enough space in Netlink message");
> +		ret = -EMSGSIZE;
> +		goto fail;
> +	}
> +
> +	spin_lock_bh(&pernet->lock);
> +	entry = __lookup_addr_by_id(pernet, addr.addr.id);
> +	if (!entry) {
> +		GENL_SET_ERR_MSG(info, "address not found");
> +		ret = -EINVAL;
> +		goto unlock_fail;
> +	}
> +
> +	ret = mptcp_nl_fill_addr(msg, entry);
> +	if (ret)
> +		goto unlock_fail;
> +
> +	genlmsg_end(msg, reply);
> +	ret = genlmsg_reply(msg, info);
> +	spin_unlock_bh(&pernet->lock);
> +	return ret;
> +
> +unlock_fail:
> +	spin_unlock_bh(&pernet->lock);
> +
> +fail:
> +	nlmsg_free(msg);
> +	return ret;
> +}
> +
> +static int mptcp_nl_cmd_dump_addrs(struct sk_buff *msg,
> +				   struct netlink_callback *cb)
> +{
> +	struct net *net = sock_net(msg->sk);
> +	struct mptcp_pm_addr_entry *entry;
> +	struct pm_nl_pernet *pernet;
> +	int id = cb->args[0];
> +	void *hdr;
> +
> +	pernet = net_generic(net, pm_nl_pernet_id);
> +
> +	spin_lock_bh(&pernet->lock);
> +	list_for_each_entry(entry, &pernet->addr_list, list) {
> +		if (entry->addr.id <= id)
> +			continue;
> +
> +		hdr = genlmsg_put(msg, NETLINK_CB(cb->skb).portid,
> +				  cb->nlh->nlmsg_seq, &mptcp_genl_family,
> +				  NLM_F_MULTI, MPTCP_CMD_ADD_ADDR);

same here: should it be MPTCP_CMD_GET_ADDR?

> +		if (!hdr)
> +			break;
> +
> +		if (mptcp_nl_fill_addr(msg, entry) < 0) {
> +			genlmsg_cancel(msg, hdr);
> +			break;
> +		}
> +
> +		id = entry->addr.id;
> +		genlmsg_end(msg, hdr);
> +	}
> +	spin_unlock_bh(&pernet->lock);
> +
> +	cb->args[0] = id;
> +	return msg->len;
> +}
> +
> +static int
> +mptcp_nl_cmd_set_rcv_add_addrs(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_RCV_ADD_ADDRS];
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	int limit;
> +
> +	if (!attr) {
> +		GENL_SET_ERR_MSG(info, "missing announce accept limit");
> +		return -EINVAL;
> +	}
> +
> +	limit = nla_get_u16(attr);

it should be u32 I think.
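
i.e.:

	limit = nla_get_u32(attr);

(the policy above declares MPTCP_PM_ATTR_RCV_ADD_ADDRS as NLA_U32)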

> +	if (limit > MPTCP_PM_ADDR_MAX) {
> +		GENL_SET_ERR_MSG(info,
> +				 "announce accept limit greater than maximum");
> +		return -EINVAL;
> +	}
> +
> +	WRITE_ONCE(pernet->add_addr_accept_max, limit);
> +	return 0;
> +}
> +
> +static int
> +mptcp_nl_cmd_get_rcv_add_addrs(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> +	struct sk_buff *msg;
> +	void *reply;
> +
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
> +				  MPTCP_CMD_GET_RCV_ADD_ADDRS);
> +	if (!reply)
> +		goto fail;
> +
> +	if (nla_put_u32(msg, MPTCP_PM_ATTR_RCV_ADD_ADDRS,
> +			READ_ONCE(pernet->add_addr_accept_max)))
> +		goto fail;
> +
> +	genlmsg_end(msg, reply);
> +	return genlmsg_reply(msg, info);
> +
> +fail:
> +	GENL_SET_ERR_MSG(info, "not enough space in Netlink message");
> +	nlmsg_free(msg);
> +	return -EMSGSIZE;
> +}
> +
> +static struct genl_ops mptcp_pm_ops[] = {
> +	{
> +		.cmd    = MPTCP_CMD_ADD_ADDR,
> +		.doit   = mptcp_nl_cmd_add_addr,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd    = MPTCP_CMD_DEL_ADDR,
> +		.doit   = mptcp_nl_cmd_del_addr,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd    = MPTCP_CMD_FLUSH_ADDRS,
> +		.doit   = mptcp_nl_cmd_flush_addrs,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd    = MPTCP_CMD_GET_ADDR,
> +		.doit   = mptcp_nl_cmd_get_addr,
> +		.dumpit   = mptcp_nl_cmd_dump_addrs,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd    = MPTCP_CMD_SET_RCV_ADD_ADDRS,
> +		.doit   = mptcp_nl_cmd_set_rcv_add_addrs,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +	{
> +		.cmd    = MPTCP_CMD_GET_RCV_ADD_ADDRS,
> +		.doit   = mptcp_nl_cmd_get_rcv_add_addrs,
> +		.flags  = GENL_ADMIN_PERM,
> +	},
> +};
> +
> +static struct genl_family mptcp_genl_family __ro_after_init = {
> +	.name		= MPTCP_PM_NAME,
> +	.version	= MPTCP_PM_VER,
> +	.maxattr	= MPTCP_PM_ATTR_MAX,
> +	.policy		= mptcp_pm_policy,
> +	.netnsok	= true,
> +	.module		= THIS_MODULE,
> +	.ops		= mptcp_pm_ops,
> +	.n_ops		= ARRAY_SIZE(mptcp_pm_ops),
> +	.mcgrps		= mptcp_pm_mcgrps,
> +	.n_mcgrps	= ARRAY_SIZE(mptcp_pm_mcgrps),
> +};
> +
> +static int __net_init pm_nl_init_net(struct net *net)
> +{
> +	struct pm_nl_pernet *pernet = net_generic(net, pm_nl_pernet_id);
> +
> +	INIT_LIST_HEAD_RCU(&pernet->addr_list);
> +	__reset_counters(pernet);

I guess it's not strictly needed but doesn't hurt.

> +	pernet->next_id = 1;
> +	spin_lock_init(&pernet->lock);
> +	return 0;
> +}
> +
> +static void __net_exit pm_nl_exit_net(struct list_head *net_list)
> +{
> +	struct net *net;
> +
> +	list_for_each_entry(net, net_list, exit_list) {
> +		/* net is removed from namespace list, can't race with
> +		 * other modifiers
> +		 */
> +		__flush_addrs(net_generic(net, pm_nl_pernet_id));
> +	}
> +}
> +
> +static struct pernet_operations mptcp_pm_pernet_ops = {
> +	.init = pm_nl_init_net,
> +	.exit_batch = pm_nl_exit_net,
> +	.id = &pm_nl_pernet_id,
> +	.size = sizeof(struct pm_nl_pernet),
> +};
> +
> +void mptcp_pm_nl_init(void)
> +{
> +	if (register_pernet_subsys(&mptcp_pm_pernet_ops) < 0)
> +		panic("Failed to register MPTCP PM pernet subsystem.\n");
> +
> +	if (genl_register_family(&mptcp_genl_family))
> +		panic("Failed to register MPTCP PM netlink family");

(detail: I guess you need "\n")

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-26  9:54 Matthieu Baerts
  0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2020-02-26  9:54 UTC (permalink / raw)
  To: mptcp


Hi Paolo, Florian,

On 26/02/2020 10:51, Florian Westphal wrote:
> Paolo Abeni <pabeni(a)redhat.com> wrote:
>> On Wed, 2020-02-26 at 10:36 +0100, Florian Westphal wrote:
>>> Paolo Abeni <pabeni(a)redhat.com> wrote:
>>>> I see! Yes, currently we can't limit the number of subflows created for
>>>> a single mptcp socket on a server. I'm unsure how bad that is, as
>>>> usually we allow each client to create an arbitrary high number of TCP
>>>> connection to the server (modulo syn flood).
>>>>
>>>> Still I think adding the ability to impose such limit could be nice. It
>>>> requires an additional hook (from subflow_syn_recv_sock() towards the
>>>> PM). If we agree on that, I can try to cover in a separate
>>>> patch/series.
>>>
>>> I think its a must-have, as file descriptor limits don't apply.
>>
>> Ok I'll work on that!
>>
>> Can we still merge the series without this feature to unblock Peter ?
> 
> No objections, I did not mean that as a veto.

Fine for me too.

I can have a look at that after Florian's rebase.

I will also drop "mptcp: Implement basic path manager" patch as 
mentioned in the cover-letter!

-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-26  9:51 Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2020-02-26  9:51 UTC (permalink / raw)
  To: mptcp


Paolo Abeni <pabeni(a)redhat.com> wrote:
> On Wed, 2020-02-26 at 10:36 +0100, Florian Westphal wrote:
> > Paolo Abeni <pabeni(a)redhat.com> wrote:
> > > I see! Yes, currently we can't limit the number of subflows created for
> > > a single mptcp socket on a server. I'm unsure how bad that is, as
> > > usually we allow each client to create an arbitrary high number of TCP
> > > connection to the server (modulo syn flood).
> > > 
> > > Still I think adding the ability to impose such limit could be nice. It
> > > requires an additional hook (from subflow_syn_recv_sock() towards the
> > > PM). If we agree on that, I can try to cover in a separate
> > > patch/series.
> > 
> > I think its a must-have, as file descriptor limits don't apply.
> 
> Ok I'll work on that! 
> 
> Can we still merge the series without this feature to unblock Peter ?

No objections, I did not mean that as a veto.


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-26  9:46 Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2020-02-26  9:46 UTC (permalink / raw)
  To: mptcp


On Wed, 2020-02-26 at 10:36 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni(a)redhat.com> wrote:
> > I see! Yes, currently we can't limit the number of subflows created for
> > a single mptcp socket on a server. I'm unsure how bad that is, as
> > usually we allow each client to create an arbitrary high number of TCP
> > connection to the server (modulo syn flood).
> > 
> > Still I think adding the ability to impose such limit could be nice. It
> > requires an additional hook (from subflow_syn_recv_sock() towards the
> > PM). If we agree on that, I can try to cover in a separate
> > patch/series.
> 
> I think its a must-have, as file descriptor limits don't apply.

Ok I'll work on that! 

Can we still merge the series without this feature to unblock Peter ?

Thanks,

Paolo


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-26  9:36 Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2020-02-26  9:36 UTC (permalink / raw)
  To: mptcp


Paolo Abeni <pabeni(a)redhat.com> wrote:
> I see! Yes, currently we can't limit the number of subflows created for
> a single mptcp socket on a server. I'm unsure how bad that is, as
> usually we allow each client to create an arbitrary high number of TCP
> connection to the server (modulo syn flood).
> 
> Still I think adding the ability to impose such limit could be nice. It
> requires an additional hook (from subflow_syn_recv_sock() towards the
> PM). If we agree on that, I can try to cover in a separate
> patch/series.

I think its a must-have, as file descriptor limits don't apply.


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-25 17:51 Matthieu Baerts
  0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2020-02-25 17:51 UTC (permalink / raw)
  To: mptcp


Hi Paolo,

On 24/02/2020 19:31, Paolo Abeni wrote:
> On Mon, 2020-02-24 at 18:58 +0100, Matthieu Baerts wrote:
>>> The maximum number of received ADD_ADDR limit allows to easily model
>>> the 'server' behaviour (max accepted == 0). We could add an additional
>>> max_subflows limit, but I think it's only relevant if we do the full-
>>> mash strategy.
>>
>> What the server will do with the received ADD_ADDR? The RFC allows the
>> server to initiate subflows but technically, is it possible for the
>> server to do that if the socket is in "LISTEN" mode and on "CONNECT"
>> mode on the client side?
>>
>> If we limit the number of created subflows, we can limit the number of
>> subflows on both the client and the server. Is it possible to limit the
>> number of subflows on the server side with the current version? (It is
>> maybe not needed for the moment.)
> 
> I see! Yes, currently we can't limit the number of subflows created for
> a single mptcp socket on a server. I'm unsure how bad that is, as
> usually we allow each client to create an arbitrary high number of TCP
> connection to the server (modulo syn flood).
> 
> Still I think adding the ability to impose such limit could be nice. It
> requires an additional hook (from subflow_syn_recv_sock() towards the
> PM). If we agree on that, I can try to cover in a separate
> patch/series.

We can indeed see that later.

I initially thought you wanted to also protect the server but here the 
objective is only to protect the client which reacts when an ADD_ADDR is 
received.

>> When the ADD_ADDRv1 with the 'echo' will be available, does that mean we
>> will add an "important" delay for the creation of new subflows?
>>
>> I guess at some points, we might want to have the possibility to:
>> - send ADD_ADDR and MP_JOIN in parallel
> 
> We can do the above with the propososed infrastructure. We could
> additionally create multiple subflows in paraller, but we can't
> (currently) send multiple ADD_ADD "in parallel".

If we can configure the client not to send ADD_ADDR, it's fine. If we 
send them, we delay the MP_JOINs because that's what firewalls and 
load-balancers want.

> The main concern I have to support the latter is adding to the msk data
> structure new fields that related to the specific PM implementation.
> 
> (The ones currently present should be generic)

The memory footprint is quite important in mptcp.org. I don't know what 
the reasonable max size per msk for the PM would be.

> There are likely a bunch of items I missed in v4, so I suspect we will
> move the discussion on that series.

All good, I will look at the v4 ASAP. I hope tomorrow. But I am sure it 
is fine and we can probably already merge it and provide fixes, if 
needed, later, no?

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 18:31 Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2020-02-24 18:31 UTC (permalink / raw)
  To: mptcp


On Mon, 2020-02-24 at 18:58 +0100, Matthieu Baerts wrote:
> > The maximum number of received ADD_ADDR limit allows to easily model
> > the 'server' behaviour (max accepted == 0). We could add an additional
> > max_subflows limit, but I think it's only relevant if we do the full-
> > mash strategy.
> 
> What the server will do with the received ADD_ADDR? The RFC allows the 
> server to initiate subflows but technically, is it possible for the 
> server to do that if the socket is in "LISTEN" mode and on "CONNECT" 
> mode on the client side?
> 
> If we limit the number of created subflows, we can limit the number of 
> subflows on both the client and the server. Is it possible to limit the 
> number of subflows on the server side with the current version? (It is 
> maybe not needed for the moment.)

I see! Yes, currently we can't limit the number of subflows created for
a single mptcp socket on a server. I'm unsure how bad that is, as
usually we allow each client to create an arbitrary high number of TCP
connection to the server (modulo syn flood).

Still I think adding the ability to impose such limit could be nice. It
requires an additional hook (from subflow_syn_recv_sock() towards the
PM). If we agree on that, I can try to cover in a separate
patch/series.

> When the ADD_ADDRv1 with the 'echo' will be available, does that mean we 
> will add an "important" delay for the creation of new subflows?
> 
> I guess at some points, we might want to have the possibility to:
> - send ADD_ADDR and MP_JOIN in parallel

We can do the above with the propososed infrastructure. We could
additionally create multiple subflows in paraller, but we can't
(currently) send multiple ADD_ADD "in parallel". 

The main concern I have to support the latter is adding to the msk data
structure new fields that related to the specific PM implementation.

(The ones currently present should be generic)

There are likely a bunch of items I missed in v4, so I suspect we will
move the discussion on that series.

Thanks,

Paolo


* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 17:58 Matthieu Baerts
  0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2020-02-24 17:58 UTC (permalink / raw)
  To: mptcp


Hi Paolo,

Thank you for your answers too all my questions! :)

On 24/02/2020 13:07, Paolo Abeni wrote:
> Hi,
> 
> Thank you for the very detailed review. I'll try to cover your feedback
> asap. I have a little more details below, but I would prefer post-
> poning more complext stuff like full-mash and retry policy.

Sounds good to me. I think the most important thing is to stick with one 
policy while still having the possibility to extend it later without 
breaking the API.

> On Sat, 2020-02-22 at 17:38 +0100, Matthieu Baerts wrote:
>> As always, it is very good! Sorry for the delay but here are various
>> questions and comments.
>>
>> Note that I didn't really pay attention at how the locks were used, I
>> would be glad if someone could look at this specific point. I focused my
>> review more on the exposed API, general behaviour, Netlink stuff and typos.
>>
>> On 21/02/2020 17:48, Paolo Abeni wrote:
>>> Expose to U/S a netlink family to control the PM, setting:
>>>
>>>    - list of local addresses to be signaled.
>>>    - list of local addresses used to created subflows.
>>>    - maximum number of add_addr option to react
>>
>> It looks a bit strange to me to limit in term of ADD_ADDR. I see that
>> all commands are linked to addresses which seem to make sense but I
>> don't know if it is good for a generic PM.
>>
>> Depending on the strategy, you might create and accept more or less
>> subflows using the same list of local and remote addresses:
>>
>>    - a client with a fullmesh strategy will create a subflow from each
>> local address to each remote ones having the same IP family. So if the
>> client and the server have each 2 (reachable) IPv4 A and B, there will
>> be 4 subflows: A-A, A-B, B-A, B-B.
>>
>>    - another strategy could be like the fullmesh one but using each IP
>> only once: A-A, B-B
>>
>>    - slightly different: create one subflow per network interface (from
>> the client, it is hard to know what the server has): both client and
>> server can have 2 NIC with one IPv4 and one v6 per NIC. The server will
>> announce the v4 and v6 of the other NIC and the client will use one IP
>> per interface: A4-A4, B4/6-B4/6 but not A4-A4, A6-A/B6, B4-B4, B6-B/A6
>> like we would have with the previous strategy.
>>
>>    - there are plenty of strategies for the client, e.g. ndiffport (more
>> subflows using the same IPs but different ports), etc. Many strategies
>> will be dedicated to one specific use-case, then requiring a more
>> advanced Netlink PM like the one in mptcp.org with mptcpd.
>>
>>    - on the server side, I think there are less strategies. Of course,
>> some might require a more advanced Netlink
> 
> One of the main goals here was keeping this PM impl. as simple as
> possible. I would defer any non trivial strategy to later.
> 
> With this patch the peer will create a subflow for each local address
> (towards the "main" remote address) and one subflow for each received
> and accepted ADD_ADDR, up to the specified limit, using whatever local
> address the routing will pick. No full-mash.

It sounds simple and still flexible enough. The "full-mesh" feature can 
be added later, with or without mash as a side dish :)

In this configuration, I think we can cover the "smartphone" use-case: 
one path via WiFi and one via cellular.

> The limit on the number of accepted address is to avoid the a malicius
> peer to consume a lot of resources.

So this is only to avoid the client creating too many subflows in 
reaction to received ADD_ADDRs, right? (and the server, even if it will fail)

>> PM but most of the time, the server's job is to announce addresses
>> and accept (or reject) subflows.
>>
>> Also, another point: the client doesn't really have to send ADD_ADDR to
>> the server. The server could then receive MP_JOIN from IPs that are new
>> to the server, i.e. not announced by the client with ADD_ADDR.
> 
> Yep, the above is configurable with the proposed interface.
> 
>> Now, after having mentioned all these points, I think it can be
>> interesting to:
>>
>>    - continue to manage addresses that can be announced / used to create
>> subflows.
>>
>>    - continue to pick a single strategy for the moment. Maybe best to
>> start with fullmesh and if needed, the userspace can block some
>> combinations using NFTables (I almost wrote IPTables :-o). This can be
>> extended later by adding more strategies but I think we can already do
>> many things if the userspace selects which additional addresses to use.
> 
> I think a full-mash strategy would require a litte more state-tracking,
> but I'm unsure before starting coding it.

It seems this strategy can be extended later to cover more use-cases.

>>    - instead of having a maximum number of received ADD_ADDR that can
>> use, it might be more generic to limit the number of created/accepted
>> subflows per MPTCP connection. If needed, this can be extended later
>> with other limits: number of subflow per interface or IP and per MPTCP
>> connection.
> 
> The maximum number of received ADD_ADDR limit allows to easily model
> the 'server' behaviour (max accepted == 0). We could add an additional
> max_subflows limit, but I think it's only relevant if we do the full-
> mash strategy.

What the server will do with the received ADD_ADDR? The RFC allows the 
server to initiate subflows but technically, is it possible for the 
server to do that if the socket is in "LISTEN" mode and on "CONNECT" 
mode on the client side?

If we limit the number of created subflows, we can limit the number of 
subflows on both the client and the server. Is it possible to limit the 
number of subflows on the server side with the current version? (It is 
maybe not needed for the moment.)

(EDIT: I see below that these connections are going to fail, that's why 
we should put RCV_MAX to 0)

>> In other words, compared to this patch, only modifying how the limits
>> are managed to be more generic (I think).
>>
>> What do you think about that?
>>
>>
>> Maybe you still want to keep a limit for the received ADD_ADDR? I don't
>> think we store them for the moment? I guess MPTCP_PM_ADDR_MAX is enough
>> for now for the limit.
> 
> Yep, we don't store the remote addresses, and I would avoid that till
> is possible - potentially we could have a lot of remote addresses per
> namespace.

I guess if we store them, that would be per connection. So yes we can 
have a lot :)

But it can be needed: if a client loses one subflow or has a newly 
available interface, it might want to retry later. But we can look 
at that later and maybe it is enough to just re-establish a subflow 
after an error.

>>> When the msk is fully established, the PM netlink attempts to
>>> create subflow for each addr in 'local' list, waiting for each
>>> connection to be completed before attempting the next one.
>>
>> I guess it does that only for the client (connect, not listen).
> 
> The PM netlink currently does that for every peer with a configured not
> empty local address list. Yep, for a server each subflow connection is
> going to fail.

OK, thank you for the clarification!

>>> After exausting the 'local' list, the PM tries to announce the
>>> 'signal' list via the ADD_ADDR option. Since we currenlty lack
>>
>> (detail, we can fix that later but: s/exausting/exhausting/ and
>> s/currenlty/currently/ and s/signaled/signalled/ (or announced))
>>
>>> the ADD_ADDR echo (and related event) only the first addr is sent.
>>
>> It is not clear for me why the client has to announce addresses.
> 
> Just because the RFC allows that and it's very easy to implement
> -forbitting that would need some additional check. Also fits the unix
> philosopy ;)

If it is easier to add this support than having a separation between 
client and server mode, that's fine :-)

>> But if
>> it does that, it might be better to do it before creating additional
>> subflows? The info in the ADD_ADDR sent by the clients could maybe be
>> used in firewalls. Better to send them before then.
> 
> Ok. Will do in the next iteration.

When the ADD_ADDRv1 with the 'echo' will be available, does that mean we 
will add an "important" delay for the creation of new subflows?

I guess at some points, we might want to have the possibility to:
- send ADD_ADDR and MP_JOIN in parallel
- send ADD_ADDR then, once they have been ACKed by the server, send the 
MP_JOINs. For potential firewall/load-balancers, see below.

I think the firewall/load-balancer support is less important now. Worst 
case: the SYN+MP_JOIN is retransmitted.
So maybe better not to delay the MP_JOIN.
Or configure the client not to send ADD_ADDR :)

> Can you please explain the
> 'firewall' thing? who is used the ADD_ADDR opt?

The only use-case I can find for the client to send ADD_ADDRs is to 
allow stateful firewalls or load-balancers to act in anticipation of the 
MP_JOIN they are going to receive: so as not to block it, or to forward 
it to the right sub-server.
There are some techniques to allow stateless load-balancers but some 
might prefer stateful ones.

>> +/* netlink interface */
>>> +#define MPTCP_PM_NAME		"mptcp_pm"
>>> +#define MPTCP_PM_CMD_GRP_NAME	"mptcp_pm_cmds"
>>> +#define MPTCP_PM_VER		0x1
>>
>> Should we add "_NETLINK" or "_NL" here above, just in case we have
>> additional PMs later?
>>
>>     MPTCP_PM_NL_NAME "mptcp_pm_netlink"
> 
> The netlink name is limited to GENL_NAMSIZ(16) - comprising the
> trailing '\0' so it will not fit. The 'nl' part sounds redundant (this
> is a nl interface), so I explicitly avoided that in the API

Good point! I think only modifying "define" names is enough:

   MPTCP_PM_NETLINK_NAME
   MPTCP_PM_NETLINK_CMD_GRP_NAME
   MPTCP_PM_NETLINK_VER

(or with _NL)

Just to avoid conflicts with potential future PMs :)

>> And maybe we should do the same below? e.g. "mptcp_pm_nl_attrs"
>>
>>> +
>>> +/*
>>> + * ATTR types defined for MPTCP
>>> + */
>>> +enum mptcp_pm_attrs {
>>> +	MPTCP_PM_ATTR_UNSPEC,
>>
>> Because this enum will be exposed to userspace, should we force the
>> first item to be 0? (same below)
>> I guess most compiler will set it to 0 by default but I think the specs
>> don't force them to do that.
> 
> According to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
> the C99 specs require the first enum value being 0  - section 6.7.2.2
> Enumeration specifiers, p: 105-106.

Thank you for pointing me to the specs. I always thought the value 
would not necessarily be 0, which would explain why, in files from 
include/uapi, many first entries of enums have "= 0" :)
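
For my own future reference, a minimal illustration of what the spec 
guarantees (the names here are just an example, not from the patch):

   enum example_attrs {
   	EXAMPLE_ATTR_UNSPEC,	/* implicitly 0, per C99 6.7.2.2 */
   	EXAMPLE_ATTR_FOO,	/* 1 */
   	EXAMPLE_ATTR_BAR,	/* 2 */
   };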

>>> +
>>> +	MPTCP_PM_ATTR_ADDR,				/* nested address */
>>> +	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
>>> +
>>> +	__MPTCP_PM_ATTR_MAX
>>> +};
>>> +
>>> +#define MPTCP_PM_ATTR_MAX (__MPTCP_PM_ATTR_MAX - 1)
>>> +
>>> +enum mptcp_pm_addr_addrs {
>>> +	MPTCP_PM_ADDR_ATTR_UNSPEC,
>>> +
>>> +	MPTCP_PM_ADDR_ATTR_FAMILY,			/* u16 */
>>> +	MPTCP_PM_ADDR_ATTR_ID,				/* u8 */
>>> +	MPTCP_PM_ADDR_ATTR_ADDR4,			/* struct in_addr */
>>> +	MPTCP_PM_ADDR_ATTR_ADDR6,			/* struct in6_addr */
>>> +	MPTCP_PM_ADDR_ATTR_PORT,			/* u16 */
>>
>> (It seems you are not using it)
> 
> Currently not used, it's there to cope with the RFC allowing a MPTCP
> peer to announce a local address on a specific port.

OK to keep it then!

>>> +	MPTCP_PM_ADDR_ATTR_FLAGS,			/* u32 */
>>> +	MPTCP_PM_ADDR_ATTR_IF_IDX,			/* s32 */
>>> +
>>> +	__MPTCP_PM_ADDR_ATTR_MAX
>>> +};
>>> +
>>> +#define MPTCP_PM_ADDR_ATTR_MAX (__MPTCP_PM_ADDR_ATTR_MAX - 1)
>>> +
>>> +#define MPTCP_PM_ADDR_FLAG_SIGNAL			(1 << 0)
>>> +#define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
>>> +#define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)
>>
>> detail and we can fix it later if needed: I guess checkpatch will ask us
>> to use BIT(0), etc.
> 
> Will do in next iteration.

Thanks!

>>> +
>>> +enum {
>>> +	MPTCP_CMD_UNSPEC,
>>> +
>>> +	MPTCP_CMD_ADD_ADDR,
>>> +	MPTCP_CMD_DEL_ADDR,
>>> +	MPTCP_CMD_GET_ADDR,
>>> +	MPTCP_CMD_FLUSH_ADDRS,
>>> +	MPTCP_CMD_SET_RCV_ADD_ADDRS,
>>> +	MPTCP_CMD_GET_RCV_ADD_ADDRS,
>>> +
>>> +	__MPTCP_CMD_AFTER_LAST
>>> +};
>>
>> A detail but maybe important before it's too late: should we add "_PM"
>> or "_PM_NL" in the list of commands?
> 
> Yep, will do in the next iteration.

Thanks!

>> Also it could be nice to add (later?) comments about the API: what does
>> each command do, which (optional) parameters to use and what reply you
>> will get (if any).
> 
> Uhm... looking at existing genl families, this kind of documentation is
> outside the scope of the uapi header ? I guess because it will bloat
> the header too much ?!?

OK, I was only thinking about something like:

   MPTCP_CMD_ADD_ADDR: param1 [optional param2]
     Add addresses that will be announced / used to create new subflows

> [...]
>>> +struct pm_nl_pernet {
>>> +	/* protects pernet updates */
>>> +	spinlock_t		lock;
>>> +	struct list_head	addr_list;
>>
>> detail: maybe clearer with local_addr_list?
> 
> Ok, will change in next iteration

Thanks!

>> Should we not maintain a list of remote_add_addr?
> 
> I explicitly tried to avoid that: a per netns list of remote addresses
> could be quite long/expensive to maintain.

Fine for me if we don't need it for the moment.

> [...]
>>> +static bool addresses_equal(const struct mptcp_addr_info *a,
>>> +			    struct mptcp_addr_info *b, bool use_port)
>>> +{
>>> +	bool addr_equals;
>>> +
>>> +	if (a->family != b->family)
>>> +		return false;
>>> +
>>> +	if (a->family == AF_INET)
>>> +		addr_equals = !memcmp(&a->addr, &b->addr, sizeof(b->addr));
>>> +	else
>>
>> You need to surround this block with:
>>
>>     #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> 
> Will do in the next iteration.

Thanks!

>> Out of curriosity, what's the recommended way to deal with IPv4 and
>> IPv6: should we always have "else if (a->family == AF_INET6)" (or a
>> switch/case) or can we assume that if it is not v4, it is v6?
> 
> In this specific case, we can do this assumption, as the family has
> been validated before (by the netlink callback and/or the socket code).

I see, thank you!

>>
>>> +		addr_equals = !memcmp(&a->addr6, &b->addr6, sizeof(b->addr6));
>>
>> detail: you can also use ipv6_addr_cmp(). (and not memcmp for v4?)
> 
> Will do in the next iteration.

Thanks!
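
Something like this, I suppose (untested sketch combining both remarks, 
only the family-specific part of the comparison):

   	if (a->family == AF_INET)
   		addr_equals = a->addr.s_addr == b->addr.s_addr;
   #if IS_ENABLED(CONFIG_MPTCP_IPV6)
   	else
   		addr_equals = !ipv6_addr_cmp(&a->addr6, &b->addr6);
   #endif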

>>> +
>>> +	if (!addr_equals)
>>> +		return false;
>>> +	if (!use_port)
>>> +		return true;
>>> +
>>> +	return a->port == b->port;
>>> +}
>>> +
>>> +static void local_address(const struct sock_common *skc,
>>> +			  struct mptcp_addr_info *addr)
>>> +{
>>> +	addr->family = skc->skc_family;
>>
>> Should we copy the source port, just in case?
> 
> The source port explicitly omitted for local addresses as it usually
> changes across subflows and we want to ignore it. I'll add a comment in
> the next iteration

Fine for me, thanks; we can add it later if we need it for some other 
reason.
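
(The comment could be something as simple as:

   	/* The source port is intentionally not copied: it usually differs
   	 * across subflows and should be ignored when matching local
   	 * addresses.
   	 */

just a suggestion, of course.)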

>>
>>> +	if (addr->family == AF_INET)
>>> +		addr->addr.s_addr = skc->skc_rcv_saddr;
>>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>> +	else if (addr->family == AF_INET6)
>>> +		addr->addr6 = skc->skc_v6_rcv_saddr;
>>> +#endif
>>> +}
>>> +
>>> +static void remote_address(const struct sock_common *skc,
>>> +			   struct mptcp_addr_info *addr)
>>> +{
>>> +	addr->family = skc->skc_family;
>>> +	addr->port = skc->skc_dport;
>>> +	if (addr->family == AF_INET)
>>> +		addr->addr.s_addr = skc->skc_daddr;
>>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>> +	else if (addr->family == AF_INET6)
>>> +		addr->addr6 = skc->skc_v6_daddr;
>>> +#endif
>>> +}
>>> +
>>> +static bool lookup_subflow_by_saddr(const struct mptcp_sock *msk,
>>> +				    struct mptcp_addr_info *saddr)
>>> +{
>>> +	struct mptcp_subflow_context *subflow;
>>> +	struct mptcp_addr_info cur;
>>> +	struct sock_common *skc;
>>> +
>>> +	list_for_each_entry(subflow, &msk->conn_list, node) {
>>> +		skc = (struct sock_common *)mptcp_subflow_tcp_sock(subflow);
>>> +
>>> +		local_address(skc, &cur);
>>
>> I guess it's OK to do the copy to ease the comparison here below, right?
>> (we are going to do that only when creating subflows so it seems fine.
>> Maybe should we add a comment if we reuse this function later?)
> 
> Are you concerned about performances? I think this is really
> unnoticeable. The copied data will be in cache, possibly even in
> registers, and the overhead to traverse the subflow list (with possibly
> different cache misses per subflow) will be far greater.

Thank you for the justification. Indeed, I didn't think about the 
traversal of the subflow list.

>>> +static void check_work_pending(struct mptcp_sock *msk)
>>> +{
>>> +	if (msk->pm.add_addr_signaled == msk->pm.add_addr_signal_max &&
>>> +	    msk->pm.local_addr_used == msk->pm.local_addr_max)
>>> +		WRITE_ONCE(msk->pm.work_pending, false);
>>> +}
>>> +
>>> +static void mptcp_pm_create_subflow_or_signal(struct mptcp_sock *msk)
>>
>> detail: I think we should avoid using signal alone because MP_FAIL,
>> REM_ADD, MP_BACKUP, etc. are signals as well, no?
>> It's maybe just me but I find it clearer, when it is linked to ADD_ADDR,
>> to use "announce" instead of the generic term "signal". But that's a
>> detail. As long as it is clear for everybody :-)
> 
> I picked 'signal' because the term was already in use in the existing
> code. I can change it to *'signal_addr' to be consistent with other
> names.

I should maybe have reacted earlier; it's certainly mostly just me 
being used to talking about announced addresses. I can get used to one 
or the other :)

>>> +{
>>> +	struct sock *sk = (struct sock *)msk;
>>> +	struct mptcp_pm_addr_entry *local;
>>> +	struct mptcp_addr_info remote;
>>> +	struct pm_nl_pernet *pernet;
>>> +
>>> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
>>> +
>>> +	lock_sock(sk);
>>> +
>>> +	spin_lock_bh(&msk->pm.lock);
>>> +	msk->pm.status = MPTCP_PM_IDLE;
>>> +	pr_debug("local %d:%d signal %d:%d\n",
>>> +		 msk->pm.local_addr_used, msk->pm.local_addr_max,
>>> +		 msk->pm.add_addr_signaled, msk->pm.add_addr_signal_max);
>>> +
>>> +	/* check first if should create a new subflow */
>>> +	if (msk->pm.local_addr_used < msk->pm.local_addr_max) {
>>> +		remote_address((struct sock_common *)sk, &remote);
>>> +
>>> +		local = select_local_address(pernet, msk);
>>> +		if (local) {
>>> +			msk->pm.local_addr_used++;
>>> +			check_work_pending(msk);
>>> +			spin_unlock_bh(&msk->pm.lock);
>>> +			__mptcp_subflow_connect(sk, local->ifindex,
>>> +						&local->addr, &remote);
>>
>> Should we do anything special if this fail?
>> (decrement local_addr_used? retry later?)
> 
> IIRC, according to the RFC we must not retry if the address is
> unreachable and/or the path is not MPTCP capable. So we should retry
> only if failing e.g. due to an allocation error. Overall a correct
> retry policy looks a bit too complex for this stage to me. I think we
> can add that later, if needed.

Fine for me.
So no need to decrement msk->pm.local_addr_used, right?
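
(If we ever want a minimal check there, I imagine something like the 
sketch below - assuming __mptcp_subflow_connect() returns an error 
code - just logging and moving on, given the no-retry policy:

   			err = __mptcp_subflow_connect(sk, local->ifindex,
   						      &local->addr, &remote);
   			if (err)
   				pr_debug("subflow connect failed: %d", err);

but that can clearly wait.)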

> 
> [...]
>> +void mptcp_pm_nl_add_addr(struct mptcp_sock *msk)
>>
>> (it's maybe just me but should we rename this by adding "_recv"? Same
>> for the caller and related functions because it was not clear we are
>> doing some actions not add an address or produce an add_addr but because
>> we just received one)
> 
> Ok, will do in the next iteration.

Thanks!

>>> +{
>>> +	struct sock *sk = (struct sock *)msk;
>>> +	struct mptcp_addr_info remote;
>>> +	struct mptcp_addr_info local;
>>> +	struct pm_nl_pernet *pernet;
>>> +
>>> +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
>>> +
>>> +	spin_lock_bh(&msk->pm.lock);
>>> +	msk->pm.status = MPTCP_PM_IDLE;
>>> +	pr_debug("accepted %d:%d remote family %d",
>>> +		 msk->pm.add_addr_accepted, msk->pm.add_addr_accept_max,
>>> +		 msk->pm.remote.family);
>>> +	if (++msk->pm.add_addr_accepted >= msk->pm.add_addr_accept_max)
>>> +		WRITE_ONCE(msk->pm.accept_addr, false);
>>> +
>>> +	/* connect to the specified remote address, using whatever
>>> +	 * local address the routing configuration will pick.
>>> +	 */
>>
>> Should we try with all local IP the userspace added (+ the original
>> one)? → to have a fullmesh
> 
> I would postpone this one to later.

Fine for me!

>>
>>> +	remote = msk->pm.remote;
>>> +	if (!remote.port)
>>> +		remote.port = sk->sk_dport;
>>> +	memset(&local, 0, sizeof(local));
>>> +	local.family = remote.family;
>>> +	spin_unlock_bh(&msk->pm.lock);
>>> +
>>> +	lock_sock(sk);
>>> +	__mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote);
>>
>> Should we do anything special if this fail?
> 
> Same situation as the other __mptcp_subflow_connect()

Also no need to decrement msk->pm.add_addr_accepted, right?

>>> +	release_sock(sk);
>>
>> Do we need to clear "pm.remote" somewhere? How do we deal with multiple
>> received ADD_ADDR? Even if with the "echo", I guess we might receive a
>> new one before the workqueue has finished dealing with the previous one, no?
> 
> The idea is that the workqueue does sort of rate-limiting on subflow
> creation. To cope with multiple 'concurrent' addresses we need to add
> additional/dynamic storage. IIRC the RFC allows us to ignore multiple
> ADD_ADDR options as it fits us.

OK, I guess it's fine for the moment. But it seems quite likely that a 
bunch of ADD_ADDRs will be exchanged at the beginning of the connection 
(IPv4/v6), or even later if the client sends new ADD_ADDRs because a 
new interface is available.

But that would be for later.

>>> +}
>>> +
>>> +static bool address_use_port(struct mptcp_pm_addr_entry *entry)
>>> +{
>>> +	return (entry->flags &
>>> +		(MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) ==
>>> +		MPTCP_PM_ADDR_FLAG_SIGNAL;
>>> +}
>>> +
>>> +static int mptcp_pm_nl_append_new_addr(struct pm_nl_pernet *pernet,
>>
>> detail: Here as well, should we add _local → new_local_addr?
> 
> Ok, will do in the next iteration.

Thanks!

>>> +				       struct mptcp_pm_addr_entry *entry)
>>> +{
>>> +	struct mptcp_pm_addr_entry *cur;
>>> +	int ret = -EINVAL;
>>> +
>>> +	spin_lock_bh(&pernet->lock);
>>> +	/* to keep the code simple, don't do IDR-like allocation for address ID,
>>> +	 * just bail when we exceed limits
>>> +	 */
>>
>> And we also need to support REM_ADDR here I guess.
> 
> Yes, we need to add REM_ADDR support. I think we don't need to add
> anything specific with the current implementation: we don't store
> remote addresses locally, we don't have anything to delete and we will
> not re-use them later anyway (due to lack of retry policy).

Sounds good to me.
Note that REM_ADDR can also be used to tell the other host that it can 
close a subflow. That can be useful if one host is disconnected before 
having sent a FIN/RST: the other host then doesn't have to keep trying.

>>> +	if (pernet->next_id > 255)
>>> +		goto out;
>>> +	if (pernet->addrs >= MPTCP_PM_ADDR_MAX)
>>> +		goto out;
>>> +
>>> +	/* do not insert duplicate address, differentiate on port only
>>> +	 * singled addresses
>>> +	 */
>>> +	list_for_each_entry(cur, &pernet->addr_list, list) {
>>> +		if (addresses_equal(&cur->addr, &entry->addr,
>>> +				    address_use_port(entry) &&
>>> +				    address_use_port(cur)))
>>> +			goto out;
>>> +	}
>>> +
>>> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
>>> +		pernet->add_addr_signal_max++;
>>> +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
>>> +		pernet->local_addr_max++;
>>> +
>>> +	entry->addr.id = pernet->next_id++;
>>> +	pernet->addrs++;
>>> +	list_add_tail_rcu(&entry->list, &pernet->addr_list);
>>> +	ret = entry->addr.id;
>>> +
>>> +out:
>>> +	spin_unlock_bh(&pernet->lock);
>>> +	return ret;
>>> +}
>>> +
>>> +int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
>>> +{
>>> +	struct mptcp_pm_addr_entry *entry;
>>> +	struct mptcp_addr_info skc_local;
>>> +	struct mptcp_addr_info msk_local;
>>> +	struct pm_nl_pernet *pernet;
>>> +	int ret = -1;
>>> +
>>> +	if (WARN_ON_ONCE(!msk))
>>> +		return -1;
>>> +
>>> +	/* The 0 ID mapping is defined by the first subflow, copied into the msk
>>> +	 * addr
>>> +	 */
>>
>> Is it still OK to do that if the first subflow is down (but other
>> subflows exist)? I mean, no need to have additional check to see if the
>> subflow 0 is still alive? (I didn't check what we do with the msk for
>> this specific case)
> 
> We must always use the same ID for the same address inside an MPTCP
> connection scope. Once we associate '0' with one local IP we should use
> that association as long as the msk socket exists. With the current
> code, the first subflow socket structure will exist until mptcp_close,
> even if the related TCP connection is terminated well before. So this
> code should be safe.

That's true, thank you for the reminder. We have enough IDs anyway.

> [...]
>>> +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>> +	if (entry->addr.family == AF_INET6)
>>> +		entry->addr.addr6 = nla_get_in6_addr(tb[addr_addr]);
>>> +	else
>>> +#endif
>>> +		entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
>>> +
>>> +skip_family:
>>> +	if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX])
>>> +		entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
>>> +	else
>>> +		entry->ifindex = 0;
>>
>> detail: "entry" is init to 0 at the beginning of this function. Same
>> below for the ID. Or if you remove the memset, the flags are not set to 0
> 
> Will drop the 'else' branch in the next iteration.

Thanks! (same for the addr.id but I guess you saw it).

>>> +
>>> +	if (tb[MPTCP_PM_ADDR_ATTR_ID])
>>> +		entry->addr.id = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_ID]);
>>
>> Is it not a u8?
> 
> yep, funnily enough self-tests passes. Will fix in the next iteration.

Thanks!
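
(I suppose the fix is simply to use nla_get_u8() there:

   		entry->addr.id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]);

since the ID attribute is documented as a u8.)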

>>> +	else
>>> +		entry->addr.id = 0;
>>> +
>>> +	if (tb[MPTCP_PM_ADDR_ATTR_FLAGS])
>>> +		entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
>>
>> (detail if you have to modify this: please add a new line here because
>> most of the time, there is an empty line after a if-statement block)
>>
>>> +	return 0;
>>> +}
>>> +
>>> +static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info)
>>> +{
>>> +	return net_generic(genl_info_net(info), pm_nl_pernet_id);
>>> +}
>>> +
>>> +static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
>>> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
>>> +	struct mptcp_pm_addr_entry addr, *entry;
>>> +	int ret;
>>> +
>>> +	ret = mptcp_pm_parse_addr(attr, info, true, &addr);
>>> +	if (ret)
>>
>> Out of curriosity, is there a recommendation on what to check with
>> "ret"? I mean "ret" is vague and often we can see "if (ret < 0)", like
>> below (also because it is needed in this case), which seems clearer to
>> me. Sometimes, "err" is used and in this case, it is clear to read: "if
>> (err)".
>> No need to change anything, it's just to know what's recommended but not
>> mandatory I guess.
> 
> I'll go with 'err' in the next iteration.

Thanks!

>>> +		return ret;
>>> +
>>> +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
>>> +	if (!entry) {
>>> +		GENL_SET_ERR_MSG(info, "can't allocate addr");
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	*entry = addr;
>>> +	ret = mptcp_pm_nl_append_new_addr(pernet, entry);
>>> +	if (ret < 0) {
>>> +		GENL_SET_ERR_MSG(info, "too many addresses");
>>
>> detail: Or duplicated one.
>>
>>> +		kfree(entry);
>>> +		return ret;
>>> +	}
>>
>> Should we trigger here the establishement of new subflows and/or the
>> send of ADD_ADDR for all existing MPTCP connections?
> 
> To keep the things simple, I would avoid that. For servers the number
> of msk sockets to process could be high.

It might be required later but that can be extended. It's mostly for 
the client, where it can be important to start new subflows for 
existing connections when a new ADD_ADDR is received or a new local 
address can be used. But that would be for later.

> [...]
>>> +static int mptcp_nl_cmd_get_addr(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
>>> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
>>> +	struct mptcp_pm_addr_entry addr, *entry;
>>> +	struct sk_buff *msg;
>>> +	void *reply;
>>> +	int ret;
>>> +
>>> +	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>> +	if (!msg)
>>> +		return -ENOMEM;
>>> +
>>> +	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
>>> +				  MPTCP_CMD_ADD_ADDR);
>>
>> Should it not be MPTCP_CMD_GET_ADDR?
>>
>> Can you not use info->genlhdr->cmd? (Same for the other genlmsg_put_reply())
> 
> I think you are right. I picked 'MPTCP_CMD_ADD_ADDR' because the reply
> message is formatted exactly as a MPTCP_CMD_ADD_ADDR command, but it's
> not the correct way to do this. Will fix in the next iteration.

Thanks!
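
(So presumably something like:

   	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
   				  info->genlhdr->cmd);

which also keeps the reply command in sync if the handler is ever 
reused.)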

>>>
>>> +mptcp_nl_cmd_set_rcv_add_addrs(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_RCV_ADD_ADDRS];
>>> +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
>>> +	int limit;
>>> +
>>> +	if (!attr) {
>>> +		GENL_SET_ERR_MSG(info, "missing announce accept limit");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	limit = nla_get_u16(attr);
>>
>> it should be u32 I think.
> 
> Yep, will fix in the next iteration.

Thanks!
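
(i.e. simply:

   	limit = nla_get_u32(attr);

to match the u32 documented in the attribute list.)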

> 
>>> +static int __net_init pm_nl_init_net(struct net *net)
>>> +{
>>> +	struct pm_nl_pernet *pernet = net_generic(net, pm_nl_pernet_id);
>>> +
>>> +	INIT_LIST_HEAD_RCU(&pernet->addr_list);
>>> +	__reset_counters(pernet);
>>
>> I guess it's not strictly needed but doesn't hurt.
> 
> Needed on clone()

I forgot about that, thank you!

> 
>>> +void mptcp_pm_nl_init(void)
>>> +{
>>> +	if (register_pernet_subsys(&mptcp_pm_pernet_ops) < 0)
>>> +		panic("Failed to register MPTCP PM pernet subsystem.\n");
>>> +
>>> +	if (genl_register_family(&mptcp_genl_family))
>>> +		panic("Failed to register MPTCP PM netlink family");
>>
>> (detail: I guess you need "\n")
>>
> will fix in the next iteration.

Thanks!

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 15:50 Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2020-02-24 15:50 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

On Sat, 2020-02-22 at 17:38 +0100, Matthieu Baerts wrote:
> > +#define MPTCP_PM_ADDR_ATTR_MAX (__MPTCP_PM_ADDR_ATTR_MAX - 1)
> > +
> > +#define MPTCP_PM_ADDR_FLAG_SIGNAL			(1 << 0)
> > +#define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
> > +#define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)
> 
> detail and we can fix it later if needed: I guess checkpatch will ask us 
> to use BIT(0), etc.

Addendum: BIT() is not available in uapi headers and checkpatch does
not complain, so I'll keep this unchanged

Cheers,

Paolo

* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 13:02 Matthieu Baerts
  0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2020-02-24 13:02 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

Hi Davide,

On 24/02/2020 13:59, Davide Caratti wrote:
> On Sat, 2020-02-22 at 17:38 +0100, Matthieu Baerts wrote:
>> *Davide* : I guess no upstreamed version of ss are already using this,
>> right? Anyway because this kernel code is not upstreamed, it's not too
>> late to change, right? :)
> 
> yes, it makes sense. At a certain point, I tried to reduce the name length
> as much as possible, but here _ATTR_  seems legitimate. I will send a
> squash-to patch in short time.

Great, thank you!

> (and I will share also the 'ss' code I'm using ATM, maybe it's also useful
> for you)

Good idea!
We can also add a new repo in github.com/multipath-tcp if needed.

Cheers,
Matt
-- 
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 12:59 Davide Caratti
  0 siblings, 0 replies; 12+ messages in thread
From: Davide Caratti @ 2020-02-24 12:59 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 535 bytes --]

On Sat, 2020-02-22 at 17:38 +0100, Matthieu Baerts wrote:
> *Davide* : I guess no upstreamed version of ss are already using this, 
> right? Anyway because this kernel code is not upstreamed, it's not too 
> late to change, right? :)

yes, it makes sense. At a certain point, I tried to reduce the name length
as much as possible, but here _ATTR_  seems legitimate. I will send a
squash-to patch in short time.

(and I will share also the 'ss' code I'm using ATM, maybe it's also useful
for you)

thanks!
-- 
davide



* [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM
@ 2020-02-24 12:07 Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2020-02-24 12:07 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 25405 bytes --]

Hi,

Thank you for the very detailed review. I'll try to cover your feedback
ASAP. I have a few more details below, but I would prefer postponing the
more complex stuff like full-mesh and the retry policy.

On Sat, 2020-02-22 at 17:38 +0100, Matthieu Baerts wrote:
> As always, it is very good! Sorry for the delay but here are various 
> questions and comments.
> 
> Note that I didn't really pay attention at how the locks were used, I 
> would be glad if someone could look at this specific point. I focused my 
> review more on the exposed API, general behaviour, Netlink stuff and typos.
> 
> On 21/02/2020 17:48, Paolo Abeni wrote:
> > Expose to U/S a netlink family to control the PM, setting:
> > 
> >   - list of local addresses to be signaled.
> >   - list of local addresses used to created subflows.
> >   - maximum number of add_addr option to react
> 
> It looks a bit strange to me to limit in term of ADD_ADDR. I see that 
> all commands are linked to addresses which seem to make sense but I 
> don't know if it is good for a generic PM.
> 
> Depending on the strategy, you might create and accept more or less 
> subflows using the same list of local and remote addresses:
> 
>   - a client with a fullmesh strategy will create a subflow from each 
> local address to each remote ones having the same IP family. So if the 
> client and the server have each 2 (reachable) IPv4 A and B, there will 
> be 4 subflows: A-A, A-B, B-A, B-B.
> 
>   - another strategy could be like the fullmesh one but using each IP 
> only once: A-A, B-B
> 
>   - slightly different: create one subflow per network interface (from 
> the client, it is hard to know what the server has): both client and 
> server can have 2 NIC with one IPv4 and one v6 per NIC. The server will 
> announce the v4 and v6 of the other NIC and the client will use one IP 
> per interface: A4-A4, B4/6-B4/6 but not A4-A4, A6-A/B6, B4-B4, B6-B/A6 
> like we would have with the previous strategy.
> 
>   - there are plenty of strategies for the client, e.g. ndiffport (more 
> subflows using the same IPs but different ports), etc. Many strategies 
> will be dedicated to one specific use-case, then requiring a more 
> advanced Netlink PM like the one in mptcp.org with mptcpd.
> 
>   - on the server side, I think there are less strategies. Of course, 
> some might require a more advanced Netlink

One of the main goals here was keeping this PM implementation as simple
as possible. I would defer any non-trivial strategy to later.

With this patch the peer will create a subflow for each local address
(towards the "main" remote address) and one subflow for each received
and accepted ADD_ADDR, up to the specified limit, using whatever local
address the routing will pick. No full-mesh.

The limit on the number of accepted addresses is to avoid a malicious
peer consuming a lot of resources.

> PM but most of the time, the server's job is to announce addresses 
> and accept (or reject) subflows.
> 
> Also, another point: the client doesn't really have to send ADD_ADDR to 
> the server. The server could then receive MP_JOIN from IPs that are new 
> to the server, i.e. not announced by the client with ADD_ADDR.

Yep, the above is configurable with the proposed interface.

> Now, after having mentioned all these points, I think it can be 
> interesting to:
> 
>   - continue to manage addresses that can be announced / used to create 
> subflows.
> 
>   - continue to pick a single strategy for the moment. Maybe best to 
> start with fullmesh and if needed, the userspace can block some 
> combinations using NFTables (I almost wrote IPTables :-o). This can be 
> extended later by adding more strategies but I think we can already do 
> many things if the userspace selects which additional addresses to use.

I think a full-mesh strategy would require a little more state-tracking, 
but I'm unsure until I start coding it.

>   - instead of having a maximum number of received ADD_ADDR that can 
> use, it might be more generic to limit the number of created/accepted 
> subflows per MPTCP connection. If needed, this can be extended later 
> with other limits: number of subflow per interface or IP and per MPTCP 
> connection.

The limit on the maximum number of received ADD_ADDRs makes it easy to
model the 'server' behaviour (max accepted == 0). We could add an
additional max_subflows limit, but I think it's only relevant if we do
the full-mesh strategy.

> In other words, compared to this patch, only modifying how the limits 
> are managed to be more generic (I think).
> 
> What do you think about that?
> 
> 
> Maybe you still want to keep a limit for the received ADD_ADDR? I don't 
> think we store them for the moment? I guess MPTCP_PM_ADDR_MAX is enough 
> for now for the limit.

Yep, we don't store the remote addresses, and I would avoid that for as
long as possible - potentially we could have a lot of remote addresses
per namespace.

> > When the msk is fully established, the PM netlink attempts to
> > create subflow for each addr in 'local' list, waiting for each
> > connection to be completed before attempting the next one.
> 
> I guess it does that only for the client (connect, not listen).

The PM netlink currently does that for every peer with a configured not
empty local address list. Yep, for a server each subflow connection is
going to fail.

> > After exausting the 'local' list, the PM tries to announce the
> > 'signal' list via the ADD_ADDR option. Since we currenlty lack
> 
> (detail, we can fix that later but: s/exausting/exhausting/ and 
> s/currenlty/currently/ and s/signaled/signalled/ (or announced))
> 
> > the ADD_ADDR echo (and related event) only the first addr is sent.
> 
> It is not clear for me why the client has to announce addresses. 

Just because the RFC allows that and it's very easy to implement;
forbidding it would need some additional check. Also fits the unix
philosophy ;)

> But if 
> it does that, it might be better to do it before creating additional 
> subflows? The info in the ADD_ADDR sent by the clients could maybe be 
> used in firewalls. Better to send them before then.

Ok. Will do in the next iteration. Can you please explain the
'firewall' thing? Who uses the ADD_ADDR option?

> +/* netlink interface */
> > +#define MPTCP_PM_NAME		"mptcp_pm"
> > +#define MPTCP_PM_CMD_GRP_NAME	"mptcp_pm_cmds"
> > +#define MPTCP_PM_VER		0x1
> 
> Should we add "_NETLINK" or "_NL" here above, just in case we have 
> additional PMs later?
> 
>    MPTCP_PM_NL_NAME "mptcp_pm_netlink"

The netlink name is limited to GENL_NAMSIZ(16) - comprising the
trailing '\0' so it will not fit. The 'nl' part sounds redundant (this
is a nl interface), so I explicitly avoided that in the API

> And maybe we should do the same below? e.g. "mptcp_pm_nl_attrs"
> 
> > +
> > +/*
> > + * ATTR types defined for MPTCP
> > + */
> > +enum mptcp_pm_attrs {
> > +	MPTCP_PM_ATTR_UNSPEC,
> 
> Because this enum will be exposed to userspace, should we force the 
> first item to be 0? (same below)
> I guess most compiler will set it to 0 by default but I think the specs 
> don't force them to do that.

According to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
the C99 specs require the first enum value being 0  - section 6.7.2.2
Enumeration specifiers, p: 105-106.

> > +
> > +	MPTCP_PM_ATTR_ADDR,				/* nested address */
> > +	MPTCP_PM_ATTR_RCV_ADD_ADDRS,			/* u32 */
> > +
> > +	__MPTCP_PM_ATTR_MAX
> > +};
> > +
> > +#define MPTCP_PM_ATTR_MAX (__MPTCP_PM_ATTR_MAX - 1)
> > +
> > +enum mptcp_pm_addr_addrs {
> > +	MPTCP_PM_ADDR_ATTR_UNSPEC,
> > +
> > +	MPTCP_PM_ADDR_ATTR_FAMILY,			/* u16 */
> > +	MPTCP_PM_ADDR_ATTR_ID,				/* u8 */
> > +	MPTCP_PM_ADDR_ATTR_ADDR4,			/* struct in_addr */
> > +	MPTCP_PM_ADDR_ATTR_ADDR6,			/* struct in6_addr */
> > +	MPTCP_PM_ADDR_ATTR_PORT,			/* u16 */
> 
> (It seems you are not using it)

Currently not used, it's there to cope with the RFC allowing a MPTCP
peer to announce a local address on a specific port.

> > +	MPTCP_PM_ADDR_ATTR_FLAGS,			/* u32 */
> > +	MPTCP_PM_ADDR_ATTR_IF_IDX,			/* s32 */
> > +
> > +	__MPTCP_PM_ADDR_ATTR_MAX
> > +};
> > +
> > +#define MPTCP_PM_ADDR_ATTR_MAX (__MPTCP_PM_ADDR_ATTR_MAX - 1)
> > +
> > +#define MPTCP_PM_ADDR_FLAG_SIGNAL			(1 << 0)
> > +#define MPTCP_PM_ADDR_FLAG_SUBFLOW			(1 << 1)
> > +#define MPTCP_PM_ADDR_FLAG_BACKUP			(1 << 2)
> 
> detail and we can fix it later if needed: I guess checkpatch will ask us 
> to use BIT(0), etc.

Will do in next iteration.

> > +
> > +enum {
> > +	MPTCP_CMD_UNSPEC,
> > +
> > +	MPTCP_CMD_ADD_ADDR,
> > +	MPTCP_CMD_DEL_ADDR,
> > +	MPTCP_CMD_GET_ADDR,
> > +	MPTCP_CMD_FLUSH_ADDRS,
> > +	MPTCP_CMD_SET_RCV_ADD_ADDRS,
> > +	MPTCP_CMD_GET_RCV_ADD_ADDRS,
> > +
> > +	__MPTCP_CMD_AFTER_LAST
> > +};
> 
> A detail but maybe important before it's too late: should we add "_PM" 
> or "_PM_NL" in the list of commands?

Yep, will do in the next iteration.

> Also it could be nice to add (later?) comments about the API: what does 
> each command do, which (optional) parameters to use and what reply you 
> will get (if any).

Uhm... looking at existing genl families, this kind of documentation is
outside the scope of the uapi header ? I guess because it will bloat
the header too much ?!?

[...]
> > +struct pm_nl_pernet {
> > +	/* protects pernet updates */
> > +	spinlock_t		lock;
> > +	struct list_head	addr_list;
> 
> detail: maybe clearer with local_addr_list?

Ok, will change in next iteration

> Should we not maintain a list of remote_add_addr?

I explicitly tried to avoid that: a per netns list of remote addresses
could be quite long/expensive to maintain.

[...]
> > +static bool addresses_equal(const struct mptcp_addr_info *a,
> > +			    struct mptcp_addr_info *b, bool use_port)
> > +{
> > +	bool addr_equals;
> > +
> > +	if (a->family != b->family)
> > +		return false;
> > +
> > +	if (a->family == AF_INET)
> > +		addr_equals = !memcmp(&a->addr, &b->addr, sizeof(b->addr));
> > +	else
> 
> You need to surround this block with:
> 
>    #if IS_ENABLED(CONFIG_MPTCP_IPV6)

Will do in the next iteration.

> Out of curriosity, what's the recommended way to deal with IPv4 and 
> IPv6: should we always have "else if (a->family == AF_INET6)" (or a 
> switch/case) or can we assume that if it is not v4, it is v6?

In this specific case, we can do this assumption, as the family has
been validated before (by the netlink callback and/or the socket code).

> 
> > +		addr_equals = !memcmp(&a->addr6, &b->addr6, sizeof(b->addr6));
> 
> detail: you can also use ipv6_addr_cmp(). (and not memcmp for v4?)

Will do in the next iteration.
> 
> > +
> > +	if (!addr_equals)
> > +		return false;
> > +	if (!use_port)
> > +		return true;
> > +
> > +	return a->port == b->port;
> > +}
> > +
> > +static void local_address(const struct sock_common *skc,
> > +			  struct mptcp_addr_info *addr)
> > +{
> > +	addr->family = skc->skc_family;
> 
> Should we copy the source port, just in case?

The source port explicitly omitted for local addresses as it usually
changes across subflows and we want to ignore it. I'll add a comment in
the next iteration

> 
> > +	if (addr->family == AF_INET)
> > +		addr->addr.s_addr = skc->skc_rcv_saddr;
> > +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> > +	else if (addr->family == AF_INET6)
> > +		addr->addr6 = skc->skc_v6_rcv_saddr;
> > +#endif
> > +}
> > +
> > +static void remote_address(const struct sock_common *skc,
> > +			   struct mptcp_addr_info *addr)
> > +{
> > +	addr->family = skc->skc_family;
> > +	addr->port = skc->skc_dport;
> > +	if (addr->family == AF_INET)
> > +		addr->addr.s_addr = skc->skc_daddr;
> > +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> > +	else if (addr->family == AF_INET6)
> > +		addr->addr6 = skc->skc_v6_daddr;
> > +#endif
> > +}
> > +
> > +static bool lookup_subflow_by_saddr(const struct mptcp_sock *msk,
> > +				    struct mptcp_addr_info *saddr)
> > +{
> > +	struct mptcp_subflow_context *subflow;
> > +	struct mptcp_addr_info cur;
> > +	struct sock_common *skc;
> > +
> > +	list_for_each_entry(subflow, &msk->conn_list, node) {
> > +		skc = (struct sock_common *)mptcp_subflow_tcp_sock(subflow);
> > +
> > +		local_address(skc, &cur);
> 
> I guess it's OK to do the copy to ease the comparison here below, right?
> (we are going to do that only when creating subflows so it seems fine. 
> Maybe should we add a comment if we reuse this function later?)

Are you concerned about performances? I think this is really
unnoticeable. The copied data will be in cache, possibly even in
registers, and the overhead to traverse the subflow list (with possibly
different cache misses per subflow) will be far greater.

> > +static void check_work_pending(struct mptcp_sock *msk)
> > +{
> > +	if (msk->pm.add_addr_signaled == msk->pm.add_addr_signal_max &&
> > +	    msk->pm.local_addr_used == msk->pm.local_addr_max)
> > +		WRITE_ONCE(msk->pm.work_pending, false);
> > +}
> > +
> > +static void mptcp_pm_create_subflow_or_signal(struct mptcp_sock *msk)
> 
> detail: I think we should avoid using signal alone because MP_FAIL, 
> REM_ADD, MP_BACKUP, etc. are signals as well, no?
> It's maybe just me but I find it clearer, when it is linked to ADD_ADDR, 
> to use "announce" instead of the generic term "signal". But that's a 
> detail. As long as it is clear for everybody :-)

I picked 'signal' because the term was already in use in the existing
code. I can change it to *'signal_addr' to be consistent with other
names.

> > +{
> > +	struct sock *sk = (struct sock *)msk;
> > +	struct mptcp_pm_addr_entry *local;
> > +	struct mptcp_addr_info remote;
> > +	struct pm_nl_pernet *pernet;
> > +
> > +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> > +
> > +	lock_sock(sk);
> > +
> > +	spin_lock_bh(&msk->pm.lock);
> > +	msk->pm.status = MPTCP_PM_IDLE;
> > +	pr_debug("local %d:%d signal %d:%d\n",
> > +		 msk->pm.local_addr_used, msk->pm.local_addr_max,
> > +		 msk->pm.add_addr_signaled, msk->pm.add_addr_signal_max);
> > +
> > +	/* check first if should create a new subflow */
> > +	if (msk->pm.local_addr_used < msk->pm.local_addr_max) {
> > +		remote_address((struct sock_common *)sk, &remote);
> > +
> > +		local = select_local_address(pernet, msk);
> > +		if (local) {
> > +			msk->pm.local_addr_used++;
> > +			check_work_pending(msk);
> > +			spin_unlock_bh(&msk->pm.lock);
> > +			__mptcp_subflow_connect(sk, local->ifindex,
> > +						&local->addr, &remote);
> 
> Should we do anything special if this fail?
> (decrement local_addr_used? retry later?)

IIRC, according to the RFC we must not retry if the address is
unreachable and/or the path is not MPTCP capable. So we should retry
only if failing e.g. due to an allocation error. Overall a correct
retry policy looks a bit too complex for this stage to me. I think we
can add that later, if needed.

[...]
> +void mptcp_pm_nl_add_addr(struct mptcp_sock *msk)
> 
> (it's maybe just me but should we rename this by adding "_recv"? Same 
> for the caller and related functions because it was not clear we are 
> doing some actions not add an address or produce an add_addr but because 
> we just received one)

Ok, will do in the next iteration.

> > +{
> > +	struct sock *sk = (struct sock *)msk;
> > +	struct mptcp_addr_info remote;
> > +	struct mptcp_addr_info local;
> > +	struct pm_nl_pernet *pernet;
> > +
> > +	pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
> > +
> > +	spin_lock_bh(&msk->pm.lock);
> > +	msk->pm.status = MPTCP_PM_IDLE;
> > +	pr_debug("accepted %d:%d remote family %d",
> > +		 msk->pm.add_addr_accepted, msk->pm.add_addr_accept_max,
> > +		 msk->pm.remote.family);
> > +	if (++msk->pm.add_addr_accepted >= msk->pm.add_addr_accept_max)
> > +		WRITE_ONCE(msk->pm.accept_addr, false);
> > +
> > +	/* connect to the specified remote address, using whatever
> > +	 * local address the routing configuration will pick.
> > +	 */
> 
> Should we try with all local IP the userspace added (+ the original 
> one)? → to have a fullmesh

I would postpone this one to later.
> 
> > +	remote = msk->pm.remote;
> > +	if (!remote.port)
> > +		remote.port = sk->sk_dport;
> > +	memset(&local, 0, sizeof(local));
> > +	local.family = remote.family;
> > +	spin_unlock_bh(&msk->pm.lock);
> > +
> > +	lock_sock(sk);
> > +	__mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote);
> 
> Should we do anything special if this fail?

Same situation as the other __mptcp_subflow_connect()

> > +	release_sock(sk);
> 
> Do we need to clear "pm.remote" somewhere? How do we deal with multiple 
> received ADD_ADDR? Even if with the "echo", I guess we might receive a 
> new one before the workqueue has finished dealing with the previous one, no?

The idea is that the workqueue does sort of rate-limiting on subflow
creation. To cope with multiple 'concurrent' addresses we need to add
additional/dynamic storage. IIRC the RFC allows us to ignore multiple
ADD_ADDR options as it fits us.

> > +}
> > +
> > +static bool address_use_port(struct mptcp_pm_addr_entry *entry)
> > +{
> > +	return (entry->flags &
> > +		(MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) ==
> > +		MPTCP_PM_ADDR_FLAG_SIGNAL;
> > +}
> > +
> > +static int mptcp_pm_nl_append_new_addr(struct pm_nl_pernet *pernet,
> 
> detail: Here as well, should we add _local → new_local_addr?

Ok, will do in the next iteration.

> > +				       struct mptcp_pm_addr_entry *entry)
> > +{
> > +	struct mptcp_pm_addr_entry *cur;
> > +	int ret = -EINVAL;
> > +
> > +	spin_lock_bh(&pernet->lock);
> > +	/* to keep the code simple, don't do IDR-like allocation for address ID,
> > +	 * just bail when we exceed limits
> > +	 */
> 
> And we also need to support REM_ADDR here I guess.

Yes, we need to add REM_ADDR support. I think we don't need to add
anything specific with the current implementation: we don't store
remote addresses locally, we don't have anything to delete and we will
not re-use them later anyway (due to lack of retry policy).

> > +	if (pernet->next_id > 255)
> > +		goto out;
> > +	if (pernet->addrs >= MPTCP_PM_ADDR_MAX)
> > +		goto out;
> > +
> > +	/* do not insert duplicate address, differentiate on port only
> > +	 * singled addresses
> > +	 */
> > +	list_for_each_entry(cur, &pernet->addr_list, list) {
> > +		if (addresses_equal(&cur->addr, &entry->addr,
> > +				    address_use_port(entry) &&
> > +				    address_use_port(cur)))
> > +			goto out;
> > +	}
> > +
> > +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)
> > +		pernet->add_addr_signal_max++;
> > +	if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
> > +		pernet->local_addr_max++;
> > +
> > +	entry->addr.id = pernet->next_id++;
> > +	pernet->addrs++;
> > +	list_add_tail_rcu(&entry->list, &pernet->addr_list);
> > +	ret = entry->addr.id;
> > +
> > +out:
> > +	spin_unlock_bh(&pernet->lock);
> > +	return ret;
> > +}
> > +
> > +int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc)
> > +{
> > +	struct mptcp_pm_addr_entry *entry;
> > +	struct mptcp_addr_info skc_local;
> > +	struct mptcp_addr_info msk_local;
> > +	struct pm_nl_pernet *pernet;
> > +	int ret = -1;
> > +
> > +	if (WARN_ON_ONCE(!msk))
> > +		return -1;
> > +
> > +	/* The 0 ID mapping is defined by the first subflow, copied into the msk
> > +	 * addr
> > +	 */
> 
> Is it still OK to do that if the first subflow is down (but other 
> subflows exist)? I mean, no need to have additional check to see if the 
> subflow 0 is still alive? (I didn't check what we do with the msk for 
> this specific case)

We must always use the same ID for the same address inside an MPTCP
connection scope. Once we associate '0' with one local IP we should use
that association as long as the msk socket exists. With the current
code, the first subflow socket structure will exist until mptcp_close,
even if the related TCP connection is terminated well before. So this
code should be safe.

[...]
> > +#if IS_ENABLED(CONFIG_MPTCP_IPV6)
> > +	if (entry->addr.family == AF_INET6)
> > +		entry->addr.addr6 = nla_get_in6_addr(tb[addr_addr]);
> > +	else
> > +#endif
> > +		entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]);
> > +
> > +skip_family:
> > +	if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX])
> > +		entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]);
> > +	else
> > +		entry->ifindex = 0;
> 
> detail: "entry" is init to 0 at the beginning of this function. Same 
> below for the ID. Or if you remove the memset, the flags are not set to 0

Will drop the 'else' branch in the next iteration.

> > +
> > +	if (tb[MPTCP_PM_ADDR_ATTR_ID])
> > +		entry->addr.id = nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_ID]);
> 
> Is it not a u8?

yep, funnily enough self-tests passes. Will fix in the next iteration.
> 
> > +	else
> > +		entry->addr.id = 0;
> > +
> > +	if (tb[MPTCP_PM_ADDR_ATTR_FLAGS])
> > +		entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]);
> 
> (detail if you have to modify this: please add a new line here because 
> most of the time, there is an empty line after a if-statement block)
> 
> > +	return 0;
> > +}
> > +
> > +static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info)
> > +{
> > +	return net_generic(genl_info_net(info), pm_nl_pernet_id);
> > +}
> > +
> > +static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info)
> > +{
> > +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
> > +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> > +	struct mptcp_pm_addr_entry addr, *entry;
> > +	int ret;
> > +
> > +	ret = mptcp_pm_parse_addr(attr, info, true, &addr);
> > +	if (ret)
> 
> Out of curriosity, is there a recommendation on what to check with 
> "ret"? I mean "ret" is vague and often we can see "if (ret < 0)", like 
> below (also because it is needed in this case), which seems clearer to 
> me. Sometimes, "err" is used and in this case, it is clear to read: "if 
> (err)".
> No need to change anything, it's just to know what's recommended but not 
> mandatory I guess.

I'll go with 'err' in the next iteration.

> > +		return ret;
> > +
> > +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> > +	if (!entry) {
> > +		GENL_SET_ERR_MSG(info, "can't allocate addr");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	*entry = addr;
> > +	ret = mptcp_pm_nl_append_new_addr(pernet, entry);
> > +	if (ret < 0) {
> > +		GENL_SET_ERR_MSG(info, "too many addresses");
> 
> detail: Or duplicated one.
> 
> > +		kfree(entry);
> > +		return ret;
> > +	}
> 
> Should we trigger here the establishement of new subflows and/or the 
> send of ADD_ADDR for all existing MPTCP connections?

To keep the things simple, I would avoid that. For servers the number
of msk sockets to process could be high.

[...]
> > +static int mptcp_nl_cmd_get_addr(struct sk_buff *skb, struct genl_info *info)
> > +{
> > +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR];
> > +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> > +	struct mptcp_pm_addr_entry addr, *entry;
> > +	struct sk_buff *msg;
> > +	void *reply;
> > +	int ret;
> > +
> > +	ret = mptcp_pm_parse_addr(attr, info, false, &addr);
> > +	if (ret)
> > +		return ret;
> > +
> > +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> > +	if (!msg)
> > +		return -ENOMEM;
> > +
> > +	reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0,
> > +				  MPTCP_CMD_ADD_ADDR);
> 
> Should it not be MPTCP_CMD_GET_ADDR?
> 
> Can you not use info->genlhdr->cmd? (Same for the other genlmsg_put_reply())

I think you are right. I picked 'MPTCP_CMD_ADD_ADDR' because the reply
message is formatted exactly as a MPTCP_CMD_ADD_ADDR command, but it's
not the correct way to do this. Will fix in the next iteration.

> > 
> > +mptcp_nl_cmd_set_rcv_add_addrs(struct sk_buff *skb, struct genl_info *info)
> > +{
> > +	struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_RCV_ADD_ADDRS];
> > +	struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
> > +	int limit;
> > +
> > +	if (!attr) {
> > +		GENL_SET_ERR_MSG(info, "missing announce accept limit");
> > +		return -EINVAL;
> > +	}
> > +
> > +	limit = nla_get_u16(attr);
> 
> it should be u32 I think.

Yep, will fix in the next iteration.

> > +static int __net_init pm_nl_init_net(struct net *net)
> > +{
> > +	struct pm_nl_pernet *pernet = net_generic(net, pm_nl_pernet_id);
> > +
> > +	INIT_LIST_HEAD_RCU(&pernet->addr_list);
> > +	__reset_counters(pernet);
> 
> I guess it's not strictly needed but doesn't hurt.

Needed on clone()

> > +void mptcp_pm_nl_init(void)
> > +{
> > +	if (register_pernet_subsys(&mptcp_pm_pernet_ops) < 0)
> > +		panic("Failed to register MPTCP PM pernet subsystem.\n");
> > +
> > +	if (genl_register_family(&mptcp_genl_family))
> > +		panic("Failed to register MPTCP PM netlink family");
> 
> (detail: I guess you need "\n")
> 
will fix in the next iteration.

Thank you!

Paolo

end of thread, other threads:[~2020-02-26  9:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-22 16:38 [MPTCP] Re: [PATCH v3 7/9] mptcp: add netlink based PM Matthieu Baerts
2020-02-24 12:07 Paolo Abeni
2020-02-24 12:59 Davide Caratti
2020-02-24 13:02 Matthieu Baerts
2020-02-24 15:50 Paolo Abeni
2020-02-24 17:58 Matthieu Baerts
2020-02-24 18:31 Paolo Abeni
2020-02-25 17:51 Matthieu Baerts
2020-02-26  9:36 Florian Westphal
2020-02-26  9:46 Paolo Abeni
2020-02-26  9:51 Florian Westphal
2020-02-26  9:54 Matthieu Baerts
