All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] Support for fdb ECMP nexthop groups
@ 2020-05-19  2:14 Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path Roopa Prabhu
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
The patches make sure that routes dont reference such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf


Nikolay Aleksandrov (1):
  nexthop: dereference nh only once in nexthop_select_path

Roopa Prabhu (5):
  nexthop: support for fdb ecmp nexthops
  vxlan: ecmp support for mac fdb entries
  nexthop: add support for notifiers
  vxlan: support for nexthop notifiers
  selftests: net: add fdb nexthop tests

 drivers/net/vxlan.c                         | 318 ++++++++++++++++++++++------
 include/net/ip6_fib.h                       |   1 +
 include/net/netns/nexthop.h                 |   1 +
 include/net/nexthop.h                       |  44 ++++
 include/net/vxlan.h                         |  24 +++
 include/uapi/linux/neighbour.h              |   1 +
 include/uapi/linux/nexthop.h                |   3 +
 net/core/neighbour.c                        |   2 +
 net/ipv4/nexthop.c                          | 170 ++++++++++++---
 net/ipv6/route.c                            |   5 +
 tools/testing/selftests/net/fib_nexthops.sh | 140 +++++++++++-
 11 files changed, 617 insertions(+), 92 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  3:25   ` David Ahern
  2020-05-19  2:14 ` [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops Roopa Prabhu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

the ->nh pointer might become suddenly null while we're selecting the
path and we may dereference it. Dereference it only once in the
beginning and use that if it's not null, we rely on the refcounting and
rcu to protect against use-after-free. (This is needed for later
vxlan patches that exposes the problem)

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/nexthop.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 3957364..992e841 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -490,28 +490,33 @@ struct nexthop *nexthop_select_path(struct nexthop *nh, int hash)
 	nhg = rcu_dereference(nh->nh_grp);
 	for (i = 0; i < nhg->num_nh; ++i) {
 		struct nh_grp_entry *nhge = &nhg->nh_entries[i];
+		struct nexthop *nhge_nh;
 		struct nh_info *nhi;
 
 		if (hash > atomic_read(&nhge->upper_bound))
 			continue;
 
+		nhge_nh = READ_ONCE(nhge->nh);
+		if (unlikely(!nhge_nh))
+			continue;
+
 		/* nexthops always check if it is good and does
 		 * not rely on a sysctl for this behavior
 		 */
-		nhi = rcu_dereference(nhge->nh->nh_info);
+		nhi = rcu_dereference(nhge_nh->nh_info);
 		switch (nhi->family) {
 		case AF_INET:
 			if (ipv4_good_nh(&nhi->fib_nh))
-				return nhge->nh;
+				return nhge_nh;
 			break;
 		case AF_INET6:
 			if (ipv6_good_nh(&nhi->fib6_nh))
-				return nhge->nh;
+				return nhge_nh;
 			break;
 		}
 
 		if (!rc)
-			rc = nhge->nh;
+			rc = nhge_nh;
 	}
 
 	return rc;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  3:53   ` David Ahern
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/ip6_fib.h        |   1 +
 include/net/nexthop.h        |  30 ++++++++++
 include/uapi/linux/nexthop.h |   3 +
 net/core/neighbour.c         |   2 +
 net/ipv4/nexthop.c           | 129 ++++++++++++++++++++++++++++++++++---------
 net/ipv6/route.c             |   5 ++
 6 files changed, 145 insertions(+), 25 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index fdaf975..3f615a2 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -65,6 +65,7 @@ struct fib6_config {
 	struct nl_info	fc_nlinfo;
 	struct nlattr	*fc_encap;
 	u16		fc_encap_type;
+	bool		fc_is_fdb;
 };
 
 struct fib6_node {
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index c440ccc..04dafc6 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -26,6 +26,7 @@ struct nh_config {
 	u8		nh_family;
 	u8		nh_protocol;
 	u8		nh_blackhole;
+	u8		nh_fdb;
 	u32		nh_flags;
 
 	int		nh_ifindex;
@@ -52,6 +53,7 @@ struct nh_info {
 
 	u8			family;
 	bool			reject_nh;
+	bool			fdb_nh;
 
 	union {
 		struct fib_nh_common	fib_nhc;
@@ -80,6 +82,7 @@ struct nexthop {
 	struct rb_node		rb_node;    /* entry on netns rbtree */
 	struct list_head	fi_list;    /* v4 entries using nh */
 	struct list_head	f6i_list;   /* v6 entries using nh */
+	struct list_head        fdb_list;   /* fdb entries using this nh */
 	struct list_head	grp_list;   /* nh group entries using this nh */
 	struct net		*net;
 
@@ -88,6 +91,7 @@ struct nexthop {
 	u8			protocol;   /* app managing this nh */
 	u8			nh_flags;
 	bool			is_group;
+	bool			is_fdb_nh;
 
 	refcount_t		refcnt;
 	struct rcu_head		rcu;
@@ -304,4 +308,30 @@ static inline void nexthop_path_fib6_result(struct fib6_result *res, int hash)
 int nexthop_for_each_fib6_nh(struct nexthop *nh,
 			     int (*cb)(struct fib6_nh *nh, void *arg),
 			     void *arg);
+
+static inline int nexthop_get_family(struct nexthop *nh)
+{
+	struct nh_info *nhi = rcu_dereference_rtnl(nh->nh_info);
+
+	return nhi->family;
+}
+
+static inline
+struct fib_nh_common *nexthop_fdb_nhc(struct nexthop *nh)
+{
+	struct nh_info *nhi = rcu_dereference_rtnl(nh->nh_info);
+
+	return &nhi->fib_nhc;
+}
+
+static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh,
+							    int hash)
+{
+	struct nh_info *nhi;
+	struct nexthop *nhp;
+
+	nhp = nexthop_select_path(nh, hash);
+	nhi = rcu_dereference(nhp->nh_info);
+	return &nhi->fib_nhc;
+}
 #endif
diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index 7b61867..2d4a1e7 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -49,6 +49,9 @@ enum {
 	NHA_GROUPS,	/* flag; only return nexthop groups in dump */
 	NHA_MASTER,	/* u32;  only return nexthops with given master dev */
 
+	NHA_FDB,	/* flag; nexthop belongs to a bridge fdb */
+	/* if NHA_FDB is added, OIF, BLACKHOLE, ENCAP cannot be set */
+
 	__NHA_MAX,
 };
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index b607ea6..37e4dba 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1771,6 +1771,7 @@ static struct neigh_table *neigh_find_table(int family)
 }
 
 const struct nla_policy nda_policy[NDA_MAX+1] = {
+	[NDA_UNSPEC]		= { .strict_start_type = NDA_NH_ID },
 	[NDA_DST]		= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
 	[NDA_LLADDR]		= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
 	[NDA_CACHEINFO]		= { .len = sizeof(struct nda_cacheinfo) },
@@ -1781,6 +1782,7 @@ const struct nla_policy nda_policy[NDA_MAX+1] = {
 	[NDA_IFINDEX]		= { .type = NLA_U32 },
 	[NDA_MASTER]		= { .type = NLA_U32 },
 	[NDA_PROTOCOL]		= { .type = NLA_U8 },
+	[NDA_NH_ID]		= { .type = NLA_U32 },
 };
 
 static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh,
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 992e841..d314b27 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -33,6 +33,7 @@ static const struct nla_policy rtm_nh_policy[NHA_MAX + 1] = {
 	[NHA_ENCAP]		= { .type = NLA_NESTED },
 	[NHA_GROUPS]		= { .type = NLA_FLAG },
 	[NHA_MASTER]		= { .type = NLA_U32 },
+	[NHA_FDB]		= { .type = NLA_FLAG },
 };
 
 static unsigned int nh_dev_hashfn(unsigned int val)
@@ -107,6 +108,7 @@ static struct nexthop *nexthop_alloc(void)
 		INIT_LIST_HEAD(&nh->fi_list);
 		INIT_LIST_HEAD(&nh->f6i_list);
 		INIT_LIST_HEAD(&nh->grp_list);
+		INIT_LIST_HEAD(&nh->fdb_list);
 	}
 	return nh;
 }
@@ -227,6 +229,9 @@ static int nh_fill_node(struct sk_buff *skb, struct nexthop *nh,
 	if (nla_put_u32(skb, NHA_ID, nh->id))
 		goto nla_put_failure;
 
+	if (nh->is_fdb_nh && nla_put_flag(skb, NHA_FDB))
+		goto nla_put_failure;
+
 	if (nh->is_group) {
 		struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
 
@@ -241,7 +246,7 @@ static int nh_fill_node(struct sk_buff *skb, struct nexthop *nh,
 		if (nla_put_flag(skb, NHA_BLACKHOLE))
 			goto nla_put_failure;
 		goto out;
-	} else {
+	} else if (!nh->is_fdb_nh) {
 		const struct net_device *dev;
 
 		dev = nhi->fib_nhc.nhc_dev;
@@ -387,12 +392,35 @@ static bool valid_group_nh(struct nexthop *nh, unsigned int npaths,
 	return true;
 }
 
+static int nh_check_attr_fdb_group(struct nexthop *nh, u8 *nh_family,
+				   struct netlink_ext_ack *extack)
+{
+	struct nh_info *nhi;
+
+	if (!nh->is_fdb_nh) {
+		NL_SET_ERR_MSG(extack, "FDB nexthop group can only have fdb nexthops");
+		return -EINVAL;
+	}
+
+	nhi = rtnl_dereference(nh->nh_info);
+	if (*nh_family == AF_UNSPEC) {
+		*nh_family = nhi->family;
+	} else if (*nh_family != nhi->family) {
+		NL_SET_ERR_MSG(extack, "FDB nexthop group cannot have mixed family nexthops");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int nh_check_attr_group(struct net *net, struct nlattr *tb[],
 			       struct netlink_ext_ack *extack)
 {
 	unsigned int len = nla_len(tb[NHA_GROUP]);
+	u8 nh_family = AF_UNSPEC;
 	struct nexthop_grp *nhg;
 	unsigned int i, j;
+	u8 nhg_fdb = 0;
 
 	if (len & (sizeof(struct nexthop_grp) - 1)) {
 		NL_SET_ERR_MSG(extack,
@@ -421,6 +449,8 @@ static int nh_check_attr_group(struct net *net, struct nlattr *tb[],
 		}
 	}
 
+	if (tb[NHA_FDB])
+		nhg_fdb = 1;
 	nhg = nla_data(tb[NHA_GROUP]);
 	for (i = 0; i < len; ++i) {
 		struct nexthop *nh;
@@ -432,11 +462,20 @@ static int nh_check_attr_group(struct net *net, struct nlattr *tb[],
 		}
 		if (!valid_group_nh(nh, len, extack))
 			return -EINVAL;
+
+		if (nhg_fdb && nh_check_attr_fdb_group(nh, &nh_family, extack))
+			return -EINVAL;
+
+		if (!nhg_fdb && nh->is_fdb_nh) {
+			NL_SET_ERR_MSG(extack, "Non FDB nexthop group cannot have fdb nexthops");
+			return -EINVAL;
+		}
 	}
 	for (i = NHA_GROUP + 1; i < __NHA_MAX; ++i) {
 		if (!tb[i])
 			continue;
-
+		if (tb[NHA_FDB])
+			continue;
 		NL_SET_ERR_MSG(extack,
 			       "No other attributes can be set in nexthop groups");
 		return -EINVAL;
@@ -500,6 +539,9 @@ struct nexthop *nexthop_select_path(struct nexthop *nh, int hash)
 		if (unlikely(!nhge_nh))
 			continue;
 
+		if (nhge_nh->is_fdb_nh)
+			return nhge->nh;
+
 		/* nexthops always check if it is good and does
 		 * not rely on a sysctl for this behavior
 		 */
@@ -569,6 +611,11 @@ int fib6_check_nexthop(struct nexthop *nh, struct fib6_config *cfg,
 {
 	struct nh_info *nhi;
 
+	if (nh->is_fdb_nh) {
+		NL_SET_ERR_MSG(extack, "Route cannot point to a fdb nexthop");
+		return -EINVAL;
+	}
+
 	/* fib6_src is unique to a fib6_info and limits the ability to cache
 	 * routes in fib6_nh within a nexthop that is potentially shared
 	 * across multiple fib entries. If the config wants to use source
@@ -645,6 +692,12 @@ int fib_check_nexthop(struct nexthop *nh, u8 scope,
 {
 	int err = 0;
 
+	if (nh->is_fdb_nh) {
+		NL_SET_ERR_MSG(extack, "Route cannot point to a fdb nexthop");
+		err = -EINVAL;
+		goto out;
+	}
+
 	if (nh->is_group) {
 		struct nh_group *nhg;
 
@@ -1130,6 +1183,9 @@ static struct nexthop *nexthop_create_group(struct net *net,
 		nh_group_rebalance(nhg);
 	}
 
+	if (cfg->nh_fdb)
+		nh->is_fdb_nh = 1;
+
 	rcu_assign_pointer(nh->nh_grp, nhg);
 
 	return nh;
@@ -1157,7 +1213,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh,
 		.fc_encap = cfg->nh_encap,
 		.fc_encap_type = cfg->nh_encap_type,
 	};
-	u32 tb_id = l3mdev_fib_table(cfg->dev);
+	u32 tb_id = (cfg->dev ? l3mdev_fib_table(cfg->dev) : RT_TABLE_MAIN);
 	int err;
 
 	err = fib_nh_init(net, fib_nh, &fib_cfg, 1, extack);
@@ -1166,6 +1222,9 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh,
 		goto out;
 	}
 
+	if (nh->is_fdb_nh)
+		goto out;
+
 	/* sets nh_dev if successful */
 	err = fib_check_nh(net, fib_nh, tb_id, 0, extack);
 	if (!err) {
@@ -1191,6 +1250,7 @@ static int nh_create_ipv6(struct net *net,  struct nexthop *nh,
 		.fc_flags = cfg->nh_flags,
 		.fc_encap = cfg->nh_encap,
 		.fc_encap_type = cfg->nh_encap_type,
+		.fc_is_fdb = cfg->nh_fdb,
 	};
 	int err;
 
@@ -1232,6 +1292,9 @@ static struct nexthop *nexthop_create(struct net *net, struct nh_config *cfg,
 	nhi->family = cfg->nh_family;
 	nhi->fib_nhc.nhc_scope = RT_SCOPE_LINK;
 
+	if (cfg->nh_fdb)
+		nh->is_fdb_nh = 1;
+
 	if (cfg->nh_blackhole) {
 		nhi->reject_nh = 1;
 		cfg->nh_ifindex = net->loopback_dev->ifindex;
@@ -1253,7 +1316,8 @@ static struct nexthop *nexthop_create(struct net *net, struct nh_config *cfg,
 	}
 
 	/* add the entry to the device based hash */
-	nexthop_devhash_add(net, nhi);
+	if (!nh->is_fdb_nh)
+		nexthop_devhash_add(net, nhi);
 
 	rcu_assign_pointer(nh->nh_info, nhi);
 
@@ -1357,6 +1421,16 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb,
 	if (tb[NHA_ID])
 		cfg->nh_id = nla_get_u32(tb[NHA_ID]);
 
+	if (tb[NHA_FDB]) {
+		if (tb[NHA_OIF] || tb[NHA_BLACKHOLE] ||
+		    tb[NHA_ENCAP]   || tb[NHA_ENCAP_TYPE]) {
+			NL_SET_ERR_MSG(extack, "Fdb attribute can not be used with encap, oif or blackhole");
+			goto out;
+		}
+
+		cfg->nh_fdb = nla_get_flag(tb[NHA_FDB]);
+	}
+
 	if (tb[NHA_GROUP]) {
 		if (nhm->nh_family != AF_UNSPEC) {
 			NL_SET_ERR_MSG(extack, "Invalid family for group");
@@ -1380,8 +1454,8 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb,
 
 	if (tb[NHA_BLACKHOLE]) {
 		if (tb[NHA_GATEWAY] || tb[NHA_OIF] ||
-		    tb[NHA_ENCAP]   || tb[NHA_ENCAP_TYPE]) {
-			NL_SET_ERR_MSG(extack, "Blackhole attribute can not be used with gateway or oif");
+		    tb[NHA_ENCAP]   || tb[NHA_ENCAP_TYPE] || tb[NHA_FDB]) {
+			NL_SET_ERR_MSG(extack, "Blackhole attribute can not be used with gateway, oif, encap or fdb");
 			goto out;
 		}
 
@@ -1390,26 +1464,28 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb,
 		goto out;
 	}
 
-	if (!tb[NHA_OIF]) {
-		NL_SET_ERR_MSG(extack, "Device attribute required for non-blackhole nexthops");
+	if (!cfg->nh_fdb && !tb[NHA_OIF]) {
+		NL_SET_ERR_MSG(extack, "Device attribute required for non-blackhole and non-fdb nexthops");
 		goto out;
 	}
 
-	cfg->nh_ifindex = nla_get_u32(tb[NHA_OIF]);
-	if (cfg->nh_ifindex)
-		cfg->dev = __dev_get_by_index(net, cfg->nh_ifindex);
+	if (!cfg->nh_fdb && tb[NHA_OIF]) {
+		cfg->nh_ifindex = nla_get_u32(tb[NHA_OIF]);
+		if (cfg->nh_ifindex)
+			cfg->dev = __dev_get_by_index(net, cfg->nh_ifindex);
 
-	if (!cfg->dev) {
-		NL_SET_ERR_MSG(extack, "Invalid device index");
-		goto out;
-	} else if (!(cfg->dev->flags & IFF_UP)) {
-		NL_SET_ERR_MSG(extack, "Nexthop device is not up");
-		err = -ENETDOWN;
-		goto out;
-	} else if (!netif_carrier_ok(cfg->dev)) {
-		NL_SET_ERR_MSG(extack, "Carrier for nexthop device is down");
-		err = -ENETDOWN;
-		goto out;
+		if (!cfg->dev) {
+			NL_SET_ERR_MSG(extack, "Invalid device index");
+			goto out;
+		} else if (!(cfg->dev->flags & IFF_UP)) {
+			NL_SET_ERR_MSG(extack, "Nexthop device is not up");
+			err = -ENETDOWN;
+			goto out;
+		} else if (!netif_carrier_ok(cfg->dev)) {
+			NL_SET_ERR_MSG(extack, "Carrier for nexthop device is down");
+			err = -ENETDOWN;
+			goto out;
+		}
 	}
 
 	err = -EINVAL;
@@ -1638,7 +1714,7 @@ static bool nh_dump_filtered(struct nexthop *nh, int dev_idx, int master_idx,
 
 static int nh_valid_dump_req(const struct nlmsghdr *nlh, int *dev_idx,
 			     int *master_idx, bool *group_filter,
-			     struct netlink_callback *cb)
+			     bool *fdb_filter, struct netlink_callback *cb)
 {
 	struct netlink_ext_ack *extack = cb->extack;
 	struct nlattr *tb[NHA_MAX + 1];
@@ -1675,6 +1751,9 @@ static int nh_valid_dump_req(const struct nlmsghdr *nlh, int *dev_idx,
 		case NHA_GROUPS:
 			*group_filter = true;
 			break;
+		case NHA_FDB:
+			*fdb_filter = true;
+			break;
 		default:
 			NL_SET_ERR_MSG(extack, "Unsupported attribute in dump request");
 			return -EINVAL;
@@ -1693,17 +1772,17 @@ static int nh_valid_dump_req(const struct nlmsghdr *nlh, int *dev_idx,
 /* rtnl */
 static int rtm_dump_nexthop(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	bool group_filter = false, fdb_filter = false;
 	struct nhmsg *nhm = nlmsg_data(cb->nlh);
 	int dev_filter_idx = 0, master_idx = 0;
 	struct net *net = sock_net(skb->sk);
 	struct rb_root *root = &net->nexthop.rb_root;
-	bool group_filter = false;
 	struct rb_node *node;
 	int idx = 0, s_idx;
 	int err;
 
 	err = nh_valid_dump_req(cb->nlh, &dev_filter_idx, &master_idx,
-				&group_filter, cb);
+				&group_filter, &fdb_filter, cb);
 	if (err < 0)
 		return err;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a8b4add..41b49e3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3421,6 +3421,11 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	fib6_nh->last_probe = jiffies;
 #endif
+	if (cfg->fc_is_fdb) {
+		fib6_nh->fib_nh_gw6 = cfg->fc_gateway;
+		fib6_nh->fib_nh_gw_family = AF_INET6;
+		return 0;
+	}
 
 	err = -ENODEV;
 	if (cfg->fc_ifindex) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  3:57   ` David Ahern
                     ` (4 more replies)
  2020-05-19  2:14 ` [PATCH net-next 4/6] nexthop: add support for notifiers Roopa Prabhu
                   ` (2 subsequent siblings)
  5 siblings, 5 replies; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst
- I think there is scope for simplyfing vxlan_xmit_one: Will see
what I can do before the non-RFC version

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a nexthop_path_fdb_result NULL check crash fix from Nikolay Aleksandrov

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 drivers/net/vxlan.c            | 285 ++++++++++++++++++++++++++++++++---------
 include/net/nexthop.h          |   2 +
 include/net/vxlan.h            |  24 ++++
 include/uapi/linux/neighbour.h |   1 +
 4 files changed, 255 insertions(+), 57 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a5b415f..01933e9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -26,6 +26,7 @@
 #include <net/netns/generic.h>
 #include <net/tun_proto.h>
 #include <net/vxlan.h>
+#include <net/nexthop.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ip6_tunnel.h>
@@ -78,6 +79,8 @@ struct vxlan_fdb {
 	u16		  state;	/* see ndm_state */
 	__be32		  vni;
 	u16		  flags;	/* see ndm_flags and below */
+	struct list_head  nh_list;
+	struct nexthop __rcu *nh;
 };
 
 #define NTF_VXLAN_ADDED_BY_USER 0x100
@@ -174,11 +177,15 @@ static inline struct hlist_head *vs_head(struct net *net, __be16 port)
  */
 static inline struct vxlan_rdst *first_remote_rcu(struct vxlan_fdb *fdb)
 {
+	if (rcu_access_pointer(fdb->nh))
+		return NULL;
 	return list_entry_rcu(fdb->remotes.next, struct vxlan_rdst, list);
 }
 
 static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
 {
+	if (rcu_access_pointer(fdb->nh))
+		return NULL;
 	return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
 }
 
@@ -251,9 +258,10 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 {
 	unsigned long now = jiffies;
 	struct nda_cacheinfo ci;
+	bool send_ip, send_eth;
 	struct nlmsghdr *nlh;
+	struct nexthop *nh;
 	struct ndmsg *ndm;
-	bool send_ip, send_eth;
 
 	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
 	if (nlh == NULL)
@@ -265,15 +273,19 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	send_eth = send_ip = true;
 
 	if (type == RTM_GETNEIGH) {
-		send_ip = !vxlan_addr_any(&rdst->remote_ip);
+		if (rdst) {
+			send_ip = !vxlan_addr_any(&rdst->remote_ip);
+			ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
+		} else if (rcu_access_pointer(fdb->nh)) {
+			ndm->ndm_family = nexthop_get_family(rcu_dereference(fdb->nh));
+		}
 		send_eth = !is_zero_ether_addr(fdb->eth_addr);
-		ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
 	} else
 		ndm->ndm_family	= AF_BRIDGE;
 	ndm->ndm_state = fdb->state;
 	ndm->ndm_ifindex = vxlan->dev->ifindex;
 	ndm->ndm_flags = fdb->flags;
-	if (rdst->offloaded)
+	if (rdst && rdst->offloaded)
 		ndm->ndm_flags |= NTF_OFFLOADED;
 	ndm->ndm_type = RTN_UNICAST;
 
@@ -284,23 +296,31 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
+	nh = rcu_dereference(fdb->nh);
+	if (nh) {
+		if (nla_put_u32(skb, NDA_NH_ID, nh->id))
+			goto nla_put_failure;
+	} else if (rdst) {
+		if (send_ip && vxlan_nla_put_addr(skb, NDA_DST,
+						  &rdst->remote_ip))
+			goto nla_put_failure;
+
+		if (rdst->remote_port &&
+		    rdst->remote_port != vxlan->cfg.dst_port &&
+		    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
+			goto nla_put_failure;
+		if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
+		    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
+			goto nla_put_failure;
+		if (rdst->remote_ifindex &&
+		    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
+			goto nla_put_failure;
+	}
 
-	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
-		goto nla_put_failure;
-
-	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
-	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
-		goto nla_put_failure;
-	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
-	    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
-		goto nla_put_failure;
 	if ((vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
 	    nla_put_u32(skb, NDA_SRC_VNI,
 			be32_to_cpu(fdb->vni)))
 		goto nla_put_failure;
-	if (rdst->remote_ifindex &&
-	    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
-		goto nla_put_failure;
 
 	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
 	ci.ndm_confirmed = 0;
@@ -401,7 +421,7 @@ static int vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
 {
 	int err;
 
-	if (swdev_notify) {
+	if (swdev_notify && rd) {
 		switch (type) {
 		case RTM_NEWNEIGH:
 			err = vxlan_fdb_switchdev_call_notifiers(vxlan, fdb, rd,
@@ -805,6 +825,8 @@ static struct vxlan_fdb *vxlan_fdb_alloc(const u8 *mac, __u16 state,
 	f->flags = ndm_flags;
 	f->updated = f->used = jiffies;
 	f->vni = src_vni;
+	f->nh = NULL;
+	INIT_LIST_HEAD(&f->nh_list);
 	INIT_LIST_HEAD(&f->remotes);
 	memcpy(f->eth_addr, mac, ETH_ALEN);
 
@@ -819,11 +841,76 @@ static void vxlan_fdb_insert(struct vxlan_dev *vxlan, const u8 *mac,
 			   vxlan_fdb_head(vxlan, mac, src_vni));
 }
 
+static int vxlan_fdb_nh_update(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
+			       u32 nhid, struct netlink_ext_ack *extack)
+{
+	struct nexthop *old_nh = rtnl_dereference(fdb->nh);
+	struct nh_group *nhg;
+	struct nexthop *nh;
+	int err = -EINVAL;
+
+	if (old_nh && old_nh->id == nhid)
+		return 0;
+
+	nh = nexthop_find_by_id(vxlan->net, nhid);
+	if (!nh) {
+		NL_SET_ERR_MSG(extack, "Nexthop id does not exist");
+		goto err_inval;
+	}
+
+	if (nh) {
+		if (!nexthop_get(nh)) {
+			NL_SET_ERR_MSG(extack, "Nexthop has been deleted");
+			nh = NULL;
+			goto err_inval;
+		}
+		if (!nh->is_fdb_nh) {
+			NL_SET_ERR_MSG(extack, "Nexthop is not a fdb nexthop");
+			goto err_inval;
+		}
+
+		if (!nh->is_group || !nh->nh_grp->mpath) {
+			NL_SET_ERR_MSG(extack, "Nexthop is not a multipath group");
+			goto err_inval;
+		}
+
+		/* check nexthop group family */
+		nhg = rtnl_dereference(nh->nh_grp);
+		switch (vxlan->default_dst.remote_ip.sa.sa_family) {
+		case AF_INET:
+			if (!nhg->has_v4) {
+				err = -EAFNOSUPPORT;
+				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
+				goto err_inval;
+			}
+			break;
+		case AF_INET6:
+			if (nhg->has_v4) {
+				err = -EAFNOSUPPORT;
+				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
+				goto err_inval;
+			}
+		}
+	}
+
+	rcu_assign_pointer(fdb->nh, nh);
+	list_add_tail_rcu(&fdb->nh_list, &nh->fdb_list);
+	if (old_nh)
+		nexthop_put(old_nh);
+	return 0;
+
+err_inval:
+	if (nh)
+		nexthop_put(nh);
+	return err;
+}
+
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 			    const u8 *mac, union vxlan_addr *ip,
 			    __u16 state, __be16 port, __be32 src_vni,
 			    __be32 vni, __u32 ifindex, __u16 ndm_flags,
-			    struct vxlan_fdb **fdb)
+			    u32 nhid, struct vxlan_fdb **fdb,
+			    struct netlink_ext_ack *extack)
 {
 	struct vxlan_rdst *rd = NULL;
 	struct vxlan_fdb *f;
@@ -838,20 +925,33 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 	if (!f)
 		return -ENOMEM;
 
-	rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
-	if (rc < 0) {
-		kfree(f);
-		return rc;
-	}
+	if (nhid)
+		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
+	else
+		rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
+	if (rc < 0)
+		goto errout;
 
 	*fdb = f;
 
 	return 0;
+
+errout:
+	kfree(f);
+	return rc;
 }
 
 static void __vxlan_fdb_free(struct vxlan_fdb *f)
 {
 	struct vxlan_rdst *rd, *nd;
+	struct nexthop *nh;
+
+	nh = rcu_dereference_raw(f->nh);
+	if (nh) {
+		rcu_assign_pointer(f->nh, NULL);
+		list_del_rcu(&f->nh_list);
+		nexthop_put(nh);
+	}
 
 	list_for_each_entry_safe(rd, nd, &f->remotes, list) {
 		dst_cache_destroy(&rd->dst_cache);
@@ -875,10 +975,15 @@ static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
 	netdev_dbg(vxlan->dev, "delete %pM\n", f->eth_addr);
 
 	--vxlan->addrcnt;
-	if (do_notify)
-		list_for_each_entry(rd, &f->remotes, list)
-			vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
+	if (do_notify) {
+		if (rcu_access_pointer(f->nh))
+			vxlan_fdb_notify(vxlan, f, NULL, RTM_DELNEIGH,
 					 swdev_notify, NULL);
+		else
+			list_for_each_entry(rd, &f->remotes, list)
+				vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
+						 swdev_notify, NULL);
+	}
 
 	hlist_del_rcu(&f->hlist);
 	call_rcu(&f->rcu, vxlan_fdb_free);
@@ -897,7 +1002,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
 				     __u16 state, __u16 flags,
 				     __be16 port, __be32 vni,
 				     __u32 ifindex, __u16 ndm_flags,
-				     struct vxlan_fdb *f,
+				     struct vxlan_fdb *f, u32 nhid,
 				     bool swdev_notify,
 				     struct netlink_ext_ack *extack)
 {
@@ -908,6 +1013,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
 	int rc = 0;
 	int err;
 
+	if (nhid && !rcu_access_pointer(f->nh)) {
+		NL_SET_ERR_MSG(extack,
+			       "Cannot replace an existing non nexthop fdb with a nexthop");
+		return -EOPNOTSUPP;
+	}
+
 	/* Do not allow an externally learned entry to take over an entry added
 	 * by the user.
 	 */
@@ -925,6 +1036,14 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
 		}
 	}
 
+	if (nhid) {
+		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
+		if (rc < 0)
+			return rc;
+		notify = 1;
+		f->updated = jiffies;
+	}
+
 	if ((flags & NLM_F_REPLACE)) {
 		/* Only change unicasts */
 		if (!(is_multicast_ether_addr(f->eth_addr) ||
@@ -975,7 +1094,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
 				   const u8 *mac, union vxlan_addr *ip,
 				   __u16 state, __u16 flags,
 				   __be16 port, __be32 src_vni, __be32 vni,
-				   __u32 ifindex, __u16 ndm_flags,
+				   __u32 ifindex, __u16 ndm_flags, u32 nhid,
 				   bool swdev_notify,
 				   struct netlink_ext_ack *extack)
 {
@@ -990,7 +1109,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
 
 	netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
 	rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
-			      vni, ifindex, fdb_flags, &f);
+			      vni, ifindex, fdb_flags, nhid, &f, extack);
 	if (rc < 0)
 		return rc;
 
@@ -1012,7 +1131,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 			    const u8 *mac, union vxlan_addr *ip,
 			    __u16 state, __u16 flags,
 			    __be16 port, __be32 src_vni, __be32 vni,
-			    __u32 ifindex, __u16 ndm_flags,
+			    __u32 ifindex, __u16 ndm_flags, u32 nhid,
 			    bool swdev_notify,
 			    struct netlink_ext_ack *extack)
 {
@@ -1028,14 +1147,15 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 
 		return vxlan_fdb_update_existing(vxlan, ip, state, flags, port,
 						 vni, ifindex, ndm_flags, f,
-						 swdev_notify, extack);
+						 nhid, swdev_notify, extack);
 	} else {
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
 
 		return vxlan_fdb_update_create(vxlan, mac, ip, state, flags,
 					       port, src_vni, vni, ifindex,
-					       ndm_flags, swdev_notify, extack);
+					       ndm_flags, nhid, swdev_notify,
+					       extack);
 	}
 }
 
@@ -1049,7 +1169,7 @@ static void vxlan_fdb_dst_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
 
 static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
 			   union vxlan_addr *ip, __be16 *port, __be32 *src_vni,
-			   __be32 *vni, u32 *ifindex)
+			   __be32 *vni, u32 *ifindex, u32 *nhid)
 {
 	struct net *net = dev_net(vxlan->dev);
 	int err;
@@ -1109,6 +1229,11 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
 		*ifindex = 0;
 	}
 
+	if (tb[NDA_NH_ID])
+		*nhid = nla_get_u32(tb[NDA_NH_ID]);
+	else
+		*nhid = 0;
+
 	return 0;
 }
 
@@ -1123,7 +1248,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	union vxlan_addr ip;
 	__be16 port;
 	__be32 src_vni, vni;
-	u32 ifindex;
+	u32 ifindex, nhid;
 	u32 hash_index;
 	int err;
 
@@ -1133,10 +1258,11 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 		return -EINVAL;
 	}
 
-	if (tb[NDA_DST] == NULL)
+	if (!tb || (!tb[NDA_DST] && !tb[NDA_NH_ID]))
 		return -EINVAL;
 
-	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
+	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
+			      &nhid);
 	if (err)
 		return err;
 
@@ -1148,7 +1274,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags,
 			       port, src_vni, vni, ifindex,
 			       ndm->ndm_flags | NTF_VXLAN_ADDED_BY_USER,
-			       true, extack);
+			       nhid, true, extack);
 	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
 
 	return err;
@@ -1159,8 +1285,8 @@ static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
 			      __be16 port, __be32 src_vni, __be32 vni,
 			      u32 ifindex, bool swdev_notify)
 {
-	struct vxlan_fdb *f;
 	struct vxlan_rdst *rd = NULL;
+	struct vxlan_fdb *f;
 	int err = -ENOENT;
 
 	f = vxlan_find_mac(vxlan, addr, src_vni);
@@ -1195,12 +1321,13 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	union vxlan_addr ip;
 	__be32 src_vni, vni;
-	__be16 port;
-	u32 ifindex;
+	u32 ifindex, nhid;
 	u32 hash_index;
+	__be16 port;
 	int err;
 
-	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
+	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
+			      &nhid);
 	if (err)
 		return err;
 
@@ -1228,6 +1355,17 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
 			struct vxlan_rdst *rd;
 
+			if (rcu_access_pointer(f->nh)) {
+				err = vxlan_fdb_info(skb, vxlan, f,
+						     NETLINK_CB(cb->skb).portid,
+						     cb->nlh->nlmsg_seq,
+						     RTM_NEWNEIGH,
+						     NLM_F_MULTI, NULL);
+				if (err < 0)
+					goto out;
+				continue;
+			}
+
 			list_for_each_entry_rcu(rd, &f->remotes, list) {
 				if (*idx < cb->args[2])
 					goto skip;
@@ -1311,6 +1449,10 @@ static bool vxlan_snoop(struct net_device *dev,
 		if (f->state & (NUD_PERMANENT | NUD_NOARP))
 			return true;
 
+		/* Don't override an fdb with nexthop with a learnt entry */
+		if (rcu_access_pointer(f->nh))
+			return true;
+
 		if (net_ratelimit())
 			netdev_info(dev,
 				    "%pM migrated from %pIS to %pIS\n",
@@ -1333,7 +1475,7 @@ static bool vxlan_snoop(struct net_device *dev,
 					 vxlan->cfg.dst_port,
 					 vni,
 					 vxlan->default_dst.remote_vni,
-					 ifindex, NTF_SELF, true, NULL);
+					 ifindex, NTF_SELF, 0, true, NULL);
 		spin_unlock(&vxlan->hash_lock[hash_index]);
 	}
 
@@ -2616,6 +2758,30 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	kfree_skb(skb);
 }
 
+static void vxlan_xmit_nh(struct sk_buff *skb, struct net_device *dev,
+			  struct vxlan_fdb *f, __be32 vni, bool did_rsc)
+{
+	struct vxlan_rdst nh_rdst;
+	struct nexthop *nh;
+	bool do_xmit;
+	u32 hash;
+
+	memset(&nh_rdst, 0, sizeof(struct vxlan_rdst));
+	hash = skb_get_hash(skb);
+
+	rcu_read_lock();
+	nh = rcu_dereference(f->nh);
+	if (!nh)
+		return;
+	do_xmit = vxlan_fdb_nh_path_select(nh, hash, &nh_rdst);
+	rcu_read_unlock();
+
+	if (likely(do_xmit))
+		vxlan_xmit_one(skb, dev, vni, &nh_rdst, did_rsc);
+	else
+		kfree_skb(skb);
+}
+
 /* Transmit local packets over Vxlan
  *
  * Outer IP header inherits ECN and DF from inner header.
@@ -2692,22 +2858,27 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	list_for_each_entry_rcu(rdst, &f->remotes, list) {
-		struct sk_buff *skb1;
+	if (rcu_access_pointer(f->nh)) {
+		vxlan_xmit_nh(skb, dev, f,
+			      (vni ? : vxlan->default_dst.remote_vni), did_rsc);
+	} else {
+		list_for_each_entry_rcu(rdst, &f->remotes, list) {
+			struct sk_buff *skb1;
 
-		if (!fdst) {
-			fdst = rdst;
-			continue;
+			if (!fdst) {
+				fdst = rdst;
+				continue;
+			}
+			skb1 = skb_clone(skb, GFP_ATOMIC);
+			if (skb1)
+				vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
 		}
-		skb1 = skb_clone(skb, GFP_ATOMIC);
-		if (skb1)
-			vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
+		if (fdst)
+			vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
+		else
+			kfree_skb(skb);
 	}
 
-	if (fdst)
-		vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
-	else
-		kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
 
@@ -3615,7 +3786,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
 				       dst->remote_vni,
 				       dst->remote_vni,
 				       dst->remote_ifindex,
-				       NTF_SELF, &f);
+				       NTF_SELF, 0, &f, extack);
 		if (err)
 			return err;
 	}
@@ -4013,7 +4184,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
 					       vxlan->cfg.dst_port,
 					       conf.vni, conf.vni,
 					       conf.remote_ifindex,
-					       NTF_SELF, true, extack);
+					       NTF_SELF, 0, true, extack);
 			if (err) {
 				spin_unlock_bh(&vxlan->hash_lock[hash_index]);
 				netdev_adjacent_change_abort(dst->remote_dev,
@@ -4335,7 +4506,7 @@ vxlan_fdb_external_learn_add(struct net_device *dev,
 			       fdb_info->remote_vni,
 			       fdb_info->remote_ifindex,
 			       NTF_USE | NTF_SELF | NTF_EXT_LEARNED,
-			       false, extack);
+			       0, false, extack);
 	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
 
 	return err;
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 04dafc6..d929c98 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -331,6 +331,8 @@ static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh,
 	struct nexthop *nhp;
 
 	nhp = nexthop_select_path(nh, hash);
+	if (unlikely(!nhp))
+		return NULL;
 	nhi = rcu_dereference(nhp->nh_info);
 	return &nhi->fib_nhc;
 }
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 373aadc..31bca3e 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -487,4 +487,28 @@ static inline void vxlan_flag_attr_error(int attrtype,
 #undef VXLAN_FLAG
 }
 
+static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
+					    int hash,
+					    struct vxlan_rdst *rdst)
+{
+	struct fib_nh_common *nhc;
+
+	nhc = nexthop_path_fdb_result(nh, hash);
+	if (unlikely(!nhc))
+		return false;
+
+	switch (nhc->nhc_gw_family) {
+	case AF_INET:
+		rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
+		rdst->remote_ip.sa.sa_family = AF_INET;
+		break;
+	case AF_INET6:
+		rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
+		rdst->remote_ip.sa.sa_family = AF_INET6;
+		break;
+	}
+
+	return true;
+}
+
 #endif
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index cd144e3..eefcda8 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -29,6 +29,7 @@ enum {
 	NDA_LINK_NETNSID,
 	NDA_SRC_VNI,
 	NDA_PROTOCOL,  /* Originator of entry */
+	NDA_NH_ID,
 	__NDA_MAX
 };
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 4/6] nexthop: add support for notifiers
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
                   ` (2 preceding siblings ...)
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 5/6] vxlan: support for nexthop notifiers Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 6/6] selftests: net: add fdb nexthop tests Roopa Prabhu
  5 siblings, 0 replies; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/netns/nexthop.h |  1 +
 include/net/nexthop.h       | 12 ++++++++++++
 net/ipv4/nexthop.c          | 28 ++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/net/netns/nexthop.h b/include/net/netns/nexthop.h
index c712ee5..1937476 100644
--- a/include/net/netns/nexthop.h
+++ b/include/net/netns/nexthop.h
@@ -14,5 +14,6 @@ struct netns_nexthop {
 
 	unsigned int		seq;		/* protected by rtnl_mutex */
 	u32			last_id_allocated;
+	struct atomic_notifier_head notifier_chain;
 };
 #endif
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index d929c98..4c95168 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -10,6 +10,7 @@
 #define __LINUX_NEXTHOP_H
 
 #include <linux/netdevice.h>
+#include <linux/notifier.h>
 #include <linux/route.h>
 #include <linux/types.h>
 #include <net/ip_fib.h>
@@ -102,6 +103,17 @@ struct nexthop {
 	};
 };
 
+enum nexthop_event_type {
+	NEXTHOP_EVENT_ADD,
+	NEXTHOP_EVENT_DEL
+};
+
+int call_nexthop_notifier(struct notifier_block *nb, struct net *net,
+			  enum nexthop_event_type event_type,
+			  struct nexthop *nh);
+int register_nexthop_notifier(struct net *net, struct notifier_block *nb);
+int unregister_nexthop_notifier(struct net *net, struct notifier_block *nb);
+
 /* caller is holding rcu or rtnl; no reference taken to nexthop */
 struct nexthop *nexthop_find_by_id(struct net *net, u32 id);
 void nexthop_free_rcu(struct rcu_head *head);
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index d314b27..a13ce753 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -36,6 +36,17 @@ static const struct nla_policy rtm_nh_policy[NHA_MAX + 1] = {
 	[NHA_FDB]		= { .type = NLA_FLAG },
 };
 
+static int call_nexthop_notifiers(struct net *net,
+				  enum fib_event_type event_type,
+				  struct nexthop *nh)
+{
+	int err;
+
+	err = atomic_notifier_call_chain(&net->nexthop.notifier_chain,
+					 event_type, nh);
+	return notifier_to_errno(err);
+}
+
 static unsigned int nh_dev_hashfn(unsigned int val)
 {
 	unsigned int mask = NH_DEV_HASHSIZE - 1;
@@ -831,6 +842,8 @@ static void __remove_nexthop_fib(struct net *net, struct nexthop *nh)
 	bool do_flush = false;
 	struct fib_info *fi;
 
+	call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh);
+
 	list_for_each_entry(fi, &nh->fi_list, nh_list) {
 		fi->fib_flags |= RTNH_F_DEAD;
 		do_flush = true;
@@ -1867,6 +1880,21 @@ static struct notifier_block nh_netdev_notifier = {
 	.notifier_call = nh_netdev_event,
 };
 
+static ATOMIC_NOTIFIER_HEAD(nexthop_notif_chain);
+
+int register_nexthop_notifier(struct net *net, struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&net->nexthop.notifier_chain, nb);
+}
+EXPORT_SYMBOL(register_nexthop_notifier);
+
+int unregister_nexthop_notifier(struct net *net, struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&net->nexthop.notifier_chain,
+						nb);
+}
+EXPORT_SYMBOL(unregister_nexthop_notifier);
+
 static void __net_exit nexthop_net_exit(struct net *net)
 {
 	rtnl_lock();
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 5/6] vxlan: support for nexthop notifiers
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
                   ` (3 preceding siblings ...)
  2020-05-19  2:14 ` [PATCH net-next 4/6] nexthop: add support for notifiers Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  2:14 ` [PATCH net-next 6/6] selftests: net: add fdb nexthop tests Roopa Prabhu
  5 siblings, 0 replies; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

vxlan driver registers for nexthop add/del notifiers to
cleanup fdb entries pointing to such nexthops.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 drivers/net/vxlan.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 01933e9..f9c27ff 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -81,6 +81,7 @@ struct vxlan_fdb {
 	u16		  flags;	/* see ndm_flags and below */
 	struct list_head  nh_list;
 	struct nexthop __rcu *nh;
+	struct vxlan_dev  *vdev;
 };
 
 #define NTF_VXLAN_ADDED_BY_USER 0x100
@@ -813,8 +814,9 @@ static int vxlan_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 	return eth_gro_complete(skb, nhoff + sizeof(struct vxlanhdr));
 }
 
-static struct vxlan_fdb *vxlan_fdb_alloc(const u8 *mac, __u16 state,
-					 __be32 src_vni, __u16 ndm_flags)
+static struct vxlan_fdb *vxlan_fdb_alloc(struct vxlan_dev *vxlan, const u8 *mac,
+					 __u16 state, __be32 src_vni,
+					 __u16 ndm_flags)
 {
 	struct vxlan_fdb *f;
 
@@ -826,6 +828,7 @@ static struct vxlan_fdb *vxlan_fdb_alloc(const u8 *mac, __u16 state,
 	f->updated = f->used = jiffies;
 	f->vni = src_vni;
 	f->nh = NULL;
+	f->vdev = vxlan;
 	INIT_LIST_HEAD(&f->nh_list);
 	INIT_LIST_HEAD(&f->remotes);
 	memcpy(f->eth_addr, mac, ETH_ALEN);
@@ -921,7 +924,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		return -ENOSPC;
 
 	netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
-	f = vxlan_fdb_alloc(mac, state, src_vni, ndm_flags);
+	f = vxlan_fdb_alloc(vxlan, mac, state, src_vni, ndm_flags);
 	if (!f)
 		return -ENOMEM;
 
@@ -986,6 +989,7 @@ static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
 	}
 
 	hlist_del_rcu(&f->hlist);
+	f->vdev = NULL;
 	call_rcu(&f->rcu, vxlan_fdb_free);
 }
 
@@ -4581,6 +4585,25 @@ static struct notifier_block vxlan_switchdev_notifier_block __read_mostly = {
 	.notifier_call = vxlan_switchdev_event,
 };
 
+static int vxlan_nexthop_event(struct notifier_block *nb,
+			       unsigned long event, void *ptr)
+{
+	struct nexthop *nh = ptr;
+	struct vxlan_fdb *fdb, *tmp;
+
+	if (!nh || event != NEXTHOP_EVENT_DEL)
+		return NOTIFY_DONE;
+
+	list_for_each_entry_safe(fdb, tmp, &nh->fdb_list, nh_list)
+		vxlan_fdb_destroy(fdb->vdev, fdb, false, false);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block vxlan_nexthop_notifier_block __read_mostly = {
+	.notifier_call = vxlan_nexthop_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -4592,7 +4615,7 @@ static __net_init int vxlan_init_net(struct net *net)
 	for (h = 0; h < PORT_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vn->sock_list[h]);
 
-	return 0;
+	return register_nexthop_notifier(net, &vxlan_nexthop_notifier_block);
 }
 
 static void vxlan_destroy_tunnels(struct net *net, struct list_head *head)
@@ -4625,6 +4648,8 @@ static void __net_exit vxlan_exit_batch_net(struct list_head *net_list)
 
 	rtnl_lock();
 	list_for_each_entry(net, net_list, exit_list)
+		unregister_nexthop_notifier(net, &vxlan_nexthop_notifier_block);
+	list_for_each_entry(net, net_list, exit_list)
 		vxlan_destroy_tunnels(net, &list);
 
 	unregister_netdevice_many(&list);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 6/6] selftests: net: add fdb nexthop tests
  2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
                   ` (4 preceding siblings ...)
  2020-05-19  2:14 ` [PATCH net-next 5/6] vxlan: support for nexthop notifiers Roopa Prabhu
@ 2020-05-19  2:14 ` Roopa Prabhu
  2020-05-19  4:00   ` David Ahern
  5 siblings, 1 reply; 20+ messages in thread
From: Roopa Prabhu @ 2020-05-19  2:14 UTC (permalink / raw)
  To: dsahern, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This commit adds ipv4 and ipv6 fdb api tests to fib_nexthops.sh.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 tools/testing/selftests/net/fib_nexthops.sh | 140 +++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 50d822f..eb4c270 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -19,8 +19,8 @@ ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-IPV4_TESTS="ipv4_fcnal ipv4_grp_fcnal ipv4_withv6_fcnal ipv4_fcnal_runtime ipv4_compat_mode"
-IPV6_TESTS="ipv6_fcnal ipv6_grp_fcnal ipv6_fcnal_runtime ipv6_compat_mode"
+IPV4_TESTS="ipv4_fcnal ipv4_grp_fcnal ipv4_withv6_fcnal ipv4_fcnal_runtime ipv4_compat_mode ipv4_fdb_grp_fcnal"
+IPV6_TESTS="ipv6_fcnal ipv6_grp_fcnal ipv6_fcnal_runtime ipv6_compat_mode ipv6_fdb_grp_fcnal"
 
 ALL_TESTS="basic ${IPV4_TESTS} ${IPV6_TESTS}"
 TESTS="${ALL_TESTS}"
@@ -146,6 +146,7 @@ setup()
 	create_ns remote
 
 	IP="ip -netns me"
+	BRIDGE="bridge -netns me"
 	set -e
 	$IP li add veth1 type veth peer name veth2
 	$IP li set veth1 up
@@ -280,6 +281,141 @@ stop_ip_monitor()
 	return $rc
 }
 
+check_nexthop_fdb_support()
+{
+	$IP nexthop help 2>&1 | grep -q fdb
+	if [ $? -ne 0 ]; then
+		echo "SKIP: iproute2 too old, missing fdb nexthop support"
+		return $ksft_skip
+	fi
+}
+
+ipv6_fdb_grp_fcnal()
+{
+	local rc
+
+	echo
+	echo "IPv6 fdb groups functional"
+	echo "--------------------------"
+
+	check_nexthop_fdb_support
+	if [ $? -eq $ksft_skip ]; then
+		return $ksft_skip
+	fi
+
+	# create group with multiple nexthops
+	run_cmd "$IP nexthop add id 61 via 2001:db8:91::2 fdb"
+	run_cmd "$IP nexthop add id 62 via 2001:db8:91::3 fdb"
+	run_cmd "$IP nexthop add id 102 group 61/62 fdb"
+	check_nexthop "id 102" "id 102 group 61/62 fdb"
+	log_test $? 0 "Fdb Nexthop group with multiple nexthops"
+
+	## get nexthop group
+	run_cmd "$IP nexthop get id 102"
+	check_nexthop "id 102" "id 102 group 61/62 fdb"
+	log_test $? 0 "Get Fdb nexthop group by id"
+
+	# fdb nexthop group can only contain fdb nexthops
+	run_cmd "$IP nexthop add id 63 via 2001:db8:91::4"
+	run_cmd "$IP nexthop add id 64 via 2001:db8:91::5"
+	run_cmd "$IP nexthop add id 103 group 63/64 fdb"
+	log_test $? 2 "Fdb Nexthop group with non-fdb nexthops"
+
+	# Non fdb nexthop group can not contain fdb nexthops
+	run_cmd "$IP nexthop add id 65 via 2001:db8:91::5 fdb"
+	run_cmd "$IP nexthop add id 66 via 2001:db8:91::6 fdb"
+	run_cmd "$IP nexthop add id 104 group 65/66"
+	log_test $? 2 "Non-Fdb Nexthop group with non nexthops"
+
+	# fdb nexthop cannot have blackhole
+	run_cmd "$IP nexthop add id 67 blackhole fdb"
+	log_test $? 2 "Fdb Nexthop with blackhole"
+
+	# fdb nexthop with oif
+	run_cmd "$IP nexthop add id 68 via 2001:db8:91::7 dev veth1 fdb"
+	log_test $? 2 "Fdb Nexthop with oif"
+
+	# fdb nexthop with encap
+	run_cmd "$IP nexthop add id 69 encap mpls 101 via 2001:db8:91::8 dev veth1 fdb"
+	log_test $? 2 "Fdb Nexthop with encap"
+
+	run_cmd "$IP link add name vx10 type vxlan id 1010 local 2001:db8:91::9 remote 2001:db8:91::10 dstport 4789 nolearning noudpcsum tos inherit ttl 100"
+	run_cmd "$BRIDGE fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self"
+	log_test $? 0 "Fdb mac add with nexthop group"
+
+	## fdb nexthops can only reference nexthop groups and not nexthops
+	run_cmd "$BRIDGE fdb add 02:02:00:00:00:14 dev vx10 nhid 61 self"
+	log_test $? 255 "Fdb mac add with nexthop"
+
+	run_cmd "$IP nexthop del id 102"
+	log_test $? 0 "Fdb nexthop delete"
+
+	$IP link del dev vx10
+}
+
+ipv4_fdb_grp_fcnal()
+{
+	local rc
+
+	echo
+	echo "IPv4 fdb groups functional"
+	echo "--------------------------"
+
+	check_nexthop_fdb_support
+	if [ $? -eq $ksft_skip ]; then
+		return $ksft_skip
+	fi
+
+	# create group with multiple nexthops
+	run_cmd "$IP nexthop add id 12 via 172.16.1.2 fdb"
+	run_cmd "$IP nexthop add id 13 via 172.16.1.3 fdb"
+	run_cmd "$IP nexthop add id 102 group 12/13 fdb"
+	check_nexthop "id 102" "id 102 group 12/13 fdb"
+	log_test $? 0 "Fdb Nexthop group with multiple nexthops"
+
+	# get nexthop group
+	run_cmd "$IP nexthop get id 102"
+	check_nexthop "id 102" "id 102 group 12/13 fdb"
+	log_test $? 0 "Get Fdb nexthop group by id"
+
+	# fdb nexthop group can only contain fdb nexthops
+	run_cmd "$IP nexthop add id 14 via 172.16.1.2"
+	run_cmd "$IP nexthop add id 15 via 172.16.1.3"
+	run_cmd "$IP nexthop add id 103 group 14/15 fdb"
+	log_test $? 2 "Fdb Nexthop group with non-fdb nexthops"
+
+	# Non fdb nexthop group can not contain fdb nexthops
+	run_cmd "$IP nexthop add id 16 via 172.16.1.2 fdb"
+	run_cmd "$IP nexthop add id 17 via 172.16.1.3 fdb"
+	run_cmd "$IP nexthop add id 104 group 14/15"
+	log_test $? 2 "Non-Fdb Nexthop group with non nexthops"
+
+	# fdb nexthop cannot have blackhole
+	run_cmd "$IP nexthop add id 18 blackhole fdb"
+	log_test $? 2 "Fdb Nexthop with blackhole"
+
+	# fdb nexthop with oif
+	run_cmd "$IP nexthop add id 16 via 172.16.1.2 dev veth1 fdb"
+	log_test $? 2 "Fdb Nexthop with oif"
+
+	# fdb nexthop with encap
+	run_cmd "$IP nexthop add id 17 encap mpls 101 via 172.16.1.2 dev veth1 fdb"
+	log_test $? 2 "Fdb Nexthop with encap"
+
+	run_cmd "$IP link add name vx10 type vxlan id 1010 local 10.0.0.1 remote 10.0.0.2 dstport 4789 nolearning noudpcsum tos inherit ttl 100"
+	run_cmd "$BRIDGE fdb add 02:02:00:00:00:13 dev vx10 nhid 102 self"
+	log_test $? 0 "Fdb mac add with nexthop group"
+
+	# fdb nexthops can only reference nexthop groups and not nexthops
+	run_cmd "$BRIDGE fdb add 02:02:00:00:00:14 dev vx10 nhid 12 self"
+	log_test $? 255 "Fdb mac add with nexthop"
+
+	run_cmd "$IP nexthop del id 102"
+	log_test $? 0 "Fdb nexthop delete"
+
+	$IP link del dev vx10
+}
+
 ################################################################################
 # basic operations (add, delete, replace) on nexthops and nexthop groups
 #
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path
  2020-05-19  2:14 ` [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path Roopa Prabhu
@ 2020-05-19  3:25   ` David Ahern
  2020-05-19  8:48     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: David Ahern @ 2020-05-19  3:25 UTC (permalink / raw)
  To: Roopa Prabhu, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

On 5/18/20 8:14 PM, Roopa Prabhu wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> the ->nh pointer might become suddenly null while we're selecting the
> path and we may dereference it. Dereference it only once in the
> beginning and use that if it's not null, we rely on the refcounting and
> rcu to protect against use-after-free. (This is needed for later
> vxlan patches that exposes the problem)
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
>  net/ipv4/nexthop.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>

Should this be a bug fix? Any chance the route path can hit it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops
  2020-05-19  2:14 ` [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops Roopa Prabhu
@ 2020-05-19  3:53   ` David Ahern
       [not found]     ` <CAJieiUjXO6h9HtwTn3fv7W=WovyUxzU2+EZ_Off6kxxRfgyUKQ@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: David Ahern @ 2020-05-19  3:53 UTC (permalink / raw)
  To: Roopa Prabhu, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

On 5/18/20 8:14 PM, Roopa Prabhu wrote:

> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index b607ea6..37e4dba 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1771,6 +1771,7 @@ static struct neigh_table *neigh_find_table(int family)
>  }
>  
>  const struct nla_policy nda_policy[NDA_MAX+1] = {
> +	[NDA_UNSPEC]		= { .strict_start_type = NDA_NH_ID },
>  	[NDA_DST]		= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
>  	[NDA_LLADDR]		= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
>  	[NDA_CACHEINFO]		= { .len = sizeof(struct nda_cacheinfo) },
> @@ -1781,6 +1782,7 @@ const struct nla_policy nda_policy[NDA_MAX+1] = {
>  	[NDA_IFINDEX]		= { .type = NLA_U32 },
>  	[NDA_MASTER]		= { .type = NLA_U32 },
>  	[NDA_PROTOCOL]		= { .type = NLA_U8 },
> +	[NDA_NH_ID]		= { .type = NLA_U32 },

I think you also need a checks where nda_policy is used to detect if
NDA_NH_ID is set. Since the neighbor code ignores the attribute it
should send back an error if set.

Otherwise looks ok to me.

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
@ 2020-05-19  3:57   ` David Ahern
  2020-05-19 11:13   ` Nikolay Aleksandrov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: David Ahern @ 2020-05-19  3:57 UTC (permalink / raw)
  To: Roopa Prabhu, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

On 5/18/20 8:14 PM, Roopa Prabhu wrote:
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 04dafc6..d929c98 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -331,6 +331,8 @@ static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh,
>  	struct nexthop *nhp;
>  
>  	nhp = nexthop_select_path(nh, hash);
> +	if (unlikely(!nhp))
> +		return NULL;
>  	nhi = rcu_dereference(nhp->nh_info);
>  	return &nhi->fib_nhc;
>  }

that should be folded into patch 2 which introduces nexthop_path_fdb_result


> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
> index cd144e3..eefcda8 100644
> --- a/include/uapi/linux/neighbour.h
> +++ b/include/uapi/linux/neighbour.h
> @@ -29,6 +29,7 @@ enum {
>  	NDA_LINK_NETNSID,
>  	NDA_SRC_VNI,
>  	NDA_PROTOCOL,  /* Originator of entry */
> +	NDA_NH_ID,
>  	__NDA_MAX
>  };
>  
> 

That needs to be in Patch 2 for it to compile.

I am not qualified to review the vxlan code.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 6/6] selftests: net: add fdb nexthop tests
  2020-05-19  2:14 ` [PATCH net-next 6/6] selftests: net: add fdb nexthop tests Roopa Prabhu
@ 2020-05-19  4:00   ` David Ahern
  0 siblings, 0 replies; 20+ messages in thread
From: David Ahern @ 2020-05-19  4:00 UTC (permalink / raw)
  To: Roopa Prabhu, davem; +Cc: netdev, nikolay, jiri, idosch, petrm

On 5/18/20 8:14 PM, Roopa Prabhu wrote:
> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
> index 50d822f..eb4c270 100755
> --- a/tools/testing/selftests/net/fib_nexthops.sh
> +++ b/tools/testing/selftests/net/fib_nexthops.sh
> @@ -19,8 +19,8 @@ ret=0
>  ksft_skip=4
>  
>  # all tests in this script. Can be overridden with -t option
> -IPV4_TESTS="ipv4_fcnal ipv4_grp_fcnal ipv4_withv6_fcnal ipv4_fcnal_runtime ipv4_compat_mode"
> -IPV6_TESTS="ipv6_fcnal ipv6_grp_fcnal ipv6_fcnal_runtime ipv6_compat_mode"
> +IPV4_TESTS="ipv4_fcnal ipv4_grp_fcnal ipv4_withv6_fcnal ipv4_fcnal_runtime ipv4_compat_mode ipv4_fdb_grp_fcnal"
> +IPV6_TESTS="ipv6_fcnal ipv6_grp_fcnal ipv6_fcnal_runtime ipv6_compat_mode ipv6_fdb_grp_fcnal"
>  
>  ALL_TESTS="basic ${IPV4_TESTS} ${IPV6_TESTS}"
>  TESTS="${ALL_TESTS}"

add negative tests to the other existing sets to verify fdb nh can not
be used with IPv4 or IPv6 routes and fdb nh can be part of v4/v6
multipath group.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path
  2020-05-19  3:25   ` David Ahern
@ 2020-05-19  8:48     ` Nikolay Aleksandrov
  2020-05-19 10:04       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-19  8:48 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu, davem; +Cc: netdev, jiri, idosch, petrm

On 19/05/2020 06:25, David Ahern wrote:
> On 5/18/20 8:14 PM, Roopa Prabhu wrote:
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>> the ->nh pointer might become suddenly null while we're selecting the
>> path and we may dereference it. Dereference it only once in the
>> beginning and use that if it's not null, we rely on the refcounting and
>> rcu to protect against use-after-free. (This is needed for later
>> vxlan patches that exposes the problem)
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>>  net/ipv4/nexthop.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
> 
> Reviewed-by: David Ahern <dsahern@gmail.com>
> 
> Should this be a bug fix? Any chance the route path can hit it?
> 

That was fast, I didn't expect to see it on upstream so quickly. :)
So I haven't had time to inspect it in detail, but it did seem to me
that it should be possible to hit from the route path. When I tried
running a few basic tests to make it happen I couldn't mainly due to the
fib flush done at nexthop removal, but I still believe with the right
timing one could hit it.

In fact I'd go 1 step further and add a null check from the return
of nexthop_select_path() in the helpers which dereference the value it
returns like:  nexthop_path_fib6_result() and nexthop_path_fib_result()

The reason is that the .nh ptr is set and read without any sync primitives
so in theory it can become null at any point (being cleared on nh group removal),
and also the nh count in a group (num_nh), when it becomes == 0 while destroying a nh group
and we hit it then in nexthop_select_path() rc would remain == NULL and we'll
deref a null ptr. We did see the above with the vxlan code due to it missing the equivalent
of a fib flush (or rather it being more relaxed), but I haven't had time to see how feasible
it is to hit it via the route path yet.






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path
  2020-05-19  8:48     ` Nikolay Aleksandrov
@ 2020-05-19 10:04       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-19 10:04 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu, davem; +Cc: netdev, jiri, idosch, petrm

On 19/05/2020 11:48, Nikolay Aleksandrov wrote:
> On 19/05/2020 06:25, David Ahern wrote:
>> On 5/18/20 8:14 PM, Roopa Prabhu wrote:
>>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>
>>> the ->nh pointer might become suddenly null while we're selecting the
>>> path and we may dereference it. Dereference it only once in the
>>> beginning and use that if it's not null, we rely on the refcounting and
>>> rcu to protect against use-after-free. (This is needed for later
>>> vxlan patches that exposes the problem)
>>>
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>> ---
>>>  net/ipv4/nexthop.c | 13 +++++++++----
>>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>
>> Reviewed-by: David Ahern <dsahern@gmail.com>
>>
>> Should this be a bug fix? Any chance the route path can hit it?
>>
> 
> That was fast, I didn't expect to see it on upstream so quickly. :)
> So I haven't had time to inspect it in detail, but it did seem to me
> that it should be possible to hit from the route path. When I tried
> running a few basic tests to make it happen I couldn't mainly due to the
> fib flush done at nexthop removal, but I still believe with the right
> timing one could hit it.
> 
> In fact I'd go 1 step further and add a null check from the return
> of nexthop_select_path() in the helpers which dereference the value it
> returns like:  nexthop_path_fib6_result() and nexthop_path_fib_result()
> 
> The reason is that the .nh ptr is set and read without any sync primitives
> so in theory it can become null at any point (being cleared on nh group removal),
> and also the nh count in a group (num_nh), when it becomes == 0 while destroying a nh group
> and we hit it then in nexthop_select_path() rc would remain == NULL and we'll
> deref a null ptr. We did see the above with the vxlan code due to it missing the equivalent
> of a fib flush (or rather it being more relaxed), but I haven't had time to see how feasible
> it is to hit it via the route path yet.
> 

Actually got it pretty easy via nexthop replace, that nulls .nh and can cause the above
very quickly if running in parallel with some traffic to those routes.

Here's the route NULL ptr deref:
[  322.517290] BUG: kernel NULL pointer dereference, address: 0000000000000070
[  322.517670] #PF: supervisor read access in kernel mode
[  322.517935] #PF: error_code(0x0000) - not-present page
[  322.518213] PGD 0 P4D 0 
[  322.518388] Oops: 0000 [#1] SMP PTI
[  322.518601] CPU: 1 PID: 58185 Comm: ping Not tainted 5.7.0-rc5+ #190
[  322.518911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  322.519490] RIP: 0010:fib_select_multipath+0x5a/0x2ac
[  322.519776] Code: 85 db 48 89 44 24 08 40 0f 95 c6 31 c9 31 d2 e8 29 13 93 ff 48 85 db 74 58 48 8b 45 18 8b 74 24 04 48 8b 78 68 e8 81 b7 00 00 <48> 8b 58 70 e8 6c 04 8b ff 85 c0 74 31 80 3d cd 89 d4 00 00 75 28
[  322.520536] RSP: 0018:ffff888228f6bac0 EFLAGS: 00010286
[  322.520813] RAX: 0000000000000000 RBX: ffff888228cc3c00 RCX: 0000000000000000
[  322.521152] RDX: ffff888222215080 RSI: ffff888222215930 RDI: ffff888222215080
[  322.521478] RBP: ffff888228f6bbd8 R08: ffff888222215930 R09: 0000000000020377
[  322.521815] R10: 0000000000000000 R11: 784deca9f66dea1e R12: 0000000000000000
[  322.522143] R13: ffff88822a099000 R14: ffff888228f6bbd8 R15: ffffffff8258cc80
[  322.522491] FS:  00007fc5ee6a8000(0000) GS:ffff88822bc80000(0000) knlGS:0000000000000000
[  322.522862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  322.523236] CR2: 0000000000000070 CR3: 0000000222954001 CR4: 0000000000360ee0
[  322.523657] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  322.524060] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  322.524461] Call Trace:
[  322.524707]  fib_select_path+0x5a/0x2c8
[  322.524998]  ip_route_output_key_hash_rcu+0x2d6/0x636
[  322.525372]  ip_route_output_key_hash+0x9f/0xb6
[  322.525697]  ip_route_output_flow+0x1c/0x58
[  322.525990]  raw_sendmsg+0x5e9/0xca4
[  322.526261]  ? mark_lock+0x68/0x24d
[  322.526536]  ? lock_acquire+0x233/0x24f
[  322.526823]  ? raw_abort+0x3f/0x3f
[  322.527086]  ? inet_send_prepare+0x3b/0x3b
[  322.527418]  ? sock_sendmsg_nosec+0x4f/0x9b
[  322.527721]  ? raw_abort+0x3f/0x3f
[  322.527984]  sock_sendmsg_nosec+0x4f/0x9b
[  322.528274]  __sys_sendto+0xdd/0x100
[  322.528551]  ? sockfd_lookup_light+0x72/0x96
[  322.528851]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  322.529159]  __x64_sys_sendto+0x25/0x28
[  322.529442]  do_syscall_64+0xd1/0xe1
[  322.529719]  entry_SYSCALL_64_after_hwframe+0x49/0xb3





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
  2020-05-19  3:57   ` David Ahern
@ 2020-05-19 11:13   ` Nikolay Aleksandrov
  2020-05-19 11:28   ` Nikolay Aleksandrov
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-19 11:13 UTC (permalink / raw)
  To: Roopa Prabhu, dsahern, davem; +Cc: netdev, jiri, idosch, petrm

On 19/05/2020 05:14, Roopa Prabhu wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> Todays vxlan mac fdb entries can point to multiple remote
> ips (rdsts) with the sole purpose of replicating
> broadcast-multicast and unknown unicast packets to those remote ips.
> 
> E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
> load balanced to remote switches (vteps) belonging to the
> same multi-homed ethernet segment (E-VPN multihoming is analogous
> to multi-homed LAG implementations, but with the inter-switch
> peerlink replaced with a vxlan tunnel). In other words it needs
> support for mac ecmp. Furthermore, for faster convergence, E-VPN
> multihoming needs the ability to update fdb ecmp nexthops independent
> of the fdb entries.
> 
> New route nexthop API is perfect for this usecase.
> This patch extends the vxlan fdb code to take a nexthop id
> pointing to an ecmp nexthop group.
> 
> Changes include:
> - New NDA_NH_ID attribute for fdbs
> - Use the newly added fdb nexthop groups
> - makes vxlan rdsts and nexthop handling code mutually
>   exclusive
> - since this is a new use-case and the requirement is for ecmp
> nexthop groups, the fdb add and update path checks that the
> nexthop is really an ecmp nexthop group. This check can be relaxed
> in the future, if we want to introduce replication fdb nexthop groups
> and allow its use in lieu of current rdst lists.
> - fdb update requests with nexthop id's only allowed for existing
> fdb's that have nexthop id's
> - learning will not override an existing fdb entry with nexthop
> group
> - I have wrapped the switchdev offload code around the presence of
> rdst
> - I think there is scope for simplyfing vxlan_xmit_one: Will see
> what I can do before the non-RFC version
> 
> [1] E-VPN RFC https://tools.ietf.org/html/rfc7432
> [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
> [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
> 
> Includes a nexthop_path_fdb_result NULL check crash fix from Nikolay Aleksandrov
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
>  drivers/net/vxlan.c            | 285 ++++++++++++++++++++++++++++++++---------
>  include/net/nexthop.h          |   2 +
>  include/net/vxlan.h            |  24 ++++
>  include/uapi/linux/neighbour.h |   1 +
>  4 files changed, 255 insertions(+), 57 deletions(-)
> 

Hi,
A few issues below. :)

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a5b415f..01933e9 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -26,6 +26,7 @@
>  #include <net/netns/generic.h>
>  #include <net/tun_proto.h>
>  #include <net/vxlan.h>
> +#include <net/nexthop.h>
>  
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ip6_tunnel.h>
> @@ -78,6 +79,8 @@ struct vxlan_fdb {
>  	u16		  state;	/* see ndm_state */
>  	__be32		  vni;
>  	u16		  flags;	/* see ndm_flags and below */
> +	struct list_head  nh_list;
> +	struct nexthop __rcu *nh;
>  };
>  
>  #define NTF_VXLAN_ADDED_BY_USER 0x100
> @@ -174,11 +177,15 @@ static inline struct hlist_head *vs_head(struct net *net, __be16 port)
>   */
>  static inline struct vxlan_rdst *first_remote_rcu(struct vxlan_fdb *fdb)
>  {
> +	if (rcu_access_pointer(fdb->nh))
> +		return NULL;
>  	return list_entry_rcu(fdb->remotes.next, struct vxlan_rdst, list);
>  }
>  
>  static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
>  {
> +	if (rcu_access_pointer(fdb->nh))
> +		return NULL;
>  	return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
>  }
>  
> @@ -251,9 +258,10 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  {
>  	unsigned long now = jiffies;
>  	struct nda_cacheinfo ci;
> +	bool send_ip, send_eth;
>  	struct nlmsghdr *nlh;
> +	struct nexthop *nh;
>  	struct ndmsg *ndm;
> -	bool send_ip, send_eth;
>  
>  	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
>  	if (nlh == NULL)
> @@ -265,15 +273,19 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  	send_eth = send_ip = true;
>  
>  	if (type == RTM_GETNEIGH) {
> -		send_ip = !vxlan_addr_any(&rdst->remote_ip);
> +		if (rdst) {
> +			send_ip = !vxlan_addr_any(&rdst->remote_ip);
> +			ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
> +		} else if (rcu_access_pointer(fdb->nh)) {

You shouldn't dereference fdb->nh more than once in this function.

> +			ndm->ndm_family = nexthop_get_family(rcu_dereference(fdb->nh));
> +		}
>  		send_eth = !is_zero_ether_addr(fdb->eth_addr);
> -		ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
>  	} else
>  		ndm->ndm_family	= AF_BRIDGE;
>  	ndm->ndm_state = fdb->state;
>  	ndm->ndm_ifindex = vxlan->dev->ifindex;
>  	ndm->ndm_flags = fdb->flags;
> -	if (rdst->offloaded)
> +	if (rdst && rdst->offloaded)
>  		ndm->ndm_flags |= NTF_OFFLOADED;
>  	ndm->ndm_type = RTN_UNICAST;
>  
> @@ -284,23 +296,31 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  
>  	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
>  		goto nla_put_failure;
> +	nh = rcu_dereference(fdb->nh);

This should be moved up and fdb->nh accessed only through it.

> +	if (nh) {
> +		if (nla_put_u32(skb, NDA_NH_ID, nh->id))
> +			goto nla_put_failure;
> +	} else if (rdst) {
> +		if (send_ip && vxlan_nla_put_addr(skb, NDA_DST,
> +						  &rdst->remote_ip))
> +			goto nla_put_failure;
> +
> +		if (rdst->remote_port &&
> +		    rdst->remote_port != vxlan->cfg.dst_port &&
> +		    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
> +			goto nla_put_failure;
> +		if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
> +		    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
> +			goto nla_put_failure;
> +		if (rdst->remote_ifindex &&
> +		    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
> +			goto nla_put_failure;
> +	}
>  
> -	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
> -		goto nla_put_failure;
> -
> -	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
> -	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
> -		goto nla_put_failure;
> -	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
> -	    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
> -		goto nla_put_failure;
>  	if ((vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
>  	    nla_put_u32(skb, NDA_SRC_VNI,
>  			be32_to_cpu(fdb->vni)))
>  		goto nla_put_failure;
> -	if (rdst->remote_ifindex &&
> -	    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
> -		goto nla_put_failure;
>  
>  	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
>  	ci.ndm_confirmed = 0;
> @@ -401,7 +421,7 @@ static int vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
>  {
>  	int err;
>  
> -	if (swdev_notify) {
> +	if (swdev_notify && rd) {
>  		switch (type) {
>  		case RTM_NEWNEIGH:
>  			err = vxlan_fdb_switchdev_call_notifiers(vxlan, fdb, rd,
> @@ -805,6 +825,8 @@ static struct vxlan_fdb *vxlan_fdb_alloc(const u8 *mac, __u16 state,
>  	f->flags = ndm_flags;
>  	f->updated = f->used = jiffies;
>  	f->vni = src_vni;
> +	f->nh = NULL;
> +	INIT_LIST_HEAD(&f->nh_list);
>  	INIT_LIST_HEAD(&f->remotes);
>  	memcpy(f->eth_addr, mac, ETH_ALEN);
>  
> @@ -819,11 +841,76 @@ static void vxlan_fdb_insert(struct vxlan_dev *vxlan, const u8 *mac,
>  			   vxlan_fdb_head(vxlan, mac, src_vni));
>  }
>  
> +static int vxlan_fdb_nh_update(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
> +			       u32 nhid, struct netlink_ext_ack *extack)
> +{
> +	struct nexthop *old_nh = rtnl_dereference(fdb->nh);
> +	struct nh_group *nhg;
> +	struct nexthop *nh;
> +	int err = -EINVAL;
> +
> +	if (old_nh && old_nh->id == nhid)
> +		return 0;
> +
> +	nh = nexthop_find_by_id(vxlan->net, nhid);
> +	if (!nh) {
> +		NL_SET_ERR_MSG(extack, "Nexthop id does not exist");
> +		goto err_inval;
> +	}
> +
> +	if (nh) {
> +		if (!nexthop_get(nh)) {
> +			NL_SET_ERR_MSG(extack, "Nexthop has been deleted");
> +			nh = NULL;
> +			goto err_inval;
> +		}
> +		if (!nh->is_fdb_nh) {
> +			NL_SET_ERR_MSG(extack, "Nexthop is not a fdb nexthop");
> +			goto err_inval;
> +		}
> +
> +		if (!nh->is_group || !nh->nh_grp->mpath) {
> +			NL_SET_ERR_MSG(extack, "Nexthop is not a multipath group");
> +			goto err_inval;
> +		}
> +
> +		/* check nexthop group family */
> +		nhg = rtnl_dereference(nh->nh_grp);
> +		switch (vxlan->default_dst.remote_ip.sa.sa_family) {
> +		case AF_INET:
> +			if (!nhg->has_v4) {
> +				err = -EAFNOSUPPORT;
> +				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
> +				goto err_inval;
> +			}
> +			break;
> +		case AF_INET6:
> +			if (nhg->has_v4) {
> +				err = -EAFNOSUPPORT;
> +				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
> +				goto err_inval;
> +			}
> +		}
> +	}
> +
> +	rcu_assign_pointer(fdb->nh, nh);
> +	list_add_tail_rcu(&fdb->nh_list, &nh->fdb_list);
> +	if (old_nh)
> +		nexthop_put(old_nh);
> +	return 0;
> +
> +err_inval:
> +	if (nh)
> +		nexthop_put(nh);
> +	return err;
> +}
> +
>  static int vxlan_fdb_create(struct vxlan_dev *vxlan,
>  			    const u8 *mac, union vxlan_addr *ip,
>  			    __u16 state, __be16 port, __be32 src_vni,
>  			    __be32 vni, __u32 ifindex, __u16 ndm_flags,
> -			    struct vxlan_fdb **fdb)
> +			    u32 nhid, struct vxlan_fdb **fdb,
> +			    struct netlink_ext_ack *extack)
>  {
>  	struct vxlan_rdst *rd = NULL;
>  	struct vxlan_fdb *f;
> @@ -838,20 +925,33 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
>  	if (!f)
>  		return -ENOMEM;
>  
> -	rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
> -	if (rc < 0) {
> -		kfree(f);
> -		return rc;
> -	}
> +	if (nhid)
> +		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
> +	else
> +		rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
> +	if (rc < 0)
> +		goto errout;
>  
>  	*fdb = f;
>  
>  	return 0;
> +
> +errout:
> +	kfree(f);
> +	return rc;
>  }
>  
>  static void __vxlan_fdb_free(struct vxlan_fdb *f)
>  {
>  	struct vxlan_rdst *rd, *nd;
> +	struct nexthop *nh;
> +
> +	nh = rcu_dereference_raw(f->nh);
> +	if (nh) {
> +		rcu_assign_pointer(f->nh, NULL);
> +		list_del_rcu(&f->nh_list);
> +		nexthop_put(nh);
> +	}
>  
>  	list_for_each_entry_safe(rd, nd, &f->remotes, list) {
>  		dst_cache_destroy(&rd->dst_cache);
> @@ -875,10 +975,15 @@ static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
>  	netdev_dbg(vxlan->dev, "delete %pM\n", f->eth_addr);
>  
>  	--vxlan->addrcnt;
> -	if (do_notify)
> -		list_for_each_entry(rd, &f->remotes, list)
> -			vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
> +	if (do_notify) {
> +		if (rcu_access_pointer(f->nh))
> +			vxlan_fdb_notify(vxlan, f, NULL, RTM_DELNEIGH,
>  					 swdev_notify, NULL);
> +		else
> +			list_for_each_entry(rd, &f->remotes, list)
> +				vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
> +						 swdev_notify, NULL);
> +	}
>  
>  	hlist_del_rcu(&f->hlist);
>  	call_rcu(&f->rcu, vxlan_fdb_free);
> @@ -897,7 +1002,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  				     __u16 state, __u16 flags,
>  				     __be16 port, __be32 vni,
>  				     __u32 ifindex, __u16 ndm_flags,
> -				     struct vxlan_fdb *f,
> +				     struct vxlan_fdb *f, u32 nhid,
>  				     bool swdev_notify,
>  				     struct netlink_ext_ack *extack)
>  {
> @@ -908,6 +1013,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  	int rc = 0;
>  	int err;
>  
> +	if (nhid && !rcu_access_pointer(f->nh)) {
> +		NL_SET_ERR_MSG(extack,
> +			       "Cannot replace an existing non nexthop fdb with a nexthop");
> +		return -EOPNOTSUPP;
> +	}
> +
>  	/* Do not allow an externally learned entry to take over an entry added
>  	 * by the user.
>  	 */
> @@ -925,6 +1036,14 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  		}
>  	}
>  
> +	if (nhid) {
> +		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
> +		if (rc < 0)
> +			return rc;
> +		notify = 1;
> +		f->updated = jiffies;
> +	}
> +
>  	if ((flags & NLM_F_REPLACE)) {
>  		/* Only change unicasts */
>  		if (!(is_multicast_ether_addr(f->eth_addr) ||
> @@ -975,7 +1094,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>  				   const u8 *mac, union vxlan_addr *ip,
>  				   __u16 state, __u16 flags,
>  				   __be16 port, __be32 src_vni, __be32 vni,
> -				   __u32 ifindex, __u16 ndm_flags,
> +				   __u32 ifindex, __u16 ndm_flags, u32 nhid,
>  				   bool swdev_notify,
>  				   struct netlink_ext_ack *extack)
>  {
> @@ -990,7 +1109,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>  
>  	netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
>  	rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
> -			      vni, ifindex, fdb_flags, &f);
> +			      vni, ifindex, fdb_flags, nhid, &f, extack);
>  	if (rc < 0)
>  		return rc;
>  
> @@ -1012,7 +1131,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
>  			    const u8 *mac, union vxlan_addr *ip,
>  			    __u16 state, __u16 flags,
>  			    __be16 port, __be32 src_vni, __be32 vni,
> -			    __u32 ifindex, __u16 ndm_flags,
> +			    __u32 ifindex, __u16 ndm_flags, u32 nhid,
>  			    bool swdev_notify,
>  			    struct netlink_ext_ack *extack)
>  {
> @@ -1028,14 +1147,15 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
>  
>  		return vxlan_fdb_update_existing(vxlan, ip, state, flags, port,
>  						 vni, ifindex, ndm_flags, f,
> -						 swdev_notify, extack);
> +						 nhid, swdev_notify, extack);
>  	} else {
>  		if (!(flags & NLM_F_CREATE))
>  			return -ENOENT;
>  
>  		return vxlan_fdb_update_create(vxlan, mac, ip, state, flags,
>  					       port, src_vni, vni, ifindex,
> -					       ndm_flags, swdev_notify, extack);
> +					       ndm_flags, nhid, swdev_notify,
> +					       extack);
>  	}
>  }
>  
> @@ -1049,7 +1169,7 @@ static void vxlan_fdb_dst_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
>  
>  static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>  			   union vxlan_addr *ip, __be16 *port, __be32 *src_vni,
> -			   __be32 *vni, u32 *ifindex)
> +			   __be32 *vni, u32 *ifindex, u32 *nhid)
>  {
>  	struct net *net = dev_net(vxlan->dev);
>  	int err;
> @@ -1109,6 +1229,11 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>  		*ifindex = 0;
>  	}
>  
> +	if (tb[NDA_NH_ID])
> +		*nhid = nla_get_u32(tb[NDA_NH_ID]);
> +	else
> +		*nhid = 0;
> +
>  	return 0;
>  }
>  
> @@ -1123,7 +1248,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  	union vxlan_addr ip;
>  	__be16 port;
>  	__be32 src_vni, vni;
> -	u32 ifindex;
> +	u32 ifindex, nhid;
>  	u32 hash_index;
>  	int err;
>  
> @@ -1133,10 +1258,11 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  		return -EINVAL;
>  	}
>  
> -	if (tb[NDA_DST] == NULL)
> +	if (!tb || (!tb[NDA_DST] && !tb[NDA_NH_ID]))
>  		return -EINVAL;
>  
> -	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
> +	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> +			      &nhid);
>  	if (err)
>  		return err;
>  
> @@ -1148,7 +1274,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  	err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags,
>  			       port, src_vni, vni, ifindex,
>  			       ndm->ndm_flags | NTF_VXLAN_ADDED_BY_USER,
> -			       true, extack);
> +			       nhid, true, extack);
>  	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  
>  	return err;
> @@ -1159,8 +1285,8 @@ static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
>  			      __be16 port, __be32 src_vni, __be32 vni,
>  			      u32 ifindex, bool swdev_notify)
>  {
> -	struct vxlan_fdb *f;
>  	struct vxlan_rdst *rd = NULL;
> +	struct vxlan_fdb *f;
>  	int err = -ENOENT;
>  
>  	f = vxlan_find_mac(vxlan, addr, src_vni);
> @@ -1195,12 +1321,13 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>  	struct vxlan_dev *vxlan = netdev_priv(dev);
>  	union vxlan_addr ip;
>  	__be32 src_vni, vni;
> -	__be16 port;
> -	u32 ifindex;
> +	u32 ifindex, nhid;
>  	u32 hash_index;
> +	__be16 port;
>  	int err;
>  
> -	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
> +	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> +			      &nhid);
>  	if (err)
>  		return err;
>  
> @@ -1228,6 +1355,17 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
>  		hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
>  			struct vxlan_rdst *rd;
>  
> +			if (rcu_access_pointer(f->nh)) {
> +				err = vxlan_fdb_info(skb, vxlan, f,
> +						     NETLINK_CB(cb->skb).portid,
> +						     cb->nlh->nlmsg_seq,
> +						     RTM_NEWNEIGH,
> +						     NLM_F_MULTI, NULL);
> +				if (err < 0)
> +					goto out;
> +				continue;
> +			}
> +
>  			list_for_each_entry_rcu(rd, &f->remotes, list) {
>  				if (*idx < cb->args[2])
>  					goto skip;
> @@ -1311,6 +1449,10 @@ static bool vxlan_snoop(struct net_device *dev,
>  		if (f->state & (NUD_PERMANENT | NUD_NOARP))
>  			return true;
>  
> +		/* Don't override an fdb with nexthop with a learnt entry */
> +		if (rcu_access_pointer(f->nh))
> +			return true;
> +
>  		if (net_ratelimit())
>  			netdev_info(dev,
>  				    "%pM migrated from %pIS to %pIS\n",
> @@ -1333,7 +1475,7 @@ static bool vxlan_snoop(struct net_device *dev,
>  					 vxlan->cfg.dst_port,
>  					 vni,
>  					 vxlan->default_dst.remote_vni,
> -					 ifindex, NTF_SELF, true, NULL);
> +					 ifindex, NTF_SELF, 0, true, NULL);
>  		spin_unlock(&vxlan->hash_lock[hash_index]);
>  	}
>  
> @@ -2616,6 +2758,30 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  	kfree_skb(skb);
>  }
>  
> +static void vxlan_xmit_nh(struct sk_buff *skb, struct net_device *dev,
> +			  struct vxlan_fdb *f, __be32 vni, bool did_rsc)
> +{
> +	struct vxlan_rdst nh_rdst;
> +	struct nexthop *nh;
> +	bool do_xmit;
> +	u32 hash;
> +
> +	memset(&nh_rdst, 0, sizeof(struct vxlan_rdst));
> +	hash = skb_get_hash(skb);
> +
> +	rcu_read_lock();
> +	nh = rcu_dereference(f->nh);
> +	if (!nh)

return without rcu_read_unlock() 

> +		return;
> +	do_xmit = vxlan_fdb_nh_path_select(nh, hash, &nh_rdst);
> +	rcu_read_unlock();
> +
> +	if (likely(do_xmit))
> +		vxlan_xmit_one(skb, dev, vni, &nh_rdst, did_rsc);
> +	else
> +		kfree_skb(skb);
> +}
> +
>  /* Transmit local packets over Vxlan
>   *
>   * Outer IP header inherits ECN and DF from inner header.
> @@ -2692,22 +2858,27 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  	}
>  
> -	list_for_each_entry_rcu(rdst, &f->remotes, list) {
> -		struct sk_buff *skb1;
> +	if (rcu_access_pointer(f->nh)) {
> +		vxlan_xmit_nh(skb, dev, f,
> +			      (vni ? : vxlan->default_dst.remote_vni), did_rsc);
> +	} else {
> +		list_for_each_entry_rcu(rdst, &f->remotes, list) {
> +			struct sk_buff *skb1;
>  
> -		if (!fdst) {
> -			fdst = rdst;
> -			continue;
> +			if (!fdst) {
> +				fdst = rdst;
> +				continue;
> +			}
> +			skb1 = skb_clone(skb, GFP_ATOMIC);
> +			if (skb1)
> +				vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
>  		}
> -		skb1 = skb_clone(skb, GFP_ATOMIC);
> -		if (skb1)
> -			vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
> +		if (fdst)
> +			vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
> +		else
> +			kfree_skb(skb);
>  	}
>  
> -	if (fdst)
> -		vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
> -	else
> -		kfree_skb(skb);
>  	return NETDEV_TX_OK;
>  }
>  
> @@ -3615,7 +3786,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
>  				       dst->remote_vni,
>  				       dst->remote_vni,
>  				       dst->remote_ifindex,
> -				       NTF_SELF, &f);
> +				       NTF_SELF, 0, &f, extack);
>  		if (err)
>  			return err;
>  	}
> @@ -4013,7 +4184,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
>  					       vxlan->cfg.dst_port,
>  					       conf.vni, conf.vni,
>  					       conf.remote_ifindex,
> -					       NTF_SELF, true, extack);
> +					       NTF_SELF, 0, true, extack);
>  			if (err) {
>  				spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  				netdev_adjacent_change_abort(dst->remote_dev,
> @@ -4335,7 +4506,7 @@ vxlan_fdb_external_learn_add(struct net_device *dev,
>  			       fdb_info->remote_vni,
>  			       fdb_info->remote_ifindex,
>  			       NTF_USE | NTF_SELF | NTF_EXT_LEARNED,
> -			       false, extack);
> +			       0, false, extack);
>  	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  
>  	return err;
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 04dafc6..d929c98 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -331,6 +331,8 @@ static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh,
>  	struct nexthop *nhp;
>  
>  	nhp = nexthop_select_path(nh, hash);
> +	if (unlikely(!nhp))
> +		return NULL

This fix belongs in the previous patch.

;
>  	nhi = rcu_dereference(nhp->nh_info);
>  	return &nhi->fib_nhc;
>  }
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 373aadc..31bca3e 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -487,4 +487,28 @@ static inline void vxlan_flag_attr_error(int attrtype,
>  #undef VXLAN_FLAG
>  }
>  
> +static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
> +					    int hash,
> +					    struct vxlan_rdst *rdst)
> +{
> +	struct fib_nh_common *nhc;
> +
> +	nhc = nexthop_path_fdb_result(nh, hash);
> +	if (unlikely(!nhc))
> +		return false;
> +
> +	switch (nhc->nhc_gw_family) {
> +	case AF_INET:
> +		rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
> +		rdst->remote_ip.sa.sa_family = AF_INET;
> +		break;
> +	case AF_INET6:
> +		rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
> +		rdst->remote_ip.sa.sa_family = AF_INET6;
> +		break;
> +	}
> +
> +	return true;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
> index cd144e3..eefcda8 100644
> --- a/include/uapi/linux/neighbour.h
> +++ b/include/uapi/linux/neighbour.h
> @@ -29,6 +29,7 @@ enum {
>  	NDA_LINK_NETNSID,
>  	NDA_SRC_VNI,
>  	NDA_PROTOCOL,  /* Originator of entry */
> +	NDA_NH_ID,
>  	__NDA_MAX
>  };
>  
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
  2020-05-19  3:57   ` David Ahern
  2020-05-19 11:13   ` Nikolay Aleksandrov
@ 2020-05-19 11:28   ` Nikolay Aleksandrov
  2020-05-19 12:55     ` kbuild test robot
  2020-05-19 13:31     ` kbuild test robot
  4 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-19 11:28 UTC (permalink / raw)
  To: Roopa Prabhu, dsahern, davem; +Cc: netdev, jiri, idosch, petrm

On 19/05/2020 05:14, Roopa Prabhu wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> Todays vxlan mac fdb entries can point to multiple remote
> ips (rdsts) with the sole purpose of replicating
> broadcast-multicast and unknown unicast packets to those remote ips.
> 
> E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
> load balanced to remote switches (vteps) belonging to the
> same multi-homed ethernet segment (E-VPN multihoming is analogous
> to multi-homed LAG implementations, but with the inter-switch
> peerlink replaced with a vxlan tunnel). In other words it needs
> support for mac ecmp. Furthermore, for faster convergence, E-VPN
> multihoming needs the ability to update fdb ecmp nexthops independent
> of the fdb entries.
> 
> New route nexthop API is perfect for this usecase.
> This patch extends the vxlan fdb code to take a nexthop id
> pointing to an ecmp nexthop group.
> 
> Changes include:
> - New NDA_NH_ID attribute for fdbs
> - Use the newly added fdb nexthop groups
> - makes vxlan rdsts and nexthop handling code mutually
>   exclusive
> - since this is a new use-case and the requirement is for ecmp
> nexthop groups, the fdb add and update path checks that the
> nexthop is really an ecmp nexthop group. This check can be relaxed
> in the future, if we want to introduce replication fdb nexthop groups
> and allow its use in lieu of current rdst lists.
> - fdb update requests with nexthop id's only allowed for existing
> fdb's that have nexthop id's
> - learning will not override an existing fdb entry with nexthop
> group
> - I have wrapped the switchdev offload code around the presence of
> rdst
> - I think there is scope for simplyfing vxlan_xmit_one: Will see
> what I can do before the non-RFC version
> 
> [1] E-VPN RFC https://tools.ietf.org/html/rfc7432
> [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
> [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf
> 
> Includes a nexthop_path_fdb_result NULL check crash fix from Nikolay Aleksandrov
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
>  drivers/net/vxlan.c            | 285 ++++++++++++++++++++++++++++++++---------
>  include/net/nexthop.h          |   2 +
>  include/net/vxlan.h            |  24 ++++
>  include/uapi/linux/neighbour.h |   1 +
>  4 files changed, 255 insertions(+), 57 deletions(-)
> 

Unfortunately I missed a few potential issues in my previous review, more below.

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a5b415f..01933e9 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -26,6 +26,7 @@
>  #include <net/netns/generic.h>
>  #include <net/tun_proto.h>
>  #include <net/vxlan.h>
> +#include <net/nexthop.h>
>  
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ip6_tunnel.h>
> @@ -78,6 +79,8 @@ struct vxlan_fdb {
>  	u16		  state;	/* see ndm_state */
>  	__be32		  vni;
>  	u16		  flags;	/* see ndm_flags and below */
> +	struct list_head  nh_list;
> +	struct nexthop __rcu *nh;
>  };
>  
>  #define NTF_VXLAN_ADDED_BY_USER 0x100
> @@ -174,11 +177,15 @@ static inline struct hlist_head *vs_head(struct net *net, __be16 port)
>   */
>  static inline struct vxlan_rdst *first_remote_rcu(struct vxlan_fdb *fdb)
>  {
> +	if (rcu_access_pointer(fdb->nh))
> +		return NULL;
>  	return list_entry_rcu(fdb->remotes.next, struct vxlan_rdst, list);
>  }
>  
>  static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
>  {
> +	if (rcu_access_pointer(fdb->nh))
> +		return NULL;
>  	return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
>  }
>  
> @@ -251,9 +258,10 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  {
>  	unsigned long now = jiffies;
>  	struct nda_cacheinfo ci;
> +	bool send_ip, send_eth;
>  	struct nlmsghdr *nlh;
> +	struct nexthop *nh;
>  	struct ndmsg *ndm;
> -	bool send_ip, send_eth;
>  
>  	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
>  	if (nlh == NULL)
> @@ -265,15 +273,19 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  	send_eth = send_ip = true;
>  
>  	if (type == RTM_GETNEIGH) {
> -		send_ip = !vxlan_addr_any(&rdst->remote_ip);
> +		if (rdst) {
> +			send_ip = !vxlan_addr_any(&rdst->remote_ip);
> +			ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
> +		} else if (rcu_access_pointer(fdb->nh)) {
> +			ndm->ndm_family = nexthop_get_family(rcu_dereference(fdb->nh));
> +		}
>  		send_eth = !is_zero_ether_addr(fdb->eth_addr);
> -		ndm->ndm_family = send_ip ? rdst->remote_ip.sa.sa_family : AF_INET;
>  	} else
>  		ndm->ndm_family	= AF_BRIDGE;
>  	ndm->ndm_state = fdb->state;
>  	ndm->ndm_ifindex = vxlan->dev->ifindex;
>  	ndm->ndm_flags = fdb->flags;
> -	if (rdst->offloaded)
> +	if (rdst && rdst->offloaded)
>  		ndm->ndm_flags |= NTF_OFFLOADED;
>  	ndm->ndm_type = RTN_UNICAST;
>  
> @@ -284,23 +296,31 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
>  
>  	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
>  		goto nla_put_failure;
> +	nh = rcu_dereference(fdb->nh);


One more thing that just occurred to me, I think you should use rcu_dereference_rtnl() here.
I think this function can be called when holding only rtnl.

> +	if (nh) {
> +		if (nla_put_u32(skb, NDA_NH_ID, nh->id))
> +			goto nla_put_failure;
> +	} else if (rdst) {
> +		if (send_ip && vxlan_nla_put_addr(skb, NDA_DST,
> +						  &rdst->remote_ip))
> +			goto nla_put_failure;
> +
> +		if (rdst->remote_port &&
> +		    rdst->remote_port != vxlan->cfg.dst_port &&
> +		    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
> +			goto nla_put_failure;
> +		if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
> +		    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
> +			goto nla_put_failure;
> +		if (rdst->remote_ifindex &&
> +		    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
> +			goto nla_put_failure;
> +	}
>  
> -	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
> -		goto nla_put_failure;
> -
> -	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
> -	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
> -		goto nla_put_failure;
> -	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
> -	    nla_put_u32(skb, NDA_VNI, be32_to_cpu(rdst->remote_vni)))
> -		goto nla_put_failure;
>  	if ((vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA) && fdb->vni &&
>  	    nla_put_u32(skb, NDA_SRC_VNI,
>  			be32_to_cpu(fdb->vni)))
>  		goto nla_put_failure;
> -	if (rdst->remote_ifindex &&
> -	    nla_put_u32(skb, NDA_IFINDEX, rdst->remote_ifindex))
> -		goto nla_put_failure;
>  
>  	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
>  	ci.ndm_confirmed = 0;
> @@ -401,7 +421,7 @@ static int vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
>  {
>  	int err;
>  
> -	if (swdev_notify) {
> +	if (swdev_notify && rd) {
>  		switch (type) {
>  		case RTM_NEWNEIGH:
>  			err = vxlan_fdb_switchdev_call_notifiers(vxlan, fdb, rd,
> @@ -805,6 +825,8 @@ static struct vxlan_fdb *vxlan_fdb_alloc(const u8 *mac, __u16 state,
>  	f->flags = ndm_flags;
>  	f->updated = f->used = jiffies;
>  	f->vni = src_vni;
> +	f->nh = NULL;
> +	INIT_LIST_HEAD(&f->nh_list);
>  	INIT_LIST_HEAD(&f->remotes);
>  	memcpy(f->eth_addr, mac, ETH_ALEN);
>  
> @@ -819,11 +841,76 @@ static void vxlan_fdb_insert(struct vxlan_dev *vxlan, const u8 *mac,
>  			   vxlan_fdb_head(vxlan, mac, src_vni));
>  }
>  
> +static int vxlan_fdb_nh_update(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
> +			       u32 nhid, struct netlink_ext_ack *extack)
> +{
> +	struct nexthop *old_nh = rtnl_dereference(fdb->nh);
> +	struct nh_group *nhg;
> +	struct nexthop *nh;
> +	int err = -EINVAL;
> +
> +	if (old_nh && old_nh->id == nhid)
> +		return 0;
> +
> +	nh = nexthop_find_by_id(vxlan->net, nhid);
> +	if (!nh) {
> +		NL_SET_ERR_MSG(extack, "Nexthop id does not exist");
> +		goto err_inval;
> +	}
> +
> +	if (nh) {
> +		if (!nexthop_get(nh)) {
> +			NL_SET_ERR_MSG(extack, "Nexthop has been deleted");
> +			nh = NULL;
> +			goto err_inval;
> +		}
> +		if (!nh->is_fdb_nh) {
> +			NL_SET_ERR_MSG(extack, "Nexthop is not a fdb nexthop");
> +			goto err_inval;
> +		}
> +
> +		if (!nh->is_group || !nh->nh_grp->mpath) {
> +			NL_SET_ERR_MSG(extack, "Nexthop is not a multipath group");
> +			goto err_inval;
> +		}
> +
> +		/* check nexthop group family */
> +		nhg = rtnl_dereference(nh->nh_grp);
> +		switch (vxlan->default_dst.remote_ip.sa.sa_family) {
> +		case AF_INET:
> +			if (!nhg->has_v4) {
> +				err = -EAFNOSUPPORT;
> +				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
> +				goto err_inval;
> +			}
> +			break;
> +		case AF_INET6:
> +			if (nhg->has_v4) {
> +				err = -EAFNOSUPPORT;
> +				NL_SET_ERR_MSG(extack, "Nexthop group family not supported");
> +				goto err_inval;
> +			}
> +		}
> +	}
> +
> +	rcu_assign_pointer(fdb->nh, nh);
> +	list_add_tail_rcu(&fdb->nh_list, &nh->fdb_list);
> +	if (old_nh)
> +		nexthop_put(old_nh);

Isn't old_nh in that list as well ?

> +	return 0;
> +
> +err_inval:
> +	if (nh)
> +		nexthop_put(nh);
> +	return err;
> +}
> +
>  static int vxlan_fdb_create(struct vxlan_dev *vxlan,
>  			    const u8 *mac, union vxlan_addr *ip,
>  			    __u16 state, __be16 port, __be32 src_vni,
>  			    __be32 vni, __u32 ifindex, __u16 ndm_flags,
> -			    struct vxlan_fdb **fdb)
> +			    u32 nhid, struct vxlan_fdb **fdb,
> +			    struct netlink_ext_ack *extack)
>  {
>  	struct vxlan_rdst *rd = NULL;
>  	struct vxlan_fdb *f;
> @@ -838,20 +925,33 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
>  	if (!f)
>  		return -ENOMEM;
>  
> -	rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
> -	if (rc < 0) {
> -		kfree(f);
> -		return rc;
> -	}
> +	if (nhid)
> +		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
> +	else
> +		rc = vxlan_fdb_append(f, ip, port, vni, ifindex, &rd);
> +	if (rc < 0)
> +		goto errout;
>  
>  	*fdb = f;
>  
>  	return 0;
> +
> +errout:
> +	kfree(f);
> +	return rc;
>  }
>  
>  static void __vxlan_fdb_free(struct vxlan_fdb *f)
>  {
>  	struct vxlan_rdst *rd, *nd;
> +	struct nexthop *nh;
> +
> +	nh = rcu_dereference_raw(f->nh);
> +	if (nh) {
> +		rcu_assign_pointer(f->nh, NULL);
> +		list_del_rcu(&f->nh_list);
> +		nexthop_put(nh);
> +	}
>  
>  	list_for_each_entry_safe(rd, nd, &f->remotes, list) {
>  		dst_cache_destroy(&rd->dst_cache);
> @@ -875,10 +975,15 @@ static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
>  	netdev_dbg(vxlan->dev, "delete %pM\n", f->eth_addr);
>  
>  	--vxlan->addrcnt;
> -	if (do_notify)
> -		list_for_each_entry(rd, &f->remotes, list)
> -			vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
> +	if (do_notify) {
> +		if (rcu_access_pointer(f->nh))
> +			vxlan_fdb_notify(vxlan, f, NULL, RTM_DELNEIGH,
>  					 swdev_notify, NULL);
> +		else
> +			list_for_each_entry(rd, &f->remotes, list)
> +				vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH,
> +						 swdev_notify, NULL);
> +	}
>  
>  	hlist_del_rcu(&f->hlist);
>  	call_rcu(&f->rcu, vxlan_fdb_free);
> @@ -897,7 +1002,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  				     __u16 state, __u16 flags,
>  				     __be16 port, __be32 vni,
>  				     __u32 ifindex, __u16 ndm_flags,
> -				     struct vxlan_fdb *f,
> +				     struct vxlan_fdb *f, u32 nhid,
>  				     bool swdev_notify,
>  				     struct netlink_ext_ack *extack)
>  {
> @@ -908,6 +1013,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  	int rc = 0;
>  	int err;
>  
> +	if (nhid && !rcu_access_pointer(f->nh)) {
> +		NL_SET_ERR_MSG(extack,
> +			       "Cannot replace an existing non nexthop fdb with a nexthop");
> +		return -EOPNOTSUPP;
> +	}
> +
>  	/* Do not allow an externally learned entry to take over an entry added
>  	 * by the user.
>  	 */
> @@ -925,6 +1036,14 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
>  		}
>  	}
>  
> +	if (nhid) {
> +		rc = vxlan_fdb_nh_update(vxlan, f, nhid, extack);
> +		if (rc < 0)
> +			return rc;
> +		notify = 1;
> +		f->updated = jiffies;
> +	}
> +
>  	if ((flags & NLM_F_REPLACE)) {
>  		/* Only change unicasts */
>  		if (!(is_multicast_ether_addr(f->eth_addr) ||
> @@ -975,7 +1094,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>  				   const u8 *mac, union vxlan_addr *ip,
>  				   __u16 state, __u16 flags,
>  				   __be16 port, __be32 src_vni, __be32 vni,
> -				   __u32 ifindex, __u16 ndm_flags,
> +				   __u32 ifindex, __u16 ndm_flags, u32 nhid,
>  				   bool swdev_notify,
>  				   struct netlink_ext_ack *extack)
>  {
> @@ -990,7 +1109,7 @@ static int vxlan_fdb_update_create(struct vxlan_dev *vxlan,
>  
>  	netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
>  	rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
> -			      vni, ifindex, fdb_flags, &f);
> +			      vni, ifindex, fdb_flags, nhid, &f, extack);
>  	if (rc < 0)
>  		return rc;
>  
> @@ -1012,7 +1131,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
>  			    const u8 *mac, union vxlan_addr *ip,
>  			    __u16 state, __u16 flags,
>  			    __be16 port, __be32 src_vni, __be32 vni,
> -			    __u32 ifindex, __u16 ndm_flags,
> +			    __u32 ifindex, __u16 ndm_flags, u32 nhid,
>  			    bool swdev_notify,
>  			    struct netlink_ext_ack *extack)
>  {
> @@ -1028,14 +1147,15 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
>  
>  		return vxlan_fdb_update_existing(vxlan, ip, state, flags, port,
>  						 vni, ifindex, ndm_flags, f,
> -						 swdev_notify, extack);
> +						 nhid, swdev_notify, extack);
>  	} else {
>  		if (!(flags & NLM_F_CREATE))
>  			return -ENOENT;
>  
>  		return vxlan_fdb_update_create(vxlan, mac, ip, state, flags,
>  					       port, src_vni, vni, ifindex,
> -					       ndm_flags, swdev_notify, extack);
> +					       ndm_flags, nhid, swdev_notify,
> +					       extack);
>  	}
>  }
>  
> @@ -1049,7 +1169,7 @@ static void vxlan_fdb_dst_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
>  
>  static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>  			   union vxlan_addr *ip, __be16 *port, __be32 *src_vni,
> -			   __be32 *vni, u32 *ifindex)
> +			   __be32 *vni, u32 *ifindex, u32 *nhid)
>  {
>  	struct net *net = dev_net(vxlan->dev);
>  	int err;
> @@ -1109,6 +1229,11 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>  		*ifindex = 0;
>  	}
>  
> +	if (tb[NDA_NH_ID])
> +		*nhid = nla_get_u32(tb[NDA_NH_ID]);
> +	else
> +		*nhid = 0;
> +
>  	return 0;
>  }
>  
> @@ -1123,7 +1248,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  	union vxlan_addr ip;
>  	__be16 port;
>  	__be32 src_vni, vni;
> -	u32 ifindex;
> +	u32 ifindex, nhid;
>  	u32 hash_index;
>  	int err;
>  
> @@ -1133,10 +1258,11 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  		return -EINVAL;
>  	}
>  
> -	if (tb[NDA_DST] == NULL)
> +	if (!tb || (!tb[NDA_DST] && !tb[NDA_NH_ID]))
>  		return -EINVAL;
>  
> -	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
> +	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> +			      &nhid);
>  	if (err)
>  		return err;
>  
> @@ -1148,7 +1274,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
>  	err = vxlan_fdb_update(vxlan, addr, &ip, ndm->ndm_state, flags,
>  			       port, src_vni, vni, ifindex,
>  			       ndm->ndm_flags | NTF_VXLAN_ADDED_BY_USER,
> -			       true, extack);
> +			       nhid, true, extack);
>  	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  
>  	return err;
> @@ -1159,8 +1285,8 @@ static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
>  			      __be16 port, __be32 src_vni, __be32 vni,
>  			      u32 ifindex, bool swdev_notify)
>  {
> -	struct vxlan_fdb *f;
>  	struct vxlan_rdst *rd = NULL;
> +	struct vxlan_fdb *f;
>  	int err = -ENOENT;
>  
>  	f = vxlan_find_mac(vxlan, addr, src_vni);
> @@ -1195,12 +1321,13 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>  	struct vxlan_dev *vxlan = netdev_priv(dev);
>  	union vxlan_addr ip;
>  	__be32 src_vni, vni;
> -	__be16 port;
> -	u32 ifindex;
> +	u32 ifindex, nhid;
>  	u32 hash_index;
> +	__be16 port;
>  	int err;
>  
> -	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex);
> +	err = vxlan_fdb_parse(tb, vxlan, &ip, &port, &src_vni, &vni, &ifindex,
> +			      &nhid);
>  	if (err)
>  		return err;
>  
> @@ -1228,6 +1355,17 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
>  		hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
>  			struct vxlan_rdst *rd;
>  
> +			if (rcu_access_pointer(f->nh)) {
> +				err = vxlan_fdb_info(skb, vxlan, f,
> +						     NETLINK_CB(cb->skb).portid,
> +						     cb->nlh->nlmsg_seq,
> +						     RTM_NEWNEIGH,
> +						     NLM_F_MULTI, NULL);
> +				if (err < 0)
> +					goto out;
> +				continue;
> +			}
> +
>  			list_for_each_entry_rcu(rd, &f->remotes, list) {
>  				if (*idx < cb->args[2])
>  					goto skip;
> @@ -1311,6 +1449,10 @@ static bool vxlan_snoop(struct net_device *dev,
>  		if (f->state & (NUD_PERMANENT | NUD_NOARP))
>  			return true;
>  
> +		/* Don't override an fdb with nexthop with a learnt entry */
> +		if (rcu_access_pointer(f->nh))
> +			return true;
> +
>  		if (net_ratelimit())
>  			netdev_info(dev,
>  				    "%pM migrated from %pIS to %pIS\n",
> @@ -1333,7 +1475,7 @@ static bool vxlan_snoop(struct net_device *dev,
>  					 vxlan->cfg.dst_port,
>  					 vni,
>  					 vxlan->default_dst.remote_vni,
> -					 ifindex, NTF_SELF, true, NULL);
> +					 ifindex, NTF_SELF, 0, true, NULL);
>  		spin_unlock(&vxlan->hash_lock[hash_index]);
>  	}
>  
> @@ -2616,6 +2758,30 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  	kfree_skb(skb);
>  }
>  
> +static void vxlan_xmit_nh(struct sk_buff *skb, struct net_device *dev,
> +			  struct vxlan_fdb *f, __be32 vni, bool did_rsc)
> +{
> +	struct vxlan_rdst nh_rdst;
> +	struct nexthop *nh;
> +	bool do_xmit;
> +	u32 hash;
> +
> +	memset(&nh_rdst, 0, sizeof(struct vxlan_rdst));
> +	hash = skb_get_hash(skb);
> +
> +	rcu_read_lock();
> +	nh = rcu_dereference(f->nh);
> +	if (!nh)
> +		return;
> +	do_xmit = vxlan_fdb_nh_path_select(nh, hash, &nh_rdst);
> +	rcu_read_unlock();
> +
> +	if (likely(do_xmit))
> +		vxlan_xmit_one(skb, dev, vni, &nh_rdst, did_rsc);
> +	else
> +		kfree_skb(skb);
> +}
> +
>  /* Transmit local packets over Vxlan
>   *
>   * Outer IP header inherits ECN and DF from inner header.
> @@ -2692,22 +2858,27 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  	}
>  
> -	list_for_each_entry_rcu(rdst, &f->remotes, list) {
> -		struct sk_buff *skb1;
> +	if (rcu_access_pointer(f->nh)) {
> +		vxlan_xmit_nh(skb, dev, f,
> +			      (vni ? : vxlan->default_dst.remote_vni), did_rsc);
> +	} else {
> +		list_for_each_entry_rcu(rdst, &f->remotes, list) {
> +			struct sk_buff *skb1;
>  
> -		if (!fdst) {
> -			fdst = rdst;
> -			continue;
> +			if (!fdst) {
> +				fdst = rdst;
> +				continue;
> +			}
> +			skb1 = skb_clone(skb, GFP_ATOMIC);
> +			if (skb1)
> +				vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
>  		}
> -		skb1 = skb_clone(skb, GFP_ATOMIC);
> -		if (skb1)
> -			vxlan_xmit_one(skb1, dev, vni, rdst, did_rsc);
> +		if (fdst)
> +			vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
> +		else
> +			kfree_skb(skb);
>  	}
>  
> -	if (fdst)
> -		vxlan_xmit_one(skb, dev, vni, fdst, did_rsc);
> -	else
> -		kfree_skb(skb);
>  	return NETDEV_TX_OK;
>  }
>  
> @@ -3615,7 +3786,7 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev,
>  				       dst->remote_vni,
>  				       dst->remote_vni,
>  				       dst->remote_ifindex,
> -				       NTF_SELF, &f);
> +				       NTF_SELF, 0, &f, extack);
>  		if (err)
>  			return err;
>  	}
> @@ -4013,7 +4184,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
>  					       vxlan->cfg.dst_port,
>  					       conf.vni, conf.vni,
>  					       conf.remote_ifindex,
> -					       NTF_SELF, true, extack);
> +					       NTF_SELF, 0, true, extack);
>  			if (err) {
>  				spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  				netdev_adjacent_change_abort(dst->remote_dev,
> @@ -4335,7 +4506,7 @@ vxlan_fdb_external_learn_add(struct net_device *dev,
>  			       fdb_info->remote_vni,
>  			       fdb_info->remote_ifindex,
>  			       NTF_USE | NTF_SELF | NTF_EXT_LEARNED,
> -			       false, extack);
> +			       0, false, extack);
>  	spin_unlock_bh(&vxlan->hash_lock[hash_index]);
>  
>  	return err;
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 04dafc6..d929c98 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -331,6 +331,8 @@ static inline struct fib_nh_common *nexthop_path_fdb_result(struct nexthop *nh,
>  	struct nexthop *nhp;
>  
>  	nhp = nexthop_select_path(nh, hash);
> +	if (unlikely(!nhp))
> +		return NULL;
>  	nhi = rcu_dereference(nhp->nh_info);
>  	return &nhi->fib_nhc;
>  }
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 373aadc..31bca3e 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -487,4 +487,28 @@ static inline void vxlan_flag_attr_error(int attrtype,
>  #undef VXLAN_FLAG
>  }
>  
> +static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
> +					    int hash,
> +					    struct vxlan_rdst *rdst)
> +{
> +	struct fib_nh_common *nhc;
> +
> +	nhc = nexthop_path_fdb_result(nh, hash);
> +	if (unlikely(!nhc))
> +		return false;
> +
> +	switch (nhc->nhc_gw_family) {
> +	case AF_INET:
> +		rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
> +		rdst->remote_ip.sa.sa_family = AF_INET;
> +		break;
> +	case AF_INET6:
> +		rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
> +		rdst->remote_ip.sa.sa_family = AF_INET6;
> +		break;
> +	}
> +
> +	return true;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
> index cd144e3..eefcda8 100644
> --- a/include/uapi/linux/neighbour.h
> +++ b/include/uapi/linux/neighbour.h
> @@ -29,6 +29,7 @@ enum {
>  	NDA_LINK_NETNSID,
>  	NDA_SRC_VNI,
>  	NDA_PROTOCOL,  /* Originator of entry */
> +	NDA_NH_ID,
>  	__NDA_MAX
>  };
>  
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
@ 2020-05-19 12:55     ` kbuild test robot
  2020-05-19 11:13   ` Nikolay Aleksandrov
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2020-05-19 12:55 UTC (permalink / raw)
  To: Roopa Prabhu, dsahern, davem
  Cc: kbuild-all, netdev, nikolay, jiri, idosch, petrm

[-- Attachment #1: Type: text/plain, Size: 2685 bytes --]

Hi Roopa,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on net/master kselftest/next sparc-next/master linus/master v5.7-rc6 next-20200518]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Roopa-Prabhu/Support-for-fdb-ECMP-nexthop-groups/20200519-185605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 5cdfe8306631b2224e3f81fc5a1e2721c7a1948b
config: sh-polaris_defconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

In file included from net/ipv4/ip_tunnel_core.c:38:
include/net/vxlan.h: In function 'vxlan_fdb_nh_path_select':
>> include/net/vxlan.h:496:8: error: implicit declaration of function 'nexthop_path_fdb_result' [-Werror=implicit-function-declaration]
496 |  nhc = nexthop_path_fdb_result(nh, hash);
|        ^~~~~~~~~~~~~~~~~~~~~~~
>> include/net/vxlan.h:496:6: warning: assignment to 'struct fib_nh_common *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
496 |  nhc = nexthop_path_fdb_result(nh, hash);
|      ^
cc1: some warnings being treated as errors

vim +/nexthop_path_fdb_result +496 include/net/vxlan.h

   489	
   490	static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
   491						    int hash,
   492						    struct vxlan_rdst *rdst)
   493	{
   494		struct fib_nh_common *nhc;
   495	
 > 496		nhc = nexthop_path_fdb_result(nh, hash);
   497		if (unlikely(!nhc))
   498			return false;
   499	
   500		switch (nhc->nhc_gw_family) {
   501		case AF_INET:
   502			rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
   503			rdst->remote_ip.sa.sa_family = AF_INET;
   504			break;
   505		case AF_INET6:
   506			rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
   507			rdst->remote_ip.sa.sa_family = AF_INET6;
   508			break;
   509		}
   510	
   511		return true;
   512	}
   513	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 10869 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
@ 2020-05-19 12:55     ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2020-05-19 12:55 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2753 bytes --]

Hi Roopa,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on net/master kselftest/next sparc-next/master linus/master v5.7-rc6 next-20200518]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Roopa-Prabhu/Support-for-fdb-ECMP-nexthop-groups/20200519-185605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 5cdfe8306631b2224e3f81fc5a1e2721c7a1948b
config: sh-polaris_defconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

In file included from net/ipv4/ip_tunnel_core.c:38:
include/net/vxlan.h: In function 'vxlan_fdb_nh_path_select':
>> include/net/vxlan.h:496:8: error: implicit declaration of function 'nexthop_path_fdb_result' [-Werror=implicit-function-declaration]
496 |  nhc = nexthop_path_fdb_result(nh, hash);
|        ^~~~~~~~~~~~~~~~~~~~~~~
>> include/net/vxlan.h:496:6: warning: assignment to 'struct fib_nh_common *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
496 |  nhc = nexthop_path_fdb_result(nh, hash);
|      ^
cc1: some warnings being treated as errors

vim +/nexthop_path_fdb_result +496 include/net/vxlan.h

   489	
   490	static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
   491						    int hash,
   492						    struct vxlan_rdst *rdst)
   493	{
   494		struct fib_nh_common *nhc;
   495	
 > 496		nhc = nexthop_path_fdb_result(nh, hash);
   497		if (unlikely(!nhc))
   498			return false;
   499	
   500		switch (nhc->nhc_gw_family) {
   501		case AF_INET:
   502			rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
   503			rdst->remote_ip.sa.sa_family = AF_INET;
   504			break;
   505		case AF_INET6:
   506			rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
   507			rdst->remote_ip.sa.sa_family = AF_INET6;
   508			break;
   509		}
   510	
   511		return true;
   512	}
   513	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 10869 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
  2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
@ 2020-05-19 13:31     ` kbuild test robot
  2020-05-19 11:13   ` Nikolay Aleksandrov
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2020-05-19 13:31 UTC (permalink / raw)
  To: Roopa Prabhu, dsahern, davem
  Cc: kbuild-all, netdev, nikolay, jiri, idosch, petrm

[-- Attachment #1: Type: text/plain, Size: 4341 bytes --]

Hi Roopa,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on net/master kselftest/next sparc-next/master linus/master v5.7-rc6 next-20200518]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Roopa-Prabhu/Support-for-fdb-ECMP-nexthop-groups/20200519-185605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 5cdfe8306631b2224e3f81fc5a1e2721c7a1948b
config: um-defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
In file included from include/linux/kernel.h:11:0,
from net/ipv4/ip_tunnel_core.c:9:
include/asm-generic/fixmap.h: In function 'fix_to_virt':
include/asm-generic/fixmap.h:32:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition))                 ^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/asm-generic/fixmap.h:32:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
^~~~~~~~~~~~
In file included from include/linux/uaccess.h:11:0,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:34,
from include/linux/huge_mm.h:8,
from include/linux/mm.h:679,
from include/linux/bvec.h:13,
from include/linux/skbuff.h:17,
from net/ipv4/ip_tunnel_core.c:10:
arch/um/include/asm/uaccess.h: In function '__access_ok':
arch/um/include/asm/uaccess.h:17:29: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
(((unsigned long) (addr) >= FIXADDR_USER_START) &&                                 ^
arch/um/include/asm/uaccess.h:45:3: note: in expansion of macro '__access_ok_vsyscall'
__access_ok_vsyscall(addr, size) ||
^~~~~~~~~~~~~~~~~~~~
In file included from net/ipv4/ip_tunnel_core.c:38:0:
include/net/vxlan.h: In function 'vxlan_fdb_nh_path_select':
include/net/vxlan.h:496:8: error: implicit declaration of function 'nexthop_path_fdb_result' [-Werror=implicit-function-declaration]
nhc = nexthop_path_fdb_result(nh, hash);
^~~~~~~~~~~~~~~~~~~~~~~
>> include/net/vxlan.h:496:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
nhc = nexthop_path_fdb_result(nh, hash);
^
cc1: some warnings being treated as errors

vim +496 include/net/vxlan.h

   489	
   490	static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
   491						    int hash,
   492						    struct vxlan_rdst *rdst)
   493	{
   494		struct fib_nh_common *nhc;
   495	
 > 496		nhc = nexthop_path_fdb_result(nh, hash);
   497		if (unlikely(!nhc))
   498			return false;
   499	
   500		switch (nhc->nhc_gw_family) {
   501		case AF_INET:
   502			rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
   503			rdst->remote_ip.sa.sa_family = AF_INET;
   504			break;
   505		case AF_INET6:
   506			rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
   507			rdst->remote_ip.sa.sa_family = AF_INET6;
   508			break;
   509		}
   510	
   511		return true;
   512	}
   513	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8489 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries
@ 2020-05-19 13:31     ` kbuild test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2020-05-19 13:31 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4445 bytes --]

Hi Roopa,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on net/master kselftest/next sparc-next/master linus/master v5.7-rc6 next-20200518]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Roopa-Prabhu/Support-for-fdb-ECMP-nexthop-groups/20200519-185605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 5cdfe8306631b2224e3f81fc5a1e2721c7a1948b
config: um-defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
In file included from include/linux/kernel.h:11:0,
from net/ipv4/ip_tunnel_core.c:9:
include/asm-generic/fixmap.h: In function 'fix_to_virt':
include/asm-generic/fixmap.h:32:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
^
include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
if (!(condition))                 ^~~~~~~~~
include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/asm-generic/fixmap.h:32:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
^~~~~~~~~~~~
In file included from include/linux/uaccess.h:11:0,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:34,
from include/linux/huge_mm.h:8,
from include/linux/mm.h:679,
from include/linux/bvec.h:13,
from include/linux/skbuff.h:17,
from net/ipv4/ip_tunnel_core.c:10:
arch/um/include/asm/uaccess.h: In function '__access_ok':
arch/um/include/asm/uaccess.h:17:29: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
(((unsigned long) (addr) >= FIXADDR_USER_START) &&                                 ^
arch/um/include/asm/uaccess.h:45:3: note: in expansion of macro '__access_ok_vsyscall'
__access_ok_vsyscall(addr, size) ||
^~~~~~~~~~~~~~~~~~~~
In file included from net/ipv4/ip_tunnel_core.c:38:0:
include/net/vxlan.h: In function 'vxlan_fdb_nh_path_select':
include/net/vxlan.h:496:8: error: implicit declaration of function 'nexthop_path_fdb_result' [-Werror=implicit-function-declaration]
nhc = nexthop_path_fdb_result(nh, hash);
^~~~~~~~~~~~~~~~~~~~~~~
>> include/net/vxlan.h:496:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
nhc = nexthop_path_fdb_result(nh, hash);
^
cc1: some warnings being treated as errors

vim +496 include/net/vxlan.h

   489	
   490	static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
   491						    int hash,
   492						    struct vxlan_rdst *rdst)
   493	{
   494		struct fib_nh_common *nhc;
   495	
 > 496		nhc = nexthop_path_fdb_result(nh, hash);
   497		if (unlikely(!nhc))
   498			return false;
   499	
   500		switch (nhc->nhc_gw_family) {
   501		case AF_INET:
   502			rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
   503			rdst->remote_ip.sa.sa_family = AF_INET;
   504			break;
   505		case AF_INET6:
   506			rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
   507			rdst->remote_ip.sa.sa_family = AF_INET6;
   508			break;
   509		}
   510	
   511		return true;
   512	}
   513	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 8489 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops
       [not found]       ` <CAJieiUgHqYo1UZ2VKHK=hTTLZjkScYisdRJ0be0kjtj6c-DRYA@mail.gmail.com>
@ 2020-05-19 17:07         ` David Ahern
  0 siblings, 0 replies; 20+ messages in thread
From: David Ahern @ 2020-05-19 17:07 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, netdev, Nikolay Aleksandrov, Jiri Pirko,
	Ido Schimmel, Petr Machata

On 5/19/20 11:02 AM, Roopa Prabhu wrote:
> What are the rules here.... does neighbor code only check for NDA_NH_ID
> because the strict start type is set to NDA_NH_ID ?
> 

Lack of checking of unused attributes leads to the kernel silently
ignoring userspace data. From top of memory that has a few problems:

1. does the kernel support feature A (lack of probing for a feature),

2. kernel not implementing what the user requested and not telling the user,

3. impacting the ability to start using feature A in that code at some
point in the future (it was ignored in the past and garbage was passed
in, now suddenly that garbage is acted on).

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-05-19 17:07 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19  2:14 [PATCH net-next 0/6] Support for fdb ECMP nexthop groups Roopa Prabhu
2020-05-19  2:14 ` [PATCH net-next 1/6] nexthop: dereference nh only once in nexthop_select_path Roopa Prabhu
2020-05-19  3:25   ` David Ahern
2020-05-19  8:48     ` Nikolay Aleksandrov
2020-05-19 10:04       ` Nikolay Aleksandrov
2020-05-19  2:14 ` [PATCH net-next 2/6] nexthop: support for fdb ecmp nexthops Roopa Prabhu
2020-05-19  3:53   ` David Ahern
     [not found]     ` <CAJieiUjXO6h9HtwTn3fv7W=WovyUxzU2+EZ_Off6kxxRfgyUKQ@mail.gmail.com>
     [not found]       ` <CAJieiUgHqYo1UZ2VKHK=hTTLZjkScYisdRJ0be0kjtj6c-DRYA@mail.gmail.com>
2020-05-19 17:07         ` David Ahern
2020-05-19  2:14 ` [PATCH net-next 3/6] vxlan: ecmp support for mac fdb entries Roopa Prabhu
2020-05-19  3:57   ` David Ahern
2020-05-19 11:13   ` Nikolay Aleksandrov
2020-05-19 11:28   ` Nikolay Aleksandrov
2020-05-19 12:55   ` kbuild test robot
2020-05-19 12:55     ` kbuild test robot
2020-05-19 13:31   ` kbuild test robot
2020-05-19 13:31     ` kbuild test robot
2020-05-19  2:14 ` [PATCH net-next 4/6] nexthop: add support for notifiers Roopa Prabhu
2020-05-19  2:14 ` [PATCH net-next 5/6] vxlan: support for nexthop notifiers Roopa Prabhu
2020-05-19  2:14 ` [PATCH net-next 6/6] selftests: net: add fdb nexthop tests Roopa Prabhu
2020-05-19  4:00   ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.