All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups
@ 2020-05-26 15:01 David Ahern
  2020-05-26 15:01 ` [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry David Ahern
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: David Ahern <dahern@digitalocean.com>

Nik's torture tests have exposed 2 fundamental mistakes with the initial
nexthop code for groups. First, the nexthops entries and num_nh in the
nh_grp struct should not be modified once the struct is set under rcu.
Doing so has major affects on the datapath seeing valid nexthop entries.

Second, the helpers in the header file were convenient for not repeating
code, but they cause datapath walks to potentially see 2 different group
structs after an rcu replace, disrupting a walk of the path objects.
This second problem applies solely to IPv4 as I re-used too much of the
existing code in walking legs of a multipath route.

Patches 1 is refactoring change to simplify the overhead of reviewing and
understanding the change in patch 2 which fixes the update of nexthop
groups when a compnent leg is removed.

Patches 3-5 address the second problem. Patch 3 inlines the multipath
check such that the mpath lookup and subsequent calls all use the same
nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
with iterative calls to fib_info_nhc.

fib_info_num_path can be used in control plane path in a 'for loop' with
subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
only changed while holding the rtnl; the combination can not be used in
the data plane with external nexthops as it involves repeated dereferences
of nh_grp struct which can change between calls.

Similarly, nexthop_is_multipath can be used for branching decisions in
the datapath since the nexthop type can not be changed (a group can not
be converted to standalone and vice versa).

Patch set developed in coordination with Nikolay Aleksandrov. He did a
lot of work creating a good reproducer, discussing options to fix it
and testing iterations.

I have adapted Nik's commands into additional tests in the nexthops
selftest script which I will send against -next.

David Ahern (4):
  nexthops: Move code from remove_nexthop_from_groups to
    remove_nh_grp_entry
  nexthop: Expand nexthop_is_multipath in a few places
  ipv4: Refactor nhc evaluation in fib_table_lookup
  ipv4: nexthop version of fib_info_nh_uses_dev

Nikolay Aleksandrov (1):
  nexthops: don't modify published nexthop groups

 include/net/ip_fib.h    |  12 +++++
 include/net/nexthop.h   | 100 ++++++++++++++++++++++++++++++++-------
 net/ipv4/fib_frontend.c |  19 ++++----
 net/ipv4/fib_trie.c     |  51 ++++++++++++++------
 net/ipv4/nexthop.c      | 102 +++++++++++++++++++++++++---------------
 5 files changed, 205 insertions(+), 79 deletions(-)

-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
@ 2020-05-26 15:01 ` David Ahern
  2020-05-26 15:07   ` Nikolay Aleksandrov
  2020-05-26 15:01 ` [PATCH net 2/5] nexthops: don't modify published nexthop groups David Ahern
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: David Ahern <dsahern@gmail.com>

Move nh_grp dereference and check for removing nexthop group due to
all members gone into remove_nh_grp_entry.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/ipv4/nexthop.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 2a31c4af845e..0f68d9801808 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -693,17 +693,21 @@ static void nh_group_rebalance(struct nh_group *nhg)
 	}
 }
 
-static void remove_nh_grp_entry(struct nh_grp_entry *nhge,
-				struct nh_group *nhg,
+static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
 				struct nl_info *nlinfo)
 {
+	struct nexthop *nhp = nhge->nh_parent;
 	struct nexthop *nh = nhge->nh;
 	struct nh_grp_entry *nhges;
+	struct nh_group *nhg;
 	bool found = false;
 	int i;
 
 	WARN_ON(!nh);
 
+	list_del(&nhge->nh_list);
+
+	nhg = rtnl_dereference(nhp->nh_grp);
 	nhges = nhg->nh_entries;
 	for (i = 0; i < nhg->num_nh; ++i) {
 		if (found) {
@@ -727,7 +731,11 @@ static void remove_nh_grp_entry(struct nh_grp_entry *nhge,
 	nexthop_put(nh);
 
 	if (nlinfo)
-		nexthop_notify(RTM_NEWNEXTHOP, nhge->nh_parent, nlinfo);
+		nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
+
+	/* if this group has no more entries then remove it */
+	if (!nhg->num_nh)
+		remove_nexthop(net, nhp, nlinfo);
 }
 
 static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
@@ -735,17 +743,8 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
 {
 	struct nh_grp_entry *nhge, *tmp;
 
-	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list) {
-		struct nh_group *nhg;
-
-		list_del(&nhge->nh_list);
-		nhg = rtnl_dereference(nhge->nh_parent->nh_grp);
-		remove_nh_grp_entry(nhge, nhg, nlinfo);
-
-		/* if this group has no more entries then remove it */
-		if (!nhg->num_nh)
-			remove_nexthop(net, nhge->nh_parent, nlinfo);
-	}
+	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list)
+		remove_nh_grp_entry(net, nhge, nlinfo);
 }
 
 static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 2/5] nexthops: don't modify published nexthop groups
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
  2020-05-26 15:01 ` [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry David Ahern
@ 2020-05-26 15:01 ` David Ahern
  2020-05-26 15:01 ` [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places David Ahern
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

We must avoid modifying published nexthop groups while they might be
in use, otherwise we might see NULL ptr dereferences. In order to do
that we allocate 2 nexthoup group structures upon nexthop creation
and swap between them when we have to delete an entry. The reason is
that we can't fail nexthop group removal, so we can't handle allocation
failure thus we move the extra allocation on creation where we can
safely fail and return ENOMEM.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/nexthop.h |  1 +
 net/ipv4/nexthop.c    | 91 +++++++++++++++++++++++++++----------------
 2 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index c440ccc861fc..8a343519ed7a 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -70,6 +70,7 @@ struct nh_grp_entry {
 };
 
 struct nh_group {
+	struct nh_group		*spare; /* spare group for removals */
 	u16			num_nh;
 	bool			mpath;
 	bool			has_v4;
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 0f68d9801808..a8a399aaa4bc 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -63,9 +63,16 @@ static void nexthop_free_mpath(struct nexthop *nh)
 	int i;
 
 	nhg = rcu_dereference_raw(nh->nh_grp);
-	for (i = 0; i < nhg->num_nh; ++i)
-		WARN_ON(nhg->nh_entries[i].nh);
+	for (i = 0; i < nhg->num_nh; ++i) {
+		struct nh_grp_entry *nhge = &nhg->nh_entries[i];
 
+		WARN_ON(!list_empty(&nhge->nh_list));
+		nexthop_put(nhge->nh);
+	}
+
+	WARN_ON(nhg->spare == nhg);
+
+	kfree(nhg->spare);
 	kfree(nhg);
 }
 
@@ -696,46 +703,53 @@ static void nh_group_rebalance(struct nh_group *nhg)
 static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
 				struct nl_info *nlinfo)
 {
+	struct nh_grp_entry *nhges, *new_nhges;
 	struct nexthop *nhp = nhge->nh_parent;
 	struct nexthop *nh = nhge->nh;
-	struct nh_grp_entry *nhges;
-	struct nh_group *nhg;
-	bool found = false;
-	int i;
+	struct nh_group *nhg, *newg;
+	int i, j;
 
 	WARN_ON(!nh);
 
-	list_del(&nhge->nh_list);
-
 	nhg = rtnl_dereference(nhp->nh_grp);
-	nhges = nhg->nh_entries;
-	for (i = 0; i < nhg->num_nh; ++i) {
-		if (found) {
-			nhges[i-1].nh = nhges[i].nh;
-			nhges[i-1].weight = nhges[i].weight;
-			list_del(&nhges[i].nh_list);
-			list_add(&nhges[i-1].nh_list, &nhges[i-1].nh->grp_list);
-		} else if (nhg->nh_entries[i].nh == nh) {
-			found = true;
-		}
-	}
+	newg = nhg->spare;
 
-	if (WARN_ON(!found))
+	/* last entry, keep it visible and remove the parent */
+	if (nhg->num_nh == 1) {
+		remove_nexthop(net, nhp, nlinfo);
 		return;
+	}
 
-	nhg->num_nh--;
-	nhg->nh_entries[nhg->num_nh].nh = NULL;
+	newg->has_v4 = nhg->has_v4;
+	newg->mpath = nhg->mpath;
+	newg->num_nh = nhg->num_nh;
 
-	nh_group_rebalance(nhg);
+	/* copy old entries to new except the one getting removed */
+	nhges = nhg->nh_entries;
+	new_nhges = newg->nh_entries;
+	for (i = 0, j = 0; i < nhg->num_nh; ++i) {
+		/* current nexthop getting removed */
+		if (nhg->nh_entries[i].nh == nh) {
+			newg->num_nh--;
+			continue;
+		}
 
-	nexthop_put(nh);
+		list_del(&nhges[i].nh_list);
+		new_nhges[j].nh_parent = nhges[i].nh_parent;
+		new_nhges[j].nh = nhges[i].nh;
+		new_nhges[j].weight = nhges[i].weight;
+		list_add(&new_nhges[j].nh_list, &new_nhges[j].nh->grp_list);
+		j++;
+	}
+
+	nh_group_rebalance(newg);
+	rcu_assign_pointer(nhp->nh_grp, newg);
+
+	list_del(&nhge->nh_list);
+	nexthop_put(nhge->nh);
 
 	if (nlinfo)
 		nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
-
-	/* if this group has no more entries then remove it */
-	if (!nhg->num_nh)
-		remove_nexthop(net, nhp, nlinfo);
 }
 
 static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
@@ -745,6 +759,9 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
 
 	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list)
 		remove_nh_grp_entry(net, nhge, nlinfo);
+
+	/* make sure all see the newly published array before releasing rtnl */
+	synchronize_rcu();
 }
 
 static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
@@ -758,10 +775,7 @@ static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
 		if (WARN_ON(!nhge->nh))
 			continue;
 
-		list_del(&nhge->nh_list);
-		nexthop_put(nhge->nh);
-		nhge->nh = NULL;
-		nhg->num_nh--;
+		list_del_init(&nhge->nh_list);
 	}
 }
 
@@ -1084,6 +1098,7 @@ static struct nexthop *nexthop_create_group(struct net *net,
 {
 	struct nlattr *grps_attr = cfg->nh_grp;
 	struct nexthop_grp *entry = nla_data(grps_attr);
+	u16 num_nh = nla_len(grps_attr) / sizeof(*entry);
 	struct nh_group *nhg;
 	struct nexthop *nh;
 	int i;
@@ -1094,12 +1109,21 @@ static struct nexthop *nexthop_create_group(struct net *net,
 
 	nh->is_group = 1;
 
-	nhg = nexthop_grp_alloc(nla_len(grps_attr) / sizeof(*entry));
+	nhg = nexthop_grp_alloc(num_nh);
 	if (!nhg) {
 		kfree(nh);
 		return ERR_PTR(-ENOMEM);
 	}
 
+	/* spare group used for removals */
+	nhg->spare = nexthop_grp_alloc(num_nh);
+	if (!nhg) {
+		kfree(nhg);
+		kfree(nh);
+		return NULL;
+	}
+	nhg->spare->spare = nhg;
+
 	for (i = 0; i < nhg->num_nh; ++i) {
 		struct nexthop *nhe;
 		struct nh_info *nhi;
@@ -1131,6 +1155,7 @@ static struct nexthop *nexthop_create_group(struct net *net,
 	for (; i >= 0; --i)
 		nexthop_put(nhg->nh_entries[i].nh);
 
+	kfree(nhg->spare);
 	kfree(nhg);
 	kfree(nh);
 
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
  2020-05-26 15:01 ` [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry David Ahern
  2020-05-26 15:01 ` [PATCH net 2/5] nexthops: don't modify published nexthop groups David Ahern
@ 2020-05-26 15:01 ` David Ahern
  2020-05-26 15:08   ` Nikolay Aleksandrov
  2020-05-26 15:01 ` [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup David Ahern
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: David Ahern <dsahern@gmail.com>

I got too fancy consolidating checks on multipath type. The result
is that path lookups can access 2 different nh_grp structs as exposed
by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
avoid multiple, nh_grp dereferences and make decisions based on the
consistent struct.

Only 2 places left using nexthop_is_multipath are within IPv6, both
only check that the nexthop is a multipath for a branching decision
which are acceptable.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/nexthop.h | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 8a343519ed7a..f09e8d7d9886 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -137,21 +137,20 @@ static inline unsigned int nexthop_num_path(const struct nexthop *nh)
 {
 	unsigned int rc = 1;
 
-	if (nexthop_is_multipath(nh)) {
+	if (nh->is_group) {
 		struct nh_group *nh_grp;
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
-		rc = nh_grp->num_nh;
+		if (nh_grp->mpath)
+			rc = nh_grp->num_nh;
 	}
 
 	return rc;
 }
 
 static inline
-struct nexthop *nexthop_mpath_select(const struct nexthop *nh, int nhsel)
+struct nexthop *nexthop_mpath_select(const struct nh_group *nhg, int nhsel)
 {
-	const struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp);
-
 	/* for_nexthops macros in fib_semantics.c grabs a pointer to
 	 * the nexthop before checking nhsel
 	 */
@@ -186,12 +185,14 @@ static inline bool nexthop_is_blackhole(const struct nexthop *nh)
 {
 	const struct nh_info *nhi;
 
-	if (nexthop_is_multipath(nh)) {
-		if (nexthop_num_path(nh) > 1)
-			return false;
-		nh = nexthop_mpath_select(nh, 0);
-		if (!nh)
+	if (nh->is_group) {
+		struct nh_group *nh_grp;
+
+		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
+		if (nh_grp->num_nh > 1)
 			return false;
+
+		nh = nh_grp->nh_entries[0].nh;
 	}
 
 	nhi = rcu_dereference_rtnl(nh->nh_info);
@@ -217,10 +218,15 @@ struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
 	BUILD_BUG_ON(offsetof(struct fib_nh, nh_common) != 0);
 	BUILD_BUG_ON(offsetof(struct fib6_nh, nh_common) != 0);
 
-	if (nexthop_is_multipath(nh)) {
-		nh = nexthop_mpath_select(nh, nhsel);
-		if (!nh)
-			return NULL;
+	if (nh->is_group) {
+		struct nh_group *nh_grp;
+
+		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
+		if (nh_grp->mpath) {
+			nh = nexthop_mpath_select(nh_grp, nhsel);
+			if (!nh)
+				return NULL;
+		}
 	}
 
 	nhi = rcu_dereference_rtnl(nh->nh_info);
@@ -264,8 +270,11 @@ static inline struct fib6_nh *nexthop_fib6_nh(struct nexthop *nh)
 {
 	struct nh_info *nhi;
 
-	if (nexthop_is_multipath(nh)) {
-		nh = nexthop_mpath_select(nh, 0);
+	if (nh->is_group) {
+		struct nh_group *nh_grp;
+
+		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
+		nh = nexthop_mpath_select(nh_grp, 0);
 		if (!nh)
 			return NULL;
 	}
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
                   ` (2 preceding siblings ...)
  2020-05-26 15:01 ` [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places David Ahern
@ 2020-05-26 15:01 ` David Ahern
  2020-05-26 15:09   ` Nikolay Aleksandrov
  2020-05-26 15:01 ` [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev David Ahern
  2020-05-26 22:28 ` [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Miller
  5 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: David Ahern <dsahern@gmail.com>

FIB lookups can return an entry that references an external nexthop.
While walking the nexthop struct we do not want to make multiple calls
into the nexthop code which can result in 2 different structs getting
accessed - one returning the number of paths the rest of the loop
seeing a different nh_grp struct. If the nexthop group shrunk, the
result is an attempt to access a fib_nh_common that does not exist for
the new nh_grp struct but did for the old one.

To fix that move the device evaluation code to a helper that can be
used for inline fib_nh path as well as external nexthops.

Update the existing check for fi->nh in fib_table_lookup to call a
new helper, nexthop_get_nhc_lookup, which walks the external nexthop
with a single rcu dereference.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip_fib.h  |  2 ++
 include/net/nexthop.h | 33 ++++++++++++++++++++++++++++
 net/ipv4/fib_trie.c   | 51 ++++++++++++++++++++++++++++++-------------
 3 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 59e0d4e99f94..1f1dd22980e4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -480,6 +480,8 @@ void fib_nh_common_release(struct fib_nh_common *nhc);
 void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri);
 void fib_trie_init(void);
 struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
+bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
+			 const struct flowi4 *flp);
 
 static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
 {
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index f09e8d7d9886..9414ae46fc1c 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -233,6 +233,39 @@ struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
 	return &nhi->fib_nhc;
 }
 
+/* called from fib_table_lookup with rcu_lock */
+static inline
+struct fib_nh_common *nexthop_get_nhc_lookup(const struct nexthop *nh,
+					     int fib_flags,
+					     const struct flowi4 *flp,
+					     int *nhsel)
+{
+	struct nh_info *nhi;
+
+	if (nh->is_group) {
+		struct nh_group *nhg = rcu_dereference(nh->nh_grp);
+		int i;
+
+		for (i = 0; i < nhg->num_nh; i++) {
+			struct nexthop *nhe = nhg->nh_entries[i].nh;
+
+			nhi = rcu_dereference(nhe->nh_info);
+			if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) {
+				*nhsel = i;
+				return &nhi->fib_nhc;
+			}
+		}
+	} else {
+		nhi = rcu_dereference(nh->nh_info);
+		if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) {
+			*nhsel = 0;
+			return &nhi->fib_nhc;
+		}
+	}
+
+	return NULL;
+}
+
 static inline unsigned int fib_info_num_path(const struct fib_info *fi)
 {
 	if (unlikely(fi->nh))
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4f334b425538..248f1c1959a6 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1371,6 +1371,26 @@ static inline t_key prefix_mismatch(t_key key, struct key_vector *n)
 	return (key ^ prefix) & (prefix | -prefix);
 }
 
+bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
+			 const struct flowi4 *flp)
+{
+	if (nhc->nhc_flags & RTNH_F_DEAD)
+		return false;
+
+	if (ip_ignore_linkdown(nhc->nhc_dev) &&
+	    nhc->nhc_flags & RTNH_F_LINKDOWN &&
+	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
+		return false;
+
+	if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
+		if (flp->flowi4_oif &&
+		    flp->flowi4_oif != nhc->nhc_oif)
+			return false;
+	}
+
+	return true;
+}
+
 /* should be called with rcu_read_lock */
 int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		     struct fib_result *res, int fib_flags)
@@ -1503,6 +1523,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 	/* Step 3: Process the leaf, if that fails fall back to backtracing */
 	hlist_for_each_entry_rcu(fa, &n->leaf, fa_list) {
 		struct fib_info *fi = fa->fa_info;
+		struct fib_nh_common *nhc;
 		int nhsel, err;
 
 		if ((BITS_PER_LONG > KEYLENGTH) || (fa->fa_slen < KEYLENGTH)) {
@@ -1528,26 +1549,25 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		if (fi->fib_flags & RTNH_F_DEAD)
 			continue;
 
-		if (unlikely(fi->nh && nexthop_is_blackhole(fi->nh))) {
-			err = fib_props[RTN_BLACKHOLE].error;
-			goto out_reject;
+		if (unlikely(fi->nh)) {
+			if (nexthop_is_blackhole(fi->nh)) {
+				err = fib_props[RTN_BLACKHOLE].error;
+				goto out_reject;
+			}
+
+			nhc = nexthop_get_nhc_lookup(fi->nh, fib_flags, flp,
+						     &nhsel);
+			if (nhc)
+				goto set_result;
+			goto miss;
 		}
 
 		for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) {
-			struct fib_nh_common *nhc = fib_info_nhc(fi, nhsel);
+			nhc = fib_info_nhc(fi, nhsel);
 
-			if (nhc->nhc_flags & RTNH_F_DEAD)
+			if (!fib_lookup_good_nhc(nhc, fib_flags, flp))
 				continue;
-			if (ip_ignore_linkdown(nhc->nhc_dev) &&
-			    nhc->nhc_flags & RTNH_F_LINKDOWN &&
-			    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
-				continue;
-			if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
-				if (flp->flowi4_oif &&
-				    flp->flowi4_oif != nhc->nhc_oif)
-					continue;
-			}
-
+set_result:
 			if (!(fib_flags & FIB_LOOKUP_NOREF))
 				refcount_inc(&fi->fib_clntref);
 
@@ -1568,6 +1588,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 			return err;
 		}
 	}
+miss:
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	this_cpu_inc(stats->semantic_match_miss);
 #endif
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
                   ` (3 preceding siblings ...)
  2020-05-26 15:01 ` [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup David Ahern
@ 2020-05-26 15:01 ` David Ahern
  2020-05-26 15:10   ` Nikolay Aleksandrov
  2020-05-26 18:37   ` Jakub Kicinski
  2020-05-26 22:28 ` [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Miller
  5 siblings, 2 replies; 16+ messages in thread
From: David Ahern @ 2020-05-26 15:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, nikolay, David Ahern

From: David Ahern <dsahern@gmail.com>

Similar to the last path, need to fix fib_info_nh_uses_dev for
external nexthops to avoid referencing multiple nh_grp structs.
Move the device check in fib_info_nh_uses_dev to a helper and
create a nexthop version that is called if the fib_info uses an
external nexthop.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip_fib.h    | 10 ++++++++++
 include/net/nexthop.h   | 25 +++++++++++++++++++++++++
 net/ipv4/fib_frontend.c | 19 ++++++++++---------
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 1f1dd22980e4..6683558db7c9 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -448,6 +448,16 @@ static inline int fib_num_tclassid_users(struct net *net)
 #endif
 int fib_unmerge(struct net *net);
 
+static inline bool nhc_l3mdev_matches_dev(const struct fib_nh_common *nhc,
+const struct net_device *dev)
+{
+	if (nhc->nhc_dev == dev ||
+	    l3mdev_master_ifindex_rcu(nhc->nhc_dev) == dev->ifindex)
+			return true;
+
+	return false;
+}
+
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 9414ae46fc1c..35680a8c2a1c 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -266,6 +266,31 @@ struct fib_nh_common *nexthop_get_nhc_lookup(const struct nexthop *nh,
 	return NULL;
 }
 
+static inline bool nexthop_uses_dev(const struct nexthop *nh,
+				    const struct net_device *dev)
+{
+	struct nh_info *nhi;
+
+	if (nh->is_group) {
+		struct nh_group *nhg = rcu_dereference(nh->nh_grp);
+		int i;
+
+		for (i = 0; i < nhg->num_nh; i++) {
+			struct nexthop *nhe = nhg->nh_entries[i].nh;
+
+			nhi = rcu_dereference(nhe->nh_info);
+			if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev))
+				return true;
+                }
+        } else {
+		nhi = rcu_dereference(nh->nh_info);
+		if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev))
+			return true;
+        }
+
+	return false;
+}
+
 static inline unsigned int fib_info_num_path(const struct fib_info *fi)
 {
 	if (unlikely(fi->nh))
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 213be9c050ad..aebb50735c68 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -309,17 +309,18 @@ bool fib_info_nh_uses_dev(struct fib_info *fi, const struct net_device *dev)
 {
 	bool dev_match = false;
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-	int ret;
+	if (unlikely(fi->nh)) {
+		dev_match = nexthop_uses_dev(fi->nh, dev);
+	} else {
+		int ret;
 
-	for (ret = 0; ret < fib_info_num_path(fi); ret++) {
-		const struct fib_nh_common *nhc = fib_info_nhc(fi, ret);
+		for (ret = 0; ret < fib_info_num_path(fi); ret++) {
+			const struct fib_nh_common *nhc = fib_info_nhc(fi, ret);
 
-		if (nhc->nhc_dev == dev) {
-			dev_match = true;
-			break;
-		} else if (l3mdev_master_ifindex_rcu(nhc->nhc_dev) == dev->ifindex) {
-			dev_match = true;
-			break;
+			if (nhc_l3mdev_matches_dev(nhc, dev)) {
+				dev_match = true;
+				break;
+			}
 		}
 	}
 #else
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry
  2020-05-26 15:01 ` [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry David Ahern
@ 2020-05-26 15:07   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 15:07 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: davem, kuba, David Ahern

On 26/05/2020 18:01, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Move nh_grp dereference and check for removing nexthop group due to
> all members gone into remove_nh_grp_entry.
> 
> Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  net/ipv4/nexthop.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 2a31c4af845e..0f68d9801808 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -693,17 +693,21 @@ static void nh_group_rebalance(struct nh_group *nhg)
>  	}
>  }
>  
> -static void remove_nh_grp_entry(struct nh_grp_entry *nhge,
> -				struct nh_group *nhg,
> +static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
>  				struct nl_info *nlinfo)
>  {
> +	struct nexthop *nhp = nhge->nh_parent;
>  	struct nexthop *nh = nhge->nh;
>  	struct nh_grp_entry *nhges;
> +	struct nh_group *nhg;
>  	bool found = false;
>  	int i;
>  
>  	WARN_ON(!nh);
>  
> +	list_del(&nhge->nh_list);
> +
> +	nhg = rtnl_dereference(nhp->nh_grp);
>  	nhges = nhg->nh_entries;
>  	for (i = 0; i < nhg->num_nh; ++i) {
>  		if (found) {
> @@ -727,7 +731,11 @@ static void remove_nh_grp_entry(struct nh_grp_entry *nhge,
>  	nexthop_put(nh);
>  
>  	if (nlinfo)
> -		nexthop_notify(RTM_NEWNEXTHOP, nhge->nh_parent, nlinfo);
> +		nexthop_notify(RTM_NEWNEXTHOP, nhp, nlinfo);
> +
> +	/* if this group has no more entries then remove it */
> +	if (!nhg->num_nh)
> +		remove_nexthop(net, nhp, nlinfo);
>  }
>  
>  static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
> @@ -735,17 +743,8 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
>  {
>  	struct nh_grp_entry *nhge, *tmp;
>  
> -	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list) {
> -		struct nh_group *nhg;
> -
> -		list_del(&nhge->nh_list);
> -		nhg = rtnl_dereference(nhge->nh_parent->nh_grp);
> -		remove_nh_grp_entry(nhge, nhg, nlinfo);
> -
> -		/* if this group has no more entries then remove it */
> -		if (!nhg->num_nh)
> -			remove_nexthop(net, nhge->nh_parent, nlinfo);
> -	}
> +	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list)
> +		remove_nh_grp_entry(net, nhge, nlinfo);
>  }
>  
>  static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places
  2020-05-26 15:01 ` [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places David Ahern
@ 2020-05-26 15:08   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 15:08 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: davem, kuba, David Ahern

On 26/05/2020 18:01, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> I got too fancy consolidating checks on multipath type. The result
> is that path lookups can access 2 different nh_grp structs as exposed
> by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
> avoid multiple, nh_grp dereferences and make decisions based on the
> consistent struct.
> 
> Only 2 places left using nexthop_is_multipath are within IPv6, both
> only check that the nexthop is a multipath for a branching decision
> which are acceptable.
> 
> Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/net/nexthop.h | 41 +++++++++++++++++++++++++----------------
>  1 file changed, 25 insertions(+), 16 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 8a343519ed7a..f09e8d7d9886 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -137,21 +137,20 @@ static inline unsigned int nexthop_num_path(const struct nexthop *nh)
>  {
>  	unsigned int rc = 1;
>  
> -	if (nexthop_is_multipath(nh)) {
> +	if (nh->is_group) {
>  		struct nh_group *nh_grp;
>  
>  		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
> -		rc = nh_grp->num_nh;
> +		if (nh_grp->mpath)
> +			rc = nh_grp->num_nh;
>  	}
>  
>  	return rc;
>  }
>  
>  static inline
> -struct nexthop *nexthop_mpath_select(const struct nexthop *nh, int nhsel)
> +struct nexthop *nexthop_mpath_select(const struct nh_group *nhg, int nhsel)
>  {
> -	const struct nh_group *nhg = rcu_dereference_rtnl(nh->nh_grp);
> -
>  	/* for_nexthops macros in fib_semantics.c grabs a pointer to
>  	 * the nexthop before checking nhsel
>  	 */
> @@ -186,12 +185,14 @@ static inline bool nexthop_is_blackhole(const struct nexthop *nh)
>  {
>  	const struct nh_info *nhi;
>  
> -	if (nexthop_is_multipath(nh)) {
> -		if (nexthop_num_path(nh) > 1)
> -			return false;
> -		nh = nexthop_mpath_select(nh, 0);
> -		if (!nh)
> +	if (nh->is_group) {
> +		struct nh_group *nh_grp;
> +
> +		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
> +		if (nh_grp->num_nh > 1)
>  			return false;
> +
> +		nh = nh_grp->nh_entries[0].nh;
>  	}
>  
>  	nhi = rcu_dereference_rtnl(nh->nh_info);
> @@ -217,10 +218,15 @@ struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
>  	BUILD_BUG_ON(offsetof(struct fib_nh, nh_common) != 0);
>  	BUILD_BUG_ON(offsetof(struct fib6_nh, nh_common) != 0);
>  
> -	if (nexthop_is_multipath(nh)) {
> -		nh = nexthop_mpath_select(nh, nhsel);
> -		if (!nh)
> -			return NULL;
> +	if (nh->is_group) {
> +		struct nh_group *nh_grp;
> +
> +		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
> +		if (nh_grp->mpath) {
> +			nh = nexthop_mpath_select(nh_grp, nhsel);
> +			if (!nh)
> +				return NULL;
> +		}
>  	}
>  
>  	nhi = rcu_dereference_rtnl(nh->nh_info);
> @@ -264,8 +270,11 @@ static inline struct fib6_nh *nexthop_fib6_nh(struct nexthop *nh)
>  {
>  	struct nh_info *nhi;
>  
> -	if (nexthop_is_multipath(nh)) {
> -		nh = nexthop_mpath_select(nh, 0);
> +	if (nh->is_group) {
> +		struct nh_group *nh_grp;
> +
> +		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
> +		nh = nexthop_mpath_select(nh_grp, 0);
>  		if (!nh)
>  			return NULL;
>  	}
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup
  2020-05-26 15:01 ` [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup David Ahern
@ 2020-05-26 15:09   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 15:09 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: davem, kuba, David Ahern

On 26/05/2020 18:01, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> FIB lookups can return an entry that references an external nexthop.
> While walking the nexthop struct we do not want to make multiple calls
> into the nexthop code which can result in 2 different structs getting
> accessed - one returning the number of paths the rest of the loop
> seeing a different nh_grp struct. If the nexthop group shrunk, the
> result is an attempt to access a fib_nh_common that does not exist for
> the new nh_grp struct but did for the old one.
> 
> To fix that move the device evaluation code to a helper that can be
> used for inline fib_nh path as well as external nexthops.
> 
> Update the existing check for fi->nh in fib_table_lookup to call a
> new helper, nexthop_get_nhc_lookup, which walks the external nexthop
> with a single rcu dereference.
> 
> Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/net/ip_fib.h  |  2 ++
>  include/net/nexthop.h | 33 ++++++++++++++++++++++++++++
>  net/ipv4/fib_trie.c   | 51 ++++++++++++++++++++++++++++++-------------
>  3 files changed, 71 insertions(+), 15 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 59e0d4e99f94..1f1dd22980e4 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -480,6 +480,8 @@ void fib_nh_common_release(struct fib_nh_common *nhc);
>  void fib_alias_hw_flags_set(struct net *net, const struct fib_rt_info *fri);
>  void fib_trie_init(void);
>  struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
> +bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> +			 const struct flowi4 *flp);
>  
>  static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
>  {
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index f09e8d7d9886..9414ae46fc1c 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -233,6 +233,39 @@ struct fib_nh_common *nexthop_fib_nhc(struct nexthop *nh, int nhsel)
>  	return &nhi->fib_nhc;
>  }
>  
> +/* called from fib_table_lookup with rcu_lock */
> +static inline
> +struct fib_nh_common *nexthop_get_nhc_lookup(const struct nexthop *nh,
> +					     int fib_flags,
> +					     const struct flowi4 *flp,
> +					     int *nhsel)
> +{
> +	struct nh_info *nhi;
> +
> +	if (nh->is_group) {
> +		struct nh_group *nhg = rcu_dereference(nh->nh_grp);
> +		int i;
> +
> +		for (i = 0; i < nhg->num_nh; i++) {
> +			struct nexthop *nhe = nhg->nh_entries[i].nh;
> +
> +			nhi = rcu_dereference(nhe->nh_info);
> +			if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) {
> +				*nhsel = i;
> +				return &nhi->fib_nhc;
> +			}
> +		}
> +	} else {
> +		nhi = rcu_dereference(nh->nh_info);
> +		if (fib_lookup_good_nhc(&nhi->fib_nhc, fib_flags, flp)) {
> +			*nhsel = 0;
> +			return &nhi->fib_nhc;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
>  static inline unsigned int fib_info_num_path(const struct fib_info *fi)
>  {
>  	if (unlikely(fi->nh))
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 4f334b425538..248f1c1959a6 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1371,6 +1371,26 @@ static inline t_key prefix_mismatch(t_key key, struct key_vector *n)
>  	return (key ^ prefix) & (prefix | -prefix);
>  }
>  
> +bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> +			 const struct flowi4 *flp)
> +{
> +	if (nhc->nhc_flags & RTNH_F_DEAD)
> +		return false;
> +
> +	if (ip_ignore_linkdown(nhc->nhc_dev) &&
> +	    nhc->nhc_flags & RTNH_F_LINKDOWN &&
> +	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> +		return false;
> +
> +	if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> +		if (flp->flowi4_oif &&
> +		    flp->flowi4_oif != nhc->nhc_oif)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  /* should be called with rcu_read_lock */
>  int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
>  		     struct fib_result *res, int fib_flags)
> @@ -1503,6 +1523,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
>  	/* Step 3: Process the leaf, if that fails fall back to backtracing */
>  	hlist_for_each_entry_rcu(fa, &n->leaf, fa_list) {
>  		struct fib_info *fi = fa->fa_info;
> +		struct fib_nh_common *nhc;
>  		int nhsel, err;
>  
>  		if ((BITS_PER_LONG > KEYLENGTH) || (fa->fa_slen < KEYLENGTH)) {
> @@ -1528,26 +1549,25 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
>  		if (fi->fib_flags & RTNH_F_DEAD)
>  			continue;
>  
> -		if (unlikely(fi->nh && nexthop_is_blackhole(fi->nh))) {
> -			err = fib_props[RTN_BLACKHOLE].error;
> -			goto out_reject;
> +		if (unlikely(fi->nh)) {
> +			if (nexthop_is_blackhole(fi->nh)) {
> +				err = fib_props[RTN_BLACKHOLE].error;
> +				goto out_reject;
> +			}
> +
> +			nhc = nexthop_get_nhc_lookup(fi->nh, fib_flags, flp,
> +						     &nhsel);
> +			if (nhc)
> +				goto set_result;
> +			goto miss;
>  		}
>  
>  		for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) {
> -			struct fib_nh_common *nhc = fib_info_nhc(fi, nhsel);
> +			nhc = fib_info_nhc(fi, nhsel);
>  
> -			if (nhc->nhc_flags & RTNH_F_DEAD)
> +			if (!fib_lookup_good_nhc(nhc, fib_flags, flp))
>  				continue;
> -			if (ip_ignore_linkdown(nhc->nhc_dev) &&
> -			    nhc->nhc_flags & RTNH_F_LINKDOWN &&
> -			    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> -				continue;
> -			if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> -				if (flp->flowi4_oif &&
> -				    flp->flowi4_oif != nhc->nhc_oif)
> -					continue;
> -			}
> -
> +set_result:
>  			if (!(fib_flags & FIB_LOOKUP_NOREF))
>  				refcount_inc(&fi->fib_clntref);
>  
> @@ -1568,6 +1588,7 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
>  			return err;
>  		}
>  	}
> +miss:
>  #ifdef CONFIG_IP_FIB_TRIE_STATS
>  	this_cpu_inc(stats->semantic_match_miss);
>  #endif
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev
  2020-05-26 15:01 ` [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev David Ahern
@ 2020-05-26 15:10   ` Nikolay Aleksandrov
  2020-05-26 18:37   ` Jakub Kicinski
  1 sibling, 0 replies; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 15:10 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: davem, kuba, David Ahern

On 26/05/2020 18:01, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Similar to the last path, need to fix fib_info_nh_uses_dev for
> external nexthops to avoid referencing multiple nh_grp structs.
> Move the device check in fib_info_nh_uses_dev to a helper and
> create a nexthop version that is called if the fib_info uses an
> external nexthop.
> 
> Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/net/ip_fib.h    | 10 ++++++++++
>  include/net/nexthop.h   | 25 +++++++++++++++++++++++++
>  net/ipv4/fib_frontend.c | 19 ++++++++++---------
>  3 files changed, 45 insertions(+), 9 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 1f1dd22980e4..6683558db7c9 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -448,6 +448,16 @@ static inline int fib_num_tclassid_users(struct net *net)
>  #endif
>  int fib_unmerge(struct net *net);
>  
> +static inline bool nhc_l3mdev_matches_dev(const struct fib_nh_common *nhc,
> +const struct net_device *dev)
> +{
> +	if (nhc->nhc_dev == dev ||
> +	    l3mdev_master_ifindex_rcu(nhc->nhc_dev) == dev->ifindex)
> +			return true;
> +
> +	return false;
> +}
> +
>  /* Exported by fib_semantics.c */
>  int ip_fib_check_default(__be32 gw, struct net_device *dev);
>  int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 9414ae46fc1c..35680a8c2a1c 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -266,6 +266,31 @@ struct fib_nh_common *nexthop_get_nhc_lookup(const struct nexthop *nh,
>  	return NULL;
>  }
>  
> +static inline bool nexthop_uses_dev(const struct nexthop *nh,
> +				    const struct net_device *dev)
> +{
> +	struct nh_info *nhi;
> +
> +	if (nh->is_group) {
> +		struct nh_group *nhg = rcu_dereference(nh->nh_grp);
> +		int i;
> +
> +		for (i = 0; i < nhg->num_nh; i++) {
> +			struct nexthop *nhe = nhg->nh_entries[i].nh;
> +
> +			nhi = rcu_dereference(nhe->nh_info);
> +			if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev))
> +				return true;
> +                }
> +        } else {
> +		nhi = rcu_dereference(nh->nh_info);
> +		if (nhc_l3mdev_matches_dev(&nhi->fib_nhc, dev))
> +			return true;
> +        }
> +
> +	return false;
> +}
> +
>  static inline unsigned int fib_info_num_path(const struct fib_info *fi)
>  {
>  	if (unlikely(fi->nh))
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 213be9c050ad..aebb50735c68 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -309,17 +309,18 @@ bool fib_info_nh_uses_dev(struct fib_info *fi, const struct net_device *dev)
>  {
>  	bool dev_match = false;
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> -	int ret;
> +	if (unlikely(fi->nh)) {
> +		dev_match = nexthop_uses_dev(fi->nh, dev);
> +	} else {
> +		int ret;
>  
> -	for (ret = 0; ret < fib_info_num_path(fi); ret++) {
> -		const struct fib_nh_common *nhc = fib_info_nhc(fi, ret);
> +		for (ret = 0; ret < fib_info_num_path(fi); ret++) {
> +			const struct fib_nh_common *nhc = fib_info_nhc(fi, ret);
>  
> -		if (nhc->nhc_dev == dev) {
> -			dev_match = true;
> -			break;
> -		} else if (l3mdev_master_ifindex_rcu(nhc->nhc_dev) == dev->ifindex) {
> -			dev_match = true;
> -			break;
> +			if (nhc_l3mdev_matches_dev(nhc, dev)) {
> +				dev_match = true;
> +				break;
> +			}
>  		}
>  	}
>  #else
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev
  2020-05-26 15:01 ` [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev David Ahern
  2020-05-26 15:10   ` Nikolay Aleksandrov
@ 2020-05-26 18:37   ` Jakub Kicinski
  2020-05-26 18:39     ` David Ahern
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2020-05-26 18:37 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, nikolay, David Ahern

On Tue, 26 May 2020 09:01:14 -0600 David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Similar to the last path, need to fix fib_info_nh_uses_dev for
> external nexthops to avoid referencing multiple nh_grp structs.
> Move the device check in fib_info_nh_uses_dev to a helper and
> create a nexthop version that is called if the fib_info uses an
> external nexthop.
> 
> Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
> Signed-off-by: David Ahern <dsahern@gmail.com>

Dave, bunch of white space problems here according to checkpatch:

CHECK: Alignment should match open parenthesis
#31: FILE: include/net/ip_fib.h:451:
+static inline bool nhc_l3mdev_matches_dev(const struct fib_nh_common *nhc,
+const struct net_device *dev)

WARNING: suspect code indent for conditional statements (8, 24)
#33: FILE: include/net/ip_fib.h:453:
+	if (nhc->nhc_dev == dev ||
[...]
+			return true;

ERROR: code indent should use tabs where possible
#66: FILE: include/net/nexthop.h:284:
+                }$

WARNING: please, no spaces at the start of a line
#66: FILE: include/net/nexthop.h:284:
+                }$

ERROR: code indent should use tabs where possible
#67: FILE: include/net/nexthop.h:285:
+        } else {$

WARNING: please, no spaces at the start of a line
#67: FILE: include/net/nexthop.h:285:
+        } else {$

ERROR: code indent should use tabs where possible
#71: FILE: include/net/nexthop.h:289:
+        }$

WARNING: please, no spaces at the start of a line
#71: FILE: include/net/nexthop.h:289:
+        }$

total: 3 errors, 4 warnings, 1 checks, 74 lines checked

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev
  2020-05-26 18:37   ` Jakub Kicinski
@ 2020-05-26 18:39     ` David Ahern
  0 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2020-05-26 18:39 UTC (permalink / raw)
  To: Jakub Kicinski, David Ahern; +Cc: netdev, davem, nikolay

On 5/26/20 12:37 PM, Jakub Kicinski wrote:
> Dave, bunch of white space problems here according to checkpatch:
> 

ugh, the VM I used for iterations did not have the git hook to run
checkpatch on commit. Will re-spin.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups
  2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
                   ` (4 preceding siblings ...)
  2020-05-26 15:01 ` [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev David Ahern
@ 2020-05-26 22:28 ` David Miller
  2020-05-26 22:35   ` Nikolay Aleksandrov
  5 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2020-05-26 22:28 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, kuba, nikolay, dahern

From: David Ahern <dsahern@kernel.org>
Date: Tue, 26 May 2020 09:01:09 -0600

> From: David Ahern <dahern@digitalocean.com>
> 
> Nik's torture tests have exposed 2 fundamental mistakes with the initial
> nexthop code for groups. First, the nexthops entries and num_nh in the
> nh_grp struct should not be modified once the struct is set under rcu.
> Doing so has major affects on the datapath seeing valid nexthop entries.
> 
> Second, the helpers in the header file were convenient for not repeating
> code, but they cause datapath walks to potentially see 2 different group
> structs after an rcu replace, disrupting a walk of the path objects.
> This second problem applies solely to IPv4 as I re-used too much of the
> existing code in walking legs of a multipath route.
> 
> Patches 1 is refactoring change to simplify the overhead of reviewing and
> understanding the change in patch 2 which fixes the update of nexthop
> groups when a compnent leg is removed.
> 
> Patches 3-5 address the second problem. Patch 3 inlines the multipath
> check such that the mpath lookup and subsequent calls all use the same
> nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
> with iterative calls to fib_info_nhc.
> 
> fib_info_num_path can be used in control plane path in a 'for loop' with
> subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
> only changed while holding the rtnl; the combination can not be used in
> the data plane with external nexthops as it involves repeated dereferences
> of nh_grp struct which can change between calls.
> 
> Similarly, nexthop_is_multipath can be used for branching decisions in
> the datapath since the nexthop type can not be changed (a group can not
> be converted to standalone and vice versa).
> 
> Patch set developed in coordination with Nikolay Aleksandrov. He did a
> lot of work creating a good reproducer, discussing options to fix it
> and testing iterations.
> 
> I have adapted Nik's commands into additional tests in the nexthops
> selftest script which I will send against -next.

Series applied and queued up for -stable, thanks David.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups
  2020-05-26 22:28 ` [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Miller
@ 2020-05-26 22:35   ` Nikolay Aleksandrov
  2020-05-26 22:37     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 22:35 UTC (permalink / raw)
  To: David Miller, dsahern; +Cc: netdev, kuba, dahern

On 5/27/20 1:28 AM, David Miller wrote:
> From: David Ahern <dsahern@kernel.org>
> Date: Tue, 26 May 2020 09:01:09 -0600
> 
>> From: David Ahern <dahern@digitalocean.com>
>>
>> Nik's torture tests have exposed 2 fundamental mistakes with the initial
>> nexthop code for groups. First, the nexthops entries and num_nh in the
>> nh_grp struct should not be modified once the struct is set under rcu.
>> Doing so has major affects on the datapath seeing valid nexthop entries.
>>
>> Second, the helpers in the header file were convenient for not repeating
>> code, but they cause datapath walks to potentially see 2 different group
>> structs after an rcu replace, disrupting a walk of the path objects.
>> This second problem applies solely to IPv4 as I re-used too much of the
>> existing code in walking legs of a multipath route.
>>
>> Patches 1 is refactoring change to simplify the overhead of reviewing and
>> understanding the change in patch 2 which fixes the update of nexthop
>> groups when a compnent leg is removed.
>>
>> Patches 3-5 address the second problem. Patch 3 inlines the multipath
>> check such that the mpath lookup and subsequent calls all use the same
>> nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
>> with iterative calls to fib_info_nhc.
>>
>> fib_info_num_path can be used in control plane path in a 'for loop' with
>> subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
>> only changed while holding the rtnl; the combination can not be used in
>> the data plane with external nexthops as it involves repeated dereferences
>> of nh_grp struct which can change between calls.
>>
>> Similarly, nexthop_is_multipath can be used for branching decisions in
>> the datapath since the nexthop type can not be changed (a group can not
>> be converted to standalone and vice versa).
>>
>> Patch set developed in coordination with Nikolay Aleksandrov. He did a
>> lot of work creating a good reproducer, discussing options to fix it
>> and testing iterations.
>>
>> I have adapted Nik's commands into additional tests in the nexthops
>> selftest script which I will send against -next.
> 
> Series applied and queued up for -stable, thanks David.
> 

Dave this was v1, there were some whitespace errors which were fixed
in v2: http://patchwork.ozlabs.org/project/netdev/list/?series=179428



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups
  2020-05-26 22:35   ` Nikolay Aleksandrov
@ 2020-05-26 22:37     ` Nikolay Aleksandrov
  2020-05-26 22:39       ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2020-05-26 22:37 UTC (permalink / raw)
  To: David Miller, dsahern; +Cc: netdev, kuba, dahern

On 5/27/20 1:35 AM, Nikolay Aleksandrov wrote:
> On 5/27/20 1:28 AM, David Miller wrote:
>> From: David Ahern <dsahern@kernel.org>
>> Date: Tue, 26 May 2020 09:01:09 -0600
>>
>>> From: David Ahern <dahern@digitalocean.com>
>>>
>>> Nik's torture tests have exposed 2 fundamental mistakes with the initial
>>> nexthop code for groups. First, the nexthops entries and num_nh in the
>>> nh_grp struct should not be modified once the struct is set under rcu.
>>> Doing so has major affects on the datapath seeing valid nexthop entries.
>>>
>>> Second, the helpers in the header file were convenient for not repeating
>>> code, but they cause datapath walks to potentially see 2 different group
>>> structs after an rcu replace, disrupting a walk of the path objects.
>>> This second problem applies solely to IPv4 as I re-used too much of the
>>> existing code in walking legs of a multipath route.
>>>
>>> Patches 1 is refactoring change to simplify the overhead of reviewing and
>>> understanding the change in patch 2 which fixes the update of nexthop
>>> groups when a compnent leg is removed.
>>>
>>> Patches 3-5 address the second problem. Patch 3 inlines the multipath
>>> check such that the mpath lookup and subsequent calls all use the same
>>> nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
>>> with iterative calls to fib_info_nhc.
>>>
>>> fib_info_num_path can be used in control plane path in a 'for loop' with
>>> subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
>>> only changed while holding the rtnl; the combination can not be used in
>>> the data plane with external nexthops as it involves repeated dereferences
>>> of nh_grp struct which can change between calls.
>>>
>>> Similarly, nexthop_is_multipath can be used for branching decisions in
>>> the datapath since the nexthop type can not be changed (a group can not
>>> be converted to standalone and vice versa).
>>>
>>> Patch set developed in coordination with Nikolay Aleksandrov. He did a
>>> lot of work creating a good reproducer, discussing options to fix it
>>> and testing iterations.
>>>
>>> I have adapted Nik's commands into additional tests in the nexthops
>>> selftest script which I will send against -next.
>>
>> Series applied and queued up for -stable, thanks David.
>>
> 
> Dave this was v1, there were some whitespace errors which were fixed
> in v2: http://patchwork.ozlabs.org/project/netdev/list/?series=179428
> 

We can send a simple incremental against this set to fix 'em up. If David
doesn't me beat to it, I'll send a fix tomorrow.

Cheers


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups
  2020-05-26 22:37     ` Nikolay Aleksandrov
@ 2020-05-26 22:39       ` David Ahern
  0 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2020-05-26 22:39 UTC (permalink / raw)
  To: Nikolay Aleksandrov, David Miller, dsahern; +Cc: netdev, kuba, dahern

On 5/26/20 4:37 PM, Nikolay Aleksandrov wrote:
> We can send a simple incremental against this set to fix 'em up. If David
> doesn't me beat to it, I'll send a fix tomorrow.

Leave -net as is; I think there were 4 lines with spaces instead of tabs.

I have another feature coming for the nexthop code; I can look at fixing
the whitespace issues in -next with it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-05-26 22:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26 15:01 [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Ahern
2020-05-26 15:01 ` [PATCH net 1/5] nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry David Ahern
2020-05-26 15:07   ` Nikolay Aleksandrov
2020-05-26 15:01 ` [PATCH net 2/5] nexthops: don't modify published nexthop groups David Ahern
2020-05-26 15:01 ` [PATCH net 3/5] nexthop: Expand nexthop_is_multipath in a few places David Ahern
2020-05-26 15:08   ` Nikolay Aleksandrov
2020-05-26 15:01 ` [PATCH net 4/5] ipv4: Refactor nhc evaluation in fib_table_lookup David Ahern
2020-05-26 15:09   ` Nikolay Aleksandrov
2020-05-26 15:01 ` [PATCH net 5/5] ipv4: nexthop version of fib_info_nh_uses_dev David Ahern
2020-05-26 15:10   ` Nikolay Aleksandrov
2020-05-26 18:37   ` Jakub Kicinski
2020-05-26 18:39     ` David Ahern
2020-05-26 22:28 ` [PATCH net 0/5] nexthops: Fix 2 fundamental flaws with nexthop groups David Miller
2020-05-26 22:35   ` Nikolay Aleksandrov
2020-05-26 22:37     ` Nikolay Aleksandrov
2020-05-26 22:39       ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.