* [patch net-next 0/7] mlxsw: Identical routes handling
@ 2017-02-09  9:28 Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table Jiri Pirko
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Ido says:

The kernel can store several FIB aliases that share the same prefix and
length. These aliases can differ in other parameters such as TOS and
metric, which are taken into account during lookup.

Offloading devices might not have the same flexibility and may allow
only a single route per prefix and length to be reflected. mlxsw is one
such device.

This patchset aims to correctly handle this situation in the mlxsw
driver. The first four patches introduce small changes in the IPv4 FIB
code, so that listeners of the FIB notification chain will be able to
correctly handle identical routes.

The last three patches build on top of previous work and introduce the
necessary changes in the mlxsw driver. The biggest change is the
introduction of a FIB node, where identical routes are chained, instead
of primitive reference counting. This is explained in detail in the
fifth patch.

Ido Schimmel (7):
  ipv4: fib: Only flush FIB aliases belonging to currently flushed table
  ipv4: fib: Send deletion notification with actual FIB alias type
  ipv4: fib: Send notification before deleting FIB alias
  ipv4: fib: Add events for FIB replace and append
  mlxsw: spectrum_router: Correctly handle identical routes
  mlxsw: spectrum_router: Add support for route append
  mlxsw: spectrum_router: Add support for route replace

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 644 +++++++++++++++------
 include/net/ip_fib.h                               |   3 +-
 net/ipv4/fib_trie.c                                |  42 +-
 3 files changed, 489 insertions(+), 200 deletions(-)

-- 
2.7.4


* [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09 19:00   ` Duyck, Alexander H
  2017-02-09  9:28 ` [patch net-next 2/7] ipv4: fib: Send deletion notification with actual FIB alias type Jiri Pirko
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw, Alexander Duyck, Patrick McHardy

From: Ido Schimmel <idosch@mellanox.com>

If the MAIN table is flushed while its trie is shared with the LOCAL
table, we might flush FIB aliases belonging to the latter. This can lead
to FIB_EVENT_ENTRY_DEL notifications being sent with the wrong table ID.

The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.

When flushing a particular table, skip any aliases belonging to a
different one.
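
The effect of the added check can be illustrated outside the kernel. The
following is only a minimal sketch: struct toy_alias, toy_table_flush()
and toy_flush_demo() are hypothetical stand-ins for illustration, not
kernel code. Table IDs 254 and 255 are the values of RT_TABLE_MAIN and
RT_TABLE_LOCAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct fib_alias; not the kernel definition. */
struct toy_alias {
	unsigned int tb_id;	/* table that owns the alias */
	bool dead;		/* models RTNH_F_DEAD on the alias's fib_info */
	bool flushed;		/* set when the toy flush removes the alias */
};

/* Flush dead aliases owned by tb_id; return how many were flushed. */
static int toy_table_flush(struct toy_alias *fa, size_t n, unsigned int tb_id)
{
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Mirrors the patched condition: skip live aliases and
		 * aliases that belong to a different table.
		 */
		if (!fa[i].dead || fa[i].tb_id != tb_id)
			continue;
		fa[i].flushed = true;
		found++;
	}
	return found;
}

/* One shared leaf holding a LOCAL (255) alias and two MAIN (254) aliases. */
static int toy_flush_demo(void)
{
	struct toy_alias leaf[] = {
		{ .tb_id = 255, .dead = true },
		{ .tb_id = 254, .dead = true },
		{ .tb_id = 254, .dead = false },
	};
	int found = toy_table_flush(leaf, 3, 254);

	assert(!leaf[0].flushed);	/* LOCAL alias is left alone */
	return found;
}
```

Flushing table 254 in this sketch removes only the dead MAIN alias, so
no deletion would be reported on behalf of the LOCAL table.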

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/fib_trie.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2919d1a..5ef4596 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1963,7 +1963,8 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
 			struct fib_info *fi = fa->fa_info;
 
-			if (!fi || !(fi->fib_flags & RTNH_F_DEAD)) {
+			if (!fi || !(fi->fib_flags & RTNH_F_DEAD) ||
+			    tb->tb_id != fa->tb_id) {
 				slen = fa->fa_slen;
 				continue;
 			}
-- 
2.7.4


* [patch net-next 2/7] ipv4: fib: Send deletion notification with actual FIB alias type
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 3/7] ipv4: fib: Send notification before deleting FIB alias Jiri Pirko
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw, Patrick McHardy

From: Ido Schimmel <idosch@mellanox.com>

When a FIB alias is removed, a notification is sent using the type
passed from user space (which can be RTN_UNSPEC) instead of the actual
type of the removed alias. This is problematic for listeners of the FIB
notification chain, as several FIB aliases can exist that match in every
parameter except the type.

Solve this by passing the actual type of the removed FIB alias.
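
To see why the requested type is not enough, consider the following
minimal sketch. It assumes, as a simplification, that an unspecified
type in the request matches any alias. All toy_* names are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy route types; the values mirror RTN_UNSPEC, RTN_UNICAST, RTN_LOCAL. */
enum { TOY_RTN_UNSPEC = 0, TOY_RTN_UNICAST = 1, TOY_RTN_LOCAL = 2 };

struct toy_alias {
	unsigned char type;
};

/* Pick the alias a delete request would remove and return the type that
 * should be carried in the notification, or -1 if nothing matches.
 */
static int toy_delete_notify_type(const struct toy_alias *fa, size_t n,
				  unsigned char req_type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (req_type != TOY_RTN_UNSPEC && fa[i].type != req_type)
			continue;
		/* Report the alias's own type, not the requested one. */
		return fa[i].type;
	}
	return -1;
}

static int toy_delete_demo(void)
{
	struct toy_alias aliases[] = {
		{ .type = TOY_RTN_LOCAL },
		{ .type = TOY_RTN_UNICAST },
	};

	/* The request does not specify a type. */
	return toy_delete_notify_type(aliases, 2, TOY_RTN_UNSPEC);
}
```

With the requested type, a listener in this sketch would be told
TOY_RTN_UNSPEC and could not tell which of the two aliases went away.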

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/fib_trie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 5ef4596..b0bfb1c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1653,8 +1653,8 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 		return -ESRCH;
 
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
-				 fa_to_delete->fa_info, tos, cfg->fc_type,
-				 tb->tb_id, 0);
+				 fa_to_delete->fa_info, tos,
+				 fa_to_delete->fa_type, tb->tb_id, 0);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
-- 
2.7.4


* [patch net-next 3/7] ipv4: fib: Send notification before deleting FIB alias
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 2/7] ipv4: fib: Send deletion notification with actual FIB alias type Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 4/7] ipv4: fib: Add events for FIB replace and append Jiri Pirko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw, Patrick McHardy

From: Ido Schimmel <idosch@mellanox.com>

When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
notification is sent after the reference on the previous FIB info was
dropped. This is problematic as potential listeners might need to access
it in their notification blocks.

Solve this by sending the notification prior to the deletion of the
replaced FIB alias. This is consistent with ENTRY_DEL notifications.
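
The ordering can be shown with a small user-space sketch. It is a toy
model under stated assumptions: struct toy_info stands in for a
reference-counted FIB info, and toy_listener() for a notifier callback
that still holds a pointer to the replaced info. None of these names
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reference-counted object standing in for the replaced FIB info. */
struct toy_info {
	int refcnt;
	unsigned int prio;
};

static unsigned int toy_seen_prio;
static bool toy_seen_live;

/* Listener that inspects the replaced info while being notified. */
static void toy_listener(const struct toy_info *old)
{
	toy_seen_live = old->refcnt > 0;
	toy_seen_prio = old->prio;
}

static void toy_info_put(struct toy_info *fi)
{
	fi->refcnt--;
}

/* Notify first, then drop the reference on the replaced info. */
static bool toy_replace(struct toy_info *old)
{
	toy_listener(old);
	toy_info_put(old);
	return toy_seen_live;
}

static bool toy_replace_demo(void)
{
	struct toy_info old = { .refcnt = 1, .prio = 100 };
	bool live = toy_replace(&old);

	assert(old.refcnt == 0);	/* reference dropped only afterwards */
	return live;
}
```

Swapping the two calls in toy_replace() would let the listener observe
an info whose last reference is already gone.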

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/fib_trie.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b0bfb1c..1c4d42e 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1295,6 +1295,13 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
+						 key, plen, fi,
+						 new_fa->fa_tos, cfg->fc_type,
+						 tb->tb_id, nlflags);
+			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
+				  tb->tb_id, &cfg->fc_nlinfo, nlflags);
+
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 
 			alias_free_mem_rcu(fa);
@@ -1303,13 +1310,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			if (state & FA_S_ACCESSED)
 				rt_cache_flush(cfg->fc_nlinfo.nl_net);
 
-			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
-						 key, plen, fi,
-						 new_fa->fa_tos, cfg->fc_type,
-						 tb->tb_id, cfg->fc_nlflags);
-			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
-				tb->tb_id, &cfg->fc_nlinfo, nlflags);
-
 			goto succeeded;
 		}
 		/* Error if we find a perfect match which
-- 
2.7.4


* [patch net-next 4/7] ipv4: fib: Add events for FIB replace and append
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
                   ` (2 preceding siblings ...)
  2017-02-09  9:28 ` [patch net-next 3/7] ipv4: fib: Send notification before deleting FIB alias Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 5/7] mlxsw: spectrum_router: Correctly handle identical routes Jiri Pirko
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw, Patrick McHardy

From: Ido Schimmel <idosch@mellanox.com>

The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.
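
From a listener's point of view, dedicated events turn flag testing into
a plain switch. The sketch below is illustrative only: the toy_* enums
and toy_listener_action() are hypothetical, and the mapping of events to
actions is an assumption about what a listener might do.

```c
#include <assert.h>

/* Toy copies of the four entry events; names are illustrative only. */
enum toy_fib_event {
	TOY_ENTRY_REPLACE,
	TOY_ENTRY_APPEND,
	TOY_ENTRY_ADD,
	TOY_ENTRY_DEL,
};

/* What a hypothetical listener does with its cached routes. */
enum toy_action {
	TOY_ACT_SWAP,		/* replace an existing entry in place */
	TOY_ACT_INSERT_AFTER,	/* add after entries with equal parameters */
	TOY_ACT_INSERT,		/* add at the regular sorted position */
	TOY_ACT_REMOVE,
	TOY_ACT_IGNORE,
};

/* One case per event, no netlink flags to decode. */
static enum toy_action toy_listener_action(enum toy_fib_event event)
{
	switch (event) {
	case TOY_ENTRY_REPLACE:
		return TOY_ACT_SWAP;
	case TOY_ENTRY_APPEND:
		return TOY_ACT_INSERT_AFTER;
	case TOY_ENTRY_ADD:
		return TOY_ACT_INSERT;
	case TOY_ENTRY_DEL:
		return TOY_ACT_REMOVE;
	}
	return TOY_ACT_IGNORE;
}

static int toy_events_demo(void)
{
	assert(toy_listener_action(TOY_ENTRY_ADD) == TOY_ACT_INSERT);
	return toy_listener_action(TOY_ENTRY_REPLACE) == TOY_ACT_SWAP;
}
```

A listener that does not handle one of the cases is easy to spot in
review, which is harder when the same information hides in a flags word.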

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
---
 include/net/ip_fib.h |  3 ++-
 net/ipv4/fib_trie.c  | 27 ++++++++++++++-------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 45a184e..368bb40 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -211,7 +211,6 @@ struct fib_entry_notifier_info {
 	u8 tos;
 	u8 type;
 	u32 tb_id;
-	u32 nlflags;
 };
 
 struct fib_nh_notifier_info {
@@ -220,6 +219,8 @@ struct fib_nh_notifier_info {
 };
 
 enum fib_event_type {
+	FIB_EVENT_ENTRY_REPLACE,
+	FIB_EVENT_ENTRY_APPEND,
 	FIB_EVENT_ENTRY_ADD,
 	FIB_EVENT_ENTRY_DEL,
 	FIB_EVENT_RULE_ADD,
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1c4d42e..d8cea21 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -124,7 +124,7 @@ static void fib_notify(struct net *net, struct notifier_block *nb,
 static int call_fib_entry_notifier(struct notifier_block *nb, struct net *net,
 				   enum fib_event_type event_type, u32 dst,
 				   int dst_len, struct fib_info *fi,
-				   u8 tos, u8 type, u32 tb_id, u32 nlflags)
+				   u8 tos, u8 type, u32 tb_id)
 {
 	struct fib_entry_notifier_info info = {
 		.dst = dst,
@@ -133,7 +133,6 @@ static int call_fib_entry_notifier(struct notifier_block *nb, struct net *net,
 		.tos = tos,
 		.type = type,
 		.tb_id = tb_id,
-		.nlflags = nlflags,
 	};
 	return call_fib_notifier(nb, net, event_type, &info.info);
 }
@@ -197,7 +196,7 @@ int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
 static int call_fib_entry_notifiers(struct net *net,
 				    enum fib_event_type event_type, u32 dst,
 				    int dst_len, struct fib_info *fi,
-				    u8 tos, u8 type, u32 tb_id, u32 nlflags)
+				    u8 tos, u8 type, u32 tb_id)
 {
 	struct fib_entry_notifier_info info = {
 		.dst = dst,
@@ -206,7 +205,6 @@ static int call_fib_entry_notifiers(struct net *net,
 		.tos = tos,
 		.type = type,
 		.tb_id = tb_id,
-		.nlflags = nlflags,
 	};
 	return call_fib_notifiers(net, event_type, &info.info);
 }
@@ -1198,6 +1196,7 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
 int fib_table_insert(struct net *net, struct fib_table *tb,
 		     struct fib_config *cfg)
 {
+	enum fib_event_type event = FIB_EVENT_ENTRY_ADD;
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
 	struct key_vector *l, *tp;
@@ -1295,10 +1294,10 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
-			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_REPLACE,
 						 key, plen, fi,
 						 new_fa->fa_tos, cfg->fc_type,
-						 tb->tb_id, nlflags);
+						 tb->tb_id);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				  tb->tb_id, &cfg->fc_nlinfo, nlflags);
 
@@ -1319,10 +1318,12 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		if (fa_match)
 			goto out;
 
-		if (cfg->fc_nlflags & NLM_F_APPEND)
+		if (cfg->fc_nlflags & NLM_F_APPEND) {
+			event = FIB_EVENT_ENTRY_APPEND;
 			nlflags |= NLM_F_APPEND;
-		else
+		} else {
 			fa = fa_first;
+		}
 	}
 	err = -ENOENT;
 	if (!(cfg->fc_nlflags & NLM_F_CREATE))
@@ -1351,8 +1352,8 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		tb->tb_num_default++;
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
-	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, key, plen, fi, tos,
-				 cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
+	call_fib_entry_notifiers(net, event, key, plen, fi, tos, cfg->fc_type,
+				 tb->tb_id);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
 		  &cfg->fc_nlinfo, nlflags);
 succeeded:
@@ -1654,7 +1655,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
 				 fa_to_delete->fa_info, tos,
-				 fa_to_delete->fa_type, tb->tb_id, 0);
+				 fa_to_delete->fa_type, tb->tb_id);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
@@ -1973,7 +1974,7 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 						 n->key,
 						 KEYLENGTH - fa->fa_slen,
 						 fi, fa->fa_tos, fa->fa_type,
-						 tb->tb_id, 0);
+						 tb->tb_id);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
@@ -2013,7 +2014,7 @@ static void fib_leaf_notify(struct net *net, struct key_vector *l,
 
 		call_fib_entry_notifier(nb, net, event_type, l->key,
 					KEYLENGTH - fa->fa_slen, fi, fa->fa_tos,
-					fa->fa_type, fa->tb_id, 0);
+					fa->fa_type, fa->tb_id);
 	}
 }
 
-- 
2.7.4


* [patch net-next 5/7] mlxsw: spectrum_router: Correctly handle identical routes
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
                   ` (3 preceding siblings ...)
  2017-02-09  9:28 ` [patch net-next 4/7] ipv4: fib: Add events for FIB replace and append Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 6/7] mlxsw: spectrum_router: Add support for route append Jiri Pirko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

In the device, routes are indexed in a routing table based on the prefix
and its length. This is in contrast to the kernel's FIB where several
FIB aliases can exist with these parameters being identical. In such
cases, the routes will be sorted by table ID (LOCAL first, then MAIN),
TOS and finally priority (metric).

During lookup, these routes will be evaluated in order. In case the
packet's TOS field is non-zero and a FIB alias with a matching TOS is
found, then it's selected. Otherwise, the lookup defaults to the route
with TOS 0 (if it exists). However, if the requested scope is narrower
than the one found, then the lookup continues.

To best reflect the kernel's datapath we should take the above into
account. Given a prefix and its length, the reflected route will always
be the first one in the FIB alias list. However, if the route has a
non-zero TOS then its action will be converted to trap instead of
forward, since we currently don't support TOS-based routing. If this
turns out to be a real issue, we can add support for that using
policy-based switching.

The route's scope can be effectively ignored as any packet being routed
by the device would've been looked up using the widest scope (UNIVERSE).

To achieve that we need to make two changes. Firstly, we need to create
another struct (FIB node) that will hold the list of FIB entries sharing
the same prefix and length. This struct will be hashed using these two
parameters.

Secondly, we need to change the route reflection to match the above
logic, so that the first FIB entry in the list will be programmed into
the device while the rest will remain in the driver's cache in case of
subsequent changes.
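
The idea can be sketched in user space as follows. This is a simplified
model, not the driver code: it uses a fixed array instead of a linked
list, and it assumes the ordering is descending table ID (LOCAL is 255,
MAIN is 254), then descending TOS, then ascending priority. All toy_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

struct toy_entry {
	unsigned int tb_id;
	unsigned char tos;
	unsigned int prio;
};

/* All entries that share one prefix and prefix length. */
struct toy_node {
	struct toy_entry entries[TOY_MAX_ENTRIES];
	size_t count;
};

enum toy_action { TOY_NONE, TOY_FORWARD, TOY_TRAP };

/* True if a must be placed before b (assumed ordering, see lead-in). */
static bool toy_entry_before(const struct toy_entry *a,
			     const struct toy_entry *b)
{
	if (a->tb_id != b->tb_id)
		return a->tb_id > b->tb_id;
	if (a->tos != b->tos)
		return a->tos > b->tos;
	return a->prio < b->prio;
}

/* Insert e at its sorted position; -1 if the node is full. */
static int toy_node_insert(struct toy_node *node, struct toy_entry e)
{
	size_t pos = 0;

	if (node->count == TOY_MAX_ENTRIES)
		return -1;
	while (pos < node->count &&
	       !toy_entry_before(&e, &node->entries[pos]))
		pos++;
	memmove(&node->entries[pos + 1], &node->entries[pos],
		(node->count - pos) * sizeof(e));
	node->entries[pos] = e;
	node->count++;
	return 0;
}

/* Only the first entry is reflected; a non-zero TOS means trap. */
static enum toy_action toy_node_action(const struct toy_node *node)
{
	if (node->count == 0)
		return TOY_NONE;
	return node->entries[0].tos ? TOY_TRAP : TOY_FORWARD;
}

static enum toy_action toy_node_demo(void)
{
	struct toy_node node = { .count = 0 };

	toy_node_insert(&node, (struct toy_entry){ 254, 0, 100 });
	assert(toy_node_action(&node) == TOY_FORWARD);
	/* A TOS route with the same prefix now heads the list. */
	toy_node_insert(&node, (struct toy_entry){ 254, 8, 10 });
	return toy_node_action(&node);
}
```

In this model the remaining entries stay cached in the node, so removing
the first one only requires reflecting its successor.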

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 581 ++++++++++++++-------
 1 file changed, 403 insertions(+), 178 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 71ff02f..7c55df9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -109,7 +109,6 @@ mlxsw_sp_prefix_usage_clear(struct mlxsw_sp_prefix_usage *prefix_usage,
 }
 
 struct mlxsw_sp_fib_key {
-	struct net_device *dev;
 	unsigned char addr[sizeof(struct in6_addr)];
 	unsigned char prefix_len;
 };
@@ -122,94 +121,39 @@ enum mlxsw_sp_fib_entry_type {
 
 struct mlxsw_sp_nexthop_group;
 
-struct mlxsw_sp_fib_entry {
-	struct rhash_head ht_node;
+struct mlxsw_sp_fib_node {
+	struct list_head entry_list;
 	struct list_head list;
+	struct rhash_head ht_node;
+	struct mlxsw_sp_vr *vr;
 	struct mlxsw_sp_fib_key key;
+};
+
+struct mlxsw_sp_fib_entry_params {
+	u32 tb_id;
+	u32 prio;
+	u8 tos;
+	u8 type;
+};
+
+struct mlxsw_sp_fib_entry {
+	struct list_head list;
+	struct mlxsw_sp_fib_node *fib_node;
 	enum mlxsw_sp_fib_entry_type type;
-	unsigned int ref_count;
-	struct mlxsw_sp_vr *vr;
 	struct list_head nexthop_group_node;
 	struct mlxsw_sp_nexthop_group *nh_group;
+	struct mlxsw_sp_fib_entry_params params;
 	bool offloaded;
 };
 
 struct mlxsw_sp_fib {
 	struct rhashtable ht;
-	struct list_head entry_list;
+	struct list_head node_list;
 	unsigned long prefix_ref_count[MLXSW_SP_PREFIX_COUNT];
 	struct mlxsw_sp_prefix_usage prefix_usage;
 };
 
-static const struct rhashtable_params mlxsw_sp_fib_ht_params = {
-	.key_offset = offsetof(struct mlxsw_sp_fib_entry, key),
-	.head_offset = offsetof(struct mlxsw_sp_fib_entry, ht_node),
-	.key_len = sizeof(struct mlxsw_sp_fib_key),
-	.automatic_shrinking = true,
-};
-
-static int mlxsw_sp_fib_entry_insert(struct mlxsw_sp_fib *fib,
-				     struct mlxsw_sp_fib_entry *fib_entry)
-{
-	unsigned char prefix_len = fib_entry->key.prefix_len;
-	int err;
-
-	err = rhashtable_insert_fast(&fib->ht, &fib_entry->ht_node,
-				     mlxsw_sp_fib_ht_params);
-	if (err)
-		return err;
-	list_add_tail(&fib_entry->list, &fib->entry_list);
-	if (fib->prefix_ref_count[prefix_len]++ == 0)
-		mlxsw_sp_prefix_usage_set(&fib->prefix_usage, prefix_len);
-	return 0;
-}
-
-static void mlxsw_sp_fib_entry_remove(struct mlxsw_sp_fib *fib,
-				      struct mlxsw_sp_fib_entry *fib_entry)
-{
-	unsigned char prefix_len = fib_entry->key.prefix_len;
-
-	if (--fib->prefix_ref_count[prefix_len] == 0)
-		mlxsw_sp_prefix_usage_clear(&fib->prefix_usage, prefix_len);
-	list_del(&fib_entry->list);
-	rhashtable_remove_fast(&fib->ht, &fib_entry->ht_node,
-			       mlxsw_sp_fib_ht_params);
-}
-
-static struct mlxsw_sp_fib_entry *
-mlxsw_sp_fib_entry_create(struct mlxsw_sp_fib *fib, const void *addr,
-			  size_t addr_len, unsigned char prefix_len,
-			  struct net_device *dev)
-{
-	struct mlxsw_sp_fib_entry *fib_entry;
-
-	fib_entry = kzalloc(sizeof(*fib_entry), GFP_KERNEL);
-	if (!fib_entry)
-		return NULL;
-	fib_entry->key.dev = dev;
-	memcpy(fib_entry->key.addr, addr, addr_len);
-	fib_entry->key.prefix_len = prefix_len;
-	return fib_entry;
-}
-
-static void mlxsw_sp_fib_entry_destroy(struct mlxsw_sp_fib_entry *fib_entry)
-{
-	kfree(fib_entry);
-}
-
-static struct mlxsw_sp_fib_entry *
-mlxsw_sp_fib_entry_lookup(struct mlxsw_sp_fib *fib, const void *addr,
-			  size_t addr_len, unsigned char prefix_len,
-			  struct net_device *dev)
-{
-	struct mlxsw_sp_fib_key key;
-
-	memset(&key, 0, sizeof(key));
-	key.dev = dev;
-	memcpy(key.addr, addr, addr_len);
-	key.prefix_len = prefix_len;
-	return rhashtable_lookup_fast(&fib->ht, &key, mlxsw_sp_fib_ht_params);
-}
+static const struct rhashtable_params mlxsw_sp_fib_ht_params;
 
 static struct mlxsw_sp_fib *mlxsw_sp_fib_create(void)
 {
@@ -222,7 +166,7 @@ static struct mlxsw_sp_fib *mlxsw_sp_fib_create(void)
 	err = rhashtable_init(&fib->ht, &mlxsw_sp_fib_ht_params);
 	if (err)
 		goto err_rhashtable_init;
-	INIT_LIST_HEAD(&fib->entry_list);
+	INIT_LIST_HEAD(&fib->node_list);
 	return fib;
 
 err_rhashtable_init:
@@ -232,6 +176,7 @@ static struct mlxsw_sp_fib *mlxsw_sp_fib_create(void)
 
 static void mlxsw_sp_fib_destroy(struct mlxsw_sp_fib *fib)
 {
+	WARN_ON(!list_empty(&fib->node_list));
 	rhashtable_destroy(&fib->ht);
 	kfree(fib);
 }
@@ -1239,9 +1184,9 @@ static int mlxsw_sp_adj_index_mass_update(struct mlxsw_sp *mlxsw_sp,
 	int err;
 
 	list_for_each_entry(fib_entry, &nh_grp->fib_list, nexthop_group_node) {
-		if (vr == fib_entry->vr)
+		if (vr == fib_entry->fib_node->vr)
 			continue;
-		vr = fib_entry->vr;
+		vr = fib_entry->fib_node->vr;
 		err = mlxsw_sp_adj_index_mass_update_vr(mlxsw_sp, vr,
 							old_adj_index,
 							old_ecmp_size,
@@ -1727,6 +1672,9 @@ mlxsw_sp_fib_entry_should_offload(const struct mlxsw_sp_fib_entry *fib_entry)
 {
 	struct mlxsw_sp_nexthop_group *nh_group = fib_entry->nh_group;
 
+	if (fib_entry->params.tos)
+		return false;
+
 	switch (fib_entry->type) {
 	case MLXSW_SP_FIB_ENTRY_TYPE_REMOTE:
 		return !!nh_group->adj_index_valid;
@@ -1741,7 +1689,7 @@ static void mlxsw_sp_fib_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
 {
 	fib_entry->offloaded = true;
 
-	switch (fib_entry->vr->proto) {
+	switch (fib_entry->fib_node->vr->proto) {
 	case MLXSW_SP_L3_PROTO_IPV4:
 		fib_info_offload_inc(fib_entry->nh_group->key.fi);
 		break;
@@ -1753,7 +1701,7 @@ static void mlxsw_sp_fib_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
 static void
 mlxsw_sp_fib_entry_offload_unset(struct mlxsw_sp_fib_entry *fib_entry)
 {
-	switch (fib_entry->vr->proto) {
+	switch (fib_entry->fib_node->vr->proto) {
 	case MLXSW_SP_L3_PROTO_IPV4:
 		fib_info_offload_dec(fib_entry->nh_group->key.fi);
 		break;
@@ -1793,8 +1741,8 @@ static int mlxsw_sp_fib_entry_op4_remote(struct mlxsw_sp *mlxsw_sp,
 					 enum mlxsw_reg_ralue_op op)
 {
 	char ralue_pl[MLXSW_REG_RALUE_LEN];
-	u32 *p_dip = (u32 *) fib_entry->key.addr;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
+	u32 *p_dip = (u32 *) fib_entry->fib_node->key.addr;
+	struct mlxsw_sp_vr *vr = fib_entry->fib_node->vr;
 	enum mlxsw_reg_ralue_trap_action trap_action;
 	u16 trap_id = 0;
 	u32 adjacency_index = 0;
@@ -1815,7 +1763,8 @@ static int mlxsw_sp_fib_entry_op4_remote(struct mlxsw_sp *mlxsw_sp,
 
 	mlxsw_reg_ralue_pack4(ralue_pl,
 			      (enum mlxsw_reg_ralxx_protocol) vr->proto, op,
-			      vr->id, fib_entry->key.prefix_len, *p_dip);
+			      vr->id, fib_entry->fib_node->key.prefix_len,
+			      *p_dip);
 	mlxsw_reg_ralue_act_remote_pack(ralue_pl, trap_action, trap_id,
 					adjacency_index, ecmp_size);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
@@ -1828,8 +1777,8 @@ static int mlxsw_sp_fib_entry_op4_local(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_rif *r = fib_entry->nh_group->nh_rif;
 	enum mlxsw_reg_ralue_trap_action trap_action;
 	char ralue_pl[MLXSW_REG_RALUE_LEN];
-	u32 *p_dip = (u32 *) fib_entry->key.addr;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
+	u32 *p_dip = (u32 *) fib_entry->fib_node->key.addr;
+	struct mlxsw_sp_vr *vr = fib_entry->fib_node->vr;
 	u16 trap_id = 0;
 	u16 rif = 0;
 
@@ -1843,7 +1792,8 @@ static int mlxsw_sp_fib_entry_op4_local(struct mlxsw_sp *mlxsw_sp,
 
 	mlxsw_reg_ralue_pack4(ralue_pl,
 			      (enum mlxsw_reg_ralxx_protocol) vr->proto, op,
-			      vr->id, fib_entry->key.prefix_len, *p_dip);
+			      vr->id, fib_entry->fib_node->key.prefix_len,
+			      *p_dip);
 	mlxsw_reg_ralue_act_local_pack(ralue_pl, trap_action, trap_id, rif);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
 }
@@ -1853,12 +1803,13 @@ static int mlxsw_sp_fib_entry_op4_trap(struct mlxsw_sp *mlxsw_sp,
 				       enum mlxsw_reg_ralue_op op)
 {
 	char ralue_pl[MLXSW_REG_RALUE_LEN];
-	u32 *p_dip = (u32 *) fib_entry->key.addr;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
+	u32 *p_dip = (u32 *) fib_entry->fib_node->key.addr;
+	struct mlxsw_sp_vr *vr = fib_entry->fib_node->vr;
 
 	mlxsw_reg_ralue_pack4(ralue_pl,
 			      (enum mlxsw_reg_ralxx_protocol) vr->proto, op,
-			      vr->id, fib_entry->key.prefix_len, *p_dip);
+			      vr->id, fib_entry->fib_node->key.prefix_len,
+			      *p_dip);
 	mlxsw_reg_ralue_act_ip2me_pack(ralue_pl);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
 }
@@ -1884,7 +1835,7 @@ static int mlxsw_sp_fib_entry_op(struct mlxsw_sp *mlxsw_sp,
 {
 	int err = -EINVAL;
 
-	switch (fib_entry->vr->proto) {
+	switch (fib_entry->fib_node->vr->proto) {
 	case MLXSW_SP_L3_PROTO_IPV4:
 		err = mlxsw_sp_fib_entry_op4(mlxsw_sp, fib_entry, op);
 		break;
@@ -1930,130 +1881,376 @@ mlxsw_sp_fib4_entry_type_set(struct mlxsw_sp *mlxsw_sp,
 }
 
 static struct mlxsw_sp_fib_entry *
-mlxsw_sp_fib_entry_get(struct mlxsw_sp *mlxsw_sp,
-		       const struct fib_entry_notifier_info *fen_info)
+mlxsw_sp_fib4_entry_create(struct mlxsw_sp *mlxsw_sp,
+			   struct mlxsw_sp_fib_node *fib_node,
+			   const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
-	struct fib_info *fi = fen_info->fi;
-	struct mlxsw_sp_vr *vr;
 	int err;
 
-	vr = mlxsw_sp_vr_get(mlxsw_sp, fen_info->dst_len, fen_info->tb_id,
-			     MLXSW_SP_L3_PROTO_IPV4);
-	if (IS_ERR(vr))
-		return ERR_CAST(vr);
-
-	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
-					      sizeof(fen_info->dst),
-					      fen_info->dst_len, fi->fib_dev);
-	if (fib_entry) {
-		/* Already exists, just take a reference */
-		fib_entry->ref_count++;
-		return fib_entry;
-	}
-	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fen_info->dst,
-					      sizeof(fen_info->dst),
-					      fen_info->dst_len, fi->fib_dev);
+	fib_entry = kzalloc(sizeof(*fib_entry), GFP_KERNEL);
 	if (!fib_entry) {
 		err = -ENOMEM;
-		goto err_fib_entry_create;
+		goto err_fib_entry_alloc;
 	}
-	fib_entry->vr = vr;
-	fib_entry->ref_count = 1;
 
 	err = mlxsw_sp_fib4_entry_type_set(mlxsw_sp, fen_info, fib_entry);
 	if (err)
 		goto err_fib4_entry_type_set;
 
-	err = mlxsw_sp_nexthop_group_get(mlxsw_sp, fib_entry, fi);
+	err = mlxsw_sp_nexthop_group_get(mlxsw_sp, fib_entry, fen_info->fi);
 	if (err)
 		goto err_nexthop_group_get;
 
+	fib_entry->params.prio = fen_info->fi->fib_priority;
+	fib_entry->params.tb_id = fen_info->tb_id;
+	fib_entry->params.type = fen_info->type;
+	fib_entry->params.tos = fen_info->tos;
+
+	fib_entry->fib_node = fib_node;
+
 	return fib_entry;
 
 err_nexthop_group_get:
 err_fib4_entry_type_set:
-	mlxsw_sp_fib_entry_destroy(fib_entry);
-err_fib_entry_create:
-	mlxsw_sp_vr_put(mlxsw_sp, vr);
-
+	kfree(fib_entry);
+err_fib_entry_alloc:
 	return ERR_PTR(err);
 }
 
+static void mlxsw_sp_fib4_entry_destroy(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fib_entry *fib_entry)
+{
+	mlxsw_sp_nexthop_group_put(mlxsw_sp, fib_entry);
+	kfree(fib_entry);
+}
+
+static struct mlxsw_sp_fib_node *
+mlxsw_sp_fib4_node_get(struct mlxsw_sp *mlxsw_sp,
+		       const struct fib_entry_notifier_info *fen_info);
+
 static struct mlxsw_sp_fib_entry *
-mlxsw_sp_fib_entry_find(struct mlxsw_sp *mlxsw_sp,
-			const struct fib_entry_notifier_info *fen_info)
+mlxsw_sp_fib4_entry_lookup(struct mlxsw_sp *mlxsw_sp,
+			   const struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp_vr *vr;
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_node *fib_node;
 
-	vr = mlxsw_sp_vr_find(mlxsw_sp, fen_info->tb_id,
-			      MLXSW_SP_L3_PROTO_IPV4);
-	if (!vr)
+	fib_node = mlxsw_sp_fib4_node_get(mlxsw_sp, fen_info);
+	if (IS_ERR(fib_node))
+		return NULL;
+
+	list_for_each_entry(fib_entry, &fib_node->entry_list, list) {
+		if (fib_entry->params.tb_id == fen_info->tb_id &&
+		    fib_entry->params.tos == fen_info->tos &&
+		    fib_entry->params.type == fen_info->type &&
+		    fib_entry->nh_group->key.fi == fen_info->fi) {
+			return fib_entry;
+		}
+	}
+
+	return NULL;
+}
+
+static const struct rhashtable_params mlxsw_sp_fib_ht_params = {
+	.key_offset = offsetof(struct mlxsw_sp_fib_node, key),
+	.head_offset = offsetof(struct mlxsw_sp_fib_node, ht_node),
+	.key_len = sizeof(struct mlxsw_sp_fib_key),
+	.automatic_shrinking = true,
+};
+
+static int mlxsw_sp_fib_node_insert(struct mlxsw_sp_fib *fib,
+				    struct mlxsw_sp_fib_node *fib_node)
+{
+	return rhashtable_insert_fast(&fib->ht, &fib_node->ht_node,
+				      mlxsw_sp_fib_ht_params);
+}
+
+static void mlxsw_sp_fib_node_remove(struct mlxsw_sp_fib *fib,
+				     struct mlxsw_sp_fib_node *fib_node)
+{
+	rhashtable_remove_fast(&fib->ht, &fib_node->ht_node,
+			       mlxsw_sp_fib_ht_params);
+}
+
+static struct mlxsw_sp_fib_node *
+mlxsw_sp_fib_node_lookup(struct mlxsw_sp_fib *fib, const void *addr,
+			 size_t addr_len, unsigned char prefix_len)
+{
+	struct mlxsw_sp_fib_key key;
+
+	memset(&key, 0, sizeof(key));
+	memcpy(key.addr, addr, addr_len);
+	key.prefix_len = prefix_len;
+	return rhashtable_lookup_fast(&fib->ht, &key, mlxsw_sp_fib_ht_params);
+}
+
+static struct mlxsw_sp_fib_node *
+mlxsw_sp_fib_node_create(struct mlxsw_sp_vr *vr, const void *addr,
+			 size_t addr_len, unsigned char prefix_len)
+{
+	struct mlxsw_sp_fib_node *fib_node;
+
+	fib_node = kzalloc(sizeof(*fib_node), GFP_KERNEL);
+	if (!fib_node)
 		return NULL;
 
-	return mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
-					 sizeof(fen_info->dst),
-					 fen_info->dst_len,
-					 fen_info->fi->fib_dev);
+	INIT_LIST_HEAD(&fib_node->entry_list);
+	list_add(&fib_node->list, &vr->fib->node_list);
+	memcpy(fib_node->key.addr, addr, addr_len);
+	fib_node->key.prefix_len = prefix_len;
+	mlxsw_sp_fib_node_insert(vr->fib, fib_node);
+	fib_node->vr = vr;
+
+	return fib_node;
+}
+
+static void mlxsw_sp_fib_node_destroy(struct mlxsw_sp_fib_node *fib_node)
+{
+	mlxsw_sp_fib_node_remove(fib_node->vr->fib, fib_node);
+	list_del(&fib_node->list);
+	WARN_ON(!list_empty(&fib_node->entry_list));
+	kfree(fib_node);
+}
+
+static bool
+mlxsw_sp_fib_node_entry_is_first(const struct mlxsw_sp_fib_node *fib_node,
+				 const struct mlxsw_sp_fib_entry *fib_entry)
+{
+	return list_first_entry(&fib_node->entry_list,
+				struct mlxsw_sp_fib_entry, list) == fib_entry;
+}
+
+static void mlxsw_sp_fib_node_prefix_inc(struct mlxsw_sp_fib_node *fib_node)
+{
+	unsigned char prefix_len = fib_node->key.prefix_len;
+	struct mlxsw_sp_fib *fib = fib_node->vr->fib;
+
+	if (fib->prefix_ref_count[prefix_len]++ == 0)
+		mlxsw_sp_prefix_usage_set(&fib->prefix_usage, prefix_len);
+}
+
+static void mlxsw_sp_fib_node_prefix_dec(struct mlxsw_sp_fib_node *fib_node)
+{
+	unsigned char prefix_len = fib_node->key.prefix_len;
+	struct mlxsw_sp_fib *fib = fib_node->vr->fib;
+
+	if (--fib->prefix_ref_count[prefix_len] == 0)
+		mlxsw_sp_prefix_usage_clear(&fib->prefix_usage, prefix_len);
 }
 
-static void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
-				   struct mlxsw_sp_fib_entry *fib_entry)
+static struct mlxsw_sp_fib_node *
+mlxsw_sp_fib4_node_get(struct mlxsw_sp *mlxsw_sp,
+		       const struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
+	struct mlxsw_sp_fib_node *fib_node;
+	struct mlxsw_sp_vr *vr;
+	int err;
+
+	vr = mlxsw_sp_vr_get(mlxsw_sp, fen_info->dst_len, fen_info->tb_id,
+			     MLXSW_SP_L3_PROTO_IPV4);
+	if (IS_ERR(vr))
+		return ERR_CAST(vr);
+
+	fib_node = mlxsw_sp_fib_node_lookup(vr->fib, &fen_info->dst,
+					    sizeof(fen_info->dst),
+					    fen_info->dst_len);
+	if (fib_node)
+		return fib_node;
 
-	if (--fib_entry->ref_count == 0) {
-		mlxsw_sp_nexthop_group_put(mlxsw_sp, fib_entry);
-		mlxsw_sp_fib_entry_destroy(fib_entry);
+	fib_node = mlxsw_sp_fib_node_create(vr, &fen_info->dst,
+					    sizeof(fen_info->dst),
+					    fen_info->dst_len);
+	if (!fib_node) {
+		err = -ENOMEM;
+		goto err_fib_node_create;
 	}
+
+	return fib_node;
+
+err_fib_node_create:
 	mlxsw_sp_vr_put(mlxsw_sp, vr);
+	return ERR_PTR(err);
 }
 
-static void mlxsw_sp_fib_entry_put_all(struct mlxsw_sp *mlxsw_sp,
-				       struct mlxsw_sp_fib_entry *fib_entry)
+static void mlxsw_sp_fib4_node_put(struct mlxsw_sp *mlxsw_sp,
+				   struct mlxsw_sp_fib_node *fib_node)
 {
-	unsigned int last_ref_count;
+	struct mlxsw_sp_vr *vr = fib_node->vr;
 
-	do {
-		last_ref_count = fib_entry->ref_count;
-		mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
-	} while (last_ref_count != 1);
+	if (!list_empty(&fib_node->entry_list))
+		return;
+	mlxsw_sp_fib_node_destroy(fib_node);
+	mlxsw_sp_vr_put(mlxsw_sp, vr);
 }
 
-static int mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
-				    struct fib_entry_notifier_info *fen_info)
+static struct mlxsw_sp_fib_entry *
+mlxsw_sp_fib4_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
+			      const struct mlxsw_sp_fib_entry_params *params)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
-	struct mlxsw_sp_vr *vr;
+
+	list_for_each_entry(fib_entry, &fib_node->entry_list, list) {
+		if (fib_entry->params.tb_id > params->tb_id)
+			continue;
+		if (fib_entry->params.tb_id != params->tb_id)
+			break;
+		if (fib_entry->params.tos > params->tos)
+			continue;
+		if (fib_entry->params.prio >= params->prio ||
+		    fib_entry->params.tos < params->tos)
+			return fib_entry;
+	}
+
+	return NULL;
+}
+
+static int
+mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib_node *fib_node,
+			       struct mlxsw_sp_fib_entry *new_entry)
+{
+	struct mlxsw_sp_fib_entry *fib_entry;
+
+	fib_entry = mlxsw_sp_fib4_node_entry_find(fib_node, &new_entry->params);
+
+	if (fib_entry) {
+		list_add_tail(&new_entry->list, &fib_entry->list);
+	} else {
+		struct mlxsw_sp_fib_entry *last;
+
+		list_for_each_entry(last, &fib_node->entry_list, list) {
+			if (new_entry->params.tb_id > last->params.tb_id)
+				break;
+			fib_entry = last;
+		}
+
+		if (fib_entry)
+			list_add(&new_entry->list, &fib_entry->list);
+		else
+			list_add(&new_entry->list, &fib_node->entry_list);
+	}
+
+	return 0;
+}
+
+static void
+mlxsw_sp_fib4_node_list_remove(struct mlxsw_sp_fib_entry *fib_entry)
+{
+	list_del(&fib_entry->list);
+}
+
+static int
+mlxsw_sp_fib4_node_entry_add(struct mlxsw_sp *mlxsw_sp,
+			     const struct mlxsw_sp_fib_node *fib_node,
+			     struct mlxsw_sp_fib_entry *fib_entry)
+{
+	if (!mlxsw_sp_fib_node_entry_is_first(fib_node, fib_entry))
+		return 0;
+
+	/* To prevent packet loss, overwrite the previously offloaded
+	 * entry.
+	 */
+	if (!list_is_singular(&fib_node->entry_list)) {
+		enum mlxsw_reg_ralue_op op = MLXSW_REG_RALUE_OP_WRITE_DELETE;
+		struct mlxsw_sp_fib_entry *n = list_next_entry(fib_entry, list);
+
+		mlxsw_sp_fib_entry_offload_refresh(n, op, 0);
+	}
+
+	return mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
+}
+
+static void
+mlxsw_sp_fib4_node_entry_del(struct mlxsw_sp *mlxsw_sp,
+			     const struct mlxsw_sp_fib_node *fib_node,
+			     struct mlxsw_sp_fib_entry *fib_entry)
+{
+	if (!mlxsw_sp_fib_node_entry_is_first(fib_node, fib_entry))
+		return;
+
+	/* Promote the next entry by overwriting the deleted entry */
+	if (!list_is_singular(&fib_node->entry_list)) {
+		struct mlxsw_sp_fib_entry *n = list_next_entry(fib_entry, list);
+		enum mlxsw_reg_ralue_op op = MLXSW_REG_RALUE_OP_WRITE_DELETE;
+
+		mlxsw_sp_fib_entry_update(mlxsw_sp, n);
+		mlxsw_sp_fib_entry_offload_refresh(fib_entry, op, 0);
+		return;
+	}
+
+	mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
+}
+
+static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
+					 struct mlxsw_sp_fib_entry *fib_entry)
+{
+	struct mlxsw_sp_fib_node *fib_node = fib_entry->fib_node;
+	int err;
+
+	err = mlxsw_sp_fib4_node_list_insert(fib_node, fib_entry);
+	if (err)
+		return err;
+
+	err = mlxsw_sp_fib4_node_entry_add(mlxsw_sp, fib_node, fib_entry);
+	if (err)
+		goto err_fib4_node_entry_add;
+
+	mlxsw_sp_fib_node_prefix_inc(fib_node);
+
+	return 0;
+
+err_fib4_node_entry_add:
+	mlxsw_sp_fib4_node_list_remove(fib_entry);
+	return err;
+}
+
+static void
+mlxsw_sp_fib4_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_fib_entry *fib_entry)
+{
+	struct mlxsw_sp_fib_node *fib_node = fib_entry->fib_node;
+
+	mlxsw_sp_fib_node_prefix_dec(fib_node);
+	mlxsw_sp_fib4_node_entry_del(mlxsw_sp, fib_node, fib_entry);
+	mlxsw_sp_fib4_node_list_remove(fib_entry);
+}
+
+static int
+mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
+			 const struct fib_entry_notifier_info *fen_info)
+{
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_node *fib_node;
 	int err;
 
 	if (mlxsw_sp->router.aborted)
 		return 0;
 
-	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fen_info);
-	if (IS_ERR(fib_entry)) {
-		dev_warn(mlxsw_sp->bus_info->dev, "Failed to get FIB4 entry being added.\n");
-		return PTR_ERR(fib_entry);
+	fib_node = mlxsw_sp_fib4_node_get(mlxsw_sp, fen_info);
+	if (IS_ERR(fib_node)) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to get FIB node\n");
+		return PTR_ERR(fib_node);
 	}
 
-	if (fib_entry->ref_count != 1)
-		return 0;
+	fib_entry = mlxsw_sp_fib4_entry_create(mlxsw_sp, fib_node, fen_info);
+	if (IS_ERR(fib_entry)) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to create FIB entry\n");
+		err = PTR_ERR(fib_entry);
+		goto err_fib4_entry_create;
+	}
 
-	vr = fib_entry->vr;
-	err = mlxsw_sp_fib_entry_insert(vr->fib, fib_entry);
+	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib_entry);
 	if (err) {
-		dev_warn(mlxsw_sp->bus_info->dev, "Failed to insert FIB4 entry being added.\n");
-		goto err_fib_entry_insert;
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to link FIB entry to node\n");
+		goto err_fib4_node_entry_link;
 	}
-	err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
-	if (err)
-		goto err_fib_entry_add;
+
 	return 0;
 
-err_fib_entry_add:
-	mlxsw_sp_fib_entry_remove(vr->fib, fib_entry);
-err_fib_entry_insert:
-	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
+err_fib4_node_entry_link:
+	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib_entry);
+err_fib4_entry_create:
+	mlxsw_sp_fib4_node_put(mlxsw_sp, fib_node);
 	return err;
 }
 
@@ -2061,20 +2258,19 @@ static void mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp,
 				     struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_node *fib_node;
 
 	if (mlxsw_sp->router.aborted)
 		return;
 
-	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fen_info);
-	if (!fib_entry)
+	fib_entry = mlxsw_sp_fib4_entry_lookup(mlxsw_sp, fen_info);
+	if (WARN_ON(!fib_entry))
 		return;
+	fib_node = fib_entry->fib_node;
 
-	if (fib_entry->ref_count == 1) {
-		mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
-		mlxsw_sp_fib_entry_remove(fib_entry->vr->fib, fib_entry);
-	}
-
-	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
+	mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, fib_entry);
+	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib_entry);
+	mlxsw_sp_fib4_node_put(mlxsw_sp, fib_node);
 }
 
 static int mlxsw_sp_router_set_abort_trap(struct mlxsw_sp *mlxsw_sp)
@@ -2108,10 +2304,42 @@ static int mlxsw_sp_router_set_abort_trap(struct mlxsw_sp *mlxsw_sp)
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
 }
 
+static void mlxsw_sp_fib4_node_flush(struct mlxsw_sp *mlxsw_sp,
+				     struct mlxsw_sp_fib_node *fib_node)
+{
+	struct mlxsw_sp_fib_entry *fib_entry, *tmp;
+
+	list_for_each_entry_safe(fib_entry, tmp, &fib_node->entry_list, list) {
+		bool do_break = &tmp->list == &fib_node->entry_list;
+
+		mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, fib_entry);
+		mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib_entry);
+		mlxsw_sp_fib4_node_put(mlxsw_sp, fib_node);
+		/* Break when entry list is empty and node was freed.
+		 * Otherwise, we'll access freed memory in the next
+		 * iteration.
+		 */
+		if (do_break)
+			break;
+	}
+}
+
+static void mlxsw_sp_fib_node_flush(struct mlxsw_sp *mlxsw_sp,
+				    struct mlxsw_sp_fib_node *fib_node)
+{
+	switch (fib_node->vr->proto) {
+	case MLXSW_SP_L3_PROTO_IPV4:
+		mlxsw_sp_fib4_node_flush(mlxsw_sp, fib_node);
+		break;
+	case MLXSW_SP_L3_PROTO_IPV6:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
 static void mlxsw_sp_router_fib_flush(struct mlxsw_sp *mlxsw_sp)
 {
-	struct mlxsw_sp_fib_entry *fib_entry;
-	struct mlxsw_sp_fib_entry *tmp;
+	struct mlxsw_sp_fib_node *fib_node, *tmp;
 	struct mlxsw_sp_vr *vr;
 	int i;
 
@@ -2121,14 +2349,11 @@ static void mlxsw_sp_router_fib_flush(struct mlxsw_sp *mlxsw_sp)
 		if (!vr->used)
 			continue;
 
-		list_for_each_entry_safe(fib_entry, tmp,
-					 &vr->fib->entry_list, list) {
-			bool do_break = &tmp->list == &vr->fib->entry_list;
+		list_for_each_entry_safe(fib_node, tmp, &vr->fib->node_list,
+					 list) {
+			bool do_break = &tmp->list == &vr->fib->node_list;
 
-			mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
-			mlxsw_sp_fib_entry_remove(fib_entry->vr->fib,
-						  fib_entry);
-			mlxsw_sp_fib_entry_put_all(mlxsw_sp, fib_entry);
+			mlxsw_sp_fib_node_flush(mlxsw_sp, fib_node);
 			if (do_break)
 				break;
 		}
-- 
2.7.4

* [patch net-next 6/7] mlxsw: spectrum_router: Add support for route append
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
                   ` (4 preceding siblings ...)
  2017-02-09  9:28 ` [patch net-next 5/7] mlxsw: spectrum_router: Correctly handle identical routes Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-09  9:28 ` [patch net-next 7/7] mlxsw: spectrum_router: Add support for route replace Jiri Pirko
  2017-02-10 16:34 ` [patch net-next 0/7] mlxsw: Identical routes handling David Miller
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

When a new route is appended, it's placed after existing routes sharing
the same parameters (prefix, length, table ID, TOS and priority).

While the device supports only one route with the same prefix and length
in a single table, the appended route must still be placed correctly in
the driver's cache, because when a route is deleted the next one in the
list is programmed into the device.

Following the reception of an ENTRY_APPEND notification, resolve the
FIB node corresponding to the prefix and length and correctly place the
new entry in its entry list.
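
The placement rule described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
driver uses a kernel linked list, whereas this sketch uses a plain
array, and struct route, same_params() and route_append() are
hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one cached route of a FIB node. */
struct route {
	unsigned int tb_id;	/* table ID */
	unsigned char tos;	/* type of service */
	unsigned int prio;	/* metric / priority */
	int tag;		/* illustrative identifier only */
};

static int same_params(const struct route *a, const struct route *b)
{
	return a->tb_id == b->tb_id && a->tos == b->tos &&
	       a->prio == b->prio;
}

/* Place new_route after the run of routes that share its parameters.
 * The scan starts at index 'first', which is assumed to be the first
 * matching route. Returns the index used, or -1 if the array is full
 * or 'first' is out of range.
 */
static int route_append(struct route *routes, size_t *count, size_t cap,
			size_t first, const struct route *new_route)
{
	size_t pos = first, i;

	assert(routes && count && new_route);

	if (*count >= cap || first >= *count)
		return -1;

	/* Skip every consecutive route with identical parameters. */
	while (pos < *count && same_params(&routes[pos], new_route))
		pos++;

	/* Shift the tail by one slot and store the new route at 'pos'. */
	for (i = *count; i > pos; i--)
		routes[i] = routes[i - 1];
	routes[pos] = *new_route;
	(*count)++;

	return (int)pos;
}
```

With two routes of priority 10 followed by one of priority 20, appending
another priority 10 route in this sketch lands it at index 2, directly
after its peers and before the priority 20 route.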

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 43 +++++++++++++++++++---
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 7c55df9..d98f039 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2105,14 +2105,38 @@ mlxsw_sp_fib4_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
 	return NULL;
 }
 
+static int mlxsw_sp_fib4_node_list_append(struct mlxsw_sp_fib_entry *fib_entry,
+					  struct mlxsw_sp_fib_entry *new_entry)
+{
+	struct mlxsw_sp_fib_node *fib_node;
+
+	if (WARN_ON(!fib_entry))
+		return -EINVAL;
+
+	fib_node = fib_entry->fib_node;
+	list_for_each_entry_from(fib_entry, &fib_node->entry_list, list) {
+		if (fib_entry->params.tb_id != new_entry->params.tb_id ||
+		    fib_entry->params.tos != new_entry->params.tos ||
+		    fib_entry->params.prio != new_entry->params.prio)
+			break;
+	}
+
+	list_add_tail(&new_entry->list, &fib_entry->list);
+	return 0;
+}
+
 static int
 mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib_node *fib_node,
-			       struct mlxsw_sp_fib_entry *new_entry)
+			       struct mlxsw_sp_fib_entry *new_entry,
+			       bool append)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
 
 	fib_entry = mlxsw_sp_fib4_node_entry_find(fib_node, &new_entry->params);
 
+	if (append)
+		return mlxsw_sp_fib4_node_list_append(fib_entry, new_entry);
+
 	if (fib_entry) {
 		list_add_tail(&new_entry->list, &fib_entry->list);
 	} else {
@@ -2182,12 +2206,13 @@ mlxsw_sp_fib4_node_entry_del(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
-					 struct mlxsw_sp_fib_entry *fib_entry)
+					 struct mlxsw_sp_fib_entry *fib_entry,
+					 bool append)
 {
 	struct mlxsw_sp_fib_node *fib_node = fib_entry->fib_node;
 	int err;
 
-	err = mlxsw_sp_fib4_node_list_insert(fib_node, fib_entry);
+	err = mlxsw_sp_fib4_node_list_insert(fib_node, fib_entry, append);
 	if (err)
 		return err;
 
@@ -2217,7 +2242,8 @@ mlxsw_sp_fib4_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
 
 static int
 mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
-			 const struct fib_entry_notifier_info *fen_info)
+			 const struct fib_entry_notifier_info *fen_info,
+			 bool append)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
 	struct mlxsw_sp_fib_node *fib_node;
@@ -2239,7 +2265,7 @@ mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
 		goto err_fib4_entry_create;
 	}
 
-	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib_entry);
+	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib_entry, append);
 	if (err) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to link FIB entry to node\n");
 		goto err_fib4_node_entry_link;
@@ -2453,13 +2479,17 @@ static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
 	struct mlxsw_sp_fib_event_work *fib_work =
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
+	bool append;
 	int err;
 
 	/* Protect internal structures from changes */
 	rtnl_lock();
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_APPEND: /* fall through */
 	case FIB_EVENT_ENTRY_ADD:
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info);
+		append = fib_work->event == FIB_EVENT_ENTRY_APPEND;
+		err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info,
+					       append);
 		if (err)
 			mlxsw_sp_router_fib4_abort(mlxsw_sp);
 		fib_info_put(fib_work->fen_info.fi);
@@ -2503,6 +2533,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	fib_work->event = event;
 
 	switch (event) {
+	case FIB_EVENT_ENTRY_APPEND: /* fall through */
 	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
 		memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
-- 
2.7.4

* [patch net-next 7/7] mlxsw: spectrum_router: Add support for route replace
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
                   ` (5 preceding siblings ...)
  2017-02-09  9:28 ` [patch net-next 6/7] mlxsw: spectrum_router: Add support for route append Jiri Pirko
@ 2017-02-09  9:28 ` Jiri Pirko
  2017-02-10 16:34 ` [patch net-next 0/7] mlxsw: Identical routes handling David Miller
  7 siblings, 0 replies; 10+ messages in thread
From: Jiri Pirko @ 2017-02-09  9:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

Upon the reception of an ENTRY_REPLACE notification, resolve the FIB
node corresponding to the prefix and length and insert the new route
before the first matching entry.

Since the notification also signals the deletion of the replaced route,
delete it from the driver's cache.
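
The two steps above (insert the new route before the matching one, then
drop the replaced route that now follows it) can be sketched in user
space as below. This is an assumption-laden illustration, not the
driver code: it models the entry list as an array, and struct route and
route_replace() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one cached route of a FIB node. */
struct route {
	unsigned int tb_id;	/* table ID */
	unsigned char tos;	/* type of service */
	unsigned int prio;	/* metric / priority */
	int tag;		/* illustrative identifier only */
};

/* Replace the route at index 'match' with new_route in two steps, so
 * that the new route is already in place before the old one is removed.
 * One spare slot is needed while both routes are present.
 * Returns 0 on success, -1 if there is no spare slot or 'match' is out
 * of range.
 */
static int route_replace(struct route *routes, size_t *count, size_t cap,
			 size_t match, const struct route *new_route)
{
	size_t i;

	assert(routes && count && new_route);

	if (*count >= cap || match >= *count)
		return -1;

	/* Step 1: insert the new route before the matching one. */
	for (i = *count; i > match; i--)
		routes[i] = routes[i - 1];
	routes[match] = *new_route;
	(*count)++;

	/* Step 2: the replaced route now sits at match + 1; remove it. */
	for (i = match + 1; i + 1 < *count; i++)
		routes[i] = routes[i + 1];
	(*count)--;

	return 0;
}
```

The net effect on the array is an in-place overwrite. The sketch keeps
the two steps separate only to mirror the order described in the commit
message.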

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 44 ++++++++++++++++++----
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index d98f039..d7ac22d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2128,7 +2128,7 @@ static int mlxsw_sp_fib4_node_list_append(struct mlxsw_sp_fib_entry *fib_entry,
 static int
 mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib_node *fib_node,
 			       struct mlxsw_sp_fib_entry *new_entry,
-			       bool append)
+			       bool replace, bool append)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
 
@@ -2136,7 +2136,12 @@ mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib_node *fib_node,
 
 	if (append)
 		return mlxsw_sp_fib4_node_list_append(fib_entry, new_entry);
+	if (replace && WARN_ON(!fib_entry))
+		return -EINVAL;
 
+	/* Insert new entry before replaced one, so that we can later
+	 * remove the second.
+	 */
 	if (fib_entry) {
 		list_add_tail(&new_entry->list, &fib_entry->list);
 	} else {
@@ -2207,12 +2212,13 @@ mlxsw_sp_fib4_node_entry_del(struct mlxsw_sp *mlxsw_sp,
 
 static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_fib_entry *fib_entry,
-					 bool append)
+					 bool replace, bool append)
 {
 	struct mlxsw_sp_fib_node *fib_node = fib_entry->fib_node;
 	int err;
 
-	err = mlxsw_sp_fib4_node_list_insert(fib_node, fib_entry, append);
+	err = mlxsw_sp_fib4_node_list_insert(fib_node, fib_entry, replace,
+					     append);
 	if (err)
 		return err;
 
@@ -2240,10 +2246,28 @@ mlxsw_sp_fib4_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_fib4_node_list_remove(fib_entry);
 }
 
+static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fib_entry *fib_entry,
+					bool replace)
+{
+	struct mlxsw_sp_fib_node *fib_node = fib_entry->fib_node;
+	struct mlxsw_sp_fib_entry *replaced;
+
+	if (!replace)
+		return;
+
+	/* We inserted the new entry before replaced one */
+	replaced = list_next_entry(fib_entry, list);
+
+	mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, replaced);
+	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, replaced);
+	mlxsw_sp_fib4_node_put(mlxsw_sp, fib_node);
+}
+
 static int
 mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
 			 const struct fib_entry_notifier_info *fen_info,
-			 bool append)
+			 bool replace, bool append)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
 	struct mlxsw_sp_fib_node *fib_node;
@@ -2265,12 +2289,15 @@ mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
 		goto err_fib4_entry_create;
 	}
 
-	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib_entry, append);
+	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib_entry, replace,
+					    append);
 	if (err) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to link FIB entry to node\n");
 		goto err_fib4_node_entry_link;
 	}
 
+	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib_entry, replace);
+
 	return 0;
 
 err_fib4_node_entry_link:
@@ -2479,17 +2506,19 @@ static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
 	struct mlxsw_sp_fib_event_work *fib_work =
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
-	bool append;
+	bool replace, append;
 	int err;
 
 	/* Protect internal structures from changes */
 	rtnl_lock();
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_APPEND: /* fall through */
 	case FIB_EVENT_ENTRY_ADD:
+		replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
 		append = fib_work->event == FIB_EVENT_ENTRY_APPEND;
 		err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info,
-					       append);
+					       replace, append);
 		if (err)
 			mlxsw_sp_router_fib4_abort(mlxsw_sp);
 		fib_info_put(fib_work->fen_info.fi);
@@ -2533,6 +2562,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	fib_work->event = event;
 
 	switch (event) {
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_APPEND: /* fall through */
 	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
-- 
2.7.4

* Re: [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table
  2017-02-09  9:28 ` [patch net-next 1/7] ipv4: fib: Only flush FIB aliases belonging to currently flushed table Jiri Pirko
@ 2017-02-09 19:00   ` Duyck, Alexander H
  0 siblings, 0 replies; 10+ messages in thread
From: Duyck, Alexander H @ 2017-02-09 19:00 UTC (permalink / raw)
  To: netdev, jiri; +Cc: davem, idosch, eladr, mlxsw, kaber

On Thu, 2017-02-09 at 10:28 +0100, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> In case the MAIN table is flushed and its trie is shared with the LOCAL
> table, then we might be flushing FIB aliases belonging to the latter.
> This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
> ID.
> 
> The above doesn't affect current listeners, as the table ID is ignored
> during entry deletion, but this will change later in the patchset.
> 
> When flushing a particular table, skip any aliases belonging to a
> different one.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> CC: Alexander Duyck <alexander.h.duyck@intel.com>
> CC: Patrick McHardy <kaber@trash.net>
> ---
>  net/ipv4/fib_trie.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 2919d1a..5ef4596 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1963,7 +1963,8 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
>  		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
>  			struct fib_info *fi = fa->fa_info;
>  
> -			if (!fi || !(fi->fib_flags & RTNH_F_DEAD)) {
> +			if (!fi || !(fi->fib_flags & RTNH_F_DEAD) ||
> +			    tb->tb_id != fa->tb_id) {
>  				slen = fa->fa_slen;
>  				continue;
>  			}

One change I might make if you end up having to do a v2 would be to
test for the table ID first.  It can end up saving a few cycles in the
whole flushing process since the table ID is in the fib alias instead
of having to dereference the fib info.

That being said, I am just being a bit nit-picky; the code itself is
functionally correct and there is nothing here that should cause any
issues.
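
A minimal user-space sketch of the suggested ordering, with the table
ID compared before anything reached through the FIB info pointer. The
types and names here (struct info, struct alias, DEAD_FLAG,
flush_should_skip()) are hypothetical stand-ins and not the kernel's
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEAD_FLAG 0x1	/* hypothetical stand-in for RTNH_F_DEAD */

/* Hypothetical stand-in for the FIB info the alias points to. */
struct info {
	unsigned int flags;
};

/* Hypothetical stand-in for a FIB alias. */
struct alias {
	unsigned int tb_id;
	const struct info *info;
};

/* Returns true when the alias must be left alone while flushing the
 * table identified by tb_id. The table ID lives in the alias itself,
 * so it is tested first; the info pointer is only followed when the
 * alias belongs to the table being flushed.
 */
static bool flush_should_skip(unsigned int tb_id, const struct alias *fa)
{
	const struct info *fi;

	if (tb_id != fa->tb_id)
		return true;

	fi = fa->info;
	return !fi || !(fi->flags & DEAD_FLAG);
}
```

The result is the same as with the original operand order. Only the
order of evaluation differs.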

Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

* Re: [patch net-next 0/7] mlxsw: Identical routes handling
  2017-02-09  9:28 [patch net-next 0/7] mlxsw: Identical routes handling Jiri Pirko
                   ` (6 preceding siblings ...)
  2017-02-09  9:28 ` [patch net-next 7/7] mlxsw: spectrum_router: Add support for route replace Jiri Pirko
@ 2017-02-10 16:34 ` David Miller
  7 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2017-02-10 16:34 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, mlxsw

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu,  9 Feb 2017 10:28:37 +0100

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Ido says:
> 
> The kernel can store several FIB aliases that share the same prefix and
> length. These aliases can differ in other parameters such as TOS and
> metric, which are taken into account during lookup.
> 
> Offloading devices might not have the same flexibility, allowing only a
> single route with the same prefix and length to be reflected. mlxsw is
> one such device.
> 
> This patchset aims to correctly handle this situation in the mlxsw
> driver. The first four patches introduce small changes in the IPv4 FIB
> code, so that listeners of the FIB notification chain will be able to
> correctly handle identical routes.
> 
> The last three patches build on top of previous work and introduce the
> necessary changes in the mlxsw driver. The biggest change is the
> introduction of a FIB node, where identical routes are chained, instead
> of a primitive reference counting. This is explained in detail in the
> fifth patch.

Looks good, series applied, thanks Jiri and Ido.

I think you took care of this properly, but just always make sure that
if a delete event is emitted the object is not in the table any longer
and cannot be discovered by a parallel thread of execution at that
point.

Likewise a good rule of thumb is to make sure the object is
discoverable when you emit the add event.

Thanks again.
