netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next 00/15] Simplify IPv4 route offload API
@ 2019-10-02  8:40 Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 01/15] ipv4: Add temporary events to the FIB notification chain Ido Schimmel
                   ` (15 more replies)
  0 siblings, 16 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Today, whenever an IPv4 route is added or deleted a notification is sent
in the FIB notification chain and it is up to offload drivers to decide
if the route should be programmed to the hardware or not. This is not an
easy task as in hardware routes are keyed by {prefix, prefix length,
table id}, whereas the kernel can store multiple such routes that only
differ in metric / TOS / nexthop info.

This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.

The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.

The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I'm familiar with. Moreover, this is
not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.

The series also adds an "in hardware" indication to routes, in addition
to the offload indication we already have on nexthops today. Besides
being long overdue, the reason this is done in this series is that it
makes it possible to easily test the new FIB notification API over
netdevsim.

To ensure there is no degradation in route insertion rates, I used
Vincent Bernat's script [1][2] from [3] to inject 500,000 routes from an
MRT dump from a router with a full view. On a system with Intel(R)
Xeon(R) CPU D-1527 @ 2.20GHz I measured 8.184 seconds, averaged over 10
runs and saw no degradation compared to net-next from today.

Patchset overview:
Patches #1-#7 introduce the new FIB notifications
Patches #8-#9 convert listeners to make use of the new notifications
Patches #10-#14 add "in hardware" indication for IPv4 routes, including
a dummy FIB offload implementation in netdevsim
Patch #15 adds a selftest for the new FIB notifications API over
netdevsim

The series is based on Jiri's "devlink: allow devlink instances to
change network namespace" series [4]. The patches can be found here [5]
and patched iproute2 with the "in hardware" indication can be found here
[6].

IPv6 is next on my TODO list.

[1] https://github.com/vincentbernat/network-lab/blob/master/common/helpers/lab-routes-ipvX/insert-from-bgp
[2] https://gist.github.com/idosch/2eb96efe50eb5234d205e964f0814859
[3] https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-linux
[4] https://patchwork.ozlabs.org/cover/1162295/
[5] https://github.com/idosch/linux/tree/fib-notifier
[6] https://github.com/idosch/iproute2/tree/fib-notifier

Ido Schimmel (15):
  ipv4: Add temporary events to the FIB notification chain
  ipv4: Notify route after insertion to the routing table
  ipv4: Notify route if replacing currently offloaded one
  ipv4: Notify newly added route if should be offloaded
  ipv4: Handle route deletion notification
  ipv4: Handle route deletion notification during flush
  ipv4: Only Replay routes of interest to new listeners
  mlxsw: spectrum_router: Start using new IPv4 route notifications
  ipv4: Remove old route notifications and convert listeners
  ipv4: Replace route in list before notifying
  ipv4: Encapsulate function arguments in a struct
  ipv4: Add "in hardware" indication to routes
  mlxsw: spectrum_router: Mark routes as "in hardware"
  netdevsim: fib: Mark routes as "in hardware"
  selftests: netdevsim: Add test for route offload API

 .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |   4 -
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 152 ++-----
 drivers/net/ethernet/rocker/rocker_main.c     |   4 +-
 drivers/net/netdevsim/fib.c                   | 263 ++++++++++-
 include/net/ip_fib.h                          |   5 +
 include/uapi/linux/rtnetlink.h                |   1 +
 net/ipv4/fib_lookup.h                         |  18 +-
 net/ipv4/fib_semantics.c                      |  30 +-
 net/ipv4/fib_trie.c                           | 223 ++++++++--
 net/ipv4/route.c                              |  12 +-
 .../drivers/net/netdevsim/fib_notifier.sh     | 411 ++++++++++++++++++
 11 files changed, 938 insertions(+), 185 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh

-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 01/15] ipv4: Add temporary events to the FIB notification chain
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table Ido Schimmel
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Subsequent patches are going to simplify the IPv4 route offload API,
which will only use two events - replace and delete.

Introduce a temporary version of these two events in order to make the
conversion easier to review.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/fib_notifier.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index 6d59221ff05a..b3c54325caec 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -23,6 +23,8 @@ enum fib_event_type {
 	FIB_EVENT_NH_DEL,
 	FIB_EVENT_VIF_ADD,
 	FIB_EVENT_VIF_DEL,
+	FIB_EVENT_ENTRY_REPLACE_TMP,
+	FIB_EVENT_ENTRY_DEL_TMP,
 };
 
 struct fib_notifier_ops {
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 01/15] ipv4: Add temporary events to the FIB notification chain Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-03  1:34   ` David Ahern
  2019-10-02  8:40 ` [RFC PATCH net-next 03/15] ipv4: Notify route if replacing currently offloaded one Ido Schimmel
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Currently, a new route is notified in the FIB notification chain before
it is inserted to the FIB alias list.

Subsequent patches will use the placement of the new route in the
ordered FIB alias list in order to determine if the route should be
notified or not.

As a preparatory step, change the order so that the route is first
inserted into the FIB alias list and only then notified.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b9df9c09b84e..3ba63ebcfeef 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1063,9 +1063,6 @@ static int fib_insert_node(struct trie *t, struct key_vector *tp,
 	return -ENOMEM;
 }
 
-/* fib notifier for ADD is sent before calling fib_insert_alias with
- * the expectation that the only possible failure ENOMEM
- */
 static int fib_insert_alias(struct trie *t, struct key_vector *tp,
 			    struct key_vector *l, struct fib_alias *new,
 			    struct fib_alias *fa, t_key key)
@@ -1118,6 +1115,9 @@ static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
 	return true;
 }
 
+static void fib_remove_alias(struct trie *t, struct key_vector *tp,
+			     struct key_vector *l, struct fib_alias *old);
+
 /* Caller must hold RTNL. */
 int fib_table_insert(struct net *net, struct fib_table *tb,
 		     struct fib_config *cfg, struct netlink_ext_ack *extack)
@@ -1269,14 +1269,19 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	new_fa->tb_id = tb->tb_id;
 	new_fa->fa_default = -1;
 
-	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
+	/* Insert new entry to the list. */
+	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
 	if (err)
 		goto out_free_new_fa;
 
-	/* Insert new entry to the list. */
-	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
+	/* The alias was already inserted, so the node must exist. */
+	l = fib_find_node(t, &tp, key);
+	if (WARN_ON_ONCE(!l))
+		goto out_free_new_fa;
+
+	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
 	if (err)
-		goto out_fib_notif;
+		goto out_remove_new_fa;
 
 	if (!plen)
 		tb->tb_num_default++;
@@ -1287,14 +1292,8 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 succeeded:
 	return 0;
 
-out_fib_notif:
-	/* notifier was sent that entry would be added to trie, but
-	 * the add failed and need to recover. Only failure for
-	 * fib_insert_alias is ENOMEM.
-	 */
-	NL_SET_ERR_MSG(extack, "Failed to insert route into trie");
-	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key,
-				 plen, new_fa, NULL);
+out_remove_new_fa:
+	fib_remove_alias(t, tp, l, new_fa);
 out_free_new_fa:
 	kmem_cache_free(fn_alias_kmem, new_fa);
 out:
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 03/15] ipv4: Notify route if replacing currently offloaded one
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 01/15] ipv4: Add temporary events to the FIB notification chain Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 04/15] ipv4: Notify newly added route if should be offloaded Ido Schimmel
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When replacing a route, its replacement should only be notified in case
the replaced route is of any interest to listeners. In other words, if
the replaced route is currently used in the data path, which means it is
the first route in the FIB alias list with the given {prefix, prefix
length, table ID}.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3ba63ebcfeef..8387b275721c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -978,6 +978,27 @@ static struct key_vector *fib_find_node(struct trie *t,
 	return n;
 }
 
+/* Return the first fib alias matching prefix length and table ID. */
+static struct fib_alias *fib_find_first_alias(struct hlist_head *fah, u8 slen,
+					      u32 tb_id)
+{
+	struct fib_alias *fa;
+
+	hlist_for_each_entry(fa, fah, fa_list) {
+		if (fa->fa_slen < slen)
+			continue;
+		if (fa->fa_slen != slen)
+			break;
+		if (fa->tb_id > tb_id)
+			continue;
+		if (fa->tb_id != tb_id)
+			break;
+		return fa;
+	}
+
+	return NULL;
+}
+
 /* Return the first fib alias matching TOS with
  * priority less than or equal to PRIO.
  */
@@ -1217,6 +1238,17 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
+			if (fib_find_first_alias(&l->leaf, fa->fa_slen,
+						 tb->tb_id) == fa) {
+				enum fib_event_type fib_event;
+
+				fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+				err = call_fib_entry_notifiers(net, fib_event,
+							       key, plen,
+							       new_fa, extack);
+				if (err)
+					goto out_free_new_fa;
+			}
 			err = call_fib_entry_notifiers(net,
 						       FIB_EVENT_ENTRY_REPLACE,
 						       key, plen, new_fa,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 04/15] ipv4: Notify newly added route if should be offloaded
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (2 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 03/15] ipv4: Notify route if replacing currently offloaded one Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 05/15] ipv4: Handle route deletion notification Ido Schimmel
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When a route is added, it should only be notified in case it is the
first route in the FIB alias list with the given {prefix, prefix length,
table ID}. Otherwise, it is not used in the data path and should not be
considered by switch drivers.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 8387b275721c..e5896729dcb9 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1311,6 +1311,16 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	if (WARN_ON_ONCE(!l))
 		goto out_free_new_fa;
 
+	if (fib_find_first_alias(&l->leaf, new_fa->fa_slen, tb->tb_id) ==
+	    new_fa) {
+		enum fib_event_type fib_event;
+
+		fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+		err = call_fib_entry_notifiers(net, fib_event, key, plen,
+					       new_fa, extack);
+		if (err)
+			goto out_remove_new_fa;
+	}
 	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
 	if (err)
 		goto out_remove_new_fa;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 05/15] ipv4: Handle route deletion notification
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (3 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 04/15] ipv4: Notify newly added route if should be offloaded Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 06/15] ipv4: Handle route deletion notification during flush Ido Schimmel
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When a route is deleted we potentially need to promote the next route in
the FIB alias list (e.g., with an higher metric). In case we find such a
route, a replace notification is emitted. Otherwise, a delete
notification for the deleted route.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index e5896729dcb9..ba597dbe1cc5 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1586,6 +1586,36 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 	node_pull_suffix(tp, fa->fa_slen);
 }
 
+static void fib_notify_alias_delete(struct net *net, u32 key,
+				    struct hlist_head *fah,
+				    struct fib_alias *fa_to_delete,
+				    struct netlink_ext_ack *extack)
+{
+	struct fib_alias *fa_next, *fa_to_notify;
+	u32 tb_id = fa_to_delete->tb_id;
+	u8 slen = fa_to_delete->fa_slen;
+	enum fib_event_type fib_event;
+
+	/* Do not notify if we do not care about the route. */
+	if (fib_find_first_alias(fah, slen, tb_id) != fa_to_delete)
+		return;
+
+	/* Determine if the route should be replaced by the next route in the
+	 * list.
+	 */
+	fa_next = hlist_entry_safe(fa_to_delete->fa_list.next,
+				   struct fib_alias, fa_list);
+	if (fa_next && fa_next->fa_slen == slen && fa_next->tb_id == tb_id) {
+		fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+		fa_to_notify = fa_next;
+	} else {
+		fib_event = FIB_EVENT_ENTRY_DEL_TMP;
+		fa_to_notify = fa_to_delete;
+	}
+	call_fib_entry_notifiers(net, fib_event, key, KEYLENGTH - slen,
+				 fa_to_notify, extack);
+}
+
 /* Caller must hold RTNL. */
 int fib_table_delete(struct net *net, struct fib_table *tb,
 		     struct fib_config *cfg, struct netlink_ext_ack *extack)
@@ -1639,6 +1669,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	if (!fa_to_delete)
 		return -ESRCH;
 
+	fib_notify_alias_delete(net, key, &l->leaf, fa_to_delete, extack);
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
 				 fa_to_delete, extack);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 06/15] ipv4: Handle route deletion notification during flush
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (4 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 05/15] ipv4: Handle route deletion notification Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners Ido Schimmel
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

In a similar fashion to previous patch, when a route is deleted as part
of table flushing, promote the next route in the list, if exists.
Otherwise, simply emit a delete notification.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index ba597dbe1cc5..dc4c4e2cb0b3 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1995,6 +1995,8 @@ int fib_table_flush(struct net *net, struct fib_table *tb, bool flush_all)
 				continue;
 			}
 
+			fib_notify_alias_delete(net, n->key, &n->leaf, fa,
+						NULL);
 			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
 						 n->key,
 						 KEYLENGTH - fa->fa_slen, fa,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (5 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 06/15] ipv4: Handle route deletion notification during flush Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02 17:44   ` Jiri Pirko
  2019-10-02  8:40 ` [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications Ido Schimmel
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When a new listener is registered to the FIB notification chain it
receives a dump of all the available routes in the system. Instead, make
sure to only replay the IPv4 routes that are actually used in the data
path and are of any interest to the new listener.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index dc4c4e2cb0b3..4937a3503f4f 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2096,6 +2096,7 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
 			   struct netlink_ext_ack *extack)
 {
 	struct fib_alias *fa;
+	int last_slen = -1;
 	int err;
 
 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
@@ -2110,6 +2111,15 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
 		if (tb->tb_id != fa->tb_id)
 			continue;
 
+		if (fa->fa_slen == last_slen)
+			continue;
+
+		last_slen = fa->fa_slen;
+		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_REPLACE_TMP,
+					      l->key, KEYLENGTH - fa->fa_slen,
+					      fa, extack);
+		if (err)
+			return err;
 		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_ADD, l->key,
 					      KEYLENGTH - fa->fa_slen,
 					      fa, extack);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (6 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02 17:52   ` Jiri Pirko
  2019-10-02  8:40 ` [RFC PATCH net-next 09/15] ipv4: Remove old route notifications and convert listeners Ido Schimmel
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv4 code.

Instead, mlxsw only needs to take care of inserting and removing routes
from the device.

Convert mlxsw to use the new IPv4 route notifications and simplify the
code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 134 +++---------------
 1 file changed, 18 insertions(+), 116 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 0e99b64450ca..f5514fbae39e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3822,7 +3822,7 @@ static void mlxsw_sp_nexthop4_event(struct mlxsw_sp *mlxsw_sp,
 
 	key.fib_nh = fib_nh;
 	nh = mlxsw_sp_nexthop_lookup(mlxsw_sp, key);
-	if (WARN_ON_ONCE(!nh))
+	if (!nh)
 		return;
 
 	switch (event) {
@@ -4693,95 +4693,6 @@ static void mlxsw_sp_fib_node_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_vr_put(mlxsw_sp, vr);
 }
 
-static struct mlxsw_sp_fib4_entry *
-mlxsw_sp_fib4_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
-			      const struct mlxsw_sp_fib4_entry *new4_entry)
-{
-	struct mlxsw_sp_fib4_entry *fib4_entry;
-
-	list_for_each_entry(fib4_entry, &fib_node->entry_list, common.list) {
-		if (fib4_entry->tb_id > new4_entry->tb_id)
-			continue;
-		if (fib4_entry->tb_id != new4_entry->tb_id)
-			break;
-		if (fib4_entry->tos > new4_entry->tos)
-			continue;
-		if (fib4_entry->prio >= new4_entry->prio ||
-		    fib4_entry->tos < new4_entry->tos)
-			return fib4_entry;
-	}
-
-	return NULL;
-}
-
-static int
-mlxsw_sp_fib4_node_list_append(struct mlxsw_sp_fib4_entry *fib4_entry,
-			       struct mlxsw_sp_fib4_entry *new4_entry)
-{
-	struct mlxsw_sp_fib_node *fib_node;
-
-	if (WARN_ON(!fib4_entry))
-		return -EINVAL;
-
-	fib_node = fib4_entry->common.fib_node;
-	list_for_each_entry_from(fib4_entry, &fib_node->entry_list,
-				 common.list) {
-		if (fib4_entry->tb_id != new4_entry->tb_id ||
-		    fib4_entry->tos != new4_entry->tos ||
-		    fib4_entry->prio != new4_entry->prio)
-			break;
-	}
-
-	list_add_tail(&new4_entry->common.list, &fib4_entry->common.list);
-	return 0;
-}
-
-static int
-mlxsw_sp_fib4_node_list_insert(struct mlxsw_sp_fib4_entry *new4_entry,
-			       bool replace, bool append)
-{
-	struct mlxsw_sp_fib_node *fib_node = new4_entry->common.fib_node;
-	struct mlxsw_sp_fib4_entry *fib4_entry;
-
-	fib4_entry = mlxsw_sp_fib4_node_entry_find(fib_node, new4_entry);
-
-	if (append)
-		return mlxsw_sp_fib4_node_list_append(fib4_entry, new4_entry);
-	if (replace && WARN_ON(!fib4_entry))
-		return -EINVAL;
-
-	/* Insert new entry before replaced one, so that we can later
-	 * remove the second.
-	 */
-	if (fib4_entry) {
-		list_add_tail(&new4_entry->common.list,
-			      &fib4_entry->common.list);
-	} else {
-		struct mlxsw_sp_fib4_entry *last;
-
-		list_for_each_entry(last, &fib_node->entry_list, common.list) {
-			if (new4_entry->tb_id > last->tb_id)
-				break;
-			fib4_entry = last;
-		}
-
-		if (fib4_entry)
-			list_add(&new4_entry->common.list,
-				 &fib4_entry->common.list);
-		else
-			list_add(&new4_entry->common.list,
-				 &fib_node->entry_list);
-	}
-
-	return 0;
-}
-
-static void
-mlxsw_sp_fib4_node_list_remove(struct mlxsw_sp_fib4_entry *fib4_entry)
-{
-	list_del(&fib4_entry->common.list);
-}
-
 static int mlxsw_sp_fib_node_entry_add(struct mlxsw_sp *mlxsw_sp,
 				       struct mlxsw_sp_fib_entry *fib_entry)
 {
@@ -4825,14 +4736,12 @@ static void mlxsw_sp_fib_node_entry_del(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
-					 struct mlxsw_sp_fib4_entry *fib4_entry,
-					 bool replace, bool append)
+					 struct mlxsw_sp_fib4_entry *fib4_entry)
 {
+	struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
 	int err;
 
-	err = mlxsw_sp_fib4_node_list_insert(fib4_entry, replace, append);
-	if (err)
-		return err;
+	list_add(&fib4_entry->common.list, &fib_node->entry_list);
 
 	err = mlxsw_sp_fib_node_entry_add(mlxsw_sp, &fib4_entry->common);
 	if (err)
@@ -4841,7 +4750,7 @@ static int mlxsw_sp_fib4_node_entry_link(struct mlxsw_sp *mlxsw_sp,
 	return 0;
 
 err_fib_node_entry_add:
-	mlxsw_sp_fib4_node_list_remove(fib4_entry);
+	list_del(&fib4_entry->common.list);
 	return err;
 }
 
@@ -4850,20 +4759,19 @@ mlxsw_sp_fib4_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
 				struct mlxsw_sp_fib4_entry *fib4_entry)
 {
 	mlxsw_sp_fib_node_entry_del(mlxsw_sp, &fib4_entry->common);
-	mlxsw_sp_fib4_node_list_remove(fib4_entry);
+	list_del(&fib4_entry->common.list);
 
 	if (fib4_entry->common.type == MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP)
 		mlxsw_sp_fib_entry_decap_fini(mlxsw_sp, &fib4_entry->common);
 }
 
 static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
-					struct mlxsw_sp_fib4_entry *fib4_entry,
-					bool replace)
+					struct mlxsw_sp_fib4_entry *fib4_entry)
 {
 	struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
 	struct mlxsw_sp_fib4_entry *replaced;
 
-	if (!replace)
+	if (list_is_singular(&fib_node->entry_list))
 		return;
 
 	/* We inserted the new entry before replaced one */
@@ -4875,9 +4783,8 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int
-mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
-			 const struct fib_entry_notifier_info *fen_info,
-			 bool replace, bool append)
+mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
+			     const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_fib4_entry *fib4_entry;
 	struct mlxsw_sp_fib_node *fib_node;
@@ -4902,14 +4809,13 @@ mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
 		goto err_fib4_entry_create;
 	}
 
-	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib4_entry, replace,
-					    append);
+	err = mlxsw_sp_fib4_node_entry_link(mlxsw_sp, fib4_entry);
 	if (err) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to link FIB entry to node\n");
 		goto err_fib4_node_entry_link;
 	}
 
-	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry, replace);
+	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry);
 
 	return 0;
 
@@ -5997,7 +5903,6 @@ static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
 	struct mlxsw_sp_fib_event_work *fib_work =
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
-	bool replace, append;
 	int err;
 
 	/* Protect internal structures from changes */
@@ -6005,18 +5910,14 @@ static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
 	mlxsw_sp_span_respin(mlxsw_sp);
 
 	switch (fib_work->event) {
-	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
-	case FIB_EVENT_ENTRY_APPEND: /* fall through */
-	case FIB_EVENT_ENTRY_ADD:
-		replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
-		append = fib_work->event == FIB_EVENT_ENTRY_APPEND;
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp, &fib_work->fen_info,
-					       replace, append);
+	case FIB_EVENT_ENTRY_REPLACE_TMP:
+		err = mlxsw_sp_router_fib4_replace(mlxsw_sp,
+						   &fib_work->fen_info);
 		if (err)
 			mlxsw_sp_router_fib_abort(mlxsw_sp);
 		fib_info_put(fib_work->fen_info.fi);
 		break;
-	case FIB_EVENT_ENTRY_DEL:
+	case FIB_EVENT_ENTRY_DEL_TMP:
 		mlxsw_sp_router_fib4_del(mlxsw_sp, &fib_work->fen_info);
 		fib_info_put(fib_work->fen_info.fi);
 		break;
@@ -6246,9 +6147,10 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 		err = mlxsw_sp_router_fib_rule_event(event, info,
 						     router->mlxsw_sp);
 		return notifier_from_errno(err);
-	case FIB_EVENT_ENTRY_ADD:
+	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_APPEND:  /* fall through */
+	case FIB_EVENT_ENTRY_REPLACE_TMP:
 		if (router->aborted) {
 			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
 			return notifier_from_errno(-EINVAL);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 09/15] ipv4: Remove old route notifications and convert listeners
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (7 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 10/15] ipv4: Replace route in list before notifying Ido Schimmel
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |  4 --
 .../ethernet/mellanox/mlxsw/spectrum_router.c |  7 ++--
 drivers/net/ethernet/rocker/rocker_main.c     |  4 +-
 drivers/net/netdevsim/fib.c                   |  4 +-
 include/net/fib_notifier.h                    |  2 -
 net/ipv4/fib_trie.c                           | 37 ++++---------------
 6 files changed, 14 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
index 13e2944b1274..58c2646051a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag_mp.c
@@ -197,8 +197,6 @@ static void mlx5_lag_fib_update(struct work_struct *work)
 	rtnl_lock();
 	switch (fib_work->event) {
 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
-	case FIB_EVENT_ENTRY_APPEND: /* fall through */
-	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
 		mlx5_lag_fib_route_event(ldev, fib_work->event,
 					 fib_work->fen_info.fi);
@@ -256,8 +254,6 @@ static int mlx5_lag_fib_event(struct notifier_block *nb,
 
 	switch (event) {
 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
-	case FIB_EVENT_ENTRY_APPEND: /* fall through */
-	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
 		fen_info = container_of(info, struct fib_entry_notifier_info,
 					info);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index f5514fbae39e..5a4e61f1feec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5910,14 +5910,14 @@ static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
 	mlxsw_sp_span_respin(mlxsw_sp);
 
 	switch (fib_work->event) {
-	case FIB_EVENT_ENTRY_REPLACE_TMP:
+	case FIB_EVENT_ENTRY_REPLACE:
 		err = mlxsw_sp_router_fib4_replace(mlxsw_sp,
 						   &fib_work->fen_info);
 		if (err)
 			mlxsw_sp_router_fib_abort(mlxsw_sp);
 		fib_info_put(fib_work->fen_info.fi);
 		break;
-	case FIB_EVENT_ENTRY_DEL_TMP:
+	case FIB_EVENT_ENTRY_DEL:
 		mlxsw_sp_router_fib4_del(mlxsw_sp, &fib_work->fen_info);
 		fib_info_put(fib_work->fen_info.fi);
 		break;
@@ -6149,8 +6149,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 		return notifier_from_errno(err);
 	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
-	case FIB_EVENT_ENTRY_APPEND:  /* fall through */
-	case FIB_EVENT_ENTRY_REPLACE_TMP:
+	case FIB_EVENT_ENTRY_APPEND:
 		if (router->aborted) {
 			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
 			return notifier_from_errno(-EINVAL);
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index bc4f951315da..7585cd2270ba 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2159,7 +2159,7 @@ static void rocker_router_fib_event_work(struct work_struct *work)
 	/* Protect internal structures from changes */
 	rtnl_lock();
 	switch (fib_work->event) {
-	case FIB_EVENT_ENTRY_ADD:
+	case FIB_EVENT_ENTRY_REPLACE:
 		err = rocker_world_fib4_add(rocker, &fib_work->fen_info);
 		if (err)
 			rocker_world_fib4_abort(rocker);
@@ -2201,7 +2201,7 @@ static int rocker_router_fib_event(struct notifier_block *nb,
 	fib_work->event = event;
 
 	switch (event) {
-	case FIB_EVENT_ENTRY_ADD: /* fall through */
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
 		if (info->family == AF_INET) {
 			struct fib_entry_notifier_info *fen_info = ptr;
diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
index 13540dee7364..4e02a4231fcb 100644
--- a/drivers/net/netdevsim/fib.c
+++ b/drivers/net/netdevsim/fib.c
@@ -177,10 +177,10 @@ static int nsim_fib_event_nb(struct notifier_block *nb, unsigned long event,
 					  event == FIB_EVENT_RULE_ADD);
 		break;
 
+	case FIB_EVENT_ENTRY_REPLACE:  /* fall through */
 	case FIB_EVENT_ENTRY_ADD:  /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
-		err = nsim_fib_event(data, info,
-				     event == FIB_EVENT_ENTRY_ADD);
+		err = nsim_fib_event(data, info, event != FIB_EVENT_ENTRY_DEL);
 		break;
 	}
 
diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index b3c54325caec..6d59221ff05a 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -23,8 +23,6 @@ enum fib_event_type {
 	FIB_EVENT_NH_DEL,
 	FIB_EVENT_VIF_ADD,
 	FIB_EVENT_VIF_DEL,
-	FIB_EVENT_ENTRY_REPLACE_TMP,
-	FIB_EVENT_ENTRY_DEL_TMP,
 };
 
 struct fib_notifier_ops {
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 4937a3503f4f..cbb41eebb43b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1143,7 +1143,6 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 int fib_table_insert(struct net *net, struct fib_table *tb,
 		     struct fib_config *cfg, struct netlink_ext_ack *extack)
 {
-	enum fib_event_type event = FIB_EVENT_ENTRY_ADD;
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
 	struct key_vector *l, *tp;
@@ -1242,19 +1241,13 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 						 tb->tb_id) == fa) {
 				enum fib_event_type fib_event;
 
-				fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+				fib_event = FIB_EVENT_ENTRY_REPLACE;
 				err = call_fib_entry_notifiers(net, fib_event,
 							       key, plen,
 							       new_fa, extack);
 				if (err)
 					goto out_free_new_fa;
 			}
-			err = call_fib_entry_notifiers(net,
-						       FIB_EVENT_ENTRY_REPLACE,
-						       key, plen, new_fa,
-						       extack);
-			if (err)
-				goto out_free_new_fa;
 
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				  tb->tb_id, &cfg->fc_nlinfo, nlflags);
@@ -1276,12 +1269,10 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		if (fa_match)
 			goto out;
 
-		if (cfg->fc_nlflags & NLM_F_APPEND) {
-			event = FIB_EVENT_ENTRY_APPEND;
+		if (cfg->fc_nlflags & NLM_F_APPEND)
 			nlflags |= NLM_F_APPEND;
-		} else {
+		else
 			fa = fa_first;
-		}
 	}
 	err = -ENOENT;
 	if (!(cfg->fc_nlflags & NLM_F_CREATE))
@@ -1315,15 +1306,12 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	    new_fa) {
 		enum fib_event_type fib_event;
 
-		fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+		fib_event = FIB_EVENT_ENTRY_REPLACE;
 		err = call_fib_entry_notifiers(net, fib_event, key, plen,
 					       new_fa, extack);
 		if (err)
 			goto out_remove_new_fa;
 	}
-	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
-	if (err)
-		goto out_remove_new_fa;
 
 	if (!plen)
 		tb->tb_num_default++;
@@ -1606,10 +1594,10 @@ static void fib_notify_alias_delete(struct net *net, u32 key,
 	fa_next = hlist_entry_safe(fa_to_delete->fa_list.next,
 				   struct fib_alias, fa_list);
 	if (fa_next && fa_next->fa_slen == slen && fa_next->tb_id == tb_id) {
-		fib_event = FIB_EVENT_ENTRY_REPLACE_TMP;
+		fib_event = FIB_EVENT_ENTRY_REPLACE;
 		fa_to_notify = fa_next;
 	} else {
-		fib_event = FIB_EVENT_ENTRY_DEL_TMP;
+		fib_event = FIB_EVENT_ENTRY_DEL;
 		fa_to_notify = fa_to_delete;
 	}
 	call_fib_entry_notifiers(net, fib_event, key, KEYLENGTH - slen,
@@ -1670,8 +1658,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 		return -ESRCH;
 
 	fib_notify_alias_delete(net, key, &l->leaf, fa_to_delete, extack);
-	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
-				 fa_to_delete, extack);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
@@ -1997,10 +1983,6 @@ int fib_table_flush(struct net *net, struct fib_table *tb, bool flush_all)
 
 			fib_notify_alias_delete(net, n->key, &n->leaf, fa,
 						NULL);
-			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
-						 n->key,
-						 KEYLENGTH - fa->fa_slen, fa,
-						 NULL);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
@@ -2115,16 +2097,11 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
 			continue;
 
 		last_slen = fa->fa_slen;
-		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_REPLACE_TMP,
+		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_REPLACE,
 					      l->key, KEYLENGTH - fa->fa_slen,
 					      fa, extack);
 		if (err)
 			return err;
-		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_ADD, l->key,
-					      KEYLENGTH - fa->fa_slen,
-					      fa, extack);
-		if (err)
-			return err;
 	}
 	return 0;
 }
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 10/15] ipv4: Replace route in list before notifying
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (8 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 09/15] ipv4: Remove old route notifications and convert listeners Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:40 ` [RFC PATCH net-next 11/15] ipv4: Encapsulate function arguments in a struct Ido Schimmel
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Subsequent patches will add an 'in_hw' flag to routes which will signal
if the route is present in hardware or not.

After programming the route to the hardware, drivers will have to ask
the IPv4 code to set the flag by passing the route's key.

In the case of route replace, the new route is notified before it is
actually inserted into the FIB alias list. This can prevent simple
drivers (e.g., netdevsim) that program the route to the hardware in the
same context it is notified in from being able to set the flag.

Solve this by first inserting the new route to the list and rollback the
operation in case the route was vetoed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_trie.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index cbb41eebb43b..9ea9610eebfd 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1237,23 +1237,26 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
+			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
+
 			if (fib_find_first_alias(&l->leaf, fa->fa_slen,
-						 tb->tb_id) == fa) {
+						 tb->tb_id) == new_fa) {
 				enum fib_event_type fib_event;
 
 				fib_event = FIB_EVENT_ENTRY_REPLACE;
 				err = call_fib_entry_notifiers(net, fib_event,
 							       key, plen,
 							       new_fa, extack);
-				if (err)
+				if (err) {
+					hlist_replace_rcu(&new_fa->fa_list,
+							  &fa->fa_list);
 					goto out_free_new_fa;
+				}
 			}
 
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				  tb->tb_id, &cfg->fc_nlinfo, nlflags);
 
-			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
-
 			alias_free_mem_rcu(fa);
 
 			fib_release_info(fi_drop);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 11/15] ipv4: Encapsulate function arguments in a struct
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (9 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 10/15] ipv4: Replace route in list before notifying Ido Schimmel
@ 2019-10-02  8:40 ` Ido Schimmel
  2019-10-02  8:41 ` [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes Ido Schimmel
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:40 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).

The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 net/ipv4/fib_lookup.h    | 14 +++++++++++---
 net/ipv4/fib_semantics.c | 26 ++++++++++++++++----------
 net/ipv4/fib_trie.c      | 14 +++++++++-----
 net/ipv4/route.c         | 12 +++++++++---
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index a68b5e21ec51..b34594a9965f 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -21,6 +21,15 @@ struct fib_alias {
 
 #define FA_S_ACCESSED	0x01
 
+struct fib_rt_info {
+	struct fib_info		*fi;
+	u32			tb_id;
+	__be32			dst;
+	int			dst_len;
+	u8			tos;
+	u8			type;
+};
+
 /* Dont write on fa_state unless needed, to keep it shared on all cpus */
 static inline void fib_alias_accessed(struct fib_alias *fa)
 {
@@ -35,9 +44,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 int fib_nh_match(struct fib_config *cfg, struct fib_info *fi,
 		 struct netlink_ext_ack *extack);
 bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi);
-int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u32 tb_id,
-		  u8 type, __be32 dst, int dst_len, u8 tos, struct fib_info *fi,
-		  unsigned int);
+int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
+		  struct fib_rt_info *fri, unsigned int);
 void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len,
 	       u32 tb_id, const struct nl_info *info, unsigned int nlm_flags);
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 0913a090b2bf..3c9d47804d93 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -504,6 +504,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
 	       int dst_len, u32 tb_id, const struct nl_info *info,
 	       unsigned int nlm_flags)
 {
+	struct fib_rt_info fri;
 	struct sk_buff *skb;
 	u32 seq = info->nlh ? info->nlh->nlmsg_seq : 0;
 	int err = -ENOBUFS;
@@ -512,9 +513,13 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
 	if (!skb)
 		goto errout;
 
-	err = fib_dump_info(skb, info->portid, seq, event, tb_id,
-			    fa->fa_type, key, dst_len,
-			    fa->fa_tos, fa->fa_info, nlm_flags);
+	fri.fi = fa->fa_info;
+	fri.tb_id = tb_id;
+	fri.dst = key;
+	fri.dst_len = dst_len;
+	fri.tos = fa->fa_tos;
+	fri.type = fa->fa_type;
+	err = fib_dump_info(skb, info->portid, seq, event, &fri, nlm_flags);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in fib_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
@@ -1725,10 +1730,11 @@ static int fib_add_multipath(struct sk_buff *skb, struct fib_info *fi)
 #endif
 
 int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
-		  u32 tb_id, u8 type, __be32 dst, int dst_len, u8 tos,
-		  struct fib_info *fi, unsigned int flags)
+		  struct fib_rt_info *fri, unsigned int flags)
 {
-	unsigned int nhs = fib_info_num_path(fi);
+	unsigned int nhs = fib_info_num_path(fri->fi);
+	struct fib_info *fi = fri->fi;
+	u32 tb_id = fri->tb_id;
 	struct nlmsghdr *nlh;
 	struct rtmsg *rtm;
 
@@ -1738,22 +1744,22 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
 
 	rtm = nlmsg_data(nlh);
 	rtm->rtm_family = AF_INET;
-	rtm->rtm_dst_len = dst_len;
+	rtm->rtm_dst_len = fri->dst_len;
 	rtm->rtm_src_len = 0;
-	rtm->rtm_tos = tos;
+	rtm->rtm_tos = fri->tos;
 	if (tb_id < 256)
 		rtm->rtm_table = tb_id;
 	else
 		rtm->rtm_table = RT_TABLE_COMPAT;
 	if (nla_put_u32(skb, RTA_TABLE, tb_id))
 		goto nla_put_failure;
-	rtm->rtm_type = type;
+	rtm->rtm_type = fri->type;
 	rtm->rtm_flags = fi->fib_flags;
 	rtm->rtm_scope = fi->fib_scope;
 	rtm->rtm_protocol = fi->fib_protocol;
 
 	if (rtm->rtm_dst_len &&
-	    nla_put_in_addr(skb, RTA_DST, dst))
+	    nla_put_in_addr(skb, RTA_DST, fri->dst))
 		goto nla_put_failure;
 	if (fi->fib_priority &&
 	    nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority))
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 9ea9610eebfd..646542de83ca 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2210,14 +2210,18 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
 
 		if (filter->dump_routes) {
 			if (!s_fa) {
+				struct fib_rt_info fri;
+
+				fri.fi = fi;
+				fri.tb_id = tb->tb_id;
+				fri.dst = xkey;
+				fri.dst_len = KEYLENGTH - fa->fa_slen;
+				fri.tos = fa->fa_tos;
+				fri.type = fa->fa_type;
 				err = fib_dump_info(skb,
 						    NETLINK_CB(cb->skb).portid,
 						    cb->nlh->nlmsg_seq,
-						    RTM_NEWROUTE,
-						    tb->tb_id, fa->fa_type,
-						    xkey,
-						    KEYLENGTH - fa->fa_slen,
-						    fa->fa_tos, fi, flags);
+						    RTM_NEWROUTE, &fri, flags);
 				if (err < 0)
 					goto stop;
 			}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7dcce724c78b..15674e446d4b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3179,16 +3179,22 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	skb_reset_mac_header(skb);
 
 	if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
+		struct fib_rt_info fri;
+
 		if (!res.fi) {
 			err = fib_props[res.type].error;
 			if (!err)
 				err = -EHOSTUNREACH;
 			goto errout_rcu;
 		}
+		fri.fi = res.fi;
+		fri.tb_id = table_id;
+		fri.dst = res.prefix;
+		fri.dst_len = res.prefixlen;
+		fri.tos = fl4.flowi4_tos;
+		fri.type = rt->rt_type;
 		err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
-				    nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
-				    rt->rt_type, res.prefix, res.prefixlen,
-				    fl4.flowi4_tos, res.fi, 0);
+				    nlh->nlmsg_seq, RTM_NEWROUTE, &fri, 0);
 	} else {
 		err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb,
 				   NETLINK_CB(in_skb).portid,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (10 preceding siblings ...)
  2019-10-02  8:40 ` [RFC PATCH net-next 11/15] ipv4: Encapsulate function arguments in a struct Ido Schimmel
@ 2019-10-02  8:41 ` Ido Schimmel
  2019-10-02 15:58   ` Roopa Prabhu
  2019-10-02  8:41 ` [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware" Ido Schimmel
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "in hardware" indication to IPv4 routes, so that
users will have better visibility into the offload process. In the
future IPv6 will be extended with this indication as well.

'struct fib_alias' is extended with a new field that indicates if
the route resides in hardware or not. Note that the new field is added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
with the route's key in order to set / clear the "in hardware
indication".

The new indication is dumped to user space via a new flag (i.e.,
'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         in_hw:1;              /*    34: 0  1 */
        u8                         unused:7;             /*    34: 1  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 11 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 include/net/ip_fib.h           |  5 +++
 include/uapi/linux/rtnetlink.h |  1 +
 net/ipv4/fib_lookup.h          |  4 ++
 net/ipv4/fib_semantics.c       |  4 ++
 net/ipv4/fib_trie.c            | 71 ++++++++++++++++++++++++++++++++++
 5 files changed, 85 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 52b2406a5dfc..019a138a79f4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -454,6 +454,11 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
 void fib_nh_common_release(struct fib_nh_common *nhc);
 
 /* Exported by fib_trie.c */
+void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
+			 const struct fib_info *fi, u8 tos, u8 type, u32 tb_id);
+void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
+			   const struct fib_info *fi, u8 tos, u8 type,
+			   u32 tb_id);
 void fib_trie_init(void);
 struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 1418a8362bb7..e5a104f5ce35 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -309,6 +309,7 @@ enum rt_scope_t {
 #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
 #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
 #define RTM_F_FIB_MATCH	        0x2000	/* return full fib lookup match */
+#define RTM_F_IN_HW		0x4000	/* route is in hardware */
 
 /* Reserved table identifiers */
 
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index b34594a9965f..65a69a863499 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -16,6 +16,8 @@ struct fib_alias {
 	u8			fa_slen;
 	u32			tb_id;
 	s16			fa_default;
+	u8			in_hw:1,
+				unused:7;
 	struct rcu_head		rcu;
 };
 
@@ -28,6 +30,8 @@ struct fib_rt_info {
 	int			dst_len;
 	u8			tos;
 	u8			type;
+	u8			in_hw:1,
+				unused:7;
 };
 
 /* Dont write on fa_state unless needed, to keep it shared on all cpus */
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3c9d47804d93..94f201d44844 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -519,6 +519,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
 	fri.dst_len = dst_len;
 	fri.tos = fa->fa_tos;
 	fri.type = fa->fa_type;
+	fri.in_hw = fa->in_hw;
 	err = fib_dump_info(skb, info->portid, seq, event, &fri, nlm_flags);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in fib_nlmsg_size() */
@@ -1801,6 +1802,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
 			goto nla_put_failure;
 	}
 
+	if (fri->in_hw)
+		rtm->rtm_flags |= RTM_F_IN_HW;
+
 	nlmsg_end(skb, nlh);
 	return 0;
 
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 646542de83ca..e3486bde6c5a 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1028,6 +1028,74 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 	return NULL;
 }
 
+static struct fib_alias *
+fib_find_matching_alias(struct net *net, u32 dst, int dst_len,
+			const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
+{
+	u8 slen = KEYLENGTH - dst_len;
+	struct key_vector *l, *tp;
+	struct fib_table *tb;
+	struct fib_alias *fa;
+	struct trie *t;
+
+	tb = fib_get_table(net, tb_id);
+	if (!tb)
+		return NULL;
+
+	t = (struct trie *)tb->tb_data;
+	l = fib_find_node(t, &tp, dst);
+	if (!l)
+		return NULL;
+
+	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
+		if (fa->fa_slen == slen && fa->tb_id == tb_id &&
+		    fa->fa_tos == tos && fa->fa_info == fi &&
+		    fa->fa_type == type)
+			return fa;
+	}
+
+	return NULL;
+}
+
+void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
+			 const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
+{
+	struct fib_alias *fa_match;
+
+	rcu_read_lock();
+
+	fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
+					   tb_id);
+	if (!fa_match)
+		goto out;
+
+	fa_match->in_hw = 1;
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(fib_alias_in_hw_set);
+
+void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
+			   const struct fib_info *fi, u8 tos, u8 type,
+			   u32 tb_id)
+{
+	struct fib_alias *fa_match;
+
+	rcu_read_lock();
+
+	fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
+					   tb_id);
+	if (!fa_match)
+		goto out;
+
+	fa_match->in_hw = 0;
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(fib_alias_in_hw_clear);
+
 static void trie_rebalance(struct trie *t, struct key_vector *tn)
 {
 	while (!IS_TRIE(tn))
@@ -1236,6 +1304,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->fa_slen = fa->fa_slen;
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
+			new_fa->in_hw = 0;
 
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 
@@ -1294,6 +1363,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	new_fa->fa_slen = slen;
 	new_fa->tb_id = tb->tb_id;
 	new_fa->fa_default = -1;
+	new_fa->in_hw = 0;
 
 	/* Insert new entry to the list. */
 	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
@@ -2218,6 +2288,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
 				fri.dst_len = KEYLENGTH - fa->fa_slen;
 				fri.tos = fa->fa_tos;
 				fri.type = fa->fa_type;
+				fri.in_hw = fa->in_hw;
 				err = fib_dump_info(skb,
 						    NETLINK_CB(cb->skb).portid,
 						    cb->nlh->nlmsg_seq,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware"
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (11 preceding siblings ...)
  2019-10-02  8:41 ` [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes Ido Schimmel
@ 2019-10-02  8:41 ` Ido Schimmel
  2019-10-02 18:27   ` Jiri Pirko
  2019-10-02  8:41 ` [RFC PATCH net-next 14/15] netdevsim: fib: " Ido Schimmel
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Make use of the recently introduced APIs and mark notified routes as "in
hardware" after they were programmed to the device's LPM tree.

Similarly, when a route is replaced by an higher priority one, clear the
"in hardware" indication from it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 5a4e61f1feec..26ab8ae482ec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -4769,7 +4769,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
 					struct mlxsw_sp_fib4_entry *fib4_entry)
 {
 	struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
+	struct net *net = mlxsw_sp_net(mlxsw_sp);
+	u32 *addr = (u32 *) fib_node->key.addr;
 	struct mlxsw_sp_fib4_entry *replaced;
+	struct fib_info *fi;
 
 	if (list_is_singular(&fib_node->entry_list))
 		return;
@@ -4777,6 +4780,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
 	/* We inserted the new entry before replaced one */
 	replaced = list_next_entry(fib4_entry, common.list);
 
+	fi = mlxsw_sp_nexthop4_group_fi(replaced->common.nh_group);
+	fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
+			      replaced->tos, replaced->type, replaced->tb_id);
+
 	mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, replaced);
 	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, replaced);
 	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
@@ -4786,6 +4793,7 @@ static int
 mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
 			     const struct fib_entry_notifier_info *fen_info)
 {
+	struct net *net = mlxsw_sp_net(mlxsw_sp);
 	struct mlxsw_sp_fib4_entry *fib4_entry;
 	struct mlxsw_sp_fib_node *fib_node;
 	int err;
@@ -4815,6 +4823,10 @@ mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
 		goto err_fib4_node_entry_link;
 	}
 
+	fib_alias_in_hw_set(net, fen_info->dst, fen_info->dst_len,
+			    fen_info->fi, fen_info->tos, fen_info->type,
+			    fen_info->tb_id);
+
 	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry);
 
 	return 0;
@@ -5731,11 +5743,18 @@ static void mlxsw_sp_fib4_node_flush(struct mlxsw_sp *mlxsw_sp,
 				     struct mlxsw_sp_fib_node *fib_node)
 {
 	struct mlxsw_sp_fib4_entry *fib4_entry, *tmp;
+	struct net *net = mlxsw_sp_net(mlxsw_sp);
+	u32 *addr = (u32 *) fib_node->key.addr;
 
 	list_for_each_entry_safe(fib4_entry, tmp, &fib_node->entry_list,
 				 common.list) {
 		bool do_break = &tmp->common.list == &fib_node->entry_list;
+		struct fib_info *fi;
 
+		fi = mlxsw_sp_nexthop4_group_fi(fib4_entry->common.nh_group);
+		fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
+				      fib4_entry->tos, fib4_entry->type,
+				      fib4_entry->tb_id);
 		mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, fib4_entry);
 		mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib4_entry);
 		mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 14/15] netdevsim: fib: Mark routes as "in hardware"
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (12 preceding siblings ...)
  2019-10-02  8:41 ` [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware" Ido Schimmel
@ 2019-10-02  8:41 ` Ido Schimmel
  2019-10-02  8:41 ` [RFC PATCH net-next 15/15] selftests: netdevsim: Add test for route offload API Ido Schimmel
  2019-10-02 18:17 ` [RFC PATCH net-next 00/15] Simplify IPv4 " Jiri Pirko
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Implement dummy IPv4 FIB "offload" in the driver by storing currently
"offloaded" routes in a hash table. Each route in the hash table is
marked as "in hardware". The indication is cleared when the route is
replaced or when the netdevsim instance is deleted.

This will later allow us to test the route offload API on top of
netdevsim.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/netdevsim/fib.c | 259 +++++++++++++++++++++++++++++++++++-
 1 file changed, 256 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
index 4e02a4231fcb..ab08d6e73356 100644
--- a/drivers/net/netdevsim/fib.c
+++ b/drivers/net/netdevsim/fib.c
@@ -14,6 +14,9 @@
  * THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  */
 
+#include <linux/in6.h>
+#include <linux/kernel.h>
+#include <linux/rhashtable.h>
 #include <net/fib_notifier.h>
 #include <net/ip_fib.h>
 #include <net/ip6_fib.h>
@@ -36,6 +39,34 @@ struct nsim_fib_data {
 	struct notifier_block fib_nb;
 	struct nsim_per_fib_data ipv4;
 	struct nsim_per_fib_data ipv6;
+	struct rhashtable fib_rt_ht;
+	struct devlink *devlink;
+};
+
+struct nsim_fib_rt_key {
+	unsigned char addr[sizeof(struct in6_addr)];
+	unsigned char prefix_len;
+	int family;
+	u32 tb_id;
+};
+
+struct nsim_fib_rt {
+	struct nsim_fib_rt_key key;
+	struct rhash_head ht_node;
+};
+
+struct nsim_fib4_rt {
+	struct nsim_fib_rt common;
+	struct fib_info *fi;
+	u8 type;
+	u8 tos;
+};
+
+static const struct rhashtable_params nsim_fib_rt_ht_params = {
+	.key_offset = offsetof(struct nsim_fib_rt, key),
+	.head_offset = offsetof(struct nsim_fib_rt, ht_node),
+	.key_len = sizeof(struct nsim_fib_rt_key),
+	.automatic_shrinking = true,
 };
 
 u64 nsim_fib_get_val(struct nsim_fib_data *fib_data,
@@ -144,6 +175,191 @@ static int nsim_fib_account(struct nsim_fib_entry *entry, bool add,
 	return err;
 }
 
+static void nsim_fib_rt_init(struct nsim_fib_rt *fib_rt, const void *addr,
+			     size_t addr_len, unsigned int prefix_len,
+			     int family, u32 tb_id)
+{
+	memcpy(fib_rt->key.addr, addr, addr_len);
+	fib_rt->key.prefix_len = prefix_len;
+	fib_rt->key.family = family;
+	fib_rt->key.tb_id = tb_id;
+}
+
+static struct nsim_fib_rt *nsim_fib_rt_lookup(struct rhashtable *fib_rt_ht,
+					      const void *addr, size_t addr_len,
+					      unsigned int prefix_len,
+					      int family, u32 tb_id)
+{
+	struct nsim_fib_rt_key key;
+
+	memset(&key, 0, sizeof(key));
+	memcpy(key.addr, addr, addr_len);
+	key.prefix_len = prefix_len;
+	key.family = family;
+	key.tb_id = tb_id;
+
+	return rhashtable_lookup_fast(fib_rt_ht, &key, nsim_fib_rt_ht_params);
+}
+
+static struct nsim_fib4_rt *
+nsim_fib4_rt_create(struct fib_entry_notifier_info *fen_info)
+{
+	struct nsim_fib4_rt *fib4_rt;
+
+	fib4_rt = kzalloc(sizeof(*fib4_rt), GFP_ATOMIC);
+	if (!fib4_rt)
+		return NULL;
+
+	nsim_fib_rt_init(&fib4_rt->common, &fen_info->dst, sizeof(u32),
+			 fen_info->dst_len, AF_INET, fen_info->tb_id);
+
+	fib4_rt->fi = fen_info->fi;
+	fib_info_hold(fib4_rt->fi);
+	fib4_rt->tos = fen_info->tos;
+	fib4_rt->type = fen_info->type;
+
+	return fib4_rt;
+}
+
+static void nsim_fib4_rt_destroy(struct nsim_fib4_rt *fib4_rt)
+{
+	fib_info_put(fib4_rt->fi);
+	kfree(fib4_rt);
+}
+
+static struct nsim_fib4_rt *
+nsim_fib4_rt_lookup(struct rhashtable *fib_rt_ht,
+		    const struct fib_entry_notifier_info *fen_info)
+{
+	struct nsim_fib_rt *fib_rt;
+
+	fib_rt = nsim_fib_rt_lookup(fib_rt_ht, &fen_info->dst, sizeof(u32),
+				    fen_info->dst_len, AF_INET,
+				    fen_info->tb_id);
+	if (!fib_rt)
+		return NULL;
+
+	return container_of(fib_rt, struct nsim_fib4_rt, common);
+}
+
+static int nsim_fib4_rt_add(struct nsim_fib_data *data,
+			    struct nsim_fib4_rt *fib4_rt,
+			    struct netlink_ext_ack *extack)
+{
+	struct nsim_fib_rt *fib_rt = &fib4_rt->common;
+	struct net *net = devlink_net(data->devlink);
+	u32 *addr = (u32 *) fib_rt->key.addr;
+	int err;
+
+	err = nsim_fib_account(&data->ipv4.fib, true, extack);
+	if (err)
+		return err;
+
+	err = rhashtable_insert_fast(&data->fib_rt_ht,
+				     &fib4_rt->common.ht_node,
+				     nsim_fib_rt_ht_params);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to insert IPv4 route");
+		goto err_fib_dismiss;
+	}
+
+	fib_alias_in_hw_set(net, *addr, fib_rt->key.prefix_len, fib4_rt->fi,
+			    fib4_rt->tos, fib4_rt->type, fib_rt->key.tb_id);
+
+	return 0;
+
+err_fib_dismiss:
+	nsim_fib_account(&data->ipv4.fib, false, extack);
+	return err;
+}
+
+static int nsim_fib4_rt_replace(struct nsim_fib_data *data,
+				struct nsim_fib4_rt *fib4_rt,
+				struct nsim_fib4_rt *fib4_rt_old,
+				struct netlink_ext_ack *extack)
+{
+	struct nsim_fib_rt *fib_rt = &fib4_rt->common;
+	struct net *net = devlink_net(data->devlink);
+	u32 *addr = (u32 *) fib_rt->key.addr;
+	int err;
+
+	/* We are replacing a route, so no need to change the accounting. */
+	err = rhashtable_replace_fast(&data->fib_rt_ht,
+				      &fib4_rt_old->common.ht_node,
+				      &fib4_rt->common.ht_node,
+				      nsim_fib_rt_ht_params);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to replace IPv4 route");
+		return err;
+	}
+
+	fib_alias_in_hw_clear(net, *addr, fib_rt->key.prefix_len,
+			      fib4_rt_old->fi, fib4_rt_old->tos,
+			      fib4_rt_old->type, fib_rt->key.tb_id);
+	nsim_fib4_rt_destroy(fib4_rt_old);
+
+	fib_alias_in_hw_set(net, *addr, fib_rt->key.prefix_len,
+			    fib4_rt->fi, fib4_rt->tos, fib4_rt->type,
+			    fib_rt->key.tb_id);
+
+	return 0;
+}
+
+static int nsim_fib4_rt_insert(struct nsim_fib_data *data,
+			       struct fib_entry_notifier_info *fen_info)
+{
+	struct netlink_ext_ack *extack = fen_info->info.extack;
+	struct nsim_fib4_rt *fib4_rt, *fib4_rt_old;
+	int err;
+
+	fib4_rt = nsim_fib4_rt_create(fen_info);
+	if (!fib4_rt)
+		return -ENOMEM;
+
+	fib4_rt_old = nsim_fib4_rt_lookup(&data->fib_rt_ht, fen_info);
+	if (!fib4_rt_old)
+		err = nsim_fib4_rt_add(data, fib4_rt, extack);
+	else
+		err = nsim_fib4_rt_replace(data, fib4_rt, fib4_rt_old, extack);
+
+	if (err)
+		nsim_fib4_rt_destroy(fib4_rt);
+
+	return err;
+}
+
+static void nsim_fib4_rt_remove(struct nsim_fib_data *data,
+				const struct fib_entry_notifier_info *fen_info)
+{
+	struct netlink_ext_ack *extack = fen_info->info.extack;
+	struct nsim_fib4_rt *fib4_rt;
+
+	fib4_rt = nsim_fib4_rt_lookup(&data->fib_rt_ht, fen_info);
+	if (WARN_ON_ONCE(!fib4_rt))
+		return;
+
+	rhashtable_remove_fast(&data->fib_rt_ht, &fib4_rt->common.ht_node,
+			       nsim_fib_rt_ht_params);
+	nsim_fib_account(&data->ipv4.fib, false, extack);
+	nsim_fib4_rt_destroy(fib4_rt);
+}
+
+static int nsim_fib4_event(struct nsim_fib_data *data,
+			   struct fib_notifier_info *info, bool add)
+{
+	struct fib_entry_notifier_info *fen_info;
+	int err = 0;
+
+	fen_info = container_of(info, struct fib_entry_notifier_info, info);
+
+	if (add)
+		err = nsim_fib4_rt_insert(data, fen_info);
+	else
+		nsim_fib4_rt_remove(data, fen_info);
+
+	return err;
+}
+
 static int nsim_fib_event(struct nsim_fib_data *data,
 			  struct fib_notifier_info *info, bool add)
 {
@@ -152,7 +368,7 @@ static int nsim_fib_event(struct nsim_fib_data *data,
 
 	switch (info->family) {
 	case AF_INET:
-		err = nsim_fib_account(&data->ipv4.fib, add, extack);
+		err = nsim_fib4_event(data, info, add);
 		break;
 	case AF_INET6:
 		err = nsim_fib_account(&data->ipv6.fib, add, extack);
@@ -247,6 +463,33 @@ static void nsim_fib_set_max_all(struct nsim_fib_data *data,
 	}
 }
 
+static void nsim_fib4_rt_free(struct nsim_fib_rt *fib_rt,
+			      struct devlink *devlink)
+{
+	u32 *addr = (u32 *) fib_rt->key.addr;
+	struct nsim_fib4_rt *fib4_rt;
+
+	fib4_rt = container_of(fib_rt, struct nsim_fib4_rt, common);
+	fib_alias_in_hw_clear(devlink_net(devlink), *addr,
+			      fib_rt->key.prefix_len, fib4_rt->fi, fib4_rt->tos,
+			      fib4_rt->type, fib_rt->key.tb_id);
+	nsim_fib4_rt_destroy(fib4_rt);
+}
+
+static void nsim_fib_rt_free(void *ptr, void *arg)
+{
+	struct nsim_fib_rt *fib_rt = ptr;
+	struct devlink *devlink = arg;
+
+	switch (fib_rt->key.family) {
+	case AF_INET:
+		nsim_fib4_rt_free(fib_rt, devlink);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
 				      struct netlink_ext_ack *extack)
 {
@@ -256,6 +499,11 @@ struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return ERR_PTR(-ENOMEM);
+	data->devlink = devlink;
+
+	err = rhashtable_init(&data->fib_rt_ht, &nsim_fib_rt_ht_params);
+	if (err)
+		goto err_data_free;
 
 	nsim_fib_set_max_all(data, devlink);
 
@@ -264,7 +512,7 @@ struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
 				    nsim_fib_dump_inconsistent, extack);
 	if (err) {
 		pr_err("Failed to register fib notifier\n");
-		goto err_out;
+		goto err_rhashtable_destroy;
 	}
 
 	devlink_resource_occ_get_register(devlink,
@@ -285,7 +533,10 @@ struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
 					  data);
 	return data;
 
-err_out:
+err_rhashtable_destroy:
+	rhashtable_free_and_destroy(&data->fib_rt_ht, nsim_fib_rt_free,
+				    devlink);
+err_data_free:
 	kfree(data);
 	return ERR_PTR(err);
 }
@@ -301,5 +552,7 @@ void nsim_fib_destroy(struct devlink *devlink, struct nsim_fib_data *data)
 	devlink_resource_occ_get_unregister(devlink,
 					    NSIM_RESOURCE_IPV4_FIB);
 	unregister_fib_notifier(devlink_net(devlink), &data->fib_nb);
+	rhashtable_free_and_destroy(&data->fib_rt_ht, nsim_fib_rt_free,
+				    devlink);
 	kfree(data);
 }
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC PATCH net-next 15/15] selftests: netdevsim: Add test for route offload API
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (13 preceding siblings ...)
  2019-10-02  8:41 ` [RFC PATCH net-next 14/15] netdevsim: fib: " Ido Schimmel
@ 2019-10-02  8:41 ` Ido Schimmel
  2019-10-02 18:17 ` [RFC PATCH net-next 00/15] Simplify IPv4 " Jiri Pirko
  15 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-02  8:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Test various aspects of the route offload API on top of the netdevsim
implementation. Both good and error flows are tested.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/netdevsim/fib_notifier.sh     | 411 ++++++++++++++++++
 1 file changed, 411 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh

diff --git a/tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh b/tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh
new file mode 100755
index 000000000000..27b0752df67f
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh
@@ -0,0 +1,411 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test is for checking the FIB notification API. It makes use of netdevsim
+# which registers a listener to the FIB notification chain.
+
+lib_dir=$(dirname $0)/../../../net/forwarding
+
+ALL_TESTS="
+	ipv4_identical_routes
+	ipv4_tos
+	ipv4_metric
+	ipv4_replace
+	ipv4_delete
+	ipv4_plen
+	ipv4_replay
+	ipv4_error_path
+	ipv4_flush
+"
+NETDEVSIM_PATH=/sys/bus/netdevsim/
+DEV_ADDR=1337
+DEV=netdevsim${DEV_ADDR}
+DEVLINK_DEV=netdevsim/${DEV}
+NUM_NETIFS=0
+source $lib_dir/lib.sh
+source $lib_dir/devlink_lib.sh
+
+route_in_hw_check()
+{
+	local route=$1; shift
+
+	ip -n testns1 route show $route | grep -q "in_hw"
+}
+
+ipv4_identical_routes()
+{
+	local i
+
+	RET=0
+
+	for i in $(seq 1 3); do
+		ip -n testns1 link add name dummy$i type dummy
+		ip -n testns1 link set dev dummy$i up
+	done
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 0 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 0 metric 1024"
+	check_err $? "Route not in hardware when should"
+
+	ip -n testns1 route append 192.0.2.0/24 dev dummy2 tos 0 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy2 tos 0 metric 1024"
+	check_fail $? "Appended route in hardware when should not"
+
+	ip -n testns1 route prepend 192.0.2.0/24 dev dummy3 tos 0 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy3 tos 0 metric 1024"
+	check_err $? "Prepended route not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 0 metric 1024"
+	check_fail $? "Route was not replaced in hardware by prepended one"
+
+	log_test "IPv4 identical routes"
+
+	for i in $(seq 1 3); do
+		ip -n testns1 link del dev dummy$i
+	done
+}
+
+ipv4_tos()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 0 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 0 metric 1024"
+	check_err $? "Route not in hardware when should"
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 2 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 2 metric 1024"
+	check_err $? "Highest TOS route not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 0 metric 1024"
+	check_fail $? "Lowest TOS route still in hardware when should not"
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 1 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 1 metric 1024"
+	check_fail $? "Middle TOS route in hardware when should not"
+
+	log_test "IPv4 routes with TOS"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_metric()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1024"
+	check_err $? "Route not in hardware when should"
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1022
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1022"
+	check_err $? "Lowest metric route not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1024"
+	check_fail $? "Highest metric route still in hardware when should not"
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1023
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 1 metric 1023"
+	check_fail $? "Middle metric route in hardware when should not"
+
+	log_test "IPv4 routes with metric"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_replace()
+{
+	local i
+
+	RET=0
+
+	for i in $(seq 1 2); do
+		ip -n testns1 link add name dummy$i type dummy
+		ip -n testns1 link set dev dummy$i up
+	done
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1024"
+	check_err $? "Route not in hardware when should"
+
+	ip -n testns1 route replace 192.0.2.0/24 dev dummy2 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy2 metric 1024"
+	check_err $? "Replacement route not in hardware when should"
+
+	# Add a route with an higher metric and make sure that replacing it
+	# does not affect the lower metric one.
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1025
+	ip -n testns1 route replace 192.0.2.0/24 dev dummy2 metric 1025
+
+	route_in_hw_check "192.0.2.0/24 dev dummy2 metric 1024"
+	check_err $? "Lowest metric route not in hardware when should"
+	route_in_hw_check "192.0.2.0/24 dev dummy2 metric 1025"
+	check_fail $? "Highest metric route in hardware when should not"
+
+	log_test "IPv4 route replace"
+
+	for i in $(seq 1 2); do
+		ip -n testns1 link del dev dummy$i
+	done
+}
+
+ipv4_delete()
+{
+	local metric
+
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	# Insert multiple routes with the same prefix and length and varying
+	# metrics. Make sure that throughout delete operations the lowest
+	# metric route is the one in hardware.
+	for metric in $(seq 1024 1026); do
+		ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric $metric
+	done
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1024"
+	check_err $? "Route not in hardware when should"
+
+	ip -n testns1 route del 192.0.2.0/24 dev dummy1 metric 1024
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1025"
+	check_err $? "Lowest metric route not in hardware when should"
+
+	ip -n testns1 route del 192.0.2.0/24 dev dummy1 metric 1026
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1025"
+	check_err $? "Sole route not in hardware when should"
+
+	log_test "IPv4 route delete"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_plen()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	# Add two routes with the same key and different prefix length and
+	# make sure both are in hardware. It can be verfied that both are
+	# sharing the same leaf by checking the /proc/net/fib_trie
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1
+	ip -n testns1 route add 192.0.2.0/25 dev dummy1
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1"
+	check_err $? "/24 not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/25 dev dummy1"
+	check_err $? "/25 not in hardware when should"
+
+	log_test "IPv4 routes with different prefix length"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_replay_metric()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1024
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric 1025
+
+	devlink -N testns1 dev reload $DEVLINK_DEV
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1024"
+	check_err $? "Lowest metric route not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 metric 1025"
+	check_fail $? "Highest metric route in hardware when should not"
+
+	log_test "IPv4 routes replay - metric"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_replay_tos()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 0
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1 tos 1
+
+	devlink -N testns1 dev reload $DEVLINK_DEV
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 1"
+	check_err $? "Highest TOS route not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1 tos 0"
+	check_fail $? "Lowest TOS route in hardware when should not"
+
+	log_test "IPv4 routes replay - TOS"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_replay_plen()
+{
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	ip -n testns1 route add 192.0.2.0/24 dev dummy1
+	ip -n testns1 route add 192.0.2.0/25 dev dummy1
+
+	devlink -N testns1 dev reload $DEVLINK_DEV
+
+	route_in_hw_check "192.0.2.0/24 dev dummy1"
+	check_err $? "/24 not in hardware when should"
+
+	route_in_hw_check "192.0.2.0/25 dev dummy1"
+	check_err $? "/25 not in hardware when should"
+
+	log_test "IPv4 routes replay - prefix length"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_replay()
+{
+	# Install multiple routes with the same prefix and length and make sure
+	# that after reload only the routes that should be in hardware are
+	# replayed.
+	ipv4_replay_metric
+	ipv4_replay_tos
+	ipv4_replay_plen
+}
+
+ipv4_error_path_add()
+{
+	local lsb
+
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	devlink -N testns1 resource set $DEVLINK_DEV path IPv4/fib size 10
+	devlink -N testns1 dev reload $DEVLINK_DEV
+
+	for lsb in $(seq 1 20); do
+		ip -n testns1 route add 192.0.2.${lsb}/32 dev dummy1 \
+			&> /dev/null
+	done
+
+	log_test "IPv4 error path - add"
+
+	ip -n testns1 link del dev dummy1
+}
+
+ipv4_error_path_replay()
+{
+	local lsb
+
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	devlink -N testns1 resource set $DEVLINK_DEV path IPv4/fib size 100
+	devlink -N testns1 dev reload $DEVLINK_DEV
+
+	for lsb in $(seq 1 20); do
+		ip -n testns1 route add 192.0.2.${lsb}/32 dev dummy1
+	done
+
+	devlink -N testns1 resource set $DEVLINK_DEV path IPv4/fib size 10
+	devlink -N testns1 dev reload $DEVLINK_DEV &> /dev/null
+
+	log_test "IPv4 error path - replay"
+
+	ip -n testns1 link del dev dummy1
+
+	# Successfully reload after deleting all the routes.
+	devlink -N testns1 resource set $DEVLINK_DEV path IPv4/fib size 100
+	devlink -N testns1 dev reload $DEVLINK_DEV
+}
+
+ipv4_error_path()
+{
+	# Test the different error paths of the notifiers by limiting the size
+	# of the "IPv4/fib" resource.
+	ipv4_error_path_add
+	ipv4_error_path_replay
+}
+
+ipv4_flush()
+{
+	local metric
+
+	RET=0
+
+	ip -n testns1 link add name dummy1 type dummy
+	ip -n testns1 link set dev dummy1 up
+
+	# Exercise the routes flushing code paths by inserting various
+	# prefix routes on a netdev and then deleting it.
+	for metric in $(seq 1 20); do
+		ip -n testns1 route add 192.0.2.0/24 dev dummy1 metric $metric
+	done
+
+	ip -n testns1 link del dev dummy1
+
+	log_test "IPv4 routes flushing"
+}
+
+setup_prepare()
+{
+	local netdev
+
+	modprobe netdevsim &> /dev/null
+
+	echo "$DEV_ADDR 0" > ${NETDEVSIM_PATH}/new_device
+
+	if [ ! -d "${NETDEVSIM_PATH}/devices/${DEV}" ]; then
+		echo "Failed to create netdevsim device"
+		exit 1
+	fi
+
+	ip netns add testns1
+	if [ $? -ne 0 ]; then
+		echo "Failed to add netns \"testns1\""
+		exit 1
+	fi
+
+	devlink dev reload $DEVLINK_DEV netns testns1
+	if [ $? -ne 0 ]; then
+		echo "Failed to reload into netns \"testns1\""
+		exit 1
+	fi
+}
+
+cleanup()
+{
+	pre_cleanup
+	ip netns del testns1
+	echo "$DEV_ADDR" > ${NETDEVSIM_PATH}/del_device
+	modprobe -r netdevsim &> /dev/null
+}
+
+trap cleanup EXIT
+
+setup_prepare
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-02  8:41 ` [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes Ido Schimmel
@ 2019-10-02 15:58   ` Roopa Prabhu
  2019-10-02 18:21     ` Jiri Pirko
  2019-10-03 12:59     ` Ido Schimmel
  0 siblings, 2 replies; 39+ messages in thread
From: Roopa Prabhu @ 2019-10-02 15:58 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, David Miller, David Ahern, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On Wed, Oct 2, 2019 at 1:41 AM Ido Schimmel <idosch@idosch.org> wrote:
>
> From: Ido Schimmel <idosch@mellanox.com>
>
> When performing L3 offload, routes and nexthops are usually programmed
> into two different tables in the underlying device. Therefore, the fact
> that a nexthop resides in hardware does not necessarily mean that all
> the associated routes also reside in hardware and vice-versa.
>
> While the kernel can signal to user space the presence of a nexthop in
> hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
> for routes. In addition, the fact that a route resides in hardware does
> not necessarily mean that the traffic is offloaded. For example,
> unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
> packets to the CPU so that the kernel will be able to generate the
> appropriate ICMP error packet.
>
> This patch adds an "in hardware" indication to IPv4 routes, so that
> users will have better visibility into the offload process. In the
> future IPv6 will be extended with this indication as well.
>
> 'struct fib_alias' is extended with a new field that indicates if
> the route resides in hardware or not. Note that the new field is added
> in the 6 bytes hole and therefore the struct still fits in a single
> cache line [1].
>
> Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
> with the route's key in order to set / clear the "in hardware
> indication".
>
> The new indication is dumped to user space via a new flag (i.e.,
> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
>

nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
with the nexthop offload indication ?.
But this again does not seem to be similar to the other request flags
like: RTM_F_FIB_MATCH

(so far i think all the RTNH_F_* flags are used on routes too IIRC
(see iproute2: print_rt_flags)
RTNH_F_DEAD seems to fall in this category)


> [1]
> struct fib_alias {
>         struct hlist_node  fa_list;                      /*     0    16 */
>         struct fib_info *          fa_info;              /*    16     8 */
>         u8                         fa_tos;               /*    24     1 */
>         u8                         fa_type;              /*    25     1 */
>         u8                         fa_state;             /*    26     1 */
>         u8                         fa_slen;              /*    27     1 */
>         u32                        tb_id;                /*    28     4 */
>         s16                        fa_default;           /*    32     2 */
>         u8                         in_hw:1;              /*    34: 0  1 */
>         u8                         unused:7;             /*    34: 1  1 */
>
>         /* XXX 5 bytes hole, try to pack */
>
>         struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */
>
>         /* size: 56, cachelines: 1, members: 11 */
>         /* sum members: 50, holes: 1, sum holes: 5 */
>         /* sum bitfield members: 8 bits (1 bytes) */
>         /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
>         /* last cacheline: 56 bytes */
> } __attribute__((__aligned__(8)));
>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/net/ip_fib.h           |  5 +++
>  include/uapi/linux/rtnetlink.h |  1 +
>  net/ipv4/fib_lookup.h          |  4 ++
>  net/ipv4/fib_semantics.c       |  4 ++
>  net/ipv4/fib_trie.c            | 71 ++++++++++++++++++++++++++++++++++
>  5 files changed, 85 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 52b2406a5dfc..019a138a79f4 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -454,6 +454,11 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
>  void fib_nh_common_release(struct fib_nh_common *nhc);
>
>  /* Exported by fib_trie.c */
> +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id);
> +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> +                          const struct fib_info *fi, u8 tos, u8 type,
> +                          u32 tb_id);
>  void fib_trie_init(void);
>  struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 1418a8362bb7..e5a104f5ce35 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -309,6 +309,7 @@ enum rt_scope_t {
>  #define RTM_F_PREFIX           0x800   /* Prefix addresses             */
>  #define RTM_F_LOOKUP_TABLE     0x1000  /* set rtm_table to FIB lookup result */
>  #define RTM_F_FIB_MATCH                0x2000  /* return full fib lookup match */
> +#define RTM_F_IN_HW            0x4000  /* route is in hardware */
>
>  /* Reserved table identifiers */
>
> diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
> index b34594a9965f..65a69a863499 100644
> --- a/net/ipv4/fib_lookup.h
> +++ b/net/ipv4/fib_lookup.h
> @@ -16,6 +16,8 @@ struct fib_alias {
>         u8                      fa_slen;
>         u32                     tb_id;
>         s16                     fa_default;
> +       u8                      in_hw:1,
> +                               unused:7;
>         struct rcu_head         rcu;
>  };
>
> @@ -28,6 +30,8 @@ struct fib_rt_info {
>         int                     dst_len;
>         u8                      tos;
>         u8                      type;
> +       u8                      in_hw:1,
> +                               unused:7;
>  };
>
>  /* Dont write on fa_state unless needed, to keep it shared on all cpus */
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 3c9d47804d93..94f201d44844 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -519,6 +519,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
>         fri.dst_len = dst_len;
>         fri.tos = fa->fa_tos;
>         fri.type = fa->fa_type;
> +       fri.in_hw = fa->in_hw;
>         err = fib_dump_info(skb, info->portid, seq, event, &fri, nlm_flags);
>         if (err < 0) {
>                 /* -EMSGSIZE implies BUG in fib_nlmsg_size() */
> @@ -1801,6 +1802,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
>                         goto nla_put_failure;
>         }
>
> +       if (fri->in_hw)
> +               rtm->rtm_flags |= RTM_F_IN_HW;
> +
>         nlmsg_end(skb, nlh);
>         return 0;
>
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 646542de83ca..e3486bde6c5a 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1028,6 +1028,74 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
>         return NULL;
>  }
>
> +static struct fib_alias *
> +fib_find_matching_alias(struct net *net, u32 dst, int dst_len,
> +                       const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> +{
> +       u8 slen = KEYLENGTH - dst_len;
> +       struct key_vector *l, *tp;
> +       struct fib_table *tb;
> +       struct fib_alias *fa;
> +       struct trie *t;
> +
> +       tb = fib_get_table(net, tb_id);
> +       if (!tb)
> +               return NULL;
> +
> +       t = (struct trie *)tb->tb_data;
> +       l = fib_find_node(t, &tp, dst);
> +       if (!l)
> +               return NULL;
> +
> +       hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
> +               if (fa->fa_slen == slen && fa->tb_id == tb_id &&
> +                   fa->fa_tos == tos && fa->fa_info == fi &&
> +                   fa->fa_type == type)
> +                       return fa;
> +       }
> +
> +       return NULL;
> +}
> +
> +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> +{
> +       struct fib_alias *fa_match;
> +
> +       rcu_read_lock();
> +
> +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> +                                          tb_id);
> +       if (!fa_match)
> +               goto out;
> +
> +       fa_match->in_hw = 1;
> +
> +out:
> +       rcu_read_unlock();
> +}
> +EXPORT_SYMBOL_GPL(fib_alias_in_hw_set);
> +
> +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> +                          const struct fib_info *fi, u8 tos, u8 type,
> +                          u32 tb_id)
> +{
> +       struct fib_alias *fa_match;
> +
> +       rcu_read_lock();
> +
> +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> +                                          tb_id);
> +       if (!fa_match)
> +               goto out;
> +
> +       fa_match->in_hw = 0;
> +
> +out:
> +       rcu_read_unlock();
> +}
> +EXPORT_SYMBOL_GPL(fib_alias_in_hw_clear);
> +
>  static void trie_rebalance(struct trie *t, struct key_vector *tn)
>  {
>         while (!IS_TRIE(tn))
> @@ -1236,6 +1304,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
>                         new_fa->fa_slen = fa->fa_slen;
>                         new_fa->tb_id = tb->tb_id;
>                         new_fa->fa_default = -1;
> +                       new_fa->in_hw = 0;
>
>                         hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
>
> @@ -1294,6 +1363,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
>         new_fa->fa_slen = slen;
>         new_fa->tb_id = tb->tb_id;
>         new_fa->fa_default = -1;
> +       new_fa->in_hw = 0;
>
>         /* Insert new entry to the list. */
>         err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> @@ -2218,6 +2288,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
>                                 fri.dst_len = KEYLENGTH - fa->fa_slen;
>                                 fri.tos = fa->fa_tos;
>                                 fri.type = fa->fa_type;
> +                               fri.in_hw = fa->in_hw;
>                                 err = fib_dump_info(skb,
>                                                     NETLINK_CB(cb->skb).portid,
>                                                     cb->nlh->nlmsg_seq,
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners
  2019-10-02  8:40 ` [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners Ido Schimmel
@ 2019-10-02 17:44   ` Jiri Pirko
  2019-10-03 13:04     ` Ido Schimmel
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 17:44 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

Wed, Oct 02, 2019 at 10:40:55AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>When a new listener is registered to the FIB notification chain it
>receives a dump of all the available routes in the system. Instead, make
>sure to only replay the IPv4 routes that are actually used in the data
>path and are of any interest to the new listener.
>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>---
> net/ipv4/fib_trie.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
>diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>index dc4c4e2cb0b3..4937a3503f4f 100644
>--- a/net/ipv4/fib_trie.c
>+++ b/net/ipv4/fib_trie.c
>@@ -2096,6 +2096,7 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
> 			   struct netlink_ext_ack *extack)
> {
> 	struct fib_alias *fa;
>+	int last_slen = -1;
> 	int err;
> 
> 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
>@@ -2110,6 +2111,15 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
> 		if (tb->tb_id != fa->tb_id)
> 			continue;
> 
>+		if (fa->fa_slen == last_slen)
>+			continue;

Hmm, I wonder, don't you want to continue only for FIB_EVENT_ENTRY_REPLACE_TMP
and keep the notifier call for FIB_EVENT_ENTRY_ADD?


>+
>+		last_slen = fa->fa_slen;
>+		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_REPLACE_TMP,
>+					      l->key, KEYLENGTH - fa->fa_slen,
>+					      fa, extack);
>+		if (err)
>+			return err;
> 		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_ADD, l->key,
> 					      KEYLENGTH - fa->fa_slen,
> 					      fa, extack);
>-- 
>2.21.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications
  2019-10-02  8:40 ` [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications Ido Schimmel
@ 2019-10-02 17:52   ` Jiri Pirko
  2019-10-02 18:01     ` Jiri Pirko
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 17:52 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

Wed, Oct 02, 2019 at 10:40:56AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>With the new notifications mlxsw does not need to handle identical
>routes itself, as this is taken care of by the core IPv4 code.
>
>Instead, mlxsw only needs to take care of inserting and removing routes
>from the device.
>
>Convert mlxsw to use the new IPv4 route notifications and simplify the
>code.
>

[...]


>@@ -6246,9 +6147,10 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
> 		err = mlxsw_sp_router_fib_rule_event(event, info,
> 						     router->mlxsw_sp);
> 		return notifier_from_errno(err);
>-	case FIB_EVENT_ENTRY_ADD:
>+	case FIB_EVENT_ENTRY_ADD: /* fall through */
> 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
> 	case FIB_EVENT_ENTRY_APPEND:  /* fall through */

Why don't you skip the three above with just return of NOTIFY_DONE?


>+	case FIB_EVENT_ENTRY_REPLACE_TMP:
> 		if (router->aborted) {
> 			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
> 			return notifier_from_errno(-EINVAL);
>-- 
>2.21.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications
  2019-10-02 17:52   ` Jiri Pirko
@ 2019-10-02 18:01     ` Jiri Pirko
  2019-10-03 15:10       ` Ido Schimmel
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 18:01 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

Wed, Oct 02, 2019 at 07:52:30PM CEST, jiri@resnulli.us wrote:
>Wed, Oct 02, 2019 at 10:40:56AM CEST, idosch@idosch.org wrote:
>>From: Ido Schimmel <idosch@mellanox.com>
>>
>>With the new notifications mlxsw does not need to handle identical
>>routes itself, as this is taken care of by the core IPv4 code.
>>
>>Instead, mlxsw only needs to take care of inserting and removing routes
>>from the device.
>>
>>Convert mlxsw to use the new IPv4 route notifications and simplify the
>>code.
>>
>
>[...]
>
>
>>@@ -6246,9 +6147,10 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
>> 		err = mlxsw_sp_router_fib_rule_event(event, info,
>> 						     router->mlxsw_sp);
>> 		return notifier_from_errno(err);
>>-	case FIB_EVENT_ENTRY_ADD:
>>+	case FIB_EVENT_ENTRY_ADD: /* fall through */
>> 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
>> 	case FIB_EVENT_ENTRY_APPEND:  /* fall through */
>
>Why don't you skip the three above with just return of NOTIFY_DONE?

if (info->family == AF_INET)
	return NOTIFY_DONE;

>
>
>>+	case FIB_EVENT_ENTRY_REPLACE_TMP:
>> 		if (router->aborted) {
>> 			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
>> 			return notifier_from_errno(-EINVAL);
>>-- 
>>2.21.0
>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 00/15] Simplify IPv4 route offload API
  2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
                   ` (14 preceding siblings ...)
  2019-10-02  8:41 ` [RFC PATCH net-next 15/15] selftests: netdevsim: Add test for route offload API Ido Schimmel
@ 2019-10-02 18:17 ` Jiri Pirko
  2019-10-03  5:18   ` Ido Schimmel
  15 siblings, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 18:17 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

Wed, Oct 02, 2019 at 10:40:48AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>Today, whenever an IPv4 route is added or deleted a notification is sent
>in the FIB notification chain and it is up to offload drivers to decide
>if the route should be programmed to the hardware or not. This is not an
>easy task as in hardware routes are keyed by {prefix, prefix length,
>table id}, whereas the kernel can store multiple such routes that only
>differ in metric / TOS / nexthop info.
>
>This series makes sure that only routes that are actually used in the
>data path are notified to offload drivers. This greatly simplifies the
>work these drivers need to do, as they are now only concerned with
>programming the hardware and do not need to replicate the IPv4 route
>insertion logic and store multiple identical routes.
>
>The route that is notified is the first FIB alias in the FIB node with
>the given {prefix, prefix length, table ID}. In case the route is
>deleted and there is another route with the same key, a replace
>notification is emitted. Otherwise, a delete notification is emitted.
>
>The above means that in the case of multiple routes with the same key,
>but different TOS, only the route with the highest TOS is notified.
>While the kernel can route a packet based on its TOS, this is not
>supported by any hardware devices I'm familiar with. Moreover, this is
>not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
>drivers should therefore use the presence of a non-zero TOS as an
>indication to trap packets matching the route and let the kernel route
>them instead. mlxsw has been doing it for the past two years.
>
>The series also adds an "in hardware" indication to routes, in addition

I think this might be a separate patchset. I mean patch "ipv4: Replace
route in list before notifying" and above.


>to the offload indication we already have on nexthops today. Besides
>being long overdue, the reason this is done in this series is that it
>makes it possible to easily test the new FIB notification API over
>netdevsim.
>
>To ensure there is no degradation in route insertion rates, I used
>Vincent Bernat's script [1][2] from [3] to inject 500,000 routes from an
>MRT dump from a router with a full view. On a system with Intel(R)
>Xeon(R) CPU D-1527 @ 2.20GHz I measured 8.184 seconds, averaged over 10
>runs and saw no degradation compared to net-next from today.
>
>Patchset overview:
>Patches #1-#7 introduce the new FIB notifications
>Patches #8-#9 convert listeners to make use of the new notifications
>Patches #10-#14 add "in hardware" indication for IPv4 routes, including
>a dummy FIB offload implementation in netdevsim
>Patch #15 adds a selftest for the new FIB notifications API over
>netdevsim
>
>The series is based on Jiri's "devlink: allow devlink instances to
>change network namespace" series [4]. The patches can be found here [5]
>and patched iproute2 with the "in hardware" indication can be found here
>[6].
>
>IPv6 is next on my TODO list.
>
>[1] https://github.com/vincentbernat/network-lab/blob/master/common/helpers/lab-routes-ipvX/insert-from-bgp
>[2] https://gist.github.com/idosch/2eb96efe50eb5234d205e964f0814859
>[3] https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-linux
>[4] https://patchwork.ozlabs.org/cover/1162295/
>[5] https://github.com/idosch/linux/tree/fib-notifier
>[6] https://github.com/idosch/iproute2/tree/fib-notifier
>
>Ido Schimmel (15):
>  ipv4: Add temporary events to the FIB notification chain
>  ipv4: Notify route after insertion to the routing table
>  ipv4: Notify route if replacing currently offloaded one
>  ipv4: Notify newly added route if should be offloaded
>  ipv4: Handle route deletion notification
>  ipv4: Handle route deletion notification during flush
>  ipv4: Only Replay routes of interest to new listeners
>  mlxsw: spectrum_router: Start using new IPv4 route notifications
>  ipv4: Remove old route notifications and convert listeners
>  ipv4: Replace route in list before notifying
>  ipv4: Encapsulate function arguments in a struct
>  ipv4: Add "in hardware" indication to routes
>  mlxsw: spectrum_router: Mark routes as "in hardware"
>  netdevsim: fib: Mark routes as "in hardware"
>  selftests: netdevsim: Add test for route offload API
>
> .../net/ethernet/mellanox/mlx5/core/lag_mp.c  |   4 -
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 152 ++-----
> drivers/net/ethernet/rocker/rocker_main.c     |   4 +-
> drivers/net/netdevsim/fib.c                   | 263 ++++++++++-
> include/net/ip_fib.h                          |   5 +
> include/uapi/linux/rtnetlink.h                |   1 +
> net/ipv4/fib_lookup.h                         |  18 +-
> net/ipv4/fib_semantics.c                      |  30 +-
> net/ipv4/fib_trie.c                           | 223 ++++++++--
> net/ipv4/route.c                              |  12 +-
> .../drivers/net/netdevsim/fib_notifier.sh     | 411 ++++++++++++++++++
> 11 files changed, 938 insertions(+), 185 deletions(-)
> create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifier.sh
>
>-- 
>2.21.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-02 15:58   ` Roopa Prabhu
@ 2019-10-02 18:21     ` Jiri Pirko
  2019-10-03  2:34       ` David Ahern
  2019-10-03 12:59     ` Ido Schimmel
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 18:21 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Ido Schimmel, netdev, David Miller, David Ahern, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

Wed, Oct 02, 2019 at 05:58:52PM CEST, roopa@cumulusnetworks.com wrote:
>On Wed, Oct 2, 2019 at 1:41 AM Ido Schimmel <idosch@idosch.org> wrote:
>>
>> From: Ido Schimmel <idosch@mellanox.com>
>>
>> When performing L3 offload, routes and nexthops are usually programmed
>> into two different tables in the underlying device. Therefore, the fact
>> that a nexthop resides in hardware does not necessarily mean that all
>> the associated routes also reside in hardware and vice-versa.
>>

*****

>> While the kernel can signal to user space the presence of a nexthop in
>> hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
>> for routes. In addition, the fact that a route resides in hardware does
>> not necessarily mean that the traffic is offloaded. For example,
>> unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
>> packets to the CPU so that the kernel will be able to generate the
>> appropriate ICMP error packet.

*****


>>
>> This patch adds an "in hardware" indication to IPv4 routes, so that
>> users will have better visibility into the offload process. In the
>> future IPv6 will be extended with this indication as well.
>>
>> 'struct fib_alias' is extended with a new field that indicates if
>> the route resides in hardware or not. Note that the new field is added
>> in the 6 bytes hole and therefore the struct still fits in a single
>> cache line [1].
>>
>> Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
>> with the route's key in order to set / clear the "in hardware
>> indication".
>>
>> The new indication is dumped to user space via a new flag (i.e.,
>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
>>
>
>nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
>with the nexthop offload indication ?.

See the second paragraph of this description.


>But this again does not seem to be similar to the other request flags
>like: RTM_F_FIB_MATCH
>
>(so far i think all the RTNH_F_* flags are used on routes too IIRC
>(see iproute2: print_rt_flags)
>RTNH_F_DEAD seems to fall in this category)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware"
  2019-10-02  8:41 ` [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware" Ido Schimmel
@ 2019-10-02 18:27   ` Jiri Pirko
  2019-10-03 15:16     ` Ido Schimmel
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Pirko @ 2019-10-02 18:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

Wed, Oct 02, 2019 at 10:41:01AM CEST, idosch@idosch.org wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>Make use of the recently introduced APIs and mark notified routes as "in
>hardware" after they were programmed to the device's LPM tree.
>
>Similarly, when a route is replaced by an higher priority one, clear the
>"in hardware" indication from it.
>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>---
> .../ethernet/mellanox/mlxsw/spectrum_router.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>index 5a4e61f1feec..26ab8ae482ec 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>@@ -4769,7 +4769,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
> 					struct mlxsw_sp_fib4_entry *fib4_entry)
> {
> 	struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
>+	struct net *net = mlxsw_sp_net(mlxsw_sp);
>+	u32 *addr = (u32 *) fib_node->key.addr;
> 	struct mlxsw_sp_fib4_entry *replaced;
>+	struct fib_info *fi;
> 
> 	if (list_is_singular(&fib_node->entry_list))
> 		return;
>@@ -4777,6 +4780,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
> 	/* We inserted the new entry before replaced one */
> 	replaced = list_next_entry(fib4_entry, common.list);
> 
>+	fi = mlxsw_sp_nexthop4_group_fi(replaced->common.nh_group);
>+	fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
>+			      replaced->tos, replaced->type, replaced->tb_id);
>+
> 	mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, replaced);
> 	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, replaced);
> 	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
>@@ -4786,6 +4793,7 @@ static int
> mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
> 			     const struct fib_entry_notifier_info *fen_info)
> {
>+	struct net *net = mlxsw_sp_net(mlxsw_sp);
> 	struct mlxsw_sp_fib4_entry *fib4_entry;
> 	struct mlxsw_sp_fib_node *fib_node;
> 	int err;
>@@ -4815,6 +4823,10 @@ mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
> 		goto err_fib4_node_entry_link;
> 	}
> 
>+	fib_alias_in_hw_set(net, fen_info->dst, fen_info->dst_len,
>+			    fen_info->fi, fen_info->tos, fen_info->type,
>+			    fen_info->tb_id);

Can't you pass "fa" through fen_info and down to fib_alias_in_hw_set and
avoid lookup?


>+
> 	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry);
> 
> 	return 0;
>@@ -5731,11 +5743,18 @@ static void mlxsw_sp_fib4_node_flush(struct mlxsw_sp *mlxsw_sp,
> 				     struct mlxsw_sp_fib_node *fib_node)
> {
> 	struct mlxsw_sp_fib4_entry *fib4_entry, *tmp;
>+	struct net *net = mlxsw_sp_net(mlxsw_sp);
>+	u32 *addr = (u32 *) fib_node->key.addr;
> 
> 	list_for_each_entry_safe(fib4_entry, tmp, &fib_node->entry_list,
> 				 common.list) {
> 		bool do_break = &tmp->common.list == &fib_node->entry_list;
>+		struct fib_info *fi;
> 
>+		fi = mlxsw_sp_nexthop4_group_fi(fib4_entry->common.nh_group);
>+		fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
>+				      fib4_entry->tos, fib4_entry->type,
>+				      fib4_entry->tb_id);
> 		mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, fib4_entry);
> 		mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib4_entry);
> 		mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
>-- 
>2.21.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table
  2019-10-02  8:40 ` [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table Ido Schimmel
@ 2019-10-03  1:34   ` David Ahern
  2019-10-03  5:16     ` Ido Schimmel
  0 siblings, 1 reply; 39+ messages in thread
From: David Ahern @ 2019-10-03  1:34 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

On 10/2/19 2:40 AM, Ido Schimmel wrote:
> @@ -1269,14 +1269,19 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
>  	new_fa->tb_id = tb->tb_id;
>  	new_fa->fa_default = -1;
>  
> -	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
> +	/* Insert new entry to the list. */
> +	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
>  	if (err)
>  		goto out_free_new_fa;
>  
> -	/* Insert new entry to the list. */
> -	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> +	/* The alias was already inserted, so the node must exist. */
> +	l = fib_find_node(t, &tp, key);
> +	if (WARN_ON_ONCE(!l))
> +		goto out_free_new_fa;

Maybe I am missing something but, the 'l' is only needed for the error
path, so optimize for the success case and only lookup the node if the
notifier fails.

> +
> +	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
>  	if (err)
> -		goto out_fib_notif;
> +		goto out_remove_new_fa;
>  
>  	if (!plen)
>  		tb->tb_num_default++;
> @@ -1287,14 +1292,8 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
>  succeeded:
>  	return 0;
>  
> -out_fib_notif:
> -	/* notifier was sent that entry would be added to trie, but
> -	 * the add failed and need to recover. Only failure for
> -	 * fib_insert_alias is ENOMEM.
> -	 */
> -	NL_SET_ERR_MSG(extack, "Failed to insert route into trie");
> -	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key,
> -				 plen, new_fa, NULL);
> +out_remove_new_fa:
> +	fib_remove_alias(t, tp, l, new_fa);
>  out_free_new_fa:
>  	kmem_cache_free(fn_alias_kmem, new_fa);
>  out:
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-02 18:21     ` Jiri Pirko
@ 2019-10-03  2:34       ` David Ahern
  2019-10-03  5:37         ` Ido Schimmel
  2019-10-03  5:40         ` Jiri Pirko
  0 siblings, 2 replies; 39+ messages in thread
From: David Ahern @ 2019-10-03  2:34 UTC (permalink / raw)
  To: Jiri Pirko, Roopa Prabhu
  Cc: Ido Schimmel, netdev, David Miller, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On 10/2/19 12:21 PM, Jiri Pirko wrote:
>>> This patch adds an "in hardware" indication to IPv4 routes, so that
>>> users will have better visibility into the offload process. In the
>>> future IPv6 will be extended with this indication as well.
>>>
>>> 'struct fib_alias' is extended with a new field that indicates if
>>> the route resides in hardware or not. Note that the new field is added
>>> in the 6 bytes hole and therefore the struct still fits in a single
>>> cache line [1].
>>>
>>> Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
>>> with the route's key in order to set / clear the "in hardware
>>> indication".
>>>
>>> The new indication is dumped to user space via a new flag (i.e.,
>>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
>>>
>>
>> nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
>> with the nexthop offload indication ?.
> 
> See the second paragraph of this description.

I read it multiple times. It does not explain why RTM_F_OFFLOAD is not
used. Unless there is good reason RTM_F_OFFLOAD should be the name for
consistency with all of the other OFFLOAD flags. I realize rtm_flags is
overloaded and the lower 8 bits contains RTNH_F flags, but that can be
managed with good documentation - that RTNH_F is for the nexthop and
RTM_F is for the prefix.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table
  2019-10-03  1:34   ` David Ahern
@ 2019-10-03  5:16     ` Ido Schimmel
  0 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03  5:16 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jiri, jakub.kicinski, saeedm, mlxsw, Ido Schimmel

On Wed, Oct 02, 2019 at 07:34:35PM -0600, David Ahern wrote:
> On 10/2/19 2:40 AM, Ido Schimmel wrote:
> > @@ -1269,14 +1269,19 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> >  	new_fa->tb_id = tb->tb_id;
> >  	new_fa->fa_default = -1;
> >  
> > -	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
> > +	/* Insert new entry to the list. */
> > +	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> >  	if (err)
> >  		goto out_free_new_fa;
> >  
> > -	/* Insert new entry to the list. */
> > -	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> > +	/* The alias was already inserted, so the node must exist. */
> > +	l = fib_find_node(t, &tp, key);
> > +	if (WARN_ON_ONCE(!l))
> > +		goto out_free_new_fa;
> 
> Maybe I am missing something but, the 'l' is only needed for the error
> path, so optimize for the success case and only lookup the node if the
> notifier fails.

Hi David,

You're correct that in this patch it is only needed by the error path, but
later on in patch 4 ("ipv4: Notify newly added route if should be
offloaded") we actually need it here as well. The FIB node can be NULL
and fib_insert_alias() will create one if that's the case.

> 
> > +
> > +	err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack);
> >  	if (err)
> > -		goto out_fib_notif;
> > +		goto out_remove_new_fa;
> >  
> >  	if (!plen)
> >  		tb->tb_num_default++;
> > @@ -1287,14 +1292,8 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> >  succeeded:
> >  	return 0;
> >  
> > -out_fib_notif:
> > -	/* notifier was sent that entry would be added to trie, but
> > -	 * the add failed and need to recover. Only failure for
> > -	 * fib_insert_alias is ENOMEM.
> > -	 */
> > -	NL_SET_ERR_MSG(extack, "Failed to insert route into trie");
> > -	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key,
> > -				 plen, new_fa, NULL);
> > +out_remove_new_fa:
> > +	fib_remove_alias(t, tp, l, new_fa);
> >  out_free_new_fa:
> >  	kmem_cache_free(fn_alias_kmem, new_fa);
> >  out:
> > 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 00/15] Simplify IPv4 route offload API
  2019-10-02 18:17 ` [RFC PATCH net-next 00/15] Simplify IPv4 " Jiri Pirko
@ 2019-10-03  5:18   ` Ido Schimmel
  0 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03  5:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

On Wed, Oct 02, 2019 at 08:17:59PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2019 at 10:40:48AM CEST, idosch@idosch.org wrote:
> >From: Ido Schimmel <idosch@mellanox.com>
> >
> >Today, whenever an IPv4 route is added or deleted a notification is sent
> >in the FIB notification chain and it is up to offload drivers to decide
> >if the route should be programmed to the hardware or not. This is not an
> >easy task as in hardware routes are keyed by {prefix, prefix length,
> >table id}, whereas the kernel can store multiple such routes that only
> >differ in metric / TOS / nexthop info.
> >
> >This series makes sure that only routes that are actually used in the
> >data path are notified to offload drivers. This greatly simplifies the
> >work these drivers need to do, as they are now only concerned with
> >programming the hardware and do not need to replicate the IPv4 route
> >insertion logic and store multiple identical routes.
> >
> >The route that is notified is the first FIB alias in the FIB node with
> >the given {prefix, prefix length, table ID}. In case the route is
> >deleted and there is another route with the same key, a replace
> >notification is emitted. Otherwise, a delete notification is emitted.
> >
> >The above means that in the case of multiple routes with the same key,
> >but different TOS, only the route with the highest TOS is notified.
> >While the kernel can route a packet based on its TOS, this is not
> >supported by any hardware devices I'm familiar with. Moreover, this is
> >not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
> >drivers should therefore use the presence of a non-zero TOS as an
> >indication to trap packets matching the route and let the kernel route
> >them instead. mlxsw has been doing it for the past two years.
> >
> >The series also adds an "in hardware" indication to routes, in addition
> 
> I think this might be a separate patchset. I mean patch "ipv4: Replace
> route in list before notifying" and above.

OK. I mainly wanted to have it together in order to submit the tests
with the patchset itself. I can split it into two.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-03  2:34       ` David Ahern
@ 2019-10-03  5:37         ` Ido Schimmel
  2019-10-04  1:55           ` David Ahern
  2019-10-03  5:40         ` Jiri Pirko
  1 sibling, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03  5:37 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, Roopa Prabhu, netdev, David Miller, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

On Wed, Oct 02, 2019 at 08:34:22PM -0600, David Ahern wrote:
> On 10/2/19 12:21 PM, Jiri Pirko wrote:
> >>> This patch adds an "in hardware" indication to IPv4 routes, so that
> >>> users will have better visibility into the offload process. In the
> >>> future IPv6 will be extended with this indication as well.
> >>>
> >>> 'struct fib_alias' is extended with a new field that indicates if
> >>> the route resides in hardware or not. Note that the new field is added
> >>> in the 6 bytes hole and therefore the struct still fits in a single
> >>> cache line [1].
> >>>
> >>> Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
> >>> with the route's key in order to set / clear the "in hardware
> >>> indication".
> >>>
> >>> The new indication is dumped to user space via a new flag (i.e.,
> >>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
> >>>
> >>
> >> nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
> >> with the nexthop offload indication ?.
> > 
> > See the second paragraph of this description.
> 
> I read it multiple times. It does not explain why RTM_F_OFFLOAD is not
> used. Unless there is good reason RTM_F_OFFLOAD should be the name for
> consistency with all of the other OFFLOAD flags.

David, I'm not sure I understand the issue. You want the flag to be
called "RTM_F_OFFLOAD" to be consistent with "RTNH_F_OFFLOAD"? Are you
OK with iproute2 displaying it as "in_hw"? Displaying it as "offload" is
really wrong for the reasons I mentioned above. Host routes (for
example) do not offload anything from the kernel, they just reside in
hardware and trap packets...

The above is at least consistent with tc where we already have
"TCA_CLS_FLAGS_IN_HW".

> I realize rtm_flags is overloaded and the lower 8 bits contains RTNH_F
> flags, but that can be managed with good documentation - that RTNH_F
> is for the nexthop and RTM_F is for the prefix.

Are you talking about documenting the display strings in "ip-route" man
page or something else? If we stick with "offload" and "in_hw" then they
should probably be documented there to avoid confusion.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-03  2:34       ` David Ahern
  2019-10-03  5:37         ` Ido Schimmel
@ 2019-10-03  5:40         ` Jiri Pirko
  1 sibling, 0 replies; 39+ messages in thread
From: Jiri Pirko @ 2019-10-03  5:40 UTC (permalink / raw)
  To: David Ahern
  Cc: Roopa Prabhu, Ido Schimmel, netdev, David Miller, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

Thu, Oct 03, 2019 at 04:34:22AM CEST, dsahern@gmail.com wrote:
>On 10/2/19 12:21 PM, Jiri Pirko wrote:
>>>> This patch adds an "in hardware" indication to IPv4 routes, so that
>>>> users will have better visibility into the offload process. In the
>>>> future IPv6 will be extended with this indication as well.
>>>>
>>>> 'struct fib_alias' is extended with a new field that indicates if
>>>> the route resides in hardware or not. Note that the new field is added
>>>> in the 6 bytes hole and therefore the struct still fits in a single
>>>> cache line [1].
>>>>
>>>> Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
>>>> with the route's key in order to set / clear the "in hardware
>>>> indication".
>>>>
>>>> The new indication is dumped to user space via a new flag (i.e.,
>>>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
>>>>
>>>
>>> nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
>>> with the nexthop offload indication ?.
>> 
>> See the second paragraph of this description.
>
>I read it multiple times. It does not explain why RTM_F_OFFLOAD is not
>used. Unless there is good reason RTM_F_OFFLOAD should be the name for
>consistency with all of the other OFFLOAD flags. I realize rtm_flags is
>overloaded and the lower 8 bits contains RTNH_F flags, but that can be
>managed with good documentation - that RTNH_F is for the nexthop and
>RTM_F is for the prefix.

"In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded."

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-02 15:58   ` Roopa Prabhu
  2019-10-02 18:21     ` Jiri Pirko
@ 2019-10-03 12:59     ` Ido Schimmel
  2019-10-04  4:25       ` Roopa Prabhu
  1 sibling, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03 12:59 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, David Miller, David Ahern, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On Wed, Oct 02, 2019 at 08:58:52AM -0700, Roopa Prabhu wrote:
> On Wed, Oct 2, 2019 at 1:41 AM Ido Schimmel <idosch@idosch.org> wrote:
> >
> > From: Ido Schimmel <idosch@mellanox.com>
> >
> > When performing L3 offload, routes and nexthops are usually programmed
> > into two different tables in the underlying device. Therefore, the fact
> > that a nexthop resides in hardware does not necessarily mean that all
> > the associated routes also reside in hardware and vice-versa.
> >
> > While the kernel can signal to user space the presence of a nexthop in
> > hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
> > for routes. In addition, the fact that a route resides in hardware does
> > not necessarily mean that the traffic is offloaded. For example,
> > unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
> > packets to the CPU so that the kernel will be able to generate the
> > appropriate ICMP error packet.
> >
> > This patch adds an "in hardware" indication to IPv4 routes, so that
> > users will have better visibility into the offload process. In the
> > future IPv6 will be extended with this indication as well.
> >
> > 'struct fib_alias' is extended with a new field that indicates if
> > the route resides in hardware or not. Note that the new field is added
> > in the 6 bytes hole and therefore the struct still fits in a single
> > cache line [1].
> >
> > Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
> > with the route's key in order to set / clear the "in hardware
> > indication".
> >
> > The new indication is dumped to user space via a new flag (i.e.,
> > 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
> >
> 
> nice series Ido.

Thanks, Roopa. Forgot to copy you on this RFC. Will copy you on v1.

> why not call this RTM_F_OFFLOAD to keep it consistent with the nexthop
> offload indication ?.

I can call it RTM_F_OFFLOAD to be consistent with RTNH_F_OFFLOAD, but it
should really be displayed as "in_hw" to the user which is why I
preferred to use RTM_F_IN_HW.

We probably need to document the semantics better, but as I see it
"offload" is for functionality that we actually offload from the kernel.
For example, prefix and gatewayed routes. We do not set the offload mark
on host routes that are used to locally receive packets. We do mark them
as offloaded if they are used to decap IP-in-IP or VXLAN traffic.

Given the above, we do not have an easy way today to understand which
routes actually reside in hardware and which are not. Having this
information is very useful for debugging and testing (as evident by the
last patch in the series).

> But this again does not seem to be similar to the other request flags
> like: RTM_F_FIB_MATCH

Not sure I understand, similar in what way? Can you clarify?

> (so far i think all the RTNH_F_* flags are used on routes too IIRC
> (see iproute2: print_rt_flags)
> RTNH_F_DEAD seems to fall in this category)

The 'rtm_flags' field in the ancillary header is actually divided
between RTNH_F_ and RTM_F_ flags. When the route has a single nexthop
the RTNH_F_ flags are communicated to user space via this field.

Since I'm interested in letting user space know if the route itself
resides in hardware (not the nexthop) it seemed logical to me to use
RTM_F_ and encode it in 'rtm_flags'.

Shana tova (have a good year)!

> 
> 
> > [1]
> > struct fib_alias {
> >         struct hlist_node  fa_list;                      /*     0    16 */
> >         struct fib_info *          fa_info;              /*    16     8 */
> >         u8                         fa_tos;               /*    24     1 */
> >         u8                         fa_type;              /*    25     1 */
> >         u8                         fa_state;             /*    26     1 */
> >         u8                         fa_slen;              /*    27     1 */
> >         u32                        tb_id;                /*    28     4 */
> >         s16                        fa_default;           /*    32     2 */
> >         u8                         in_hw:1;              /*    34: 0  1 */
> >         u8                         unused:7;             /*    34: 1  1 */
> >
> >         /* XXX 5 bytes hole, try to pack */
> >
> >         struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */
> >
> >         /* size: 56, cachelines: 1, members: 11 */
> >         /* sum members: 50, holes: 1, sum holes: 5 */
> >         /* sum bitfield members: 8 bits (1 bytes) */
> >         /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
> >         /* last cacheline: 56 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > ---
> >  include/net/ip_fib.h           |  5 +++
> >  include/uapi/linux/rtnetlink.h |  1 +
> >  net/ipv4/fib_lookup.h          |  4 ++
> >  net/ipv4/fib_semantics.c       |  4 ++
> >  net/ipv4/fib_trie.c            | 71 ++++++++++++++++++++++++++++++++++
> >  5 files changed, 85 insertions(+)
> >
> > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> > index 52b2406a5dfc..019a138a79f4 100644
> > --- a/include/net/ip_fib.h
> > +++ b/include/net/ip_fib.h
> > @@ -454,6 +454,11 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
> >  void fib_nh_common_release(struct fib_nh_common *nhc);
> >
> >  /* Exported by fib_trie.c */
> > +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> > +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id);
> > +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> > +                          const struct fib_info *fi, u8 tos, u8 type,
> > +                          u32 tb_id);
> >  void fib_trie_init(void);
> >  struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
> >
> > diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> > index 1418a8362bb7..e5a104f5ce35 100644
> > --- a/include/uapi/linux/rtnetlink.h
> > +++ b/include/uapi/linux/rtnetlink.h
> > @@ -309,6 +309,7 @@ enum rt_scope_t {
> >  #define RTM_F_PREFIX           0x800   /* Prefix addresses             */
> >  #define RTM_F_LOOKUP_TABLE     0x1000  /* set rtm_table to FIB lookup result */
> >  #define RTM_F_FIB_MATCH                0x2000  /* return full fib lookup match */
> > +#define RTM_F_IN_HW            0x4000  /* route is in hardware */
> >
> >  /* Reserved table identifiers */
> >
> > diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
> > index b34594a9965f..65a69a863499 100644
> > --- a/net/ipv4/fib_lookup.h
> > +++ b/net/ipv4/fib_lookup.h
> > @@ -16,6 +16,8 @@ struct fib_alias {
> >         u8                      fa_slen;
> >         u32                     tb_id;
> >         s16                     fa_default;
> > +       u8                      in_hw:1,
> > +                               unused:7;
> >         struct rcu_head         rcu;
> >  };
> >
> > @@ -28,6 +30,8 @@ struct fib_rt_info {
> >         int                     dst_len;
> >         u8                      tos;
> >         u8                      type;
> > +       u8                      in_hw:1,
> > +                               unused:7;
> >  };
> >
> >  /* Dont write on fa_state unless needed, to keep it shared on all cpus */
> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 3c9d47804d93..94f201d44844 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -519,6 +519,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
> >         fri.dst_len = dst_len;
> >         fri.tos = fa->fa_tos;
> >         fri.type = fa->fa_type;
> > +       fri.in_hw = fa->in_hw;
> >         err = fib_dump_info(skb, info->portid, seq, event, &fri, nlm_flags);
> >         if (err < 0) {
> >                 /* -EMSGSIZE implies BUG in fib_nlmsg_size() */
> > @@ -1801,6 +1802,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
> >                         goto nla_put_failure;
> >         }
> >
> > +       if (fri->in_hw)
> > +               rtm->rtm_flags |= RTM_F_IN_HW;
> > +
> >         nlmsg_end(skb, nlh);
> >         return 0;
> >
> > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > index 646542de83ca..e3486bde6c5a 100644
> > --- a/net/ipv4/fib_trie.c
> > +++ b/net/ipv4/fib_trie.c
> > @@ -1028,6 +1028,74 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
> >         return NULL;
> >  }
> >
> > +static struct fib_alias *
> > +fib_find_matching_alias(struct net *net, u32 dst, int dst_len,
> > +                       const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> > +{
> > +       u8 slen = KEYLENGTH - dst_len;
> > +       struct key_vector *l, *tp;
> > +       struct fib_table *tb;
> > +       struct fib_alias *fa;
> > +       struct trie *t;
> > +
> > +       tb = fib_get_table(net, tb_id);
> > +       if (!tb)
> > +               return NULL;
> > +
> > +       t = (struct trie *)tb->tb_data;
> > +       l = fib_find_node(t, &tp, dst);
> > +       if (!l)
> > +               return NULL;
> > +
> > +       hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
> > +               if (fa->fa_slen == slen && fa->tb_id == tb_id &&
> > +                   fa->fa_tos == tos && fa->fa_info == fi &&
> > +                   fa->fa_type == type)
> > +                       return fa;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> > +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> > +{
> > +       struct fib_alias *fa_match;
> > +
> > +       rcu_read_lock();
> > +
> > +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> > +                                          tb_id);
> > +       if (!fa_match)
> > +               goto out;
> > +
> > +       fa_match->in_hw = 1;
> > +
> > +out:
> > +       rcu_read_unlock();
> > +}
> > +EXPORT_SYMBOL_GPL(fib_alias_in_hw_set);
> > +
> > +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> > +                          const struct fib_info *fi, u8 tos, u8 type,
> > +                          u32 tb_id)
> > +{
> > +       struct fib_alias *fa_match;
> > +
> > +       rcu_read_lock();
> > +
> > +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> > +                                          tb_id);
> > +       if (!fa_match)
> > +               goto out;
> > +
> > +       fa_match->in_hw = 0;
> > +
> > +out:
> > +       rcu_read_unlock();
> > +}
> > +EXPORT_SYMBOL_GPL(fib_alias_in_hw_clear);
> > +
> >  static void trie_rebalance(struct trie *t, struct key_vector *tn)
> >  {
> >         while (!IS_TRIE(tn))
> > @@ -1236,6 +1304,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> >                         new_fa->fa_slen = fa->fa_slen;
> >                         new_fa->tb_id = tb->tb_id;
> >                         new_fa->fa_default = -1;
> > +                       new_fa->in_hw = 0;
> >
> >                         hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
> >
> > @@ -1294,6 +1363,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> >         new_fa->fa_slen = slen;
> >         new_fa->tb_id = tb->tb_id;
> >         new_fa->fa_default = -1;
> > +       new_fa->in_hw = 0;
> >
> >         /* Insert new entry to the list. */
> >         err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> > @@ -2218,6 +2288,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
> >                                 fri.dst_len = KEYLENGTH - fa->fa_slen;
> >                                 fri.tos = fa->fa_tos;
> >                                 fri.type = fa->fa_type;
> > +                               fri.in_hw = fa->in_hw;
> >                                 err = fib_dump_info(skb,
> >                                                     NETLINK_CB(cb->skb).portid,
> >                                                     cb->nlh->nlmsg_seq,
> > --
> > 2.21.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners
  2019-10-02 17:44   ` Jiri Pirko
@ 2019-10-03 13:04     ` Ido Schimmel
  0 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03 13:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

On Wed, Oct 02, 2019 at 07:44:02PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2019 at 10:40:55AM CEST, idosch@idosch.org wrote:
> >From: Ido Schimmel <idosch@mellanox.com>
> >
> >When a new listener is registered to the FIB notification chain it
> >receives a dump of all the available routes in the system. Instead, make
> >sure to only replay the IPv4 routes that are actually used in the data
> >path and are of any interest to the new listener.
> >
> >Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >---
> > net/ipv4/fib_trie.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> >diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> >index dc4c4e2cb0b3..4937a3503f4f 100644
> >--- a/net/ipv4/fib_trie.c
> >+++ b/net/ipv4/fib_trie.c
> >@@ -2096,6 +2096,7 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
> > 			   struct netlink_ext_ack *extack)
> > {
> > 	struct fib_alias *fa;
> >+	int last_slen = -1;
> > 	int err;
> > 
> > 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
> >@@ -2110,6 +2111,15 @@ static int fib_leaf_notify(struct key_vector *l, struct fib_table *tb,
> > 		if (tb->tb_id != fa->tb_id)
> > 			continue;
> > 
> >+		if (fa->fa_slen == last_slen)
> >+			continue;
> 
> Hmm, I wonder, don't you want to continue only for FIB_EVENT_ENTRY_REPLACE_TMP
> and keep the notifier call for FIB_EVENT_ENTRY_ADD?

Yea, I think you're right. As-is this introduces a small regression
until later in the series. Will fix. Thanks!

> 
> 
> >+
> >+		last_slen = fa->fa_slen;
> >+		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_REPLACE_TMP,
> >+					      l->key, KEYLENGTH - fa->fa_slen,
> >+					      fa, extack);
> >+		if (err)
> >+			return err;
> > 		err = call_fib_entry_notifier(nb, FIB_EVENT_ENTRY_ADD, l->key,
> > 					      KEYLENGTH - fa->fa_slen,
> > 					      fa, extack);
> >-- 
> >2.21.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications
  2019-10-02 18:01     ` Jiri Pirko
@ 2019-10-03 15:10       ` Ido Schimmel
  0 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03 15:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

On Wed, Oct 02, 2019 at 08:01:44PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2019 at 07:52:30PM CEST, jiri@resnulli.us wrote:
> >Wed, Oct 02, 2019 at 10:40:56AM CEST, idosch@idosch.org wrote:
> >>From: Ido Schimmel <idosch@mellanox.com>
> >>
> >>With the new notifications mlxsw does not need to handle identical
> >>routes itself, as this is taken care of by the core IPv4 code.
> >>
> >>Instead, mlxsw only needs to take care of inserting and removing routes
> >>from the device.
> >>
> >>Convert mlxsw to use the new IPv4 route notifications and simplify the
> >>code.
> >>
> >
> >[...]
> >
> >
> >>@@ -6246,9 +6147,10 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
> >> 		err = mlxsw_sp_router_fib_rule_event(event, info,
> >> 						     router->mlxsw_sp);
> >> 		return notifier_from_errno(err);
> >>-	case FIB_EVENT_ENTRY_ADD:
> >>+	case FIB_EVENT_ENTRY_ADD: /* fall through */
> >> 	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
> >> 	case FIB_EVENT_ENTRY_APPEND:  /* fall through */
> >
> >Why don't you skip the three above with just return of NOTIFY_DONE?
> 
> if (info->family == AF_INET)
> 	return NOTIFY_DONE;

It's not really needed given ADD and APPEND are not sent for AF_INET
after this patchset. But I did find out that I forgot to remove them
from mlxsw_sp_router_fib4_event(), so I'll do that in v1.

Thanks!

> 
> >
> >
> >>+	case FIB_EVENT_ENTRY_REPLACE_TMP:
> >> 		if (router->aborted) {
> >> 			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
> >> 			return notifier_from_errno(-EINVAL);
> >>-- 
> >>2.21.0
> >>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware"
  2019-10-02 18:27   ` Jiri Pirko
@ 2019-10-03 15:16     ` Ido Schimmel
  0 siblings, 0 replies; 39+ messages in thread
From: Ido Schimmel @ 2019-10-03 15:16 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, dsahern, jiri, jakub.kicinski, saeedm, mlxsw,
	Ido Schimmel

On Wed, Oct 02, 2019 at 08:27:30PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2019 at 10:41:01AM CEST, idosch@idosch.org wrote:
> >From: Ido Schimmel <idosch@mellanox.com>
> >
> >Make use of the recently introduced APIs and mark notified routes as "in
> >hardware" after they were programmed to the device's LPM tree.
> >
> >Similarly, when a route is replaced by an higher priority one, clear the
> >"in hardware" indication from it.
> >
> >Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >---
> > .../ethernet/mellanox/mlxsw/spectrum_router.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >index 5a4e61f1feec..26ab8ae482ec 100644
> >--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> >@@ -4769,7 +4769,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
> > 					struct mlxsw_sp_fib4_entry *fib4_entry)
> > {
> > 	struct mlxsw_sp_fib_node *fib_node = fib4_entry->common.fib_node;
> >+	struct net *net = mlxsw_sp_net(mlxsw_sp);
> >+	u32 *addr = (u32 *) fib_node->key.addr;
> > 	struct mlxsw_sp_fib4_entry *replaced;
> >+	struct fib_info *fi;
> > 
> > 	if (list_is_singular(&fib_node->entry_list))
> > 		return;
> >@@ -4777,6 +4780,10 @@ static void mlxsw_sp_fib4_entry_replace(struct mlxsw_sp *mlxsw_sp,
> > 	/* We inserted the new entry before replaced one */
> > 	replaced = list_next_entry(fib4_entry, common.list);
> > 
> >+	fi = mlxsw_sp_nexthop4_group_fi(replaced->common.nh_group);
> >+	fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
> >+			      replaced->tos, replaced->type, replaced->tb_id);
> >+
> > 	mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, replaced);
> > 	mlxsw_sp_fib4_entry_destroy(mlxsw_sp, replaced);
> > 	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
> >@@ -4786,6 +4793,7 @@ static int
> > mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
> > 			     const struct fib_entry_notifier_info *fen_info)
> > {
> >+	struct net *net = mlxsw_sp_net(mlxsw_sp);
> > 	struct mlxsw_sp_fib4_entry *fib4_entry;
> > 	struct mlxsw_sp_fib_node *fib_node;
> > 	int err;
> >@@ -4815,6 +4823,10 @@ mlxsw_sp_router_fib4_replace(struct mlxsw_sp *mlxsw_sp,
> > 		goto err_fib4_node_entry_link;
> > 	}
> > 
> >+	fib_alias_in_hw_set(net, fen_info->dst, fen_info->dst_len,
> >+			    fen_info->fi, fen_info->tos, fen_info->type,
> >+			    fen_info->tb_id);
> 
> Can't you pass "fa" through fen_info and down to fib_alias_in_hw_set and
> avoid lookup?

No, because we don't have a reference count on 'fa' and we can't
guarantee that it will not disappear by the time we want to mark it.

> 
> 
> >+
> > 	mlxsw_sp_fib4_entry_replace(mlxsw_sp, fib4_entry);
> > 
> > 	return 0;
> >@@ -5731,11 +5743,18 @@ static void mlxsw_sp_fib4_node_flush(struct mlxsw_sp *mlxsw_sp,
> > 				     struct mlxsw_sp_fib_node *fib_node)
> > {
> > 	struct mlxsw_sp_fib4_entry *fib4_entry, *tmp;
> >+	struct net *net = mlxsw_sp_net(mlxsw_sp);
> >+	u32 *addr = (u32 *) fib_node->key.addr;
> > 
> > 	list_for_each_entry_safe(fib4_entry, tmp, &fib_node->entry_list,
> > 				 common.list) {
> > 		bool do_break = &tmp->common.list == &fib_node->entry_list;
> >+		struct fib_info *fi;
> > 
> >+		fi = mlxsw_sp_nexthop4_group_fi(fib4_entry->common.nh_group);
> >+		fib_alias_in_hw_clear(net, *addr, fib_node->key.prefix_len, fi,
> >+				      fib4_entry->tos, fib4_entry->type,
> >+				      fib4_entry->tb_id);
> > 		mlxsw_sp_fib4_node_entry_unlink(mlxsw_sp, fib4_entry);
> > 		mlxsw_sp_fib4_entry_destroy(mlxsw_sp, fib4_entry);
> > 		mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
> >-- 
> >2.21.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-03  5:37         ` Ido Schimmel
@ 2019-10-04  1:55           ` David Ahern
  2019-10-04 14:43             ` Ido Schimmel
  0 siblings, 1 reply; 39+ messages in thread
From: David Ahern @ 2019-10-04  1:55 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, Roopa Prabhu, netdev, David Miller, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

On 10/2/19 11:37 PM, Ido Schimmel wrote:
>>>>> The new indication is dumped to user space via a new flag (i.e.,
>>>>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
>>>>>
>>>>
>>>> nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
>>>> with the nexthop offload indication ?.
>>>
>>> See the second paragraph of this description.
>>
>> I read it multiple times. It does not explain why RTM_F_OFFLOAD is not
>> used. Unless there is good reason RTM_F_OFFLOAD should be the name for
>> consistency with all of the other OFFLOAD flags.
> 
> David, I'm not sure I understand the issue. You want the flag to be
> called "RTM_F_OFFLOAD" to be consistent with "RTNH_F_OFFLOAD"? Are you
> OK with iproute2 displaying it as "in_hw"? Displaying it as "offload" is
> really wrong for the reasons I mentioned above. Host routes (for
> example) do not offload anything from the kernel, they just reside in
> hardware and trap packets...
> 
> The above is at least consistent with tc where we already have
> "TCA_CLS_FLAGS_IN_HW".
> 
>> I realize rtm_flags is overloaded and the lower 8 bits contains RTNH_F
>> flags, but that can be managed with good documentation - that RTNH_F
>> is for the nexthop and RTM_F is for the prefix.
> 
> Are you talking about documenting the display strings in "ip-route" man
> page or something else? If we stick with "offload" and "in_hw" then they
> should probably be documented there to avoid confusion.
> 

Sounds like there are 2 cases for prefixes that should be flagged to the
user -- "offloaded" (as in traffic is offloaded) and  "in_hw" (prefix is
in hardware but forwarding is not offloaded).

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-03 12:59     ` Ido Schimmel
@ 2019-10-04  4:25       ` Roopa Prabhu
  0 siblings, 0 replies; 39+ messages in thread
From: Roopa Prabhu @ 2019-10-04  4:25 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, David Miller, David Ahern, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On Thu, Oct 3, 2019 at 5:59 AM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Wed, Oct 02, 2019 at 08:58:52AM -0700, Roopa Prabhu wrote:
> > On Wed, Oct 2, 2019 at 1:41 AM Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > From: Ido Schimmel <idosch@mellanox.com>
> > >
> > > When performing L3 offload, routes and nexthops are usually programmed
> > > into two different tables in the underlying device. Therefore, the fact
> > > that a nexthop resides in hardware does not necessarily mean that all
> > > the associated routes also reside in hardware and vice-versa.
> > >
> > > While the kernel can signal to user space the presence of a nexthop in
> > > hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
> > > for routes. In addition, the fact that a route resides in hardware does
> > > not necessarily mean that the traffic is offloaded. For example,
> > > unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
> > > packets to the CPU so that the kernel will be able to generate the
> > > appropriate ICMP error packet.
> > >
> > > This patch adds an "in hardware" indication to IPv4 routes, so that
> > > users will have better visibility into the offload process. In the
> > > future IPv6 will be extended with this indication as well.
> > >
> > > 'struct fib_alias' is extended with a new field that indicates if
> > > the route resides in hardware or not. Note that the new field is added
> > > in the 6 bytes hole and therefore the struct still fits in a single
> > > cache line [1].
> > >
> > > Capable drivers are expected to invoke fib_alias_in_hw_{set,clear}()
> > > with the route's key in order to set / clear the "in hardware
> > > indication".
> > >
> > > The new indication is dumped to user space via a new flag (i.e.,
> > > 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
> > >
> >
> > nice series Ido.
>
> Thanks, Roopa. Forgot to copy you on this RFC. Will copy you on v1.

Thanks Ido.

>
> > why not call this RTM_F_OFFLOAD to keep it consistent with the nexthop
> > offload indication ?.
>
> I can call it RTM_F_OFFLOAD to be consistent with RTNH_F_OFFLOAD, but it
> should really be displayed as "in_hw" to the user which is why I
> preferred to use RTM_F_IN_HW.

ok, i missed that part. So you are trying to add a variant of hardware
offloaded routes.
In that case I prefer you keep the display string a direct translation
of the flag name.

>
> We probably need to document the semantics better, but as I see it
> "offload" is for functionality that we actually offload from the kernel.
> For example, prefix and gatewayed routes. We do not set the offload mark
> on host routes that are used to locally receive packets. We do mark them
> as offloaded if they are used to decap IP-in-IP or VXLAN traffic.
>
> Given the above, we do not have an easy way today to understand which
> routes actually reside in hardware and which are not. Having this
> information is very useful for debugging and testing (as evident by the
> last patch in the series).
>
> > But this again does not seem to be similar to the other request flags
> > like: RTM_F_FIB_MATCH
>
> Not sure I understand, similar in what way? Can you clarify?

ignore that...i was just trying to understand again why you are adding
it in rtm_flags with RTM_F_
instead of RTNH_F. I think your below point clarifies it.

>
> > (so far i think all the RTNH_F_* flags are used on routes too IIRC
> > (see iproute2: print_rt_flags)
> > RTNH_F_DEAD seems to fall in this category)
>
> The 'rtm_flags' field in the ancillary header is actually divided
> between RTNH_F_ and RTM_F_ flags. When the route has a single nexthop
> the RTNH_F_ flags are communicated to user space via this field.
>
> Since I'm interested in letting user space know if the route itself
> resides in hardware (not the nexthop) it seemed logical to me to use
> RTM_F_ and encode it in 'rtm_flags'.

ok understood. so you really are introducing a variant of OFFLOAD flag.
and yes, it will need to be documented correctly..the names can lead
to confusion.
....maybe having it RTM_F_OFFLOAD_IN_HW would keep them related by
name... up to you.

>
> Shana tova (have a good year)!

you too!

>
> >
> >
> > > [1]
> > > struct fib_alias {
> > >         struct hlist_node  fa_list;                      /*     0    16 */
> > >         struct fib_info *          fa_info;              /*    16     8 */
> > >         u8                         fa_tos;               /*    24     1 */
> > >         u8                         fa_type;              /*    25     1 */
> > >         u8                         fa_state;             /*    26     1 */
> > >         u8                         fa_slen;              /*    27     1 */
> > >         u32                        tb_id;                /*    28     4 */
> > >         s16                        fa_default;           /*    32     2 */
> > >         u8                         in_hw:1;              /*    34: 0  1 */
> > >         u8                         unused:7;             /*    34: 1  1 */
> > >
> > >         /* XXX 5 bytes hole, try to pack */
> > >
> > >         struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */
> > >
> > >         /* size: 56, cachelines: 1, members: 11 */
> > >         /* sum members: 50, holes: 1, sum holes: 5 */
> > >         /* sum bitfield members: 8 bits (1 bytes) */
> > >         /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
> > >         /* last cacheline: 56 bytes */
> > > } __attribute__((__aligned__(8)));
> > >
> > > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > > ---
> > >  include/net/ip_fib.h           |  5 +++
> > >  include/uapi/linux/rtnetlink.h |  1 +
> > >  net/ipv4/fib_lookup.h          |  4 ++
> > >  net/ipv4/fib_semantics.c       |  4 ++
> > >  net/ipv4/fib_trie.c            | 71 ++++++++++++++++++++++++++++++++++
> > >  5 files changed, 85 insertions(+)
> > >
> > > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> > > index 52b2406a5dfc..019a138a79f4 100644
> > > --- a/include/net/ip_fib.h
> > > +++ b/include/net/ip_fib.h
> > > @@ -454,6 +454,11 @@ int fib_nh_common_init(struct fib_nh_common *nhc, struct nlattr *fc_encap,
> > >  void fib_nh_common_release(struct fib_nh_common *nhc);
> > >
> > >  /* Exported by fib_trie.c */
> > > +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> > > +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id);
> > > +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> > > +                          const struct fib_info *fi, u8 tos, u8 type,
> > > +                          u32 tb_id);
> > >  void fib_trie_init(void);
> > >  struct fib_table *fib_trie_table(u32 id, struct fib_table *alias);
> > >
> > > diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> > > index 1418a8362bb7..e5a104f5ce35 100644
> > > --- a/include/uapi/linux/rtnetlink.h
> > > +++ b/include/uapi/linux/rtnetlink.h
> > > @@ -309,6 +309,7 @@ enum rt_scope_t {
> > >  #define RTM_F_PREFIX           0x800   /* Prefix addresses             */
> > >  #define RTM_F_LOOKUP_TABLE     0x1000  /* set rtm_table to FIB lookup result */
> > >  #define RTM_F_FIB_MATCH                0x2000  /* return full fib lookup match */
> > > +#define RTM_F_IN_HW            0x4000  /* route is in hardware */
> > >
> > >  /* Reserved table identifiers */
> > >
> > > diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
> > > index b34594a9965f..65a69a863499 100644
> > > --- a/net/ipv4/fib_lookup.h
> > > +++ b/net/ipv4/fib_lookup.h
> > > @@ -16,6 +16,8 @@ struct fib_alias {
> > >         u8                      fa_slen;
> > >         u32                     tb_id;
> > >         s16                     fa_default;
> > > +       u8                      in_hw:1,
> > > +                               unused:7;
> > >         struct rcu_head         rcu;
> > >  };
> > >
> > > @@ -28,6 +30,8 @@ struct fib_rt_info {
> > >         int                     dst_len;
> > >         u8                      tos;
> > >         u8                      type;
> > > +       u8                      in_hw:1,
> > > +                               unused:7;
> > >  };
> > >
> > >  /* Dont write on fa_state unless needed, to keep it shared on all cpus */
> > > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > > index 3c9d47804d93..94f201d44844 100644
> > > --- a/net/ipv4/fib_semantics.c
> > > +++ b/net/ipv4/fib_semantics.c
> > > @@ -519,6 +519,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
> > >         fri.dst_len = dst_len;
> > >         fri.tos = fa->fa_tos;
> > >         fri.type = fa->fa_type;
> > > +       fri.in_hw = fa->in_hw;
> > >         err = fib_dump_info(skb, info->portid, seq, event, &fri, nlm_flags);
> > >         if (err < 0) {
> > >                 /* -EMSGSIZE implies BUG in fib_nlmsg_size() */
> > > @@ -1801,6 +1802,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
> > >                         goto nla_put_failure;
> > >         }
> > >
> > > +       if (fri->in_hw)
> > > +               rtm->rtm_flags |= RTM_F_IN_HW;
> > > +
> > >         nlmsg_end(skb, nlh);
> > >         return 0;
> > >
> > > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > > index 646542de83ca..e3486bde6c5a 100644
> > > --- a/net/ipv4/fib_trie.c
> > > +++ b/net/ipv4/fib_trie.c
> > > @@ -1028,6 +1028,74 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
> > >         return NULL;
> > >  }
> > >
> > > +static struct fib_alias *
> > > +fib_find_matching_alias(struct net *net, u32 dst, int dst_len,
> > > +                       const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> > > +{
> > > +       u8 slen = KEYLENGTH - dst_len;
> > > +       struct key_vector *l, *tp;
> > > +       struct fib_table *tb;
> > > +       struct fib_alias *fa;
> > > +       struct trie *t;
> > > +
> > > +       tb = fib_get_table(net, tb_id);
> > > +       if (!tb)
> > > +               return NULL;
> > > +
> > > +       t = (struct trie *)tb->tb_data;
> > > +       l = fib_find_node(t, &tp, dst);
> > > +       if (!l)
> > > +               return NULL;
> > > +
> > > +       hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
> > > +               if (fa->fa_slen == slen && fa->tb_id == tb_id &&
> > > +                   fa->fa_tos == tos && fa->fa_info == fi &&
> > > +                   fa->fa_type == type)
> > > +                       return fa;
> > > +       }
> > > +
> > > +       return NULL;
> > > +}
> > > +
> > > +void fib_alias_in_hw_set(struct net *net, u32 dst, int dst_len,
> > > +                        const struct fib_info *fi, u8 tos, u8 type, u32 tb_id)
> > > +{
> > > +       struct fib_alias *fa_match;
> > > +
> > > +       rcu_read_lock();
> > > +
> > > +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> > > +                                          tb_id);
> > > +       if (!fa_match)
> > > +               goto out;
> > > +
> > > +       fa_match->in_hw = 1;
> > > +
> > > +out:
> > > +       rcu_read_unlock();
> > > +}
> > > +EXPORT_SYMBOL_GPL(fib_alias_in_hw_set);
> > > +
> > > +void fib_alias_in_hw_clear(struct net *net, u32 dst, int dst_len,
> > > +                          const struct fib_info *fi, u8 tos, u8 type,
> > > +                          u32 tb_id)
> > > +{
> > > +       struct fib_alias *fa_match;
> > > +
> > > +       rcu_read_lock();
> > > +
> > > +       fa_match = fib_find_matching_alias(net, dst, dst_len, fi, tos, type,
> > > +                                          tb_id);
> > > +       if (!fa_match)
> > > +               goto out;
> > > +
> > > +       fa_match->in_hw = 0;
> > > +
> > > +out:
> > > +       rcu_read_unlock();
> > > +}
> > > +EXPORT_SYMBOL_GPL(fib_alias_in_hw_clear);
> > > +
> > >  static void trie_rebalance(struct trie *t, struct key_vector *tn)
> > >  {
> > >         while (!IS_TRIE(tn))
> > > @@ -1236,6 +1304,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> > >                         new_fa->fa_slen = fa->fa_slen;
> > >                         new_fa->tb_id = tb->tb_id;
> > >                         new_fa->fa_default = -1;
> > > +                       new_fa->in_hw = 0;
> > >
> > >                         hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
> > >
> > > @@ -1294,6 +1363,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
> > >         new_fa->fa_slen = slen;
> > >         new_fa->tb_id = tb->tb_id;
> > >         new_fa->fa_default = -1;
> > > +       new_fa->in_hw = 0;
> > >
> > >         /* Insert new entry to the list. */
> > >         err = fib_insert_alias(t, tp, l, new_fa, fa, key);
> > > @@ -2218,6 +2288,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
> > >                                 fri.dst_len = KEYLENGTH - fa->fa_slen;
> > >                                 fri.tos = fa->fa_tos;
> > >                                 fri.type = fa->fa_type;
> > > +                               fri.in_hw = fa->in_hw;
> > >                                 err = fib_dump_info(skb,
> > >                                                     NETLINK_CB(cb->skb).portid,
> > >                                                     cb->nlh->nlmsg_seq,
> > > --
> > > 2.21.0
> > >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-04  1:55           ` David Ahern
@ 2019-10-04 14:43             ` Ido Schimmel
  2019-10-04 16:38               ` David Ahern
  0 siblings, 1 reply; 39+ messages in thread
From: Ido Schimmel @ 2019-10-04 14:43 UTC (permalink / raw)
  To: David Ahern, roopa
  Cc: Jiri Pirko, netdev, David Miller, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On Thu, Oct 03, 2019 at 07:55:16PM -0600, David Ahern wrote:
> On 10/2/19 11:37 PM, Ido Schimmel wrote:
> >>>>> The new indication is dumped to user space via a new flag (i.e.,
> >>>>> 'RTM_F_IN_HW') in the 'rtm_flags' field in the ancillary header.
> >>>>>
> >>>>
> >>>> nice series Ido. why not call this RTM_F_OFFLOAD to keep it consistent
> >>>> with the nexthop offload indication ?.
> >>>
> >>> See the second paragraph of this description.
> >>
> >> I read it multiple times. It does not explain why RTM_F_OFFLOAD is not
> >> used. Unless there is good reason RTM_F_OFFLOAD should be the name for
> >> consistency with all of the other OFFLOAD flags.
> > 
> > David, I'm not sure I understand the issue. You want the flag to be
> > called "RTM_F_OFFLOAD" to be consistent with "RTNH_F_OFFLOAD"? Are you
> > OK with iproute2 displaying it as "in_hw"? Displaying it as "offload" is
> > really wrong for the reasons I mentioned above. Host routes (for
> > example) do not offload anything from the kernel, they just reside in
> > hardware and trap packets...
> > 
> > The above is at least consistent with tc where we already have
> > "TCA_CLS_FLAGS_IN_HW".
> > 
> >> I realize rtm_flags is overloaded and the lower 8 bits contains RTNH_F
> >> flags, but that can be managed with good documentation - that RTNH_F
> >> is for the nexthop and RTM_F is for the prefix.
> > 
> > Are you talking about documenting the display strings in "ip-route" man
> > page or something else? If we stick with "offload" and "in_hw" then they
> > should probably be documented there to avoid confusion.
> > 
> 
> Sounds like there are 2 cases for prefixes that should be flagged to the
> user -- "offloaded" (as in traffic is offloaded) and  "in_hw" (prefix is
> in hardware but forwarding is not offloaded).

Sounds good. Are you and Roopa OK with the below?

RTM_F_IN_HW - route is in hardware
RTM_F_OFFLOAD - route is offloaded

For example, host routes will have the first flag set, whereas prefix
routes will have both flags set.

Together with the existing offload flags for nexthops and neighbours
this provides great visibility into the entire offload process.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-04 14:43             ` Ido Schimmel
@ 2019-10-04 16:38               ` David Ahern
  2019-10-04 17:43                 ` Roopa Prabhu
  0 siblings, 1 reply; 39+ messages in thread
From: David Ahern @ 2019-10-04 16:38 UTC (permalink / raw)
  To: Ido Schimmel, roopa
  Cc: Jiri Pirko, netdev, David Miller, Jiri Pirko, Jakub Kicinski,
	Saeed Mahameed, mlxsw, Ido Schimmel

On 10/4/19 8:43 AM, Ido Schimmel wrote:
>> Sounds like there are 2 cases for prefixes that should be flagged to the
>> user -- "offloaded" (as in traffic is offloaded) and  "in_hw" (prefix is
>> in hardware but forwarding is not offloaded).
> Sounds good. Are you and Roopa OK with the below?
> 
> RTM_F_IN_HW - route is in hardware
> RTM_F_OFFLOAD - route is offloaded
> 
> For example, host routes will have the first flag set, whereas prefix
> routes will have both flags set.

if "offload" always includes "in_hw", then are both needed? ie., why not
document that offload means in hardware with offloaded traffic, and then
"in_hw" is a lesser meaning - only in hardware with a trap to CPU?

> 
> Together with the existing offload flags for nexthops and neighbours
> this provides great visibility into the entire offload process.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-04 16:38               ` David Ahern
@ 2019-10-04 17:43                 ` Roopa Prabhu
  2019-10-04 23:20                   ` David Ahern
  0 siblings, 1 reply; 39+ messages in thread
From: Roopa Prabhu @ 2019-10-04 17:43 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, Jiri Pirko, netdev, David Miller, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

On Fri, Oct 4, 2019 at 9:38 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 10/4/19 8:43 AM, Ido Schimmel wrote:
> >> Sounds like there are 2 cases for prefixes that should be flagged to the
> >> user -- "offloaded" (as in traffic is offloaded) and  "in_hw" (prefix is
> >> in hardware but forwarding is not offloaded).
> > Sounds good. Are you and Roopa OK with the below?
> >
> > RTM_F_IN_HW - route is in hardware
> > RTM_F_OFFLOAD - route is offloaded
> >
> > For example, host routes will have the first flag set, whereas prefix
> > routes will have both flags set.
>
> if "offload" always includes "in_hw", then are both needed? ie., why not
> document that offload means in hardware with offloaded traffic, and then
> "in_hw" is a lesser meaning - only in hardware with a trap to CPU?

I was wondering if we can just call these RTM_F_OFFLOAD_TRAP or
RTM_F_OFFLOAD_ASSIT or something along those lines.

My only concern with the proposed names is, both mean HW offload but
only one uses HW in the name which can be confusing down the lane :).



>
> >
> > Together with the existing offload flags for nexthops and neighbours
> > this provides great visibility into the entire offload process.
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes
  2019-10-04 17:43                 ` Roopa Prabhu
@ 2019-10-04 23:20                   ` David Ahern
  0 siblings, 0 replies; 39+ messages in thread
From: David Ahern @ 2019-10-04 23:20 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Ido Schimmel, Jiri Pirko, netdev, David Miller, Jiri Pirko,
	Jakub Kicinski, Saeed Mahameed, mlxsw, Ido Schimmel

On 10/4/19 11:43 AM, Roopa Prabhu wrote:
> On Fri, Oct 4, 2019 at 9:38 AM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 10/4/19 8:43 AM, Ido Schimmel wrote:
>>>> Sounds like there are 2 cases for prefixes that should be flagged to the
>>>> user -- "offloaded" (as in traffic is offloaded) and  "in_hw" (prefix is
>>>> in hardware but forwarding is not offloaded).
>>> Sounds good. Are you and Roopa OK with the below?
>>>
>>> RTM_F_IN_HW - route is in hardware
>>> RTM_F_OFFLOAD - route is offloaded
>>>
>>> For example, host routes will have the first flag set, whereas prefix
>>> routes will have both flags set.
>>
>> if "offload" always includes "in_hw", then are both needed? ie., why not
>> document that offload means in hardware with offloaded traffic, and then
>> "in_hw" is a lesser meaning - only in hardware with a trap to CPU?
> 
> I was wondering if we can just call these RTM_F_OFFLOAD_TRAP or
> RTM_F_OFFLOAD_ASSIT or something along those lines.
> 
> My only concern with the proposed names is, both mean HW offload but
> only one uses HW in the name which can be confusing down the lane :).

sounds good to me.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2019-10-04 23:20 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-02  8:40 [RFC PATCH net-next 00/15] Simplify IPv4 route offload API Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 01/15] ipv4: Add temporary events to the FIB notification chain Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table Ido Schimmel
2019-10-03  1:34   ` David Ahern
2019-10-03  5:16     ` Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 03/15] ipv4: Notify route if replacing currently offloaded one Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 04/15] ipv4: Notify newly added route if should be offloaded Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 05/15] ipv4: Handle route deletion notification Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 06/15] ipv4: Handle route deletion notification during flush Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 07/15] ipv4: Only Replay routes of interest to new listeners Ido Schimmel
2019-10-02 17:44   ` Jiri Pirko
2019-10-03 13:04     ` Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 08/15] mlxsw: spectrum_router: Start using new IPv4 route notifications Ido Schimmel
2019-10-02 17:52   ` Jiri Pirko
2019-10-02 18:01     ` Jiri Pirko
2019-10-03 15:10       ` Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 09/15] ipv4: Remove old route notifications and convert listeners Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 10/15] ipv4: Replace route in list before notifying Ido Schimmel
2019-10-02  8:40 ` [RFC PATCH net-next 11/15] ipv4: Encapsulate function arguments in a struct Ido Schimmel
2019-10-02  8:41 ` [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes Ido Schimmel
2019-10-02 15:58   ` Roopa Prabhu
2019-10-02 18:21     ` Jiri Pirko
2019-10-03  2:34       ` David Ahern
2019-10-03  5:37         ` Ido Schimmel
2019-10-04  1:55           ` David Ahern
2019-10-04 14:43             ` Ido Schimmel
2019-10-04 16:38               ` David Ahern
2019-10-04 17:43                 ` Roopa Prabhu
2019-10-04 23:20                   ` David Ahern
2019-10-03  5:40         ` Jiri Pirko
2019-10-03 12:59     ` Ido Schimmel
2019-10-04  4:25       ` Roopa Prabhu
2019-10-02  8:41 ` [RFC PATCH net-next 13/15] mlxsw: spectrum_router: Mark routes as "in hardware" Ido Schimmel
2019-10-02 18:27   ` Jiri Pirko
2019-10-03 15:16     ` Ido Schimmel
2019-10-02  8:41 ` [RFC PATCH net-next 14/15] netdevsim: fib: " Ido Schimmel
2019-10-02  8:41 ` [RFC PATCH net-next 15/15] selftests: netdevsim: Add test for route offload API Ido Schimmel
2019-10-02 18:17 ` [RFC PATCH net-next 00/15] Simplify IPv4 " Jiri Pirko
2019-10-03  5:18   ` Ido Schimmel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).