All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
@ 2016-09-06 12:01 Jiri Pirko
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jiri Pirko @ 2016-09-06 12:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

From: Jiri Pirko <jiri@mellanox.com>

This is RFC, unfinished. I came across some issues in the process so I would
like to share those and restart the fib offload discussion in order to make it
really usable.

So the goal of this patchset is to allow driver to propagate all prefixes
configured in kernel down HW. This is necessary for routing to work
as expected. If we don't do that HW might forward prefixes known to kernel
incorrectly. Take an example when default route is set in switch HW and there
is an IP address set on a management (non-switch) port.

Currently, only fibs related to the switch port netdev are offloaded using
switchdev ops. This model is not extendable so the first patch introduces
a replacement: notifier to propagate fib additions and removals to whoever
interested. The second patch makes mlxsw to adopt this new way, registering
one notifier block for each mlxsw (asic) instance.

Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during fib add offload. This leads to removal of all offloaded fibs on
system by fib_trie code.

Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload fib does not mean that the others can't
   do it. So let only one entity to abort and leave the rest to work happily.
2) The driver knows what to in order to properly abort. For example, currently
   abort is broken for mlxsw as for Spectrum there is a need to set 0.0.0.0/0
   trap in RALUE register.

Issues:
1) RTNH_F_OFFLOAD is originally set in switchdev core. There the assumption is
   that only one offload device exists. But for fib notifier, we assume
   multiple offload devices. When should the offload flag be set and by who?
   I think that it would make sense to have a per-fib reference counter
   for this:
   0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
   n means RTNH_F_OFFLOAD is set and the fib entry is offloaded by n devices

2) Unabort? Would be nice. Currently when add_failure->abort happens,
   user's only option is to reboot the machine. I would like to make this
   nicer for the fib notifier implementation. Perhaps to provide some button in
   devlink which would tell driver to try to offload entries again? Not sure.

3) Policies. Not directly connected to this patchset but this issues
   we have been discussing couple of times and I still believe that
   the current state is not good.
   Software-only forwarding now happens in case of abort and makes the ASIC
   ports to act like dummy separate NICs. In case of Spectrum, the bandwidth
   of CPU port is something around 4Gbit. For 32x100Gbit ports this is
   simply not possible to handle. In case of abort, the system is broken
   as it really could not forward packets at a speed not even close
   to the expected.
   Here the policies come to the picture, allowing the user to set the
   system to behave according his expectations. For example rather
   fail to add the route than to abort to software forward.
   This policy could be per-ASIC, configurable by devlink.

Thoughts please?

Jiri Pirko (2):
  fib: introduce fib notification infrastructure
  mlxsw: spectrum_router: Use FIB notifications instead of switchdev
    calls

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   8 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 257 ++++++++++-----------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 include/net/ip_fib.h                               |  19 ++
 net/ipv4/fib_trie.c                                |  43 ++++
 5 files changed, 181 insertions(+), 155 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 12:01 [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Jiri Pirko
@ 2016-09-06 12:01 ` Jiri Pirko
  2016-09-06 14:32   ` David Ahern
                     ` (3 more replies)
  2016-09-06 12:01 ` [patch net-next RFC 2/2] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls Jiri Pirko
  2016-09-18 20:00 ` [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Florian Fainelli
  2 siblings, 4 replies; 27+ messages in thread
From: Jiri Pirko @ 2016-09-06 12:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

From: Jiri Pirko <jiri@mellanox.com>

This allows to pass information about added/deleted fib entries to
whoever is interested. This is done in a very similar way as devinet
notifies address additions/removals.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h | 19 +++++++++++++++++++
 net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 4079fc1..9ad7ba9 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <linux/notifier.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
 					 FIB_RES_SADDR(net, res))
 
+struct fib_notifier_info {
+	u32 dst;
+	int dst_len;
+	struct fib_info *fi;
+	u8 tos;
+	u8 type;
+	u32 tb_id;
+	u32 nlflags;
+};
+
+enum fib_event_type {
+	FIB_EVENT_TYPE_ADD,
+	FIB_EVENT_TYPE_DEL,
+};
+
+int register_fib_notifier(struct notifier_block *nb);
+int unregister_fib_notifier(struct notifier_block *nb);
+
 struct fib_table {
 	struct hlist_node	tb_hlist;
 	u32			tb_id;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index e2ffc2a..19ec471 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -73,6 +73,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/vmalloc.h>
+#include <linux/notifier.h>
 #include <net/net_namespace.h>
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -84,6 +85,36 @@
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
+static BLOCKING_NOTIFIER_HEAD(fib_chain);
+
+int register_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&fib_chain, nb);
+}
+EXPORT_SYMBOL(register_fib_notifier);
+
+int unregister_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&fib_chain, nb);
+}
+EXPORT_SYMBOL(unregister_fib_notifier);
+
+static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
+			      int dst_len, struct fib_info *fi,
+			      u8 tos, u8 type, u32 tb_id, u32 nlflags)
+{
+	struct fib_notifier_info info = {
+		.dst = dst,
+		.dst_len = dst_len,
+		.fi = fi,
+		.tos = tos,
+		.type = type,
+		.tb_id = tb_id,
+		.nlflags = nlflags,
+	};
+	return blocking_notifier_call_chain(&fib_chain, event_type, &info);
+}
+
 #define MAX_STAT_DEPTH 32
 
 #define KEYLENGTH	(8*sizeof(t_key))
@@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
 				rt_cache_flush(cfg->fc_nlinfo.nl_net);
+
+			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
+					   new_fa->fa_tos, cfg->fc_type,
+					   tb->tb_id, cfg->fc_nlflags);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
 
@@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		tb->tb_num_default++;
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
+	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
+			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
 		  &cfg->fc_nlinfo, nlflags);
 succeeded:
@@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
 			       cfg->fc_type, tb->tb_id);
 
+	call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
+			   tos, cfg->fc_type, tb->tb_id, 0);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
@@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
 			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
 					       fi, fa->fa_tos, fa->fa_type,
 					       tb->tb_id);
+			call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
+					   KEYLENGTH - fa->fa_slen,
+					   fi, fa->fa_tos, fa->fa_type,
+					   tb->tb_id, 0);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [patch net-next RFC 2/2] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls
  2016-09-06 12:01 [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Jiri Pirko
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
@ 2016-09-06 12:01 ` Jiri Pirko
  2016-09-18 20:00 ` [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Florian Fainelli
  2 siblings, 0 replies; 27+ messages in thread
From: Jiri Pirko @ 2016-09-06 12:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

From: Jiri Pirko <jiri@mellanox.com>

Until now, in order to offload a FIB entry to HW we use switchdev op.
However that has limits. Mainly in case we need to make the HW aware of
all route prefixes configured in kernel. HW needs to know those in order
to properly trap appropriate packets and pass the to kernel to do
the forwarding.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   8 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 257 ++++++++++-----------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 3 files changed, 119 insertions(+), 155 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 49f4caf..03b97da 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -45,7 +45,7 @@
 #include <linux/list.h>
 #include <linux/dcbnl.h>
 #include <linux/in6.h>
-#include <net/switchdev.h>
+#include <linux/notifier.h>
 
 #include "port.h"
 #include "core.h"
@@ -302,6 +302,7 @@ struct mlxsw_sp {
 		struct mlxsw_sp_span_entry *entries;
 		int entries_count;
 	} span;
+	struct notifier_block fib_nb;
 };
 
 static inline struct mlxsw_sp_upper *
@@ -587,11 +588,6 @@ static inline void mlxsw_sp_port_dcb_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 
 int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp);
 void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp);
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans);
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4);
 int mlxsw_sp_router_neigh_construct(struct net_device *dev,
 				    struct neighbour *n);
 void mlxsw_sp_router_neigh_destroy(struct net_device *dev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 3f5c51d..7330439 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -43,6 +43,7 @@
 #include <net/netevent.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
+#include <net/ip_fib.h>
 
 #include "spectrum.h"
 #include "core.h"
@@ -1489,50 +1490,6 @@ static void mlxsw_sp_nexthop_group_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_nexthop_group_destroy(mlxsw_sp, nh_grp);
 }
 
-static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, true);
-	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, MLXSW_SP_RIF_MAX);
-	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-}
-
-static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, false);
-	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-}
-
-int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	int err;
-
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
-	err = __mlxsw_sp_router_init(mlxsw_sp);
-	if (err)
-		return err;
-	mlxsw_sp_lpm_init(mlxsw_sp);
-	mlxsw_sp_vrs_init(mlxsw_sp);
-	err = mlxsw_sp_neigh_init(mlxsw_sp);
-	if (err)
-		goto err_neigh_init;
-	return 0;
-
-err_neigh_init:
-	__mlxsw_sp_router_fini(mlxsw_sp);
-	return err;
-}
-
-void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	mlxsw_sp_neigh_fini(mlxsw_sp);
-	__mlxsw_sp_router_fini(mlxsw_sp);
-}
-
 static int mlxsw_sp_fib_entry_op4_remote(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_fib_entry *fib_entry,
 					 enum mlxsw_reg_ralue_op op)
@@ -1637,45 +1594,34 @@ static int mlxsw_sp_fib_entry_del(struct mlxsw_sp *mlxsw_sp,
 				     MLXSW_REG_RALUE_OP_WRITE_DELETE);
 }
 
-struct mlxsw_sp_router_fib4_add_info {
-	struct switchdev_trans_item tritem;
-	struct mlxsw_sp *mlxsw_sp;
-	struct mlxsw_sp_fib_entry *fib_entry;
-};
-
-static void mlxsw_sp_router_fib4_add_info_destroy(void const *data)
-{
-	const struct mlxsw_sp_router_fib4_add_info *info = data;
-	struct mlxsw_sp_fib_entry *fib_entry = info->fib_entry;
-	struct mlxsw_sp *mlxsw_sp = info->mlxsw_sp;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
-
-	mlxsw_sp_fib_entry_destroy(fib_entry);
-	mlxsw_sp_vr_put(mlxsw_sp, vr);
-	kfree(info);
-}
-
 static int
 mlxsw_sp_router_fib4_entry_init(struct mlxsw_sp *mlxsw_sp,
-				const struct switchdev_obj_ipv4_fib *fib4,
+				const struct fib_notifier_info *fn_info,
 				struct mlxsw_sp_fib_entry *fib_entry)
 {
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fn_info->fi;
 
-	if (fib4->type == RTN_LOCAL || fib4->type == RTN_BROADCAST) {
+	if (fn_info->type == RTN_LOCAL || fn_info->type == RTN_BROADCAST) {
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
 		return 0;
 	}
-	if (fib4->type != RTN_UNICAST)
+	if (fn_info->type != RTN_UNICAST)
 		return -EINVAL;
 
 	if (fi->fib_scope != RT_SCOPE_UNIVERSE) {
 		struct mlxsw_sp_rif *r;
 
-		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		r = mlxsw_sp_rif_find_by_dev(mlxsw_sp, fi->fib_dev);
-		if (!r)
-			return -EINVAL;
+		if (!r) {
+			/* In case router interface is not found, that means
+			 * the fib entry was added to some device unrelated
+			 * to us. Set trap and pass the packets for
+			 * this prefix to kernel.
+			 */
+			fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+			return 0;
+		}
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		fib_entry->rif = r->rif;
 		return 0;
 	}
@@ -1694,29 +1640,29 @@ mlxsw_sp_router_fib4_entry_fini(struct mlxsw_sp *mlxsw_sp,
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_get(struct mlxsw_sp *mlxsw_sp,
-		       const struct switchdev_obj_ipv4_fib *fib4)
+		       const struct fib_notifier_info *fn_info)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fn_info->fi;
 	struct mlxsw_sp_vr *vr;
 	int err;
 
-	vr = mlxsw_sp_vr_get(mlxsw_sp, fib4->dst_len, fib4->tb_id,
+	vr = mlxsw_sp_vr_get(mlxsw_sp, fn_info->dst_len, fn_info->tb_id,
 			     MLXSW_SP_L3_PROTO_IPV4);
 	if (IS_ERR(vr))
 		return ERR_CAST(vr);
 
-	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fn_info->dst,
+					      sizeof(fn_info->dst),
+					      fn_info->dst_len, fi->fib_dev);
 	if (fib_entry) {
 		/* Already exists, just take a reference */
 		fib_entry->ref_count++;
 		return fib_entry;
 	}
-	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fn_info->dst,
+					      sizeof(fn_info->dst),
+					      fn_info->dst_len, fi->fib_dev);
 	if (!fib_entry) {
 		err = -ENOMEM;
 		goto err_fib_entry_create;
@@ -1724,7 +1670,7 @@ mlxsw_sp_fib_entry_get(struct mlxsw_sp *mlxsw_sp,
 	fib_entry->vr = vr;
 	fib_entry->ref_count = 1;
 
-	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fib4, fib_entry);
+	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fn_info, fib_entry);
 	if (err)
 		goto err_fib4_entry_init;
 
@@ -1740,17 +1686,17 @@ err_fib_entry_create:
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_find(struct mlxsw_sp *mlxsw_sp,
-			const struct switchdev_obj_ipv4_fib *fib4)
+			const struct fib_notifier_info *fn_info)
 {
 	struct mlxsw_sp_vr *vr;
 
-	vr = mlxsw_sp_vr_find(mlxsw_sp, fib4->tb_id, MLXSW_SP_L3_PROTO_IPV4);
+	vr = mlxsw_sp_vr_find(mlxsw_sp, fn_info->tb_id, MLXSW_SP_L3_PROTO_IPV4);
 	if (!vr)
 		return NULL;
 
-	return mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					 sizeof(fib4->dst), fib4->dst_len,
-					 fib4->fi->fib_dev);
+	return mlxsw_sp_fib_entry_lookup(vr->fib, &fn_info->dst,
+					 sizeof(fn_info->dst), fn_info->dst_len,
+					 fn_info->fi->fib_dev);
 }
 
 void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
@@ -1765,60 +1711,29 @@ void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_vr_put(mlxsw_sp, vr);
 }
 
-static int
-mlxsw_sp_router_fib4_add_prepare(struct mlxsw_sp_port *mlxsw_sp_port,
-				 const struct switchdev_obj_ipv4_fib *fib4,
-				 struct switchdev_trans *trans)
+static int mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_notifier_info *fn_info)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
 	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_vr *vr;
 	int err;
 
-	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fib4);
-	if (IS_ERR(fib_entry))
+	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fn_info);
+	if (IS_ERR(fib_entry)) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to get FIB4 entry being added.\n");
 		return PTR_ERR(fib_entry);
-
-	info = kmalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
-		err = -ENOMEM;
-		goto err_alloc_info;
 	}
-	info->mlxsw_sp = mlxsw_sp;
-	info->fib_entry = fib_entry;
-	switchdev_trans_item_enqueue(trans, info,
-				     mlxsw_sp_router_fib4_add_info_destroy,
-				     &info->tritem);
-	return 0;
-
-err_alloc_info:
-	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
-	return err;
-}
-
-static int
-mlxsw_sp_router_fib4_add_commit(struct mlxsw_sp_port *mlxsw_sp_port,
-				const struct switchdev_obj_ipv4_fib *fib4,
-				struct switchdev_trans *trans)
-{
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
-	struct mlxsw_sp_fib_entry *fib_entry;
-	struct mlxsw_sp_vr *vr;
-	int err;
-
-	info = switchdev_trans_item_dequeue(trans);
-	fib_entry = info->fib_entry;
-	kfree(info);
 
 	if (fib_entry->ref_count != 1)
 		return 0;
 
 	vr = fib_entry->vr;
 	err = mlxsw_sp_fib_entry_insert(vr->fib, fib_entry);
-	if (err)
+	if (err) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to insert FIB4 entry being added.\n");
 		goto err_fib_entry_insert;
-	err = mlxsw_sp_fib_entry_update(mlxsw_sp_port->mlxsw_sp, fib_entry);
+	}
+	err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
 	if (err)
 		goto err_fib_entry_add;
 	return 0;
@@ -1830,24 +1745,12 @@ err_fib_entry_insert:
 	return err;
 }
 
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans)
+static int mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_notifier_info *fn_info)
 {
-	if (switchdev_trans_ph_prepare(trans))
-		return mlxsw_sp_router_fib4_add_prepare(mlxsw_sp_port,
-							fib4, trans);
-	return mlxsw_sp_router_fib4_add_commit(mlxsw_sp_port,
-					       fib4, trans);
-}
-
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4)
-{
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_fib_entry *fib_entry;
 
-	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fib4);
+	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fn_info);
 	if (!fib_entry) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to find FIB4 entry being removed.\n");
 		return -ENOENT;
@@ -1861,3 +1764,77 @@ int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
 	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
 	return 0;
 }
+
+static void mlxsw_sp_router_fib4_abort(struct mlxsw_sp *mlxsw_sp)
+{
+	/* TODO: remove all ralue entries from the asic and set trap
+	 * to pass everything to kernel
+	 */
+}
+
+static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, true);
+	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, MLXSW_SP_RIF_MAX);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+}
+
+static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, false);
+	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+}
+
+static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
+				     unsigned long event, void *ptr)
+{
+	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
+	struct fib_notifier_info *fn_info = ptr;
+	int err;
+
+	switch (event) {
+	case FIB_EVENT_TYPE_ADD:
+		err = mlxsw_sp_router_fib4_add(mlxsw_sp, fn_info);
+		if (err)
+			mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		break;
+	case FIB_EVENT_TYPE_DEL:
+		mlxsw_sp_router_fib4_del(mlxsw_sp, fn_info);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int err;
+
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
+	err = __mlxsw_sp_router_init(mlxsw_sp);
+	if (err)
+		return err;
+	mlxsw_sp_lpm_init(mlxsw_sp);
+	mlxsw_sp_vrs_init(mlxsw_sp);
+	err = mlxsw_sp_neigh_init(mlxsw_sp);
+	if (err)
+		goto err_neigh_init;
+	mlxsw_sp->fib_nb.notifier_call = mlxsw_sp_router_fib_event;
+	register_fib_notifier(&mlxsw_sp->fib_nb);
+	return 0;
+
+err_neigh_init:
+	__mlxsw_sp_router_fini(mlxsw_sp);
+	return err;
+}
+
+void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	unregister_fib_notifier(&mlxsw_sp->fib_nb);
+	mlxsw_sp_neigh_fini(mlxsw_sp);
+	__mlxsw_sp_router_fini(mlxsw_sp);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7186c48..7042608 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1044,11 +1044,6 @@ static int mlxsw_sp_port_obj_add(struct net_device *dev,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj),
 					      trans);
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj),
-					       trans);
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_add(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj),
@@ -1181,10 +1176,6 @@ static int mlxsw_sp_port_obj_del(struct net_device *dev,
 		err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_del(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj));
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_del(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj));
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
@ 2016-09-06 14:32   ` David Ahern
  2016-09-06 14:44     ` Jiri Pirko
  2016-09-06 15:13   ` David Ahern
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2016-09-06 14:32 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera

On 9/6/16 6:01 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/ip_fib.h | 19 +++++++++++++++++++
>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)
> 

The notifier infrastructure should be generalized for use with IPv4 and IPv6. While the data will be family based, the infra can be generic.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 14:32   ` David Ahern
@ 2016-09-06 14:44     ` Jiri Pirko
  2016-09-06 15:11       ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-06 14:44 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera

Tue, Sep 06, 2016 at 04:32:12PM CEST, dsa@cumulusnetworks.com wrote:
>On 9/6/16 6:01 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> This allows to pass information about added/deleted fib entries to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+)
>> 
>
>The notifier infrastructure should be generalized for use with IPv4 and IPv6. While the data will be family based, the infra can be generic.
>

Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
atomic. That is the reason we have:
inetaddr_chain and register_inetaddr_notifier (blocking notifier)
inet6addr_chain and register_inet6addr_notifier (atomic notifier)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 14:44     ` Jiri Pirko
@ 2016-09-06 15:11       ` David Ahern
  2016-09-06 15:49         ` Jiri Pirko
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2016-09-06 15:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera,
	YOSHIFUJI Hideaki

On 9/6/16 8:44 AM, Jiri Pirko wrote:
> Tue, Sep 06, 2016 at 04:32:12PM CEST, dsa@cumulusnetworks.com wrote:
>> On 9/6/16 6:01 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This allows to pass information about added/deleted fib entries to
>>> whoever is interested. This is done in a very similar way as devinet
>>> notifies address additions/removals.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 62 insertions(+)
>>>
>>
>> The notifier infrastructure should be generalized for use with IPv4 and IPv6. While the data will be family based, the infra can be generic.
>>
> 
> Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
> atomic. That is the reason we have:
> inetaddr_chain and register_inetaddr_notifier (blocking notifier)
> inet6addr_chain and register_inet6addr_notifier (atomic notifier)
> 

Why is IPv6 atomic? Looking at code paths for adding addresses seems like all of the locks are dropped before the notifier is called and adding and deleting ipv6 addresses does not show a hit with this WARN_ON:


diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index bfa941fc1165..4f9f964d95e5 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -103,6 +103,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notifier);

 int inet6addr_notifier_call_chain(unsigned long val, void *v)
 {
+WARN_ON(in_atomic());
        return atomic_notifier_call_chain(&inet6addr_chain, val, v);
 }
 EXPORT_SYMBOL(inet6addr_notifier_call_chain);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
  2016-09-06 14:32   ` David Ahern
@ 2016-09-06 15:13   ` David Ahern
  2016-09-07  8:03     ` Jiri Pirko
  2016-09-15 14:41   ` [net-next,RFC,1/2] " Andy Gospodarek
  2016-09-18 23:23   ` [patch net-next RFC 1/2] " Roopa Prabhu
  3 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2016-09-06 15:13 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera

On 9/6/16 6:01 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/ip_fib.h | 19 +++++++++++++++++++
>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)

Do you intend for this set of notifiers to work with policy routing (FIB rules) as well?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 15:11       ` David Ahern
@ 2016-09-06 15:49         ` Jiri Pirko
  2016-09-06 16:14           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-06 15:49 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera,
	YOSHIFUJI Hideaki

Tue, Sep 06, 2016 at 05:11:11PM CEST, dsa@cumulusnetworks.com wrote:
>On 9/6/16 8:44 AM, Jiri Pirko wrote:
>> Tue, Sep 06, 2016 at 04:32:12PM CEST, dsa@cumulusnetworks.com wrote:
>>> On 9/6/16 6:01 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> This allows to pass information about added/deleted fib entries to
>>>> whoever is interested. This is done in a very similar way as devinet
>>>> notifies address additions/removals.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 62 insertions(+)
>>>>
>>>
>>> The notifier infrastructure should be generalized for use with IPv4 and IPv6. While the data will be family based, the infra can be generic.
>>>
>> 
>> Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
>> atomic. That is the reason we have:
>> inetaddr_chain and register_inetaddr_notifier (blocking notifier)
>> inet6addr_chain and register_inet6addr_notifier (atomic notifier)
>> 
>
>Why is IPv6 atomic? Looking at code paths for adding addresses seems like all of the locks are dropped before the notifier is called and adding and deleting ipv6 addresses does not show a hit with this WARN_ON:


Maybe historic reasons. Would be good to unite the notifiers then. I'll
look at it.


>
>
>diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
>index bfa941fc1165..4f9f964d95e5 100644
>--- a/net/ipv6/addrconf_core.c
>+++ b/net/ipv6/addrconf_core.c
>@@ -103,6 +103,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notifier);
>
> int inet6addr_notifier_call_chain(unsigned long val, void *v)
> {
>+WARN_ON(in_atomic());
>        return atomic_notifier_call_chain(&inet6addr_chain, val, v);
> }
> EXPORT_SYMBOL(inet6addr_notifier_call_chain);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 15:49         ` Jiri Pirko
@ 2016-09-06 16:14           ` Hannes Frederic Sowa
  0 siblings, 0 replies; 27+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-06 16:14 UTC (permalink / raw)
  To: Jiri Pirko, David Ahern
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera,
	YOSHIFUJI Hideaki

On Tue, Sep 6, 2016, at 17:49, Jiri Pirko wrote:
> Tue, Sep 06, 2016 at 05:11:11PM CEST, dsa@cumulusnetworks.com wrote:
> >On 9/6/16 8:44 AM, Jiri Pirko wrote:
> >> Tue, Sep 06, 2016 at 04:32:12PM CEST, dsa@cumulusnetworks.com wrote:
> >>> On 9/6/16 6:01 AM, Jiri Pirko wrote:
> >>>> From: Jiri Pirko <jiri@mellanox.com>
> >>>>
> >>>> This allows to pass information about added/deleted fib entries to
> >>>> whoever is interested. This is done in a very similar way as devinet
> >>>> notifies address additions/removals.
> >>>>
> >>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >>>> ---
> >>>>  include/net/ip_fib.h | 19 +++++++++++++++++++
> >>>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
> >>>>  2 files changed, 62 insertions(+)
> >>>>
> >>>
> >>> The notifier infrastructure should be generalized for use with IPv4 and IPv6. While the data will be family based, the infra can be generic.
> >>>
> >> 
> >> Yeah, that I thought about as well. Thing is, ipv6 notifier has to be
> >> atomic. That is the reason we have:
> >> inetaddr_chain and register_inetaddr_notifier (blocking notifier)
> >> inet6addr_chain and register_inet6addr_notifier (atomic notifier)
> >> 
> >
> >Why is IPv6 atomic? Looking at code paths for adding addresses seems like all of the locks are dropped before the notifier is called and adding and deleting ipv6 addresses does not show a hit with this WARN_ON:
> 
> 
> Maybe historic reasons. Would be good to unite the notifiers then. I'll
> look at it.

We add IPs and routes from bottom half layer because of neighbour
discovery router advertisements. They need to run in atomic context
without sleeping.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 15:13   ` David Ahern
@ 2016-09-07  8:03     ` Jiri Pirko
  0 siblings, 0 replies; 27+ messages in thread
From: Jiri Pirko @ 2016-09-07  8:03 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, jhs, vivien.didelot, john.fastabend, andrew, ivecera

Tue, Sep 06, 2016 at 05:13:36PM CEST, dsa@cumulusnetworks.com wrote:
>On 9/6/16 6:01 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> This allows to pass information about added/deleted fib entries to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+)
>
>Do you intend for this set of notifiers to work with policy routing (FIB rules) as well?


Yes, plan is to put the notifier calls before notify_rule_change. Would
probably make sense to have this in one notifier, only several event
types. Not sure.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
  2016-09-06 14:32   ` David Ahern
  2016-09-06 15:13   ` David Ahern
@ 2016-09-15 14:41   ` Andy Gospodarek
  2016-09-15 14:45     ` Jiri Pirko
  2016-09-18 23:23   ` [patch net-next RFC 1/2] " Roopa Prabhu
  3 siblings, 1 reply; 27+ messages in thread
From: Andy Gospodarek @ 2016-09-15 14:41 UTC (permalink / raw)
  To: Jiří Pírko
  Cc: netdev, David Miller, idosch, eladr, yotamg, nogahf, ogerlitz,
	Roopa Prabhu, nikolay, John Linville, Thomas Graf, Scott Feldman,
	ast, Eric Dumazet, hannes, Florian Fainelli, dsa,
	Jamal Hadi Salim, vivien.didelot, john.fastabend, andrew,
	ivecera

On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
<andrew.gospodarek@broadcom.com> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.

(Sorry for the delayed response here...)

I had tried a slightly different approach, but this one also seems
reasonable and possibly better -- especially if this can be made more
generic and shared between ipv4 and ipv6 despite their inherent
differences.

What I did differently was make a more ipv4-specific change to start
with that did this:

+#define RTNH_F_MODIFIED                (1 << 7)        /* used for
internal kernel tracking */
+
+#define RTNH_F_COMPARE_MASK    (RTNH_F_DEAD | \
+                                RTNH_F_LINKDOWN | \
+                                RTNH_F_MODIFIED) /* used as mask for
route comparisons */

Then in various cases where the route was modified (fib_sync_up, etc),
I added this:

+                                nexthop_nh->nh_flags |= RTNH_F_MODIFIED;

Checking for the modified flag was then done in fib_table_update().
This new function was a rewrite of fib_table_flush() and checks for
RTNH_F_MODIFIED were done there before calling switchdev infra and
then announce new routes if routes changed.

The main issue I see right now is that neither userspace nor switchdev
are notified when a route flag changes.  This needs to be resolved.

I think this RFC is along the proper path to provide notification, but
I'm not sure that notification will happen when flags change (most
notably the LNKDOWN flag) and there are some other corner cases that
could probably be covered as well.

I need to forward-port my patch from where it was to the latest
net-next and see if these cases I was concerned about were still an
issue.  I'm happy to do that and see if we can put this all together
to fix a few of the outstanding issues.


>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/ip_fib.h | 19 +++++++++++++++++++
>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 4079fc1..9ad7ba9 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,7 @@
>  #include <net/fib_rules.h>
>  #include <net/inetpeer.h>
>  #include <linux/percpu.h>
> +#include <linux/notifier.h>
>
>  struct fib_config {
>         u8                      fc_dst_len;
> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
>  #define FIB_RES_PREFSRC(net, res)      ((res).fi->fib_prefsrc ? : \
>                                          FIB_RES_SADDR(net, res))
>
> +struct fib_notifier_info {
> +       u32 dst;
> +       int dst_len;
> +       struct fib_info *fi;
> +       u8 tos;
> +       u8 type;
> +       u32 tb_id;
> +       u32 nlflags;
> +};
> +
> +enum fib_event_type {
> +       FIB_EVENT_TYPE_ADD,
> +       FIB_EVENT_TYPE_DEL,
> +};
> +
> +int register_fib_notifier(struct notifier_block *nb);
> +int unregister_fib_notifier(struct notifier_block *nb);
> +
>  struct fib_table {
>         struct hlist_node       tb_hlist;
>         u32                     tb_id;
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index e2ffc2a..19ec471 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -73,6 +73,7 @@
>  #include <linux/slab.h>
>  #include <linux/export.h>
>  #include <linux/vmalloc.h>
> +#include <linux/notifier.h>
>  #include <net/net_namespace.h>
>  #include <net/ip.h>
>  #include <net/protocol.h>
> @@ -84,6 +85,36 @@
>  #include <trace/events/fib.h>
>  #include "fib_lookup.h"
>
> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
> +
> +int register_fib_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_register(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(register_fib_notifier);
> +
> +int unregister_fib_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_unregister(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_fib_notifier);
> +
> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
> +                             int dst_len, struct fib_info *fi,
> +                             u8 tos, u8 type, u32 tb_id, u32 nlflags)
> +{
> +       struct fib_notifier_info info = {
> +               .dst = dst,
> +               .dst_len = dst_len,
> +               .fi = fi,
> +               .tos = tos,
> +               .type = type,
> +               .tb_id = tb_id,
> +               .nlflags = nlflags,
> +       };
> +       return blocking_notifier_call_chain(&fib_chain, event_type, &info);
> +}
> +
>  #define MAX_STAT_DEPTH 32
>
>  #define KEYLENGTH      (8*sizeof(t_key))
> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>                         fib_release_info(fi_drop);
>                         if (state & FA_S_ACCESSED)
>                                 rt_cache_flush(cfg->fc_nlinfo.nl_net);
> +
> +                       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
> +                                          new_fa->fa_tos, cfg->fc_type,
> +                                          tb->tb_id, cfg->fc_nlflags);
>                         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>                                 tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>
> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>                 tb->tb_num_default++;
>
>         rt_cache_flush(cfg->fc_nlinfo.nl_net);
> +       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
> +                          cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
>                   &cfg->fc_nlinfo, nlflags);
>  succeeded:
> @@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>         switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
>                                cfg->fc_type, tb->tb_id);
>
> +       call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
> +                          tos, cfg->fc_type, tb->tb_id, 0);
>         rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
>                   &cfg->fc_nlinfo, 0);
>
> @@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
>                         switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
>                                                fi, fa->fa_tos, fa->fa_type,
>                                                tb->tb_id);
> +                       call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
> +                                          KEYLENGTH - fa->fa_slen,
> +                                          fi, fa->fa_tos, fa->fa_type,
> +                                          tb->tb_id, 0);
>                         hlist_del_rcu(&fa->fa_list);
>                         fib_release_info(fa->fa_info);
>                         alias_free_mem_rcu(fa);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure
  2016-09-15 14:41   ` [net-next,RFC,1/2] " Andy Gospodarek
@ 2016-09-15 14:45     ` Jiri Pirko
  2016-09-15 14:47       ` Andy Gospodarek
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-15 14:45 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, David Miller, idosch, eladr, yotamg, nogahf, ogerlitz,
	Roopa Prabhu, nikolay, John Linville, Thomas Graf, Scott Feldman,
	ast, Eric Dumazet, hannes, Florian Fainelli, dsa,
	Jamal Hadi Salim, vivien.didelot, john.fastabend, andrew,
	ivecera

Thu, Sep 15, 2016 at 04:41:20PM CEST, andy@greyhouse.net wrote:
>On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
><andrew.gospodarek@broadcom.com> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> This allows to pass information about added/deleted fib entries to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>
>(Sorry for the delayed response here...)
>
>I had tried a slightly different approach, but this one also seems
>reasonable and possibly better -- especially if this can be made more
>generic and shared between ipv4 and ipv6 despite their inherent
>differences.
>
>What I did differently was make a more ipv4-specific change to start
>with that did this:
>
>+#define RTNH_F_MODIFIED                (1 << 7)        /* used for
>internal kernel tracking */
>+
>+#define RTNH_F_COMPARE_MASK    (RTNH_F_DEAD | \
>+                                RTNH_F_LINKDOWN | \
>+                                RTNH_F_MODIFIED) /* used as mask for
>route comparisons */
>
>Then in various cases where the route was modified (fib_sync_up, etc),
>I added this:
>
>+                                nexthop_nh->nh_flags |= RTNH_F_MODIFIED;
>
>Checking for the modified flag was then done in fib_table_update().
>This new function was a rewrite of fib_table_flush() and checks for
>RTNH_F_MODIFIED were done there before calling switchdev infra and
>then announce new routes if routes changed.
>
>The main issue I see right now is that neither userspace nor switchdev
>are notified when a route flag changes.  This needs to be resolved.
>
>I think this RFC is along the proper path to provide notification, but
>I'm not sure that notification will happen when flags change (most
>notably the LNKDOWN flag) and there are some other corner cases that
>could probably be covered as well.
>
>I need to forward-port my patch from where it was to the latest
>net-next and see if these cases I was concerned about were still an
>issue.  I'm happy to do that and see if we can put this all together
>to fix a few of the outstanding issues.

I believe that "modify" can be easily another fib event. Drivers can
react accordingly. I'm close to sending v1 (hopefully tomorrow). I
believe you can base your patchset on top of mine which saves you lot of
time.


>
>
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+)
>>
>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>> index 4079fc1..9ad7ba9 100644
>> --- a/include/net/ip_fib.h
>> +++ b/include/net/ip_fib.h
>> @@ -22,6 +22,7 @@
>>  #include <net/fib_rules.h>
>>  #include <net/inetpeer.h>
>>  #include <linux/percpu.h>
>> +#include <linux/notifier.h>
>>
>>  struct fib_config {
>>         u8                      fc_dst_len;
>> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
>>  #define FIB_RES_PREFSRC(net, res)      ((res).fi->fib_prefsrc ? : \
>>                                          FIB_RES_SADDR(net, res))
>>
>> +struct fib_notifier_info {
>> +       u32 dst;
>> +       int dst_len;
>> +       struct fib_info *fi;
>> +       u8 tos;
>> +       u8 type;
>> +       u32 tb_id;
>> +       u32 nlflags;
>> +};
>> +
>> +enum fib_event_type {
>> +       FIB_EVENT_TYPE_ADD,
>> +       FIB_EVENT_TYPE_DEL,
>> +};
>> +
>> +int register_fib_notifier(struct notifier_block *nb);
>> +int unregister_fib_notifier(struct notifier_block *nb);
>> +
>>  struct fib_table {
>>         struct hlist_node       tb_hlist;
>>         u32                     tb_id;
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index e2ffc2a..19ec471 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -73,6 +73,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/export.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/notifier.h>
>>  #include <net/net_namespace.h>
>>  #include <net/ip.h>
>>  #include <net/protocol.h>
>> @@ -84,6 +85,36 @@
>>  #include <trace/events/fib.h>
>>  #include "fib_lookup.h"
>>
>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>> +
>> +int register_fib_notifier(struct notifier_block *nb)
>> +{
>> +       return blocking_notifier_chain_register(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(register_fib_notifier);
>> +
>> +int unregister_fib_notifier(struct notifier_block *nb)
>> +{
>> +       return blocking_notifier_chain_unregister(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(unregister_fib_notifier);
>> +
>> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
>> +                             int dst_len, struct fib_info *fi,
>> +                             u8 tos, u8 type, u32 tb_id, u32 nlflags)
>> +{
>> +       struct fib_notifier_info info = {
>> +               .dst = dst,
>> +               .dst_len = dst_len,
>> +               .fi = fi,
>> +               .tos = tos,
>> +               .type = type,
>> +               .tb_id = tb_id,
>> +               .nlflags = nlflags,
>> +       };
>> +       return blocking_notifier_call_chain(&fib_chain, event_type, &info);
>> +}
>> +
>>  #define MAX_STAT_DEPTH 32
>>
>>  #define KEYLENGTH      (8*sizeof(t_key))
>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>                         fib_release_info(fi_drop);
>>                         if (state & FA_S_ACCESSED)
>>                                 rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +
>> +                       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>> +                                          new_fa->fa_tos, cfg->fc_type,
>> +                                          tb->tb_id, cfg->fc_nlflags);
>>                         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>                                 tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>
>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>                 tb->tb_num_default++;
>>
>>         rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>> +                          cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
>>                   &cfg->fc_nlinfo, nlflags);
>>  succeeded:
>> @@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>         switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
>>                                cfg->fc_type, tb->tb_id);
>>
>> +       call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
>> +                          tos, cfg->fc_type, tb->tb_id, 0);
>>         rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
>>                   &cfg->fc_nlinfo, 0);
>>
>> @@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
>>                         switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
>>                                                fi, fa->fa_tos, fa->fa_type,
>>                                                tb->tb_id);
>> +                       call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
>> +                                          KEYLENGTH - fa->fa_slen,
>> +                                          fi, fa->fa_tos, fa->fa_type,
>> +                                          tb->tb_id, 0);
>>                         hlist_del_rcu(&fa->fa_list);
>>                         fib_release_info(fa->fa_info);
>>                         alias_free_mem_rcu(fa);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure
  2016-09-15 14:45     ` Jiri Pirko
@ 2016-09-15 14:47       ` Andy Gospodarek
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Gospodarek @ 2016-09-15 14:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, David Miller, idosch, eladr, yotamg, nogahf, ogerlitz,
	Roopa Prabhu, nikolay, John Linville, Thomas Graf, Scott Feldman,
	ast, Eric Dumazet, hannes, Florian Fainelli, dsa,
	Jamal Hadi Salim, vivien.didelot, john.fastabend, andrew,
	ivecera

On Thu, Sep 15, 2016 at 10:45 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Sep 15, 2016 at 04:41:20PM CEST, andy@greyhouse.net wrote:
>>On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
>><andrew.gospodarek@broadcom.com> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This allows to pass information about added/deleted fib entries to
>>> whoever is interested. This is done in a very similar way as devinet
>>> notifies address additions/removals.
>>
>>(Sorry for the delayed response here...)
>>
>>I had tried a slightly different approach, but this one also seems
>>reasonable and possibly better -- especially if this can be made more
>>generic and shared between ipv4 and ipv6 despite their inherent
>>differences.
>>
>>What I did differently was make a more ipv4-specific change to start
>>with that did this:
>>
>>+#define RTNH_F_MODIFIED                (1 << 7)        /* used for
>>internal kernel tracking */
>>+
>>+#define RTNH_F_COMPARE_MASK    (RTNH_F_DEAD | \
>>+                                RTNH_F_LINKDOWN | \
>>+                                RTNH_F_MODIFIED) /* used as mask for
>>route comparisons */
>>
>>Then in various cases where the route was modified (fib_sync_up, etc),
>>I added this:
>>
>>+                                nexthop_nh->nh_flags |= RTNH_F_MODIFIED;
>>
>>Checking for the modified flag was then done in fib_table_update().
>>This new function was a rewrite of fib_table_flush() and checks for
>>RTNH_F_MODIFIED were done there before calling switchdev infra and
>>then announce new routes if routes changed.
>>
>>The main issue I see right now is that neither userspace nor switchdev
>>are notified when a route flag changes.  This needs to be resolved.
>>
>>I think this RFC is along the proper path to provide notification, but
>>I'm not sure that notification will happen when flags change (most
>>notably the LNKDOWN flag) and there are some other corner cases that
>>could probably be covered as well.
>>
>>I need to forward-port my patch from where it was to the latest
>>net-next and see if these cases I was concerned about were still an
>>issue.  I'm happy to do that and see if we can put this all together
>>to fix a few of the outstanding issues.
>
> I believe that "modify" can be easily another fib event. Drivers can
> react accordingly. I'm close to sending v1 (hopefully tomorrow). I
> believe you can base your patchset on top of mine which saves you lot of
> time.
>

Sounds good -- looking forward to it.  If you add this email on the
cc-list rather than the other one, I'll see it more quickly this time.
:-)


>
>>
>>
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 62 insertions(+)
>>>
>>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>>> index 4079fc1..9ad7ba9 100644
>>> --- a/include/net/ip_fib.h
>>> +++ b/include/net/ip_fib.h
>>> @@ -22,6 +22,7 @@
>>>  #include <net/fib_rules.h>
>>>  #include <net/inetpeer.h>
>>>  #include <linux/percpu.h>
>>> +#include <linux/notifier.h>
>>>
>>>  struct fib_config {
>>>         u8                      fc_dst_len;
>>> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
>>>  #define FIB_RES_PREFSRC(net, res)      ((res).fi->fib_prefsrc ? : \
>>>                                          FIB_RES_SADDR(net, res))
>>>
>>> +struct fib_notifier_info {
>>> +       u32 dst;
>>> +       int dst_len;
>>> +       struct fib_info *fi;
>>> +       u8 tos;
>>> +       u8 type;
>>> +       u32 tb_id;
>>> +       u32 nlflags;
>>> +};
>>> +
>>> +enum fib_event_type {
>>> +       FIB_EVENT_TYPE_ADD,
>>> +       FIB_EVENT_TYPE_DEL,
>>> +};
>>> +
>>> +int register_fib_notifier(struct notifier_block *nb);
>>> +int unregister_fib_notifier(struct notifier_block *nb);
>>> +
>>>  struct fib_table {
>>>         struct hlist_node       tb_hlist;
>>>         u32                     tb_id;
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index e2ffc2a..19ec471 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -73,6 +73,7 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/export.h>
>>>  #include <linux/vmalloc.h>
>>> +#include <linux/notifier.h>
>>>  #include <net/net_namespace.h>
>>>  #include <net/ip.h>
>>>  #include <net/protocol.h>
>>> @@ -84,6 +85,36 @@
>>>  #include <trace/events/fib.h>
>>>  #include "fib_lookup.h"
>>>
>>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>>> +
>>> +int register_fib_notifier(struct notifier_block *nb)
>>> +{
>>> +       return blocking_notifier_chain_register(&fib_chain, nb);
>>> +}
>>> +EXPORT_SYMBOL(register_fib_notifier);
>>> +
>>> +int unregister_fib_notifier(struct notifier_block *nb)
>>> +{
>>> +       return blocking_notifier_chain_unregister(&fib_chain, nb);
>>> +}
>>> +EXPORT_SYMBOL(unregister_fib_notifier);
>>> +
>>> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
>>> +                             int dst_len, struct fib_info *fi,
>>> +                             u8 tos, u8 type, u32 tb_id, u32 nlflags)
>>> +{
>>> +       struct fib_notifier_info info = {
>>> +               .dst = dst,
>>> +               .dst_len = dst_len,
>>> +               .fi = fi,
>>> +               .tos = tos,
>>> +               .type = type,
>>> +               .tb_id = tb_id,
>>> +               .nlflags = nlflags,
>>> +       };
>>> +       return blocking_notifier_call_chain(&fib_chain, event_type, &info);
>>> +}
>>> +
>>>  #define MAX_STAT_DEPTH 32
>>>
>>>  #define KEYLENGTH      (8*sizeof(t_key))
>>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>                         fib_release_info(fi_drop);
>>>                         if (state & FA_S_ACCESSED)
>>>                                 rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +
>>> +                       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>>> +                                          new_fa->fa_tos, cfg->fc_type,
>>> +                                          tb->tb_id, cfg->fc_nlflags);
>>>                         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>                                 tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>
>>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>                 tb->tb_num_default++;
>>>
>>>         rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +       call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>>> +                          cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>>         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
>>>                   &cfg->fc_nlinfo, nlflags);
>>>  succeeded:
>>> @@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>>         switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
>>>                                cfg->fc_type, tb->tb_id);
>>>
>>> +       call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
>>> +                          tos, cfg->fc_type, tb->tb_id, 0);
>>>         rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
>>>                   &cfg->fc_nlinfo, 0);
>>>
>>> @@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
>>>                         switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
>>>                                                fi, fa->fa_tos, fa->fa_type,
>>>                                                tb->tb_id);
>>> +                       call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
>>> +                                          KEYLENGTH - fa->fa_slen,
>>> +                                          fi, fa->fa_tos, fa->fa_type,
>>> +                                          tb->tb_id, 0);
>>>                         hlist_del_rcu(&fa->fa_list);
>>>                         fib_release_info(fa->fa_info);
>>>                         alias_free_mem_rcu(fa);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-06 12:01 [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Jiri Pirko
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
  2016-09-06 12:01 ` [patch net-next RFC 2/2] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls Jiri Pirko
@ 2016-09-18 20:00 ` Florian Fainelli
  2016-09-18 23:16   ` Roopa Prabhu
  2016-09-19  6:08   ` Jiri Pirko
  2 siblings, 2 replies; 27+ messages in thread
From: Florian Fainelli @ 2016-09-18 20:00 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes, dsa, jhs,
	vivien.didelot, john.fastabend, andrew, ivecera

Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
> From: Jiri Pirko <jiri@mellanox.com>
> 
> This is RFC, unfinished. I came across some issues in the process so I would
> like to share those and restart the fib offload discussion in order to make it
> really usable.
> 
> So the goal of this patchset is to allow driver to propagate all prefixes
> configured in kernel down HW. This is necessary for routing to work
> as expected. If we don't do that HW might forward prefixes known to kernel
> incorrectly. Take an example when default route is set in switch HW and there
> is an IP address set on a management (non-switch) port.
> 
> Currently, only fibs related to the switch port netdev are offloaded using
> switchdev ops. This model is not extendable so the first patch introduces
> a replacement: notifier to propagate fib additions and removals to whoever
> interested. The second patch makes mlxsw to adopt this new way, registering
> one notifier block for each mlxsw (asic) instance.

Instead of introducing another specialization of a notifier_block
implementation, could we somehow have a kernel-based netlink listener
which receives the same kind of event information from rtmsg_fib()?

The reason is that having such a facility would hook directly onto
existing rtmsg_* calls that exist throughout the stack, and that seems
to scale better.
-- 
Florian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-18 20:00 ` [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Florian Fainelli
@ 2016-09-18 23:16   ` Roopa Prabhu
  2016-09-19  6:14     ` Jiri Pirko
  2016-09-19  6:08   ` Jiri Pirko
  1 sibling, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-18 23:16 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

On 9/18/16, 1:00 PM, Florian Fainelli wrote:
> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> This is RFC, unfinished. I came across some issues in the process so I would
>> like to share those and restart the fib offload discussion in order to make it
>> really usable.
>>
>> So the goal of this patchset is to allow driver to propagate all prefixes
>> configured in kernel down HW. This is necessary for routing to work
>> as expected. If we don't do that HW might forward prefixes known to kernel
>> incorrectly. Take an example when default route is set in switch HW and there
>> is an IP address set on a management (non-switch) port.
>>
>> Currently, only fibs related to the switch port netdev are offloaded using
>> switchdev ops. This model is not extendable so the first patch introduces
>> a replacement: notifier to propagate fib additions and removals to whoever
>> interested. The second patch makes mlxsw to adopt this new way, registering
>> one notifier block for each mlxsw (asic) instance.
> Instead of introducing another specialization of a notifier_block
> implementation, could we somehow have a kernel-based netlink listener
> which receives the same kind of event information from rtmsg_fib()?
>
> The reason is that having such a facility would hook directly onto
> existing rtmsg_* calls that exist throughout the stack, and that seems
> to scale better.
I was thinking along the same lines. Instead of proliferating notifier blocks
through-out the stack for switchdev offload, putting existing events to use would be nice.

But the problem though is drivers having to parse the netlink msg again. also, the intent
here is to do the offload first ..before the route is added to the kernel (though i don't see that in
the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.


Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
offload api for every route  for every asic instance ?. the first device where the route fits wins.
it seems similar to driver registering for notifier and looking at every route ...
am i missing something ?
and the policies you mention could help around selecting the asic instance (FCFS or mirror).
you will need to abstract out the asic instance for switchdev api to call on, but I thought you
already have that in some form in your devlink infrastructure.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
                     ` (2 preceding siblings ...)
  2016-09-15 14:41   ` [net-next,RFC,1/2] " Andy Gospodarek
@ 2016-09-18 23:23   ` Roopa Prabhu
  2016-09-19  6:06     ` Jiri Pirko
  3 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-18 23:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

On 9/6/16, 5:01 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/ip_fib.h | 19 +++++++++++++++++++
>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 4079fc1..9ad7ba9 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,7 @@
>  #include <net/fib_rules.h>
>  #include <net/inetpeer.h>
>  #include <linux/percpu.h>
> +#include <linux/notifier.h>
>  
>  struct fib_config {
>  	u8			fc_dst_len;
> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
>  #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
>  					 FIB_RES_SADDR(net, res))
>  
> +struct fib_notifier_info {
> +	u32 dst;
> +	int dst_len;
> +	struct fib_info *fi;
> +	u8 tos;
> +	u8 type;
> +	u32 tb_id;
> +	u32 nlflags;
> +};
> +
> +enum fib_event_type {
> +	FIB_EVENT_TYPE_ADD,
> +	FIB_EVENT_TYPE_DEL,
> +};
> +
> +int register_fib_notifier(struct notifier_block *nb);
> +int unregister_fib_notifier(struct notifier_block *nb);
> +
>  struct fib_table {
>  	struct hlist_node	tb_hlist;
>  	u32			tb_id;
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index e2ffc2a..19ec471 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -73,6 +73,7 @@
>  #include <linux/slab.h>
>  #include <linux/export.h>
>  #include <linux/vmalloc.h>
> +#include <linux/notifier.h>
>  #include <net/net_namespace.h>
>  #include <net/ip.h>
>  #include <net/protocol.h>
> @@ -84,6 +85,36 @@
>  #include <trace/events/fib.h>
>  #include "fib_lookup.h"
>  
> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
> +
> +int register_fib_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(register_fib_notifier);
> +
> +int unregister_fib_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_unregister(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_fib_notifier);
> +
> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
> +			      int dst_len, struct fib_info *fi,
> +			      u8 tos, u8 type, u32 tb_id, u32 nlflags)
> +{
> +	struct fib_notifier_info info = {
> +		.dst = dst,
> +		.dst_len = dst_len,
> +		.fi = fi,
> +		.tos = tos,
> +		.type = type,
> +		.tb_id = tb_id,
> +		.nlflags = nlflags,
> +	};
> +	return blocking_notifier_call_chain(&fib_chain, event_type, &info);
> +}
> +
>  #define MAX_STAT_DEPTH 32
>  
>  #define KEYLENGTH	(8*sizeof(t_key))
> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  			fib_release_info(fi_drop);
>  			if (state & FA_S_ACCESSED)
>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
> +
> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
> +					   new_fa->fa_tos, cfg->fc_type,
> +					   tb->tb_id, cfg->fc_nlflags);
>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>  
> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  		tb->tb_num_default++;
>  
>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);


It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
Is the intent to do both ?. i don't see a need to do both.

and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
But the notifier seems to fire after the route is added to the kernel.

>  	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
>  		  &cfg->fc_nlinfo, nlflags);
>  succeeded:
> @@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>  	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
>  			       cfg->fc_type, tb->tb_id);
>  
> +	call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
> +			   tos, cfg->fc_type, tb->tb_id, 0);
>  	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
>  		  &cfg->fc_nlinfo, 0);
>  
> @@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
>  			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
>  					       fi, fa->fa_tos, fa->fa_type,
>  					       tb->tb_id);
> +			call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
> +					   KEYLENGTH - fa->fa_slen,
> +					   fi, fa->fa_tos, fa->fa_type,
> +					   tb->tb_id, 0);
>  			hlist_del_rcu(&fa->fa_list);
>  			fib_release_info(fa->fa_info);
>  			alias_free_mem_rcu(fa);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-18 23:23   ` [patch net-next RFC 1/2] " Roopa Prabhu
@ 2016-09-19  6:06     ` Jiri Pirko
  2016-09-19 14:53       ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-19  6:06 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

Mon, Sep 19, 2016 at 01:23:47AM CEST, roopa@cumulusnetworks.com wrote:
>On 9/6/16, 5:01 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> This allows to pass information about added/deleted fib entries to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/ip_fib.h | 19 +++++++++++++++++++
>>  net/ipv4/fib_trie.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 62 insertions(+)
>>
>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>> index 4079fc1..9ad7ba9 100644
>> --- a/include/net/ip_fib.h
>> +++ b/include/net/ip_fib.h
>> @@ -22,6 +22,7 @@
>>  #include <net/fib_rules.h>
>>  #include <net/inetpeer.h>
>>  #include <linux/percpu.h>
>> +#include <linux/notifier.h>
>>  
>>  struct fib_config {
>>  	u8			fc_dst_len;
>> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
>>  #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
>>  					 FIB_RES_SADDR(net, res))
>>  
>> +struct fib_notifier_info {
>> +	u32 dst;
>> +	int dst_len;
>> +	struct fib_info *fi;
>> +	u8 tos;
>> +	u8 type;
>> +	u32 tb_id;
>> +	u32 nlflags;
>> +};
>> +
>> +enum fib_event_type {
>> +	FIB_EVENT_TYPE_ADD,
>> +	FIB_EVENT_TYPE_DEL,
>> +};
>> +
>> +int register_fib_notifier(struct notifier_block *nb);
>> +int unregister_fib_notifier(struct notifier_block *nb);
>> +
>>  struct fib_table {
>>  	struct hlist_node	tb_hlist;
>>  	u32			tb_id;
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index e2ffc2a..19ec471 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -73,6 +73,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/export.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/notifier.h>
>>  #include <net/net_namespace.h>
>>  #include <net/ip.h>
>>  #include <net/protocol.h>
>> @@ -84,6 +85,36 @@
>>  #include <trace/events/fib.h>
>>  #include "fib_lookup.h"
>>  
>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>> +
>> +int register_fib_notifier(struct notifier_block *nb)
>> +{
>> +	return blocking_notifier_chain_register(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(register_fib_notifier);
>> +
>> +int unregister_fib_notifier(struct notifier_block *nb)
>> +{
>> +	return blocking_notifier_chain_unregister(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(unregister_fib_notifier);
>> +
>> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
>> +			      int dst_len, struct fib_info *fi,
>> +			      u8 tos, u8 type, u32 tb_id, u32 nlflags)
>> +{
>> +	struct fib_notifier_info info = {
>> +		.dst = dst,
>> +		.dst_len = dst_len,
>> +		.fi = fi,
>> +		.tos = tos,
>> +		.type = type,
>> +		.tb_id = tb_id,
>> +		.nlflags = nlflags,
>> +	};
>> +	return blocking_notifier_call_chain(&fib_chain, event_type, &info);
>> +}
>> +
>>  #define MAX_STAT_DEPTH 32
>>  
>>  #define KEYLENGTH	(8*sizeof(t_key))
>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>  			fib_release_info(fi_drop);
>>  			if (state & FA_S_ACCESSED)
>>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +
>> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>> +					   new_fa->fa_tos, cfg->fc_type,
>> +					   tb->tb_id, cfg->fc_nlflags);
>>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>  
>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>  		tb->tb_num_default++;
>>  
>>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>
>
>It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
>Is the intent to do both ?. i don't see a need to do both.

I already have patchset improved that it removes the switchdev fib code.
Have to do some more testing, will send it soon.


>
>and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
>But the notifier seems to fire after the route is added to the kernel.

Yeah, I wanted to align it with rtmsg_fib calls. Also I think it makes
sense to have slowpath ready before offload.


>
>>  	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
>>  		  &cfg->fc_nlinfo, nlflags);
>>  succeeded:
>> @@ -1542,6 +1579,8 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>  	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
>>  			       cfg->fc_type, tb->tb_id);
>>  
>> +	call_fib_notifiers(FIB_EVENT_TYPE_DEL, key, plen, fa_to_delete->fa_info,
>> +			   tos, cfg->fc_type, tb->tb_id, 0);
>>  	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
>>  		  &cfg->fc_nlinfo, 0);
>>  
>> @@ -1857,6 +1896,10 @@ int fib_table_flush(struct fib_table *tb)
>>  			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
>>  					       fi, fa->fa_tos, fa->fa_type,
>>  					       tb->tb_id);
>> +			call_fib_notifiers(FIB_EVENT_TYPE_DEL, n->key,
>> +					   KEYLENGTH - fa->fa_slen,
>> +					   fi, fa->fa_tos, fa->fa_type,
>> +					   tb->tb_id, 0);
>>  			hlist_del_rcu(&fa->fa_list);
>>  			fib_release_info(fa->fa_info);
>>  			alias_free_mem_rcu(fa);
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-18 20:00 ` [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Florian Fainelli
  2016-09-18 23:16   ` Roopa Prabhu
@ 2016-09-19  6:08   ` Jiri Pirko
  2016-09-19 16:44     ` Florian Fainelli
  1 sibling, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-19  6:08 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	dsa, jhs, vivien.didelot, john.fastabend, andrew, ivecera

Sun, Sep 18, 2016 at 10:00:44PM CEST, f.fainelli@gmail.com wrote:
>Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> This is RFC, unfinished. I came across some issues in the process so I would
>> like to share those and restart the fib offload discussion in order to make it
>> really usable.
>> 
>> So the goal of this patchset is to allow driver to propagate all prefixes
>> configured in kernel down HW. This is necessary for routing to work
>> as expected. If we don't do that HW might forward prefixes known to kernel
>> incorrectly. Take an example when default route is set in switch HW and there
>> is an IP address set on a management (non-switch) port.
>> 
>> Currently, only fibs related to the switch port netdev are offloaded using
>> switchdev ops. This model is not extendable so the first patch introduces
>> a replacement: notifier to propagate fib additions and removals to whoever
>> interested. The second patch makes mlxsw to adopt this new way, registering
>> one notifier block for each mlxsw (asic) instance.
>
>Instead of introducing another specialization of a notifier_block
>implementation, could we somehow have a kernel-based netlink listener
>which receives the same kind of event information from rtmsg_fib()?

rtmsg_fib destination is userspace. The message format is netlink. I
don't think it is wise to pass netlink messages inside kernel when we
can pass nice structures. Lower overhead. This is how it is done in the
rest of kernel. Not sure how your comment is related specifically to
this patch.


>
>The reason is that having such a facility would hook directly onto
>existing rtmsg_* calls that exist throughout the stack, and that seems
>to scale better.
>-- 
>Florian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-18 23:16   ` Roopa Prabhu
@ 2016-09-19  6:14     ` Jiri Pirko
  2016-09-19 14:59       ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-19  6:14 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This is RFC, unfinished. I came across some issues in the process so I would
>>> like to share those and restart the fib offload discussion in order to make it
>>> really usable.
>>>
>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>> configured in kernel down HW. This is necessary for routing to work
>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>> incorrectly. Take an example when default route is set in switch HW and there
>>> is an IP address set on a management (non-switch) port.
>>>
>>> Currently, only fibs related to the switch port netdev are offloaded using
>>> switchdev ops. This model is not extendable so the first patch introduces
>>> a replacement: notifier to propagate fib additions and removals to whoever
>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>> one notifier block for each mlxsw (asic) instance.
>> Instead of introducing another specialization of a notifier_block
>> implementation, could we somehow have a kernel-based netlink listener
>> which receives the same kind of event information from rtmsg_fib()?
>>
>> The reason is that having such a facility would hook directly onto
>> existing rtmsg_* calls that exist throughout the stack, and that seems
>> to scale better.
>I was thinking along the same lines. Instead of proliferating notifier blocks
>through-out the stack for switchdev offload, putting existing events to use would be nice.
>
>But the problem though is drivers having to parse the netlink msg again. also, the intent
>here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>
>
>Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>offload api for every route  for every asic instance ?. the first device where the route fits wins.

There is not list of asic instances. Therefore the notifier fits much better here.



>it seems similar to driver registering for notifier and looking at every route ...
>am i missing something ?
>and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>already have that in some form in your devlink infrastructure.

switchdev asic instances and devlink instances are orthogonal.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-19  6:06     ` Jiri Pirko
@ 2016-09-19 14:53       ` Roopa Prabhu
  2016-09-19 15:08         ` Jiri Pirko
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-19 14:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

On 9/18/16, 11:06 PM, Jiri Pirko wrote:
> Mon, Sep 19, 2016 at 01:23:47AM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/6/16, 5:01 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This allows to pass information about added/deleted fib entries to
>>> whoever is interested. This is done in a very similar way as devinet
>>> notifies address additions/removals.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---

[snip]
>>>  
>>>  #define KEYLENGTH	(8*sizeof(t_key))
>>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>  			fib_release_info(fi_drop);
>>>  			if (state & FA_S_ACCESSED)
>>>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +
>>> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>>> +					   new_fa->fa_tos, cfg->fc_type,
>>> +					   tb->tb_id, cfg->fc_nlflags);
>>>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>  
>>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>  		tb->tb_num_default++;
>>>  
>>>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>>> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>
>> It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
>> Is the intent to do both ?. i don't see a need to do both.
> I already have patchset improved that it removes the switchdev fib code.
> Have to do some more testing, will send it soon.

ok, ack.
>
>
>> and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
>> But the notifier seems to fire after the route is added to the kernel.
> Yeah, I wanted to align it with rtmsg_fib calls. Also I think it makes
> sense to have slowpath ready before offload.
>

ok, ..but..that changes existing behavior though. and if the hw route add fails...,
you may have inconsistent state between hw and sw.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-19  6:14     ` Jiri Pirko
@ 2016-09-19 14:59       ` Roopa Prabhu
  2016-09-19 15:15         ` Jiri Pirko
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-19 14:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

On 9/18/16, 11:14 PM, Jiri Pirko wrote:
> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>> like to share those and restart the fib offload discussion in order to make it
>>>> really usable.
>>>>
>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>> configured in kernel down HW. This is necessary for routing to work
>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>> is an IP address set on a management (non-switch) port.
>>>>
>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>> one notifier block for each mlxsw (asic) instance.
>>> Instead of introducing another specialization of a notifier_block
>>> implementation, could we somehow have a kernel-based netlink listener
>>> which receives the same kind of event information from rtmsg_fib()?
>>>
>>> The reason is that having such a facility would hook directly onto
>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>> to scale better.
>> I was thinking along the same lines. Instead of proliferating notifier blocks
>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>
>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>
>>
>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
> There is not list of asic instances. Therefore the notifier fits much better here.
>
>
>
>> it seems similar to driver registering for notifier and looking at every route ...
>> am i missing something ?
>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>> already have that in some form in your devlink infrastructure.
> switchdev asic instances and devlink instances are orthogonal.

maybe it is not today...but the requirement for devlink was to provide a way to communicate
to the switch driver
- global switch attributes or
- things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)

so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
offload hooks ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
  2016-09-19 14:53       ` Roopa Prabhu
@ 2016-09-19 15:08         ` Jiri Pirko
  0 siblings, 0 replies; 27+ messages in thread
From: Jiri Pirko @ 2016-09-19 15:08 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera

Mon, Sep 19, 2016 at 04:53:07PM CEST, roopa@cumulusnetworks.com wrote:
>On 9/18/16, 11:06 PM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 01:23:47AM CEST, roopa@cumulusnetworks.com wrote:
>>> On 9/6/16, 5:01 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> This allows to pass information about added/deleted fib entries to
>>>> whoever is interested. This is done in a very similar way as devinet
>>>> notifies address additions/removals.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>
>[snip]
>>>>  
>>>>  #define KEYLENGTH	(8*sizeof(t_key))
>>>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>  			fib_release_info(fi_drop);
>>>>  			if (state & FA_S_ACCESSED)
>>>>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>>> +
>>>> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>>>> +					   new_fa->fa_tos, cfg->fc_type,
>>>> +					   tb->tb_id, cfg->fc_nlflags);
>>>>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>>  
>>>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>  		tb->tb_num_default++;
>>>>  
>>>>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>>> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>>>> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>>
>>> It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
>>> Is the intent to do both ?. i don't see a need to do both.
>> I already have patchset improved that it removes the switchdev fib code.
>> Have to do some more testing, will send it soon.
>
>ok, ack.
>>
>>
>>> and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
>>> But the notifier seems to fire after the route is added to the kernel.
>> Yeah, I wanted to align it with rtmsg_fib calls. Also I think it makes
>> sense to have slowpath ready before offload.
>>
>
>ok, ..but..that changes existing behavior though. and if the hw route add fails...,
>you may have inconsistent state between hw and sw.

If the hw route add fails, driver will be responsible for abort,
cleanining up hw and set appropriate traps to get packets to cpu
processing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-19 14:59       ` Roopa Prabhu
@ 2016-09-19 15:15         ` Jiri Pirko
  2016-09-20  5:49           ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-19 15:15 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

Mon, Sep 19, 2016 at 04:59:22PM CEST, roopa@cumulusnetworks.com wrote:
>On 9/18/16, 11:14 PM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>>> like to share those and restart the fib offload discussion in order to make it
>>>>> really usable.
>>>>>
>>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>>> configured in kernel down HW. This is necessary for routing to work
>>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>>> is an IP address set on a management (non-switch) port.
>>>>>
>>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>>> one notifier block for each mlxsw (asic) instance.
>>>> Instead of introducing another specialization of a notifier_block
>>>> implementation, could we somehow have a kernel-based netlink listener
>>>> which receives the same kind of event information from rtmsg_fib()?
>>>>
>>>> The reason is that having such a facility would hook directly onto
>>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>>> to scale better.
>>> I was thinking along the same lines. Instead of proliferating notifier blocks
>>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>>
>>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>>
>>>
>>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
>> There is not list of asic instances. Therefore the notifier fits much better here.
>>
>>
>>
>>> it seems similar to driver registering for notifier and looking at every route ...
>>> am i missing something ?
>>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>>> already have that in some form in your devlink infrastructure.
>> switchdev asic instances and devlink instances are orthogonal.
>
>maybe it is not today...but the requirement for devlink was to provide a way to communicate
>to the switch driver
>- global switch attributes or
>- things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)

Devlink is a general beast, not switch specific one. I see no need to
use fib->devlink->driver route inside kernel. Devlink is for userspace
facing.


>
>so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
>offload hooks ?

Maybe, but in case of fibs, the notifier just fits great. I see no need
for anything else.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-19  6:08   ` Jiri Pirko
@ 2016-09-19 16:44     ` Florian Fainelli
  0 siblings, 0 replies; 27+ messages in thread
From: Florian Fainelli @ 2016-09-19 16:44 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	dsa, jhs, vivien.didelot, john.fastabend, andrew, ivecera

On 09/18/2016 11:08 PM, Jiri Pirko wrote:
> Sun, Sep 18, 2016 at 10:00:44PM CEST, f.fainelli@gmail.com wrote:
>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This is RFC, unfinished. I came across some issues in the process so I would
>>> like to share those and restart the fib offload discussion in order to make it
>>> really usable.
>>>
>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>> configured in kernel down HW. This is necessary for routing to work
>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>> incorrectly. Take an example when default route is set in switch HW and there
>>> is an IP address set on a management (non-switch) port.
>>>
>>> Currently, only fibs related to the switch port netdev are offloaded using
>>> switchdev ops. This model is not extendable so the first patch introduces
>>> a replacement: notifier to propagate fib additions and removals to whoever
>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>> one notifier block for each mlxsw (asic) instance.
>>
>> Instead of introducing another specialization of a notifier_block
>> implementation, could we somehow have a kernel-based netlink listener
>> which receives the same kind of event information from rtmsg_fib()?
> 
> rtmsg_fib destination is userspace. The message format is netlink. I
> don't think it is wise to pass netlink messages inside kernel when we
> can pass nice structures.

True, which does not mean you cannot have a small piece of kernel code
that listens for netlink events and unmarshalls what you want to have
your drivers operate on a structure-based message. The point I am
getting it is that rtmsg_* exists all over the place, and with close to
zero modification in the networking stack you can get the notification
you want if you hook a netlink listener in kernel space. Just like the
notifier you are proposing, the interface drivers could work with can be
structure (and not netlink message) based. An argument could be made
that with the notifier approach and hooking where necessary you have
finer control of message delivery, especially for managing overhead and
potential subscribers to these messages (it is easier to compile out the
notifier code for what you are interested in).


> Lower overhead. This is how it is done in the
> rest of kernel. Not sure how your comment is related specifically to
> this patch.

This is not directly related to your patch, it just made me think about
it as I was considering adding a notifier for other purposes (turns out
we may be able to do differently, without one).
-- 
Florian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-19 15:15         ` Jiri Pirko
@ 2016-09-20  5:49           ` Roopa Prabhu
  2016-09-20  6:02             ` Jiri Pirko
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-20  5:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

On 9/19/16, 8:15 AM, Jiri Pirko wrote:
> Mon, Sep 19, 2016 at 04:59:22PM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/18/16, 11:14 PM, Jiri Pirko wrote:
>>> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>>>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>
>>>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>>>> like to share those and restart the fib offload discussion in order to make it
>>>>>> really usable.
>>>>>>
>>>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>>>> configured in kernel down HW. This is necessary for routing to work
>>>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>>>> is an IP address set on a management (non-switch) port.
>>>>>>
>>>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>>>> one notifier block for each mlxsw (asic) instance.
>>>>> Instead of introducing another specialization of a notifier_block
>>>>> implementation, could we somehow have a kernel-based netlink listener
>>>>> which receives the same kind of event information from rtmsg_fib()?
>>>>>
>>>>> The reason is that having such a facility would hook directly onto
>>>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>>>> to scale better.
>>>> I was thinking along the same lines. Instead of proliferating notifier blocks
>>>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>>>
>>>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>>>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>>>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>>>
>>>>
>>>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>>>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
>>> There is not list of asic instances. Therefore the notifier fits much better here.
>>>
>>>
>>>
>>>> it seems similar to driver registering for notifier and looking at every route ...
>>>> am i missing something ?
>>>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>>>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>>>> already have that in some form in your devlink infrastructure.
>>> switchdev asic instances and devlink instances are orthogonal.
>> maybe it is not today...but the requirement for devlink was to provide a way to communicate
>> to the switch driver
>> - global switch attributes or
>> - things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)
> Devlink is a general beast, not switch specific one. I see no need to
> use fib->devlink->driver route inside kernel. Devlink is for userspace
> facing.

yes, sure. it has a dev abstraction and an api. devlink discussion started a few years ago in the context
of switch asics for the very same reason that it will help direct the offload call to the
switch device driver when you cant apply the settings on a per port basis.
You have kept the abstraction and api generic ..which is a great thing.
But that can't be the reason for it to not support its original intent...if there is a way.

>
>
>> so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
>> offload hooks ?
> Maybe, but in case of fibs, the notifier just fits great. I see no need
> for anything else.

I think its better to stick with 'offload api or notifier' whichever we pick ..
to be consistent with other switchdev offload areas. That was the original intent of
introducing the switchdev api layer. If we are now replacing the switchdev api with notifiers,
assuming 'notifiers are the best way' to offload routes, lets keep it consistent with
other switchdev offload areas too.

I know you already have them for links...and that is good..because links already have notifiers.
we will need the same thing for acls. Having notifiers for acls too seems like an overkill.
we will then have to extend this to multicast and mpls routes too. will all these be notifiers too ?

Do you see any scale problems with using notifiers ?. as you know these ascis can scale to
32k-128k routes.

lets discuss more at netdev1.2..if your patches are not in by then.

thanks,
Roopa

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-20  5:49           ` Roopa Prabhu
@ 2016-09-20  6:02             ` Jiri Pirko
  2016-09-20  6:18               ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Pirko @ 2016-09-20  6:02 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast,
	edumazet, hannes, dsa, jhs, vivien.didelot, john.fastabend,
	andrew, ivecera

Tue, Sep 20, 2016 at 07:49:47AM CEST, roopa@cumulusnetworks.com wrote:
>On 9/19/16, 8:15 AM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 04:59:22PM CEST, roopa@cumulusnetworks.com wrote:
>>> On 9/18/16, 11:14 PM, Jiri Pirko wrote:
>>>> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>>>>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>>>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>>
>>>>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>>>>> like to share those and restart the fib offload discussion in order to make it
>>>>>>> really usable.
>>>>>>>
>>>>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>>>>> configured in kernel down HW. This is necessary for routing to work
>>>>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>>>>> is an IP address set on a management (non-switch) port.
>>>>>>>
>>>>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>>>>> one notifier block for each mlxsw (asic) instance.
>>>>>> Instead of introducing another specialization of a notifier_block
>>>>>> implementation, could we somehow have a kernel-based netlink listener
>>>>>> which receives the same kind of event information from rtmsg_fib()?
>>>>>>
>>>>>> The reason is that having such a facility would hook directly onto
>>>>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>>>>> to scale better.
>>>>> I was thinking along the same lines. Instead of proliferating notifier blocks
>>>>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>>>>
>>>>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>>>>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>>>>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>>>>
>>>>>
>>>>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>>>>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
>>>> There is not list of asic instances. Therefore the notifier fits much better here.
>>>>
>>>>
>>>>
>>>>> it seems similar to driver registering for notifier and looking at every route ...
>>>>> am i missing something ?
>>>>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>>>>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>>>>> already have that in some form in your devlink infrastructure.
>>>> switchdev asic instances and devlink instances are orthogonal.
>>> maybe it is not today...but the requirement for devlink was to provide a way to communicate
>>> to the switch driver
>>> - global switch attributes or
>>> - things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)
>> Devlink is a general beast, not switch specific one. I see no need to
>> use fib->devlink->driver route inside kernel. Devlink is for userspace
>> facing.
>
>yes, sure. it has a dev abstraction and an api. devlink discussion started a few years ago in the context
>of switch asics for the very same reason that it will help direct the offload call to the
>switch device driver when you cant apply the settings on a per port basis.
>You have kept the abstraction and api generic ..which is a great thing.
>But that can't be the reason for it to not support its original intent...if there is a way.
>
>>
>>
>>> so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
>>> offload hooks ?
>> Maybe, but in case of fibs, the notifier just fits great. I see no need
>> for anything else.
>
>I think its better to stick with 'offload api or notifier' whichever we pick ..
>to be consistent with other switchdev offload areas. That was the original intent of
>introducing the switchdev api layer. If we are now replacing the switchdev api with notifiers,

I strongly disagree. Make it uniform is not desirable. For some things,
direct ndo/sdo make sense and is better. For some other things, notifier
fits better. For example when I was implementing LAG offload,
I also chose a notifier.


>assuming 'notifiers are the best way' to offload routes, lets keep it consistent with
>other switchdev offload areas too.
>
>I know you already have them for links...and that is good..because links already have notifiers.
>we will need the same thing for acls. Having notifiers for acls too seems like an overkill.

Acls will reuse the tc ndo infra. No notifiers required there. 


>we will then have to extend this to multicast and mpls routes too. will all these be notifiers too ?

I believe so.


>
>Do you see any scale problems with using notifiers ?. as you know these ascis can scale to
>32k-128k routes.

I don't see any problem there. What do you think might be wrong?


>
>lets discuss more at netdev1.2..if your patches are not in by then.
>
>thanks,
>Roopa
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
  2016-09-20  6:02             ` Jiri Pirko
@ 2016-09-20  6:18               ` Roopa Prabhu
  0 siblings, 0 replies; 27+ messages in thread
From: Roopa Prabhu @ 2016-09-20  6:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, netdev, davem, Ido Schimmel, Elad Raz,
	Yotam Gigi, Nogah Frankel, Or Gerlitz, Nikolay Aleksandrov,
	John Linville, Thomas Graf, Andy Gospodarek, Scott Feldman,
	Alexei Starovoitov, Eric Dumazet, hannes, David Ahern,
	Jamal Hadi Salim, Vivien Didelot

On Mon, Sep 19, 2016 at 11:02 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Sep 20, 2016 at 07:49:47AM CEST, roopa@cumulusnetworks.com wrote:

[snip]

>>
>>Do you see any scale problems with using notifiers ?. as you know these ascis can scale to
>>32k-128k routes.
>
> I don't see any problem there. What do you think might be wrong?
>

we had seen some overheads with link notifiers in older kernels with
large number of links flaps.
But that could have been due to rtnl lock. don't have any more data
than that. so, ignore that.
I don't see anything obvious from perf perspective....rtnl is already
held. but, thought i did just ask.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-20  6:18 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-06 12:01 [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Jiri Pirko
2016-09-06 12:01 ` [patch net-next RFC 1/2] fib: introduce fib notification infrastructure Jiri Pirko
2016-09-06 14:32   ` David Ahern
2016-09-06 14:44     ` Jiri Pirko
2016-09-06 15:11       ` David Ahern
2016-09-06 15:49         ` Jiri Pirko
2016-09-06 16:14           ` Hannes Frederic Sowa
2016-09-06 15:13   ` David Ahern
2016-09-07  8:03     ` Jiri Pirko
2016-09-15 14:41   ` [net-next,RFC,1/2] " Andy Gospodarek
2016-09-15 14:45     ` Jiri Pirko
2016-09-15 14:47       ` Andy Gospodarek
2016-09-18 23:23   ` [patch net-next RFC 1/2] " Roopa Prabhu
2016-09-19  6:06     ` Jiri Pirko
2016-09-19 14:53       ` Roopa Prabhu
2016-09-19 15:08         ` Jiri Pirko
2016-09-06 12:01 ` [patch net-next RFC 2/2] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls Jiri Pirko
2016-09-18 20:00 ` [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes Florian Fainelli
2016-09-18 23:16   ` Roopa Prabhu
2016-09-19  6:14     ` Jiri Pirko
2016-09-19 14:59       ` Roopa Prabhu
2016-09-19 15:15         ` Jiri Pirko
2016-09-20  5:49           ` Roopa Prabhu
2016-09-20  6:02             ` Jiri Pirko
2016-09-20  6:18               ` Roopa Prabhu
2016-09-19  6:08   ` Jiri Pirko
2016-09-19 16:44     ` Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.