* [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
@ 2015-01-02  3:29 sfeldma
  2015-01-02  5:49 ` roopa
  2015-01-06 13:58 ` Hannes Frederic Sowa
  0 siblings, 2 replies; 17+ messages in thread
From: sfeldma @ 2015-01-02  3:29 UTC
  To: netdev, jiri, john.fastabend, tgraf, jhs, andy, roopa

From: Scott Feldman <sfeldma@gmail.com>

To offload IPv4 L3 routing functions to a swdev device, the swdev device
driver implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are
called by the core IPv4 FIB code when installing/removing FIB entries to/from
the kernel FIB.  On install, the driver should return 0 if the FIB entry
(route) can be installed to the device for offloading, -EOPNOTSUPP if the
route cannot be installed due to device limitations, and another negative
error code on failure to install the route to the device.  On any such other
error code, the route is installed neither to the device nor in the kernel
FIB, and the error code is propagated back to the user-space caller (via
netlink).  On -EOPNOTSUPP, the route is skipped for the device but is still
installed in the kernel FIB.
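A minimal userspace sketch of this return-code contract, assuming a
hypothetical device with a fixed-size route table.  struct hw_switch,
drv_fib_ipv4_add() and core_route_add() are illustrative names only, not
kernel API; the core-side check mirrors the "err && err != -EOPNOTSUPP"
test this patch adds to fib_table_insert().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical device: a fixed-size hardware route table. */
#define HW_ROUTE_SLOTS 2

struct hw_switch {
	int used;      /* routes currently offloaded to the device */
	int fail_next; /* when set, the next add fails with a device error */
};

/* Illustrative driver callback in the spirit of ndo_switch_fib_ipv4_add:
 * 0 = offloaded, -EOPNOTSUPP = device can't take it, other -errno = failure. */
static int drv_fib_ipv4_add(struct hw_switch *sw, uint32_t dst, int dst_len)
{
	(void)dst;
	(void)dst_len;
	if (sw->used == HW_ROUTE_SLOTS)
		return -EOPNOTSUPP;   /* device limitation: table full */
	if (sw->fail_next) {
		sw->fail_next = 0;
		return -EIO;          /* genuine failure talking to the device */
	}
	sw->used++;
	return 0;
}

/* Illustrative core-side policy: returns 0 if the route goes into the
 * kernel FIB, else the error that would be reported to user space;
 * *offloaded tells whether the device accepted the route. */
static int core_route_add(struct hw_switch *sw, uint32_t dst, int dst_len,
			  int *offloaded)
{
	int err = drv_fib_ipv4_add(sw, dst, dst_len);

	*offloaded = (err == 0);
	if (err && err != -EOPNOTSUPP)
		return err;   /* neither in the device nor in the kernel FIB */
	return 0;             /* in kernel FIB; in device only if err == 0 */
}
```

With a full table the route silently falls back to software forwarding,
while a device error aborts the whole insert.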

The FIB entry (route) nexthop list is used to find the swdev device port to
anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
to find the swdev port by recursively traversing the fib_dev's lower_dev list
until a swdev port is found.  The ndo op is called on this swdev port.
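The lookup is a depth-first search that stops at the first device acting as a
switch port.  A userspace sketch, assuming a hypothetical struct fake_dev in
place of struct net_device and an is_sw_port flag in place of "implements
ndo_switch_parent_id_get"; the real code walks lower devices with
netdev_for_each_lower_dev().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOWER 4

/* Hypothetical stand-in for struct net_device. */
struct fake_dev {
	const char *name;
	int is_sw_port;                   /* "supports ndo_switch_parent_id_get" */
	struct fake_dev *lower[MAX_LOWER]; /* lower devices (bond slaves, bridge ports) */
	int n_lower;
};

/* Depth-first walk down the lower-dev hierarchy, in the same shape as
 * netdev_switch_get_by_fib_dev(): return the first switch port found. */
static struct fake_dev *find_sw_port(struct fake_dev *dev)
{
	if (dev->is_sw_port)
		return dev;

	for (int i = 0; i < dev->n_lower; i++) {
		struct fake_dev *port = find_sw_port(dev->lower[i]);

		if (port)
			return port;
	}
	return NULL;
}
```

For a bridge br0 over a plain NIC eth0 and a bond bond0 of switch ports swp1
and swp2, the search skips eth0 and anchors the ndo call on swp1.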

Since the FIB entry is "naked" when pushed from the kernel, the driver/device
is responsible for resolving the route's nexthops to neighbor MAC addresses.
The driver can do this by monitoring the NETEVENT_NEIGH_UPDATE netevent
notifier to watch for ARP activity.  Once a nexthop is resolved to a neighbor
MAC address, it can be installed to the device, and the device will then
perform the L3 routing offload in hardware for that nexthop.
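A userspace sketch of the driver-side bookkeeping, assuming a hypothetical
fixed-size nexthop table.  nh_add_pending() would run when a route is
offloaded, and nh_neigh_update() stands in for the body of a
NETEVENT_NEIGH_UPDATE handler (in the kernel, registered with
register_netevent_notifier() and handed a struct neighbour *).  All names
below are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NH 8

/* Hypothetical driver-side nexthop entry. */
struct nh_entry {
	uint32_t gw;     /* nexthop IPv4 address (host byte order) */
	uint8_t mac[6];  /* neighbor MAC, valid once resolved */
	int resolved;    /* 1 once the nexthop could be programmed to HW */
};

struct nh_table {
	struct nh_entry e[MAX_NH];
	int n;
};

/* Route offload time: remember the nexthop as unresolved. */
static int nh_add_pending(struct nh_table *t, uint32_t gw)
{
	if (t->n == MAX_NH)
		return -1;
	t->e[t->n].gw = gw;
	t->e[t->n].resolved = 0;
	t->n++;
	return 0;
}

/* Neighbor update: refresh the MAC of every matching nexthop and mark it
 * resolved.  Returns how many entries became resolved by this update; a
 * real driver would program those nexthops into the device here. */
static int nh_neigh_update(struct nh_table *t, uint32_t ip,
			   const uint8_t mac[6])
{
	int newly = 0;

	for (int i = 0; i < t->n; i++) {
		if (t->e[i].gw != ip)
			continue;
		memcpy(t->e[i].mac, mac, 6);
		if (!t->e[i].resolved) {
			t->e[i].resolved = 1;
			newly++;
		}
	}
	return newly;
}
```

Until a nexthop is resolved, traffic for routes using it keeps going
through the kernel's software path.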

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |   22 +++++++++++
 include/net/switchdev.h   |   18 +++++++++
 net/ipv4/fib_trie.c       |   17 ++++++++-
 net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 679e6e9..b66d22b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -767,6 +767,8 @@ struct netdev_phys_item_id {
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
+struct fib_info;
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
  *	Called to notify switch device port of bridge port STP
  *	state change.
+ * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
+ *				  int dst_len, struct fib_info *fi,
+ *				  u8 tos, u8 type, u32 tb_id);
+ *	Called to add IPv4 route to switch device.
+ * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
+ *				  int dst_len, struct fib_info *fi,
+ *				  u8 tos, u8 type, u32 tb_id);
+ *	Called to delete IPv4 route from switch device.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1189,6 +1199,18 @@ struct net_device_ops {
 							    struct netdev_phys_item_id *psid);
 	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
 							      u8 state);
+	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
+							   __be32 dst,
+							   int dst_len,
+							   struct fib_info *fi,
+							   u8 tos, u8 type,
+							   u32 tb_id);
+	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
+							   __be32 dst,
+							   int dst_len,
+							   struct fib_info *fi,
+							   u8 tos, u8 type,
+							   u32 tb_id);
 #endif
 };
 
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 8a6d164..caebc2a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -17,6 +17,10 @@
 int netdev_switch_parent_id_get(struct net_device *dev,
 				struct netdev_phys_item_id *psid);
 int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
+int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id);
+int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id);
 
 #else
 
@@ -32,6 +36,20 @@ static inline int netdev_switch_port_stp_update(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
+					     struct fib_info *fi,
+					     u8 tos, u8 type, u32 tb_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
+					     struct fib_info *fi,
+					     u8 tos, u8 type, u32 tb_id)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 281e5e0..ea2dc17 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -79,6 +79,7 @@
 #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
+#include <net/switchdev.h>
 #include "fib_lookup.h"
 
 #define MAX_STAT_DEPTH 32
@@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
 				rt_cache_flush(cfg->fc_nlinfo.nl_net);
+			netdev_switch_fib_ipv4_add(key, plen, fi, fa->fa_tos,
+						   cfg->fc_type, tb->tb_id);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
 
@@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_tos = tos;
 	new_fa->fa_type = cfg->fc_type;
 	new_fa->fa_state = 0;
+
+	/* (Optionally) offload fib info to switch hardware. */
+	err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
+					 cfg->fc_type, tb->tb_id);
+	if (err && err != -EOPNOTSUPP)
+		goto out_free_new_fa;
+
 	/*
 	 * Insert new entry to the list.
 	 */
@@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		fa_head = fib_insert_node(t, key, plen);
 		if (unlikely(!fa_head)) {
 			err = -ENOMEM;
-			goto out_free_new_fa;
+			goto out_sw_fib_del;
 		}
 	}
 
@@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 succeeded:
 	return 0;
 
+out_sw_fib_del:
+	netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
 out_free_new_fa:
 	kmem_cache_free(fn_alias_kmem, new_fa);
 out:
@@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
+	netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
+				   cfg->fc_type, tb->tb_id);
+
 	list_del_rcu(&fa->fa_list);
 
 	if (!plen)
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index d162b21..211a8a0 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/netdevice.h>
+#include <net/ip_fib.h>
 #include <net/switchdev.h>
 
 /**
@@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
 	return ops->ndo_switch_port_stp_update(dev, state);
 }
 EXPORT_SYMBOL(netdev_switch_port_stp_update);
+
+static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct net_device *lower_dev;
+	struct net_device *port_dev;
+	struct list_head *iter;
+
+	/* Recursively search from fib_dev down until we find
+	 * a sw port dev.  (A sw port dev supports
+	 * ndo_switch_parent_id_get).
+	 */
+
+	if (ops->ndo_switch_parent_id_get)
+		return dev;
+
+	netdev_for_each_lower_dev(dev, lower_dev, iter) {
+		port_dev = netdev_switch_get_by_fib_dev(lower_dev);
+		if (port_dev)
+			return port_dev;
+	}
+
+	return NULL;
+}
+
+/**
+ *	netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
+ *
+ *	@dst: route's IPv4 destination address
+ *	@dst_len: destination address length (prefix length)
+ *	@fi: route FIB info structure
+ *	@tos: route TOS
+ *	@type: route type
+ *	@tb_id: route table ID
+ *
+ *	Add IPv4 route entry to switch device.
+ */
+int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id)
+{
+	struct net_device *dev;
+	const struct net_device_ops *ops;
+	int err = -EOPNOTSUPP;
+
+	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
+	if (!dev)
+		return -EOPNOTSUPP;
+	ops = dev->netdev_ops;
+
+	if (ops->ndo_switch_fib_ipv4_add)
+		err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
+						   fi, tos, type, tb_id);
+
+	return err;
+}
+EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
+
+/**
+ *	netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
+ *
+ *	@dst: route's IPv4 destination address
+ *	@dst_len: destination address length (prefix length)
+ *	@fi: route FIB info structure
+ *	@tos: route TOS
+ *	@type: route type
+ *	@tb_id: route table ID
+ *
+ *	Delete IPv4 route entry from switch device.
+ */
+int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id)
+{
+	struct net_device *dev;
+	const struct net_device_ops *ops;
+	int err = -EOPNOTSUPP;
+
+	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
+	if (!dev)
+		return -EOPNOTSUPP;
+	ops = dev->netdev_ops;
+
+	if (ops->ndo_switch_fib_ipv4_del)
+		err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
+						   fi, tos, type, tb_id);
+
+	return err;
+}
+EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
-- 
1.7.10.4

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  3:29 [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev sfeldma
@ 2015-01-02  5:49 ` roopa
  2015-01-02  8:00   ` Scott Feldman
  2015-01-02 11:21   ` Arad, Ronen
  2015-01-06 13:58 ` Hannes Frederic Sowa
  1 sibling, 2 replies; 17+ messages in thread
From: roopa @ 2015-01-02  5:49 UTC
  To: sfeldma; +Cc: netdev, jiri, john.fastabend, tgraf, jhs, andy

On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
> by the core IPv4 FIB code when installing/removing FIB entries to/from the
> kernel FIB.  On install, the driver should return 0 if FIB entry (route) can be
> installed to device for offloading, -EOPNOTSUPP if route cannot be installed
> due to device limitations, and other negative error code on failure to install
> route to device.  On failure error code, the route is not installed to device,
> and not installed in kernel FIB, and the return code is propagated back to the
> user-space caller (via netlink).  An -EOPNOTSUPP error code is skipped for the
> device but installed in the kernel FIB.
>
> The FIB entry (route) nexthop list is used to find the swdev device port to
> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
> find the swdev port by recursively traversing the fib_dev's lower_dev list
> until a swdev port is found.  The ndo op is called on this swdev port.

scott, I posted a similar api for bridge attribute sets. But, nobody 
supported it.
http://marc.info/?l=linux-netdev&m=141820234410602&w=2

If this is acceptable, I will be resubmitting my api as well.

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  5:49 ` roopa
@ 2015-01-02  8:00   ` Scott Feldman
  2015-01-02 11:39     ` Arad, Ronen
  2015-01-02 20:55     ` roopa
  2015-01-02 11:21   ` Arad, Ronen
  1 sibling, 2 replies; 17+ messages in thread
From: Scott Feldman @ 2015-01-02  8:00 UTC
  To: roopa
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek

On Thu, Jan 1, 2015 at 9:49 PM, roopa <roopa@cumulusnetworks.com> wrote:
> scott, I posted a similar api for bridge attribute sets. But, nobody
> supported it.
> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>
> If this is acceptable, I will be resubmitting my api as well.
>

This may get shot down as well, who knows?

For routes, the nexthop dev may be a bridge or a bond for an IP on the
router, so we have no choice but to walk down from the bridge or the
bond to find a swport dev to call the ndo op to install the route.

For bridge settings, I remember someone raised the issue that settings
should be propagated down the dev hierarchy, with parent calling
child's op and so on.  I'll go back and look at your post.

>
>
>>
>> Since the FIB entry is "naked" when push from the kernel, the
>> driver/device
>> is responsible for resolving the route's nexthops to neighbor MAC
>> addresses.
>> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
>> netevent notifier to watch for ARP activity.  Once a nexthop is resolved
>> to
>> neighbor MAC address, it can be installed to the device and the device
>> will
>> do the L3 routing offload in HW, for that nexthop.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/netdevice.h |   22 +++++++++++
>>   include/net/switchdev.h   |   18 +++++++++
>>   net/ipv4/fib_trie.c       |   17 ++++++++-
>>   net/switchdev/switchdev.c |   89
>> +++++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 145 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 679e6e9..b66d22b 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>   typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>                                        struct sk_buff *skb);
>>   +struct fib_info;
>> +
>>   /*
>>    * This structure defines the management hooks for network devices.
>>    * The following hooks can be defined; unless noted otherwise, they are
>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct
>> net_device *dev,
>>    * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>    *    Called to notify switch device port of bridge port STP
>>    *    state change.
>> + * int (*ndo_sw_parent_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>> + *                                  int dst_len, struct fib_info *fi,
>> + *                                  u8 tos, u8 type, u32 tb_id);
>> + *     Called to add IPv4 route to switch device.
>> + * int (*ndo_sw_parent_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>> + *                                  int dst_len, struct fib_info *fi,
>> + *                                  u8 tos, u8 type, u32 tb_id);
>> + *     Called to delete IPv4 route from switch device.
>>    */
>>   struct net_device_ops {
>>         int                     (*ndo_init)(struct net_device *dev);
>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>                                                             struct
>> netdev_phys_item_id *psid);
>>         int                     (*ndo_switch_port_stp_update)(struct
>> net_device *dev,
>>                                                               u8 state);
>> +       int                     (*ndo_switch_fib_ipv4_add)(struct
>> net_device *dev,
>> +                                                          __be32 dst,
>> +                                                          int dst_len,
>> +                                                          struct fib_info
>> *fi,
>> +                                                          u8 tos, u8
>> type,
>> +                                                          u32 tb_id);
>> +       int                     (*ndo_switch_fib_ipv4_del)(struct
>> net_device *dev,
>> +                                                          __be32 dst,
>> +                                                          int dst_len,
>> +                                                          struct fib_info
>> *fi,
>> +                                                          u8 tos, u8
>> type,
>> +                                                          u32 tb_id);
>>   #endif
>>   };
>>   diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index 8a6d164..caebc2a 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,10 @@
>>   int netdev_switch_parent_id_get(struct net_device *dev,
>>                                 struct netdev_phys_item_id *psid);
>>   int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>> +                              u8 tos, u8 type, u32 tb_id);
>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>> +                              u8 tos, u8 type, u32 tb_id);
>>     #else
>>   @@ -32,6 +36,20 @@ static inline int
>> netdev_switch_port_stp_update(struct net_device *dev,
>>         return -EOPNOTSUPP;
>>   }
>>   +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
>> +                                            struct fib_info *fi,
>> +                                            u8 tos, u8 type, u32 tb_id)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>> +
>> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
>> +                                            struct fib_info *fi,
>> +                                            u8 tos, u8 type, u32 tb_id)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>> +
>>   #endif
>>     #endif /* _LINUX_SWITCHDEV_H_ */
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index 281e5e0..ea2dc17 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -79,6 +79,7 @@
>>   #include <net/tcp.h>
>>   #include <net/sock.h>
>>   #include <net/ip_fib.h>
>> +#include <net/switchdev.h>
>>   #include "fib_lookup.h"
>>     #define MAX_STAT_DEPTH 32
>> @@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct
>> fib_config *cfg)
>>                         fib_release_info(fi_drop);
>>                         if (state & FA_S_ACCESSED)
>>                                 rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +                       netdev_switch_fib_ipv4_add(key, plen, fi,
>> fa->fa_tos,
>> +                                                  cfg->fc_type,
>> tb->tb_id);
>>                         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>                                 tb->tb_id, &cfg->fc_nlinfo,
>> NLM_F_REPLACE);
>>   @@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct
>> fib_config *cfg)
>>         new_fa->fa_tos = tos;
>>         new_fa->fa_type = cfg->fc_type;
>>         new_fa->fa_state = 0;
>> +
>> +       /* (Optionally) offload fib info to switch hardware. */
>> +       err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>> +                                        cfg->fc_type, tb->tb_id);
>> +       if (err && err != -EOPNOTSUPP)
>> +               goto out_free_new_fa;
>> +
>>         /*
>>          * Insert new entry to the list.
>>          */
>> @@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct
>> fib_config *cfg)
>>                 fa_head = fib_insert_node(t, key, plen);
>>                 if (unlikely(!fa_head)) {
>>                         err = -ENOMEM;
>> -                       goto out_free_new_fa;
>> +                       goto out_sw_fib_del;
>>                 }
>>         }
>>   @@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct
>> fib_config *cfg)
>>   succeeded:
>>         return 0;
>>   +out_sw_fib_del:
>> +       netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type,
>> tb->tb_id);
>>   out_free_new_fa:
>>         kmem_cache_free(fn_alias_kmem, new_fa);
>>   out:
>> @@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct
>> fib_config *cfg)
>>         rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>>                   &cfg->fc_nlinfo, 0);
>>   +     netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
>> +                                  cfg->fc_type, tb->tb_id);
>> +
>>         list_del_rcu(&fa->fa_list);
>>         if (!plen)
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index d162b21..211a8a0 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/types.h>
>>   #include <linux/init.h>
>>   #include <linux/netdevice.h>
>> +#include <net/ip_fib.h>
>>   #include <net/switchdev.h>
>>     /**
>> @@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device
>> *dev, u8 state)
>>         return ops->ndo_switch_port_stp_update(dev, state);
>>   }
>>   EXPORT_SYMBOL(netdev_switch_port_stp_update);
>> +
>> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device
>> *dev)
>> +{
>> +       const struct net_device_ops *ops = dev->netdev_ops;
>> +       struct net_device *lower_dev;
>> +       struct net_device *port_dev;
>> +       struct list_head *iter;
>> +
>> +       /* Recusively search from fib_dev down until we find
>> +        * a sw port dev.  (A sw port dev supports
>> +        * ndo_switch_parent_id_get).
>> +        */
>> +
>> +       if (ops->ndo_switch_parent_id_get)
>> +               return dev;
>> +
>> +       netdev_for_each_lower_dev(dev, lower_dev, iter) {
>> +               port_dev = netdev_switch_get_by_fib_dev(lower_dev);
>> +               if (port_dev)
>> +                       return port_dev;
>> +       }
>> +
>> +       return NULL;
>> +}
>> +
>> +/**
>> + *     netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
>> + *
>> + *     @dst: route's IPv4 destination address
>> + *     @dst_len: destination address length (prefix length)
>> + *     @fi: route FIB info structure
>> + *     @tos: route TOS
>> + *     @type: route type
>> + *     @tb_id: route table ID
>> + *
>> + *     Add IPv4 route entry to switch device.
>> + */
>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>> +                              u8 tos, u8 type, u32 tb_id)
>> +{
>> +       struct net_device *dev;
>> +       const struct net_device_ops *ops;
>> +       int err = -EOPNOTSUPP;
>> +
>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>> +       if (!dev)
>> +               return -EOPNOTSUPP;
>> +       ops = dev->netdev_ops;
>> +
>> +       if (ops->ndo_switch_fib_ipv4_add)
>> +               err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
>> +                                                  fi, tos, type, tb_id);
>> +
>> +       return err;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
>> +
>> +/**
>> + *     netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
>> + *
>> + *     @dst: route's IPv4 destination address
>> + *     @dst_len: destination address length (prefix length)
>> + *     @fi: route FIB info structure
>> + *     @tos: route TOS
>> + *     @type: route type
>> + *     @tb_id: route table ID
>> + *
>> + *     Delete IPv4 route entry from switch device.
>> + */
>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>> +                              u8 tos, u8 type, u32 tb_id)
>> +{
>> +       struct net_device *dev;
>> +       const struct net_device_ops *ops;
>> +       int err = -EOPNOTSUPP;
>> +
>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>> +       if (!dev)
>> +               return -EOPNOTSUPP;
>> +       ops = dev->netdev_ops;
>> +
>> +       if (ops->ndo_switch_fib_ipv4_del)
>> +               err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
>> +                                                  fi, tos, type, tb_id);
>> +
>> +       return err;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
>
>
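To make the return-code contract and the new out_sw_fib_del unwind path concrete, here is a minimal userspace sketch (not kernel code). The names struct fake_fib, hw_add(), hw_del() and route_insert() are hypothetical stand-ins for the driver ndo ops and fib_table_insert(); only the control flow is meant to mirror the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one route: where it ended up, and how the two
 * install steps behave. */
struct fake_fib {
	bool in_hw;              /* route offloaded to the switch device */
	bool in_kernel;          /* route present in the kernel FIB */
	int hw_add_ret;          /* what the driver's add op returns */
	bool kernel_alloc_fails; /* simulate fib_insert_node() returning NULL */
};

/* Stand-in for netdev_switch_fib_ipv4_add(). */
static int hw_add(struct fake_fib *f)
{
	if (f->hw_add_ret == 0)
		f->in_hw = true;
	return f->hw_add_ret;
}

/* Stand-in for netdev_switch_fib_ipv4_del(). As in the patch, it is also
 * reached when the add returned -EOPNOTSUPP, so it must tolerate routes
 * that were never offloaded. */
static void hw_del(struct fake_fib *f)
{
	f->in_hw = false;
}

/* Mirrors the control flow the patch gives fib_table_insert(). */
int route_insert(struct fake_fib *f)
{
	int err;

	err = hw_add(f);
	if (err && err != -EOPNOTSUPP)
		return err;          /* hard error: route goes nowhere */

	if (f->kernel_alloc_fails) {
		err = -ENOMEM;
		goto out_sw_fib_del; /* undo the offload before failing */
	}
	f->in_kernel = true;
	return 0;                    /* -EOPNOTSUPP is not reported */

out_sw_fib_del:
	hw_del(f);
	return err;
}
```

With hw_add_ret set to -EOPNOTSUPP the route lands only in the kernel FIB and route_insert() still returns 0; any other driver error leaves the route nowhere; and a later -ENOMEM removes the hardware entry again before the error is returned.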


* RE: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  5:49 ` roopa
  2015-01-02  8:00   ` Scott Feldman
@ 2015-01-02 11:21   ` Arad, Ronen
  2015-01-02 21:53     ` roopa
  1 sibling, 1 reply; 17+ messages in thread
From: Arad, Ronen @ 2015-01-02 11:21 UTC (permalink / raw)
  To: roopa, sfeldma, netdev; +Cc: jiri, john.fastabend, tgraf, jhs, andy



>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>Behalf Of roopa
>Sent: Friday, January 02, 2015 7:50 AM
>To: sfeldma@gmail.com
>Cc: netdev@vger.kernel.org; jiri@resnulli.us; john.fastabend@gmail.com;
>tgraf@suug.ch; jhs@mojatatu.com; andy@greyhouse.net
>Subject: Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
>
>On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To offload IPv4 L3 routing functions to a swdev device, the swdev device
>> driver implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops
>> are called by the core IPv4 FIB code when installing/removing FIB entries
>> to/from the kernel FIB.  On install, the driver should return 0 if the FIB
>> entry (route) can be installed to the device for offloading, -EOPNOTSUPP
>> if the route cannot be installed due to device limitations, and another
>> negative error code on failure to install the route to the device.  On
>> such a failure, the route is installed neither to the device nor to the
>> kernel FIB, and the error code is propagated back to the user-space
>> caller (via netlink).  On -EOPNOTSUPP, the route is skipped for the device
>> but still installed in the kernel FIB.
>>
>> The FIB entry (route) nexthop list is used to find the swdev device port
>> to anchor the ndo op call.  The route's fib_dev (the first nexthop's dev)
>> is used to find the swdev port by recursively traversing the fib_dev's
>> lower_dev list until a swdev port is found.  The ndo op is called on this
>> swdev port.
>
>scott, I posted a similar api for bridge attribute sets. But, nobody
>supported it.
>http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>
>If this is acceptable, I will be resubmitting my api as well.
>
There is certainly a need to propagate bridge and brport attributes to the
switchdev driver. I believe the objections to your patch were not about that
need but about the mechanism for doing it. My understanding of the objections
on the list is that the propagation has to be delegated to intermediate master
devices (such as bond/team) in a stacked architecture, instead of blindly
traversing through them to the leaf switchdev ports.
An ideal traversal would allow intermediate master (or just upper) devices to
intervene in or block the traversal, while defaulting to the suggested
transparent traversal. This could address the objections to your patch. Maybe
the traversal requires the introduction of a new ndo.
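One way to picture such a delegated traversal is a walk in which every device on the path gets a chance to veto propagation before the walk continues to its lower devices. The sketch below is a userspace model only: struct sim_dev, its intercept hook, propagate_to_ports(), count_port() and block_walk() are all hypothetical and do not correspond to any existing ndo.

```c
#include <stddef.h>

#define MAX_LOWER 4

enum walk_verdict { WALK_CONTINUE, WALK_BLOCK };

/* Hypothetical model of a net_device in a stacked setup. */
struct sim_dev {
	const char *name;
	int is_switch_port;  /* models ops->ndo_switch_parent_id_get != NULL */
	/* Hypothetical hook letting an upper/master device intervene.
	 * NULL means transparent, the default suggested above. */
	enum walk_verdict (*intercept)(struct sim_dev *dev, void *ctx);
	struct sim_dev *lower[MAX_LOWER];
	size_t n_lower;
};

/* Apply fn to every switch port below dev unless some device on the
 * path blocks the walk. Returns the number of ports fn was applied to. */
int propagate_to_ports(struct sim_dev *dev,
		       void (*fn)(struct sim_dev *port, void *ctx), void *ctx)
{
	int n = 0;
	size_t i;

	if (dev->intercept && dev->intercept(dev, ctx) == WALK_BLOCK)
		return 0;
	if (dev->is_switch_port) {
		fn(dev, ctx);
		return 1;
	}
	for (i = 0; i < dev->n_lower; i++)
		n += propagate_to_ports(dev->lower[i], fn, ctx);
	return n;
}

/* Example callback: count the ports an attribute would reach. */
void count_port(struct sim_dev *port, void *ctx)
{
	(void)port;
	(*(int *)ctx)++;
}

/* Example hook: a device that refuses blind propagation. */
enum walk_verdict block_walk(struct sim_dev *dev, void *ctx)
{
	(void)dev;
	(void)ctx;
	return WALK_BLOCK;
}
```

Leaving intercept NULL gives the transparent default; a bond or team driver could instead supply a hook that returns WALK_BLOCK (or adjusts ctx) when blind propagation to its slaves would be wrong.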
>
>
>>
>> Since the FIB entry is "naked" when pushed from the kernel, the driver/device
>> is responsible for resolving the route's nexthops to neighbor MAC addresses.
>> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
>> netevent notifier to watch for ARP activity.  Once a nexthop is resolved to
>> neighbor MAC address, it can be installed to the device and the device will
>> do the L3 routing offload in HW, for that nexthop.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/netdevice.h |   22 +++++++++++
>>   include/net/switchdev.h   |   18 +++++++++
>>   net/ipv4/fib_trie.c       |   17 ++++++++-
>>   net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 145 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 679e6e9..b66d22b 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>   typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>   				       struct sk_buff *skb);
>>
>> +struct fib_info;
>> +
>>   /*
>>    * This structure defines the management hooks for network devices.
>>    * The following hooks can be defined; unless noted otherwise, they are
>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>    * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>    *	Called to notify switch device port of bridge port STP
>>    *	state change.
>> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>> + *				  int dst_len, struct fib_info *fi,
>> + *				  u8 tos, u8 type, u32 tb_id);
>> + *	Called to add IPv4 route to switch device.
>> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>> + *				  int dst_len, struct fib_info *fi,
>> + *				  u8 tos, u8 type, u32 tb_id);
>> + *	Called to delete IPv4 route from switch device.
>>    */
>>   struct net_device_ops {
>>   	int			(*ndo_init)(struct net_device *dev);
>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>   							    struct netdev_phys_item_id *psid);
>>   	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
>>   							      u8 state);
>> +	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
>> +							   __be32 dst,
>> +							   int dst_len,
>> +							   struct fib_info *fi,
>> +							   u8 tos, u8 type,
>> +							   u32 tb_id);
>> +	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
>> +							   __be32 dst,
>> +							   int dst_len,
>> +							   struct fib_info *fi,
>> +							   u8 tos, u8 type,
>> +							   u32 tb_id);
>>   #endif
>>   };
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index 8a6d164..caebc2a 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,10 @@
>>   int netdev_switch_parent_id_get(struct net_device *dev,
>>   				struct netdev_phys_item_id *psid);
>>   int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>> +			       u8 tos, u8 type, u32 tb_id);
>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>> +			       u8 tos, u8 type, u32 tb_id);
>>
>>   #else
>>
>> @@ -32,6 +36,20 @@ static inline int netdev_switch_port_stp_update(struct net_device *dev,
>>   	return -EOPNOTSUPP;
>>   }
>>
>> +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
>> +					     struct fib_info *fi,
>> +					     u8 tos, u8 type, u32 tb_id)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
>> +					     struct fib_info *fi,
>> +					     u8 tos, u8 type, u32 tb_id)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>>   #endif
>>
>>   #endif /* _LINUX_SWITCHDEV_H_ */
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index 281e5e0..ea2dc17 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -79,6 +79,7 @@
>>   #include <net/tcp.h>
>>   #include <net/sock.h>
>>   #include <net/ip_fib.h>
>> +#include <net/switchdev.h>
>>   #include "fib_lookup.h"
>>
>>   #define MAX_STAT_DEPTH 32
>> @@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>   			fib_release_info(fi_drop);
>>   			if (state & FA_S_ACCESSED)
>>   				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>> +			netdev_switch_fib_ipv4_add(key, plen, fi, fa->fa_tos,
>> +						   cfg->fc_type, tb->tb_id);
>>   			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>   				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>
>> @@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>   	new_fa->fa_tos = tos;
>>   	new_fa->fa_type = cfg->fc_type;
>>   	new_fa->fa_state = 0;
>> +
>> +	/* (Optionally) offload fib info to switch hardware. */
>> +	err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>> +					 cfg->fc_type, tb->tb_id);
>> +	if (err && err != -EOPNOTSUPP)
>> +		goto out_free_new_fa;
>> +
>>   	/*
>>   	 * Insert new entry to the list.
>>   	 */
>> @@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>   		fa_head = fib_insert_node(t, key, plen);
>>   		if (unlikely(!fa_head)) {
>>   			err = -ENOMEM;
>> -			goto out_free_new_fa;
>> +			goto out_sw_fib_del;
>>   		}
>>   	}
>>
>> @@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>   succeeded:
>>   	return 0;
>>
>> +out_sw_fib_del:
>> +	netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
>>   out_free_new_fa:
>>   	kmem_cache_free(fn_alias_kmem, new_fa);
>>   out:
>> @@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>   	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>>   		  &cfg->fc_nlinfo, 0);
>>
>> +	netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
>> +				   cfg->fc_type, tb->tb_id);
>> +
>>   	list_del_rcu(&fa->fa_list);
>>
>>   	if (!plen)
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index d162b21..211a8a0 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/types.h>
>>   #include <linux/init.h>
>>   #include <linux/netdevice.h>
>> +#include <net/ip_fib.h>
>>   #include <net/switchdev.h>
>>
>>   /**
>> @@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>   	return ops->ndo_switch_port_stp_update(dev, state);
>>   }
>>   EXPORT_SYMBOL(netdev_switch_port_stp_update);
>> +
>> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
>> +{
>> +	const struct net_device_ops *ops = dev->netdev_ops;
>> +	struct net_device *lower_dev;
>> +	struct net_device *port_dev;
>> +	struct list_head *iter;
>> +
>> +	/* Recursively search from fib_dev down until we find
>> +	 * a sw port dev.  (A sw port dev supports
>> +	 * ndo_switch_parent_id_get).
>> +	 */
>> +
>> +	if (ops->ndo_switch_parent_id_get)
>> +		return dev;
>> +
>> +	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>> +		port_dev = netdev_switch_get_by_fib_dev(lower_dev);
>> +		if (port_dev)
>> +			return port_dev;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/**
>> + *	netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
>> + *
>> + *	@dst: route's IPv4 destination address
>> + *	@dst_len: destination address length (prefix length)
>> + *	@fi: route FIB info structure
>> + *	@tos: route TOS
>> + *	@type: route type
>> + *	@tb_id: route table ID
>> + *
>> + *	Add IPv4 route entry to switch device.
>> + */
>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>> +			       u8 tos, u8 type, u32 tb_id)
>> +{
>> +	struct net_device *dev;
>> +	const struct net_device_ops *ops;
>> +	int err = -EOPNOTSUPP;
>> +
>> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>> +	if (!dev)
>> +		return -EOPNOTSUPP;
>> +	ops = dev->netdev_ops;
>> +
>> +	if (ops->ndo_switch_fib_ipv4_add)
>> +		err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
>> +						   fi, tos, type, tb_id);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
>> +
>> +/**
>> + *	netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
>> + *
>> + *	@dst: route's IPv4 destination address
>> + *	@dst_len: destination address length (prefix length)
>> + *	@fi: route FIB info structure
>> + *	@tos: route TOS
>> + *	@type: route type
>> + *	@tb_id: route table ID
>> + *
>> + *	Delete IPv4 route entry from switch device.
>> + */
>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>> +			       u8 tos, u8 type, u32 tb_id)
>> +{
>> +	struct net_device *dev;
>> +	const struct net_device_ops *ops;
>> +	int err = -EOPNOTSUPP;
>> +
>> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>> +	if (!dev)
>> +		return -EOPNOTSUPP;
>> +	ops = dev->netdev_ops;
>> +
>> +	if (ops->ndo_switch_fib_ipv4_del)
>> +		err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
>> +						   fi, tos, type, tb_id);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  8:00   ` Scott Feldman
@ 2015-01-02 11:39     ` Arad, Ronen
  2015-01-02 17:20       ` Scott Feldman
  2015-01-02 22:57       ` roopa
  2015-01-02 20:55     ` roopa
  1 sibling, 2 replies; 17+ messages in thread
From: Arad, Ronen @ 2015-01-02 11:39 UTC (permalink / raw)
  To: Scott Feldman, roopa, Netdev
  Cc: Jirí Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek



>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>Behalf Of Scott Feldman
>Sent: Friday, January 02, 2015 10:01 AM
>To: roopa
>Cc: Netdev; Jiří Pírko; john fastabend; Thomas Graf; Jamal Hadi Salim; Andy
>Gospodarek
>Subject: Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
>
>On Thu, Jan 1, 2015 at 9:49 PM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>>>
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> To offload IPv4 L3 routing functions to a swdev device, the swdev device
>>> driver implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops
>>> are called by the core IPv4 FIB code when installing/removing FIB entries
>>> to/from the kernel FIB.  On install, the driver should return 0 if the FIB
>>> entry (route) can be installed to the device for offloading, -EOPNOTSUPP
>>> if the route cannot be installed due to device limitations, and another
>>> negative error code on failure to install the route to the device.  On
>>> such a failure, the route is installed neither to the device nor to the
>>> kernel FIB, and the error code is propagated back to the user-space
>>> caller (via netlink).  On -EOPNOTSUPP, the route is skipped for the device
>>> but still installed in the kernel FIB.
>>>
>>> The FIB entry (route) nexthop list is used to find the swdev device port
>>> to anchor the ndo op call.  The route's fib_dev (the first nexthop's dev)
>>> is used to find the swdev port by recursively traversing the fib_dev's
>>> lower_dev list until a swdev port is found.  The ndo op is called on this
>>> swdev port.
>>
>>
>> scott, I posted a similar api for bridge attribute sets. But, nobody
>> supported it.
>> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>>
>> If this is acceptable, I will be resubmitting my api as well.
>>
>
>This may get shot down as well, who knows?
>
>For routes, the nexthop dev may be a bridge or a bond for an IP on the
>router, so we have no choice but to walk down from the bridge or the
>bond to find a swport dev to call the ndo op to install the route.
>
Another case is when a VLAN-aware bridge with VLAN filtering is used. In that
case, IP interfaces are VLAN interfaces created on top of the bridge.
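This case still fits the patch's recursive lookup, because the bridge is a lower device of the VLAN interface. Below is a userspace model of netdev_switch_get_by_fib_dev(); struct sim_netdev and find_switch_port() are hypothetical names, and is_switch_port stands in for ops->ndo_switch_parent_id_get being set. It shows the walk br0.10 => br0 => bond0 => first port.

```c
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical model of a net_device and its lower-device list. */
struct sim_netdev {
	const char *name;
	int is_switch_port;  /* models ops->ndo_switch_parent_id_get != NULL */
	struct sim_netdev *lower[MAX_LOWER];
	size_t n_lower;
};

/* Depth-first search down the lower devices, like
 * netdev_switch_get_by_fib_dev(): the first switch port found wins. */
struct sim_netdev *find_switch_port(struct sim_netdev *dev)
{
	size_t i;

	if (dev->is_switch_port)
		return dev;
	for (i = 0; i < dev->n_lower; i++) {
		struct sim_netdev *port = find_switch_port(dev->lower[i]);

		if (port)
			return port;
	}
	return NULL;
}
```

Because the search returns the first port it finds, a bond or bridge with several member ports always anchors the ndo call on whichever port appears first in its lower list.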

>For bridge settings, I remember someone raised the issue that settings
>should be propagated down the dev hierarchy, with parent calling
>child's op and so on.  I'll go back and look at your post.
>
This was my comment. I'm not sure it was correct. My concern was the use case
of a VLAN interface on top of a VLAN-aware bridge. I now believe that such
interfaces are upper devices of the bridge (not master devices). Therefore, a
traversal starting at a VLAN interface on top of a bridge would follow the
path: VLAN interface => bridge => [team/bond] => switchdev port.
One complication here is that the VLAN context is important. A "naked" nexthop
must only be resolved within the VLAN associated with the VLAN interface. When
the Linux stack performs ARP resolution, it goes via the VLAN interface, which
tags the packet before handing it to the bridge. The VLAN-aware bridge floods
such a packet only to member ports of that VLAN. This software bridge behavior
has to be preserved with offloaded L3 forwarding and offloaded L2 switching.
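The VLAN constraint on nexthop resolution can be modeled by keying pending nexthops on (VLAN ID, IPv4 address), so that a neighbor update learned in one VLAN never resolves a nexthop in another. This is a hypothetical userspace sketch: struct sim_nexthop, nh_add() and nh_neigh_update() are illustrative names. In a real driver the VLAN would come from the route's egress VLAN device, and updates would arrive through a netevent notifier (register_netevent_notifier(), NETEVENT_NEIGH_UPDATE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NH 8

/* Hypothetical "naked" nexthop waiting for its neighbor MAC. */
struct sim_nexthop {
	uint16_t vid;     /* VLAN of the egress VLAN interface */
	uint32_t ip;      /* nexthop IPv4 address, host order */
	uint8_t mac[6];   /* filled in once resolved */
	bool resolved;    /* true once it may be programmed in HW */
};

struct sim_nh_table {
	struct sim_nexthop nh[MAX_NH];
	size_t n;
};

/* Record a nexthop from a route add; its MAC is not known yet. */
bool nh_add(struct sim_nh_table *t, uint16_t vid, uint32_t ip)
{
	if (t->n == MAX_NH)
		return false;
	t->nh[t->n] = (struct sim_nexthop){ .vid = vid, .ip = ip };
	t->n++;
	return true;
}

/* Handle a neighbor update seen on VLAN vid. Only nexthops in the same
 * VLAN match. Returns the number of nexthops resolved. */
int nh_neigh_update(struct sim_nh_table *t, uint16_t vid, uint32_t ip,
		    const uint8_t mac[6])
{
	int n = 0;
	size_t i;

	for (i = 0; i < t->n; i++) {
		struct sim_nexthop *nh = &t->nh[i];

		if (nh->vid != vid || nh->ip != ip)
			continue;
		memcpy(nh->mac, mac, sizeof(nh->mac));
		nh->resolved = true;
		n++;
	}
	return n;
}
```

The same IPv4 address can legitimately appear as a nexthop in two VLANs; keying on the pair keeps an ARP reply in one VLAN from programming hardware for the other.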
>>
>>
>>>
>>> Since the FIB entry is "naked" when pushed from the kernel, the
>>> driver/device is responsible for resolving the route's nexthops to
>>> neighbor MAC addresses.  This can be done by the driver by monitoring
>>> NETEVENT_NEIGH_UPDATE netevent notifier to watch for ARP activity.
>>> Once a nexthop is resolved to neighbor MAC address, it can be
>>> installed to the device and the device will do the L3 routing
>>> offload in HW, for that nexthop.
>>>
>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>   include/linux/netdevice.h |   22 +++++++++++
>>>   include/net/switchdev.h   |   18 +++++++++
>>>   net/ipv4/fib_trie.c       |   17 ++++++++-
>>>   net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>>>   4 files changed, 145 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 679e6e9..b66d22b 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>>   typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>                                        struct sk_buff *skb);
>>>
>>> +struct fib_info;
>>> +
>>>   /*
>>>    * This structure defines the management hooks for network devices.
>>>    * The following hooks can be defined; unless noted otherwise, they are
>>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>    * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>>    *    Called to notify switch device port of bridge port STP
>>>    *    state change.
>>> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>>> + *                                int dst_len, struct fib_info *fi,
>>> + *                                u8 tos, u8 type, u32 tb_id);
>>> + *     Called to add IPv4 route to switch device.
>>> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>>> + *                                int dst_len, struct fib_info *fi,
>>> + *                                u8 tos, u8 type, u32 tb_id);
>>> + *     Called to delete IPv4 route from switch device.
>>>    */
>>>   struct net_device_ops {
>>>         int                     (*ndo_init)(struct net_device *dev);
>>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>>                                                             struct netdev_phys_item_id *psid);
>>>         int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>>>                                                               u8 state);
>>> +       int                     (*ndo_switch_fib_ipv4_add)(struct net_device *dev,
>>> +                                                          __be32 dst,
>>> +                                                          int dst_len,
>>> +                                                          struct fib_info *fi,
>>> +                                                          u8 tos, u8 type,
>>> +                                                          u32 tb_id);
>>> +       int                     (*ndo_switch_fib_ipv4_del)(struct net_device *dev,
>>> +                                                          __be32 dst,
>>> +                                                          int dst_len,
>>> +                                                          struct fib_info *fi,
>>> +                                                          u8 tos, u8 type,
>>> +                                                          u32 tb_id);
>>>   #endif
>>>   };
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index 8a6d164..caebc2a 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,10 @@
>>>   int netdev_switch_parent_id_get(struct net_device *dev,
>>>                                 struct netdev_phys_item_id *psid);
>>>   int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>> +                              u8 tos, u8 type, u32 tb_id);
>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>> +                              u8 tos, u8 type, u32 tb_id);
>>>
>>>   #else
>>>
>>> @@ -32,6 +36,20 @@ static inline int netdev_switch_port_stp_update(struct net_device *dev,
>>>         return -EOPNOTSUPP;
>>>   }
>>>
>>> +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
>>> +                                            struct fib_info *fi,
>>> +                                            u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +       return -EOPNOTSUPP;
>>> +}
>>> +
>>> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
>>> +                                            struct fib_info *fi,
>>> +                                            u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +       return -EOPNOTSUPP;
>>> +}
>>> +
>>>   #endif
>>>
>>>   #endif /* _LINUX_SWITCHDEV_H_ */
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index 281e5e0..ea2dc17 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -79,6 +79,7 @@
>>>   #include <net/tcp.h>
>>>   #include <net/sock.h>
>>>   #include <net/ip_fib.h>
>>> +#include <net/switchdev.h>
>>>   #include "fib_lookup.h"
>>>
>>>   #define MAX_STAT_DEPTH 32
>>> @@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>                         fib_release_info(fi_drop);
>>>                         if (state & FA_S_ACCESSED)
>>>                                 rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +                       netdev_switch_fib_ipv4_add(key, plen, fi, fa->fa_tos,
>>> +                                                  cfg->fc_type, tb->tb_id);
>>>                         rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>                                 tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>
>>> @@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>         new_fa->fa_tos = tos;
>>>         new_fa->fa_type = cfg->fc_type;
>>>         new_fa->fa_state = 0;
>>> +
>>> +       /* (Optionally) offload fib info to switch hardware. */
>>> +       err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>>> +                                        cfg->fc_type, tb->tb_id);
>>> +       if (err && err != -EOPNOTSUPP)
>>> +               goto out_free_new_fa;
>>> +
>>>         /*
>>>          * Insert new entry to the list.
>>>          */
>>> @@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>                 fa_head = fib_insert_node(t, key, plen);
>>>                 if (unlikely(!fa_head)) {
>>>                         err = -ENOMEM;
>>> -                       goto out_free_new_fa;
>>> +                       goto out_sw_fib_del;
>>>                 }
>>>         }
>>>
>>> @@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>   succeeded:
>>>         return 0;
>>>
>>> +out_sw_fib_del:
>>> +       netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
>>>   out_free_new_fa:
>>>         kmem_cache_free(fn_alias_kmem, new_fa);
>>>   out:
>>> @@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>>         rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>>>                   &cfg->fc_nlinfo, 0);
>>>
>>> +       netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
>>> +                                  cfg->fc_type, tb->tb_id);
>>> +
>>>         list_del_rcu(&fa->fa_list);
>>>
>>>         if (!plen)
>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>> index d162b21..211a8a0 100644
>>> --- a/net/switchdev/switchdev.c
>>> +++ b/net/switchdev/switchdev.c
>>> @@ -12,6 +12,7 @@
>>>   #include <linux/types.h>
>>>   #include <linux/init.h>
>>>   #include <linux/netdevice.h>
>>> +#include <net/ip_fib.h>
>>>   #include <net/switchdev.h>
>>>
>>>   /**
>>> @@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>         return ops->ndo_switch_port_stp_update(dev, state);
>>>   }
>>>   EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>> +
>>> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
>>> +{
>>> +       const struct net_device_ops *ops = dev->netdev_ops;
>>> +       struct net_device *lower_dev;
>>> +       struct net_device *port_dev;
>>> +       struct list_head *iter;
>>> +
>>> +       /* Recursively search from fib_dev down until we find
>>> +        * a sw port dev.  (A sw port dev supports
>>> +        * ndo_switch_parent_id_get).
>>> +        */
>>> +
>>> +       if (ops->ndo_switch_parent_id_get)
>>> +               return dev;
>>> +
>>> +       netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>> +               port_dev = netdev_switch_get_by_fib_dev(lower_dev);
>>> +               if (port_dev)
>>> +                       return port_dev;
>>> +       }
>>> +
>>> +       return NULL;
>>> +}
>>> +
>>> +/**
>>> + *     netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
>>> + *
>>> + *     @dst: route's IPv4 destination address
>>> + *     @dst_len: destination address length (prefix length)
>>> + *     @fi: route FIB info structure
>>> + *     @tos: route TOS
>>> + *     @type: route type
>>> + *     @tb_id: route table ID
>>> + *
>>> + *     Add IPv4 route entry to switch device.
>>> + */
>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>> +                              u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +       struct net_device *dev;
>>> +       const struct net_device_ops *ops;
>>> +       int err = -EOPNOTSUPP;
>>> +
>>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>> +       if (!dev)
>>> +               return -EOPNOTSUPP;
>>> +       ops = dev->netdev_ops;
>>> +
>>> +       if (ops->ndo_switch_fib_ipv4_add)
>>> +               err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
>>> +                                                  fi, tos, type, tb_id);
>>> +
>>> +       return err;
>>> +}
>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
>>> +
>>> +/**
>>> + *     netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
>>> + *
>>> + *     @dst: route's IPv4 destination address
>>> + *     @dst_len: destination address length (prefix length)
>>> + *     @fi: route FIB info structure
>>> + *     @tos: route TOS
>>> + *     @type: route type
>>> + *     @tb_id: route table ID
>>> + *
>>> + *     Delete IPv4 route entry from switch device.
>>> + */
>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>> +                              u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +       struct net_device *dev;
>>> +       const struct net_device_ops *ops;
>>> +       int err = -EOPNOTSUPP;
>>> +
>>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>> +       if (!dev)
>>> +               return -EOPNOTSUPP;
>>> +       ops = dev->netdev_ops;
>>> +
>>> +       if (ops->ndo_switch_fib_ipv4_del)
>>> +               err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
>>> +                                                  fi, tos, type, tb_id);
>>> +
>>> +       return err;
>>> +}
>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
>>
>>


* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02 11:39     ` Arad, Ronen
@ 2015-01-02 17:20       ` Scott Feldman
  2015-01-02 22:57       ` roopa
  1 sibling, 0 replies; 17+ messages in thread
From: Scott Feldman @ 2015-01-02 17:20 UTC (permalink / raw)
  To: Arad, Ronen
  Cc: roopa, Netdev, Jirí Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek

On Fri, Jan 2, 2015 at 3:39 AM, Arad, Ronen <ronen.arad@intel.com> wrote:
>
>
>>-----Original Message-----
>>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>>Behalf Of Scott Feldman
>>Sent: Friday, January 02, 2015 10:01 AM
>>To: roopa
>>Cc: Netdev; Jiří Pírko; john fastabend; Thomas Graf; Jamal Hadi Salim; Andy
>>Gospodarek
>>Subject: Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
>>
>>On Thu, Jan 1, 2015 at 9:49 PM, roopa <roopa@cumulusnetworks.com> wrote:
>>> On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>>>>
>>>> From: Scott Feldman <sfeldma@gmail.com>
>>>>
>>>> To offload IPv4 L3 routing functions to a swdev device, the swdev device
>>>> driver implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops
>>>> are called by the core IPv4 FIB code when installing/removing FIB entries
>>>> to/from the kernel FIB.  On install, the driver should return 0 if the FIB
>>>> entry (route) can be installed to the device for offloading, -EOPNOTSUPP
>>>> if the route cannot be installed due to device limitations, and another
>>>> negative error code on failure to install the route to the device.  On
>>>> such a failure, the route is installed neither to the device nor to the
>>>> kernel FIB, and the error code is propagated back to the user-space
>>>> caller (via netlink).  On -EOPNOTSUPP, the route is skipped for the device
>>>> but still installed in the kernel FIB.
>>>>
>>>> The FIB entry (route) nexthop list is used to find the swdev device port
>>>> to anchor the ndo op call.  The route's fib_dev (the first nexthop's dev)
>>>> is used to find the swdev port by recursively traversing the fib_dev's
>>>> lower_dev list until a swdev port is found.  The ndo op is called on this
>>>> swdev port.
>>>
>>>
>>> scott, I posted a similar api for bridge attribute sets. But, nobody
>>> supported it.
>>> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>>>
>>> If this is acceptable, I will be resubmitting my api as well.
>>>
>>
>>This may get shot down as well, who knows?
>>
>>For routes, the nexthop dev may be a bridge or a bond for an IP on the
>>router, so we have no choice but to walk down from the bridge or the
>>bond to find a swport dev to call the ndo op to install the route.
>>
> Another case is when a VLAN-aware bridge with VLAN filtering is used. In that
> case, the IP interfaces are VLAN interfaces created on top of the bridge.
>
>>For bridge settings, I remember someone raised the issue that settings
>>should be propagated down the dev hierarchy, with parent calling
>>child's op and so on.  I'll go back and look at your post.
>>
> This was my comment. I'm not sure it was correct. My concern was the VLAN
> interface on top of a VLAN-aware bridge use-case. I now believe that such
> interfaces are upper devices of the bridge (not master). Therefore, it seems
> that traversal starting at a VLAN interface on top of a bridge will follow a
> path: VLAN interface => bridge => [team/bond] => switchdev port.

With the VLAN support built into the bridge, we can avoid the VLAN
interface, which makes it a little better:

bridge/vlan => [team/bond] => swdev port
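For illustration only, the walk down that path can be modeled in userspace as a depth-first search over lower-dev lists, in the same shape as netdev_switch_get_by_fib_dev() in the patch. This is a sketch under assumptions: struct model_dev, MAX_LOWER and find_swport() are hypothetical stand-ins (not struct net_device or netdev_for_each_lower_dev()), and the is_swport flag models "the device implements ndo_switch_parent_id_get".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical model of a stacked net device (not struct net_device). */
struct model_dev {
	const char *name;
	bool is_swport;   /* models "implements ndo_switch_parent_id_get" */
	size_t n_lower;   /* number of valid entries in lower[] */
	struct model_dev *lower[MAX_LOWER];
};

/* Depth-first walk down the lower-dev lists, returning the first swdev
 * port found, or NULL if no path from dev reaches one. */
static struct model_dev *find_swport(struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	if (dev->is_swport)
		return dev;

	for (i = 0; i < dev->n_lower; i++) {
		struct model_dev *port = find_swport(dev->lower[i]);

		if (port)
			return port;
	}
	return NULL;
}
```

With br0 => bond0 => {swp1, swp2}, find_swport() on br0 returns swp1. As in the patch, only the first port found is returned, so the other bond member is never visited by this walk.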

> One complication here is that the VLAN context is important. A "naked" nexthop
> shall only be resolved within the VLAN associated with the VLAN interface. When
> ARP resolution is performed by Linux stack, it goes via the VLAN interface
> which imposes a tag on the packet before handing it to the bridge. The VLAN-
> aware bridge floods such packet only to member ports of the VLAN. This behavior
> of the software bridge has to be preserved with offloaded L3 forwarding and
> offloaded L2 switching.

Good point, and valid for bridge/vlans as well.
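Arad's point, combined with the patch's note that the driver resolves "naked" nexthops by watching NETEVENT_NEIGH_UPDATE, amounts to matching neighbor updates on (VLAN, IP) rather than on IP alone. A minimal userspace sketch, assuming a hypothetical struct model_nexthop and neigh_update(): the (vid, ip, mac) triple stands in for what a real notifier handler would have to extract from the neighbour entry and its device, and how a driver maps that device to a VLAN is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver-side record for one nexthop of a "naked" route. */
struct model_nexthop {
	uint16_t vid;     /* VLAN of the IP interface the nexthop is reached through */
	uint32_t ip;      /* nexthop IPv4 address (host byte order, for simplicity) */
	bool resolved;    /* true once a MAC has been learned in the same VLAN */
	uint8_t mac[6];
};

/* Model of a neighbor-update handler: apply (vid, ip, mac) to matching
 * nexthops. A nexthop matches only if both the VLAN and the IP agree, so
 * ARP activity in one VLAN never resolves a nexthop in another. Already
 * resolved entries get their MAC refreshed. Returns how many nexthops
 * became resolved by this update. */
static size_t neigh_update(struct model_nexthop *nh, size_t n, uint16_t vid,
			   uint32_t ip, const uint8_t mac[6])
{
	size_t i, newly = 0;

	assert(nh != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (nh[i].vid != vid || nh[i].ip != ip)
			continue;
		memcpy(nh[i].mac, mac, sizeof(nh[i].mac));
		if (!nh[i].resolved) {
			nh[i].resolved = true;
			newly++;
		}
	}
	return newly;
}
```

Only once an entry is resolved would the driver program it into the device, which keeps the offloaded L3 path consistent with the software bridge flooding ARP only within the VLAN.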



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  8:00   ` Scott Feldman
  2015-01-02 11:39     ` Arad, Ronen
@ 2015-01-02 20:55     ` roopa
  1 sibling, 0 replies; 17+ messages in thread
From: roopa @ 2015-01-02 20:55 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek

On 1/2/15, 12:00 AM, Scott Feldman wrote:
> On Thu, Jan 1, 2015 at 9:49 PM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
>>> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
>>> by the core IPv4 FIB code when installing/removing FIB entries to/from the
>>> kernel FIB.  On install, the driver should return 0 if FIB entry (route) can be
>>> installed to device for offloading, -EOPNOTSUPP if route cannot be installed
>>> due to device limitations, and other negative error code on failure to install
>>> route to device.  On failure error code, the route is not installed to device,
>>> and not installed in kernel FIB, and the return code is propagated back to the
>>> user-space caller (via netlink).  An -EOPNOTSUPP error code is skipped for the
>>> device but installed in the kernel FIB.
>>>
>>> The FIB entry (route) nexthop list is used to find the swdev device port to
>>> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
>>> to find the swdev port by recursively traversing the fib_dev's lower_dev list
>>> until a swdev port is found.  The ndo op is called on this swdev port.
>>
>> scott, I posted a similar api for bridge attribute sets. But, nobody
>> supported it.
>> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>>
>> If this is acceptable, I will be resubmitting my api as well.
>>
> This may get shot down as well, who knows?
The concern about traversing stacked devices came from Jiri.
I was wondering whether we had changed our minds on this, hence my last comment.
> For routes, the nexthop dev may be a bridge or a bond for an IP on the
> router, so we have no choice but to walk down from the bridge or the
> bond to find a swport dev to call the ndo op to install the route.

I understand. When I posted my patches, I did bring up L3 ops, and noted that
this would be needed for L3 as well, when nexthops are stacked devices on top
of switch ports. So it was meant as a generic concept for all such ops.

>
> For bridge settings, I remember someone raised the issue that settings
> should be propagated down the dev hierarchy, with parent calling
> child's op and so on.  I'll go back and look at your post.
AFAIR it was Jiri.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02 11:21   ` Arad, Ronen
@ 2015-01-02 21:53     ` roopa
  0 siblings, 0 replies; 17+ messages in thread
From: roopa @ 2015-01-02 21:53 UTC (permalink / raw)
  To: Arad, Ronen; +Cc: sfeldma, netdev, jiri, john.fastabend, tgraf, jhs, andy

On 1/2/15, 3:21 AM, Arad, Ronen wrote:
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>> Behalf Of roopa
>> Sent: Friday, January 02, 2015 7:50 AM
>> To: sfeldma@gmail.com
>> Cc: netdev@vger.kernel.org; jiri@resnulli.us; john.fastabend@gmail.com;
>> tgraf@suug.ch; jhs@mojatatu.com; andy@greyhouse.net
>> Subject: Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
>>
>> On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
>>> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
>>> by the core IPv4 FIB code when installing/removing FIB entries to/from the
>>> kernel FIB.  On install, the driver should return 0 if FIB entry (route) can be
>>> installed to device for offloading, -EOPNOTSUPP if route cannot be installed
>>> due to device limitations, and other negative error code on failure to install
>>> route to device.  On failure error code, the route is not installed to device,
>>> and not installed in kernel FIB, and the return code is propagated back to the
>>> user-space caller (via netlink).  An -EOPNOTSUPP error code is skipped for the
>>> device but installed in the kernel FIB.
>>>
>>> The FIB entry (route) nexthop list is used to find the swdev device port to
>>> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
>>> to find the swdev port by recursively traversing the fib_dev's lower_dev list
>>> until a swdev port is found.  The ndo op is called on this swdev port.
>> scott, I posted a similar api for bridge attribute sets. But, nobody
>> supported it.
>> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>>
>> If this is acceptable, I will be resubmitting my api as well.
>>
> There is certainly a need to propagate bridge and brport attributes to
> switchdev driver. I believe the objections to your patch were not about that
> need but about the mechanism of doing that.
I understand that. The objection was only about the mechanism, and in my last
comment I was only commenting on the mechanism. My motivation for bringing it
up (as I had indicated during my patch submission) is that this will be needed
for most offloads.

>   My understanding of the objections
> on the list is that the propagation has to be delegated to intermediate master
> devices (such as bond/team) in a stacked architecture instead of blindly
> traversing through them to leaf switchdev ports.
Yes, and the open question was whether intermediate masters should care.
> An ideal traversal would allow intermediate master (or just upper) devices to
> intervene or block the traversal while defaulting to the suggested transparent
> traversal. This could address the objections to your patch.
I thought I had addressed that. At every point you check whether the
intermediate lower dev implements the op. If it does, you call the op
on that netdev, and that terminates the traversal at that netdev (in
this case the intermediate master, if it is capable of handling that op).
> Maybe the traversal
> requires an introduction of a new ndo.
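Roopa's rule (the first device on a path that implements the op handles the call and ends the walk below it, which also gives an intermediate master the chance to intervene, as Ronen asks) can be sketched in user-space C. Everything here is hypothetical: `struct dev`, `struct dev_ops` and `attr_set` are simplified stand-ins, not the kernel's `struct net_device` or any real ndo.

```c
#include <errno.h>

#define MAX_LOWER 4

struct dev;

/* Hypothetical stand-in for struct net_device_ops: one optional op. */
struct dev_ops {
	int (*attr_set)(struct dev *dev, int attr, int val);
};

/* Hypothetical stand-in for struct net_device: just enough for the walk. */
struct dev {
	const char *name;
	const struct dev_ops *ops;	/* NULL or op unset: pass through */
	struct dev *lower[MAX_LOWER];	/* lower devices (ports/slaves) */
	int n_lower;
	int last_val;			/* demo only: last value applied */
};

/* Depth-first walk from @dev.  The first device on each path that
 * implements attr_set handles the call and ends the walk for its subtree,
 * so an intermediate master (e.g. a bond) can intercept, fan out or veto.
 * Returns 0 if at least one device handled the call, the first hard error
 * seen, or -EOPNOTSUPP if nothing at or below @dev implements the op. */
static int attr_set_walk(struct dev *dev, int attr, int val)
{
	int i, err, ret = -EOPNOTSUPP;

	if (dev->ops && dev->ops->attr_set)
		return dev->ops->attr_set(dev, attr, val);

	for (i = 0; i < dev->n_lower; i++) {
		err = attr_set_walk(dev->lower[i], attr, val);
		if (err == -EOPNOTSUPP)
			continue;	/* nobody on this path cares */
		if (err)
			return err;	/* hard failure: stop the walk */
		ret = 0;
	}
	return ret;
}

/* Demo op: record the value instead of programming hardware. */
static int record_attr_set(struct dev *dev, int attr, int val)
{
	(void)attr;
	dev->last_val = val;
	return 0;
}
```

With a bridge over a bond (which implements the op) and a plain port, the bond takes the call for its subtree, so its slave is not visited (a real bond would presumably forward it to its slaves itself), while the direct bridge port receives it.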


>>
>>> Since the FIB entry is "naked" when pushed from the kernel, the driver/device
>>> is responsible for resolving the route's nexthops to neighbor MAC addresses.
>>> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
>>> netevent notifier to watch for ARP activity.  Once a nexthop is resolved to
>>> neighbor MAC address, it can be installed to the device and the device will
>>> do the L3 routing offload in HW, for that nexthop.
>>>
>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>    include/linux/netdevice.h |   22 +++++++++++
>>>    include/net/switchdev.h   |   18 +++++++++
>>>    net/ipv4/fib_trie.c       |   17 ++++++++-
>>>    net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>>>    4 files changed, 145 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 679e6e9..b66d22b 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>>    typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>    				       struct sk_buff *skb);
>>>
>>> +struct fib_info;
>>> +
>>>    /*
>>>     * This structure defines the management hooks for network devices.
>>>     * The following hooks can be defined; unless noted otherwise, they are
>>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>     * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>>     *	Called to notify switch device port of bridge port STP
>>>     *	state change.
>>> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>>> + *				     int dst_len, struct fib_info *fi,
>>> + *				     u8 tos, u8 type, u32 tb_id);
>>> + *	Called to add IPv4 route to switch device.
>>> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>>> + *				     int dst_len, struct fib_info *fi,
>>> + *				     u8 tos, u8 type, u32 tb_id);
>>> + *	Called to delete IPv4 route from switch device.
>>>     */
>>>    struct net_device_ops {
>>>    	int			(*ndo_init)(struct net_device *dev);
>>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>>    							    struct netdev_phys_item_id *psid);
>>>    	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
>>>    							      u8 state);
>>> +	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
>>> +							   __be32 dst,
>>> +							   int dst_len,
>>> +							   struct fib_info *fi,
>>> +							   u8 tos, u8 type,
>>> +							   u32 tb_id);
>>> +	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
>>> +							   __be32 dst,
>>> +							   int dst_len,
>>> +							   struct fib_info *fi,
>>> +							   u8 tos, u8 type,
>>> +							   u32 tb_id);
>>>    #endif
>>>    };
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index 8a6d164..caebc2a 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,10 @@
>>>    int netdev_switch_parent_id_get(struct net_device *dev,
>>>    				struct netdev_phys_item_id *psid);
>>>    int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>> +			       u8 tos, u8 type, u32 tb_id);
>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>> +			       u8 tos, u8 type, u32 tb_id);
>>>
>>>    #else
>>>
>>> @@ -32,6 +36,20 @@ static inline int netdev_switch_port_stp_update(struct net_device *dev,
>>>    	return -EOPNOTSUPP;
>>>    }
>>>
>>> +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
>>> +					     struct fib_info *fi,
>>> +					     u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +	return -EOPNOTSUPP;
>>> +}
>>> +
>>> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
>>> +					     struct fib_info *fi,
>>> +					     u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +	return -EOPNOTSUPP;
>>> +}
>>> +
>>>    #endif
>>>
>>>    #endif /* _LINUX_SWITCHDEV_H_ */
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index 281e5e0..ea2dc17 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -79,6 +79,7 @@
>>>    #include <net/tcp.h>
>>>    #include <net/sock.h>
>>>    #include <net/ip_fib.h>
>>> +#include <net/switchdev.h>
>>>    #include "fib_lookup.h"
>>>
>>>    #define MAX_STAT_DEPTH 32
>>> @@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>    			fib_release_info(fi_drop);
>>>    			if (state & FA_S_ACCESSED)
>>>    				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +			netdev_switch_fib_ipv4_add(key, plen, fi, fa->fa_tos,
>>> +						   cfg->fc_type, tb->tb_id);
>>>    			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>    				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>
>>> @@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>    	new_fa->fa_tos = tos;
>>>    	new_fa->fa_type = cfg->fc_type;
>>>    	new_fa->fa_state = 0;
>>> +
>>> +	/* (Optionally) offload fib info to switch hardware. */
>>> +	err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>>> +					 cfg->fc_type, tb->tb_id);
>>> +	if (err && err != -EOPNOTSUPP)
>>> +		goto out_free_new_fa;
>>> +
>>>    	/*
>>>    	 * Insert new entry to the list.
>>>    	 */
>>> @@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>    		fa_head = fib_insert_node(t, key, plen);
>>>    		if (unlikely(!fa_head)) {
>>>    			err = -ENOMEM;
>>> -			goto out_free_new_fa;
>>> +			goto out_sw_fib_del;
>>>    		}
>>>    	}
>>>
>>> @@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>    succeeded:
>>>    	return 0;
>>>
>>> +out_sw_fib_del:
>>> +	netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
>>>    out_free_new_fa:
>>>    	kmem_cache_free(fn_alias_kmem, new_fa);
>>>    out:
>>> @@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>>    	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>>>    		  &cfg->fc_nlinfo, 0);
>>>
>>> +	netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
>>> +				   cfg->fc_type, tb->tb_id);
>>> +
>>>    	list_del_rcu(&fa->fa_list);
>>>
>>>    	if (!plen)
>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>> index d162b21..211a8a0 100644
>>> --- a/net/switchdev/switchdev.c
>>> +++ b/net/switchdev/switchdev.c
>>> @@ -12,6 +12,7 @@
>>>    #include <linux/types.h>
>>>    #include <linux/init.h>
>>>    #include <linux/netdevice.h>
>>> +#include <net/ip_fib.h>
>>>    #include <net/switchdev.h>
>>>
>>>    /**
>>> @@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>    	return ops->ndo_switch_port_stp_update(dev, state);
>>>    }
>>>    EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>> +
>>> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
>>> +{
>>> +	const struct net_device_ops *ops = dev->netdev_ops;
>>> +	struct net_device *lower_dev;
>>> +	struct net_device *port_dev;
>>> +	struct list_head *iter;
>>> +
>>> +	/* Recursively search from fib_dev down until we find
>>> +	 * a sw port dev.  (A sw port dev supports
>>> +	 * ndo_switch_parent_id_get).
>>> +	 */
>>> +
>>> +	if (ops->ndo_switch_parent_id_get)
>>> +		return dev;
>>> +
>>> +	netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>> +		port_dev = netdev_switch_get_by_fib_dev(lower_dev);
>>> +		if (port_dev)
>>> +			return port_dev;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + *	netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
>>> + *
>>> + *	@dst: route's IPv4 destination address
>>> + *	@dst_len: destination address length (prefix length)
>>> + *	@fi: route FIB info structure
>>> + *	@tos: route TOS
>>> + *	@type: route type
>>> + *	@tb_id: route table ID
>>> + *
>>> + *	Add IPv4 route entry to switch device.
>>> + */
>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>> +			       u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +	struct net_device *dev;
>>> +	const struct net_device_ops *ops;
>>> +	int err = -EOPNOTSUPP;
>>> +
>>> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>> +	if (!dev)
>>> +		return -EOPNOTSUPP;
>>> +	ops = dev->netdev_ops;
>>> +
>>> +	if (ops->ndo_switch_fib_ipv4_add)
>>> +		err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
>>> +						   fi, tos, type, tb_id);
>>> +
>>> +	return err;
>>> +}
>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
>>> +
>>> +/**
>>> + *	netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
>>> + *
>>> + *	@dst: route's IPv4 destination address
>>> + *	@dst_len: destination address length (prefix length)
>>> + *	@fi: route FIB info structure
>>> + *	@tos: route TOS
>>> + *	@type: route type
>>> + *	@tb_id: route table ID
>>> + *
>>> + *	Delete IPv4 route entry from switch device.
>>> + */
>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>> +			       u8 tos, u8 type, u32 tb_id)
>>> +{
>>> +	struct net_device *dev;
>>> +	const struct net_device_ops *ops;
>>> +	int err = -EOPNOTSUPP;
>>> +
>>> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>> +	if (!dev)
>>> +		return -EOPNOTSUPP;
>>> +	ops = dev->netdev_ops;
>>> +
>>> +	if (ops->ndo_switch_fib_ipv4_del)
>>> +		err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
>>> +						   fi, tos, type, tb_id);
>>> +
>>> +	return err;
>>> +}
>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
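The install contract in the commit message, together with the rollback through `out_sw_fib_del` when the later trie insert fails, can be summarized by the user-space model below. The type and function names are made up for illustration; only the decision logic is meant to mirror the patched `fib_table_insert()`. It does not model the NLM_F_REPLACE path, where the patch ignores the return value of `netdev_switch_fib_ipv4_add()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of where a new route ends up. */
struct route_install_result {
	int err;		/* returned to user space via netlink */
	bool in_hw;		/* offloaded to the switch device */
	bool in_kernel_fib;	/* present in the kernel FIB */
};

/* @hw_err: what netdev_switch_fib_ipv4_add() returned.
 * @fib_err: result of the later trie insert (e.g. -ENOMEM when
 * fib_insert_node() fails).  The trie insert only happens when the
 * offload step did not fail hard. */
static struct route_install_result route_install(int hw_err, int fib_err)
{
	struct route_install_result r = { 0, false, false };

	if (hw_err && hw_err != -EOPNOTSUPP) {
		/* Hard offload failure: route goes nowhere. */
		r.err = hw_err;
		return r;
	}
	/* 0: offloaded; -EOPNOTSUPP: software forwarding only. */
	r.in_hw = (hw_err == 0);

	if (fib_err) {
		/* Trie insert failed: undo the offload (out_sw_fib_del). */
		r.in_hw = false;
		r.err = fib_err;
		return r;
	}
	r.in_kernel_fib = true;
	return r;
}
```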
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02 11:39     ` Arad, Ronen
  2015-01-02 17:20       ` Scott Feldman
@ 2015-01-02 22:57       ` roopa
  1 sibling, 0 replies; 17+ messages in thread
From: roopa @ 2015-01-02 22:57 UTC (permalink / raw)
  To: Arad, Ronen
  Cc: Scott Feldman, Netdev, Jiří Pírko, john fastabend,
	Thomas Graf, Jamal Hadi Salim, Andy Gospodarek

On 1/2/15, 3:39 AM, Arad, Ronen wrote:
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>> Behalf Of Scott Feldman
>> Sent: Friday, January 02, 2015 10:01 AM
>> To: roopa
>> Cc: Netdev; Jiří Pírko; john fastabend; Thomas Graf; Jamal Hadi Salim; Andy
>> Gospodarek
>> Subject: Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
>>
>> On Thu, Jan 1, 2015 at 9:49 PM, roopa <roopa@cumulusnetworks.com> wrote:
>>> On 1/1/15, 7:29 PM, sfeldma@gmail.com wrote:
>>>> From: Scott Feldman <sfeldma@gmail.com>
>>>>
>>>> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
>>>> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
>>>> by the core IPv4 FIB code when installing/removing FIB entries to/from the
>>>> kernel FIB.  On install, the driver should return 0 if FIB entry (route) can be
>>>> installed to device for offloading, -EOPNOTSUPP if route cannot be installed
>>>> due to device limitations, and other negative error code on failure to install
>>>> route to device.  On failure error code, the route is not installed to device,
>>>> and not installed in kernel FIB, and the return code is propagated back to the
>>>> user-space caller (via netlink).  An -EOPNOTSUPP error code is skipped for the
>>>> device but installed in the kernel FIB.
>>>>
>>>> The FIB entry (route) nexthop list is used to find the swdev device port to
>>>> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
>>>> to find the swdev port by recursively traversing the fib_dev's lower_dev list
>>>> until a swdev port is found.  The ndo op is called on this swdev port.
>>>
>>> Scott, I posted a similar API for bridge attribute sets, but nobody
>>> supported it.
>>> http://marc.info/?l=linux-netdev&m=141820234410602&w=2
>>>
>>> If this is acceptable, I will be resubmitting my api as well.
>>>
>> This may get shot down as well, who knows?
>>
>> For routes, the nexthop dev may be a bridge or a bond for an IP on the
>> router, so we have no choice but to walk down from the bridge or the
>> bond to find a swport dev to call the ndo op to install the route.
>>
> Another case is when a VLAN-aware bridge with VLAN filtering is used. In that
> case IP interfaces are VLAN interfaces created on top of the bridge.
>
>> For bridge settings, I remember someone raised the issue that settings
>> should be propagated down the dev hierarchy, with parent calling
>> child's op and so on.  I'll go back and look at your post.
>>
> This was my comment. I'm not sure it was correct. My concern was the VLAN
> interface on top of a VLAN-aware bridge use-case. I now believe that such
> interfaces are upper devices of the bridge (not master). Therefore, it seems
> that traversal starting at a VLAN interface on top of a bridge will follow a
> path: VLAN interface => bridge => [team/bond] => switchdev port.
For L3 this seems right. My patches were doing the same thing, only for
L2 (VLAN-filtering bridge), and only for bridge attributes (learning,
flooding, etc.), which would look like this:
bridge => [team/bond] => switchdev port.
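The walk the patch performs in `netdev_switch_get_by_fib_dev()`, applied to the VLAN interface => bridge => [team/bond] => switchdev port path discussed above, can be modelled as below. `struct ndev` and its `is_sw_port` flag are hypothetical stand-ins for `struct net_device` and the patch's `ndo_switch_parent_id_get` check.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical stand-in for struct net_device.  is_sw_port plays the role
 * of "ops->ndo_switch_parent_id_get != NULL" in the patch. */
struct ndev {
	const char *name;
	bool is_sw_port;
	struct ndev *lower[MAX_LOWER];
	int n_lower;
};

/* Same depth-first search as netdev_switch_get_by_fib_dev(): return @dev
 * if it is a switch port, else the first switch port found by recursing
 * into its lower devices in order, else NULL. */
static struct ndev *sw_port_from_fib_dev(struct ndev *dev)
{
	struct ndev *port;
	int i;

	if (dev->is_sw_port)
		return dev;

	for (i = 0; i < dev->n_lower; i++) {
		port = sw_port_from_fib_dev(dev->lower[i]);
		if (port)
			return port;
	}
	return NULL;
}
```

Starting at a VLAN interface br0.100 on top of br0, whose first lower device is a bond over swp1/swp2, the search returns swp1, so only that one port's driver receives the ndo call for the route.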

> One complication here is that the VLAN context is important. A "naked" nexthop
> shall only be resolved within the VLAN associated with the VLAN interface. When
> ARP resolution is performed by Linux stack, it goes via the VLAN interface
> which imposes a tag on the packet before handing it to the bridge. The
> VLAN-aware bridge floods such a packet only to member ports of the VLAN. This behavior
> of the software bridge has to be preserved with offloaded L3 forwarding and
> offloaded L2 switching.
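Ronen's requirement, combined with the commit message's note that the driver resolves nexthops itself by watching NETEVENT_NEIGH_UPDATE, suggests keying the driver's nexthop table by VLAN as well as by IP address. The user-space sketch below assumes exactly that; all structure and function names are hypothetical, and in a real driver the VLAN would presumably be derived from the neighbour's device (the VLAN interface).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_NH 8

/* Hypothetical driver-side nexthop entry.  The VLAN ID of the router
 * interface is part of the key, so a neighbour learned in one VLAN never
 * resolves a nexthop that belongs to another. */
struct nh_entry {
	uint16_t vid;
	uint32_t ip;		/* host byte order, for simplicity */
	bool resolved;
	uint8_t mac[6];
};

struct nh_table {
	struct nh_entry e[MAX_NH];
	int n;
};

/* Would be driven from the driver's neighbour-update path (in the kernel,
 * a NETEVENT_NEIGH_UPDATE notifier).  Copies the MAC into every matching
 * (vid, ip) nexthop and returns how many went from unresolved to resolved,
 * i.e. how many could now be programmed into hardware. */
static int nh_neigh_update(struct nh_table *t, uint16_t vid, uint32_t ip,
			   const uint8_t mac[6])
{
	int i, newly = 0;

	for (i = 0; i < t->n; i++) {
		struct nh_entry *nh = &t->e[i];

		if (nh->vid != vid || nh->ip != ip)
			continue;
		memcpy(nh->mac, mac, sizeof(nh->mac));
		if (!nh->resolved) {
			nh->resolved = true;
			newly++;
		}
	}
	return newly;
}
```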
>>>
>>>> Since the FIB entry is "naked" when pushed from the kernel, the driver/device
>>>> is responsible for resolving the route's nexthops to neighbor MAC addresses.
>>>> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
>>>> netevent notifier to watch for ARP activity.  Once a nexthop is resolved to
>>>> neighbor MAC address, it can be installed to the device and the device will
>>>> do the L3 routing offload in HW, for that nexthop.
>>>>
>>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>    include/linux/netdevice.h |   22 +++++++++++
>>>>    include/net/switchdev.h   |   18 +++++++++
>>>>    net/ipv4/fib_trie.c       |   17 ++++++++-
>>>>    net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>>>>    4 files changed, 145 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 679e6e9..b66d22b 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>>>    typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>>                                         struct sk_buff *skb);
>>>>
>>>> +struct fib_info;
>>>> +
>>>>    /*
>>>>     * This structure defines the management hooks for network devices.
>>>>     * The following hooks can be defined; unless noted otherwise, they are
>>>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>>     * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>>>     *    Called to notify switch device port of bridge port STP
>>>>     *    state change.
>>>> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>>>> + *                                  int dst_len, struct fib_info *fi,
>>>> + *                                  u8 tos, u8 type, u32 tb_id);
>>>> + *     Called to add IPv4 route to switch device.
>>>> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>>>> + *                                  int dst_len, struct fib_info *fi,
>>>> + *                                  u8 tos, u8 type, u32 tb_id);
>>>> + *     Called to delete IPv4 route from switch device.
>>>>     */
>>>>    struct net_device_ops {
>>>>          int                     (*ndo_init)(struct net_device *dev);
>>>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>>>                                                              struct netdev_phys_item_id *psid);
>>>>          int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>>>>                                                                u8 state);
>>>> +       int                     (*ndo_switch_fib_ipv4_add)(struct net_device *dev,
>>>> +                                                          __be32 dst,
>>>> +                                                          int dst_len,
>>>> +                                                          struct fib_info *fi,
>>>> +                                                          u8 tos, u8 type,
>>>> +                                                          u32 tb_id);
>>>> +       int                     (*ndo_switch_fib_ipv4_del)(struct net_device *dev,
>>>> +                                                          __be32 dst,
>>>> +                                                          int dst_len,
>>>> +                                                          struct fib_info *fi,
>>>> +                                                          u8 tos, u8 type,
>>>> +                                                          u32 tb_id);
>>>>    #endif
>>>>    };
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index 8a6d164..caebc2a 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,10 @@
>>>>    int netdev_switch_parent_id_get(struct net_device *dev,
>>>>                                  struct netdev_phys_item_id *psid);
>>>>    int netdev_switch_port_stp_update(struct net_device *dev, u8 state);
>>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>>> +                              u8 tos, u8 type, u32 tb_id);
>>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>>> +                              u8 tos, u8 type, u32 tb_id);
>>>>
>>>>    #else
>>>>
>>>> @@ -32,6 +36,20 @@ static inline int netdev_switch_port_stp_update(struct net_device *dev,
>>>>          return -EOPNOTSUPP;
>>>>    }
>>>>
>>>> +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
>>>> +                                            struct fib_info *fi,
>>>> +                                            u8 tos, u8 type, u32 tb_id)
>>>> +{
>>>> +       return -EOPNOTSUPP;
>>>> +}
>>>> +
>>>> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
>>>> +                                            struct fib_info *fi,
>>>> +                                            u8 tos, u8 type, u32 tb_id)
>>>> +{
>>>> +       return -EOPNOTSUPP;
>>>> +}
>>>> +
>>>>    #endif
>>>>
>>>>    #endif /* _LINUX_SWITCHDEV_H_ */
>>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>>> index 281e5e0..ea2dc17 100644
>>>> --- a/net/ipv4/fib_trie.c
>>>> +++ b/net/ipv4/fib_trie.c
>>>> @@ -79,6 +79,7 @@
>>>>    #include <net/tcp.h>
>>>>    #include <net/sock.h>
>>>>    #include <net/ip_fib.h>
>>>> +#include <net/switchdev.h>
>>>>    #include "fib_lookup.h"
>>>>
>>>>    #define MAX_STAT_DEPTH 32
>>>> @@ -1201,6 +1202,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>                          fib_release_info(fi_drop);
>>>>                          if (state & FA_S_ACCESSED)
>>>>                                  rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>>> +                       netdev_switch_fib_ipv4_add(key, plen, fi, fa->fa_tos,
>>>> +                                                  cfg->fc_type, tb->tb_id);
>>>>                          rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>>                                  tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>>
>>>> @@ -1229,6 +1232,13 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>          new_fa->fa_tos = tos;
>>>>          new_fa->fa_type = cfg->fc_type;
>>>>          new_fa->fa_state = 0;
>>>> +
>>>> +       /* (Optionally) offload fib info to switch hardware. */
>>>> +       err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>>>> +                                        cfg->fc_type, tb->tb_id);
>>>> +       if (err && err != -EOPNOTSUPP)
>>>> +               goto out_free_new_fa;
>>>> +
>>>>          /*
>>>>           * Insert new entry to the list.
>>>>           */
>>>> @@ -1237,7 +1247,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>                  fa_head = fib_insert_node(t, key, plen);
>>>>                  if (unlikely(!fa_head)) {
>>>>                          err = -ENOMEM;
>>>> -                       goto out_free_new_fa;
>>>> +                       goto out_sw_fib_del;
>>>>                  }
>>>>          }
>>>>
>>>> @@ -1253,6 +1263,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>    succeeded:
>>>>          return 0;
>>>>
>>>> +out_sw_fib_del:
>>>> +       netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
>>>>    out_free_new_fa:
>>>>          kmem_cache_free(fn_alias_kmem, new_fa);
>>>>    out:
>>>> @@ -1529,6 +1541,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>>>>          rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>>>>                    &cfg->fc_nlinfo, 0);
>>>>
>>>> +       netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
>>>> +                                  cfg->fc_type, tb->tb_id);
>>>> +
>>>>          list_del_rcu(&fa->fa_list);
>>>>
>>>>          if (!plen)
>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>> index d162b21..211a8a0 100644
>>>> --- a/net/switchdev/switchdev.c
>>>> +++ b/net/switchdev/switchdev.c
>>>> @@ -12,6 +12,7 @@
>>>>    #include <linux/types.h>
>>>>    #include <linux/init.h>
>>>>    #include <linux/netdevice.h>
>>>> +#include <net/ip_fib.h>
>>>>    #include <net/switchdev.h>
>>>>
>>>>    /**
>>>> @@ -50,3 +51,91 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
>>>>          return ops->ndo_switch_port_stp_update(dev, state);
>>>>    }
>>>>    EXPORT_SYMBOL(netdev_switch_port_stp_update);
>>>> +
>>>> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
>>>> +{
>>>> +       const struct net_device_ops *ops = dev->netdev_ops;
>>>> +       struct net_device *lower_dev;
>>>> +       struct net_device *port_dev;
>>>> +       struct list_head *iter;
>>>> +
>>>> +       /* Recursively search from fib_dev down until we find
>>>> +        * a sw port dev.  (A sw port dev supports
>>>> +        * ndo_switch_parent_id_get).
>>>> +        */
>>>> +
>>>> +       if (ops->ndo_switch_parent_id_get)
>>>> +               return dev;
>>>> +
>>>> +       netdev_for_each_lower_dev(dev, lower_dev, iter) {
>>>> +               port_dev = netdev_switch_get_by_fib_dev(lower_dev);
>>>> +               if (port_dev)
>>>> +                       return port_dev;
>>>> +       }
>>>> +
>>>> +       return NULL;
>>>> +}
>>>> +
>>>> +/**
>>>> + *     netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
>>>> + *
>>>> + *     @dst: route's IPv4 destination address
>>>> + *     @dst_len: destination address length (prefix length)
>>>> + *     @fi: route FIB info structure
>>>> + *     @tos: route TOS
>>>> + *     @type: route type
>>>> + *     @tb_id: route table ID
>>>> + *
>>>> + *     Add IPv4 route entry to switch device.
>>>> + */
>>>> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
>>>> +                              u8 tos, u8 type, u32 tb_id)
>>>> +{
>>>> +       struct net_device *dev;
>>>> +       const struct net_device_ops *ops;
>>>> +       int err = -EOPNOTSUPP;
>>>> +
>>>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>>> +       if (!dev)
>>>> +               return -EOPNOTSUPP;
>>>> +       ops = dev->netdev_ops;
>>>> +
>>>> +       if (ops->ndo_switch_fib_ipv4_add)
>>>> +               err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
>>>> +                                                  fi, tos, type, tb_id);
>>>> +
>>>> +       return err;
>>>> +}
>>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
>>>> +
>>>> +/**
>>>> + *     netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
>>>> + *
>>>> + *     @dst: route's IPv4 destination address
>>>> + *     @dst_len: destination address length (prefix length)
>>>> + *     @fi: route FIB info structure
>>>> + *     @tos: route TOS
>>>> + *     @type: route type
>>>> + *     @tb_id: route table ID
>>>> + *
>>>> + *     Delete IPv4 route entry from switch device.
>>>> + */
>>>> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
>>>> +                              u8 tos, u8 type, u32 tb_id)
>>>> +{
>>>> +       struct net_device *dev;
>>>> +       const struct net_device_ops *ops;
>>>> +       int err = -EOPNOTSUPP;
>>>> +
>>>> +       dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
>>>> +       if (!dev)
>>>> +               return -EOPNOTSUPP;
>>>> +       ops = dev->netdev_ops;
>>>> +
>>>> +       if (ops->ndo_switch_fib_ipv4_del)
>>>> +               err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
>>>> +                                                  fi, tos, type, tb_id);
>>>> +
>>>> +       return err;
>>>> +}
>>>> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-02  3:29 [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev sfeldma
  2015-01-02  5:49 ` roopa
@ 2015-01-06 13:58 ` Hannes Frederic Sowa
  2015-01-06 17:51   ` Scott Feldman
  1 sibling, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-06 13:58 UTC (permalink / raw)
  To: sfeldma; +Cc: netdev, jiri, john.fastabend, tgraf, jhs, andy, roopa

Hi Scott,

On Thu, 2015-01-01 at 19:29 -0800, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
> 
> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
> by the core IPv4 FIB code when installing/removing FIB entries to/from the
> kernel FIB.  On install, the driver should return 0 if FIB entry (route) can be
> installed to device for offloading, -EOPNOTSUPP if route cannot be installed
> due to device limitations, and other negative error code on failure to install
> route to device.  On failure error code, the route is not installed to device,
> and not installed in kernel FIB, and the return code is propagated back to the
> user-space caller (via netlink).  An -EOPNOTSUPP error code is skipped for the
> device but installed in the kernel FIB.
> 
> The FIB entry (route) nexthop list is used to find the swdev device port to
> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
> to find the swdev port by recursively traversing the fib_dev's lower_dev list
> until a swdev port is found.  The ndo op is called on this swdev port.
> 
> Since the FIB entry is "naked" when pushed from the kernel, the driver/device
> is responsible for resolving the route's nexthops to neighbor MAC addresses.
> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
> netevent notifier to watch for ARP activity.  Once a nexthop is resolved to
> neighbor MAC address, it can be installed to the device and the device will
> do the L3 routing offload in HW, for that nexthop.
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/netdevice.h |   22 +++++++++++
>  include/net/switchdev.h   |   18 +++++++++
>  net/ipv4/fib_trie.c       |   17 ++++++++-
>  net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 145 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 679e6e9..b66d22b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>  				       struct sk_buff *skb);
>  
> +struct fib_info;
> +
>  /*
>   * This structure defines the management hooks for network devices.
>   * The following hooks can be defined; unless noted otherwise, they are
> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>   *	Called to notify switch device port of bridge port STP
>   *	state change.
> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
> + *				     int dst_len, struct fib_info *fi,
> + *				     u8 tos, u8 type, u32 tb_id);
> + *	Called to add IPv4 route to switch device.
> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
> + *				     int dst_len, struct fib_info *fi,
> + *				     u8 tos, u8 type, u32 tb_id);
> + *	Called to delete IPv4 route from switch device.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>  							    struct netdev_phys_item_id *psid);
>  	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
>  							      u8 state);
> +	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
> +	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
>  #endif
>  };

At this point I would like to start the discussion about the handling of
table IDs/VRFs (again :) ): as far as I can see, this version just passes
table IDs down to the driver layer, and the rocker driver filters them by
local/main table? This seems mostly fine for a first version, but it
does not feel like it will integrate well with the rest of the Linux
networking ecosystem.
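Assuming a driver that, as described here, only accepts routes from the local and main tables, the filter would amount to something like the check below at the top of its `ndo_switch_fib_ipv4_add` implementation. The helper name is hypothetical; the two table IDs are the standard values from `<linux/rtnetlink.h>`, redefined so the sketch builds in user space.

```c
#include <errno.h>
#include <stdint.h>

/* Same values as RT_TABLE_MAIN / RT_TABLE_LOCAL in enum rt_class_t,
 * <linux/rtnetlink.h>. */
#define RT_TABLE_MAIN	254
#define RT_TABLE_LOCAL	255

/* Hypothetical driver-side check: accept only main/local table routes;
 * everything else gets -EOPNOTSUPP and so stays in the kernel FIB only. */
static int fib4_tb_id_check(uint32_t tb_id)
{
	if (tb_id != RT_TABLE_MAIN && tb_id != RT_TABLE_LOCAL)
		return -EOPNOTSUPP;
	return 0;
}
```

Routes in any other table simply take the -EOPNOTSUPP path, so a filter like this by itself carries no information about which `ip rule` lookups would have selected those tables.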

Will hardware be capable of programmable matches like the ones "ip rule"
can currently do? Should we plan for that? Do we want to support
hardware that supports multiple tables/VRFs?

I would like to present a first suggestion:
My take on this would be to strive towards an integration with ip-rule,
so we add tables which will be offloaded to hardware. This happens only
in situations where those tables will be the first match for incoming
packets specified with an in-interface filter which has the capability
to do the offloading (for example). Whether a table is capable of
hardware offloading should be determined automatically, so if later
hardware is capable of doing ip-rule-like matches, we can just extend
the check which flags the tables accordingly.

Anyway, if hardware supports multiple tables or VRFs, it is better to
pass in a pointer where drivers can embed private data for
management, I think.
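
A minimal sketch of that suggestion in plain C, assuming a hypothetical
struct swdev_fib_table handle that the core would pass to the driver in
place of the bare tb_id. None of these names (swdev_fib_table, drv_vrf,
drv_table_attach, drv_fib_ipv4_add, drv_table_detach) exist in the patch;
they only illustrate how a driver could hang its own per-table/VRF state
off a pointer the core hands it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle the core would pass instead of a bare tb_id. */
struct swdev_fib_table {
	uint32_t tb_id;		/* kernel table id, e.g. 254 for main */
	void *priv;		/* owned by the driver, opaque to the core */
};

/* Hypothetical per-table state a multi-table/VRF driver might keep. */
struct drv_vrf {
	uint16_t hw_vrf_id;	/* table/VRF index inside the device */
	unsigned int n_routes;	/* routes currently installed in it */
};

/* Called when the core first hands this table to the driver. */
int drv_table_attach(struct swdev_fib_table *tbl, uint16_t hw_vrf_id)
{
	struct drv_vrf *vrf = calloc(1, sizeof(*vrf));

	if (!vrf)
		return -ENOMEM;
	vrf->hw_vrf_id = hw_vrf_id;
	tbl->priv = vrf;
	return 0;
}

/* Route add finds its hardware context through tbl->priv, so the driver
 * needs no tb_id -> VRF lookup structure of its own. */
int drv_fib_ipv4_add(struct swdev_fib_table *tbl, uint32_t dst, int dst_len)
{
	struct drv_vrf *vrf = tbl->priv;

	if (!vrf)
		return -ENOENT;	/* table was never attached */
	(void)dst;
	(void)dst_len;		/* a real driver would program the device here */
	vrf->n_routes++;
	return 0;
}

/* Called when the core stops offloading this table. */
void drv_table_detach(struct swdev_fib_table *tbl)
{
	free(tbl->priv);
	tbl->priv = NULL;
}
```

With this shape the core owns the handle and its lifetime, while the
driver owns whatever priv points to.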

Thanks,
Hannes

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-06 13:58 ` Hannes Frederic Sowa
@ 2015-01-06 17:51   ` Scott Feldman
  2015-01-06 19:59     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: Scott Feldman @ 2015-01-06 17:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

On Tue, Jan 6, 2015 at 5:58 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Scott,
>
> On Thu, 2015-01-01 at 19:29 -0800, sfeldma@gmail.com wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To offload IPv4 L3 routing functions to swdev device, the swdev device driver
>> implements two new ndo ops (ndo_switch_fib_ipv4_add/del).  The ops are called
>> by the core IPv4 FIB code when installing/removing FIB entries to/from the
>> kernel FIB.  On install, the driver should return 0 if the FIB entry (route) can be
>> installed to the device for offloading, -EOPNOTSUPP if the route cannot be
>> installed due to device limitations, and any other negative error code on
>> failure to install the route to the device.  On such a failure, the route is
>> installed neither to the device nor in the kernel FIB, and the error code is
>> propagated back to the user-space caller (via netlink).  On -EOPNOTSUPP, the
>> route is skipped for the device but still installed in the kernel FIB.
>>
>> The FIB entry (route) nexthop list is used to find the swdev device port to
>> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
>> to find the swdev port by recursively traversing the fib_dev's lower_dev list
>> until a swdev port is found.  The ndo op is called on this swdev port.
>>
>> Since the FIB entry is "naked" when pushed from the kernel, the driver/device
>> is responsible for resolving the route's nexthops to neighbor MAC addresses.
>> This can be done by the driver by monitoring NETEVENT_NEIGH_UPDATE
>> netevent notifier to watch for ARP activity.  Once a nexthop is resolved to
>> neighbor MAC address, it can be installed to the device and the device will
>> do the L3 routing offload in HW, for that nexthop.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/netdevice.h |   22 +++++++++++
>>  include/net/switchdev.h   |   18 +++++++++
>>  net/ipv4/fib_trie.c       |   17 ++++++++-
>>  net/switchdev/switchdev.c |   89 +++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 145 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 679e6e9..b66d22b 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -767,6 +767,8 @@ struct netdev_phys_item_id {
>>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>                                      struct sk_buff *skb);
>>
>> +struct fib_info;
>> +
>>  /*
>>   * This structure defines the management hooks for network devices.
>>   * The following hooks can be defined; unless noted otherwise, they are
>> @@ -1030,6 +1032,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>   * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>   *   Called to notify switch device port of bridge port STP
>>   *   state change.
>> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
>> + *                                int dst_len, struct fib_info *fi,
>> + *                                u8 tos, u8 type, u32 tb_id);
>> + *   Called to add IPv4 route to switch device.
>> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
>> + *                                int dst_len, struct fib_info *fi,
>> + *                                u8 tos, u8 type, u32 tb_id);
>> + *   Called to delete IPv4 route from switch device.
>>   */
>>  struct net_device_ops {
>>       int                     (*ndo_init)(struct net_device *dev);
>> @@ -1189,6 +1199,18 @@ struct net_device_ops {
>>                                                           struct netdev_phys_item_id *psid);
>>       int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>>                                                             u8 state);
>> +     int                     (*ndo_switch_fib_ipv4_add)(struct net_device *dev,
>> +                                                        __be32 dst,
>> +                                                        int dst_len,
>> +                                                        struct fib_info *fi,
>> +                                                        u8 tos, u8 type,
>> +                                                        u32 tb_id);
>> +     int                     (*ndo_switch_fib_ipv4_del)(struct net_device *dev,
>> +                                                        __be32 dst,
>> +                                                        int dst_len,
>> +                                                        struct fib_info *fi,
>> +                                                        u8 tos, u8 type,
>> +                                                        u32 tb_id);
>>  #endif
>>  };
>
> At this point I would like to start the discussion about handling of the
> table IDs/VRFs (again :) ): as far as I can see, this version just passes
> table IDs down to the driver layer and the rocker driver filters them by
> local/main table? This seems mostly fine for a first version but does
> not feel like it will integrate well with the rest of the Linux
> networking ecosystem.
>
> Will hardware be capable of programmable matches like the ones "ip
> rule" currently supports? Should we plan for that? Do we want to
> support hardware which supports multiple tables/VRFs?

Good questions, thanks for bringing these up.

>
> I would like to present a first suggestion:
> My take on this would be to strive towards an integration with ip-rule,
> so we add tables which will be offloaded to hardware. This happens only
> in situations where those tables will be the first match for incoming
> packets specified with an in-interface filter which has the capability
> to do the offloading (for example). Whether a table is capable of
> hardware offloading should be determined automatically, so if later
> hardware is capable of doing ip-rule-like matches, we can just extend
> the check which flags the tables accordingly.

Sounds like a good suggestion to me.  We need to think about what the
swdev API looks like to the switch device driver.  Could you take a
stab at defining what integration with ip-rule looks like, code-wise,
at the swdev API layer?

On the rocker device we're prototyping with, the standard LPM on IP
dst is the normal L3 routing table structure.  Within that, table
priorities could be handled, so routes in one table take precedence
over routes in another table.  If we want to do policy routing, then
we'd need to use the ACL table in rocker to match on other fields
besides just IP dst.
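
To make the "table priorities" idea concrete, here is a small sketch,
assuming each hardware route carries the priority of the kernel table it
came from and that a lower value means higher precedence. A linear scan
stands in for the real LPM structure, and struct hw_route / hw_lookup()
are illustrative names, not rocker's actual tables.

```c
#include <stddef.h>
#include <stdint.h>

struct hw_route {
	uint32_t dst;		/* host byte order, for simplicity */
	int dst_len;		/* 0..32 */
	int table_prio;		/* lower value = table consulted first */
	int nexthop_id;
};

static uint32_t prefix_mask(int len)
{
	/* avoid the undefined 32-bit shift for a /0 prefix */
	return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Return the winning route for addr, or NULL.  Among matching routes the
 * one from the highest-precedence table wins; within one table the
 * longest prefix wins. */
const struct hw_route *hw_lookup(const struct hw_route *tbl, size_t n,
				 uint32_t addr)
{
	const struct hw_route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct hw_route *r = &tbl[i];
		uint32_t m = prefix_mask(r->dst_len);

		if ((addr & m) != (r->dst & m))
			continue;
		if (!best || r->table_prio < best->table_prio ||
		    (r->table_prio == best->table_prio &&
		     r->dst_len > best->dst_len))
			best = r;
	}
	return best;
}
```

Note that a shorter prefix in a higher-precedence table beats a longer
prefix in a lower one, which is what "routes in one table take
precedence over routes in another table" implies.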

> Anyway, if hardware supports multiple tables or VRFs, it is better to
> pass in a pointer where drivers can embed private data for
> management, I think.
>
> Thanks,
> Hannes

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-06 17:51   ` Scott Feldman
@ 2015-01-06 19:59     ` Hannes Frederic Sowa
  2015-01-06 20:26       ` Hannes Frederic Sowa
  2015-01-07  2:08       ` Shrijeet Mukherjee
  0 siblings, 2 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-06 19:59 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

On Tue, 2015-01-06 at 09:51 -0800, Scott Feldman wrote:
> On Tue, Jan 6, 2015 at 5:58 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > At this point I would like to start the discussion about handling of the
> > table IDs/VRFs (again :) ): as far as I can see, this version just passes
> > table IDs down to the driver layer and the rocker driver filters them by
> > local/main table? This seems mostly fine for a first version but does
> > not feel like it will integrate well with the rest of the Linux
> > networking ecosystem.
> >
> > Will hardware be capable of programmable matches like the ones "ip
> > rule" currently supports? Should we plan for that? Do we want to
> > support hardware which supports multiple tables/VRFs?
> 
> Good questions, thanks for bringing these up.
> >
> > I would like to present a first suggestion:
> > My take on this would be to strive towards an integration with ip-rule,
> > so we add tables which will be offloaded to hardware. This happens only
> > in situations where those tables will be the first match for incoming
> > packets specified with an in-interface filter which has the capability
> > to do the offloading (for example). Whether a table is capable of
> > hardware offloading should be determined automatically, so if later
> > hardware is capable of doing ip-rule-like matches, we can just extend
> > the check which flags the tables accordingly.
> 
> Sounds like a good suggestion to me.  We need to think about what the
> swdev API looks like to the switch device driver.  Could you take a
> stab at defining what integration with ip-rule looks like, code-wise,
> at the swdev API layer?
> 
> On the rocker device we're prototyping with, the standard LPM on IP
> dst is the normal L3 routing table structure.  Within that, table
> priorities could be handled, so routes in one table take precedence
> over routes in another table.  If we want to do policy routing, then
> we'd need to use the ACL table in rocker to match on other fields
> besides just IP dst.

Sorry, I haven't fully understood this. Does rocker first do an L3
routing table lookup and *after* that decide which nexthop to choose
based on preferences in the action-set found at the leaf? My gut tells
me that we cannot do something semantically equivalent to ip rules then;
we would have to use ACLs instead. Hmm...

For the first idea, I'll try to make an example:

Initial setup:
# ip rule ls
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 

# ip rule add pref 100 iif swdev0 table 5
# ip rule ls
0:	from all lookup local 
100:	from all iif swdev0 [detached] lookup 5
> maybe we can show here which rules can be offloaded
32766:	from all lookup main 
32767:	from all lookup default 

table 5 should be the table we can insert routes into which are
offloaded to hardware.

During table modifications we linearly scan the rules to check whether we
find selectors which cannot be represented by hardware.

In case we have an iif selector, we can simply use this table and just
synthesize it into the particular interface.

An ip-rule-from would require all the hardware to be capable of matching
source addresses, otherwise we cannot offload all routing tables with
higher preference; the same holds for a to/tos rule. If we encounter a
fwmark rule, we certainly cannot represent it in hardware, so skip it
(here we can think about entangling those with ACLs, but it feels hard to
do).

If rules are inserted or changed we must again validate the complete
list of rules and decide if we need to flush all the routes and install
a slow path via kernel.
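
The rule scan described above could look roughly like the following.
This is only one reading of the proposal, assuming rules are already
sorted by pref and that the first rule the hardware cannot represent
sends every later rule to the kernel slow path; the SEL_* flags, struct
hw_caps and classify_rules() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Selectors a rule may carry (a subset of what ip-rule supports). */
enum {
	SEL_IIF = 1 << 0, SEL_FROM = 1 << 1, SEL_TO = 1 << 2,
	SEL_TOS = 1 << 3, SEL_FWMARK = 1 << 4,
};

struct rule {
	uint32_t pref;		/* rules are assumed sorted by pref */
	unsigned int sel;	/* bitmask of SEL_* present on the rule */
	uint32_t table;
};

/* What every offloading device in the system can match on. */
struct hw_caps {
	bool match_src;		/* needed for "from" */
	bool match_dst;		/* needed for "to" */
	bool match_tos;
};

static bool rule_representable(const struct rule *r, const struct hw_caps *c)
{
	if (r->sel & SEL_FWMARK)
		return false;	/* the skb mark does not exist in hardware */
	if ((r->sel & SEL_FROM) && !c->match_src)
		return false;
	if ((r->sel & SEL_TO) && !c->match_dst)
		return false;
	if ((r->sel & SEL_TOS) && !c->match_tos)
		return false;
	return true;		/* iif alone maps onto per-port config */
}

/* Fill offload[i] for each rule.  Once a rule cannot be represented,
 * every later rule is left to the kernel, since hardware could not
 * reproduce the earlier diversion.  Returns the offloadable count. */
size_t classify_rules(const struct rule *rules, size_t n,
		      const struct hw_caps *caps, bool *offload)
{
	bool blocked = false;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!blocked && !rule_representable(&rules[i], caps))
			blocked = true;
		offload[i] = !blocked;
		if (offload[i])
			count++;
	}
	return count;
}
```

Rerunning classify_rules() over the whole list on every rule insert or
change corresponds to revalidating the complete list of rules.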

What do you think? Does that make sense? I could try to come up with an
API for that. ;)

Bye,
Hannes

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-06 19:59     ` Hannes Frederic Sowa
@ 2015-01-06 20:26       ` Hannes Frederic Sowa
  2015-01-07  2:08       ` Shrijeet Mukherjee
  1 sibling, 0 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-06 20:26 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

On Tue, 2015-01-06 at 20:59 +0100, Hannes Frederic Sowa wrote:
> Sorry, I haven't fully understood this. Does rocker first do an L3
> routing table lookup and *after* that decide which nexthop to choose
> based on preferences in the action-set found at the leaf? My gut tells
> me that we cannot do something semantically equivalent to ip rules then;
> we would have to use ACLs instead. Hmm...

Does rocker drop the packet if no match is found or can it pass the
packet onto the slowpath to the kernel?

* RE: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-06 19:59     ` Hannes Frederic Sowa
  2015-01-06 20:26       ` Hannes Frederic Sowa
@ 2015-01-07  2:08       ` Shrijeet Mukherjee
  2015-01-07 11:23         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 17+ messages in thread
From: Shrijeet Mukherjee @ 2015-01-07  2:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

>For the first idea, I'll try to make an example:
>
>Initial setup:
># ip rule ls
>0:	from all lookup local
>32766:	from all lookup main
>32767:	from all lookup default
>
># ip rule add pref 100 iif swdev0 table 5
># ip rule ls
>0:	from all lookup local
>100:	from all iif swdev0 [detached] lookup 5
>> maybe we can show here which rules can be offloaded
>32766:	from all lookup main
>32767:	from all lookup default
>
>table 5 should be the table we can insert routes into which are
>offloaded to hardware.
>
>During table modifications we linearly scan the rules to check whether
>we find selectors which cannot be represented by hardware.
>
>In case we have an iif selector, we can simply use this table and just
>synthesize it into the particular interface.
>
>An ip-rule-from would require all the hardware to be capable of
>matching source addresses, otherwise we cannot offload all routing
>tables with higher preference; the same holds for a to/tos rule. If we
>encounter a fwmark rule, we certainly cannot represent it in hardware,
>so skip it (here we can think about entangling those with ACLs, but it
>feels hard to do).
>
>If rules are inserted or changed we must again validate the complete
>list of rules and decide if we need to flush all the routes and install
>a slow path via kernel.
>
>What do you think? Does that make sense? I could try to come up with an
>API for that. ;)
>

This sounds really good, but I suspect the real problem is the case where
the rule evaluation is in the hardware path, right? If it is purely
interface based there is no issue, but any other policy, like "on a miss
in table 1, use table 2", will not work with this model. Or did I miss
something?
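
For reference, the fall-through behaviour in question can be sketched as
below: rules are tried in order, and a miss in one table continues with
the next rule. The structures are hypothetical and the linear LPM scan is
only for illustration; the point is that hardware would have to keep
per-packet state across several table lookups to reproduce this.

```c
#include <stddef.h>
#include <stdint.h>

struct route { uint32_t dst; int len; int nh; };
struct table { const struct route *r; size_t n; };

/* Rule: optional input-interface selector, then a table to consult. */
struct rule {
	int iif;		/* -1 = any interface */
	const struct table *tbl;
};

static uint32_t mask(int len)
{
	return len ? 0xffffffffu << (32 - len) : 0;
}

/* Longest-prefix match in one table; returns the nexthop or -1 on miss. */
static int table_lookup(const struct table *t, uint32_t addr)
{
	int best_len = -1, nh = -1;

	for (size_t i = 0; i < t->n; i++) {
		const struct route *r = &t->r[i];

		if ((addr & mask(r->len)) == (r->dst & mask(r->len)) &&
		    r->len > best_len) {
			best_len = r->len;
			nh = r->nh;
		}
	}
	return nh;
}

/* ip-rule style evaluation: a miss falls through to the next rule. */
int policy_lookup(const struct rule *rules, size_t n, int iif, uint32_t addr)
{
	for (size_t i = 0; i < n; i++) {
		int nh;

		if (rules[i].iif != -1 && rules[i].iif != iif)
			continue;
		nh = table_lookup(rules[i].tbl, addr);
		if (nh >= 0)
			return nh;
	}
	return -1;
}
```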

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-07  2:08       ` Shrijeet Mukherjee
@ 2015-01-07 11:23         ` Hannes Frederic Sowa
  2015-01-07 17:54           ` Shrijeet Mukherjee
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-07 11:23 UTC (permalink / raw)
  To: Shrijeet Mukherjee
  Cc: Scott Feldman, Netdev, Jiří Pírko, john fastabend,
	Thomas Graf, Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

On Tue, 2015-01-06 at 18:08 -0800, Shrijeet Mukherjee wrote:
> >For the first idea, I'll try to make an example:
> >
> >Initial setup:
> ># ip rule ls
> >0:	from all lookup local
> >32766:	from all lookup main
> >32767:	from all lookup default
> >
> ># ip rule add pref 100 iif swdev0 table 5
> ># ip rule ls
> >0:	from all lookup local
> >100:	from all iif swdev0 [detached] lookup 5
> >> maybe we can show here which rules can be offloaded
> >32766:	from all lookup main
> >32767:	from all lookup default
> >
> >table 5 should be the table we can insert routes into which are
> >offloaded to hardware.
> >
> >During table modifications we linearly scan the rules to check whether
> >we find selectors which cannot be represented by hardware.
> >
> >In case we have an iif selector, we can simply use this table and just
> >synthesize it into the particular interface.
> >
> >An ip-rule-from would require all the hardware to be capable of
> >matching source addresses, otherwise we cannot offload all routing
> >tables with higher preference; the same holds for a to/tos rule. If we
> >encounter a fwmark rule, we certainly cannot represent it in hardware,
> >so skip it (here we can think about entangling those with ACLs, but it
> >feels hard to do).
> >
> >If rules are inserted or changed we must again validate the complete
> >list of rules and decide if we need to flush all the routes and install
> >a slow path via kernel.
> >
> >What do you think? Does that make sense? I could try to come up with an
> >API for that. ;)
> >
> 
> This sounds really good, but I suspect the real problem is the case where
> the rule evaluation is in the hardware path, right? If it is purely
> interface based there is no issue, but any other policy, like "on a miss
> in table 1, use table 2", will not work with this model. Or did I miss
> something?

I can think of several ways to model the hardware. Depending on the
model, the integration with rules is easy or nearly impossible:

1) it simply cannot deal with ip rules, so there is no way an ACL can
influence the outcome of a routing table lookup - if the feature is to
be used, it has to go through the slow path in the kernel.

2) ACLs can influence which routing table will get queried - this sounds
very much like the ip rule model and it seems not too hard to model
that.

3) Routing implementations in the hardware have a single routing table
and the leaves carry different actions with priorities: making this kind
of model work with the ip rule concept will become very difficult and it
might require lots of algorithmic code in every driver to adapt to a
single API provided by Linux. It might be possible if the hardware
provides actions like backtrack and retrack and can keep state of
priorities while walking the tree, but I really doubt that.

Implementations of type 3) seem natural to do in hardware (see
different Cisco policy routing configurations or the ipv6 subtree
feature), so it seems it won't be possible to find a simple way to fuse
rules and offloading in case of point 3).

Rocker sounds a lot like model 2), so this seems possible and should be
a matter of API design, merely a matter of modeling the data structures
nicely. ;)
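
Model 2) could be sketched as an ACL stage whose only action is to pick
the table for the following LPM lookup. This assumes hypothetical
struct acl_entry / acl_select_table() names and matches on ingress port
only, which is enough for the "iif swdev0 table 5" example earlier in
the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One ACL entry: match on ingress port, action = "use table tb_id". */
struct acl_entry {
	int in_port;		/* -1 matches any port */
	uint32_t tb_id;
};

/* The first matching entry (entries are in priority order) selects the
 * single routing table the subsequent LPM lookup will use. */
uint32_t acl_select_table(const struct acl_entry *acl, size_t n,
			  int in_port, uint32_t default_tb)
{
	for (size_t i = 0; i < n; i++)
		if (acl[i].in_port == -1 || acl[i].in_port == in_port)
			return acl[i].tb_id;
	return default_tb;
}
```

In that example swdev0's port would get an entry selecting table 5 and
every other port would fall back to main (254). A miss in the selected
table is final here, so this alone does not give ip-rule fall-through.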

Also, @Scott: if you build drivers with L3 offloading as modules, don't
you need to push the full routing tables to the hw once? Maybe we should
think about the drivers pulling routing information from the kernel,
with the kernel only notifying that something changed?
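
One way to sketch the pull model, assuming a hypothetical generation
counter that the kernel bumps on every FIB change: the notification only
says "something changed", and the driver re-reads the whole table
whenever its synced generation is stale. The same path covers a driver
module loaded after the routes were installed. All names here are
illustrative.

```c
#include <stddef.h>
#include <stdint.h>

struct route { uint32_t dst; int len; };

/* Stand-in for the kernel FIB: routes plus a change generation. */
struct fib {
	struct route r[16];
	size_t n;
	uint64_t gen;
};

/* Stand-in for a driver mirroring the FIB into hardware. */
struct drv {
	struct route hw[16];
	size_t hw_n;
	uint64_t synced_gen;	/* generation last pushed to hardware */
};

int fib_add(struct fib *f, uint32_t dst, int len)
{
	if (f->n == sizeof(f->r) / sizeof(f->r[0]))
		return -1;
	f->r[f->n++] = (struct route){ dst, len };
	f->gen++;		/* kernel would only notify "changed" */
	return 0;
}

/* On module load or on a "changed" notification, re-read the whole
 * table instead of relying on having seen every single add/del. */
void drv_sync(struct drv *d, const struct fib *f)
{
	if (d->synced_gen == f->gen)
		return;		/* nothing changed since the last pull */
	for (size_t i = 0; i < f->n; i++)
		d->hw[i] = f->r[i];
	d->hw_n = f->n;
	d->synced_gen = f->gen;
}
```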

Bye,
Hannes

* RE: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-07 11:23         ` Hannes Frederic Sowa
@ 2015-01-07 17:54           ` Shrijeet Mukherjee
  2015-01-08 13:03             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: Shrijeet Mukherjee @ 2015-01-07 17:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Scott Feldman, Netdev, Jiří Pírko, john fastabend,
	Thomas Graf, Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

>
>I can think of several ways to model the hardware. Depending on the
>model, the integration with rules is easy or nearly impossible:
>
>1) it simply cannot deal with ip rules, so there is no way an ACL can
>influence the outcome of a routing table lookup - if the feature is to
>be used, it has to go through the slow path in the kernel.

As Scott was saying, most hardware has table IDs and the ability to
identify and prioritize that way.


>
>2) ACLs can influence which routing table will get queried - this sounds
>very much like the ip rule model and it seems not too hard to model that.

This clearly can be made to work; the problem is really the space of
policy routing (i.e. jumping across VRFs in case of a lookup failure)
when combined with the space of ip rule flexibility.


>
>3) Routing implementations in the hardware have a single routing table
>and the leaves carry different actions with priorities: making this kind
>of model work with the ip rule concept will become very difficult and it
>might require lots of algorithmic code in every driver to adapt to a
>single API provided by Linux. It might be possible if the hardware
>provides actions like backtrack and retrack and can keep state of
>priorities while walking the tree, but I really doubt that.


In the short term, this may be a good way to go, but with a
simplification: some tables are offloaded and the rest, at the full
table level, stays in software. Then you can put a "default route" in the
hardware table to punt to the CPU, and keep the software model clever
and the hardware model fast?
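
The "default route that punts to the CPU" could be sketched as an
offloaded table that always holds a 0.0.0.0/0 entry with a trap action,
so a hardware miss goes to the kernel instead of being dropped.
enum hw_action, struct hw_entry and hw_route() are hypothetical names,
and the linear scan again stands in for the device's LPM.

```c
#include <stddef.h>
#include <stdint.h>

enum hw_action { HW_FORWARD, HW_TRAP_TO_CPU };

struct hw_entry {
	uint32_t dst;
	int len;
	enum hw_action action;
	int nh;			/* only meaningful for HW_FORWARD */
};

static uint32_t mask(int len)
{
	return len ? 0xffffffffu << (32 - len) : 0;
}

/* Longest-prefix match over the offloaded table.  With the /0 trap
 * entry installed, every address matches something, and anything not
 * covered by an offloaded route ends up at the CPU. */
const struct hw_entry *hw_route(const struct hw_entry *t, size_t n,
				uint32_t addr)
{
	const struct hw_entry *best = NULL;

	for (size_t i = 0; i < n; i++)
		if ((addr & mask(t[i].len)) == (t[i].dst & mask(t[i].len)) &&
		    (!best || t[i].len > best->len))
			best = &t[i];
	return best;
}
```

If the offloaded table already contains a real default route, that entry
would have to replace the trap entry rather than coexist with it, since
with two /0 entries the winner here depends only on table order.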

>
>Implementations of type 3) seem natural to do in hardware (see
>different Cisco policy routing configurations or the ipv6 subtree
>feature), so it seems it won't be possible to find a simple way to fuse
>rules and offloading in case of point 3).
>
>Rocker sounds a lot like model 2), so this seems possible and should be
>a matter of API design, merely a matter of modeling the data structures
>nicely. ;)
>
>Also, @Scott: if you build drivers with L3 offloading as modules, don't
>you need to push the full routing tables to the hw once? Maybe we should
>think about the drivers pulling routing information from the kernel,
>with the kernel only notifying that something changed?
>
>Bye,
>Hannes
>

* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
  2015-01-07 17:54           ` Shrijeet Mukherjee
@ 2015-01-08 13:03             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-08 13:03 UTC (permalink / raw)
  To: Shrijeet Mukherjee
  Cc: Scott Feldman, Netdev, Jiří Pírko, john fastabend,
	Thomas Graf, Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu

Hi,

On Wed, 2015-01-07 at 09:54 -0800, Shrijeet Mukherjee wrote:
> >
> >I can think of several ways to model the hardware. Depending on the
> >model, the integration with rules is easy or nearly impossible:
> >
> >1) it simply cannot deal with ip rules, so there is no way an ACL can
> >influence the outcome of a routing table lookup - if the feature is to
> >be used, it has to go through the slow path in the kernel.
> 
> As Scott was saying, most hardware has table IDs and the ability to
> identify and prioritize that way.

I saw Scott only talking about Rocker - maybe I missed it?

> >2) ACLs can influence which routing table will get queried - this sounds
> >very much like the ip rule model and it seems not too hard to model that.
> 
> This clearly can be made to work; the problem is really the space of
> policy routing (i.e. jumping across VRFs in case of a lookup failure)
> when combined with the space of ip rule flexibility.

This very much depends on the hardware, I guess. The complexity is
increased by the routing offload having to know about the ACL data
structures and vice versa.

> >3) Routing implementations in the hardware have a single routing table
> >and the leaves carry different actions with priorities: making this kind
> >of model work with the ip rule concept will become very difficult and it
> >might require lots of algorithmic code in every driver to adapt to a
> >single API provided by Linux. It might be possible if the hardware
> >provides actions like backtrack and retrack and can keep state of
> >priorities while walking the tree, but I really doubt that.
> 
> 
> In the short term, this may be a good way to go, but with a
> simplification: some tables are offloaded and the rest, at the full
> table level, stays in software. Then you can put a "default route" in the
> hardware table to punt to the CPU, and keep the software model clever
> and the hardware model fast?

Yes, the algorithm I described in my prior mail implicitly does that; we
can extend it bit by bit as new hardware supports more filter features.
Especially the default configuration with only RT_TABLE_LOCAL and
RT_TABLE_MAIN allows complete offloading, which should be desirable.

To deal with RT_TABLE_LOCAL, we might walk the whole routing table
and verify that all routes have full prefix length (32 for IPv4 or 128
for IPv6).
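
A sketch of that check, assuming a hypothetical flat route list.
RT_TABLE_MAIN and RT_TABLE_LOCAL use the values from <linux/rtnetlink.h>;
struct route and local_table_host_routes_only() are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/* Values as defined in <linux/rtnetlink.h>. */
#define RT_TABLE_MAIN	254
#define RT_TABLE_LOCAL	255

struct route {
	unsigned int table;
	int family;		/* AF_INET or AF_INET6 */
	int dst_len;
};

/* True if every route in the local table is a host route (/32 or /128),
 * i.e. the table could be offloaded as plain exact-match entries. */
bool local_table_host_routes_only(const struct route *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int full;

		if (r[i].table != RT_TABLE_LOCAL)
			continue;
		full = r[i].family == AF_INET6 ? 128 : 32;
		if (r[i].dst_len != full)
			return false;
	}
	return true;
}
```

On a stock system the loopback entry "local 127.0.0.0/8 dev lo" lives in
the local table, so in practice such a check would probably have to
special-case it.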

Bye,
Hannes

Thread overview: 17+ messages
2015-01-02  3:29 [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev sfeldma
2015-01-02  5:49 ` roopa
2015-01-02  8:00   ` Scott Feldman
2015-01-02 11:39     ` Arad, Ronen
2015-01-02 17:20       ` Scott Feldman
2015-01-02 22:57       ` roopa
2015-01-02 20:55     ` roopa
2015-01-02 11:21   ` Arad, Ronen
2015-01-02 21:53     ` roopa
2015-01-06 13:58 ` Hannes Frederic Sowa
2015-01-06 17:51   ` Scott Feldman
2015-01-06 19:59     ` Hannes Frederic Sowa
2015-01-06 20:26       ` Hannes Frederic Sowa
2015-01-07  2:08       ` Shrijeet Mukherjee
2015-01-07 11:23         ` Hannes Frederic Sowa
2015-01-07 17:54           ` Shrijeet Mukherjee
2015-01-08 13:03             ` Hannes Frederic Sowa
