* [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
@ 2018-10-04  4:55 Jakub Kicinski
  2018-10-04  4:55 ` [RFC 1/2] net: sched: register callbacks for remote tc block binds Jakub Kicinski
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-04  4:55 UTC (permalink / raw)
  To: netdev; +Cc: jiri, gerlitz.or, oss-drivers, john.hurley, Jakub Kicinski

Hi!

This set contains a rough RFC implementation of a proposed [1] replacement
for egdev cls_flower offloads.  I did some last minute restructuring
and removal of parts I felt were unnecessary, so if there are glaring bugs
they are probably mine, not John's :)  but hopefully this will give an idea
of the general direction.  We need to beef up the driver part to see how
it fully comes together.

[1] http://vger.kernel.org/netconf2018_files/JakubKicinski_netconf2018.pdf
    slides 10-13

John says:

This patchset introduces an alternative to egdev offload, allowing a
driver to register for block updates when an external device (e.g. a
tunnel netdev) is bound to a TC block. Drivers can track new netdevs or
register to existing ones to receive information on such events. Based
on this, they may register for block offload rules using already
existing functions.

Included with this RFC is a patch to the NFP driver. This is only supposed
to provide an example of how the remote block setup can be used.
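To make the intended flow concrete, here is a rough userspace model of the mechanism: a driver registers a callback against some other device, gets an immediate BIND replay if that device already has an ingress block, and is notified on later bind/unbind events. All types and names here are simplified stand-ins for illustration, not the kernel structures from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the proposed indirect block registration; the real
 * kernel code lives in net/sched/cls_api.c in patch 1. */

enum block_cmd { BLOCK_BIND, BLOCK_UNBIND };

struct toy_dev {
	const char *name;
	int has_ingress_block;	/* device already bound to an ingress block */
};

typedef int (*indr_bind_cb_t)(struct toy_dev *dev, void *cb_priv,
			      enum block_cmd cmd);

#define MAX_CBS 8
static struct {
	struct toy_dev *dev;
	indr_bind_cb_t cb;
	void *cb_priv;
} cbs[MAX_CBS];
static int n_cbs;

/* Register interest in block binds on @dev.  If the device is already
 * bound to an ingress block, replay a BIND immediately so the driver
 * can pick up existing state (mirrors __tc_indr_block_cb_register()). */
static int indr_block_cb_register(struct toy_dev *dev, void *cb_priv,
				  indr_bind_cb_t cb)
{
	if (n_cbs == MAX_CBS)
		return -1;
	cbs[n_cbs].dev = dev;
	cbs[n_cbs].cb = cb;
	cbs[n_cbs].cb_priv = cb_priv;
	n_cbs++;
	if (dev->has_ingress_block)
		cb(dev, cb_priv, BLOCK_BIND);
	return 0;
}

/* Called when @dev gains or loses its ingress block: notify every
 * registered callback (mirrors tc_indr_block_call()). */
static void indr_block_event(struct toy_dev *dev, enum block_cmd cmd)
{
	int i;

	dev->has_ingress_block = (cmd == BLOCK_BIND);
	for (i = 0; i < n_cbs; i++)
		if (cbs[i].dev == dev)
			cbs[i].cb(dev, cbs[i].cb_priv, cmd);
}

/* Demo callback that only counts BIND events; a real driver would
 * register block callbacks here via tcf_block_cb_register(). */
static int bind_events;
static int count_cb(struct toy_dev *dev, void *cb_priv, enum block_cmd cmd)
{
	(void)dev; (void)cb_priv;
	if (cmd == BLOCK_BIND)
		bind_events++;
	return 0;
}
```

In this model, registering against a device that is already bound delivers one replayed BIND, and later block events fan out to all registered callbacks, which is the behaviour the set proposes for tunnel netdevs.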

John Hurley (2):
  net: sched: register callbacks for remote tc block binds
  nfp: register remote block callbacks for vxlan/geneve

 .../net/ethernet/netronome/nfp/flower/main.c  |  12 +
 .../net/ethernet/netronome/nfp/flower/main.h  |  10 +
 .../ethernet/netronome/nfp/flower/offload.c   | 156 +++++++++
 .../netronome/nfp/flower/tunnel_conf.c        |   8 +
 include/net/pkt_cls.h                         |  56 ++++
 include/net/sch_generic.h                     |   3 +
 net/sched/cls_api.c                           | 297 +++++++++++++++++-
 7 files changed, 541 insertions(+), 1 deletion(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC 1/2] net: sched: register callbacks for remote tc block binds
  2018-10-04  4:55 [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Jakub Kicinski
@ 2018-10-04  4:55 ` Jakub Kicinski
  2018-10-04  4:55 ` [RFC 2/2] nfp: register remote block callbacks for vxlan/geneve Jakub Kicinski
  2018-10-04 14:28 ` [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Or Gerlitz
  2 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-04  4:55 UTC (permalink / raw)
  To: netdev; +Cc: jiri, gerlitz.or, oss-drivers, john.hurley, Jakub Kicinski

From: John Hurley <john.hurley@netronome.com>

Currently drivers can register for TC block binds/unbinds by implementing
the setup_tc ndo. However, drivers may also be interested in binds to
higher level devices (e.g. tunnel drivers) to potentially offload filters
applied to them.

Introduce indirect block setups, which allow drivers to register
callbacks for block binds on other devices. The calling driver is
expected to allocate a struct containing an initialised list head that
tracks all of its block setup callbacks, so they can be freed if the
driver is removed while the upper level device is still active. Freeing
a setup cb will also trigger an unbind event (if necessary) to direct
the driver to unregister any block callbacks.

Allow registering an indirect block setup cb for a device that is already
bound to a block. In this case (if it is an ingress block), register and
also trigger the callback - meaning that any already installed rules can
be replayed to the calling driver if it chooses.
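The core of the tracking described above is a refcounted per-device entry keyed by the device pointer, so several drivers can watch the same netdev. The sketch below models that get/put lifetime in plain userspace C; the kernel patch keeps these entries in an rhashtable, and a linked list is used here purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the per-device tracking entry: one entry per watched
 * device, keyed by pointer and refcounted across registrations. */

struct indr_dev_entry {
	void *dev;			/* key: the watched device */
	unsigned int refcnt;
	struct indr_dev_entry *next;
};

static struct indr_dev_entry *entries;

static struct indr_dev_entry *indr_dev_lookup(void *dev)
{
	struct indr_dev_entry *e;

	for (e = entries; e; e = e->next)
		if (e->dev == dev)
			return e;
	return NULL;
}

/* Get-or-create, like tc_indr_block_dev_get(): reuse an existing entry
 * for this device and bump its refcount, or allocate a fresh one. */
static struct indr_dev_entry *indr_dev_get(void *dev)
{
	struct indr_dev_entry *e = indr_dev_lookup(dev);

	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return NULL;
		e->dev = dev;
		e->next = entries;
		entries = e;
	}
	e->refcnt++;
	return e;
}

/* Drop one reference; unlink and free on the last put, like
 * tc_indr_block_dev_put(). */
static void indr_dev_put(struct indr_dev_entry *e)
{
	struct indr_dev_entry **p;

	if (--e->refcnt)
		return;
	for (p = &entries; *p; p = &(*p)->next)
		if (*p == e) {
			*p = e->next;
			break;
		}
	free(e);
}
```

Two registrations against the same device share one entry, and the entry disappears only when the last callback for that device is unregistered, which is what lets the unbind-on-free behaviour above work safely.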

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/pkt_cls.h     |  56 +++++++
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c       | 297 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 355 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 338ef054bf16..85e335162982 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -37,6 +37,7 @@ struct tcf_block_ext_info {
 };
 
 struct tcf_block_cb;
+struct tcf_indr_block_owner;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
 #ifdef CONFIG_NET_CLS
@@ -81,6 +82,20 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
 			       struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 			     tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				tc_indr_block_bind_cb_t *cb, void *cb_ident,
+				struct tcf_indr_block_owner *owner);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+			      tc_indr_block_bind_cb_t *cb, void *cb_ident,
+			      struct tcf_indr_block_owner *owner);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+				   tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+				 tc_indr_block_bind_cb_t *cb,
+				 void *cb_ident);
+
+struct tcf_indr_block_owner *tc_indr_block_owner_create(void);
+void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
@@ -183,6 +198,47 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				tc_indr_block_bind_cb_t *cb,
+				void *cb_ident,
+				struct tcf_indr_block_owner *owner)
+{
+	return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+			      tc_indr_block_bind_cb_t *cb, void *cb_ident,
+			      struct tcf_indr_block_owner *owner)
+{
+	return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+				   tc_indr_block_bind_cb_t *cb,
+				   void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+				 tc_indr_block_bind_cb_t *cb,
+				 void *cb_ident)
+{
+}
+
+static inline struct tcf_indr_block_owner *tc_indr_block_owner_create(void)
+{
+	/* NULL would mean an error, only CONFIG_NET_CLS can dereference this */
+	return (void *)1;
+}
+
+static inline void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index de972403d31e..da73864c001c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
 			  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+				   enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
 	struct tc_ratespec rate;
 	u32		data[256];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3de47e99b788..3acc103294b4 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -25,6 +25,7 @@
 #include <linux/kmod.h>
 #include <linux/slab.h>
 #include <linux/idr.h>
+#include <linux/rhashtable.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -363,6 +364,286 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 	}
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+	const struct Qdisc_class_ops *cops;
+	struct Qdisc *qdisc;
+
+	if (!dev_ingress_queue(dev))
+		return NULL;
+
+	qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+	if (!qdisc)
+		return NULL;
+
+	cops = qdisc->ops->cl_ops;
+	if (!cops)
+		return NULL;
+
+	if (!cops->tcf_block)
+		return NULL;
+
+	return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static struct rhashtable indr_setup_block_ht;
+
+struct tc_indr_block_dev {
+	struct rhash_head ht_node;
+	struct net_device *dev;
+	unsigned int refcnt;
+	struct list_head cb_list;
+	struct tcf_block *block;
+};
+
+struct tc_indr_block_cb {
+	struct tc_indr_block_dev *indr_dev;
+	struct list_head list;
+	void *cb_priv;
+	tc_indr_block_bind_cb_t *cb;
+	void *cb_ident;
+	struct list_head track_list;
+};
+
+struct tcf_indr_block_owner {
+	struct list_head cb_list;
+};
+
+static const struct rhashtable_params tc_indr_setup_block_ht_params = {
+	.key_offset	= offsetof(struct tc_indr_block_dev, dev),
+	.head_offset	= offsetof(struct tc_indr_block_dev, ht_node),
+	.key_len	= sizeof(struct net_device *),
+};
+
+static struct tc_indr_block_dev *
+tc_indr_block_dev_lookup(struct net_device *dev)
+{
+	return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
+				      tc_indr_setup_block_ht_params);
+}
+
+static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
+{
+	struct tc_indr_block_dev *indr_dev;
+
+	indr_dev = tc_indr_block_dev_lookup(dev);
+	if (indr_dev)
+		goto inc_ref;
+
+	indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
+	if (!indr_dev)
+		return NULL;
+
+	INIT_LIST_HEAD(&indr_dev->cb_list);
+	indr_dev->dev = dev;
+	indr_dev->block = tc_dev_ingress_block(dev);
+	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+				   tc_indr_setup_block_ht_params)) {
+		kfree(indr_dev);
+		return NULL;
+	}
+
+inc_ref:
+	indr_dev->refcnt++;
+	return indr_dev;
+}
+
+static void tc_indr_block_dev_put(struct tc_indr_block_dev *indr_dev)
+{
+	if (--indr_dev->refcnt)
+		return;
+
+	rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+			       tc_indr_setup_block_ht_params);
+	kfree(indr_dev);
+}
+
+static struct tc_indr_block_cb *
+tc_indr_block_cb_lookup(struct tc_indr_block_dev *indr_dev,
+			tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		if (indr_block_cb->cb == cb &&
+		    indr_block_cb->cb_ident == cb_ident)
+			return indr_block_cb;
+	return NULL;
+}
+
+static struct tc_indr_block_cb *
+tc_indr_block_cb_add(struct tc_indr_block_dev *indr_dev,
+		     void *cb_priv, tc_indr_block_bind_cb_t *cb,
+		     void *cb_ident, struct tcf_indr_block_owner *owner)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+
+	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (indr_block_cb)
+		return ERR_PTR(-EEXIST);
+
+	indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
+	if (!indr_block_cb)
+		return ERR_PTR(-ENOMEM);
+
+	indr_block_cb->indr_dev = indr_dev;
+	indr_block_cb->cb_priv = cb_priv;
+	indr_block_cb->cb = cb;
+	indr_block_cb->cb_ident = cb_ident;
+	list_add(&indr_block_cb->list, &indr_dev->cb_list);
+	list_add(&indr_block_cb->track_list, &owner->cb_list);
+
+	return indr_block_cb;
+}
+
+static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
+{
+	list_del(&indr_block_cb->list);
+	list_del(&indr_block_cb->track_list);
+	kfree(indr_block_cb);
+}
+
+static int tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
+				 struct tc_indr_block_cb *indr_block_cb,
+				 enum tc_block_command command)
+{
+	struct tc_block_offload bo = {
+		.command	= command,
+		.binder_type	= TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+		.block		= indr_dev->block,
+	};
+
+	if (!indr_dev->block)
+		return 0;
+	return indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv,
+				 TC_SETUP_BLOCK, &bo);
+}
+
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				tc_indr_block_bind_cb_t *cb, void *cb_ident,
+				struct tcf_indr_block_owner *owner)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+	struct tc_indr_block_dev *indr_dev;
+	int err;
+
+	indr_dev = tc_indr_block_dev_get(dev);
+	if (!indr_dev)
+		return -ENOMEM;
+
+	indr_block_cb = tc_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident,
+					     owner);
+	err = PTR_ERR_OR_ZERO(indr_block_cb);
+	if (err)
+		goto err_dev_put;
+
+	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, TC_BLOCK_BIND);
+	return 0;
+
+err_dev_put:
+	tc_indr_block_dev_put(indr_dev);
+	return err;
+}
+EXPORT_SYMBOL_GPL(__tc_indr_block_cb_register);
+
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+			      tc_indr_block_bind_cb_t *cb, void *cb_ident,
+			      struct tcf_indr_block_owner *owner)
+{
+	int err;
+
+	rtnl_lock();
+	err = __tc_indr_block_cb_register(dev, cb_priv, cb, cb_ident, owner);
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(tc_indr_block_cb_register);
+
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+				   tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+	struct tc_indr_block_dev *indr_dev;
+
+	indr_dev = tc_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (!indr_block_cb)
+		return;
+
+	/* Send unbind message if required to free any block cbs. */
+	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, TC_BLOCK_UNBIND);
+	tc_indr_block_cb_del(indr_block_cb);
+	tc_indr_block_dev_put(indr_dev);
+}
+EXPORT_SYMBOL_GPL(__tc_indr_block_cb_unregister);
+
+void tc_indr_block_cb_unregister(struct net_device *dev,
+				 tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	rtnl_lock();
+	__tc_indr_block_cb_unregister(dev, cb, cb_ident);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
+
+struct tcf_indr_block_owner *tc_indr_block_owner_create(void)
+{
+	struct tcf_indr_block_owner *owner;
+
+	owner = kzalloc(sizeof(*owner), GFP_KERNEL);
+	if (!owner)
+		return NULL;
+	INIT_LIST_HEAD(&owner->cb_list);
+	return owner;
+}
+EXPORT_SYMBOL_GPL(tc_indr_block_owner_create);
+
+void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner)
+{
+	struct tc_indr_block_cb *indr_block_cb, *store;
+	struct tc_indr_block_dev *indr_dev;
+
+	rtnl_lock();
+	list_for_each_entry_safe(indr_block_cb, store, &owner->cb_list,
+				 track_list) {
+		indr_dev = indr_block_cb->indr_dev;
+		tc_indr_block_ing_cmd(indr_dev, indr_block_cb, TC_BLOCK_UNBIND);
+		tc_indr_block_cb_del(indr_block_cb);
+		tc_indr_block_dev_put(indr_dev);
+	}
+	rtnl_unlock();
+}
+EXPORT_SYMBOL_GPL(tc_indr_block_owner_clean);
+
+static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
+			       struct tcf_block_ext_info *ei,
+			       enum tc_block_command command,
+			       struct netlink_ext_ack *extack)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+	struct tc_indr_block_dev *indr_dev;
+	struct tc_block_offload bo = {
+		.command	= command,
+		.binder_type	= ei->binder_type,
+		.block		= block,
+		.extack		= extack,
+	};
+
+	indr_dev = tc_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_dev->block = command == TC_BLOCK_BIND ? block : NULL;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+				  &bo);
+}
+
 static bool tcf_block_offload_in_use(struct tcf_block *block)
 {
 	return block->offloadcnt;
@@ -404,12 +685,17 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND, extack);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_inc;
-	return err;
+	if (err)
+		return err;
+
+	tc_indr_block_call(block, dev, ei, TC_BLOCK_BIND, extack);
+	return 0;
 
 no_offload_dev_inc:
 	if (tcf_block_offload_in_use(block))
 		return -EOPNOTSUPP;
 	block->nooffloaddevcnt++;
+	tc_indr_block_call(block, dev, ei, TC_BLOCK_BIND, extack);
 	return 0;
 }
 
@@ -419,6 +705,8 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
+	tc_indr_block_call(block, dev, ei, TC_BLOCK_UNBIND, NULL);
+
 	if (!dev->netdev_ops->ndo_setup_tc)
 		goto no_offload_dev_dec;
 	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND, NULL);
@@ -2349,6 +2637,11 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
+	err = rhashtable_init(&indr_setup_block_ht,
+			      &tc_indr_setup_block_ht_params);
+	if (err)
+		goto err_rhash_setup_block_ht;
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
@@ -2360,6 +2653,8 @@ static int __init tc_filter_init(void)
 
 	return 0;
 
+err_rhash_setup_block_ht:
+	unregister_pernet_subsys(&tcf_net_ops);
 err_register_pernet_subsys:
 	destroy_workqueue(tc_filter_wq);
 	return err;
-- 
2.17.1


* [RFC 2/2] nfp: register remote block callbacks for vxlan/geneve
  2018-10-04  4:55 [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Jakub Kicinski
  2018-10-04  4:55 ` [RFC 1/2] net: sched: register callbacks for remote tc block binds Jakub Kicinski
@ 2018-10-04  4:55 ` Jakub Kicinski
  2018-10-04 14:28 ` [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Or Gerlitz
  2 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-04  4:55 UTC (permalink / raw)
  To: netdev; +Cc: jiri, gerlitz.or, oss-drivers, john.hurley, Jakub Kicinski

From: John Hurley <john.hurley@netronome.com>

Test stub to illustrate how the NFP could register for and receive
callbacks from remote block setups.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/flower/main.c  |  12 ++
 .../net/ethernet/netronome/nfp/flower/main.h  |  10 ++
 .../ethernet/netronome/nfp/flower/offload.c   | 156 ++++++++++++++++++
 .../netronome/nfp/flower/tunnel_conf.c        |   8 +
 4 files changed, 186 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index e57d23746585..34b0c3602ab2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -587,8 +587,17 @@ static int nfp_flower_init(struct nfp_app *app)
 		goto err_cleanup_metadata;
 	}
 
+	app_priv->indir_cb_owner = tc_indr_block_owner_create();
+	if (!app_priv->indir_cb_owner)
+		goto err_cleanup_lag;
+
+	INIT_LIST_HEAD(&app_priv->nfp_indr_block_cb_list);
+
 	return 0;
 
+err_cleanup_lag:
+	if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+		nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 err_cleanup_metadata:
 	nfp_flower_metadata_cleanup(app);
 err_free_app_priv:
@@ -607,6 +616,9 @@ static void nfp_flower_clean(struct nfp_app *app)
 	if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
 		nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 
+	tc_indr_block_owner_clean(app_priv->indir_cb_owner);
+	nfp_flower_clean_indr_block_cbs(app_priv);
+
 	nfp_flower_metadata_cleanup(app);
 	vfree(app->priv);
 	app->priv = NULL;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 81d941ab895c..5f27318ecdbd 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -161,6 +161,7 @@ struct nfp_fl_lag {
  * @reify_wait_queue:	wait queue for repr reify response counting
  * @mtu_conf:		Configuration of repr MTU value
  * @nfp_lag:		Link aggregation data block
+ * @indir_cb_owner:	Master structure for indirect TC block callback
  */
 struct nfp_flower_priv {
 	struct nfp_app *app;
@@ -191,6 +192,8 @@ struct nfp_flower_priv {
 	wait_queue_head_t reify_wait_queue;
 	struct nfp_mtu_conf mtu_conf;
 	struct nfp_fl_lag nfp_lag;
+	struct list_head nfp_indr_block_cb_list;
+	struct tcf_indr_block_owner *indir_cb_owner;
 };
 
 /**
@@ -293,5 +296,12 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
 				       struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 				 struct net_device *master);
+void
+nfp_flower_register_indr_block(struct nfp_flower_priv *app_priv,
+			       struct net_device *netdev);
+void
+nfp_flower_unregister_indr_block(struct nfp_flower_priv *app_priv,
+				 struct net_device *netdev);
+void nfp_flower_clean_indr_block_cbs(struct nfp_flower_priv *app_priv);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index bd19624f10cf..14f1b91b7b90 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -707,3 +707,159 @@ int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
 		return -EOPNOTSUPP;
 	}
 }
+
+struct indr_block_cb_priv {
+	struct net_device *netdev;
+	struct nfp_flower_priv *app_priv;
+	struct list_head list;
+};
+
+static struct indr_block_cb_priv *
+indr_block_cb_priv_lookup(struct nfp_flower_priv *app_priv,
+			  struct net_device *netdev)
+{
+	struct indr_block_cb_priv *cb_priv;
+
+	/* All callback list access should be protected by RTNL. */
+	ASSERT_RTNL();
+
+	list_for_each_entry(cb_priv, &app_priv->nfp_indr_block_cb_list, list)
+		if (cb_priv->netdev == netdev)
+			return cb_priv;
+
+	return NULL;
+}
+
+void nfp_flower_clean_indr_block_cbs(struct nfp_flower_priv *app_priv)
+{
+	struct indr_block_cb_priv *cb_priv, *temp;
+
+	list_for_each_entry_safe(cb_priv, temp,
+				 &app_priv->nfp_indr_block_cb_list, list)
+		kfree(cb_priv);
+}
+
+static int
+nfp_flower_indr_offload(struct net_device *netdev,
+			struct tc_cls_flower_offload *flower)
+{
+	if (flower->common.chain_index)
+		return -EOPNOTSUPP;
+
+	if (!eth_proto_is_802_3(flower->common.protocol))
+		return -EOPNOTSUPP;
+
+	switch (flower->command) {
+	case TC_CLSFLOWER_REPLACE:
+		netdev_info(netdev, "Flower replace\n");
+		break;
+	case TC_CLSFLOWER_DESTROY:
+		netdev_info(netdev, "Flower destroy\n");
+		break;
+	case TC_CLSFLOWER_STATS:
+		netdev_info(netdev, "Flower stats\n");
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
+					  void *type_data, void *cb_priv)
+{
+	struct indr_block_cb_priv *priv = cb_priv;
+
+	switch (type) {
+	case TC_SETUP_CLSFLOWER:
+		return nfp_flower_indr_offload(priv->netdev, type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int
+nfp_flower_setup_indr_tc_block(struct net_device *netdev,
+			       struct nfp_flower_priv *app_priv,
+			       struct tc_block_offload *f)
+{
+	struct indr_block_cb_priv *cb_priv;
+	int err;
+
+	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+		return -EOPNOTSUPP;
+
+	switch (f->command) {
+	case TC_BLOCK_BIND:
+		cb_priv = indr_block_cb_priv_lookup(app_priv, netdev);
+		if (cb_priv)
+			return -EEXIST;
+
+		cb_priv = kmalloc(sizeof(*cb_priv), GFP_KERNEL);
+		if (!cb_priv)
+			return -ENOMEM;
+
+		cb_priv->netdev = netdev;
+		cb_priv->app_priv = app_priv;
+		list_add(&cb_priv->list, &app_priv->nfp_indr_block_cb_list);
+
+		err = tcf_block_cb_register(f->block,
+					    nfp_flower_setup_indr_block_cb,
+					    netdev, cb_priv, f->extack);
+		if (err) {
+			list_del(&cb_priv->list);
+			kfree(cb_priv);
+		}
+
+		return err;
+	case TC_BLOCK_UNBIND:
+		tcf_block_cb_unregister(f->block,
+					nfp_flower_setup_indr_block_cb,
+					netdev);
+		cb_priv = indr_block_cb_priv_lookup(app_priv, netdev);
+		if (cb_priv) {
+			list_del(&cb_priv->list);
+			kfree(cb_priv);
+		}
+
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
+int nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
+				enum tc_setup_type type, void *type_data)
+{
+	switch (type) {
+	case TC_SETUP_BLOCK:
+		return nfp_flower_setup_indr_tc_block(netdev, cb_priv,
+						      type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+void
+nfp_flower_register_indr_block(struct nfp_flower_priv *app_priv,
+			       struct net_device *netdev)
+{
+	int err;
+
+	err = __tc_indr_block_cb_register(netdev, app_priv,
+					  nfp_flower_indr_setup_tc_cb,
+					  netdev, app_priv->indir_cb_owner);
+
+	if (err)
+		nfp_flower_cmsg_warn(app_priv->app, "Failed to register remote block notifier for %s\n", netdev_name(netdev));
+}
+
+void
+nfp_flower_unregister_indr_block(struct nfp_flower_priv *app_priv,
+				 struct net_device *netdev)
+{
+	__tc_indr_block_cb_unregister(netdev, nfp_flower_indr_setup_tc_cb,
+				      netdev);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 382bb93cb090..49cf86f1a2da 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -697,6 +697,10 @@ static int nfp_tun_mac_event_handler(struct notifier_block *nb,
 		/* If non-nfp netdev then free its offload index. */
 		if (nfp_tun_is_netdev_to_offload(netdev))
 			nfp_tun_del_mac_idx(app, netdev->ifindex);
+
+		if (event == NETDEV_UNREGISTER &&
+		    nfp_tun_is_netdev_to_offload(netdev))
+			nfp_flower_unregister_indr_block(app_priv, netdev);
 	} else if (event == NETDEV_UP || event == NETDEV_CHANGEADDR ||
 		   event == NETDEV_REGISTER) {
 		app_priv = container_of(nb, struct nfp_flower_priv,
@@ -708,6 +712,10 @@ static int nfp_tun_mac_event_handler(struct notifier_block *nb,
 
 		/* Force a list write to keep NFP up to date. */
 		nfp_tunnel_write_macs(app);
+
+		if (event == NETDEV_REGISTER &&
+		    nfp_tun_is_netdev_to_offload(netdev))
+			nfp_flower_register_indr_block(app_priv, netdev);
 	}
 	return NOTIFY_OK;
 }
-- 
2.17.1


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04  4:55 [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Jakub Kicinski
  2018-10-04  4:55 ` [RFC 1/2] net: sched: register callbacks for remote tc block binds Jakub Kicinski
  2018-10-04  4:55 ` [RFC 2/2] nfp: register remote block callbacks for vxlan/geneve Jakub Kicinski
@ 2018-10-04 14:28 ` Or Gerlitz
  2018-10-04 15:44   ` John Hurley
  2 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2018-10-04 14:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Netdev List, Jiri Pirko, oss-drivers, John Hurley,
	Oz Shlomo, Aviv Heller

On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> Hi!
>
> This set contains a rough RFC implementation of a proposed [1] replacement
> for egdev cls_flower offloads.  I did some last minute restructuring
> and removal of parts I felt were unnecessary, so if there are glaring bugs
> they are probably mine, not John's :)  but hopefully this will give an idea
> of the general direction.  We need to beef up the driver part to see how
> it fully comes together.
>
> [1] http://vger.kernel.org/netconf2018_files/JakubKicinski_netconf2018.pdf
>     slides 10-13
>
> John's says:
>
> This patchset introduces as an alternative to egdev offload by allowing a
> driver to register for block updates when an external device (e.g. tunnel
> netdev) is bound to a TC block.

In a slightly different but hopefully somehow related context, regarding
the case of flow offloading in the presence of upper devices
(specifically LAG): your OVS user-space patch [1] applied TC block
sharing on the slave of the LAG (bond/team) device which serves as the
OVS port. This way, flows that are installed on the bond are propagated
to both uplink devices - good!

However, when tunneling comes into play, the bond device is not part of
the virtual switch but rather the tunnel device, so the SW DP is

wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch

So now, if the HW driver uses your new facility to register for rules
installed on the tunnel device, we are again properly sharing
(duplicating) the rules to both uplinks, right?!

[1] d22f892 netdev-linux: monitor and offload LAG slaves to TC

> Drivers can track new netdevs or register
> to existing ones to receive information on such events. Based on this,
> they may register for block offload rules using already existing functions.

Just to make it clear, (part of) the claim to fame here is that once we
have this code in, we can just go and remove all the egdev related code
from the kernel (both core and drivers), right? Only nfp and mlx5 use
egdev, so the removal should be a simple exercise.

> Included with this RFC is a patch to the NFP driver. This is only supposed
> to provide an example of how the remote block setup can be used.

We will look at and play with the patches next week and provide
feedback. Cool that you took the lead to improve the facilities here!


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04 14:28 ` [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Or Gerlitz
@ 2018-10-04 15:44   ` John Hurley
  2018-10-04 15:53     ` Or Gerlitz
  0 siblings, 1 reply; 10+ messages in thread
From: John Hurley @ 2018-10-04 15:44 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, Linux Netdev List, Jiri Pirko, oss-drivers, ozsh,
	avivh, Simon Horman

On Thu, Oct 4, 2018 at 3:28 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > Hi!
> >
> > This set contains a rough RFC implementation of a proposed [1] replacement
> > for egdev cls_flower offloads.  I did some last minute restructuring
> > and removal of parts I felt were unnecessary, so if there are glaring bugs
> > they are probably mine, not John's :)  but hopefully this will give an idea
> > of the general direction.  We need to beef up the driver part to see how
> > it fully comes together.
> >
> > [1] http://vger.kernel.org/netconf2018_files/JakubKicinski_netconf2018.pdf
> >     slides 10-13
> >
> > John's says:
> >
> > This patchset introduces as an alternative to egdev offload by allowing a
> > driver to register for block updates when an external device (e.g. tunnel
> > netdev) is bound to a TC block.
>
> In a slightly different but hopefully somehow related context, regarding
> the case of flow offloading in the presence of upper devices (specifically LAG),
> your ovs user patch [1]  applied TC block sharing on the slave of lag
> (bond/team)
> device which serves as ovs port. This way, flows that are installed on
> the bond are
> propagated to both uplink devices - good!
>
> However, when tunneling comes into play, the bond device is not part of
> the virtual switch but rather the tunnel device, so the SW DP is
>
> wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch
>
> So now, if the HW driver uses your new facility to register for rules
> installed on the
> tunnel device, we are again properly sharing (duplicating) the rules
> to both uplinks, right?!
>
> [1] d22f892 netdev-linux: monitor and offload LAG slaves to TC
>

Hi Or,
In this case the hw driver will receive the rules from the tunnel
device directly.
The driver can then offload them as it sees fit.
Because the bond is not on the vSwitch, the TC rule will not be
offloaded to it and therefore not duplicated to its devices.
Currently, this setup would be offloaded via egdev.

> > Drivers can track new netdevs or register
> > to existing ones to receive information on such events. Based on this,
> > they may register for block offload rules using already existing functions.
>
> Just to make it clear, (part of) the claim to fame here is that once
> we have this
> code in, we can just go and remove all the egdev related code from the
> kernel (both
> core and drivers), right? only nfp and mlx5 use egdev, so the removal
> should be simple
> exercise.
>

Along with simplifying things and removing the need for duplicate rule
offload checks, I see (at least) 2 main functional benefits of using
this instead of egdev:
1. we can get access to the ingress netdev and so can check for things
like tunnel type rather than relying on TC rules and well known ports
to determine this.
2. we can offload rules that do not have an uplink/repr as ingress or
egress dev (currently, the hw driver will not receive a callback) -
e.g. VXLAN --> bond.

> > Included with this RFC is a patch to the NFP driver. This is only supposed
> > to provide an example of how the remote block setup can be used.
>
> We will look and play with the patches next week and provide feedback, cool
> that you took the lead to improve the facilities here!
cool, thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04 15:44   ` John Hurley
@ 2018-10-04 15:53     ` Or Gerlitz
  2018-10-04 16:20       ` John Hurley
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2018-10-04 15:53 UTC (permalink / raw)
  To: John Hurley
  Cc: Jakub Kicinski, Linux Netdev List, Jiri Pirko, oss-drivers,
	Oz Shlomo, Aviv Heller, Simon Horman

On Thu, Oct 4, 2018 at 6:44 PM John Hurley <john.hurley@netronome.com> wrote:
> On Thu, Oct 4, 2018 at 3:28 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> > On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> > > This patchset introduces an alternative to egdev offload, allowing a
> > > driver to register for block updates when an external device (e.g. tunnel
> > > netdev) is bound to a TC block.

> > In a slightly different but hopefully somehow related context, regarding
> > the case of flow offloading in the presence of upper devices (specifically LAG),
> > your ovs user patch [1]  applied TC block sharing on the slave of lag (bond/team)
> > device which serves as ovs port. This way, flows that are installed on
> > the bond are propagated to both uplink devices - good!

> > However, when tunneling comes into play, the bond device is not part of
> > the virtual switch but rather the tunnel device, so the SW DP is
> >
> > wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch
> >
> > So now, if the HW driver uses your new facility to register for rules installed on the
> > tunnel device, we are again properly sharing (duplicating) the rules
> > to both uplinks, right?!
> >
> > [1] d22f892 netdev-linux: monitor and offload LAG slaves to TC

> Because the bond is not on the vSwitch, the TC rule will not be
> offloaded to it and therefore not duplicated to its devices.

indeed

> In this case the hw driver will receive the rules from the tunnel device directly.
> The driver can then offload them as it sees fit.

if both instances of the hw driver (uplink0, uplink1) register to get
the rules installed on the block of the tunnel device, we have exactly
what we want, don't we?

> Currently, this setup would be offloaded via egdev.

not following - egdev, I thought, could be removed... and it's not needed
as I explained above, unless I'm missing something?


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04 15:53     ` Or Gerlitz
@ 2018-10-04 16:20       ` John Hurley
  2018-10-04 17:19         ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: John Hurley @ 2018-10-04 16:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, Linux Netdev List, Jiri Pirko, oss-drivers, ozsh,
	avivh, Simon Horman

On Thu, Oct 4, 2018 at 4:53 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Oct 4, 2018 at 6:44 PM John Hurley <john.hurley@netronome.com> wrote:
> > On Thu, Oct 4, 2018 at 3:28 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
> > > On Thu, Oct 4, 2018 at 7:55 AM Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>
> > > > This patchset introduces an alternative to egdev offload, allowing a
> > > > driver to register for block updates when an external device (e.g. tunnel
> > > > netdev) is bound to a TC block.
>
> > > In a slightly different but hopefully somehow related context, regarding
> > > the case of flow offloading in the presence of upper devices (specifically LAG),
> > > your ovs user patch [1]  applied TC block sharing on the slave of lag (bond/team)
> > > device which serves as ovs port. This way, flows that are installed on
> > > the bond are propagated to both uplink devices - good!
>
> > > However, when tunneling comes into play, the bond device is not part of
> > > the virtual switch but rather the tunnel device, so the SW DP is
> > >
> > > wire --> hw driver --> bond --> stack --> tunnel driver --> virtual switch
> > >
> > > So now, if the HW driver uses your new facility to register for rules installed on the
> > > tunnel device, we are again properly sharing (duplicating) the rules
> > > to both uplinks, right?!
> > >
> > > [1] d22f892 netdev-linux: monitor and offload LAG slaves to TC
>
> > Because the bond is not on the vSwitch, the TC rule will not be
> > offloaded to it and therefore not duplicated to its devices.
>
> indeed
>
> > In this case the hw driver will receive the rules from the tunnel device directly.
> > The driver can then offload them as it sees fit.
>
> if both instances of the hw drivers (uplink0, uplink1) register to get
> the rules installed on the block of the tunnel device we have exactly
> what we want, isn't that?
>

The design here is that each hw driver should only need to register
for callbacks on a 'higher level' device's block once.
When a callback is triggered the driver receives one instance of the
rule and can make its own decision about what to do.
This is slightly different from registering ingress devs where each
uplink registers for its own block.
It is probably more akin to the egdev setup in that if a rule on a
block egresses to an uplink, the driver receives 1 callback on the
rule, irrespective of how many underlying netdevs are on the block.

> > Currently, this setup would be offloaded via egdev.
>
> not following, egdev I thought could be removed... and it's not needed
> as I explained above, unless I miss something?

Apologies - my use of 'currently' meant with the current upstream
kernel (i.e. prior to this patch).


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04 16:20       ` John Hurley
@ 2018-10-04 17:19         ` Jakub Kicinski
  2018-10-10 13:38           ` Or Gerlitz
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-04 17:19 UTC (permalink / raw)
  To: John Hurley
  Cc: Or Gerlitz, Linux Netdev List, Jiri Pirko, oss-drivers, ozsh,
	avivh, Simon Horman

On Thu, 4 Oct 2018 17:20:43 +0100, John Hurley wrote:
> > > In this case the hw driver will receive the rules from the tunnel device directly.
> > > The driver can then offload them as it sees fit.  
> >
> > if both instances of the hw drivers (uplink0, uplink1) register to get
> > the rules installed on the block of the tunnel device we have exactly
> > what we want, isn't that?
> >  
> 
> The design here is that each hw driver should only need to register
> for callbacks on a 'higher level' device's block once.
> When a callback is triggered the driver receives one instance of the
> rule and can make its own decision about what to do.
> This is slightly different from registering ingress devs where each
> uplink registers for its own block.
> It is probably more akin to the egdev setup in that if a rule on a
> block egresses to an uplink, the driver receives 1 callback on the
> rule, irrespective of how many underlying netdevs are on the block.

Right, though nothing stops the driver from registering multiple
callbacks for the same device, if it's somehow useful.


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-04 17:19         ` Jakub Kicinski
@ 2018-10-10 13:38           ` Or Gerlitz
  2018-10-11 11:03             ` John Hurley
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2018-10-10 13:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Hurley, Linux Netdev List, Jiri Pirko, oss-drivers,
	Oz Shlomo, Aviv Heller, Simon Horman

On Thu, Oct 4, 2018 at 8:19 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Thu, 4 Oct 2018 17:20:43 +0100, John Hurley wrote:
> > > > In this case the hw driver will receive the rules from the tunnel device directly.
> > > > The driver can then offload them as it sees fit.
> > >
> > > if both instances of the hw drivers (uplink0, uplink1) register to get
> > > the rules installed on the block of the tunnel device we have exactly
> > > what we want, isn't that?
> > >
> >
> > The design here is that each hw driver should only need to register
> > for callbacks on a 'higher level' device's block once.
> > When a callback is triggered the driver receives one instance of the
> > rule and can make its own decision about what to do.
> > This is slightly different from registering ingress devs where each
> > uplink registers for its own block.
> > It is probably more akin to the egdev setup in that if a rule on a
> > block egresses to an uplink, the driver receives 1 callback on the
> > rule, irrespective of how many underlying netdevs are on the block.
>
> Right, though nothing stops the driver from registering multiple
> callbacks for the same device, if its somehow useful.

I must be missing something.. putting uplink bonding aside: if the user
is setting a tc ingress rule on a tunnel device (vxlan0/gre0) on a
system with multiple unrelated NICs/uplinks that support TC decap
offload, wouldn't each of these netdevs want to install the rule into
HW? Why do we want the HW driver to duplicate the rule between the
potential candidates among the netdev instances it created, rather than
each of them getting the callback and deciding?

In other words, we want each netdev instance of these NICs to get the
callback.


* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
  2018-10-10 13:38           ` Or Gerlitz
@ 2018-10-11 11:03             ` John Hurley
  0 siblings, 0 replies; 10+ messages in thread
From: John Hurley @ 2018-10-11 11:03 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, Linux Netdev List, Jiri Pirko, oss-drivers, ozsh,
	avivh, Simon Horman

On Wed, Oct 10, 2018 at 2:38 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Oct 4, 2018 at 8:19 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> > On Thu, 4 Oct 2018 17:20:43 +0100, John Hurley wrote:
> > > > > In this case the hw driver will receive the rules from the tunnel device directly.
> > > > > The driver can then offload them as it sees fit.
> > > >
> > > > if both instances of the hw drivers (uplink0, uplink1) register to get
> > > > the rules installed on the block of the tunnel device we have exactly
> > > > what we want, isn't that?
> > > >
> > >
> > > The design here is that each hw driver should only need to register
> > > for callbacks on a 'higher level' device's block once.
> > > When a callback is triggered the driver receives one instance of the
> > > rule and can make its own decision about what to do.
> > > This is slightly different from registering ingress devs where each
> > > uplink registers for its own block.
> > > It is probably more akin to the egdev setup in that if a rule on a
> > > block egresses to an uplink, the driver receives 1 callback on the
> > > rule, irrespective of how many underlying netdevs are on the block.
> >
> > Right, though nothing stops the driver from registering multiple
> > callbacks for the same device, if its somehow useful.
>
> I must be missing something.. putting uplink bonding aside: if the user
> is setting a tc ingress rule on a tunnel device (vxlan0/gre0) on a
> system with multiple unrelated NICs/uplinks that support TC decap
> offload, wouldn't each of these netdevs want to install the rule into
> HW? Why do we want the HW driver to duplicate the rule between the
> potential candidates among the netdev instances it created, rather than
> each of them getting the callback and deciding?
>
> we want each netdev instance of these NICs to get the callback

Hi Or,
It depends on how we want to offload tunnels.
In the case of the NFP, we offload 1 instance of a tunnel rule, not
one instance per uplink.
With this, it makes sense to have 1 callback per tunnel netdev (and
per driver) rather than per uplink (although, as Jakub pointed out, the
option is there to register more callbacks).
If we consider the egdev model for offload, we only got a single
callback per rule if the egress device was registered and did not know
the ingress dev - is this not similar, in that the driver gets 1
callback for the rule and decides what to do with it?
John


end of thread, other threads:[~2018-10-11 18:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-04  4:55 [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Jakub Kicinski
2018-10-04  4:55 ` [RFC 1/2] net: sched: register callbacks for remote tc block binds Jakub Kicinski
2018-10-04  4:55 ` [RFC 2/2] nfp: register remote block callbacks for vxlan/geneve Jakub Kicinski
2018-10-04 14:28 ` [RFC 0/2] net: sched: indirect/remote setup tc block cb registering Or Gerlitz
2018-10-04 15:44   ` John Hurley
2018-10-04 15:53     ` Or Gerlitz
2018-10-04 16:20       ` John Hurley
2018-10-04 17:19         ` Jakub Kicinski
2018-10-10 13:38           ` Or Gerlitz
2018-10-11 11:03             ` John Hurley
