All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
@ 2017-12-23 15:54 Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 01/10] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
                   ` (11 more replies)
  0 siblings, 12 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create ingress qdisc on both and you want to add
identical filter rules both, you need to add them twice. This patchset
makes this easier and mainly saves resources allowing to share all filters
within a qdisc - I call it a "filter block". Also this helps to save
resources when we do offload to hw for example to expensive TCAM.

So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification. If we don't pass any
block number, a new one will be generated by kernel:

$ tc qdisc add dev ens7 ingress block 22
                                ^^^^^^^^
$ tc qdisc add dev ens8 ingress block 22
                                ^^^^^^^^

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22

To make is more visual, the situation looks like this:

   ens7 ingress qdisc                 ens7 ingress qdisc
          |                                  |
          |                                  |
          +---------->  block 22  <----------+

Unlimited number of qdiscs may share the same block.

Now we can add filter to any of qdiscs sharing the same block:

$ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop


We will see the same output if we list filters for ens7 and ens8, including stats:

$ tc -s filter show dev ens7 ingress
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.0.0/16
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 39 sec used 2 sec
        Action statistics:
        Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

$ tc -s filter show dev ens8 ingress
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.0.0/16
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 40 sec used 3 sec
        Action statistics:
        Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

---
v3->v4:
- patch 1:
 - rebased on top of the current net-next
 - added some extack strings
- patch 3:
 - rebased on top of the current net-next
- patch 5:
 - propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
   caller
- patch 6:
 - rebased on top of the current net-next

v2->v3:
- removed original patch 1, removing tp->q cls_bpf dependency. Fixed by
  Jakub in the meantime.
- patch 1:
 - rebased on top of the current net-next
- patch 5:
 - new patch
- patch 6:
 - removed "p_" prefix from block index function args
- patch 9:
 - add tc offload feature handling

Jiri Pirko (10):
  net: sched: introduce support for multiple filter chain pointers
    registration
  net: sched: avoid usage of tp->q in tcf_classify
  net: sched: introduce block mechanism to handle netif_keep_dst calls
  net: sched: remove classid and q fields from tcf_proto
  net: sched: keep track of offloaded filters and check tc offload
    feature
  net: sched: allow ingress and clsact qdiscs to share filter blocks
  mlxsw: spectrum_acl: Reshuffle code around
    mlxsw_sp_acl_ruleset_create/destroy
  mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind
  mlxsw: spectrum_acl: Implement TC block sharing
  mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind
    ops

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 182 ++++++++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  44 ++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c | 302 ++++++++++++----
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c    |  44 ++-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  |  41 +--
 include/net/pkt_cls.h                              |   4 +
 include/net/sch_generic.h                          |  26 +-
 include/uapi/linux/pkt_sched.h                     |  11 +
 net/sched/cls_api.c                                | 380 ++++++++++++++++++---
 net/sched/cls_bpf.c                                |   9 +-
 net/sched/cls_flow.c                               |   2 +-
 net/sched/cls_flower.c                             |   3 +-
 net/sched/cls_matchall.c                           |   3 +-
 net/sched/cls_route.c                              |   2 +-
 net/sched/cls_u32.c                                |  13 +-
 net/sched/sch_ingress.c                            |  89 ++++-
 16 files changed, 931 insertions(+), 224 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [patch net-next v4 01/10] net: sched: introduce support for multiple filter chain pointers registration
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 02/10] net: sched: avoid usage of tp->q in tcf_classify Jiri Pirko
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

So far, there was possible only to register a single filter chain
pointer to block->chain[0]. However, when the blocks will get shareable,
we need to allow multiple filter chain pointers registration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- rebased on top of the current net-next
- added some extack strings
v2->v3:
- rebased on top of the current net-next
---
 include/net/pkt_cls.h     |   3 +
 include/net/sch_generic.h |   5 +-
 net/sched/cls_api.c       | 236 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 216 insertions(+), 28 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 31574c9..35ab7c9 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -29,6 +29,8 @@ struct tcf_block_ext_info {
 	enum tcf_block_binder_type binder_type;
 	tcf_chain_head_change_t *chain_head_change;
 	void *chain_head_change_priv;
+	bool shareable;
+	u32 block_index;
 };
 
 struct tcf_block_cb;
@@ -50,6 +52,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
+	WARN_ON(block->refcnt != 1);
 	return block->q;
 }
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ac029d5..5cc4d71 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -275,8 +275,7 @@ typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
 
 struct tcf_chain {
 	struct tcf_proto __rcu *filter_chain;
-	tcf_chain_head_change_t *chain_head_change;
-	void *chain_head_change_priv;
+	struct list_head filter_chain_list;
 	struct list_head list;
 	struct tcf_block *block;
 	u32 index; /* chain index */
@@ -285,6 +284,8 @@ struct tcf_chain {
 
 struct tcf_block {
 	struct list_head chain_list;
+	u32 index; /* block index for shared blocks */
+	unsigned int refcnt;
 	struct net *net;
 	struct Qdisc *q;
 	struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4591b87..9b1b4fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -24,6 +24,7 @@
 #include <linux/init.h>
 #include <linux/kmod.h>
 #include <linux/slab.h>
+#include <linux/idr.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -179,6 +180,12 @@ static void tcf_proto_destroy(struct tcf_proto *tp)
 	kfree_rcu(tp, rcu);
 }
 
+struct tcf_filter_chain_list_item {
+	struct list_head list;
+	tcf_chain_head_change_t *chain_head_change;
+	void *chain_head_change_priv;
+};
+
 static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 					  u32 chain_index)
 {
@@ -187,6 +194,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 	if (!chain)
 		return NULL;
+	INIT_LIST_HEAD(&chain->filter_chain_list);
 	list_add_tail(&chain->list, &block->chain_list);
 	chain->block = block;
 	chain->index = chain_index;
@@ -194,12 +202,19 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	return chain;
 }
 
+static void tcf_chain_head_change_item(struct tcf_filter_chain_list_item *item,
+				       struct tcf_proto *tp_head)
+{
+	if (item->chain_head_change)
+		item->chain_head_change(tp_head, item->chain_head_change_priv);
+}
 static void tcf_chain_head_change(struct tcf_chain *chain,
 				  struct tcf_proto *tp_head)
 {
-	if (chain->chain_head_change)
-		chain->chain_head_change(tp_head,
-					 chain->chain_head_change_priv);
+	struct tcf_filter_chain_list_item *item;
+
+	list_for_each_entry(item, &chain->filter_chain_list, list)
+		tcf_chain_head_change_item(item, tp_head);
 }
 
 static void tcf_chain_flush(struct tcf_chain *chain)
@@ -280,17 +295,91 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND);
 }
 
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
-		      struct tcf_block_ext_info *ei,
-		      struct netlink_ext_ack *extack)
+static int
+tcf_chain_head_change_cb_add(struct tcf_chain *chain,
+			     struct tcf_block_ext_info *ei,
+			     struct netlink_ext_ack *extack)
 {
-	struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL);
+	struct tcf_filter_chain_list_item *item;
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item) {
+		NL_SET_ERR_MSG(extack, "Memory allocation for head change callback item failed");
+		return -ENOMEM;
+	}
+	item->chain_head_change = ei->chain_head_change;
+	item->chain_head_change_priv = ei->chain_head_change_priv;
+	if (chain->filter_chain)
+		tcf_chain_head_change_item(item, chain->filter_chain);
+	list_add(&item->list, &chain->filter_chain_list);
+	return 0;
+}
+
+static void
+tcf_chain_head_change_cb_del(struct tcf_chain *chain,
+			     struct tcf_block_ext_info *ei)
+{
+	struct tcf_filter_chain_list_item *item;
+
+	list_for_each_entry(item, &chain->filter_chain_list, list) {
+		if ((!ei->chain_head_change && !ei->chain_head_change_priv) ||
+		    (item->chain_head_change == ei->chain_head_change &&
+		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
+			tcf_chain_head_change_item(item, NULL);
+			list_del(&item->list);
+			kfree(item);
+			return;
+		}
+	}
+	WARN_ON(1);
+}
+
+struct tcf_net {
+	struct idr idr;
+};
+
+static unsigned int tcf_net_id;
+
+static int tcf_block_insert(struct tcf_block *block, struct net *net,
+			    u32 block_index, struct netlink_ext_ack *extack)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+	int idr_start;
+	int idr_end;
+	int index;
+
+	if (block_index >= INT_MAX) {
+		NL_SET_ERR_MSG(extack, "Invalid block index value (>= INT_MAX)");
+		return -EINVAL;
+	}
+	idr_start = block_index ? block_index : 1;
+	idr_end = block_index ? block_index + 1 : INT_MAX;
+
+	index = idr_alloc(&tn->idr, block, idr_start, idr_end, GFP_KERNEL);
+	if (index < 0)
+		return index;
+	block->index = index;
+	return 0;
+}
+
+static void tcf_block_remove(struct tcf_block *block, struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_remove(&tn->idr, block->index);
+}
+
+static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
+					  struct netlink_ext_ack *extack)
+{
+	struct tcf_block *block;
 	struct tcf_chain *chain;
 	int err;
 
+	block = kzalloc(sizeof(*block), GFP_KERNEL);
 	if (!block) {
 		NL_SET_ERR_MSG(extack, "Memory allocation for block failed");
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	}
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->cb_list);
@@ -302,17 +391,75 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		err = -ENOMEM;
 		goto err_chain_create;
 	}
-	WARN_ON(!ei->chain_head_change);
-	chain->chain_head_change = ei->chain_head_change;
-	chain->chain_head_change_priv = ei->chain_head_change_priv;
 	block->net = qdisc_net(q);
+	block->refcnt = 1;
+	block->net = net;
 	block->q = q;
+	return block;
+
+err_chain_create:
+	kfree(block);
+	return ERR_PTR(err);
+}
+
+static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	return idr_find(&tn->idr, block_index);
+}
+
+static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+{
+	return list_first_entry(&block->chain_list, struct tcf_chain, list);
+}
+
+int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
+		      struct tcf_block_ext_info *ei,
+		      struct netlink_ext_ack *extack)
+{
+	struct net *net = qdisc_net(q);
+	struct tcf_block *block = NULL;
+	bool created = false;
+	int err;
+
+	if (ei->shareable) {
+		block = tcf_block_lookup(net, ei->block_index);
+		if (block)
+			block->refcnt++;
+	}
+
+	if (!block) {
+		block = tcf_block_create(net, q, extack);
+		if (IS_ERR(block))
+			return PTR_ERR(block);
+		created = true;
+		if (ei->shareable) {
+			err = tcf_block_insert(block, net,
+					       ei->block_index, extack);
+			if (err)
+				goto err_block_insert;
+		}
+	}
+
+	err = tcf_chain_head_change_cb_add(tcf_block_chain_zero(block),
+					   ei, extack);
+	if (err)
+		goto err_chain_head_change_cb_add;
+
 	tcf_block_offload_bind(block, q, ei);
 	*p_block = block;
 	return 0;
 
-err_chain_create:
-	kfree(block);
+err_chain_head_change_cb_add:
+	if (created) {
+		if (ei->shareable)
+			tcf_block_remove(block, net);
+err_block_insert:
+		kfree(block);
+	} else {
+		block->refcnt--;
+	}
 	return err;
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
@@ -346,26 +493,35 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 {
 	struct tcf_chain *chain, *tmp;
 
-	/* Hold a refcnt for all chains, so that they don't disappear
-	 * while we are iterating.
-	 */
 	if (!block)
 		return;
-	list_for_each_entry(chain, &block->chain_list, list)
-		tcf_chain_hold(chain);
 
-	list_for_each_entry(chain, &block->chain_list, list)
-		tcf_chain_flush(chain);
+	tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei);
+
+	if (--block->refcnt == 0) {
+		if (ei->shareable)
+			tcf_block_remove(block, block->net);
+
+		/* Hold a refcnt for all chains, so that they don't disappear
+		 * while we are iterating.
+		 */
+		list_for_each_entry(chain, &block->chain_list, list)
+			tcf_chain_hold(chain);
+
+		list_for_each_entry(chain, &block->chain_list, list)
+			tcf_chain_flush(chain);
+	}
 
 	tcf_block_offload_unbind(block, q, ei);
 
-	/* At this point, all the chains should have refcnt >= 1. */
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
-		tcf_chain_put(chain);
+	if (block->refcnt == 0) {
+		/* At this point, all the chains should have refcnt >= 1. */
+		list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+			tcf_chain_put(chain);
 
-	/* Finally, put chain 0 and allow block to be freed. */
-	chain = list_first_entry(&block->chain_list, struct tcf_chain, list);
-	tcf_chain_put(chain);
+		/* Finally, put chain 0 and allow block to be freed. */
+		tcf_chain_put(tcf_block_chain_zero(block));
+	}
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
 
@@ -1250,12 +1406,40 @@ int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
 
+static __net_init int tcf_net_init(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_init(&tn->idr);
+	return 0;
+}
+
+static void __net_exit tcf_net_exit(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_destroy(&tn->idr);
+}
+
+static struct pernet_operations tcf_net_ops = {
+	.init = tcf_net_init,
+	.exit = tcf_net_exit,
+	.id   = &tcf_net_id,
+	.size = sizeof(struct tcf_net),
+};
+
 static int __init tc_filter_init(void)
 {
+	int err;
+
 	tc_filter_wq = alloc_ordered_workqueue("tc_filter_workqueue", 0);
 	if (!tc_filter_wq)
 		return -ENOMEM;
 
+	err = register_pernet_subsys(&tcf_net_ops);
+	if (err)
+		return err;
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 02/10] net: sched: avoid usage of tp->q in tcf_classify
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 01/10] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 03/10] net: sched: introduce block mechanism to handle netif_keep_dst calls Jiri Pirko
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Use block index in the messages instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9b1b4fa..b93eca8 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -678,8 +678,9 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 #ifdef CONFIG_NET_CLS_ACT
 reset:
 	if (unlikely(limit++ >= max_reclassify_loop)) {
-		net_notice_ratelimited("%s: reclassify loop, rule prio %u, protocol %02x\n",
-				       tp->q->ops->id, tp->prio & 0xffff,
+		net_notice_ratelimited("%u: reclassify loop, rule prio %u, protocol %02x\n",
+				       tp->chain->block->index,
+				       tp->prio & 0xffff,
 				       ntohs(tp->protocol));
 		return TC_ACT_SHOT;
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 03/10] net: sched: introduce block mechanism to handle netif_keep_dst calls
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 01/10] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 02/10] net: sched: avoid usage of tp->q in tcf_classify Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 04/10] net: sched: remove classid and q fields from tcf_proto Jiri Pirko
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Couple of classifiers call netif_keep_dst directly on q->dev. That is
not possible to do directly for shared blocke where multiple qdiscs are
owning the block. So introduce a infrastructure to keep track of the
block owners in list and use this list to implement block variant of
netif_keep_dst.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- rebased on top of the current net-next
v1->v2:
- fix binder_type to check "egress" as well as pointed out by Daniel
---
 include/net/pkt_cls.h     |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/cls_api.c       | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 net/sched/cls_bpf.c       |  4 +--
 net/sched/cls_flow.c      |  2 +-
 net/sched/cls_route.c     |  2 +-
 6 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 35ab7c9..4837f4a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -40,6 +40,7 @@ bool tcf_queue_work(struct work_struct *work);
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 				bool create);
 void tcf_chain_put(struct tcf_chain *chain);
+void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
 		  struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5cc4d71..df97c3e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -289,6 +289,8 @@ struct tcf_block {
 	struct net *net;
 	struct Qdisc *q;
 	struct list_head cb_list;
+	struct list_head owner_list;
+	bool keep_dst;
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b93eca8..8a7d77a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -383,6 +383,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	}
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->cb_list);
+	INIT_LIST_HEAD(&block->owner_list);
 
 	/* Create chain 0 by default, it has to be always present. */
 	chain = tcf_chain_create(block, 0);
@@ -414,6 +415,65 @@ static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
 	return list_first_entry(&block->chain_list, struct tcf_chain, list);
 }
 
+struct tcf_block_owner_item {
+	struct list_head list;
+	struct Qdisc *q;
+	enum tcf_block_binder_type binder_type;
+};
+
+static void
+tcf_block_owner_netif_keep_dst(struct tcf_block *block,
+			       struct Qdisc *q,
+			       enum tcf_block_binder_type binder_type)
+{
+	if (block->keep_dst &&
+	    binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
+	    binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+		netif_keep_dst(qdisc_dev(q));
+}
+
+void tcf_block_netif_keep_dst(struct tcf_block *block)
+{
+	struct tcf_block_owner_item *item;
+
+	block->keep_dst = true;
+	list_for_each_entry(item, &block->owner_list, list)
+		tcf_block_owner_netif_keep_dst(block, item->q,
+					       item->binder_type);
+}
+EXPORT_SYMBOL(tcf_block_netif_keep_dst);
+
+static int tcf_block_owner_add(struct tcf_block *block,
+			       struct Qdisc *q,
+			       enum tcf_block_binder_type binder_type)
+{
+	struct tcf_block_owner_item *item;
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+	item->q = q;
+	item->binder_type = binder_type;
+	list_add(&item->list, &block->owner_list);
+	return 0;
+}
+
+static void tcf_block_owner_del(struct tcf_block *block,
+				struct Qdisc *q,
+				enum tcf_block_binder_type binder_type)
+{
+	struct tcf_block_owner_item *item;
+
+	list_for_each_entry(item, &block->owner_list, list) {
+		if (item->q == q && item->binder_type == binder_type) {
+			list_del(&item->list);
+			kfree(item);
+			return;
+		}
+	}
+	WARN_ON(1);
+}
+
 int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		      struct tcf_block_ext_info *ei,
 		      struct netlink_ext_ack *extack)
@@ -442,6 +502,12 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		}
 	}
 
+	err = tcf_block_owner_add(block, q, ei->binder_type);
+	if (err)
+		goto err_block_owner_add;
+
+	tcf_block_owner_netif_keep_dst(block, q, ei->binder_type);
+
 	err = tcf_chain_head_change_cb_add(tcf_block_chain_zero(block),
 					   ei, extack);
 	if (err)
@@ -452,6 +518,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 	return 0;
 
 err_chain_head_change_cb_add:
+	tcf_block_owner_del(block, q, ei->binder_type);
+err_block_owner_add:
 	if (created) {
 		if (ei->shareable)
 			tcf_block_remove(block, net);
@@ -497,6 +565,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		return;
 
 	tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei);
+	tcf_block_owner_del(block, q, ei->binder_type);
 
 	if (--block->refcnt == 0) {
 		if (ei->shareable)
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 8d78e7f..d79cc50 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -392,8 +392,8 @@ static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog,
 	prog->bpf_name = name;
 	prog->filter = fp;
 
-	if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS))
-		netif_keep_dst(qdisc_dev(tp->q));
+	if (fp->dst_needed)
+		tcf_block_netif_keep_dst(tp->chain->block);
 
 	return 0;
 }
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 25c2a88..28cd6fb 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -526,7 +526,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 
 	timer_setup(&fnew->perturb_timer, flow_perturbation, TIMER_DEFERRABLE);
 
-	netif_keep_dst(qdisc_dev(tp->q));
+	tcf_block_netif_keep_dst(tp->chain->block);
 
 	if (tb[TCA_FLOW_KEYS]) {
 		fnew->keymask = keymask;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index ac9a5b8..a1f2b1b 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -527,7 +527,7 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 		if (f->handle < f1->handle)
 			break;
 
-	netif_keep_dst(qdisc_dev(tp->q));
+	tcf_block_netif_keep_dst(tp->chain->block);
 	rcu_assign_pointer(f->next, f1);
 	rcu_assign_pointer(*fp, f);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 04/10] net: sched: remove classid and q fields from tcf_proto
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (2 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 03/10] net: sched: introduce block mechanism to handle netif_keep_dst calls Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature Jiri Pirko
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Both are no longer used, so remove them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 2 --
 net/sched/cls_api.c       | 7 ++-----
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index df97c3e..dba2214 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -255,8 +255,6 @@ struct tcf_proto {
 
 	/* All the rest */
 	u32			prio;
-	u32			classid;
-	struct Qdisc		*q;
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8a7d77a..07767b4 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -122,8 +122,7 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
 }
 
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
-					  u32 prio, u32 parent, struct Qdisc *q,
-					  struct tcf_chain *chain)
+					  u32 prio, struct tcf_chain *chain)
 {
 	struct tcf_proto *tp;
 	int err;
@@ -157,8 +156,6 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->classify = tp->ops->classify;
 	tp->protocol = protocol;
 	tp->prio = prio;
-	tp->classid = parent;
-	tp->q = q;
 	tp->chain = chain;
 
 	err = tp->ops->init(tp);
@@ -1075,7 +1072,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			prio = tcf_auto_prio(tcf_chain_tp_prev(&chain_info));
 
 		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
-				      protocol, prio, parent, q, chain);
+				      protocol, prio, chain);
 		if (IS_ERR(tp)) {
 			err = PTR_ERR(tp);
 			goto errout;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (3 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 04/10] net: sched: remove classid and q fields from tcf_proto Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-24  2:20   ` Jakub Kicinski
  2017-12-23 15:54 ` [patch net-next v4 06/10] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

During block bind, we need to check tc offload feature. If it is
disabled yet still the block contains offloaded filters, forbid the
bind. Also forbid to register callback for a block that already
containes offloaded filters, as the play back is not supported now.
For keeping track of offloaded filters there is a new counter
introduced, alongside with couple of helpers called from cls_* code.
These helpers set and clear TCA_CLS_FLAGS_IN_HW flag.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
  caller
v2->v3:
- new patch

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 17 +++++++++++++
 net/sched/cls_api.c       | 63 ++++++++++++++++++++++++++++++++++++-----------
 net/sched/cls_bpf.c       |  5 +++-
 net/sched/cls_flower.c    |  3 ++-
 net/sched/cls_matchall.c  |  3 ++-
 net/sched/cls_u32.c       | 13 +++++-----
 6 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index dba2214..22a3a1d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -289,8 +289,25 @@ struct tcf_block {
 	struct list_head cb_list;
 	struct list_head owner_list;
 	bool keep_dst;
+	unsigned int offloadcnt;
 };
 
+static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
+{
+	if (*flags & TCA_CLS_FLAGS_IN_HW)
+		return;
+	*flags |= TCA_CLS_FLAGS_IN_HW;
+	block->offloadcnt++;
+}
+
+static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
+{
+	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
+		return;
+	*flags &= ~TCA_CLS_FLAGS_IN_HW;
+	block->offloadcnt--;
+}
+
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
 	struct qdisc_skb_cb *qcb;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 07767b4..37eea70 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -265,31 +265,54 @@ void tcf_chain_put(struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_chain_put);
 
-static void tcf_block_offload_cmd(struct tcf_block *block, struct Qdisc *q,
-				  struct tcf_block_ext_info *ei,
-				  enum tc_block_command command)
+static bool tcf_block_offload_in_use(struct tcf_block *block)
+{
+	return block->offloadcnt;
+}
+
+static int tcf_block_offload_cmd(struct tcf_block *block,
+				 struct net_device *dev,
+				 struct tcf_block_ext_info *ei,
+				 enum tc_block_command command)
 {
-	struct net_device *dev = q->dev_queue->dev;
 	struct tc_block_offload bo = {};
 
-	if (!dev->netdev_ops->ndo_setup_tc)
-		return;
 	bo.command = command;
 	bo.binder_type = ei->binder_type;
 	bo.block = block;
-	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+	return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
 }
 
-static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
-				   struct tcf_block_ext_info *ei)
+static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
+				  struct tcf_block_ext_info *ei)
 {
-	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
+	struct net_device *dev = q->dev_queue->dev;
+	int err;
+
+	if (!dev->netdev_ops->ndo_setup_tc)
+		return 0;
+
+	/* If tc offload feature is disabled and the block we try to bind
+	 * to already has some offloaded filters, forbid to bind.
+	 */
+	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block))
+		return -EOPNOTSUPP;
+
+	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
+	if (err == -EOPNOTSUPP)
+		/* Driver does not support binding. */
+		return 0;
+	return err;
 }
 
 static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 				     struct tcf_block_ext_info *ei)
 {
-	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND);
+	struct net_device *dev = q->dev_queue->dev;
+
+	if (!dev->netdev_ops->ndo_setup_tc)
+		return;
+	tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND);
 }
 
 static int
@@ -510,10 +533,15 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 	if (err)
 		goto err_chain_head_change_cb_add;
 
-	tcf_block_offload_bind(block, q, ei);
+	err = tcf_block_offload_bind(block, q, ei);
+	if (err)
+		goto err_block_offload_bind;
+
 	*p_block = block;
 	return 0;
 
+err_block_offload_bind:
+	tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei);
 err_chain_head_change_cb_add:
 	tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
@@ -643,9 +671,16 @@ struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
 {
 	struct tcf_block_cb *block_cb;
 
+	/* At this point, playback of previous block cb calls is not supported,
+	 * so forbid to register to block which already has some offloaded
+	 * filters present.
+	 */
+	if (tcf_block_offload_in_use(block))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL);
 	if (!block_cb)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	block_cb->cb = cb;
 	block_cb->cb_ident = cb_ident;
 	block_cb->cb_priv = cb_priv;
@@ -661,7 +696,7 @@ int tcf_block_cb_register(struct tcf_block *block,
 	struct tcf_block_cb *block_cb;
 
 	block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv);
-	return block_cb ? 0 : -ENOMEM;
+	return IS_ERR(block_cb) ? PTR_ERR(block_cb) : 0;
 }
 EXPORT_SYMBOL(tcf_block_cb_register);
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index d79cc50..cf72aef 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -167,13 +167,16 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	cls_bpf.exts_integrated = obj->exts_integrated;
 	cls_bpf.gen_flags = obj->gen_flags;
 
+	if (oldprog)
+		tcf_block_offload_dec(block, &oldprog->gen_flags);
+
 	err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
 	if (prog) {
 		if (err < 0) {
 			cls_bpf_offload_cmd(tp, oldprog, prog);
 			return err;
 		} else if (err > 0) {
-			prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+			tcf_block_offload_inc(block, &prog->gen_flags);
 		}
 	}
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6132a73..f61df19 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -229,6 +229,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 
 	tc_setup_cb_call(block, &f->exts, TC_SETUP_CLSFLOWER,
 			 &cls_flower, false);
+	tcf_block_offload_dec(block, &f->flags);
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -256,7 +257,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		fl_hw_destroy_filter(tp, f);
 		return err;
 	} else if (err > 0) {
-		f->flags |= TCA_CLS_FLAGS_IN_HW;
+		tcf_block_offload_inc(block, &f->flags);
 	}
 
 	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 66d4e00..d0e57c8 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -81,6 +81,7 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
 	cls_mall.cookie = cookie;
 
 	tc_setup_cb_call(block, NULL, TC_SETUP_CLSMATCHALL, &cls_mall, false);
+	tcf_block_offload_dec(block, &head->flags);
 }
 
 static int mall_replace_hw_filter(struct tcf_proto *tp,
@@ -103,7 +104,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 		mall_destroy_hw_filter(tp, head, cookie);
 		return err;
 	} else if (err > 0) {
-		head->flags |= TCA_CLS_FLAGS_IN_HW;
+		tcf_block_offload_inc(block, &head->flags);
 	}
 
 	if (skip_sw && !(head->flags & TCA_CLS_FLAGS_IN_HW))
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 507859c..020d328 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -529,16 +529,17 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	return 0;
 }
 
-static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+static void u32_remove_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
 {
 	struct tcf_block *block = tp->chain->block;
 	struct tc_cls_u32_offload cls_u32 = {};
 
 	tc_cls_common_offload_init(&cls_u32.common, tp);
 	cls_u32.command = TC_CLSU32_DELETE_KNODE;
-	cls_u32.knode.handle = handle;
+	cls_u32.knode.handle = n->handle;
 
 	tc_setup_cb_call(block, NULL, TC_SETUP_CLSU32, &cls_u32, false);
+	tcf_block_offload_dec(block, &n->flags);
 }
 
 static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
@@ -567,10 +568,10 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 
 	err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSU32, &cls_u32, skip_sw);
 	if (err < 0) {
-		u32_remove_hw_knode(tp, n->handle);
+		u32_remove_hw_knode(tp, n);
 		return err;
 	} else if (err > 0) {
-		n->flags |= TCA_CLS_FLAGS_IN_HW;
+		tcf_block_offload_inc(block, &n->flags);
 	}
 
 	if (skip_sw && !(n->flags & TCA_CLS_FLAGS_IN_HW))
@@ -589,7 +590,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 			RCU_INIT_POINTER(ht->ht[h],
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
-			u32_remove_hw_knode(tp, n->handle);
+			u32_remove_hw_knode(tp, n);
 			idr_remove_ext(&ht->handle_idr, n->handle);
 			if (tcf_exts_get_net(&n->exts))
 				call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
@@ -682,7 +683,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last)
 		goto out;
 
 	if (TC_U32_KEY(ht->handle)) {
-		u32_remove_hw_knode(tp, ht->handle);
+		u32_remove_hw_knode(tp, (struct tc_u_knode *)ht);
 		ret = u32_delete_key(tp, (struct tc_u_knode *)ht);
 		goto out;
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 06/10] net: sched: allow ingress and clsact qdiscs to share filter blocks
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (4 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 07/10] mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy Jiri Pirko
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Benefit from the previously introduced shared filter blocks
infrastructure and allow ingress and clsact qdisc instances to share
filter blocks. The block index is coming from userspace as qdisc option.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- rebased on top of the current net-next
v2->v3:
- removed "p_" prefix from block index function args
---
 include/uapi/linux/pkt_sched.h | 11 ++++++
 net/sched/sch_ingress.c        | 89 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096..8cc554a 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,15 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+/* Ingress/clsact */
+
+enum {
+	TCA_CLSACT_UNSPEC,
+	TCA_CLSACT_INGRESS_BLOCK,
+	TCA_CLSACT_EGRESS_BLOCK,
+	__TCA_CLSACT_MAX
+};
+
+#define TCA_CLSACT_MAX	(__TCA_CLSACT_MAX - 1)
+
 #endif
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 7ca2be2..debcb8f 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -61,6 +61,29 @@ static void clsact_chain_head_change(struct tcf_proto *tp_head, void *priv)
 	struct mini_Qdisc_pair *miniqp = priv;
 
 	mini_qdisc_pair_swap(miniqp, tp_head);
+};
+
+static const struct nla_policy ingress_policy[TCA_CLSACT_MAX + 1] = {
+	[TCA_CLSACT_INGRESS_BLOCK]	= { .type = NLA_U32 },
+};
+
+static int ingress_parse_opt(struct nlattr *opt, u32 *ingress_block_index)
+{
+	struct nlattr *tb[TCA_CLSACT_MAX + 1];
+	int err;
+
+	*ingress_block_index = 0;
+
+	if (!opt)
+		return 0;
+	err = nla_parse_nested(tb, TCA_CLSACT_MAX, opt, ingress_policy, NULL);
+	if (err)
+		return err;
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK])
+		*ingress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+	return 0;
 }
 
 static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
@@ -74,6 +97,11 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
 
+	err = ingress_parse_opt(opt, &q->block_info.block_index);
+	if (err)
+		return err;
+
+	q->block_info.shareable = true;
 	q->block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->block_info.chain_head_change = clsact_chain_head_change;
 	q->block_info.chain_head_change_priv = &q->miniqp;
@@ -97,11 +125,14 @@ static void ingress_destroy(struct Qdisc *sch)
 
 static int ingress_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct ingress_sched_data *q = qdisc_priv(sch);
 	struct nlattr *nest;
 
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_INGRESS_BLOCK, q->block->index))
+		goto nla_put_failure;
 
 	return nla_nest_end(skb, nest);
 
@@ -170,6 +201,35 @@ static struct tcf_block *clsact_tcf_block(struct Qdisc *sch, unsigned long cl,
 	}
 }
 
+static const struct nla_policy clsact_policy[TCA_CLSACT_MAX + 1] = {
+	[TCA_CLSACT_INGRESS_BLOCK]	= { .type = NLA_U32 },
+	[TCA_CLSACT_EGRESS_BLOCK]	= { .type = NLA_U32 },
+};
+
+static int clsact_parse_opt(struct nlattr *opt, u32 *ingress_block_index,
+			    u32 *egress_block_index)
+{
+	struct nlattr *tb[TCA_CLSACT_MAX + 1];
+	int err;
+
+	*ingress_block_index = 0;
+	*egress_block_index = 0;
+
+	if (!opt)
+		return 0;
+	err = nla_parse_nested(tb, TCA_CLSACT_MAX, opt, clsact_policy, NULL);
+	if (err)
+		return err;
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK])
+		*ingress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+	if (tb[TCA_CLSACT_EGRESS_BLOCK])
+		*egress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_EGRESS_BLOCK]);
+	return 0;
+}
+
 static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 		       struct netlink_ext_ack *extack)
 {
@@ -182,6 +242,12 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
 
+	err = clsact_parse_opt(opt, &q->ingress_block_info.block_index,
+			       &q->egress_block_info.block_index);
+	if (err)
+		return err;
+
+	q->ingress_block_info.shareable = true;
 	q->ingress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->ingress_block_info.chain_head_change = clsact_chain_head_change;
 	q->ingress_block_info.chain_head_change_priv = &q->miniqp_ingress;
@@ -193,6 +259,7 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
 
+	q->egress_block_info.shareable = true;
 	q->egress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
 	q->egress_block_info.chain_head_change = clsact_chain_head_change;
 	q->egress_block_info.chain_head_change_priv = &q->miniqp_egress;
@@ -218,6 +285,26 @@ static void clsact_destroy(struct Qdisc *sch)
 	net_dec_egress_queue();
 }
 
+static int clsact_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct nlattr *nest;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_INGRESS_BLOCK, q->ingress_block->index))
+		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_EGRESS_BLOCK, q->egress_block->index))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static const struct Qdisc_class_ops clsact_class_ops = {
 	.leaf		=	ingress_leaf,
 	.find		=	clsact_find,
@@ -233,7 +320,7 @@ static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
 	.priv_size	=	sizeof(struct clsact_sched_data),
 	.init		=	clsact_init,
 	.destroy	=	clsact_destroy,
-	.dump		=	ingress_dump,
+	.dump		=	clsact_dump,
 	.owner		=	THIS_MODULE,
 };
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 07/10] mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (5 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 06/10] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 08/10] mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind Jiri Pirko
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

In order to prepare for follow-up changes, make the bind/unbind helpers
very simple. That required move of ht insertion/removal and bind/unbind
calls into mlxsw_sp_acl_ruleset_create/destroy.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c | 102 ++++++++++-----------
 1 file changed, 46 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
index 93dcd31..ead4cb8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
@@ -118,8 +118,26 @@ struct mlxsw_sp_fid *mlxsw_sp_acl_dummy_fid(struct mlxsw_sp *mlxsw_sp)
 	return mlxsw_sp->acl->dummy_fid;
 }
 
+static int mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
+				     struct mlxsw_sp_acl_ruleset *ruleset,
+				     struct net_device *dev, bool ingress)
+{
+	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
+
+	return ops->ruleset_bind(mlxsw_sp, ruleset->priv, dev, ingress);
+}
+
+static void mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_acl_ruleset *ruleset)
+{
+	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
+
+	ops->ruleset_unbind(mlxsw_sp, ruleset->priv);
+}
+
 static struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp,
+mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
+			    bool ingress, u32 chain_index,
 			    const struct mlxsw_sp_acl_profile_ops *ops)
 {
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
@@ -132,6 +150,9 @@ mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp,
 	if (!ruleset)
 		return ERR_PTR(-ENOMEM);
 	ruleset->ref_count = 1;
+	ruleset->ht_key.dev = dev;
+	ruleset->ht_key.ingress = ingress;
+	ruleset->ht_key.chain_index = chain_index;
 	ruleset->ht_key.ops = ops;
 
 	err = rhashtable_init(&ruleset->rule_ht, &mlxsw_sp_acl_rule_ht_params);
@@ -142,68 +163,49 @@ mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		goto err_ops_ruleset_add;
 
-	return ruleset;
-
-err_ops_ruleset_add:
-	rhashtable_destroy(&ruleset->rule_ht);
-err_rhashtable_init:
-	kfree(ruleset);
-	return ERR_PTR(err);
-}
-
-static void mlxsw_sp_acl_ruleset_destroy(struct mlxsw_sp *mlxsw_sp,
-					 struct mlxsw_sp_acl_ruleset *ruleset)
-{
-	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
-
-	ops->ruleset_del(mlxsw_sp, ruleset->priv);
-	rhashtable_destroy(&ruleset->rule_ht);
-	kfree(ruleset);
-}
-
-static int mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
-				     struct mlxsw_sp_acl_ruleset *ruleset,
-				     struct net_device *dev, bool ingress,
-				     u32 chain_index)
-{
-	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
-	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
-	int err;
-
-	ruleset->ht_key.dev = dev;
-	ruleset->ht_key.ingress = ingress;
-	ruleset->ht_key.chain_index = chain_index;
 	err = rhashtable_insert_fast(&acl->ruleset_ht, &ruleset->ht_node,
 				     mlxsw_sp_acl_ruleset_ht_params);
 	if (err)
-		return err;
-	if (!ruleset->ht_key.chain_index) {
+		goto err_ht_insert;
+
+	if (!chain_index) {
 		/* We only need ruleset with chain index 0, the implicit one,
 		 * to be directly bound to device. The rest of the rulesets
 		 * are bound by "Goto action set".
 		 */
-		err = ops->ruleset_bind(mlxsw_sp, ruleset->priv, dev, ingress);
+		err = mlxsw_sp_acl_ruleset_bind(mlxsw_sp, ruleset,
+						dev, ingress);
 		if (err)
-			goto err_ops_ruleset_bind;
+			goto err_ruleset_bind;
 	}
-	return 0;
 
-err_ops_ruleset_bind:
+	return ruleset;
+
+err_ruleset_bind:
 	rhashtable_remove_fast(&acl->ruleset_ht, &ruleset->ht_node,
 			       mlxsw_sp_acl_ruleset_ht_params);
-	return err;
+err_ht_insert:
+	ops->ruleset_del(mlxsw_sp, ruleset->priv);
+err_ops_ruleset_add:
+	rhashtable_destroy(&ruleset->rule_ht);
+err_rhashtable_init:
+	kfree(ruleset);
+	return ERR_PTR(err);
 }
 
-static void mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
-					struct mlxsw_sp_acl_ruleset *ruleset)
+static void mlxsw_sp_acl_ruleset_destroy(struct mlxsw_sp *mlxsw_sp,
+					 struct mlxsw_sp_acl_ruleset *ruleset)
 {
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
 
 	if (!ruleset->ht_key.chain_index)
-		ops->ruleset_unbind(mlxsw_sp, ruleset->priv);
+		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, ruleset);
 	rhashtable_remove_fast(&acl->ruleset_ht, &ruleset->ht_node,
 			       mlxsw_sp_acl_ruleset_ht_params);
+	ops->ruleset_del(mlxsw_sp, ruleset->priv);
+	rhashtable_destroy(&ruleset->rule_ht);
+	kfree(ruleset);
 }
 
 static void mlxsw_sp_acl_ruleset_ref_inc(struct mlxsw_sp_acl_ruleset *ruleset)
@@ -216,7 +218,6 @@ static void mlxsw_sp_acl_ruleset_ref_dec(struct mlxsw_sp *mlxsw_sp,
 {
 	if (--ruleset->ref_count)
 		return;
-	mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, ruleset);
 	mlxsw_sp_acl_ruleset_destroy(mlxsw_sp, ruleset);
 }
 
@@ -263,7 +264,6 @@ mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 	const struct mlxsw_sp_acl_profile_ops *ops;
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
 	struct mlxsw_sp_acl_ruleset *ruleset;
-	int err;
 
 	ops = acl->ops->profile_ops(mlxsw_sp, profile);
 	if (!ops)
@@ -275,18 +275,8 @@ mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 		mlxsw_sp_acl_ruleset_ref_inc(ruleset);
 		return ruleset;
 	}
-	ruleset = mlxsw_sp_acl_ruleset_create(mlxsw_sp, ops);
-	if (IS_ERR(ruleset))
-		return ruleset;
-	err = mlxsw_sp_acl_ruleset_bind(mlxsw_sp, ruleset, dev,
-					ingress, chain_index);
-	if (err)
-		goto err_ruleset_bind;
-	return ruleset;
-
-err_ruleset_bind:
-	mlxsw_sp_acl_ruleset_destroy(mlxsw_sp, ruleset);
-	return ERR_PTR(err);
+	return mlxsw_sp_acl_ruleset_create(mlxsw_sp, dev, ingress,
+					   chain_index, ops);
 }
 
 void mlxsw_sp_acl_ruleset_put(struct mlxsw_sp *mlxsw_sp,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 08/10] mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (6 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 07/10] mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 09/10] mlxsw: spectrum_acl: Implement TC block sharing Jiri Pirko
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Instead, pass netdev and ingress flag to ruleset unbind op.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  3 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c |  9 ++++--
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c    | 33 +++++++++++-----------
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a0adcd8..523e64e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -477,7 +477,8 @@ struct mlxsw_sp_acl_profile_ops {
 	void (*ruleset_del)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv);
 	int (*ruleset_bind)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv,
 			    struct net_device *dev, bool ingress);
-	void (*ruleset_unbind)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv);
+	void (*ruleset_unbind)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv,
+			       struct net_device *dev, bool ingress);
 	u16 (*ruleset_group_id)(void *ruleset_priv);
 	size_t rule_priv_size;
 	int (*rule_add)(struct mlxsw_sp *mlxsw_sp,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
index ead4cb8..7fb41a4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
@@ -128,11 +128,12 @@ static int mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
 }
 
 static void mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
-					struct mlxsw_sp_acl_ruleset *ruleset)
+					struct mlxsw_sp_acl_ruleset *ruleset,
+					struct net_device *dev, bool ingress)
 {
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
-	ops->ruleset_unbind(mlxsw_sp, ruleset->priv);
+	ops->ruleset_unbind(mlxsw_sp, ruleset->priv, dev, ingress);
 }
 
 static struct mlxsw_sp_acl_ruleset *
@@ -200,7 +201,9 @@ static void mlxsw_sp_acl_ruleset_destroy(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
 
 	if (!ruleset->ht_key.chain_index)
-		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, ruleset);
+		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, ruleset,
+					    ruleset->ht_key.dev,
+					    ruleset->ht_key.ingress);
 	rhashtable_remove_fast(&acl->ruleset_ht, &ruleset->ht_node,
 			       mlxsw_sp_acl_ruleset_ht_params);
 	ops->ruleset_del(mlxsw_sp, ruleset->priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 7e8284b..50b2f9a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -154,10 +154,6 @@ struct mlxsw_sp_acl_tcam_group {
 	struct list_head region_list;
 	unsigned int region_count;
 	struct rhashtable chunk_ht;
-	struct {
-		u16 local_port;
-		bool ingress;
-	} bound;
 	struct mlxsw_sp_acl_tcam_group_ops *ops;
 	const struct mlxsw_sp_acl_tcam_pattern *patterns;
 	unsigned int patterns_count;
@@ -271,26 +267,28 @@ mlxsw_sp_acl_tcam_group_bind(struct mlxsw_sp *mlxsw_sp,
 		return -EINVAL;
 
 	mlxsw_sp_port = netdev_priv(dev);
-	group->bound.local_port = mlxsw_sp_port->local_port;
-	group->bound.ingress = ingress;
-	mlxsw_reg_ppbt_pack(ppbt_pl,
-			    group->bound.ingress ? MLXSW_REG_PXBT_E_IACL :
-						   MLXSW_REG_PXBT_E_EACL,
-			    MLXSW_REG_PXBT_OP_BIND, group->bound.local_port,
+	mlxsw_reg_ppbt_pack(ppbt_pl, ingress ? MLXSW_REG_PXBT_E_IACL :
+					       MLXSW_REG_PXBT_E_EACL,
+			    MLXSW_REG_PXBT_OP_BIND, mlxsw_sp_port->local_port,
 			    group->id);
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ppbt), ppbt_pl);
 }
 
 static void
 mlxsw_sp_acl_tcam_group_unbind(struct mlxsw_sp *mlxsw_sp,
-			       struct mlxsw_sp_acl_tcam_group *group)
+			       struct mlxsw_sp_acl_tcam_group *group,
+			       struct net_device *dev, bool ingress)
 {
+	struct mlxsw_sp_port *mlxsw_sp_port;
 	char ppbt_pl[MLXSW_REG_PPBT_LEN];
 
-	mlxsw_reg_ppbt_pack(ppbt_pl,
-			    group->bound.ingress ? MLXSW_REG_PXBT_E_IACL :
-						   MLXSW_REG_PXBT_E_EACL,
-			    MLXSW_REG_PXBT_OP_UNBIND, group->bound.local_port,
+	if (WARN_ON(!mlxsw_sp_port_dev_check(dev)))
+		return;
+
+	mlxsw_sp_port = netdev_priv(dev);
+	mlxsw_reg_ppbt_pack(ppbt_pl, ingress ? MLXSW_REG_PXBT_E_IACL :
+					       MLXSW_REG_PXBT_E_EACL,
+			    MLXSW_REG_PXBT_OP_UNBIND, mlxsw_sp_port->local_port,
 			    group->id);
 	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ppbt), ppbt_pl);
 }
@@ -1066,11 +1064,12 @@ mlxsw_sp_acl_tcam_flower_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
 
 static void
 mlxsw_sp_acl_tcam_flower_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
-					void *ruleset_priv)
+					void *ruleset_priv,
+					struct net_device *dev, bool ingress)
 {
 	struct mlxsw_sp_acl_tcam_flower_ruleset *ruleset = ruleset_priv;
 
-	mlxsw_sp_acl_tcam_group_unbind(mlxsw_sp, &ruleset->group);
+	mlxsw_sp_acl_tcam_group_unbind(mlxsw_sp, &ruleset->group, dev, ingress);
 }
 
 static u16
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 09/10] mlxsw: spectrum_acl: Implement TC block sharing
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (7 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 08/10] mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 15:54 ` [patch net-next v4 10/10] mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops Jiri Pirko
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Benefit from the prepared TC and in-driver ACL infrastructure and
introduce block sharing offload. For that, a new struct "block" is
introduced in spectrum_acl in order to hold a list of specific
block-port bindings.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v2->v3:
- add tc offload feature handling
v1->v2:
- new patch
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 182 +++++++++++++---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  37 +++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c | 237 ++++++++++++++++++---
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  |  41 ++--
 4 files changed, 401 insertions(+), 96 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d373df7..fa2896f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1747,72 +1747,186 @@ static int mlxsw_sp_setup_tc_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 }
 
 static int
-mlxsw_sp_setup_tc_cls_flower(struct mlxsw_sp_port *mlxsw_sp_port,
-			     struct tc_cls_flower_offload *f,
-			     bool ingress)
+mlxsw_sp_setup_tc_cls_flower(struct mlxsw_sp_acl_block *acl_block,
+			     struct tc_cls_flower_offload *f)
 {
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_acl_block_mlxsw_sp(acl_block);
+
 	switch (f->command) {
 	case TC_CLSFLOWER_REPLACE:
-		return mlxsw_sp_flower_replace(mlxsw_sp_port, ingress, f);
+		return mlxsw_sp_flower_replace(mlxsw_sp, acl_block, f);
 	case TC_CLSFLOWER_DESTROY:
-		mlxsw_sp_flower_destroy(mlxsw_sp_port, ingress, f);
+		mlxsw_sp_flower_destroy(mlxsw_sp, acl_block, f);
 		return 0;
 	case TC_CLSFLOWER_STATS:
-		return mlxsw_sp_flower_stats(mlxsw_sp_port, ingress, f);
+		return mlxsw_sp_flower_stats(mlxsw_sp, acl_block, f);
 	default:
 		return -EOPNOTSUPP;
 	}
 }
 
-static int mlxsw_sp_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
-				      void *cb_priv, bool ingress)
+static int mlxsw_sp_setup_tc_block_cb_matchall(enum tc_setup_type type,
+					       void *type_data,
+					       void *cb_priv, bool ingress)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = cb_priv;
 
-	if (!tc_can_offload(mlxsw_sp_port->dev))
-		return -EOPNOTSUPP;
-
 	switch (type) {
 	case TC_SETUP_CLSMATCHALL:
+		if (!tc_can_offload(mlxsw_sp_port->dev))
+			return -EOPNOTSUPP;
+
 		return mlxsw_sp_setup_tc_cls_matchall(mlxsw_sp_port, type_data,
 						      ingress);
 	case TC_SETUP_CLSFLOWER:
-		return mlxsw_sp_setup_tc_cls_flower(mlxsw_sp_port, type_data,
-						    ingress);
+		return 0;
 	default:
 		return -EOPNOTSUPP;
 	}
 }
 
-static int mlxsw_sp_setup_tc_block_cb_ig(enum tc_setup_type type,
-					 void *type_data, void *cb_priv)
+static int mlxsw_sp_setup_tc_block_cb_matchall_ig(enum tc_setup_type type,
+						  void *type_data,
+						  void *cb_priv)
 {
-	return mlxsw_sp_setup_tc_block_cb(type, type_data, cb_priv, true);
+	return mlxsw_sp_setup_tc_block_cb_matchall(type, type_data,
+						   cb_priv, true);
 }
 
-static int mlxsw_sp_setup_tc_block_cb_eg(enum tc_setup_type type,
-					 void *type_data, void *cb_priv)
+static int mlxsw_sp_setup_tc_block_cb_matchall_eg(enum tc_setup_type type,
+						  void *type_data,
+						  void *cb_priv)
 {
-	return mlxsw_sp_setup_tc_block_cb(type, type_data, cb_priv, false);
+	return mlxsw_sp_setup_tc_block_cb_matchall(type, type_data,
+						   cb_priv, false);
+}
+
+static int mlxsw_sp_setup_tc_block_cb_flower(enum tc_setup_type type,
+					     void *type_data, void *cb_priv)
+{
+	struct mlxsw_sp_acl_block *acl_block = cb_priv;
+
+	switch (type) {
+	case TC_SETUP_CLSMATCHALL:
+		return 0;
+	case TC_SETUP_CLSFLOWER:
+		if (mlxsw_sp_acl_block_disabled(acl_block))
+			return -EOPNOTSUPP;
+
+		return mlxsw_sp_setup_tc_cls_flower(acl_block, type_data);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int
+mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
+				    struct tcf_block *block, bool ingress)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	struct mlxsw_sp_acl_block *acl_block;
+	struct tcf_block_cb *block_cb;
+	int err;
+
+	block_cb = tcf_block_cb_lookup(block, mlxsw_sp_setup_tc_block_cb_flower,
+				       mlxsw_sp);
+	if (!block_cb) {
+		acl_block = mlxsw_sp_acl_block_create(mlxsw_sp, block->net);
+		if (!acl_block)
+			return -ENOMEM;
+		block_cb = __tcf_block_cb_register(block,
+						   mlxsw_sp_setup_tc_block_cb_flower,
+						   mlxsw_sp, acl_block);
+		if (IS_ERR(block_cb)) {
+			err = PTR_ERR(block_cb);
+			goto err_cb_register;
+		}
+	} else {
+		acl_block = tcf_block_cb_priv(block_cb);
+	}
+	tcf_block_cb_incref(block_cb);
+	err = mlxsw_sp_acl_block_bind(mlxsw_sp, acl_block,
+				      mlxsw_sp_port, ingress);
+	if (err)
+		goto err_block_bind;
+
+	if (ingress)
+		mlxsw_sp_port->ing_acl_block = acl_block;
+	else
+		mlxsw_sp_port->eg_acl_block = acl_block;
+
+	return 0;
+
+err_block_bind:
+	if (!tcf_block_cb_decref(block_cb)) {
+		__tcf_block_cb_unregister(block_cb);
+err_cb_register:
+		mlxsw_sp_acl_block_destroy(acl_block);
+	}
+	return err;
+}
+
+static void
+mlxsw_sp_setup_tc_block_flower_unbind(struct mlxsw_sp_port *mlxsw_sp_port,
+				      struct tcf_block *block, bool ingress)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	struct mlxsw_sp_acl_block *acl_block;
+	struct tcf_block_cb *block_cb;
+	int err;
+
+	block_cb = tcf_block_cb_lookup(block, mlxsw_sp_setup_tc_block_cb_flower,
+				       mlxsw_sp);
+	if (!block_cb)
+		return;
+
+	if (ingress)
+		mlxsw_sp_port->ing_acl_block = NULL;
+	else
+		mlxsw_sp_port->eg_acl_block = NULL;
+
+	acl_block = tcf_block_cb_priv(block_cb);
+	err = mlxsw_sp_acl_block_unbind(mlxsw_sp, acl_block,
+					mlxsw_sp_port, ingress);
+	if (!err && !tcf_block_cb_decref(block_cb)) {
+		__tcf_block_cb_unregister(block_cb);
+		mlxsw_sp_acl_block_destroy(acl_block);
+	}
 }
 
 static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 				   struct tc_block_offload *f)
 {
 	tc_setup_cb_t *cb;
+	bool ingress;
+	int err;
 
-	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		cb = mlxsw_sp_setup_tc_block_cb_ig;
-	else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
-		cb = mlxsw_sp_setup_tc_block_cb_eg;
-	else
+	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) {
+		cb = mlxsw_sp_setup_tc_block_cb_matchall_ig;
+		ingress = true;
+	} else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS) {
+		cb = mlxsw_sp_setup_tc_block_cb_matchall_eg;
+		ingress = false;
+	} else {
 		return -EOPNOTSUPP;
+	}
 
 	switch (f->command) {
 	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, cb, mlxsw_sp_port,
-					     mlxsw_sp_port);
+		err = tcf_block_cb_register(f->block, cb, mlxsw_sp_port,
+					    mlxsw_sp_port);
+		if (err)
+			return err;
+		err = mlxsw_sp_setup_tc_block_flower_bind(mlxsw_sp_port,
+							  f->block, ingress);
+		if (err) {
+			tcf_block_cb_unregister(f->block, cb, mlxsw_sp_port);
+			return err;
+		}
+		return 0;
 	case TC_BLOCK_UNBIND:
+		mlxsw_sp_setup_tc_block_flower_unbind(mlxsw_sp_port,
+						      f->block, ingress);
 		tcf_block_cb_unregister(f->block, cb, mlxsw_sp_port);
 		return 0;
 	default:
@@ -1840,10 +1954,18 @@ static int mlxsw_sp_feature_hw_tc(struct net_device *dev, bool enable)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
 
-	if (!enable && (mlxsw_sp_port->acl_rule_count ||
-			!list_empty(&mlxsw_sp_port->mall_tc_list))) {
-		netdev_err(dev, "Active offloaded tc filters, can't turn hw_tc_offload off\n");
-		return -EINVAL;
+	if (!enable) {
+		if (mlxsw_sp_acl_block_rule_count(mlxsw_sp_port->ing_acl_block) ||
+		    mlxsw_sp_acl_block_rule_count(mlxsw_sp_port->eg_acl_block) ||
+		    !list_empty(&mlxsw_sp_port->mall_tc_list)) {
+			netdev_err(dev, "Active offloaded tc filters, can't turn hw_tc_offload off\n");
+			return -EINVAL;
+		}
+		mlxsw_sp_acl_block_disable_inc(mlxsw_sp_port->ing_acl_block);
+		mlxsw_sp_acl_block_disable_inc(mlxsw_sp_port->eg_acl_block);
+	} else {
+		mlxsw_sp_acl_block_disable_dec(mlxsw_sp_port->ing_acl_block);
+		mlxsw_sp_acl_block_disable_dec(mlxsw_sp_port->eg_acl_block);
 	}
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 523e64e..ab6ada7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -270,7 +270,8 @@ struct mlxsw_sp_port {
 	struct mlxsw_sp_port_sample *sample;
 	struct list_head vlans_list;
 	struct mlxsw_sp_qdisc root_qdisc;
-	unsigned acl_rule_count;
+	struct mlxsw_sp_acl_block *ing_acl_block;
+	struct mlxsw_sp_acl_block *eg_acl_block;
 };
 
 static inline bool
@@ -498,17 +499,34 @@ struct mlxsw_sp_acl_ops {
 				       enum mlxsw_sp_acl_profile profile);
 };
 
+struct mlxsw_sp_acl_block;
 struct mlxsw_sp_acl_ruleset;
 
 /* spectrum_acl.c */
 struct mlxsw_afk *mlxsw_sp_acl_afk(struct mlxsw_sp_acl *acl);
+struct mlxsw_sp *mlxsw_sp_acl_block_mlxsw_sp(struct mlxsw_sp_acl_block *block);
+unsigned int mlxsw_sp_acl_block_rule_count(struct mlxsw_sp_acl_block *block);
+void mlxsw_sp_acl_block_disable_inc(struct mlxsw_sp_acl_block *block);
+void mlxsw_sp_acl_block_disable_dec(struct mlxsw_sp_acl_block *block);
+bool mlxsw_sp_acl_block_disabled(struct mlxsw_sp_acl_block *block);
+struct mlxsw_sp_acl_block *mlxsw_sp_acl_block_create(struct mlxsw_sp *mlxsw_sp,
+						     struct net *net);
+void mlxsw_sp_acl_block_destroy(struct mlxsw_sp_acl_block *block);
+int mlxsw_sp_acl_block_bind(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block,
+			    struct mlxsw_sp_port *mlxsw_sp_port,
+			    bool ingress);
+int mlxsw_sp_acl_block_unbind(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_acl_block *block,
+			      struct mlxsw_sp_port *mlxsw_sp_port,
+			      bool ingress);
 struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
-			    bool ingress, u32 chain_index,
+mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block, u32 chain_index,
 			    enum mlxsw_sp_acl_profile profile);
 struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
-			 bool ingress, u32 chain_index,
+mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp,
+			 struct mlxsw_sp_acl_block *block, u32 chain_index,
 			 enum mlxsw_sp_acl_profile profile);
 void mlxsw_sp_acl_ruleset_put(struct mlxsw_sp *mlxsw_sp,
 			      struct mlxsw_sp_acl_ruleset *ruleset);
@@ -575,11 +593,14 @@ void mlxsw_sp_acl_fini(struct mlxsw_sp *mlxsw_sp);
 extern const struct mlxsw_sp_acl_ops mlxsw_sp_acl_tcam_ops;
 
 /* spectrum_flower.c */
-int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+int mlxsw_sp_flower_replace(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block,
 			    struct tc_cls_flower_offload *f);
-void mlxsw_sp_flower_destroy(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+void mlxsw_sp_flower_destroy(struct mlxsw_sp *mlxsw_sp,
+			     struct mlxsw_sp_acl_block *block,
 			     struct tc_cls_flower_offload *f);
-int mlxsw_sp_flower_stats(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
+			  struct mlxsw_sp_acl_block *block,
 			  struct tc_cls_flower_offload *f);
 
 /* spectrum_qdisc.c */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
index 7fb41a4..f98bca9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
@@ -39,6 +39,7 @@
 #include <linux/string.h>
 #include <linux/rhashtable.h>
 #include <linux/netdevice.h>
+#include <net/net_namespace.h>
 #include <net/tc_act/tc_vlan.h>
 
 #include "reg.h"
@@ -70,9 +71,23 @@ struct mlxsw_afk *mlxsw_sp_acl_afk(struct mlxsw_sp_acl *acl)
 	return acl->afk;
 }
 
-struct mlxsw_sp_acl_ruleset_ht_key {
-	struct net_device *dev; /* dev this ruleset is bound to */
+struct mlxsw_sp_acl_block_binding {
+	struct list_head list;
+	struct net_device *dev;
+	struct mlxsw_sp_port *mlxsw_sp_port;
 	bool ingress;
+};
+
+struct mlxsw_sp_acl_block {
+	struct list_head binding_list;
+	struct mlxsw_sp_acl_ruleset *ruleset_zero;
+	struct mlxsw_sp *mlxsw_sp;
+	unsigned int rule_count;
+	unsigned int disable_count;
+};
+
+struct mlxsw_sp_acl_ruleset_ht_key {
+	struct mlxsw_sp_acl_block *block;
 	u32 chain_index;
 	const struct mlxsw_sp_acl_profile_ops *ops;
 };
@@ -118,27 +133,185 @@ struct mlxsw_sp_fid *mlxsw_sp_acl_dummy_fid(struct mlxsw_sp *mlxsw_sp)
 	return mlxsw_sp->acl->dummy_fid;
 }
 
-static int mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
-				     struct mlxsw_sp_acl_ruleset *ruleset,
-				     struct net_device *dev, bool ingress)
+struct mlxsw_sp *mlxsw_sp_acl_block_mlxsw_sp(struct mlxsw_sp_acl_block *block)
+{
+	return block->mlxsw_sp;
+}
+
+unsigned int mlxsw_sp_acl_block_rule_count(struct mlxsw_sp_acl_block *block)
+{
+	return block ? block->rule_count : 0;
+}
+
+void mlxsw_sp_acl_block_disable_inc(struct mlxsw_sp_acl_block *block)
+{
+	if (block)
+		block->disable_count++;
+}
+
+void mlxsw_sp_acl_block_disable_dec(struct mlxsw_sp_acl_block *block)
+{
+	if (block)
+		block->disable_count--;
+}
+
+bool mlxsw_sp_acl_block_disabled(struct mlxsw_sp_acl_block *block)
 {
+	return block->disable_count;
+}
+
+static int
+mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
+			  struct mlxsw_sp_acl_block *block,
+			  struct mlxsw_sp_acl_block_binding *binding)
+{
+	struct mlxsw_sp_acl_ruleset *ruleset = block->ruleset_zero;
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
-	return ops->ruleset_bind(mlxsw_sp, ruleset->priv, dev, ingress);
+	return ops->ruleset_bind(mlxsw_sp, ruleset->priv,
+				 binding->mlxsw_sp_port->dev, binding->ingress);
 }
 
-static void mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
-					struct mlxsw_sp_acl_ruleset *ruleset,
-					struct net_device *dev, bool ingress)
+static void
+mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block,
+			    struct mlxsw_sp_acl_block_binding *binding)
 {
+	struct mlxsw_sp_acl_ruleset *ruleset = block->ruleset_zero;
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
-	ops->ruleset_unbind(mlxsw_sp, ruleset->priv, dev, ingress);
+	ops->ruleset_unbind(mlxsw_sp, ruleset->priv,
+			    binding->mlxsw_sp_port->dev, binding->ingress);
+}
+
+static bool mlxsw_sp_acl_ruleset_block_bound(struct mlxsw_sp_acl_block *block)
+{
+	return block->ruleset_zero;
+}
+
+static int
+mlxsw_sp_acl_ruleset_block_bind(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_acl_ruleset *ruleset,
+				struct mlxsw_sp_acl_block *block)
+{
+	struct mlxsw_sp_acl_block_binding *binding;
+	int err;
+
+	block->ruleset_zero = ruleset;
+	list_for_each_entry(binding, &block->binding_list, list) {
+		err = mlxsw_sp_acl_ruleset_bind(mlxsw_sp, block, binding);
+		if (err)
+			goto rollback;
+	}
+	return 0;
+
+rollback:
+	list_for_each_entry_continue_reverse(binding, &block->binding_list,
+					     list)
+		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, block, binding);
+	block->ruleset_zero = NULL;
+
+	return err;
+}
+
+static void
+mlxsw_sp_acl_ruleset_block_unbind(struct mlxsw_sp *mlxsw_sp,
+				  struct mlxsw_sp_acl_ruleset *ruleset,
+				  struct mlxsw_sp_acl_block *block)
+{
+	struct mlxsw_sp_acl_block_binding *binding;
+
+	list_for_each_entry(binding, &block->binding_list, list)
+		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, block, binding);
+	block->ruleset_zero = NULL;
+}
+
+struct mlxsw_sp_acl_block *mlxsw_sp_acl_block_create(struct mlxsw_sp *mlxsw_sp,
+						     struct net *net)
+{
+	struct mlxsw_sp_acl_block *block;
+
+	block = kzalloc(sizeof(*block), GFP_KERNEL);
+	if (!block)
+		return NULL;
+	INIT_LIST_HEAD(&block->binding_list);
+	block->mlxsw_sp = mlxsw_sp;
+	return block;
+}
+
+void mlxsw_sp_acl_block_destroy(struct mlxsw_sp_acl_block *block)
+{
+	WARN_ON(!list_empty(&block->binding_list));
+	kfree(block);
+}
+
+static struct mlxsw_sp_acl_block_binding *
+mlxsw_sp_acl_block_lookup(struct mlxsw_sp_acl_block *block,
+			  struct mlxsw_sp_port *mlxsw_sp_port, bool ingress)
+{
+	struct mlxsw_sp_acl_block_binding *binding;
+
+	list_for_each_entry(binding, &block->binding_list, list)
+		if (binding->mlxsw_sp_port == mlxsw_sp_port &&
+		    binding->ingress == ingress)
+			return binding;
+	return NULL;
+}
+
+int mlxsw_sp_acl_block_bind(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block,
+			    struct mlxsw_sp_port *mlxsw_sp_port,
+			    bool ingress)
+{
+	struct mlxsw_sp_acl_block_binding *binding;
+	int err;
+
+	if (WARN_ON(mlxsw_sp_acl_block_lookup(block, mlxsw_sp_port, ingress)))
+		return -EEXIST;
+
+	binding = kzalloc(sizeof(*binding), GFP_KERNEL);
+	if (!binding)
+		return -ENOMEM;
+	binding->mlxsw_sp_port = mlxsw_sp_port;
+	binding->ingress = ingress;
+
+	if (mlxsw_sp_acl_ruleset_block_bound(block)) {
+		err = mlxsw_sp_acl_ruleset_bind(mlxsw_sp, block, binding);
+		if (err)
+			goto err_ruleset_bind;
+	}
+
+	list_add(&binding->list, &block->binding_list);
+	return 0;
+
+err_ruleset_bind:
+	kfree(binding);
+	return err;
+}
+
+int mlxsw_sp_acl_block_unbind(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_acl_block *block,
+			      struct mlxsw_sp_port *mlxsw_sp_port,
+			      bool ingress)
+{
+	struct mlxsw_sp_acl_block_binding *binding;
+
+	binding = mlxsw_sp_acl_block_lookup(block, mlxsw_sp_port, ingress);
+	if (!binding)
+		return -ENOENT;
+
+	list_del(&binding->list);
+
+	if (mlxsw_sp_acl_ruleset_block_bound(block))
+		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, block, binding);
+
+	kfree(binding);
+	return 0;
 }
 
 static struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
-			    bool ingress, u32 chain_index,
+mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block, u32 chain_index,
 			    const struct mlxsw_sp_acl_profile_ops *ops)
 {
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
@@ -151,8 +324,7 @@ mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 	if (!ruleset)
 		return ERR_PTR(-ENOMEM);
 	ruleset->ref_count = 1;
-	ruleset->ht_key.dev = dev;
-	ruleset->ht_key.ingress = ingress;
+	ruleset->ht_key.block = block;
 	ruleset->ht_key.chain_index = chain_index;
 	ruleset->ht_key.ops = ops;
 
@@ -174,8 +346,7 @@ mlxsw_sp_acl_ruleset_create(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 		 * to be directly bound to device. The rest of the rulesets
 		 * are bound by "Goto action set".
 		 */
-		err = mlxsw_sp_acl_ruleset_bind(mlxsw_sp, ruleset,
-						dev, ingress);
+		err = mlxsw_sp_acl_ruleset_block_bind(mlxsw_sp, ruleset, block);
 		if (err)
 			goto err_ruleset_bind;
 	}
@@ -198,12 +369,12 @@ static void mlxsw_sp_acl_ruleset_destroy(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_acl_ruleset *ruleset)
 {
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
+	struct mlxsw_sp_acl_block *block = ruleset->ht_key.block;
+	u32 chain_index = ruleset->ht_key.chain_index;
 	struct mlxsw_sp_acl *acl = mlxsw_sp->acl;
 
-	if (!ruleset->ht_key.chain_index)
-		mlxsw_sp_acl_ruleset_unbind(mlxsw_sp, ruleset,
-					    ruleset->ht_key.dev,
-					    ruleset->ht_key.ingress);
+	if (!chain_index)
+		mlxsw_sp_acl_ruleset_block_unbind(mlxsw_sp, ruleset, block);
 	rhashtable_remove_fast(&acl->ruleset_ht, &ruleset->ht_node,
 			       mlxsw_sp_acl_ruleset_ht_params);
 	ops->ruleset_del(mlxsw_sp, ruleset->priv);
@@ -225,15 +396,14 @@ static void mlxsw_sp_acl_ruleset_ref_dec(struct mlxsw_sp *mlxsw_sp,
 }
 
 static struct mlxsw_sp_acl_ruleset *
-__mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp_acl *acl, struct net_device *dev,
-			      bool ingress, u32 chain_index,
+__mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp_acl *acl,
+			      struct mlxsw_sp_acl_block *block, u32 chain_index,
 			      const struct mlxsw_sp_acl_profile_ops *ops)
 {
 	struct mlxsw_sp_acl_ruleset_ht_key ht_key;
 
 	memset(&ht_key, 0, sizeof(ht_key));
-	ht_key.dev = dev;
-	ht_key.ingress = ingress;
+	ht_key.block = block;
 	ht_key.chain_index = chain_index;
 	ht_key.ops = ops;
 	return rhashtable_lookup_fast(&acl->ruleset_ht, &ht_key,
@@ -241,8 +411,8 @@ __mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp_acl *acl, struct net_device *dev,
 }
 
 struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
-			    bool ingress, u32 chain_index,
+mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block, u32 chain_index,
 			    enum mlxsw_sp_acl_profile profile)
 {
 	const struct mlxsw_sp_acl_profile_ops *ops;
@@ -252,16 +422,15 @@ mlxsw_sp_acl_ruleset_lookup(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 	ops = acl->ops->profile_ops(mlxsw_sp, profile);
 	if (!ops)
 		return ERR_PTR(-EINVAL);
-	ruleset = __mlxsw_sp_acl_ruleset_lookup(acl, dev, ingress,
-						chain_index, ops);
+	ruleset = __mlxsw_sp_acl_ruleset_lookup(acl, block, chain_index, ops);
 	if (!ruleset)
 		return ERR_PTR(-ENOENT);
 	return ruleset;
 }
 
 struct mlxsw_sp_acl_ruleset *
-mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
-			 bool ingress, u32 chain_index,
+mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp,
+			 struct mlxsw_sp_acl_block *block, u32 chain_index,
 			 enum mlxsw_sp_acl_profile profile)
 {
 	const struct mlxsw_sp_acl_profile_ops *ops;
@@ -272,14 +441,12 @@ mlxsw_sp_acl_ruleset_get(struct mlxsw_sp *mlxsw_sp, struct net_device *dev,
 	if (!ops)
 		return ERR_PTR(-EINVAL);
 
-	ruleset = __mlxsw_sp_acl_ruleset_lookup(acl, dev, ingress,
-						chain_index, ops);
+	ruleset = __mlxsw_sp_acl_ruleset_lookup(acl, block, chain_index, ops);
 	if (ruleset) {
 		mlxsw_sp_acl_ruleset_ref_inc(ruleset);
 		return ruleset;
 	}
-	return mlxsw_sp_acl_ruleset_create(mlxsw_sp, dev, ingress,
-					   chain_index, ops);
+	return mlxsw_sp_acl_ruleset_create(mlxsw_sp, block, chain_index, ops);
 }
 
 void mlxsw_sp_acl_ruleset_put(struct mlxsw_sp *mlxsw_sp,
@@ -528,6 +695,7 @@ int mlxsw_sp_acl_rule_add(struct mlxsw_sp *mlxsw_sp,
 		goto err_rhashtable_insert;
 
 	list_add_tail(&rule->list, &mlxsw_sp->acl->rules);
+	ruleset->ht_key.block->rule_count++;
 	return 0;
 
 err_rhashtable_insert:
@@ -541,6 +709,7 @@ void mlxsw_sp_acl_rule_del(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_acl_ruleset *ruleset = rule->ruleset;
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
+	ruleset->ht_key.block->rule_count--;
 	list_del(&rule->list);
 	rhashtable_remove_fast(&ruleset->rule_ht, &rule->ht_node,
 			       mlxsw_sp_acl_rule_ht_params);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index 42e8a36..cf7b97d4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -35,6 +35,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
+#include <net/net_namespace.h>
 #include <net/flow_dissector.h>
 #include <net/pkt_cls.h>
 #include <net/tc_act/tc_gact.h>
@@ -45,7 +46,7 @@
 #include "core_acl_flex_keys.h"
 
 static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
-					 struct net_device *dev, bool ingress,
+					 struct mlxsw_sp_acl_block *block,
 					 struct mlxsw_sp_acl_rule_info *rulei,
 					 struct tcf_exts *exts)
 {
@@ -80,8 +81,7 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
 			struct mlxsw_sp_acl_ruleset *ruleset;
 			u16 group_id;
 
-			ruleset = mlxsw_sp_acl_ruleset_lookup(mlxsw_sp, dev,
-							      ingress,
+			ruleset = mlxsw_sp_acl_ruleset_lookup(mlxsw_sp, block,
 							      chain_index,
 							      MLXSW_SP_ACL_PROFILE_FLOWER);
 			if (IS_ERR(ruleset))
@@ -104,9 +104,6 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp,
 				return err;
 
 			out_dev = tcf_mirred_dev(a);
-			if (out_dev == dev)
-				out_dev = NULL;
-
 			err = mlxsw_sp_acl_rulei_act_fwd(mlxsw_sp, rulei,
 							 out_dev);
 			if (err)
@@ -265,7 +262,7 @@ static int mlxsw_sp_flower_parse_ip(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
-				 struct net_device *dev, bool ingress,
+				 struct mlxsw_sp_acl_block *block,
 				 struct mlxsw_sp_acl_rule_info *rulei,
 				 struct tc_cls_flower_offload *f)
 {
@@ -383,21 +380,19 @@ static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
 	if (err)
 		return err;
 
-	return mlxsw_sp_flower_parse_actions(mlxsw_sp, dev, ingress,
-					     rulei, f->exts);
+	return mlxsw_sp_flower_parse_actions(mlxsw_sp, block, rulei, f->exts);
 }
 
-int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+int mlxsw_sp_flower_replace(struct mlxsw_sp *mlxsw_sp,
+			    struct mlxsw_sp_acl_block *block,
 			    struct tc_cls_flower_offload *f)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct net_device *dev = mlxsw_sp_port->dev;
 	struct mlxsw_sp_acl_rule_info *rulei;
 	struct mlxsw_sp_acl_ruleset *ruleset;
 	struct mlxsw_sp_acl_rule *rule;
 	int err;
 
-	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, dev, ingress,
+	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, block,
 					   f->common.chain_index,
 					   MLXSW_SP_ACL_PROFILE_FLOWER);
 	if (IS_ERR(ruleset))
@@ -410,7 +405,7 @@ int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 	}
 
 	rulei = mlxsw_sp_acl_rule_rulei(rule);
-	err = mlxsw_sp_flower_parse(mlxsw_sp, dev, ingress, rulei, f);
+	err = mlxsw_sp_flower_parse(mlxsw_sp, block, rulei, f);
 	if (err)
 		goto err_flower_parse;
 
@@ -423,7 +418,6 @@ int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 		goto err_rule_add;
 
 	mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
-	mlxsw_sp_port->acl_rule_count++;
 	return 0;
 
 err_rule_add:
@@ -435,15 +429,15 @@ int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 	return err;
 }
 
-void mlxsw_sp_flower_destroy(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+void mlxsw_sp_flower_destroy(struct mlxsw_sp *mlxsw_sp,
+			     struct mlxsw_sp_acl_block *block,
 			     struct tc_cls_flower_offload *f)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_acl_ruleset *ruleset;
 	struct mlxsw_sp_acl_rule *rule;
 
-	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, mlxsw_sp_port->dev,
-					   ingress, f->common.chain_index,
+	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, block,
+					   f->common.chain_index,
 					   MLXSW_SP_ACL_PROFILE_FLOWER);
 	if (IS_ERR(ruleset))
 		return;
@@ -455,13 +449,12 @@ void mlxsw_sp_flower_destroy(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 	}
 
 	mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
-	mlxsw_sp_port->acl_rule_count--;
 }
 
-int mlxsw_sp_flower_stats(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
+int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
+			  struct mlxsw_sp_acl_block *block,
 			  struct tc_cls_flower_offload *f)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_acl_ruleset *ruleset;
 	struct mlxsw_sp_acl_rule *rule;
 	u64 packets;
@@ -469,8 +462,8 @@ int mlxsw_sp_flower_stats(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 	u64 bytes;
 	int err;
 
-	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, mlxsw_sp_port->dev,
-					   ingress, f->common.chain_index,
+	ruleset = mlxsw_sp_acl_ruleset_get(mlxsw_sp, block,
+					   f->common.chain_index,
 					   MLXSW_SP_ACL_PROFILE_FLOWER);
 	if (WARN_ON(IS_ERR(ruleset)))
 		return -EINVAL;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch net-next v4 10/10] mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (8 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 09/10] mlxsw: spectrum_acl: Implement TC block sharing Jiri Pirko
@ 2017-12-23 15:54 ` Jiri Pirko
  2017-12-23 16:06 ` [patch iproute2] tc: implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
  2017-12-24  1:54 ` [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances David Ahern
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 15:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

No need to convert from mlxsw_sp_port to net_device and back again.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  6 +++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c |  4 ++--
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c    | 27 +++++++++-------------
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index ab6ada7..525552d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -477,9 +477,11 @@ struct mlxsw_sp_acl_profile_ops {
 			   void *priv, void *ruleset_priv);
 	void (*ruleset_del)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv);
 	int (*ruleset_bind)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv,
-			    struct net_device *dev, bool ingress);
+			    struct mlxsw_sp_port *mlxsw_sp_port,
+			    bool ingress);
 	void (*ruleset_unbind)(struct mlxsw_sp *mlxsw_sp, void *ruleset_priv,
-			       struct net_device *dev, bool ingress);
+			       struct mlxsw_sp_port *mlxsw_sp_port,
+			       bool ingress);
 	u16 (*ruleset_group_id)(void *ruleset_priv);
 	size_t rule_priv_size;
 	int (*rule_add)(struct mlxsw_sp *mlxsw_sp,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
index f98bca9..9439bfa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c
@@ -169,7 +169,7 @@ mlxsw_sp_acl_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
 	return ops->ruleset_bind(mlxsw_sp, ruleset->priv,
-				 binding->mlxsw_sp_port->dev, binding->ingress);
+				 binding->mlxsw_sp_port, binding->ingress);
 }
 
 static void
@@ -181,7 +181,7 @@ mlxsw_sp_acl_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
 	const struct mlxsw_sp_acl_profile_ops *ops = ruleset->ht_key.ops;
 
 	ops->ruleset_unbind(mlxsw_sp, ruleset->priv,
-			    binding->mlxsw_sp_port->dev, binding->ingress);
+			    binding->mlxsw_sp_port, binding->ingress);
 }
 
 static bool mlxsw_sp_acl_ruleset_block_bound(struct mlxsw_sp_acl_block *block)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 50b2f9a..c6e180c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -258,15 +258,11 @@ static void mlxsw_sp_acl_tcam_group_del(struct mlxsw_sp *mlxsw_sp,
 static int
 mlxsw_sp_acl_tcam_group_bind(struct mlxsw_sp *mlxsw_sp,
 			     struct mlxsw_sp_acl_tcam_group *group,
-			     struct net_device *dev, bool ingress)
+			     struct mlxsw_sp_port *mlxsw_sp_port,
+			     bool ingress)
 {
-	struct mlxsw_sp_port *mlxsw_sp_port;
 	char ppbt_pl[MLXSW_REG_PPBT_LEN];
 
-	if (!mlxsw_sp_port_dev_check(dev))
-		return -EINVAL;
-
-	mlxsw_sp_port = netdev_priv(dev);
 	mlxsw_reg_ppbt_pack(ppbt_pl, ingress ? MLXSW_REG_PXBT_E_IACL :
 					       MLXSW_REG_PXBT_E_EACL,
 			    MLXSW_REG_PXBT_OP_BIND, mlxsw_sp_port->local_port,
@@ -277,15 +273,11 @@ mlxsw_sp_acl_tcam_group_bind(struct mlxsw_sp *mlxsw_sp,
 static void
 mlxsw_sp_acl_tcam_group_unbind(struct mlxsw_sp *mlxsw_sp,
 			       struct mlxsw_sp_acl_tcam_group *group,
-			       struct net_device *dev, bool ingress)
+			       struct mlxsw_sp_port *mlxsw_sp_port,
+			       bool ingress)
 {
-	struct mlxsw_sp_port *mlxsw_sp_port;
 	char ppbt_pl[MLXSW_REG_PPBT_LEN];
 
-	if (WARN_ON(!mlxsw_sp_port_dev_check(dev)))
-		return;
-
-	mlxsw_sp_port = netdev_priv(dev);
 	mlxsw_reg_ppbt_pack(ppbt_pl, ingress ? MLXSW_REG_PXBT_E_IACL :
 					       MLXSW_REG_PXBT_E_EACL,
 			    MLXSW_REG_PXBT_OP_UNBIND, mlxsw_sp_port->local_port,
@@ -1054,22 +1046,25 @@ mlxsw_sp_acl_tcam_flower_ruleset_del(struct mlxsw_sp *mlxsw_sp,
 static int
 mlxsw_sp_acl_tcam_flower_ruleset_bind(struct mlxsw_sp *mlxsw_sp,
 				      void *ruleset_priv,
-				      struct net_device *dev, bool ingress)
+				      struct mlxsw_sp_port *mlxsw_sp_port,
+				      bool ingress)
 {
 	struct mlxsw_sp_acl_tcam_flower_ruleset *ruleset = ruleset_priv;
 
 	return mlxsw_sp_acl_tcam_group_bind(mlxsw_sp, &ruleset->group,
-					    dev, ingress);
+					    mlxsw_sp_port, ingress);
 }
 
 static void
 mlxsw_sp_acl_tcam_flower_ruleset_unbind(struct mlxsw_sp *mlxsw_sp,
 					void *ruleset_priv,
-					struct net_device *dev, bool ingress)
+					struct mlxsw_sp_port *mlxsw_sp_port,
+					bool ingress)
 {
 	struct mlxsw_sp_acl_tcam_flower_ruleset *ruleset = ruleset_priv;
 
-	mlxsw_sp_acl_tcam_group_unbind(mlxsw_sp, &ruleset->group, dev, ingress);
+	mlxsw_sp_acl_tcam_group_unbind(mlxsw_sp, &ruleset->group,
+				       mlxsw_sp_port, ingress);
 }
 
 static u16
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [patch iproute2] tc: implement filter block sharing to ingress and clsact qdiscs
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (9 preceding siblings ...)
  2017-12-23 15:54 ` [patch net-next v4 10/10] mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops Jiri Pirko
@ 2017-12-23 16:06 ` Jiri Pirko
  2017-12-24  1:54 ` [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances David Ahern
  11 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-23 16:06 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/pkt_sched.h | 11 +++++++++
 tc/q_clsact.c                  | 56 ++++++++++++++++++++++++++++++++++++++----
 tc/q_ingress.c                 | 32 +++++++++++++++++++++---
 3 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096..8cc554a 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,15 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+/* Ingress/clsact */
+
+enum {
+	TCA_CLSACT_UNSPEC,
+	TCA_CLSACT_INGRESS_BLOCK,
+	TCA_CLSACT_EGRESS_BLOCK,
+	__TCA_CLSACT_MAX
+};
+
+#define TCA_CLSACT_MAX	(__TCA_CLSACT_MAX - 1)
+
 #endif
diff --git a/tc/q_clsact.c b/tc/q_clsact.c
index 341f653..06d67db 100644
--- a/tc/q_clsact.c
+++ b/tc/q_clsact.c
@@ -7,23 +7,69 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... clsact\n");
+	fprintf(stderr, "Usage: ... clsact [ingress_block BLOCK_INDEX] [egress_block BLOCK_INDEX]\n");
 }
 
 static int clsact_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			    struct nlmsghdr *n, const char *dev)
 {
-	if (argc > 0) {
-		fprintf(stderr, "What is \"%s\"?\n", *argv);
-		explain();
-		return -1;
+	struct rtattr *tail;
+	unsigned int ingress_block = 0;
+	unsigned int egress_block = 0;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "ingress_block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&ingress_block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"ingress_block\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "egress_block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&egress_block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"egress_block\"\n");
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		NEXT_ARG_FWD();
 	}
 
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (ingress_block)
+		addattr32(n, 1024, TCA_CLSACT_INGRESS_BLOCK, ingress_block);
+	if (egress_block)
+		addattr32(n, 1024, TCA_CLSACT_EGRESS_BLOCK, egress_block);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
 	return 0;
 }
 
 static int clsact_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 {
+	struct rtattr *tb[TCA_CLSACT_MAX + 1];
+	unsigned int block;
+
+	if (!opt)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CLSACT_MAX, opt);
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_INGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+		print_uint(PRINT_ANY, "ingress_block",
+			   "ingress_block %u ", block);
+	}
+	if (tb[TCA_CLSACT_EGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_EGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_EGRESS_BLOCK]);
+		print_uint(PRINT_ANY, "egress_block",
+			   "egress_block %u ", block);
+	}
 	return 0;
 }
 
diff --git a/tc/q_ingress.c b/tc/q_ingress.c
index 1e42229..6899c4d 100644
--- a/tc/q_ingress.c
+++ b/tc/q_ingress.c
@@ -17,30 +17,56 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... ingress\n");
+	fprintf(stderr, "Usage: ... ingress [block BLOCK_INDEX]\n");
 }
 
 static int ingress_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			     struct nlmsghdr *n, const char *dev)
 {
+	struct rtattr *tail;
+	unsigned int block;
+
 	while (argc > 0) {
 		if (strcmp(*argv, "handle") == 0) {
 			NEXT_ARG();
-			argc--; argv++;
+		} else if (strcmp(*argv, "block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"block\"\n");
+				return -1;
+			}
 		} else {
 			fprintf(stderr, "What is \"%s\"?\n", *argv);
 			explain();
 			return -1;
 		}
+		NEXT_ARG_FWD();
 	}
 
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (block)
+		addattr32(n, 1024, TCA_CLSACT_INGRESS_BLOCK, block);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
 	return 0;
 }
 
 static int ingress_print_opt(struct qdisc_util *qu, FILE *f,
 			     struct rtattr *opt)
 {
-	fprintf(f, "---------------- ");
+	struct rtattr *tb[TCA_CLSACT_MAX + 1];
+	unsigned int block;
+
+	if (!opt)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CLSACT_MAX, opt);
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_INGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+		print_uint(PRINT_ANY, "block", "block %u ", block);
+	}
 	return 0;
 }
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (10 preceding siblings ...)
  2017-12-23 16:06 ` [patch iproute2] tc: implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
@ 2017-12-24  1:54 ` David Ahern
  2017-12-24  7:19   ` Jiri Pirko
  11 siblings, 1 reply; 40+ messages in thread
From: David Ahern @ 2017-12-24  1:54 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

On 12/23/17 9:54 AM, Jiri Pirko wrote:
> So back to the example. First, we create 2 qdiscs. Both will share
> block number 22. "22" is just an identification. If we don't pass any
> block number, a new one will be generated by kernel:
> 
> $ tc qdisc add dev ens7 ingress block 22
>                                 ^^^^^^^^
> $ tc qdisc add dev ens8 ingress block 22
>                                 ^^^^^^^^
> 
> Now if we list the qdiscs, we will see the block index in the output:
> 
> $ tc qdisc
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
> 
> To make is more visual, the situation looks like this:
> 
>    ens7 ingress qdisc                 ens7 ingress qdisc
>           |                                  |
>           |                                  |
>           +---------->  block 22  <----------+
> 
> Unlimited number of qdiscs may share the same block.
> 
> Now we can add filter to any of qdiscs sharing the same block:
> 
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop


Allowing config of a shared block through any qdisc that references it
is akin to me allowing nexthop objects to be manipulated by any route
that references it -- sure, it could be done but causes a lot surprises
to the user.

You are adding a new tc object -- a shared block. Why the resistance to
creating a proper API for managing it?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature
  2017-12-23 15:54 ` [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature Jiri Pirko
@ 2017-12-24  2:20   ` Jakub Kicinski
  2017-12-24  7:52     ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2017-12-24  2:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

On Sat, 23 Dec 2017 16:54:31 +0100, Jiri Pirko wrote:
> -static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
> -				   struct tcf_block_ext_info *ei)
> +static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
> +				  struct tcf_block_ext_info *ei)
>  {
> -	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
> +	struct net_device *dev = q->dev_queue->dev;
> +	int err;
> +
> +	if (!dev->netdev_ops->ndo_setup_tc)
> +		return 0;
> +
> +	/* If tc offload feature is disabled and the block we try to bind
> +	 * to already has some offloaded filters, forbid to bind.
> +	 */
> +	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block))
> +		return -EOPNOTSUPP;
> +
> +	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
> +	if (err == -EOPNOTSUPP)
> +		/* Driver does not support binding. */
> +		return 0;
> +	return err;
>  }

Would you mind explaining why those return 0s are safe?

Say I have 2 netdevs, one dumb NIC without ndo_setup_tc (dnic) and one
NIC that can offload everything (enic).  I can share a block between
them (before or after adding any filters) and adding filters with
skip_sw will succeed, even though dnic will not see them ever.  There
is only one callback for enic, "all callbacks" != "all devices". 
It's fine to share the block in such case, but that block can never
accept a skip_sw filter.  Don't we need something like (reverse) patch 3
for keeping track of netdevs sharing the block which are not OK with
offloads?

Am I misunderstanding how this is supposed to work?  Or simply too nit
picky about providing predictable behaviour?

Quick hack to illustrate the idea (untested):

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 22a3a1d22ffa..e61e59161243 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -289,6 +289,7 @@ struct tcf_block {
        struct list_head cb_list;
        struct list_head owner_list;
        bool keep_dst;
+       bool nonoffload_taint;
        unsigned int offloadcnt;
 };
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 37eea70d1d72..4e017cbbf356 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -290,7 +290,7 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
        int err;
 
        if (!dev->netdev_ops->ndo_setup_tc)
-               return 0;
+               goto mark_no_offload;
 
        /* If tc offload feature is disabled and the block we try to bind
         * to already has some offloaded filters, forbid to bind.
@@ -300,9 +300,14 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 
        err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
        if (err == -EOPNOTSUPP)
-               /* Driver does not support binding. */
-               return 0;
+               goto mark_no_offload;
        return err;
+
+mark_no_offload:
+       if (tcf_block_offload_in_use(block))
+               return -EOPNOTSUPP;
+       block->nonoffload_taint = true;
+       return 0;
 }
 
 static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
@@ -1492,6 +1497,10 @@ int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
        int ok_count;
        int ret;
 
+       /* Make sure all netdevs sharing this block are offload-capable */
+       if (block->nonoffload_taint && err_stop)
+               return -EOPNOTSUPP;
+
        ret = tcf_block_cb_call(block, type, type_data, err_stop);
        if (ret < 0)
                return ret;


Here a block once tainted with a bad netdev will never be offloadable
again, so tracking a'la patch 3 would be nicer..

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2017-12-24  1:54 ` [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances David Ahern
@ 2017-12-24  7:19   ` Jiri Pirko
  2017-12-24 16:25     ` David Ahern
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2017-12-24  7:19 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Sun, Dec 24, 2017 at 02:54:47AM CET, dsahern@gmail.com wrote:
>On 12/23/17 9:54 AM, Jiri Pirko wrote:
>> So back to the example. First, we create 2 qdiscs. Both will share
>> block number 22. "22" is just an identification. If we don't pass any
>> block number, a new one will be generated by kernel:
>> 
>> $ tc qdisc add dev ens7 ingress block 22
>>                                 ^^^^^^^^
>> $ tc qdisc add dev ens8 ingress block 22
>>                                 ^^^^^^^^
>> 
>> Now if we list the qdiscs, we will see the block index in the output:
>> 
>> $ tc qdisc
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>> 
>> To make is more visual, the situation looks like this:
>> 
>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>           |                                  |
>>           |                                  |
>>           +---------->  block 22  <----------+
>> 
>> Unlimited number of qdiscs may share the same block.
>> 
>> Now we can add filter to any of qdiscs sharing the same block:
>> 
>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>
>
>Allowing config of a shared block through any qdisc that references it
>is akin to me allowing nexthop objects to be manipulated by any route
>that references it -- sure, it could be done but causes a lot surprises
>to the user.
>
>You are adding a new tc object -- a shared block. Why the resistance to
>creating a proper API for managing it?

Again, no resistance, I said many times it would be done as a follow-up.
But as an api already exists, it has to continue to work. Or do you
suggest it should stop working? That, I don't agree with.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature
  2017-12-24  2:20   ` Jakub Kicinski
@ 2017-12-24  7:52     ` Jiri Pirko
  0 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2017-12-24  7:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel,
	dsahern

Sun, Dec 24, 2017 at 03:20:45AM CET, kubakici@wp.pl wrote:
>On Sat, 23 Dec 2017 16:54:31 +0100, Jiri Pirko wrote:
>> -static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
>> -				   struct tcf_block_ext_info *ei)
>> +static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
>> +				  struct tcf_block_ext_info *ei)
>>  {
>> -	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
>> +	struct net_device *dev = q->dev_queue->dev;
>> +	int err;
>> +
>> +	if (!dev->netdev_ops->ndo_setup_tc)
>> +		return 0;
>> +
>> +	/* If tc offload feature is disabled and the block we try to bind
>> +	 * to already has some offloaded filters, forbid to bind.
>> +	 */
>> +	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block))
>> +		return -EOPNOTSUPP;
>> +
>> +	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
>> +	if (err == -EOPNOTSUPP)
>> +		/* Driver does not support binding. */
>> +		return 0;
>> +	return err;
>>  }
>
>Would you mind explaining why those return 0s are safe?
>
>Say I have 2 netdevs, one dumb NIC without ndo_setup_tc (dnic) and one
>NIC that can offload everything (enic).  I can share a block between
>them (before or after adding any filters) and adding filters with
>skip_sw will succeed, even though dnic will not see them ever.  There
>is only one callback for enic, "all callbacks" != "all devices". 
>It's fine to share the block in such case, but that block can never
>accept a skip_sw filter.  Don't we need something like (reverse) patch 3
>for keeping track of netdevs sharing the block which are not OK with
>offloads?
>
>Am I misunderstanding how this is supposed to work?  Or simply too nit
>picky about providing predictable behaviour?

You undestand it correctly. Original plan was to ignore thore devices
that does not support offloading. But thinking about it a bit more,
you are probably right that they should be taken into consideration
when user explicitly says "skip_sw".

Will include the accounting you suggest below. Thanks!


>
>Quick hack to illustrate the idea (untested):
>
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 22a3a1d22ffa..e61e59161243 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -289,6 +289,7 @@ struct tcf_block {
>        struct list_head cb_list;
>        struct list_head owner_list;
>        bool keep_dst;
>+       bool nonoffload_taint;
>        unsigned int offloadcnt;
> };
> 
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index 37eea70d1d72..4e017cbbf356 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -290,7 +290,7 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
>        int err;
> 
>        if (!dev->netdev_ops->ndo_setup_tc)
>-               return 0;
>+               goto mark_no_offload;
> 
>        /* If tc offload feature is disabled and the block we try to bind
>         * to already has some offloaded filters, forbid to bind.
>@@ -300,9 +300,14 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
> 
>        err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
>        if (err == -EOPNOTSUPP)
>-               /* Driver does not support binding. */
>-               return 0;
>+               goto mark_no_offload;
>        return err;
>+
>+mark_no_offload:
>+       if (tcf_block_offload_in_use(block))
>+               return -EOPNOTSUPP;
>+       block->nonoffload_taint = true;
>+       return 0;
> }
> 
> static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
>@@ -1492,6 +1497,10 @@ int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
>        int ok_count;
>        int ret;
> 
>+       /* Make sure all netdevs sharing this block are offload-capable */
>+       if (block->nonoffload_taint && err_stop)
>+               return -EOPNOTSUPP;
>+
>        ret = tcf_block_cb_call(block, type, type_data, err_stop);
>        if (ret < 0)
>                return ret;
>
>
>Here a block once tainted with a bad netdev will never be offloadable
>again, so tracking a'la patch 3 would be nicer..

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2017-12-24  7:19   ` Jiri Pirko
@ 2017-12-24 16:25     ` David Ahern
  2017-12-25 10:23       ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: David Ahern @ 2017-12-24 16:25 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 12/24/17 1:19 AM, Jiri Pirko wrote:
> Sun, Dec 24, 2017 at 02:54:47AM CET, dsahern@gmail.com wrote:
>> On 12/23/17 9:54 AM, Jiri Pirko wrote:
>>> So back to the example. First, we create 2 qdiscs. Both will share
>>> block number 22. "22" is just an identification. If we don't pass any
>>> block number, a new one will be generated by kernel:
>>>
>>> $ tc qdisc add dev ens7 ingress block 22
>>>                                 ^^^^^^^^
>>> $ tc qdisc add dev ens8 ingress block 22
>>>                                 ^^^^^^^^
>>>
>>> Now if we list the qdiscs, we will see the block index in the output:
>>>
>>> $ tc qdisc
>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>
>>> To make is more visual, the situation looks like this:
>>>
>>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>>           |                                  |
>>>           |                                  |
>>>           +---------->  block 22  <----------+
>>>
>>> Unlimited number of qdiscs may share the same block.
>>>
>>> Now we can add filter to any of qdiscs sharing the same block:
>>>
>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>
>>
>> Allowing config of a shared block through any qdisc that references it
>> is akin to me allowing nexthop objects to be manipulated by any route
>> that references it -- sure, it could be done but causes a lot surprises
>> to the user.
>>
>> You are adding a new tc object -- a shared block. Why the resistance to
>> creating a proper API for managing it?
> 
> Again, no resistance, I said many times it would be done as a follow-up.
> But as an api already exists, it has to continue to work. Or do you
> suggest it should stop working? That, I don't agree with.
> 

That is exactly what I am saying - principle of least surprise. The new
object brings its own API and can only be modified using the new API.
The scheme above can and will surprise users. You are thinking like a tc
developer, someone intimately familiar with the code, and not like an
ordinary user of this new feature.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2017-12-24 16:25     ` David Ahern
@ 2017-12-25 10:23       ` Jiri Pirko
  2018-01-02 19:49         ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2017-12-25 10:23 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Sun, Dec 24, 2017 at 05:25:41PM CET, dsahern@gmail.com wrote:
>On 12/24/17 1:19 AM, Jiri Pirko wrote:
>> Sun, Dec 24, 2017 at 02:54:47AM CET, dsahern@gmail.com wrote:
>>> On 12/23/17 9:54 AM, Jiri Pirko wrote:
>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>> block number 22. "22" is just an identification. If we don't pass any
>>>> block number, a new one will be generated by kernel:
>>>>
>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>                                 ^^^^^^^^
>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>                                 ^^^^^^^^
>>>>
>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>
>>>> $ tc qdisc
>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>
>>>> To make is more visual, the situation looks like this:
>>>>
>>>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>>>           |                                  |
>>>>           |                                  |
>>>>           +---------->  block 22  <----------+
>>>>
>>>> Unlimited number of qdiscs may share the same block.
>>>>
>>>> Now we can add filter to any of qdiscs sharing the same block:
>>>>
>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>
>>>
>>> Allowing config of a shared block through any qdisc that references it
>>> is akin to me allowing nexthop objects to be manipulated by any route
>>> that references it -- sure, it could be done but causes a lot surprises
>>> to the user.
>>>
>>> You are adding a new tc object -- a shared block. Why the resistance to
>>> creating a proper API for managing it?
>> 
>> Again, no resistance, I said many times it would be done as a follow-up.
>> But as an api already exists, it has to continue to work. Or do you
>> suggest it should stop working? That, I don't agree with.
>> 
>
>That is exactly what I am saying - principle of least surprise. The new
>object brings its own API and can only be modified using the new API.
>The scheme above can and will surprise users. You are thinking like a tc
>developer, someone intimately familiar with the code, and not like an
>ordinary user of this new feature.

Breaking exising tools is newer good. Note that not only about filter
add/del iface but also dump and notifications. I agree to extend the api
for the "block handle", sure, but the existing api should continue to
work.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2017-12-25 10:23       ` Jiri Pirko
@ 2018-01-02 19:49         ` Jiri Pirko
  2018-01-03  2:07           ` David Ahern
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-02 19:49 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Mon, Dec 25, 2017 at 11:23:46AM CET, jiri@resnulli.us wrote:
>Sun, Dec 24, 2017 at 05:25:41PM CET, dsahern@gmail.com wrote:
>>On 12/24/17 1:19 AM, Jiri Pirko wrote:
>>> Sun, Dec 24, 2017 at 02:54:47AM CET, dsahern@gmail.com wrote:
>>>> On 12/23/17 9:54 AM, Jiri Pirko wrote:
>>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>>> block number 22. "22" is just an identification. If we don't pass any
>>>>> block number, a new one will be generated by kernel:
>>>>>
>>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>>
>>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>>
>>>>> $ tc qdisc
>>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>>
>>>>> To make is more visual, the situation looks like this:
>>>>>
>>>>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>>>>           |                                  |
>>>>>           |                                  |
>>>>>           +---------->  block 22  <----------+
>>>>>
>>>>> Unlimited number of qdiscs may share the same block.
>>>>>
>>>>> Now we can add filter to any of qdiscs sharing the same block:
>>>>>
>>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>
>>>>
>>>> Allowing config of a shared block through any qdisc that references it
>>>> is akin to me allowing nexthop objects to be manipulated by any route
>>>> that references it -- sure, it could be done but causes a lot surprises
>>>> to the user.
>>>>
>>>> You are adding a new tc object -- a shared block. Why the resistance to
>>>> creating a proper API for managing it?
>>> 
>>> Again, no resistance, I said many times it would be done as a follow-up.
>>> But as an api already exists, it has to continue to work. Or do you
>>> suggest it should stop working? That, I don't agree with.
>>> 
>>
>>That is exactly what I am saying - principle of least surprise. The new
>>object brings its own API and can only be modified using the new API.
>>The scheme above can and will surprise users. You are thinking like a tc
>>developer, someone intimately familiar with the code, and not like an
>>ordinary user of this new feature.
>
>Breaking exising tools is newer good. Note that not only about filter
>add/del iface but also dump and notifications. I agree to extend the api
>for the "block handle", sure, but the existing api should continue to
>work.

DaveA, please consider following example:

$ tc qdisc add dev ens7 ingress
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1

Now I have one device with one qdisc attached.

I will add some filters, for example:
$ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

No sharing is happening. The user is doing what he is used to do.

Now user decides to share this filters with another device. As you can
see above, the block created for ens7 qdisc instance has id "1".
User can simply do:

tc qdisc add dev ens8 ingress block 1

And the block gets shared among ens7 ingress qdisc instance and ens8
ingress qdisc instance.

What is wrong with this? The approach you suggest would disallow this
forcing user to explicitly create some block entity and then to attach
it to qdisc instances. I don't really see good reason for it. Could you
please clear this up for me?

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-02 19:49         ` Jiri Pirko
@ 2018-01-03  2:07           ` David Ahern
  2018-01-03  9:40             ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: David Ahern @ 2018-01-03  2:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 1/2/18 12:49 PM, Jiri Pirko wrote:
> DaveA, please consider following example:
> 
> $ tc qdisc add dev ens7 ingress
> $ tc qdisc
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
> 
> Now I have one device with one qdisc attached.
> 
> I will add some filters, for example:
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> 
> No sharing is happening. The user is doing what he is used to do.
> 
> Now user decides to share this filters with another device. As you can
> see above, the block created for ens7 qdisc instance has id "1".
> User can simply do:
> 
> tc qdisc add dev ens8 ingress block 1
> 
> And the block gets shared among ens7 ingress qdisc instance and ens8
> ingress qdisc instance.
> 
> What is wrong with this? The approach you suggest would disallow this

Conceptually, absolutely nothing. We all agree that a shared block
feature is needed. So no argument on sharing the filters across devices.

The disagreement is in how they should be managed. I think my last
response concisely captures my concerns -- the principle of least surprise.

So with the initial commands above, all is fine. Then someone is
debugging a problem or wants to add another filter to ens8, so they run:

$ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
192.168.1.0/16 action drop

Then traffic flows through ens7 break and some other user is struggling
to understand what just happened. That the new filter magically appears
on ens7 when the user operated on ens8 is a surprise. Nothing about that
last command acknowledges that it is changing a shared resource.

Consider the commands being run by different people, and a time span
between. Allowing the shared block to be configured by any device using
the block is just setting up users for errors and confusion.

> forcing user to explicitly create some block entity and then to attach
> it to qdisc instances. I don't really see good reason for it. Could you
> please clear this up for me?

It forces the user to acknowledge it is changing a resource that may be
shared by more than one device.

$ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
192.168.1.0/16 action drop
Error: This qdisc is a shared block. Use the block API to configure.

$ tc qdisc show dev ens8
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1

$ tc filter add block 1 protocol ip pref 25 flower dst_ip 192.168.1.0/16
action drop

Now there are no surprises. I have to know that ens8 is using block 1,
and I have to specify that block when adding a filter.


BTW, is there an option to list all devices using the same shared block
- short of listing all and grepping?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-03  2:07           ` David Ahern
@ 2018-01-03  9:40             ` Jiri Pirko
  2018-01-03 15:57               ` David Ahern
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-03  9:40 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Wed, Jan 03, 2018 at 03:07:36AM CET, dsahern@gmail.com wrote:
>On 1/2/18 12:49 PM, Jiri Pirko wrote:
>> DaveA, please consider following example:
>> 
>> $ tc qdisc add dev ens7 ingress
>> $ tc qdisc
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>> 
>> Now I have one device with one qdisc attached.
>> 
>> I will add some filters, for example:
>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>> 
>> No sharing is happening. The user is doing what he is used to do.
>> 
>> Now user decides to share this filters with another device. As you can
>> see above, the block created for ens7 qdisc instance has id "1".
>> User can simply do:
>> 
>> tc qdisc add dev ens8 ingress block 1
>> 
>> And the block gets shared among ens7 ingress qdisc instance and ens8
>> ingress qdisc instance.
>> 
>> What is wrong with this? The approach you suggest would disallow this
>
>Conceptually, absolutely nothing. We all agree that a shared block
>feature is needed. So no argument on sharing the filters across devices.
>
>The disagreement is in how they should be managed. I think my last
>response concisely captures my concerns -- the principle of least surprise.
>
>So with the initial commands above, all is fine. Then someone is
>debugging a problem or wants to add another filter to ens8, so they run:
>
>$ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>192.168.1.0/16 action drop
>
>Then traffic flows through ens7 break and some other user is struggling
>to understand what just happened. That the new filter magically appears
>on ens7 when the user operated on ens8 is a surprise. Nothing about that
>last command acknowledges that it is changing a shared resource.

Given the fact that user configured sharing between ens7 and ens8 and he
can easily see that by "$ tc qdisc show" I don't see anything wrong
about it, no surprise. Either the user knows what is he doing or not.


>
>Consider the commands being run by different people, and a time span
>between. Allowing the shared block to be configured by any device using
>the block is just setting up users for errors and confusion.

No confusion. Everything is visible, all info is in the manpage. The
same story as always.


>
>> forcing user to explicitly create some block entity and then to attach
>> it to qdisc instances. I don't really see good reason for it. Could you
>> please clear this up for me?
>
>It forces the user to acknowledge it is changing a resource that may be
>shared by more than one device.
>
>$ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>192.168.1.0/16 action drop
>Error: This qdisc is a shared block. Use the block API to configure.
>
>$ tc qdisc show dev ens8
>qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>
>$ tc filter add block 1 protocol ip pref 25 flower dst_ip 192.168.1.0/16
>action drop
>
>Now there are no surprises. I have to know that ens8 is using block 1,
>and I have to specify that block when adding a filter.

On contrary. This is surprising! Consider my original example extended
by your approach and limitations:

$ tc qdisc add dev ens7 ingress
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
$ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

So far, everything is good. Now I add qdisc with block 1 to ens8:
$ tc qdisc add dev ens8 ingress block 1

And I do:
$ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
Should it Error out or pass by your limitations?

Assume it should pass.
I do:
$ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip 192.168.1.0/16 action drop
Error: This qdisc is a shared block. Use the block API to configure.

This will error out as you wrote. Now I do show:

$ tc qdisc show dev ens8                                                
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1

As you wrote, there is "ens7" in output of ens8 qdisc. That is
surprising.

What would following commands show with your limitations:
$ tc filter show dev ens7 ingress
$ tc filter show dev ens8 ingress

All filters should be listed under ens7 and ens8 should be blank? I
cannot add filters to ens8 with your limitations so I guess the show for
it should be blank. But there are actually rules there! That is another
surprise and breakage!

Now I continue and remove the qdisc from ens7:
$ tc qdisc add dev ens7 ingress

The block 1 is still there for ens8. So what happens now? What is the
output of "filter show dev ens8 ingress" and "qdisc show dev ens8"?
Will "add dev ens8 ingress" magically start to work now? This is another
set of surprises and breakages.

So as I see it with your limitations, there is a lot of surprises
introduced.

Note that I gave a lot of thoughts to all this. The approach I suggest
is the cleanest and does not break anything. Also, it is easily
extendable by adding the block handle to add/del/list the filters.
But the current commands should not be broken. Please.

If you want, I can implement the block handle extension as a part of this
patchset. I wanted to do it as a follow-up to limit the number of
patches in the set so DaveM would not have reason to hate me :)


>
>
>BTW, is there an option to list all devices using the same shared block
>- short of listing all and grepping?

$ tc qdisc show

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-03  9:40             ` Jiri Pirko
@ 2018-01-03 15:57               ` David Ahern
  2018-01-03 17:22                 ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: David Ahern @ 2018-01-03 15:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 1/3/18 2:40 AM, Jiri Pirko wrote:
> Wed, Jan 03, 2018 at 03:07:36AM CET, dsahern@gmail.com wrote:
>> On 1/2/18 12:49 PM, Jiri Pirko wrote:
>>> DaveA, please consider following example:
>>>
>>> $ tc qdisc add dev ens7 ingress
>>> $ tc qdisc
>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>>
>>> Now I have one device with one qdisc attached.
>>>
>>> I will add some filters, for example:
>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>
>>> No sharing is happening. The user is doing what he is used to do.
>>>
>>> Now user decides to share this filters with another device. As you can
>>> see above, the block created for ens7 qdisc instance has id "1".
>>> User can simply do:
>>>
>>> tc qdisc add dev ens8 ingress block 1
>>>
>>> And the block gets shared among ens7 ingress qdisc instance and ens8
>>> ingress qdisc instance.
>>>
>>> What is wrong with this? The approach you suggest would disallow this
>>
>> Conceptually, absolutely nothing. We all agree that a shared block
>> feature is needed. So no argument on sharing the filters across devices.
>>
>> The disagreement is in how they should be managed. I think my last
>> response concisely captures my concerns -- the principle of least surprise.
>>
>> So with the initial commands above, all is fine. Then someone is
>> debugging a problem or wants to add another filter to ens8, so they run:
>>
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>>
>> Then traffic flows through ens7 break and some other user is struggling
>> to understand what just happened. That the new filter magically appears
>> on ens7 when the user operated on ens8 is a surprise. Nothing about that
>> last command acknowledges that it is changing a shared resource.
> 
> Given the fact that user configured sharing between ens7 and ens8 and he
> can easily see that by "$ tc qdisc show" I don't see anything wrong
> about it, no surprise. Either the user knows what is he doing or not.

tc is one of the most difficult commands for users to understand and get
right. The API behind the command even more so. There seems to be a
general agreement on this.

To someone like you who is well versed in tc semantics this may seem
obvious, but I contend that even you would slip up here at some point.
There is too much distance between the filter management and the qdisc
listing a part of which shows a block id - not that it is shared or
anything else, just of the many words in the output there is 'block N'.

>>
>> Consider the commands being run by different people, and a time span
>> between. Allowing the shared block to be configured by any device using
>> the block is just setting up users for errors and confusion.
> 
> No confusion. Everything is visible, all info is in the manpage. The
> same story as always.
> 
> 
>>
>>> forcing user to explicitly create some block entity and then to attach
>>> it to qdisc instances. I don't really see good reason for it. Could you
>>> please clear this up for me?
>>
>> It forces the user to acknowledge it is changing a resource that may be
>> shared by more than one device.
>>
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>> Error: This qdisc is a shared block. Use the block API to configure.
>>
>> $ tc qdisc show dev ens8
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>
>> $ tc filter add block 1 protocol ip pref 25 flower dst_ip 192.168.1.0/16
>> action drop
>>
>> Now there are no surprises. I have to know that ens8 is using block 1,
>> and I have to specify that block when adding a filter.
> 
> On contrary. This is surprising! Consider my original example extended
> by your approach and limitations:

Nope, I was not extending your approach; I was using your examples to
show why I disagree with the approach. As I mentioned in past responses,
I believe the block lifecycle should be independent of any device.

$ tc qdisc add block 1
$ tc filter add block 1 ....

$ tc qdisc add dev ens7 ingress block 1
$ tc qdisc add dev ens8 ingress block 1

$ tc filter show block 1
(filters listed)

$ tc filter show dev ens7 ingress
Info: This qdisc is shared. Use the block api to list filters.

(This is very similar to how I am handling nexthop objects for routes. I
can show some examples if desired, but don't want this tangent to go too
far.)

> 
> $ tc qdisc add dev ens7 ingress
> $ tc qdisc
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> 
> So far, everything is good. Now I add qdisc with block 1 to ens8:
> $ tc qdisc add dev ens8 ingress block 1
> 
> And I do:
> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> Should it Error out or pass by your limitations?
> 
> Assume it should pass.
> I do:
> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip 192.168.1.0/16 action drop
> Error: This qdisc is a shared block. Use the block API to configure.
> 
> This will error out as you wrote. Now I do show:
> 
> $ tc qdisc show dev ens8                                                
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
> 
> As you wrote, there is "ens7" in output of ens8 qdisc. That is
> surprising.

that's a typo on my part with copy-paste-modify of commands while
writing an email; that was not intentional to show ens7 on an ens8 device.


> 
> What would following commands show with your limitations:
> $ tc filter show dev ens7 ingress
> $ tc filter show dev ens8 ingress

see above

> 
> All filters should be listed under ens7 and ens8 should be blank? I
> cannot add filters to ens8 with your limitations so I guess the show for
> it should be blank. But there are actually rules there! That is another
> surprise and breakage!
> 
> Now I continue and remove the qdisc from ens7:
> $ tc qdisc add dev ens7 ingress
> 
> The block 1 is still there for ens8. So what happens now? What is the
> output of "filter show dev ens8 ingress" and "qdisc show dev ens8"?
> Will "add dev ens8 ingress" magically start to work now? This is another
> set of surprises and breakages.
> 
> So as I see it with your limitations, there is a lot of surprises
> introduced.
> 
> Note that I gave a lot of thoughts to all this. The approach I suggest
> is the cleanest and does not break anything. Also, it is easily
> extendable by adding the block handle to add/del/list the filters.
> But the current commands should not be broken. Please.
> 
> If you want, I can implement the block handle extension as a part of this
> patchset. I wanted to do it as a follow-up to limit the number of
> patches in the set so DaveM would not have reason to hate me :)
> 
> 
>>
>>
>> BTW, is there an option to list all devices using the same shared block
>> - short of listing all and grepping?
> 
> $ tc qdisc show
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-03 15:57               ` David Ahern
@ 2018-01-03 17:22                 ` Jiri Pirko
  2018-01-03 23:51                   ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-03 17:22 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, jakub.kicinski, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Wed, Jan 03, 2018 at 04:57:23PM CET, dsahern@gmail.com wrote:
>On 1/3/18 2:40 AM, Jiri Pirko wrote:
>> Wed, Jan 03, 2018 at 03:07:36AM CET, dsahern@gmail.com wrote:
>>> On 1/2/18 12:49 PM, Jiri Pirko wrote:
>>>> DaveA, please consider following example:
>>>>
>>>> $ tc qdisc add dev ens7 ingress
>>>> $ tc qdisc
>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>>>
>>>> Now I have one device with one qdisc attached.
>>>>
>>>> I will add some filters, for example:
>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>
>>>> No sharing is happening. The user is doing what he is used to do.
>>>>
>>>> Now user decides to share this filters with another device. As you can
>>>> see above, the block created for ens7 qdisc instance has id "1".
>>>> User can simply do:
>>>>
>>>> tc qdisc add dev ens8 ingress block 1
>>>>
>>>> And the block gets shared among ens7 ingress qdisc instance and ens8
>>>> ingress qdisc instance.
>>>>
>>>> What is wrong with this? The approach you suggest would disallow this
>>>
>>> Conceptually, absolutely nothing. We all agree that a shared block
>>> feature is needed. So no argument on sharing the filters across devices.
>>>
>>> The disagreement is in how they should be managed. I think my last
>>> response concisely captures my concerns -- the principle of least surprise.
>>>
>>> So with the initial commands above, all is fine. Then someone is
>>> debugging a problem or wants to add another filter to ens8, so they run:
>>>
>>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>>> 192.168.1.0/16 action drop
>>>
>>> Then traffic flows through ens7 break and some other user is struggling
>>> to understand what just happened. That the new filter magically appears
>>> on ens7 when the user operated on ens8 is a surprise. Nothing about that
>>> last command acknowledges that it is changing a shared resource.
>> 
>> Given the fact that user configured sharing between ens7 and ens8 and he
>> can easily see that by "$ tc qdisc show" I don't see anything wrong
>> about it, no surprise. Either the user knows what is he doing or not.
>
>tc is one of the most difficult commands for users to understand and get
>right. The API behind the command even more so. There seems to be a
>general agreement on this.
>
>To someone like you who is well versed in tc semantics this may seem
>obvious, but I contend that even you would slip up here at some point.
>There is too much distance between the filter management and the qdisc
>listing a part of which shows a block id - not that it is shared or
>anything else, just of the many words in the output there is 'block N'.
>
>>>
>>> Consider the commands being run by different people, and a time span
>>> between. Allowing the shared block to be configured by any device using
>>> the block is just setting up users for errors and confusion.
>> 
>> No confusion. Everything is visible, all info is in the manpage. The
>> same story as always.
>> 
>> 
>>>
>>>> forcing user to explicitly create some block entity and then to attach
>>>> it to qdisc instances. I don't really see good reason for it. Could you
>>>> please clear this up for me?
>>>
>>> It forces the user to acknowledge it is changing a resource that may be
>>> shared by more than one device.
>>>
>>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>>> 192.168.1.0/16 action drop
>>> Error: This qdisc is a shared block. Use the block API to configure.
>>>
>>> $ tc qdisc show dev ens8
>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>>>
>>> $ tc filter add block 1 protocol ip pref 25 flower dst_ip 192.168.1.0/16
>>> action drop
>>>
>>> Now there are no surprises. I have to know that ens8 is using block 1,
>>> and I have to specify that block when adding a filter.
>> 
>> On contrary. This is surprising! Consider my original example extended
>> by your approach and limitations:
>
>Nope, I was not extending your approach; I was using your examples to
>show why I disagree with the approach. As I mentioned in past responses,
>I believe the block lifecycle should be independent of any device.
>
>$ tc qdisc add block 1

qdisc add block seems odd as in this point, block 1 has nothing to do
with any qdisc


>$ tc filter add block 1 ....
>
>$ tc qdisc add dev ens7 ingress block 1
>$ tc qdisc add dev ens8 ingress block 1

So when I do just:
tc qdisc add dev ens7 ingress
there won't be a block id created. I would like to be able to have it
done on one device and share on the second when I please to. With your
limitation, I have remove all, add block, add all back again.


>
>$ tc filter show block 1
>(filters listed)
>
>$ tc filter show dev ens7 ingress

So the command will return 0 as everything is ok. List is empty, yet
there are still filters being processed on this qdisc. That is surprise
from where I stand. I as a user would expect a list of all filters being
used here. This what you propose is clearly a breakage.


I think that both of us have different expectations about how this
should work. I said already that "tc filter add block 1 ...." and
"tc filter show block 1" from your approach is fine with me. I even
think that the explicit creation and destruction of block ("tc qdisc add
block 1" and "tc qdisc del block 1") is fine.

However I don't agree about breaking the existing filter add and show
and also imposibility to make not-shared block shared in the runtime
before defining it first.

Now, I suggest I will implement the block api extensions you suggest
without the limitations and send it as a part of this patchset. Would
you be ok with that?

Thanks!


>Info: This qdisc is shared. Use the block api to list filters.
>
>(This is very similar to how I am handling nexthop objects for routes. I
>can show some examples if desired, but don't want this tangent to go too
>far.)
>
>> 
>> $ tc qdisc add dev ens7 ingress
>> $ tc qdisc
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>> 
>> So far, everything is good. Now I add qdisc with block 1 to ens8:
>> $ tc qdisc add dev ens8 ingress block 1
>> 
>> And I do:
>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>> Should it Error out or pass by your limitations?
>> 
>> Assume it should pass.
>> I do:
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip 192.168.1.0/16 action drop
>> Error: This qdisc is a shared block. Use the block API to configure.
>> 
>> This will error out as you wrote. Now I do show:
>> 
>> $ tc qdisc show dev ens8                                                
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1
>> 
>> As you wrote, there is "ens7" in output of ens8 qdisc. That is
>> surprising.
>
>that's a typo on my part with copy-paste-modify of commands while
>writing an email; that was not intentional to show ens7 on an ens8 device.

ok.


>
>
>> 
>> What would following commands show with your limitations:
>> $ tc filter show dev ens7 ingress
>> $ tc filter show dev ens8 ingress
>
>see above
>
>> 
>> All filters should be listed under ens7 and ens8 should be blank? I
>> cannot add filters to ens8 with your limitations so I guess the show for
>> it should be blank. But there are actually rules there! That is another
>> surprise and breakage!
>> 
>> Now I continue and remove the qdisc from ens7:
>> $ tc qdisc add dev ens7 ingress
>> 
>> The block 1 is still there for ens8. So what happens now? What is the
>> output of "filter show dev ens8 ingress" and "qdisc show dev ens8"?
>> Will "add dev ens8 ingress" magically start to work now? This is another
>> set of surprises and breakages.
>> 
>> So as I see it with your limitations, there is a lot of surprises
>> introduced.
>> 
>> Note that I gave a lot of thoughts to all this. The approach I suggest
>> is the cleanest and does not break anything. Also, it is easily
>> extendable by adding the block handle to add/del/list the filters.
>> But the current commands should not be broken. Please.
>> 
>> If you want, I can implement the block handle extension as a part of this
>> patchset. I wanted to do it as a follow-up to limit the number of
>> patches in the set so DaveM would not have reason to hate me :)
>> 
>> 
>>>
>>>
>>> BTW, is there an option to list all devices using the same shared block
>>> - short of listing all and grepping?
>> 
>> $ tc qdisc show
>> 
>> 
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-03 17:22                 ` Jiri Pirko
@ 2018-01-03 23:51                   ` Jakub Kicinski
  2018-01-04  6:57                     ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2018-01-03 23:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:
> However I don't agree about breaking the existing filter add and show
> and also imposibility to make not-shared block shared in the runtime
> before defining it first.

FWIW I would agree with David that allowing add on a shared block
modify filters on another interface can break existing users.  (No
opinion on dump and lifetime).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-03 23:51                   ` Jakub Kicinski
@ 2018-01-04  6:57                     ` Jiri Pirko
  2018-01-04  7:06                       ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04  6:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

Thu, Jan 04, 2018 at 12:51:52AM CET, kubakici@wp.pl wrote:
>On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:
>> However I don't agree about breaking the existing filter add and show
>> and also imposibility to make not-shared block shared in the runtime
>> before defining it first.
>
>FWIW I would agree with David that allowing add on a shared block
>modify filters on another interface can break existing users.  (No
>opinion on dump and lifetime).

I don't think that David is saying that, but why do you think it would
break existing users?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04  6:57                     ` Jiri Pirko
@ 2018-01-04  7:06                       ` Jakub Kicinski
  2018-01-04 10:12                         ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2018-01-04  7:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

On Thu, 4 Jan 2018 07:57:02 +0100, Jiri Pirko wrote:
> Thu, Jan 04, 2018 at 12:51:52AM CET, kubakici@wp.pl wrote:
> >On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:  
> >> However I don't agree about breaking the existing filter add and show
> >> and also imposibility to make not-shared block shared in the runtime
> >> before defining it first.  
> >
> >FWIW I would agree with David that allowing add on a shared block
> >modify filters on another interface can break existing users.  (No
> >opinion on dump and lifetime).  
> 
> I don't think that David is saying that, but why do you think it would
> break existing users?

Perhaps I worded is too strongly as "breaking existing users", but it
certainly introduces surprising side effects.  David put it into words
very well:

On Tue, 2 Jan 2018 19:07:36 -0700, David Ahern wrote:
> The disagreement is in how they should be managed. I think my last
> response concisely captures my concerns -- the principle of least surprise.
> 
> So with the initial commands above, all is fine. Then someone is
> debugging a problem or wants to add another filter to ens8, so they run:
> 
> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
> 192.168.1.0/16 action drop
> 
> Then traffic flows through ens7 break and some other user is struggling
> to understand what just happened. That the new filter magically appears
> on ens7 when the user operated on ens8 is a surprise. Nothing about that
> last command acknowledges that it is changing a shared resource.
> 
> Consider the commands being run by different people, and a time span
> between. Allowing the shared block to be configured by any device using
> the block is just setting up users for errors and confusion.
> 
> > forcing user to explicitly create some block entity and then to attach
> > it to qdisc instances. I don't really see good reason for it. Could you
> > please clear this up for me?  
> 
> It forces the user to acknowledge it is changing a resource that may be
> shared by more than one device.
> 
> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
> 192.168.1.0/16 action drop
> Error: This qdisc is a shared block. Use the block API to configure.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04  7:06                       ` Jakub Kicinski
@ 2018-01-04 10:12                         ` Jiri Pirko
  2018-01-04 12:41                           ` Jamal Hadi Salim
                                             ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 10:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

Thu, Jan 04, 2018 at 08:06:58AM CET, kubakici@wp.pl wrote:
>On Thu, 4 Jan 2018 07:57:02 +0100, Jiri Pirko wrote:
>> Thu, Jan 04, 2018 at 12:51:52AM CET, kubakici@wp.pl wrote:
>> >On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:  
>> >> However I don't agree about breaking the existing filter add and show
>> >> and also imposibility to make not-shared block shared in the runtime
>> >> before defining it first.  
>> >
>> >FWIW I would agree with David that allowing add on a shared block
>> >modify filters on another interface can break existing users.  (No
>> >opinion on dump and lifetime).  
>> 
>> I don't think that David is saying that, but why do you think it would
>> break existing users?
>
>Perhaps I worded is too strongly as "breaking existing users", but it
>certainly introduces surprising side effects.  David put it into words
>very well:
>
>On Tue, 2 Jan 2018 19:07:36 -0700, David Ahern wrote:
>> The disagreement is in how they should be managed. I think my last
>> response concisely captures my concerns -- the principle of least surprise.
>> 
>> So with the initial commands above, all is fine. Then someone is
>> debugging a problem or wants to add another filter to ens8, so they run:
>> 
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>> 
>> Then traffic flows through ens7 break and some other user is struggling
>> to understand what just happened. That the new filter magically appears
>> on ens7 when the user operated on ens8 is a surprise. Nothing about that
>> last command acknowledges that it is changing a shared resource.

It is not that the new filter magically appears on ens7. No magic. ens8
and ens7 share the same block. User set the sharing up originally,
he can see it in "tc qdisc show". So if he does this, he cannot be
surprised. Also, he can list the filters if he does "tc filter show"
to see what is there per-qdisc.

If the user expects something else, he should go back and read the docs
and learn to understand how tc works.


>> 
>> Consider the commands being run by different people, and a time span
>> between. Allowing the shared block to be configured by any device using
>> the block is just setting up users for errors and confusion.
>> 
>> > forcing user to explicitly create some block entity and then to attach
>> > it to qdisc instances. I don't really see good reason for it. Could you
>> > please clear this up for me?  
>> 
>> It forces the user to acknowledge it is changing a resource that may be
>> shared by more than one device.
>> 
>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> 192.168.1.0/16 action drop
>> Error: This qdisc is a shared block. Use the block API to configure.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 10:12                         ` Jiri Pirko
@ 2018-01-04 12:41                           ` Jamal Hadi Salim
  2018-01-04 13:00                             ` Jiri Pirko
  2018-01-04 15:45                             ` David Miller
  2018-01-04 12:55                           ` Jamal Hadi Salim
  2018-01-04 15:33                           ` David Miller
  2 siblings, 2 replies; 40+ messages in thread
From: Jamal Hadi Salim @ 2018-01-04 12:41 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: David Ahern, netdev, davem, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel



My comments below - I do have a more _basic_ issue on the egress
side that IMO noone has brought up that needs to be addressed.
All examples so far have been focusing on ingress.

Since I am late in this discussion, let me state my understanding
of the thread and assumptions so there is no cross-talk.

we are going to need to be able to say:
tc qdisc add block 1
tc filter add block 1  protocol ip priority 10 flower ...

 From my scanning of the thread, I dont think I see disagreement.
Jiri, did i misread what you are saying?

New approach syntax should still be legit for backward
compat, so this should work:

"tc filter add dev ens7 parent ffff: protocol ip priority 10 flower .."

Blocks or not we still need to have old scripts work.
Which also means one should also be able to say:
"tc filter ls dev ens7 ..." with no suprises.

 From my reading of the thread - there is no disagreement that
"tc qdisc add dev ens7 ingress block 1"
should be able to "bind" to block1 if it exists
or create it if it doesnt and then "bind" to it ("autobind").

And that a simple "tc qdisc add dev ens7 ingress" should
either not be able to create a block (or we can say creates
block id 0 - owned by the netdev - for consistency).

To David A: If someone created the qdisc using new approach
and then later tried to add/list rules using the old syntax, surely
that should work, no? I have seen setups which separate
creation of qdiscs using one script and addition of filters via
another script (very common actually).

Having said that:
I see the value of less is more and being able to use one(block)
interface to add/del/list filters if we add the blocks feature;
but backward compat(meaning crazy scripting) needs to be respected.
I dont see how we get away from that.

Maybe what we need is another knob to control this new functionality?
Either a kernel config option or an extra parameter when creating
the qdisc or a system wide boolean per netdev (which of course
contradicts the "less is more" principle). If this knob was set
then you reject addition of filters via a port/qdisc which is sharing.

I agree with Jiri - if you consciously choose to share there should
be no suprises with what filters get applied.

I think this email is too long and my egress concern will be lost
if i typed it here; so i will send another email.

cheers,
jamal

On 18-01-04 05:12 AM, Jiri Pirko wrote:
> Thu, Jan 04, 2018 at 08:06:58AM CET, kubakici@wp.pl wrote:
>> On Thu, 4 Jan 2018 07:57:02 +0100, Jiri Pirko wrote:
>>> Thu, Jan 04, 2018 at 12:51:52AM CET, kubakici@wp.pl wrote:
>>>> On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:
>>>>> However I don't agree about breaking the existing filter add and show
>>>>> and also imposibility to make not-shared block shared in the runtime
>>>>> before defining it first.
>>>>
>>>> FWIW I would agree with David that allowing add on a shared block
>>>> modify filters on another interface can break existing users.  (No
>>>> opinion on dump and lifetime).
>>>
>>> I don't think that David is saying that, but why do you think it would
>>> break existing users?
>>
>> Perhaps I worded is too strongly as "breaking existing users", but it
>> certainly introduces surprising side effects.  David put it into words
>> very well:
>>
>> On Tue, 2 Jan 2018 19:07:36 -0700, David Ahern wrote:
>>> The disagreement is in how they should be managed. I think my last
>>> response concisely captures my concerns -- the principle of least surprise.
>>>
>>> So with the initial commands above, all is fine. Then someone is
>>> debugging a problem or wants to add another filter to ens8, so they run:
>>>
>>> $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>>> 192.168.1.0/16 action drop
>>>
>>> Then traffic flows through ens7 break and some other user is struggling
>>> to understand what just happened. That the new filter magically appears
>>> on ens7 when the user operated on ens8 is a surprise. Nothing about that
>>> last command acknowledges that it is changing a shared resource.
> 
> It is not that the new filter magically appears on ens7. No magic. ens8
> and ens7 share the same block. User set the sharing up originally,
> he can see it in "tc qdisc show". So if he does this, he cannot be
> surprised. Also, he can list the filters if he does "tc filter show"
> to see what is there per-qdisc.
> 
> If the user expects something else, he should go back and read the docs
> and learn to understand how tc works.
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 10:12                         ` Jiri Pirko
  2018-01-04 12:41                           ` Jamal Hadi Salim
@ 2018-01-04 12:55                           ` Jamal Hadi Salim
  2018-01-04 13:05                             ` Jiri Pirko
  2018-01-04 15:33                           ` David Miller
  2 siblings, 1 reply; 40+ messages in thread
From: Jamal Hadi Salim @ 2018-01-04 12:55 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: David Ahern, netdev, davem, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel


On the egress issue and sharing.

Let me provide a simple example to illustrate.

tc qdisc add dev enps7 root handle 1: prio block 1

Creates 3 classes

$ tc class ls dev enps7
   class prio 1:1 parent 1:
   class prio 1:2 parent 1:
   class prio 1:3 parent 1:

tc qdisc add dev enps8 root handle 10: prio block 1

Creates 3 classes

$ tc class ls dev enps8
   class prio 10:1 parent 10:
   class prio 10:2 parent 10:
   class prio 10:3 parent 10:

So now i add filters, today I can do:

tc filter add dev enps7 parent 1:0 protocol ip priority 10 flower ...
classid 1:2

I could also have added this via the new block interface i.e

$ tc filter add block 1 protocol ip priority 10 flower ...
classid 1:2

Looks good - things will work fine for packets showing
up on egress of enps7 which match the flower rule
and classid 1:2 is selected to queue the packet on.

Things will not _work fine_ for packets showing up on
egress of ensp8. There is no classid 1:2 on egress of
enps8. The prio qdisc is a bad example because it
has a default queue (i think 10:2) in this case. Other
qdiscs(off top of my head DRR) will just drop the packet.

I think this is resolvable - but it will take more to the
patches than the current set you posted Jiri.
A simple solution is to say sharing only works for ingress
(but that sounds very lame).

cheers,
jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 12:41                           ` Jamal Hadi Salim
@ 2018-01-04 13:00                             ` Jiri Pirko
  2018-01-04 13:30                               ` Jamal Hadi Salim
  2018-01-04 15:45                             ` David Miller
  1 sibling, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 13:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Thu, Jan 04, 2018 at 01:41:54PM CET, jhs@mojatatu.com wrote:
>
>
>My comments below - I do have a more _basic_ issue on the egress
>side that IMO noone has brought up that needs to be addressed.
>All examples so far have been focusing on ingress.

Which one? I guess you will send it as a separate email.


>
>Since I am late in this discussion, let me state my understanding
>of the thread and assumptions so there is no cross-talk.
>
>we are going to need to be able to say:
>tc qdisc add block 1
>tc filter add block 1  protocol ip priority 10 flower ...
>
>From my scanning of the thread, I dont think I see disagreement.
>Jiri, did i misread what you are saying?

Agreed. The plan is to do this in the follow-up, but if you guys need it
now, I can do that now.


>
>New approach syntax should still be legit for backward
>compat, so this should work:
>
>"tc filter add dev ens7 parent ffff: protocol ip priority 10 flower .."

Agreed. Works like that with the current patchset.


>
>Blocks or not we still need to have old scripts work.
>Which also means one should also be able to say:
>"tc filter ls dev ens7 ..." with no suprises.

Agreed. Works like that with the current patchset.


>
>From my reading of the thread - there is no disagreement that
>"tc qdisc add dev ens7 ingress block 1"
>should be able to "bind" to block1 if it exists
>or create it if it doesnt and then "bind" to it ("autobind").

Agreed. Works exactly as you described in the current patchset.


>
>And that a simple "tc qdisc add dev ens7 ingress" should
>either not be able to create a block (or we can say creates
>block id 0 - owned by the netdev - for consistency).

This allocates a new block - as it needs it internally anyway - and tc
core will assign unused block id. You can see this id in the list then.
Later you can use this block id for other created qdiscs.

Block ids are per-ns.


>
>To David A: If someone created the qdisc using new approach
>and then later tried to add/list rules using the old syntax, surely
>that should work, no? I have seen setups which separate
>creation of qdiscs using one script and addition of filters via
>another script (very common actually).

Agreed, that is exactly what I want from day 1.


>
>Having said that:
>I see the value of less is more and being able to use one(block)
>interface to add/del/list filters if we add the blocks feature;
>but backward compat(meaning crazy scripting) needs to be respected.
>I dont see how we get away from that.
>
>Maybe what we need is another knob to control this new functionality?
>Either a kernel config option or an extra parameter when creating
>the qdisc or a system wide boolean per netdev (which of course
>contradicts the "less is more" principle). If this knob was set
>then you reject addition of filters via a port/qdisc which is sharing.

I don't like this knob idea. It just adds confusion for no good reason
what so ever.


>
>I agree with Jiri - if you consciously choose to share there should
>be no suprises with what filters get applied.
>
>I think this email is too long and my egress concern will be lost
>if i typed it here; so i will send another email.
>
>cheers,
>jamal
>
>On 18-01-04 05:12 AM, Jiri Pirko wrote:
>> Thu, Jan 04, 2018 at 08:06:58AM CET, kubakici@wp.pl wrote:
>> > On Thu, 4 Jan 2018 07:57:02 +0100, Jiri Pirko wrote:
>> > > Thu, Jan 04, 2018 at 12:51:52AM CET, kubakici@wp.pl wrote:
>> > > > On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:
>> > > > > However I don't agree about breaking the existing filter add and show
>> > > > > and also imposibility to make not-shared block shared in the runtime
>> > > > > before defining it first.
>> > > > 
>> > > > FWIW I would agree with David that allowing add on a shared block
>> > > > modify filters on another interface can break existing users.  (No
>> > > > opinion on dump and lifetime).
>> > > 
>> > > I don't think that David is saying that, but why do you think it would
>> > > break existing users?
>> > 
>> > Perhaps I worded is too strongly as "breaking existing users", but it
>> > certainly introduces surprising side effects.  David put it into words
>> > very well:
>> > 
>> > On Tue, 2 Jan 2018 19:07:36 -0700, David Ahern wrote:
>> > > The disagreement is in how they should be managed. I think my last
>> > > response concisely captures my concerns -- the principle of least surprise.
>> > > 
>> > > So with the initial commands above, all is fine. Then someone is
>> > > debugging a problem or wants to add another filter to ens8, so they run:
>> > > 
>> > > $ tc filter add dev ens8 ingress protocol ip pref 25 flower dst_ip
>> > > 192.168.1.0/16 action drop
>> > > 
>> > > Then traffic flows through ens7 break and some other user is struggling
>> > > to understand what just happened. That the new filter magically appears
>> > > on ens7 when the user operated on ens8 is a surprise. Nothing about that
>> > > last command acknowledges that it is changing a shared resource.
>> 
>> It is not that the new filter magically appears on ens7. No magic. ens8
>> and ens7 share the same block. User set the sharing up originally,
>> he can see it in "tc qdisc show". So if he does this, he cannot be
>> surprised. Also, he can list the filters if he does "tc filter show"
>> to see what is there per-qdisc.
>> 
>> If the user expects something else, he should go back and read the docs
>> and learn to understand how tc works.
>> 
>
>
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 12:55                           ` Jamal Hadi Salim
@ 2018-01-04 13:05                             ` Jiri Pirko
  2018-01-04 13:43                               ` Jamal Hadi Salim
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Thu, Jan 04, 2018 at 01:55:05PM CET, jhs@mojatatu.com wrote:
>
>On the egress issue and sharing.
>
>Let me provide a simple example to illustrate.
>
>tc qdisc add dev enps7 root handle 1: prio block 1
>
>Creates 3 classes
>
>$ tc class ls dev enps7
>  class prio 1:1 parent 1:
>  class prio 1:2 parent 1:
>  class prio 1:3 parent 1:
>
>tc qdisc add dev enps8 root handle 10: prio block 1
>
>Creates 3 classes
>
>$ tc class ls dev enps8
>  class prio 10:1 parent 10:
>  class prio 10:2 parent 10:
>  class prio 10:3 parent 10:
>
>So now i add filters, today I can do:
>
>tc filter add dev enps7 parent 1:0 protocol ip priority 10 flower ...
>classid 1:2
>
>I could also have added this via the new block interface i.e
>
>$ tc filter add block 1 protocol ip priority 10 flower ...
>classid 1:2
>
>Looks good - things will work fine for packets showing
>up on egress of enps7 which match the flower rule
>and classid 1:2 is selected to queue the packet on.
>
>Things will not _work fine_ for packets showing up on
>egress of ensp8. There is no classid 1:2 on egress of
>enps8. The prio qdisc is a bad example because it
>has a default queue (i think 10:2) in this case. Other
>qdiscs(off top of my head DRR) will just drop the packet.

I'm very well aware of this. The plan is to support this and resolve
the lack of classid withing the qdisc according to the qdisc type.
However, currently the classfull qdiscs are not supported for
block sharing - only ingress and clsact qdiscs are supported.


>
>I think this is resolvable - but it will take more to the
>patches than the current set you posted Jiri.
>A simple solution is to say sharing only works for ingress
>(but that sounds very lame).

That is the current limitation of the patchset as you can see. The
sharing works only for ingress and clsact qdisc. So works for both
ingress and egress (clsact).



>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 13:00                             ` Jiri Pirko
@ 2018-01-04 13:30                               ` Jamal Hadi Salim
  2018-01-04 14:02                                 ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jamal Hadi Salim @ 2018-01-04 13:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 18-01-04 08:00 AM, Jiri Pirko wrote:
> Thu, Jan 04, 2018 at 01:41:54PM CET, jhs@mojatatu.com wrote:


>>
>> And that a simple "tc qdisc add dev ens7 ingress" should
>> either not be able to create a block (or we can say creates
>> block id 0 - owned by the netdev - for consistency).
> 
> This allocates a new block - as it needs it internally anyway - and tc
> core will assign unused block id. You can see this id in the list then.
> Later you can use this block id for other created qdiscs.
> 
> Block ids are per-ns.
>

Hrm. I was thinking more that there is a block that cannot be shared
and is only owned by the netdev i.e current behavior. But what you
describe should work
(and let the user decide if they want to share).

>>
>> Maybe what we need is another knob to control this new functionality?
>> Either a kernel config option or an extra parameter when creating
>> the qdisc or a system wide boolean per netdev (which of course
>> contradicts the "less is more" principle). If this knob was set
>> then you reject addition of filters via a port/qdisc which is sharing.
> 
> I don't like this knob idea. It just adds confusion for no good reason
> what so ever.

I dont like it either but i see it as a knob to choose between
improved usability/manageability (which new interface provides)
vs least suprise - which is to keep the old syntax around.
If i was an admin and turned on this knob - I am prepared to deal
with fallout of some user script which tries to add filters via
the device instead of the block.

cheers,
jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 13:05                             ` Jiri Pirko
@ 2018-01-04 13:43                               ` Jamal Hadi Salim
  2018-01-04 14:06                                 ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jamal Hadi Salim @ 2018-01-04 13:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 18-01-04 08:05 AM, Jiri Pirko wrote:
> Thu, Jan 04, 2018 at 01:55:05PM CET, jhs@mojatatu.com wrote:

>>
>> $ tc filter add block 1 protocol ip priority 10 flower ...
>> classid 1:2
>>
>> Looks good - things will work fine for packets showing
>> up on egress of enps7 which match the flower rule
>> and classid 1:2 is selected to queue the packet on.
>>
>> Things will not _work fine_ for packets showing up on
>> egress of ensp8. There is no classid 1:2 on egress of
>> enps8. The prio qdisc is a bad example because it
>> has a default queue (i think 10:2) in this case. Other
>> qdiscs(off top of my head DRR) will just drop the packet.
> 
> I'm very well aware of this. The plan is to support this and resolve
> the lack of classid withing the qdisc according to the qdisc type.

If i understood you correctly,  that is still iffy.
An admin will see packets for the same filter on one device going
to an agreed-to queue but on another going to a default queue.

> However, currently the classfull qdiscs are not supported for
> block sharing - only ingress and clsact qdiscs are supported.
> 

Essentially anything that doesnt have queues associated with it..
(and ignore the tcf_result).

>>
>> I think this is resolvable - but it will take more to the
>> patches than the current set you posted Jiri.
>> A simple solution is to say sharing only works for ingress
>> (but that sounds very lame).
> 
> That is the current limitation of the patchset as you can see. The
> sharing works only for ingress and clsact qdisc. So works for both
> ingress and egress (clsact).
> 

One option is to name all egress queues the same way on all devices
f.e in the two examples i provided call root qdiscs 1:0.

I am not sure how cleanly you get this to work with egress
My current thinking involves some brain somersault...
I will think some more about it...

cheers,
jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 13:30                               ` Jamal Hadi Salim
@ 2018-01-04 14:02                                 ` Jiri Pirko
  0 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 14:02 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Thu, Jan 04, 2018 at 02:30:54PM CET, jhs@mojatatu.com wrote:
>On 18-01-04 08:00 AM, Jiri Pirko wrote:
>> Thu, Jan 04, 2018 at 01:41:54PM CET, jhs@mojatatu.com wrote:
>
>
>> > 
>> > And that a simple "tc qdisc add dev ens7 ingress" should
>> > either not be able to create a block (or we can say creates
>> > block id 0 - owned by the netdev - for consistency).
>> 
>> This allocates a new block - as it needs it internally anyway - and tc
>> core will assign unused block id. You can see this id in the list then.
>> Later you can use this block id for other created qdiscs.
>> 
>> Block ids are per-ns.
>> 
>
>Hrm. I was thinking more that there is a block that cannot be shared
>and is only owned by the netdev i.e current behavior. But what you
>describe should work
>(and let the user decide if they want to share).

Yeah. This is how it is implemented right now. Works fine. And provides
user possibility to set the share with other qdisc while the filters are
already there. It is actually quite nice.

>
>> > 
>> > Maybe what we need is another knob to control this new functionality?
>> > Either a kernel config option or an extra parameter when creating
>> > the qdisc or a system wide boolean per netdev (which of course
>> > contradicts the "less is more" principle). If this knob was set
>> > then you reject addition of filters via a port/qdisc which is sharing.
>> 
>> I don't like this knob idea. It just adds confusion for no good reason
>> what so ever.
>
>I dont like it either but i see it as a knob to choose between
>improved usability/manageability (which new interface provides)
>vs least suprise - which is to keep the old syntax around.
>If i was an admin and turned on this knob - I am prepared to deal
>with fallout of some user script which tries to add filters via
>the device instead of the block.

This smells awfully like "break_my_system" knob described by DaveM
during our route offloading discussions :)

I would just go without this knob for now in a way I suggested earlier
in my reply to DavidA. I would leave the functionality there as it is in
the current patchset, leaving all original interfaces to work and on top
of that, I will add "filter add block" and "filter list block" handles.
Sounds fine?


>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 13:43                               ` Jamal Hadi Salim
@ 2018-01-04 14:06                                 ` Jiri Pirko
  2018-01-04 15:42                                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 14:06 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

Thu, Jan 04, 2018 at 02:43:08PM CET, jhs@mojatatu.com wrote:
>On 18-01-04 08:05 AM, Jiri Pirko wrote:
>> Thu, Jan 04, 2018 at 01:55:05PM CET, jhs@mojatatu.com wrote:
>
>> > 
>> > $ tc filter add block 1 protocol ip priority 10 flower ...
>> > classid 1:2
>> > 
>> > Looks good - things will work fine for packets showing
>> > up on egress of enps7 which match the flower rule
>> > and classid 1:2 is selected to queue the packet on.
>> > 
>> > Things will not _work fine_ for packets showing up on
>> > egress of ensp8. There is no classid 1:2 on egress of
>> > enps8. The prio qdisc is a bad example because it
>> > has a default queue (i think 10:2) in this case. Other
>> > qdiscs(off top of my head DRR) will just drop the packet.
>> 
>> I'm very well aware of this. The plan is to support this and resolve
>> the lack of classid withing the qdisc according to the qdisc type.
>
>If i understood you correctly,  that is still iffy.
>An admin will see packets for the same filter on one device going
>to an agreed-to queue but on another going to a default queue.

Yeah, it is. That needs a bit more thoughts and discussion. That was the
main purpose I did not address it in this patchset and allowed only
ingress and clsact qidsc.


>
>> However, currently the classfull qdiscs are not supported for
>> block sharing - only ingress and clsact qdiscs are supported.
>> 
>
>Essentially anything that doesnt have queues associated with it..
>(and ignore the tcf_result).
>
>> > 
>> > I think this is resolvable - but it will take more to the
>> > patches than the current set you posted Jiri.
>> > A simple solution is to say sharing only works for ingress
>> > (but that sounds very lame).
>> 
>> That is the current limitation of the patchset as you can see. The
>> sharing works only for ingress and clsact qdisc. So works for both
>> ingress and egress (clsact).
>> 
>
>One option is to name all egress queues the same way on all devices
>f.e in the two examples i provided call root qdiscs 1:0.
>
>I am not sure how cleanly you get this to work with egress
>My current thinking involves some brain somersault...
>I will think some more about it...

clsact egress works the same as ingress qdisc - please check the code.


>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 10:12                         ` Jiri Pirko
  2018-01-04 12:41                           ` Jamal Hadi Salim
  2018-01-04 12:55                           ` Jamal Hadi Salim
@ 2018-01-04 15:33                           ` David Miller
  2018-01-04 15:51                             ` Jiri Pirko
  2 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2018-01-04 15:33 UTC (permalink / raw)
  To: jiri
  Cc: kubakici, dsahern, netdev, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 4 Jan 2018 11:12:57 +0100

> No magic. ens8 and ens7 share the same block.

No Jiri, the fact that they share the same block _IS MAGIC_.

It is unexpected behavior to modify a rule and have it propagate
to devices not mentioned in the command line.

This is totally going to break things and upset people.

Saying it shows up in some tc dump command is not an argument
for this behavior being "expected".  NO way.

I completely agree with David and others, you _MUST_ make an
explicit API and set of operations to make changes to rules
contained in shared blocks.

Period.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 14:06                                 ` Jiri Pirko
@ 2018-01-04 15:42                                   ` Jamal Hadi Salim
  0 siblings, 0 replies; 40+ messages in thread
From: Jamal Hadi Salim @ 2018-01-04 15:42 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Ahern, netdev, davem, xiyou.wangcong,
	mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, simon.horman,
	pieter.jansenvanvuuren, john.hurley, alexander.h.duyck, ogerlitz,
	john.fastabend, daniel

On 18-01-04 09:06 AM, Jiri Pirko wrote:
> Thu, Jan 04, 2018 at 02:43:08PM CET, jhs@mojatatu.com wrote:

>> If i understood you correctly,  that is still iffy.
>> An admin will see packets for the same filter on one device going
>> to an agreed-to queue but on another going to a default queue.
> 
> Yeah, it is. That needs a bit more thoughts and discussion. That was the
> main purpose I did not address it in this patchset and allowed only
> ingress and clsact qidsc.
> 

Like i said - this requires the admin to be aware of this issue.
It is nasty otherwise.

>> One option is to name all egress queues the same way on all devices
>> f.e in the two examples i provided call root qdiscs 1:0.
>>
>> I am not sure how cleanly you get this to work with egress
>> My current thinking involves some brain somersault...
>> I will think some more about it...
> 
> clsact egress works the same as ingress qdisc - please check the code.
> 


clsact egress does not deal with tcf_result - what code do you want
me to check?

cheers,
jamal

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 12:41                           ` Jamal Hadi Salim
  2018-01-04 13:00                             ` Jiri Pirko
@ 2018-01-04 15:45                             ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: David Miller @ 2018-01-04 15:45 UTC (permalink / raw)
  To: jhs
  Cc: jiri, kubakici, dsahern, netdev, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Thu, 4 Jan 2018 07:41:54 -0500

> I agree with Jiri - if you consciously choose to share there should
> be no suprises with what filters get applied.

Wrong.  It is not always the case that the two entities making changes
are coordinated explicitly.

Block operations shared across devices must therefore be _explicit_.

Any other behavior will result in surprised and unintended
consequences.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 15:33                           ` David Miller
@ 2018-01-04 15:51                             ` Jiri Pirko
  2018-01-05 10:38                               ` Jiri Pirko
  0 siblings, 1 reply; 40+ messages in thread
From: Jiri Pirko @ 2018-01-04 15:51 UTC (permalink / raw)
  To: David Miller
  Cc: kubakici, dsahern, netdev, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

Thu, Jan 04, 2018 at 04:33:48PM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Thu, 4 Jan 2018 11:12:57 +0100
>
>> No magic. ens8 and ens7 share the same block.
>
>No Jiri, the fact that they share the same block _IS MAGIC_.
>
>It is unexpected behavior to modify a rule and have it propagate
>to devices not mentioned in the command line.
>
>This is totally going to break things and upset people.
>
>Saying it shows up in some tc dump command is not an argument
>for this behavior being "expected".  NO way.
>
>I completely agree with David and others, you _MUST_ make an
>explicit API and set of operations to make changes to rules
>contained in shared blocks.

Okay. So you say that when I create a qdisc and its block is created, I
can never share it.

I have to always explicitly create block to share and only then to bind
it to some qdisc/s?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
  2018-01-04 15:51                             ` Jiri Pirko
@ 2018-01-05 10:38                               ` Jiri Pirko
  0 siblings, 0 replies; 40+ messages in thread
From: Jiri Pirko @ 2018-01-05 10:38 UTC (permalink / raw)
  To: David Miller
  Cc: kubakici, dsahern, netdev, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel

Thu, Jan 04, 2018 at 04:51:15PM CET, jiri@resnulli.us wrote:
>Thu, Jan 04, 2018 at 04:33:48PM CET, davem@davemloft.net wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Date: Thu, 4 Jan 2018 11:12:57 +0100
>>
>>> No magic. ens8 and ens7 share the same block.
>>
>>No Jiri, the fact that they share the same block _IS MAGIC_.
>>
>>It is unexpected behavior to modify a rule and have it propagate
>>to devices not mentioned in the command line.
>>
>>This is totally going to break things and upset people.
>>
>>Saying it shows up in some tc dump command is not an argument
>>for this behavior being "expected".  NO way.
>>
>>I completely agree with David and others, you _MUST_ make an
>>explicit API and set of operations to make changes to rules
>>contained in shared blocks.
>
>Okay. So you say that when I create a qdisc and its block is created, I
>can never share it.
>
>I have to always explicitly create block to share and only then to bind
>it to some qdisc/s?

I think I have the idea. It can work as usual until the block gets to be
shared, then only the block-specific things would work. The exisint
filter add and filter should would error out (with extack message why).
Ok, I'm on it. Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-01-05 10:38 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-23 15:54 [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 01/10] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 02/10] net: sched: avoid usage of tp->q in tcf_classify Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 03/10] net: sched: introduce block mechanism to handle netif_keep_dst calls Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 04/10] net: sched: remove classid and q fields from tcf_proto Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 05/10] net: sched: keep track of offloaded filters and check tc offload feature Jiri Pirko
2017-12-24  2:20   ` Jakub Kicinski
2017-12-24  7:52     ` Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 06/10] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 07/10] mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 08/10] mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 09/10] mlxsw: spectrum_acl: Implement TC block sharing Jiri Pirko
2017-12-23 15:54 ` [patch net-next v4 10/10] mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops Jiri Pirko
2017-12-23 16:06 ` [patch iproute2] tc: implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
2017-12-24  1:54 ` [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances David Ahern
2017-12-24  7:19   ` Jiri Pirko
2017-12-24 16:25     ` David Ahern
2017-12-25 10:23       ` Jiri Pirko
2018-01-02 19:49         ` Jiri Pirko
2018-01-03  2:07           ` David Ahern
2018-01-03  9:40             ` Jiri Pirko
2018-01-03 15:57               ` David Ahern
2018-01-03 17:22                 ` Jiri Pirko
2018-01-03 23:51                   ` Jakub Kicinski
2018-01-04  6:57                     ` Jiri Pirko
2018-01-04  7:06                       ` Jakub Kicinski
2018-01-04 10:12                         ` Jiri Pirko
2018-01-04 12:41                           ` Jamal Hadi Salim
2018-01-04 13:00                             ` Jiri Pirko
2018-01-04 13:30                               ` Jamal Hadi Salim
2018-01-04 14:02                                 ` Jiri Pirko
2018-01-04 15:45                             ` David Miller
2018-01-04 12:55                           ` Jamal Hadi Salim
2018-01-04 13:05                             ` Jiri Pirko
2018-01-04 13:43                               ` Jamal Hadi Salim
2018-01-04 14:06                                 ` Jiri Pirko
2018-01-04 15:42                                   ` Jamal Hadi Salim
2018-01-04 15:33                           ` David Miller
2018-01-04 15:51                             ` Jiri Pirko
2018-01-05 10:38                               ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.