All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
@ 2023-10-05 18:42 Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 1/3] net/sched: Introduce tc block netdev tracking infra Victor Nogueira
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Victor Nogueira @ 2023-10-05 18:42 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, pabeni, edumazet, kuba
  Cc: mleitner, vladbu, simon.horman, pctammela, netdev, kernel

__Context__
The "tc block" is a collection of netdevs/ports which allow qdiscs to share
match-action block instances (as opposed to the traditional tc filter per
netdev/port)[1].

Example setup:
$ tc qdisc add dev ens7 ingress block 22
$ tc qdisc add dev ens8 ingress block 22

Once the block is created we can add a filter using the block index:
$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action drop

A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
either ens7 or ens8 is dropped.

__This patchset__
Up to this point in the implementation, the block is unaware of its ports.
This patch fixes that and makes the tc block ports available to the
datapath.

For the datapath we provide a use case of the tc block in an action
we call "blockcast" in patch 3. This action can be used in an example as
such:

$ tc qdisc add dev ens7 ingress block 22
$ tc qdisc add dev ens8 ingress block 22
$ tc qdisc add dev ens9 ingress block 22
$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action blockcast

When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
of ens7, ens8 or ens9 it will be copied to all ports other than itself.
For example, if it arrives on ens8 then a copy of the packet will be
"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.

Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
tc datapath and patch 3 implements datapath usage via a new tc action
"blockcast".

__Acknowledgements__
Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
better. The idea of integrating the ports into the tc block was suggested
by Jiri Pirko.

[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")

Changes in v2:
  - Remove RFC tag
  - Add more details in patch 0(Jiri)
  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
  - Fix bad dev dereference in printk of blockcast action (Simon)

Changes in v3:
  - Add missing xa_destroy (pointed out by Vlad)
  - Remove bugfix pointed by Vlad (will send in separate patch)
  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
  - Remove net_notice_ratelimited debug messages in error
    cases (suggested by Marcelo)
  - Minor changes to appease sparse's lock context warning

Changes in v4:
  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
  - Fix typo in cover letter (pointed out by Paolo)
  - Create a module description for act_blockcast
    (reported by Paolo and CI)

Victor Nogueira (3):
  net/sched: Introduce tc block netdev tracking infra
  net/sched: cls_api: Expose tc block to the datapath
  net/sched: act_blockcast: Introduce blockcast tc action

 include/net/sch_generic.h    |   8 +
 include/net/tc_wrapper.h     |   5 +
 include/uapi/linux/pkt_cls.h |   1 +
 net/sched/Kconfig            |  13 ++
 net/sched/Makefile           |   1 +
 net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
 net/sched/cls_api.c          |  12 +-
 net/sched/sch_api.c          |  58 +++++++
 net/sched/sch_generic.c      |  34 +++-
 9 files changed, 426 insertions(+), 3 deletions(-)
 create mode 100644 net/sched/act_blockcast.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 1/3] net/sched: Introduce tc block netdev tracking infra
  2023-10-05 18:42 [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Victor Nogueira
@ 2023-10-05 18:42 ` Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 2/3] net/sched: cls_api: Expose tc block to the datapath Victor Nogueira
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Victor Nogueira @ 2023-10-05 18:42 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, pabeni, edumazet, kuba
  Cc: mleitner, vladbu, simon.horman, pctammela, netdev, kernel

The tc block is a collection of netdevs/ports which allow qdiscs to share
filter block instances (as opposed to the traditional tc filter per port).
Example:
$ tc qdisc add dev ens7 ingress block 22
$ tc qdisc add dev ens8 ingress block 22

Now we can add a filter using the block index:
$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action drop

Up to this point, the block is unaware of its ports. This patch fixes that
and makes the tc block ports available to the datapath.

Suggested-by: Jiri Pirko <jiri@nvidia.com>
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
---
 include/net/sch_generic.h |  4 +++
 net/sched/cls_api.c       |  2 ++
 net/sched/sch_api.c       | 58 +++++++++++++++++++++++++++++++++++++++
 net/sched/sch_generic.c   | 34 +++++++++++++++++++++--
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c7318c73cfd6..a01979b0a2a1 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -19,6 +19,7 @@
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
+#include <linux/xarray.h>
 
 struct Qdisc_ops;
 struct qdisc_walker;
@@ -126,6 +127,8 @@ struct Qdisc {
 
 	struct rcu_head		rcu;
 	netdevice_tracker	dev_tracker;
+	netdevice_tracker	in_block_tracker;
+	netdevice_tracker	eg_block_tracker;
 	/* private data */
 	long privdata[] ____cacheline_aligned;
 };
@@ -458,6 +461,7 @@ struct tcf_chain {
 };
 
 struct tcf_block {
+	struct xarray ports; /* datapath accessible */
 	/* Lock protects tcf_block and lifetime-management data of chains
 	 * attached to the block (refcnt, action_refcnt, explicitly_created).
 	 */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a193cc7b3241..06b55344a948 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -531,6 +531,7 @@ static void tcf_block_destroy(struct tcf_block *block)
 {
 	mutex_destroy(&block->lock);
 	mutex_destroy(&block->proto_destroy_lock);
+	xa_destroy(&block->ports);
 	kfree_rcu(block, rcu);
 }
 
@@ -1003,6 +1004,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	refcount_set(&block->refcnt, 1);
 	block->net = net;
 	block->index = block_index;
+	xa_init(&block->ports);
 
 	/* Don't store q pointer for blocks which are shared */
 	if (!tcf_block_shared(block))
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index e9eaf637220e..66543e4d6cdc 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1180,6 +1180,60 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 	return 0;
 }
 
+static int qdisc_block_add_dev(struct Qdisc *sch, struct net_device *dev,
+			       struct nlattr **tca,
+			       struct netlink_ext_ack *extack)
+{
+	const struct Qdisc_class_ops *cl_ops = sch->ops->cl_ops;
+	struct tcf_block *in_block = NULL;
+	struct tcf_block *eg_block = NULL;
+	unsigned long cl = 0;
+	int err;
+
+	if (tca[TCA_INGRESS_BLOCK]) {
+		/* works for both ingress and clsact */
+		cl = TC_H_MIN_INGRESS;
+		in_block = cl_ops->tcf_block(sch, cl, NULL);
+		if (!in_block) {
+			NL_SET_ERR_MSG(extack, "Shared ingress block missing");
+			return -EINVAL;
+		}
+
+		err = xa_insert(&in_block->ports, dev->ifindex, dev, GFP_KERNEL);
+		if (err) {
+			NL_SET_ERR_MSG(extack, "ingress block dev insert failed");
+			return err;
+		}
+
+		netdev_hold(dev, &sch->in_block_tracker, GFP_KERNEL);
+	}
+
+	if (tca[TCA_EGRESS_BLOCK]) {
+		cl = TC_H_MIN_EGRESS;
+		eg_block = cl_ops->tcf_block(sch, cl, NULL);
+		if (!eg_block) {
+			NL_SET_ERR_MSG(extack, "Shared egress block missing");
+			err = -EINVAL;
+			goto err_out;
+		}
+
+		err = xa_insert(&eg_block->ports, dev->ifindex, dev, GFP_KERNEL);
+		if (err) {
+			NL_SET_ERR_MSG(extack, "Egress block dev insert failed");
+			goto err_out;
+		}
+		netdev_hold(dev, &sch->eg_block_tracker, GFP_KERNEL);
+	}
+
+	return 0;
+err_out:
+	if (in_block) {
+		xa_erase(&in_block->ports, dev->ifindex);
+		netdev_put(dev, &sch->in_block_tracker);
+	}
+	return err;
+}
+
 static int qdisc_block_indexes_set(struct Qdisc *sch, struct nlattr **tca,
 				   struct netlink_ext_ack *extack)
 {
@@ -1350,6 +1404,10 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	qdisc_hash_add(sch, false);
 	trace_qdisc_create(ops, dev, parent);
 
+	err = qdisc_block_add_dev(sch, dev, tca, extack);
+	if (err)
+		goto err_out4;
+
 	return sch;
 
 err_out4:
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 4195a4bc26ca..b0c28b2ee713 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1049,7 +1049,12 @@ static void qdisc_free_cb(struct rcu_head *head)
 
 static void __qdisc_destroy(struct Qdisc *qdisc)
 {
-	const struct Qdisc_ops  *ops = qdisc->ops;
+	struct net_device *dev = qdisc_dev(qdisc);
+	const struct Qdisc_ops *ops = qdisc->ops;
+	const struct Qdisc_class_ops *cops;
+	struct tcf_block *block;
+	unsigned long cl;
+	u32 block_index;
 
 #ifdef CONFIG_NET_SCHED
 	qdisc_hash_del(qdisc);
@@ -1060,11 +1065,36 @@ static void __qdisc_destroy(struct Qdisc *qdisc)
 
 	qdisc_reset(qdisc);
 
+	cops = ops->cl_ops;
+	if (ops->ingress_block_get) {
+		block_index = ops->ingress_block_get(qdisc);
+		if (block_index) {
+			cl = TC_H_MIN_INGRESS;
+			block = cops->tcf_block(qdisc, cl, NULL);
+			if (block) {
+				if (xa_erase(&block->ports, dev->ifindex))
+					netdev_put(dev, &qdisc->in_block_tracker);
+			}
+		}
+	}
+
+	if (ops->egress_block_get) {
+		block_index = ops->egress_block_get(qdisc);
+		if (block_index) {
+			cl = TC_H_MIN_EGRESS;
+			block = cops->tcf_block(qdisc, cl, NULL);
+			if (block) {
+				if (xa_erase(&block->ports, dev->ifindex))
+					netdev_put(dev, &qdisc->eg_block_tracker);
+			}
+		}
+	}
+
 	if (ops->destroy)
 		ops->destroy(qdisc);
 
 	module_put(ops->owner);
-	netdev_put(qdisc_dev(qdisc), &qdisc->dev_tracker);
+	netdev_put(dev, &qdisc->dev_tracker);
 
 	trace_qdisc_destroy(qdisc);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 2/3] net/sched: cls_api: Expose tc block to the datapath
  2023-10-05 18:42 [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 1/3] net/sched: Introduce tc block netdev tracking infra Victor Nogueira
@ 2023-10-05 18:42 ` Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 3/3] net/sched: act_blockcast: Introduce blockcast tc action Victor Nogueira
  2023-10-06 12:59 ` [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Jiri Pirko
  3 siblings, 0 replies; 20+ messages in thread
From: Victor Nogueira @ 2023-10-05 18:42 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, pabeni, edumazet, kuba
  Cc: mleitner, vladbu, simon.horman, pctammela, netdev, kernel

The datapath can now find the block of the port in which the packet arrived
at. It can then use it for various activities.

In the next patch we show a simple action that multicasts to all ports
except for the port in which the packet arrived on.

Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
---
 include/net/sch_generic.h |  4 ++++
 net/sched/cls_api.c       | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a01979b0a2a1..03ab3730ba09 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -440,6 +440,8 @@ struct qdisc_skb_cb {
 	};
 #define QDISC_CB_PRIV_LEN 20
 	unsigned char		data[QDISC_CB_PRIV_LEN];
+	/* This should allow eBPF to continue to align */
+	u32                     block_index;
 };
 
 typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
@@ -488,6 +490,8 @@ struct tcf_block {
 	struct mutex proto_destroy_lock; /* Lock for proto_destroy hashtable. */
 };
 
+struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index);
+
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return lockdep_is_held(&chain->filter_chain_lock);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 06b55344a948..c102fe26ac5e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1012,12 +1012,13 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	return block;
 }
 
-static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
+struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
 
 	return idr_find(&tn->idr, block_index);
 }
+EXPORT_SYMBOL(tcf_block_lookup);
 
 static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
 {
@@ -1738,9 +1739,13 @@ int tcf_classify(struct sk_buff *skb,
 		 const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode)
 {
+	struct qdisc_skb_cb *qdisc_cb = qdisc_skb_cb(skb);
+
 #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
 	u32 last_executed_chain = 0;
 
+	qdisc_cb->block_index = block ? block->index : 0;
+
 	return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0,
 			      &last_executed_chain);
 #else
@@ -1752,6 +1757,7 @@ int tcf_classify(struct sk_buff *skb,
 	int ret;
 
 	if (block) {
+		qdisc_cb->block_index = block->index;
 		ext = skb_ext_find(skb, TC_SKB_EXT);
 
 		if (ext && (ext->chain || ext->act_miss)) {
@@ -1779,6 +1785,8 @@ int tcf_classify(struct sk_buff *skb,
 			tp = rcu_dereference_bh(fchain->filter_chain);
 			last_executed_chain = fchain->index;
 		}
+	} else {
+		qdisc_cb->block_index = 0;
 	}
 
 	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v4 3/3] net/sched: act_blockcast: Introduce blockcast tc action
  2023-10-05 18:42 [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 1/3] net/sched: Introduce tc block netdev tracking infra Victor Nogueira
  2023-10-05 18:42 ` [PATCH net-next v4 2/3] net/sched: cls_api: Expose tc block to the datapath Victor Nogueira
@ 2023-10-05 18:42 ` Victor Nogueira
  2023-10-06 12:59 ` [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Jiri Pirko
  3 siblings, 0 replies; 20+ messages in thread
From: Victor Nogueira @ 2023-10-05 18:42 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, pabeni, edumazet, kuba
  Cc: mleitner, vladbu, simon.horman, pctammela, netdev, kernel

This action takes advantage of the presence of tc block ports set in the
datapath and broadcast a packet to all ports on that set with exception of
the port in which it arrived on.

Example usage:
    $ tc qdisc add dev ens7 ingress block 22
    $ tc qdisc add dev ens8 ingress block 22

Now we can add a filter using the block index:
$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action blockcast

Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
---
 include/net/tc_wrapper.h     |   5 +
 include/uapi/linux/pkt_cls.h |   1 +
 net/sched/Kconfig            |  13 ++
 net/sched/Makefile           |   1 +
 net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
 5 files changed, 317 insertions(+)
 create mode 100644 net/sched/act_blockcast.c

diff --git a/include/net/tc_wrapper.h b/include/net/tc_wrapper.h
index a6d481b5bcbc..8ef848968be7 100644
--- a/include/net/tc_wrapper.h
+++ b/include/net/tc_wrapper.h
@@ -28,6 +28,7 @@ TC_INDIRECT_ACTION_DECLARE(tcf_csum_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ct_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ctinfo_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_gact_act);
+TC_INDIRECT_ACTION_DECLARE(tcf_blockcast_run);
 TC_INDIRECT_ACTION_DECLARE(tcf_gate_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ife_act);
 TC_INDIRECT_ACTION_DECLARE(tcf_ipt_act);
@@ -57,6 +58,10 @@ static inline int tc_act(struct sk_buff *skb, const struct tc_action *a,
 	if (a->ops->act == tcf_mirred_act)
 		return tcf_mirred_act(skb, a, res);
 #endif
+#if IS_BUILTIN(CONFIG_NET_ACT_BLOCKCAST)
+	if (a->ops->act == tcf_blockcast_run)
+		return tcf_blockcast_run(skb, a, res);
+#endif
 #if IS_BUILTIN(CONFIG_NET_ACT_PEDIT)
 	if (a->ops->act == tcf_pedit_act)
 		return tcf_pedit_act(skb, a, res);
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index c7082cc60d21..e12fc51c1be1 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -139,6 +139,7 @@ enum tca_id {
 	TCA_ID_MPLS,
 	TCA_ID_CT,
 	TCA_ID_GATE,
+	TCA_ID_BLOCKCAST,
 	/* other actions go here */
 	__TCA_ID_MAX = 255
 };
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 470c70deffe2..abf26f0c921f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -780,6 +780,19 @@ config NET_ACT_SIMP
 	  To compile this code as a module, choose M here: the
 	  module will be called act_simple.
 
+config NET_ACT_BLOCKCAST
+	tristate "TC block Multicast"
+	depends on NET_CLS_ACT
+	help
+	  Say Y here to add an action that will multicast an skb to egress of
+	  all netdevs that belong to a tc block except for the netdev on which
+          the skb arrived on
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_blockcast.
+
 config NET_ACT_SKBEDIT
 	tristate "SKB Editing"
 	depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index b5fd49641d91..2cdcf30645eb 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
+obj-$(CONFIG_NET_ACT_BLOCKCAST)	+= act_blockcast.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_MPLS)	+= act_mpls.o
diff --git a/net/sched/act_blockcast.c b/net/sched/act_blockcast.c
new file mode 100644
index 000000000000..047023cba749
--- /dev/null
+++ b/net/sched/act_blockcast.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * net/sched/act_blockcast.c	Block Cast action
+ * Copyright (c) 2023, Mojatatu Networks
+ * Authors:     Jamal Hadi Salim <jhs@mojatatu.com>
+ *              Victor Nogueira <victor@mojatatu.com>
+ *              Pedro Tammela <pctammela@mojatatu.com>
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
+#include <linux/if_arp.h>
+#include <net/tc_wrapper.h>
+
+#include <linux/tc_act/tc_defact.h>
+
+static struct tc_action_ops act_blockcast_ops;
+
+struct tcf_blockcast_act {
+	struct tc_action	common;
+};
+
+#define to_blockcast_act(a) ((struct tcf_blockcast_act *)a)
+
+#define CAST_RECURSION_LIMIT  4
+
+static DEFINE_PER_CPU(unsigned int, redirect_rec_level);
+
+static int cast_one(struct sk_buff *skb, const u32 ifindex)
+{
+	struct sk_buff *skb2 = skb;
+	int retval = TC_ACT_PIPE;
+	struct net_device *dev;
+	unsigned int rec_level;
+	bool expects_nh;
+	int mac_len;
+	bool at_nh;
+	int err;
+
+	rec_level = __this_cpu_inc_return(redirect_rec_level);
+	if (unlikely(rec_level > CAST_RECURSION_LIMIT)) {
+		net_warn_ratelimited("blockcast: exceeded redirect recursion limit on dev %s\n",
+				     netdev_name(skb->dev));
+		goto out_shot;
+	}
+
+	dev = dev_get_by_index_rcu(dev_net(skb->dev), ifindex);
+	if (unlikely(!dev))
+		goto out_shot;
+
+	if (unlikely(!(dev->flags & IFF_UP) || !netif_carrier_ok(dev)))
+		goto out_shot;
+
+	skb2 = skb_clone(skb, GFP_ATOMIC);
+	if (!skb2)
+		goto out_shot;
+
+	nf_reset_ct(skb2);
+
+	expects_nh = !dev_is_mac_header_xmit(dev);
+	at_nh = skb->data == skb_network_header(skb);
+	if (at_nh != expects_nh) {
+		mac_len = skb_at_tc_ingress(skb) ?
+				  skb->mac_len :
+				  skb_network_header(skb) - skb_mac_header(skb);
+
+		if (expects_nh) {
+			/* target device/action expect data at nh */
+			skb_pull_rcsum(skb2, mac_len);
+		} else {
+			/* target device/action expect data at mac */
+			skb_push_rcsum(skb2, mac_len);
+		}
+	}
+
+	skb2->skb_iif = skb->dev->ifindex;
+	skb2->dev = dev;
+
+	err = dev_queue_xmit(skb2);
+	if (err)
+		goto out_shot;
+
+	goto rec_level_dec;
+
+out_shot:
+	retval = TC_ACT_SHOT;
+
+rec_level_dec:
+	__this_cpu_dec(redirect_rec_level);
+
+	return retval;
+}
+
+TC_INDIRECT_SCOPE int tcf_blockcast_run(struct sk_buff *skb,
+					const struct tc_action *a,
+					struct tcf_result *res)
+{
+	u32 block_index = qdisc_skb_cb(skb)->block_index;
+	struct tcf_blockcast_act *p = to_blockcast_act(a);
+	int action = READ_ONCE(p->tcf_action);
+	struct net *net = dev_net(skb->dev);
+	struct tcf_block *block;
+	struct net_device *dev;
+	u32 exception_ifindex;
+	unsigned long index;
+
+	block = tcf_block_lookup(net, block_index);
+	exception_ifindex = skb->dev->ifindex;
+
+	tcf_action_update_bstats(&p->common, skb);
+	tcf_lastuse_update(&p->tcf_tm);
+
+	if (!block || xa_empty(&block->ports))
+		goto act_done;
+
+	/* we are already under rcu protection, so iterating block is safe*/
+	xa_for_each(&block->ports, index, dev) {
+		int err;
+
+		if (index == exception_ifindex)
+			continue;
+
+		err = cast_one(skb, dev->ifindex);
+		if (err != TC_ACT_PIPE)
+			tcf_action_inc_overlimit_qstats(&p->common);
+	}
+
+act_done:
+	if (action == TC_ACT_SHOT)
+		tcf_action_inc_drop_qstats(&p->common);
+	return action;
+}
+
+static const struct nla_policy blockcast_policy[TCA_DEF_MAX + 1] = {
+	[TCA_DEF_PARMS]	= { .len = sizeof(struct tc_defact) },
+};
+
+static int tcf_blockcast_init(struct net *net, struct nlattr *nla,
+			      struct nlattr *est, struct tc_action **a,
+			      struct tcf_proto *tp, u32 flags,
+			      struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, act_blockcast_ops.net_id);
+	struct tcf_blockcast_act *p = to_blockcast_act(a);
+	bool bind = flags & TCA_ACT_FLAGS_BIND;
+	struct nlattr *tb[TCA_DEF_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tc_defact *parm;
+	bool exists = false;
+	int ret = 0, err;
+	u32 index;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_DEF_MAX, nla,
+			       blockcast_policy, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_DEF_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_DEF_PARMS]);
+	index = parm->index;
+
+	err = tcf_idr_check_alloc(tn, &index, a, bind);
+	if (err < 0)
+		return err;
+
+	exists = err;
+	if (exists && bind)
+		return 0;
+
+	if (!exists) {
+		ret = tcf_idr_create_from_flags(tn, index, est, a,
+						&act_blockcast_ops, bind, flags);
+		if (ret) {
+			tcf_idr_cleanup(tn, index);
+			return ret;
+		}
+
+		ret = ACT_P_CREATED;
+	} else {
+		if (!(flags & TCA_ACT_FLAGS_REPLACE)) {
+			err = -EEXIST;
+			goto release_idr;
+		}
+	}
+
+	err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	if (exists) {
+		spin_lock_bh(&p->tcf_lock);
+		goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+		spin_unlock_bh(&p->tcf_lock);
+	} else {
+		goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+	}
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+
+	return ret;
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static int tcf_blockcast_dump(struct sk_buff *skb, struct tc_action *a,
+			      int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_blockcast_act *p = to_blockcast_act(a);
+	struct tc_defact opt = {
+		.index   = p->tcf_index,
+		.refcnt  = refcount_read(&p->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&p->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&p->tcf_lock);
+	opt.action = p->tcf_action;
+	if (nla_put(skb, TCA_DEF_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &p->tcf_tm);
+	if (nla_put_64bit(skb, TCA_DEF_TM, sizeof(t), &t, TCA_DEF_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&p->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&p->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_blockcast_ops = {
+	.kind		=	"blockcast",
+	.id		=	TCA_ID_BLOCKCAST,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_blockcast_run,
+	.dump		=	tcf_blockcast_dump,
+	.init		=	tcf_blockcast_init,
+	.size		=	sizeof(struct tcf_blockcast_act),
+};
+
+static __net_init int blockcast_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, act_blockcast_ops.net_id);
+
+	return tc_action_net_init(net, tn, &act_blockcast_ops);
+}
+
+static void __net_exit blockcast_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, act_blockcast_ops.net_id);
+}
+
+static struct pernet_operations blockcast_net_ops = {
+	.init = blockcast_init_net,
+	.exit_batch = blockcast_exit_net,
+	.id   = &act_blockcast_ops.net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+MODULE_AUTHOR("Mojatatu Networks, Inc");
+MODULE_DESCRIPTION("Action to broadcast to devices on a block");
+MODULE_LICENSE("GPL");
+
+static int __init blockcast_init_module(void)
+{
+	int ret = tcf_register_action(&act_blockcast_ops, &blockcast_net_ops);
+
+	if (!ret)
+		pr_info("blockcast TC action Loaded\n");
+	return ret;
+}
+
+static void __exit blockcast_cleanup_module(void)
+{
+	tcf_unregister_action(&act_blockcast_ops, &blockcast_net_ops);
+}
+
+module_init(blockcast_init_module);
+module_exit(blockcast_cleanup_module);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-05 18:42 [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Victor Nogueira
                   ` (2 preceding siblings ...)
  2023-10-05 18:42 ` [PATCH net-next v4 3/3] net/sched: act_blockcast: Introduce blockcast tc action Victor Nogueira
@ 2023-10-06 12:59 ` Jiri Pirko
  2023-10-06 15:37   ` Jamal Hadi Salim
  3 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2023-10-06 12:59 UTC (permalink / raw)
  To: Victor Nogueira
  Cc: jhs, xiyou.wangcong, davem, pabeni, edumazet, kuba, mleitner,
	vladbu, simon.horman, pctammela, netdev, kernel

Thu, Oct 05, 2023 at 08:42:25PM CEST, victor@mojatatu.com wrote:
>__Context__
>The "tc block" is a collection of netdevs/ports which allow qdiscs to share
>match-action block instances (as opposed to the traditional tc filter per
>netdev/port)[1].
>
>Example setup:
>$ tc qdisc add dev ens7 ingress block 22
>$ tc qdisc add dev ens8 ingress block 22
>
>Once the block is created we can add a filter using the block index:
>$ tc filter add block 22 protocol ip pref 25 \
>  flower dst_ip 192.168.0.0/16 action drop
>
>A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
>either ens7 or ens8 is dropped.
>
>__This patchset__
>Up to this point in the implementation, the block is unaware of its ports.
>This patch fixes that and makes the tc block ports available to the

Odd. You fix a bug. Is there a bug? If yes, you need to describe it. If
no, don't use "fix".


>datapath.
>
>For the datapath we provide a use case of the tc block in an action
>we call "blockcast" in patch 3. This action can be used in an example as
>such:
>
>$ tc qdisc add dev ens7 ingress block 22
>$ tc qdisc add dev ens8 ingress block 22
>$ tc qdisc add dev ens9 ingress block 22
>$ tc filter add block 22 protocol ip pref 25 \
>  flower dst_ip 192.168.0.0/16 action blockcast

Seems to me a bit odd that the action works with the entity (block) is
is connected to. I would expect rather to give the action configuration:

$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action blockcast block 22
                                                ^^^^^^^^

Then this is more flexible and allows user to use this action for any
packet, no matter from where it was received.

Looks like this is functionality-wise similar to mirred redirect. Why
can't we have that action extended to accept block number instead of
netdev and have something like:

$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22

This would be very much alike we do either "tc filter add dev X" or "tc
filter add block Y".

Regarding the filtering, that could be a simple flag config of mirred
action:

$ tc filter add block 22 protocol ip pref 25 \
  flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
  srcfilter

Or something like that.

Makes sense?



>
>When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
>of ens7, ens8 or ens9 it will be copied to all ports other than itself.
>For example, if it arrives on ens8 then a copy of the packet will be
>"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.
>
>Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
>tc datapath and patch 3 implements datapath usage via a new tc action
>"blockcast".
>
>__Acknowledgements__
>Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
>better. The idea of integrating the ports into the tc block was suggested
>by Jiri Pirko.
>
>[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")
>
>Changes in v2:
>  - Remove RFC tag
>  - Add more details in patch 0(Jiri)
>  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
>    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
>  - Fix bad dev dereference in printk of blockcast action (Simon)
>
>Changes in v3:
>  - Add missing xa_destroy (pointed out by Vlad)
>  - Remove bugfix pointed by Vlad (will send in separate patch)
>  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
>  - Remove net_notice_ratelimited debug messages in error
>    cases (suggested by Marcelo)
>  - Minor changes to appease sparse's lock context warning
>
>Changes in v4:
>  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
>  - Fix typo in cover letter (pointed out by Paolo)
>  - Create a module description for act_blockcast
>    (reported by Paolo and CI)
>
>Victor Nogueira (3):
>  net/sched: Introduce tc block netdev tracking infra
>  net/sched: cls_api: Expose tc block to the datapath
>  net/sched: act_blockcast: Introduce blockcast tc action
>
> include/net/sch_generic.h    |   8 +
> include/net/tc_wrapper.h     |   5 +
> include/uapi/linux/pkt_cls.h |   1 +
> net/sched/Kconfig            |  13 ++
> net/sched/Makefile           |   1 +
> net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
> net/sched/cls_api.c          |  12 +-
> net/sched/sch_api.c          |  58 +++++++
> net/sched/sch_generic.c      |  34 +++-
> 9 files changed, 426 insertions(+), 3 deletions(-)
> create mode 100644 net/sched/act_blockcast.c
>
>-- 
>2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 12:59 ` [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Jiri Pirko
@ 2023-10-06 15:37   ` Jamal Hadi Salim
  2023-10-06 16:50     ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-06 15:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Victor Nogueira, xiyou.wangcong, davem, pabeni, edumazet, kuba,
	mleitner, vladbu, simon.horman, pctammela, netdev, kernel

On Fri, Oct 6, 2023 at 8:59 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Oct 05, 2023 at 08:42:25PM CEST, victor@mojatatu.com wrote:
> >__Context__
> >The "tc block" is a collection of netdevs/ports which allow qdiscs to share
> >match-action block instances (as opposed to the traditional tc filter per
> >netdev/port)[1].
> >
> >Example setup:
> >$ tc qdisc add dev ens7 ingress block 22
> >$ tc qdisc add dev ens8 ingress block 22
> >
> >Once the block is created we can add a filter using the block index:
> >$ tc filter add block 22 protocol ip pref 25 \
> >  flower dst_ip 192.168.0.0/16 action drop
> >
> >A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
> >either ens7 or ens8 is dropped.
> >
> >__This patchset__
> >Up to this point in the implementation, the block is unaware of its ports.
> >This patch fixes that and makes the tc block ports available to the
>
> Odd. You fix a bug. Is there a bug? If yes, you need to describe it. If
> no, don't use "fix".

Ok, Jiri;->  we will change the language.

>
> >datapath.
> >
> >For the datapath we provide a use case of the tc block in an action
> >we call "blockcast" in patch 3. This action can be used in an example as
> >such:
> >
> >$ tc qdisc add dev ens7 ingress block 22
> >$ tc qdisc add dev ens8 ingress block 22
> >$ tc qdisc add dev ens9 ingress block 22
> >$ tc filter add block 22 protocol ip pref 25 \
> >  flower dst_ip 192.168.0.0/16 action blockcast
>
> Seems to me a bit odd that the action works with the entity (block) is
> is connected to. I would expect rather to give the action configuration:
>
> $ tc filter add block 22 protocol ip pref 25 \
>   flower dst_ip 192.168.0.0/16 action blockcast block 22
>                                                 ^^^^^^^^

We are currently passing the blockid in the skb cb field so it is
configuration-less. I suppose we could add this as an optional field
and use it when specified.

> Then this is more flexible and allows user to use this action for any
> packet, no matter from where it was received.
>
> Looks like this is functionality-wise similar to mirred redirect. Why
> can't we have that action extended to accept block number instead of
> netdev and have something like:
>
> $ tc filter add block 22 protocol ip pref 25 \
>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>
> This would be very much alike we do either "tc filter add dev X" or "tc
> filter add block Y".
>

We did consider it but concluded it is a lot of work to get it done on
mirred - just take a look at mirred and you'll see what i mean;->
Based on that review we came to the conclusion that at some point it
would be safer to separate mirred's mirror from redirect; there are
too many checks to avoid one or the other based on whether you are
coming from egress vs ingress etc. This one is simple, it is just a
broadcast.


> Regarding the filtering, that could be a simple flag config of mirred
> action:
>
> $ tc filter add block 22 protocol ip pref 25 \
>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>   srcfilter
>
> Or something like that.
>

See my comment above.

cheers,
jamal
> Makes sense?
>
>
>
> >
> >When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
> >of ens7, ens8 or ens9 it will be copied to all ports other than itself.
> >For example, if it arrives on ens8 then a copy of the packet will be
> >"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.
> >
> >Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
> >tc datapath and patch 3 implements datapath usage via a new tc action
> >"blockcast".
> >
> >__Acknowledgements__
> >Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
> >better. The idea of integrating the ports into the tc block was suggested
> >by Jiri Pirko.
> >
> >[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")
> >
> >Changes in v2:
> >  - Remove RFC tag
> >  - Add more details in patch 0(Jiri)
> >  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
> >    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
> >  - Fix bad dev dereference in printk of blockcast action (Simon)
> >
> >Changes in v3:
> >  - Add missing xa_destroy (pointed out by Vlad)
> >  - Remove bugfix pointed by Vlad (will send in separate patch)
> >  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
> >  - Remove net_notice_ratelimited debug messages in error
> >    cases (suggested by Marcelo)
> >  - Minor changes to appease sparse's lock context warning
> >
> >Changes in v4:
> >  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
> >  - Fix typo in cover letter (pointed out by Paolo)
> >  - Create a module description for act_blockcast
> >    (reported by Paolo and CI)
> >
> >Victor Nogueira (3):
> >  net/sched: Introduce tc block netdev tracking infra
> >  net/sched: cls_api: Expose tc block to the datapath
> >  net/sched: act_blockcast: Introduce blockcast tc action
> >
> > include/net/sch_generic.h    |   8 +
> > include/net/tc_wrapper.h     |   5 +
> > include/uapi/linux/pkt_cls.h |   1 +
> > net/sched/Kconfig            |  13 ++
> > net/sched/Makefile           |   1 +
> > net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
> > net/sched/cls_api.c          |  12 +-
> > net/sched/sch_api.c          |  58 +++++++
> > net/sched/sch_generic.c      |  34 +++-
> > 9 files changed, 426 insertions(+), 3 deletions(-)
> > create mode 100644 net/sched/act_blockcast.c
> >
> >--
> >2.25.1
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 15:37   ` Jamal Hadi Salim
@ 2023-10-06 16:50     ` Jiri Pirko
  2023-10-06 19:06       ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2023-10-06 16:50 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Victor Nogueira, xiyou.wangcong, davem, pabeni, edumazet, kuba,
	mleitner, vladbu, simon.horman, pctammela, netdev, kernel

Fri, Oct 06, 2023 at 05:37:41PM CEST, jhs@mojatatu.com wrote:
>On Fri, Oct 6, 2023 at 8:59 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Oct 05, 2023 at 08:42:25PM CEST, victor@mojatatu.com wrote:
>> >__Context__
>> >The "tc block" is a collection of netdevs/ports which allow qdiscs to share
>> >match-action block instances (as opposed to the traditional tc filter per
>> >netdev/port)[1].
>> >
>> >Example setup:
>> >$ tc qdisc add dev ens7 ingress block 22
>> >$ tc qdisc add dev ens8 ingress block 22
>> >
>> >Once the block is created we can add a filter using the block index:
>> >$ tc filter add block 22 protocol ip pref 25 \
>> >  flower dst_ip 192.168.0.0/16 action drop
>> >
>> >A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
>> >either ens7 or ens8 is dropped.
>> >
>> >__This patchset__
>> >Up to this point in the implementation, the block is unaware of its ports.
>> >This patch fixes that and makes the tc block ports available to the
>>
>> Odd. You fix a bug. Is there a bug? If yes, you need to describe it. If
>> no, don't use "fix".
>
>Ok, Jiri;->  we will change the language.
>
>>
>> >datapath.
>> >
>> >For the datapath we provide a use case of the tc block in an action
>> >we call "blockcast" in patch 3. This action can be used in an example as
>> >such:
>> >
>> >$ tc qdisc add dev ens7 ingress block 22
>> >$ tc qdisc add dev ens8 ingress block 22
>> >$ tc qdisc add dev ens9 ingress block 22
>> >$ tc filter add block 22 protocol ip pref 25 \
>> >  flower dst_ip 192.168.0.0/16 action blockcast
>>
>> Seems to me a bit odd that the action works with the entity (block) is
>> is connected to. I would expect rather to give the action configuration:
>>
>> $ tc filter add block 22 protocol ip pref 25 \
>>   flower dst_ip 192.168.0.0/16 action blockcast block 22
>>                                                 ^^^^^^^^
>
>We are currently passing the blockid in the skb cb field so it is
>configuration-less. I suppose we could add this as an optional field
>and use it when specified.

I don't understand the need for configuration less here. You don't have
it for the rest of the actions. Why this is special?


>
>> Then this is more flexible and allows user to use this action for any
>> packet, no matter from where it was received.
>>
>> Looks like this is functionality-wise similar to mirred redirect. Why
>> can't we have that action extended to accept block number instead of
>> netdev and have something like:
>>
>> $ tc filter add block 22 protocol ip pref 25 \
>>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>>
>> This would be very much alike we do either "tc filter add dev X" or "tc
>> filter add block Y".
>>
>
>We did consider it but concluded it is a lot of work to get it done on
>mirred - just take a look at mirred and you'll see what i mean;->
>Based on that review we came to the conclusion that at some point it
>would be safer to separate mirred's mirror from redirect; there are
>too many checks to avoid one or the other based on whether you are
>coming from egress vs ingress etc. This one is simple, it is just a
>broadcast.

Perhaps it is a nice opportunity to do such mirred cleanup, prepare the
code and implement block send afterwards?

If I omit the code for now, from user perspective, this functionality
belongs into mirred, don't you think? Just replace "dev" by "block" and
you got what you need.


>
>
>> Regarding the filtering, that could be a simple flag config of mirred
>> action:
>>
>> $ tc filter add block 22 protocol ip pref 25 \
>>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>>   srcfilter
>>
>> Or something like that.
>>
>
>See my comment above.
>
>cheers,
>jamal
>> Makes sense?
>>
>>
>>
>> >
>> >When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
>> >of ens7, ens8 or ens9 it will be copied to all ports other than itself.
>> >For example, if it arrives on ens8 then a copy of the packet will be
>> >"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.
>> >
>> >Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
>> >tc datapath and patch 3 implements datapath usage via a new tc action
>> >"blockcast".
>> >
>> >__Acknowledgements__
>> >Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
>> >better. The idea of integrating the ports into the tc block was suggested
>> >by Jiri Pirko.
>> >
>> >[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")
>> >
>> >Changes in v2:
>> >  - Remove RFC tag
>> >  - Add more details in patch 0(Jiri)
>> >  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
>> >    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
>> >  - Fix bad dev dereference in printk of blockcast action (Simon)
>> >
>> >Changes in v3:
>> >  - Add missing xa_destroy (pointed out by Vlad)
>> >  - Remove bugfix pointed by Vlad (will send in separate patch)
>> >  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
>> >  - Remove net_notice_ratelimited debug messages in error
>> >    cases (suggested by Marcelo)
>> >  - Minor changes to appease sparse's lock context warning
>> >
>> >Changes in v4:
>> >  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
>> >  - Fix typo in cover letter (pointed out by Paolo)
>> >  - Create a module description for act_blockcast
>> >    (reported by Paolo and CI)
>> >
>> >Victor Nogueira (3):
>> >  net/sched: Introduce tc block netdev tracking infra
>> >  net/sched: cls_api: Expose tc block to the datapath
>> >  net/sched: act_blockcast: Introduce blockcast tc action
>> >
>> > include/net/sch_generic.h    |   8 +
>> > include/net/tc_wrapper.h     |   5 +
>> > include/uapi/linux/pkt_cls.h |   1 +
>> > net/sched/Kconfig            |  13 ++
>> > net/sched/Makefile           |   1 +
>> > net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
>> > net/sched/cls_api.c          |  12 +-
>> > net/sched/sch_api.c          |  58 +++++++
>> > net/sched/sch_generic.c      |  34 +++-
>> > 9 files changed, 426 insertions(+), 3 deletions(-)
>> > create mode 100644 net/sched/act_blockcast.c
>> >
>> >--
>> >2.25.1
>> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 16:50     ` Jiri Pirko
@ 2023-10-06 19:06       ` Jamal Hadi Salim
  2023-10-06 22:25         ` Jakub Kicinski
  2023-10-07 10:22         ` Jiri Pirko
  0 siblings, 2 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-06 19:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Victor Nogueira, xiyou.wangcong, davem, pabeni, edumazet, kuba,
	mleitner, vladbu, simon.horman, pctammela, netdev, kernel

On Fri, Oct 6, 2023 at 12:50 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Oct 06, 2023 at 05:37:41PM CEST, jhs@mojatatu.com wrote:
> >On Fri, Oct 6, 2023 at 8:59 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Thu, Oct 05, 2023 at 08:42:25PM CEST, victor@mojatatu.com wrote:
> >> >__Context__
> >> >The "tc block" is a collection of netdevs/ports which allow qdiscs to share
> >> >match-action block instances (as opposed to the traditional tc filter per
> >> >netdev/port)[1].
> >> >
> >> >Example setup:
> >> >$ tc qdisc add dev ens7 ingress block 22
> >> >$ tc qdisc add dev ens8 ingress block 22
> >> >
> >> >Once the block is created we can add a filter using the block index:
> >> >$ tc filter add block 22 protocol ip pref 25 \
> >> >  flower dst_ip 192.168.0.0/16 action drop
> >> >
> >> >A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
> >> >either ens7 or ens8 is dropped.
> >> >
> >> >__This patchset__
> >> >Up to this point in the implementation, the block is unaware of its ports.
> >> >This patch fixes that and makes the tc block ports available to the
> >>
> >> Odd. You fix a bug. Is there a bug? If yes, you need to describe it. If
> >> no, don't use "fix".
> >
> >Ok, Jiri;->  we will change the language.
> >
> >>
> >> >datapath.
> >> >
> >> >For the datapath we provide a use case of the tc block in an action
> >> >we call "blockcast" in patch 3. This action can be used in an example as
> >> >such:
> >> >
> >> >$ tc qdisc add dev ens7 ingress block 22
> >> >$ tc qdisc add dev ens8 ingress block 22
> >> >$ tc qdisc add dev ens9 ingress block 22
> >> >$ tc filter add block 22 protocol ip pref 25 \
> >> >  flower dst_ip 192.168.0.0/16 action blockcast
> >>
> >> Seems to me a bit odd that the action works with the entity (block) is
> >> is connected to. I would expect rather to give the action configuration:
> >>
> >> $ tc filter add block 22 protocol ip pref 25 \
> >>   flower dst_ip 192.168.0.0/16 action blockcast block 22
> >>                                                 ^^^^^^^^
> >
> >We are currently passing the blockid in the skb cb field so it is
> >configuration-less. I suppose we could add this as an optional field
> >and use it when specified.
>
> I don't understand the need for configuration less here. You don't have
> it for the rest of the actions. Why this is special?

It is not needed really. Think of an L2 switch - the broadcast action
is to send to all ports but self.

>
> >
> >> Then this is more flexible and allows user to use this action for any
> >> packet, no matter from where it was received.
> >>
> >> Looks like this is functionality-wise similar to mirred redirect. Why
> >> can't we have that action extended to accept block number instead of
> >> netdev and have something like:
> >>
> >> $ tc filter add block 22 protocol ip pref 25 \
> >>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
> >>
> >> This would be very much alike we do either "tc filter add dev X" or "tc
> >> filter add block Y".
> >>
> >
> >We did consider it but concluded it is a lot of work to get it done on
> >mirred - just take a look at mirred and you'll see what i mean;->
> >Based on that review we came to the conclusion that at some point it
> >would be safer to separate mirred's mirror from redirect; there are
> >too many checks to avoid one or the other based on whether you are
> >coming from egress vs ingress etc. This one is simple, it is just a
> >broadcast.
>
> Perhaps it is a nice opportunity to do such mirred cleanup, prepare the
> code and implement block send afterwards?

I was worried about breaking some existing use cases - the code has
got too clever.
But probably it is time to show it some love, one of us will invest
time into it.

> If I omit the code for now, from user perspective, this functionality
> belongs into mirred, don't you think? Just replace "dev" by "block" and
> you got what you need.

If we can adequately cleanup mirred,  then we can put it there but
certainly now we are adding more buttons to click on mirred. It may
make sense to refactor the mirred code then reuse the refactored code
in a new action.

cheers,
jamal

>
> >
> >
> >> Regarding the filtering, that could be a simple flag config of mirred
> >> action:
> >>
> >> $ tc filter add block 22 protocol ip pref 25 \
> >>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
> >>   srcfilter
> >>
> >> Or something like that.
> >>
> >
> >See my comment above.
> >
> >cheers,
> >jamal
> >> Makes sense?
> >>
> >>
> >>
> >> >
> >> >When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
> >> >of ens7, ens8 or ens9 it will be copied to all ports other than itself.
> >> >For example, if it arrives on ens8 then a copy of the packet will be
> >> >"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.
> >> >
> >> >Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
> >> >tc datapath and patch 3 implements datapath usage via a new tc action
> >> >"blockcast".
> >> >
> >> >__Acknowledgements__
> >> >Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
> >> >better. The idea of integrating the ports into the tc block was suggested
> >> >by Jiri Pirko.
> >> >
> >> >[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")
> >> >
> >> >Changes in v2:
> >> >  - Remove RFC tag
> >> >  - Add more details in patch 0(Jiri)
> >> >  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
> >> >    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
> >> >  - Fix bad dev dereference in printk of blockcast action (Simon)
> >> >
> >> >Changes in v3:
> >> >  - Add missing xa_destroy (pointed out by Vlad)
> >> >  - Remove bugfix pointed by Vlad (will send in separate patch)
> >> >  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
> >> >  - Remove net_notice_ratelimited debug messages in error
> >> >    cases (suggested by Marcelo)
> >> >  - Minor changes to appease sparse's lock context warning
> >> >
> >> >Changes in v4:
> >> >  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
> >> >  - Fix typo in cover letter (pointed out by Paolo)
> >> >  - Create a module description for act_blockcast
> >> >    (reported by Paolo and CI)
> >> >
> >> >Victor Nogueira (3):
> >> >  net/sched: Introduce tc block netdev tracking infra
> >> >  net/sched: cls_api: Expose tc block to the datapath
> >> >  net/sched: act_blockcast: Introduce blockcast tc action
> >> >
> >> > include/net/sch_generic.h    |   8 +
> >> > include/net/tc_wrapper.h     |   5 +
> >> > include/uapi/linux/pkt_cls.h |   1 +
> >> > net/sched/Kconfig            |  13 ++
> >> > net/sched/Makefile           |   1 +
> >> > net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
> >> > net/sched/cls_api.c          |  12 +-
> >> > net/sched/sch_api.c          |  58 +++++++
> >> > net/sched/sch_generic.c      |  34 +++-
> >> > 9 files changed, 426 insertions(+), 3 deletions(-)
> >> > create mode 100644 net/sched/act_blockcast.c
> >> >
> >> >--
> >> >2.25.1
> >> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 19:06       ` Jamal Hadi Salim
@ 2023-10-06 22:25         ` Jakub Kicinski
  2023-10-06 23:00           ` Jamal Hadi Salim
  2023-10-07 10:22         ` Jiri Pirko
  1 sibling, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2023-10-06 22:25 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
> > I don't understand the need for configuration less here. You don't have
> > it for the rest of the actions. Why this is special?  

+1, FWIW

> It is not needed really. Think of an L2 switch - the broadcast action
> is to send to all ports but self.

We do have an implementation of an L2 switch already, what's the use
case which necessitates this new action / functionality?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 22:25         ` Jakub Kicinski
@ 2023-10-06 23:00           ` Jamal Hadi Salim
  2023-10-07 10:20             ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-06 23:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
> > > I don't understand the need for configuration less here. You don't have
> > > it for the rest of the actions. Why this is special?
>
> +1, FWIW

We dont have any rule that says all actions MUST have parameters ;->
There is nothing speacial about any action that doesnt have a
parameter.

> > It is not needed really. Think of an L2 switch - the broadcast action
> > is to send to all ports but self.
>
> We do have an implementation of an L2 switch already, what's the use
> case which necessitates this new action / functionality?

It is not an L2 switch - the L2 example switch was what came to mind
of something that does a broadcast (it doesnt depend on MAC addresses
for example). Could have called it a hub. Ex: you could match on many
different header fields or skb metadata, then first modify the packet
using NAT, etc and then "blockcast" it.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 23:00           ` Jamal Hadi Salim
@ 2023-10-07 10:20             ` Jiri Pirko
  2023-10-07 11:06               ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2023-10-07 10:20 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
>On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
>> > > I don't understand the need for configuration less here. You don't have
>> > > it for the rest of the actions. Why this is special?
>>
>> +1, FWIW
>
>We dont have any rule that says all actions MUST have parameters ;->
>There is nothing speacial about any action that doesnt have a
>parameter.

You are getting the configuration from the block/device the action is
attached to. Can you point me to another action doing that?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-06 19:06       ` Jamal Hadi Salim
  2023-10-06 22:25         ` Jakub Kicinski
@ 2023-10-07 10:22         ` Jiri Pirko
  1 sibling, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2023-10-07 10:22 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Victor Nogueira, xiyou.wangcong, davem, pabeni, edumazet, kuba,
	mleitner, vladbu, simon.horman, pctammela, netdev, kernel

Fri, Oct 06, 2023 at 09:06:45PM CEST, jhs@mojatatu.com wrote:
>On Fri, Oct 6, 2023 at 12:50 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Fri, Oct 06, 2023 at 05:37:41PM CEST, jhs@mojatatu.com wrote:
>> >On Fri, Oct 6, 2023 at 8:59 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Thu, Oct 05, 2023 at 08:42:25PM CEST, victor@mojatatu.com wrote:
>> >> >__Context__
>> >> >The "tc block" is a collection of netdevs/ports which allow qdiscs to share
>> >> >match-action block instances (as opposed to the traditional tc filter per
>> >> >netdev/port)[1].
>> >> >
>> >> >Example setup:
>> >> >$ tc qdisc add dev ens7 ingress block 22
>> >> >$ tc qdisc add dev ens8 ingress block 22
>> >> >
>> >> >Once the block is created we can add a filter using the block index:
>> >> >$ tc filter add block 22 protocol ip pref 25 \
>> >> >  flower dst_ip 192.168.0.0/16 action drop
>> >> >
>> >> >A packet with dst IP matching 192.168.0.0/16 arriving on the ingress of
>> >> >either ens7 or ens8 is dropped.
>> >> >
>> >> >__This patchset__
>> >> >Up to this point in the implementation, the block is unaware of its ports.
>> >> >This patch fixes that and makes the tc block ports available to the
>> >>
>> >> Odd. You fix a bug. Is there a bug? If yes, you need to describe it. If
>> >> no, don't use "fix".
>> >
>> >Ok, Jiri;->  we will change the language.
>> >
>> >>
>> >> >datapath.
>> >> >
>> >> >For the datapath we provide a use case of the tc block in an action
>> >> >we call "blockcast" in patch 3. This action can be used in an example as
>> >> >such:
>> >> >
>> >> >$ tc qdisc add dev ens7 ingress block 22
>> >> >$ tc qdisc add dev ens8 ingress block 22
>> >> >$ tc qdisc add dev ens9 ingress block 22
>> >> >$ tc filter add block 22 protocol ip pref 25 \
>> >> >  flower dst_ip 192.168.0.0/16 action blockcast
>> >>
>> >> Seems to me a bit odd that the action works with the entity (block) is
>> >> is connected to. I would expect rather to give the action configuration:
>> >>
>> >> $ tc filter add block 22 protocol ip pref 25 \
>> >>   flower dst_ip 192.168.0.0/16 action blockcast block 22
>> >>                                                 ^^^^^^^^
>> >
>> >We are currently passing the blockid in the skb cb field so it is
>> >configuration-less. I suppose we could add this as an optional field
>> >and use it when specified.
>>
>> I don't understand the need for configuration less here. You don't have
>> it for the rest of the actions. Why this is special?
>
>It is not needed really. Think of an L2 switch - the broadcast action
>is to send to all ports but self.
>
>>
>> >
>> >> Then this is more flexible and allows user to use this action for any
>> >> packet, no matter from where it was received.
>> >>
>> >> Looks like this is functionality-wise similar to mirred redirect. Why
>> >> can't we have that action extended to accept block number instead of
>> >> netdev and have something like:
>> >>
>> >> $ tc filter add block 22 protocol ip pref 25 \
>> >>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>> >>
>> >> This would be very much alike we do either "tc filter add dev X" or "tc
>> >> filter add block Y".
>> >>
>> >
>> >We did consider it but concluded it is a lot of work to get it done on
>> >mirred - just take a look at mirred and you'll see what i mean;->
>> >Based on that review we came to the conclusion that at some point it
>> >would be safer to separate mirred's mirror from redirect; there are
>> >too many checks to avoid one or the other based on whether you are
>> >coming from egress vs ingress etc. This one is simple, it is just a
>> >broadcast.
>>
>> Perhaps it is a nice opportunity to do such mirred cleanup, prepare the
>> code and implement block send afterwards?
>
>I was worried about breaking some existing use cases - the code has
>got too clever.
>But probably it is time to show it some love, one of us will invest
>time into it.

Awesome.


>
>> If I omit the code for now, from user perspective, this functionality
>> belongs into mirred, don't you think? Just replace "dev" by "block" and
>> you got what you need.
>
>If we can adequately cleanup mirred,  then we can put it there but
>certainly now we are adding more buttons to click on mirred. It may
>make sense to refactor the mirred code then reuse the refactored code
>in a new action.

I don't understand why you need any new action. mirred redirect to block
instead of dev is exactly what you need. Isn't it?


>
>cheers,
>jamal
>
>>
>> >
>> >
>> >> Regarding the filtering, that could be a simple flag config of mirred
>> >> action:
>> >>
>> >> $ tc filter add block 22 protocol ip pref 25 \
>> >>   flower dst_ip 192.168.0.0/16 action mirred egress redirect block 22
>> >>   srcfilter
>> >>
>> >> Or something like that.
>> >>
>> >
>> >See my comment above.
>> >
>> >cheers,
>> >jamal
>> >> Makes sense?
>> >>
>> >>
>> >>
>> >> >
>> >> >When a packet(matching dst IP 192.168.0.0/16) arrives on the ingress of any
>> >> >of ens7, ens8 or ens9 it will be copied to all ports other than itself.
>> >> >For example, if it arrives on ens8 then a copy of the packet will be
>> >> >"blockcasted";-> to both ens7 and ens9 (unmodified), but not to ens8.
>> >> >
>> >> >Patch 1 introduces the required infra. Patch 2 exposes the tc block to the
>> >> >tc datapath and patch 3 implements datapath usage via a new tc action
>> >> >"blockcast".
>> >> >
>> >> >__Acknowledgements__
>> >> >Suggestions from Vlad Buslov and Marcelo Ricardo Leitner made this patchset
>> >> >better. The idea of integrating the ports into the tc block was suggested
>> >> >by Jiri Pirko.
>> >> >
>> >> >[1] See commit ca46abd6f89f ("Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'")
>> >> >
>> >> >Changes in v2:
>> >> >  - Remove RFC tag
>> >> >  - Add more details in patch 0(Jiri)
>> >> >  - When CONFIG_NET_TC_SKB_EXT is selected we have unused qdisc_cb
>> >> >    Reported-by: kernel test robot <lkp@intel.com> (and horms@kernel.org)
>> >> >  - Fix bad dev dereference in printk of blockcast action (Simon)
>> >> >
>> >> >Changes in v3:
>> >> >  - Add missing xa_destroy (pointed out by Vlad)
>> >> >  - Remove bugfix pointed by Vlad (will send in separate patch)
>> >> >  - Removed ports from subject in patch #2 and typos (suggested by Marcelo)
>> >> >  - Remove net_notice_ratelimited debug messages in error
>> >> >    cases (suggested by Marcelo)
>> >> >  - Minor changes to appease sparse's lock context warning
>> >> >
>> >> >Changes in v4:
>> >> >  - Avoid code repetition using gotos in cast_one (suggested by Paolo)
>> >> >  - Fix typo in cover letter (pointed out by Paolo)
>> >> >  - Create a module description for act_blockcast
>> >> >    (reported by Paolo and CI)
>> >> >
>> >> >Victor Nogueira (3):
>> >> >  net/sched: Introduce tc block netdev tracking infra
>> >> >  net/sched: cls_api: Expose tc block to the datapath
>> >> >  net/sched: act_blockcast: Introduce blockcast tc action
>> >> >
>> >> > include/net/sch_generic.h    |   8 +
>> >> > include/net/tc_wrapper.h     |   5 +
>> >> > include/uapi/linux/pkt_cls.h |   1 +
>> >> > net/sched/Kconfig            |  13 ++
>> >> > net/sched/Makefile           |   1 +
>> >> > net/sched/act_blockcast.c    | 297 +++++++++++++++++++++++++++++++++++
>> >> > net/sched/cls_api.c          |  12 +-
>> >> > net/sched/sch_api.c          |  58 +++++++
>> >> > net/sched/sch_generic.c      |  34 +++-
>> >> > 9 files changed, 426 insertions(+), 3 deletions(-)
>> >> > create mode 100644 net/sched/act_blockcast.c
>> >> >
>> >> >--
>> >> >2.25.1
>> >> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 10:20             ` Jiri Pirko
@ 2023-10-07 11:06               ` Jamal Hadi Salim
  2023-10-07 12:43                 ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-07 11:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

On Sat, Oct 7, 2023 at 6:20 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
> >On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
> >> > > I don't understand the need for configuration less here. You don't have
> >> > > it for the rest of the actions. Why this is special?
> >>
> >> +1, FWIW
> >
> >We dont have any rule that says all actions MUST have parameters ;->
> >There is nothing speacial about any action that doesnt have a
> >parameter.
>
> You are getting the configuration from the block/device the action is
> attached to. Can you point me to another action doing that?

We are entering a pedantic road i am afraid. If there is no existing
action that has zero config then consider this one the first one. We
use skb->metadata all the time as a source of information for actions,
classifiers, qdiscs. If i dont need config i dont need to invent one
just because, well, all other actions are using one or more config;->
Your suggestion to specify an extra config to select a block - which
may be different than the one the one packet on - is a useful
feature(it just adds more code) but really should be optional. i.e if
you dont specify a block id configuration then we pick the metadata
one.

> >If we can adequately cleanup mirred,  then we can put it there but
> >certainly now we are adding more buttons to click on mirred. It may
> >make sense to refactor the mirred code then reuse the refactored code
> >in a new action.
>
> I don't understand why you need any new action. mirred redirect to block
> instead of dev is exactly what you need. Isn't it?

The actions have different meanings and lumping them together then
selecting via a knob is not the right approach.
There's shared code for sure. Infact the sending code was ripped from
mirred so as not to touch the rest because like i said mirred has
since grown a couple of horns and tails. In retrospect mirred should
have been two actions with shared code - but it is too late to change
now because it is very widely used. If someone like me was afraid of
touching it is because there's a maintainance challenge. I consider it
in the same zone as trying to restructure something in the skb.
I agree mirroring to a group of ports with a simple config is a useful
feature. Mirroring to a group via a tc block is simpler but adding a
list of ports instead is more powerful. So this feature is useful to
have in mirred - just the adding of yet one more button to say "skip
this port" is my concern.
Lets see how the refactoring goes first then it will be clearer - it
is a lot of delicate work - but you are right lets give it some love
now.

cheers,
jamal


>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 11:06               ` Jamal Hadi Salim
@ 2023-10-07 12:43                 ` Jiri Pirko
  2023-10-07 14:09                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2023-10-07 12:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

Sat, Oct 07, 2023 at 01:06:43PM CEST, jhs@mojatatu.com wrote:
>On Sat, Oct 7, 2023 at 6:20 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
>> >On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >>
>> >> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
>> >> > > I don't understand the need for configuration less here. You don't have
>> >> > > it for the rest of the actions. Why this is special?
>> >>
>> >> +1, FWIW
>> >
>> >We dont have any rule that says all actions MUST have parameters ;->
>> >There is nothing speacial about any action that doesnt have a
>> >parameter.
>>
>> You are getting the configuration from the block/device the action is
>> attached to. Can you point me to another action doing that?
>
>We are entering a pedantic road i am afraid. If there is no existing
>action that has zero config then consider this one the first one. We

Nope, nothing pedantic about it. I was just curious if there's anything
out there I missed.


>use skb->metadata all the time as a source of information for actions,
>classifiers, qdiscs. If i dont need config i dont need to invent one

skb->metadata is something that is specific to a packet. That has
nothing to do with the actual configuration.


>just because, well, all other actions are using one or more config;->
>Your suggestion to specify an extra config to select a block - which
>may be different than the one the one packet on - is a useful
>feature(it just adds more code) but really should be optional. i.e if
>you dont specify a block id configuration then we pick the metadata
>one.

My primary point is, this should be mirred redirect to block instead of
what we currently have only for dev. That's it.



>
>> >If we can adequately cleanup mirred,  then we can put it there but
>> >certainly now we are adding more buttons to click on mirred. It may
>> >make sense to refactor the mirred code then reuse the refactored code
>> >in a new action.
>>
>> I don't understand why you need any new action. mirred redirect to block
>> instead of dev is exactly what you need. Isn't it?
>
>The actions have different meanings and lumping them together then
>selecting via a knob is not the right approach.
>There's shared code for sure. Infact the sending code was ripped from
>mirred so as not to touch the rest because like i said mirred has
>since grown a couple of horns and tails. In retrospect mirred should
>have been two actions with shared code - but it is too late to change
>now because it is very widely used. If someone like me was afraid of
>touching it is because there's a maintainance challenge. I consider it
>in the same zone as trying to restructure something in the skb.
>I agree mirroring to a group of ports with a simple config is a useful
>feature. Mirroring to a group via a tc block is simpler but adding a
>list of ports instead is more powerful. So this feature is useful to
>have in mirred - just the adding of yet one more button to say "skip
>this port" is my concern.

Why? Perhaps skb->iif could be used for check in the tx iteration.


>Lets see how the refactoring goes first then it will be clearer - it
>is a lot of delicate work - but you are right lets give it some love
>now.
>
>cheers,
>jamal
>
>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 12:43                 ` Jiri Pirko
@ 2023-10-07 14:09                   ` Jamal Hadi Salim
  2023-10-07 17:20                     ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-07 14:09 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Sat, Oct 07, 2023 at 01:06:43PM CEST, jhs@mojatatu.com wrote:
> >On Sat, Oct 7, 2023 at 6:20 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
> >> >On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >> >>
> >> >> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
> >> >> > > I don't understand the need for configuration less here. You don't have
> >> >> > > it for the rest of the actions. Why this is special?
> >> >>
> >> >> +1, FWIW
> >> >
> >> >We dont have any rule that says all actions MUST have parameters ;->
> >> >There is nothing speacial about any action that doesnt have a
> >> >parameter.
> >>
> >> You are getting the configuration from the block/device the action is
> >> attached to. Can you point me to another action doing that?
> >
> >We are entering a pedantic road i am afraid. If there is no existing
> >action that has zero config then consider this one the first one. We
>
> Nope, nothing pedantic about it. I was just curious if there's anything
> out there I missed.
>

Not sure if you noticed in the patch: the blockid on which the skb
arrived on is now available in the tc_cb[] so when it shows up at the
action we can just use it.

>
> >use skb->metadata all the time as a source of information for actions,
> >classifiers, qdiscs. If i dont need config i dont need to invent one
>
> skb->metadata is something that is specific to a packet. That has
> nothing to do with the actual configuration.

Essentially we turned blockid into skb metadata. A user specifying
configuration of a different blockid is certainly useful. My point is
we can have both worlds: when such a user config is missing we'll
assume a default which happens to be in the skb.

>
> >just because, well, all other actions are using one or more config;->
> >Your suggestion to specify an extra config to select a block - which
> >may be different than the one the one packet on - is a useful
> >feature(it just adds more code) but really should be optional. i.e if
> >you dont specify a block id configuration then we pick the metadata
> >one.
>
> My primary point is, this should be mirred redirect to block instead of
> what we currently have only for dev. That's it.

Agreed (and such a feature should be added regardless of this action).
The tc block provides a simple abstraction, but do you think it is
enough? Alternative is to use a list of ports given to mirred: it
allows us to group ports from different tc blocks or even just a
subset of what is in a tc block - but it will require a lot more code
to express such functionality.

>
>
> >
> >> >If we can adequately cleanup mirred,  then we can put it there but
> >> >certainly now we are adding more buttons to click on mirred. It may
> >> >make sense to refactor the mirred code then reuse the refactored code
> >> >in a new action.
> >>
> >> I don't understand why you need any new action. mirred redirect to block
> >> instead of dev is exactly what you need. Isn't it?
> >
> >The actions have different meanings and lumping them together then
> >selecting via a knob is not the right approach.
> >There's shared code for sure. Infact the sending code was ripped from
> >mirred so as not to touch the rest because like i said mirred has
> >since grown a couple of horns and tails. In retrospect mirred should
> >have been two actions with shared code - but it is too late to change
> >now because it is very widely used. If someone like me was afraid of
> >touching it is because there's a maintainance challenge. I consider it
> >in the same zone as trying to restructure something in the skb.
> >I agree mirroring to a group of ports with a simple config is a useful
> >feature. Mirroring to a group via a tc block is simpler but adding a
> >list of ports instead is more powerful. So this feature is useful to
> >have in mirred - just the adding of yet one more button to say "skip
> >this port" is my concern.
>
> Why? Perhaps skb->iif could be used for check in the tx iteration.
>

We use skb->dev->ifindex to find the exception. Is iif better?.
Jiri - but why does this have to be part of mirred::mirror? I am
asking the same question of why mirror and redirect have to be part
mirred instead of separate actions.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 14:09                   ` Jamal Hadi Salim
@ 2023-10-07 17:20                     ` Jiri Pirko
  2023-10-08 12:38                       ` Jamal Hadi Salim
  2023-10-09 20:54                       ` Marcelo Ricardo Leitner
  0 siblings, 2 replies; 20+ messages in thread
From: Jiri Pirko @ 2023-10-07 17:20 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

Sat, Oct 07, 2023 at 04:09:15PM CEST, jhs@mojatatu.com wrote:
>On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Sat, Oct 07, 2023 at 01:06:43PM CEST, jhs@mojatatu.com wrote:
>> >On Sat, Oct 7, 2023 at 6:20 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
>> >> >On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >> >>
>> >> >> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
>> >> >> > > I don't understand the need for configuration less here. You don't have
>> >> >> > > it for the rest of the actions. Why this is special?
>> >> >>
>> >> >> +1, FWIW
>> >> >
>> >> >We dont have any rule that says all actions MUST have parameters ;->
>> >> >There is nothing speacial about any action that doesnt have a
>> >> >parameter.
>> >>
>> >> You are getting the configuration from the block/device the action is
>> >> attached to. Can you point me to another action doing that?
>> >
>> >We are entering a pedantic road i am afraid. If there is no existing
>> >action that has zero config then consider this one the first one. We
>>
>> Nope, nothing pedantic about it. I was just curious if there's anything
>> out there I missed.
>>
>
>Not sure if you noticed in the patch: the blockid on which the skb
>arrived on is now available in the tc_cb[] so when it shows up at the
>action we can just use it.

I see, but does it has to be? I don't think so with the solution I'm
proposing.


>
>>
>> >use skb->metadata all the time as a source of information for actions,
>> >classifiers, qdiscs. If i dont need config i dont need to invent one
>>
>> skb->metadata is something that is specific to a packet. That has
>> nothing to do with the actual configuration.
>
>Essentially we turned blockid into skb metadata. A user specifying
>configuration of a different blockid is certainly useful. My point is
>we can have both worlds: when such a user config is missing we'll
>assume a default which happens to be in the skb.
>
>>
>> >just because, well, all other actions are using one or more config;->
>> >Your suggestion to specify an extra config to select a block - which
>> >may be different than the one the one packet on - is a useful
>> >feature(it just adds more code) but really should be optional. i.e if
>> >you dont specify a block id configuration then we pick the metadata
>> >one.
>>
>> My primary point is, this should be mirred redirect to block instead of
>> what we currently have only for dev. That's it.
>
>Agreed (and such a feature should be added regardless of this action).
>The tc block provides a simple abstraction, but do you think it is
>enough? Alternative is to use a list of ports given to mirred: it
>allows us to group ports from different tc blocks or even just a
>subset of what is in a tc block - but it will require a lot more code
>to express such functionality.

Again, you attach filter to either dev or block. If you extend mirred
redirect to accept the same 2 types of target, I think it would be best.


>
>>
>>
>> >
>> >> >If we can adequately cleanup mirred,  then we can put it there but
>> >> >certainly now we are adding more buttons to click on mirred. It may
>> >> >make sense to refactor the mirred code then reuse the refactored code
>> >> >in a new action.
>> >>
>> >> I don't understand why you need any new action. mirred redirect to block
>> >> instead of dev is exactly what you need. Isn't it?
>> >
>> >The actions have different meanings and lumping them together then
>> >selecting via a knob is not the right approach.
>> >There's shared code for sure. Infact the sending code was ripped from
>> >mirred so as not to touch the rest because like i said mirred has
>> >since grown a couple of horns and tails. In retrospect mirred should
>> >have been two actions with shared code - but it is too late to change
>> >now because it is very widely used. If someone like me was afraid of
>> >touching it is because there's a maintainance challenge. I consider it
>> >in the same zone as trying to restructure something in the skb.
>> >I agree mirroring to a group of ports with a simple config is a useful
>> >feature. Mirroring to a group via a tc block is simpler but adding a
>> >list of ports instead is more powerful. So this feature is useful to
>> >have in mirred - just the adding of yet one more button to say "skip
>> >this port" is my concern.
>>
>> Why? Perhaps skb->iif could be used for check in the tx iteration.
>>
>
>We use skb->dev->ifindex to find the exception. Is iif better?.

iif contains ifindex of the actual ingress device. So if the netdev is
part of bond for example, this still contains the original ifindex.
So I buess that this depends on what you need. Looks to me that
skb->dev->ifindex would be probaly better. It contains the netdev that
the filter is attached on, right?


>Jiri - but why does this have to be part of mirred::mirror? I am
>asking the same question of why mirror and redirect have to be part
>mirred instead of separate actions.

You have to maintain the backwards compatibility. Currently mirred is
one action right? Does not matter how you do it in kernel, user should
not tell any difference.


>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 17:20                     ` Jiri Pirko
@ 2023-10-08 12:38                       ` Jamal Hadi Salim
  2023-10-09 20:54                       ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-10-08 12:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Victor Nogueira, xiyou.wangcong, davem, pabeni,
	edumazet, mleitner, vladbu, simon.horman, pctammela, netdev,
	kernel

On Sat, Oct 7, 2023 at 1:20 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Sat, Oct 07, 2023 at 04:09:15PM CEST, jhs@mojatatu.com wrote:
> >On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Sat, Oct 07, 2023 at 01:06:43PM CEST, jhs@mojatatu.com wrote:
> >> >On Sat, Oct 7, 2023 at 6:20 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >> >>
> >> >> Sat, Oct 07, 2023 at 01:00:00AM CEST, jhs@mojatatu.com wrote:
> >> >> >On Fri, Oct 6, 2023 at 6:25 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >> >> >>
> >> >> >> On Fri, 6 Oct 2023 15:06:45 -0400 Jamal Hadi Salim wrote:
> >> >> >> > > I don't understand the need for configuration less here. You don't have
> >> >> >> > > it for the rest of the actions. Why this is special?
> >> >> >>
> >> >> >> +1, FWIW
> >> >> >
> >> >> >We dont have any rule that says all actions MUST have parameters ;->
> >> >> >There is nothing speacial about any action that doesnt have a
> >> >> >parameter.
> >> >>
> >> >> You are getting the configuration from the block/device the action is
> >> >> attached to. Can you point me to another action doing that?
> >> >
> >> >We are entering a pedantic road i am afraid. If there is no existing
> >> >action that has zero config then consider this one the first one. We
> >>
> >> Nope, nothing pedantic about it. I was just curious if there's anything
> >> out there I missed.
> >>
> >
> >Not sure if you noticed in the patch: the blockid on which the skb
> >arrived on is now available in the tc_cb[] so when it shows up at the
> >action we can just use it.
>
> I see, but does it has to be? I don't think so with the solution I'm
> proposing.

It's a simplistic use for a broadcast. We should support the one you
suggested as well.

> >
> >>
> >> >use skb->metadata all the time as a source of information for actions,
> >> >classifiers, qdiscs. If i dont need config i dont need to invent one
> >>
> >> skb->metadata is something that is specific to a packet. That has
> >> nothing to do with the actual configuration.
> >
> >Essentially we turned blockid into skb metadata. A user specifying
> >configuration of a different blockid is certainly useful. My point is
> >we can have both worlds: when such a user config is missing we'll
> >assume a default which happens to be in the skb.
> >
> >>
> >> >just because, well, all other actions are using one or more config;->
> >> >Your suggestion to specify an extra config to select a block - which
> >> >may be different than the one the one packet on - is a useful
> >> >feature(it just adds more code) but really should be optional. i.e if
> >> >you dont specify a block id configuration then we pick the metadata
> >> >one.
> >>
> >> My primary point is, this should be mirred redirect to block instead of
> >> what we currently have only for dev. That's it.
> >
> >Agreed (and such a feature should be added regardless of this action).
> >The tc block provides a simple abstraction, but do you think it is
> >enough? Alternative is to use a list of ports given to mirred: it
> >allows us to group ports from different tc blocks or even just a
> >subset of what is in a tc block - but it will require a lot more code
> >to express such functionality.
>
> Again, you attach filter to either dev or block. If you extend mirred
> redirect to accept the same 2 types of target, I think it would be best.
>

We are going to make block work with mirror, it makes sense. I am not
sure about the redirect, what is the semantic? mirror to everyone but
redirect to the last one?

>
> >
> >>
> >>
> >> >
> >> >> >If we can adequately cleanup mirred,  then we can put it there but
> >> >> >certainly now we are adding more buttons to click on mirred. It may
> >> >> >make sense to refactor the mirred code then reuse the refactored code
> >> >> >in a new action.
> >> >>
> >> >> I don't understand why you need any new action. mirred redirect to block
> >> >> instead of dev is exactly what you need. Isn't it?
> >> >
> >> >The actions have different meanings and lumping them together then
> >> >selecting via a knob is not the right approach.
> >> >There's shared code for sure. Infact the sending code was ripped from
> >> >mirred so as not to touch the rest because like i said mirred has
> >> >since grown a couple of horns and tails. In retrospect mirred should
> >> >have been two actions with shared code - but it is too late to change
> >> >now because it is very widely used. If someone like me was afraid of
> >> >touching it is because there's a maintainance challenge. I consider it
> >> >in the same zone as trying to restructure something in the skb.
> >> >I agree mirroring to a group of ports with a simple config is a useful
> >> >feature. Mirroring to a group via a tc block is simpler but adding a
> >> >list of ports instead is more powerful. So this feature is useful to
> >> >have in mirred - just the adding of yet one more button to say "skip
> >> >this port" is my concern.
> >>
> >> Why? Perhaps skb->iif could be used for check in the tx iteration.
> >>
> >
> >We use skb->dev->ifindex to find the exception. Is iif better?.
>
> iif contains ifindex of the actual ingress device. So if the netdev is
> part of bond for example, this still contains the original ifindex.
> So I buess that this depends on what you need. Looks to me that
> skb->dev->ifindex would be probaly better. It contains the netdev that
> the filter is attached on, right?
>

Note: you can use mirred to redirect to either ingress or egress of
other ports - I believe one of these ifindices changes to reflect the
new ifindex. We'll take a closer look.

>
> >Jiri - but why does this have to be part of mirred::mirror? I am
> >asking the same question of why mirror and redirect have to be part
> >mirred instead of separate actions.
>
> You have to maintain the backwards compatibility. Currently mirred is
> one action right? Does not matter how you do it in kernel, user should
> not tell any difference.

I dont mean to break existing mirred. What i meant was in retrospect i
wish i had the insight to separate mirred into two actions(and share
the code instead), it would have simplified the code and its
maintainance. It is for the same reason i am not in favor of is adding
the "skip this port" in mirror. This is in the spirit of unix
philosophy, which we have been mostly adhering to: write small
features/actions which do one thing well and stitch them together to
compose.

cheers,
jamal
>
> >
> >cheers,
> >jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-07 17:20                     ` Jiri Pirko
  2023-10-08 12:38                       ` Jamal Hadi Salim
@ 2023-10-09 20:54                       ` Marcelo Ricardo Leitner
  2023-10-10  7:41                         ` Jiri Pirko
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-10-09 20:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Jakub Kicinski, Victor Nogueira,
	xiyou.wangcong, davem, pabeni, edumazet, vladbu, simon.horman,
	pctammela, netdev, kernel

On Sat, Oct 07, 2023 at 07:20:52PM +0200, Jiri Pirko wrote:
> Sat, Oct 07, 2023 at 04:09:15PM CEST, jhs@mojatatu.com wrote:
> >On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
...
> >> My primary point is, this should be mirred redirect to block instead of
> >> what we currently have only for dev. That's it.
> >
> >Agreed (and such a feature should be added regardless of this action).
> >The tc block provides a simple abstraction, but do you think it is
> >enough? Alternative is to use a list of ports given to mirred: it
> >allows us to group ports from different tc blocks or even just a
> >subset of what is in a tc block - but it will require a lot more code
> >to express such functionality.
>
> Again, you attach filter to either dev or block. If you extend mirred
> redirect to accept the same 2 types of target, I think it would be best.

The difference here between filter and action here is that you don't
really have an option for filters: you either attach it to either dev
or block, or you create an entire new class of objects, say,
"blockfilter", all while retaining the same filters, parameters, etc.
I'm not aware of a single filter that behaves differently over a block
than a netdev.

But for actions, there is the option, and despite the fact that both
"output packets", the semantics are not that close. It actually
helps with parameter parsing, documentation (man pages), testing (as
use and test cases can be more easily tracked) and perhaps more
importantly: if I don't want this feature, I can disable the new
module.

Later someone will say "hey why not have a hash_dst_selector", so it
can implement a load balancer through the block output? And mirred,
once a simple use case (with an already complex implementation),
becomes a partial implementation of bonding then. :)

In short, I'm not sure if having the user to fiddle through a maze of
options that only work in mode A or B or work differently is better
than having more specialized actions (which can and should reuse code).

  Marcelo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-09 20:54                       ` Marcelo Ricardo Leitner
@ 2023-10-10  7:41                         ` Jiri Pirko
  2023-10-10 11:54                           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2023-10-10  7:41 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Jamal Hadi Salim, Jakub Kicinski, Victor Nogueira,
	xiyou.wangcong, davem, pabeni, edumazet, vladbu, simon.horman,
	pctammela, netdev, kernel

Mon, Oct 09, 2023 at 10:54:07PM CEST, mleitner@redhat.com wrote:
>On Sat, Oct 07, 2023 at 07:20:52PM +0200, Jiri Pirko wrote:
>> Sat, Oct 07, 2023 at 04:09:15PM CEST, jhs@mojatatu.com wrote:
>> >On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
>...
>> >> My primary point is, this should be mirred redirect to block instead of
>> >> what we currently have only for dev. That's it.
>> >
>> >Agreed (and such a feature should be added regardless of this action).
>> >The tc block provides a simple abstraction, but do you think it is
>> >enough? Alternative is to use a list of ports given to mirred: it
>> >allows us to group ports from different tc blocks or even just a
>> >subset of what is in a tc block - but it will require a lot more code
>> >to express such functionality.
>>
>> Again, you attach filter to either dev or block. If you extend mirred
>> redirect to accept the same 2 types of target, I think it would be best.
>
>The difference here between filter and action here is that you don't
>really have an option for filters: you either attach it to either dev
>or block, or you create an entire new class of objects, say,
>"blockfilter", all while retaining the same filters, parameters, etc.
>I'm not aware of a single filter that behaves differently over a block
>than a netdev.

Why do you talk about different behaviour? Nobody suggested that. I have
no idea what you mean by "blockfilter".



>
>But for actions, there is the option, and despite the fact that both

Which option?


>"output packets", the semantics are not that close. It actually
>helps with parameter parsing, documentation (man pages), testing (as
>use and test cases can be more easily tracked) and perhaps more
>importantly: if I don't want this feature, I can disable the new
>module.
>
>Later someone will say "hey why not have a hash_dst_selector", so it
>can implement a load balancer through the block output? And mirred,
>once a simple use case (with an already complex implementation),
>becomes a partial implementation of bonding then. :)
>
>In short, I'm not sure if having the user to fiddle through a maze of
>options that only work in mode A or B or work differently is better
>than having more specialized actions (which can and should reuse code).

Sure, you can have "blockmirredredirect" that would only to the
redirection for block target. No problem. I don't see why it can't be
just a case of mirred redirect, but if people want that separate, why
not.

My problem with the action "blockcast" is that it somehow works with
configuration of an entity the filter is attached to:
blockX->filterY->blockcastZ

Z goes all the way down to blockX to figure out where to redirect the
packet. If that is not odd, then I don't know what else is.

Has other consequences, like what happens if the filter is not attached
to block, but dev directly? What happens is blockcast action is shared
among filter? Etc.

Configuration should be action property. That is my point.



>
>  Marcelo
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use
  2023-10-10  7:41                         ` Jiri Pirko
@ 2023-10-10 11:54                           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Ricardo Leitner @ 2023-10-10 11:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Jakub Kicinski, Victor Nogueira,
	xiyou.wangcong, davem, pabeni, edumazet, vladbu, simon.horman,
	pctammela, netdev, kernel

On Tue, Oct 10, 2023 at 09:41:11AM +0200, Jiri Pirko wrote:
> Mon, Oct 09, 2023 at 10:54:07PM CEST, mleitner@redhat.com wrote:
> >On Sat, Oct 07, 2023 at 07:20:52PM +0200, Jiri Pirko wrote:
> >> Sat, Oct 07, 2023 at 04:09:15PM CEST, jhs@mojatatu.com wrote:
> >> >On Sat, Oct 7, 2023 at 8:43 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >...
> >> >> My primary point is, this should be mirred redirect to block instead of
> >> >> what we currently have only for dev. That's it.
> >> >
> >> >Agreed (and such a feature should be added regardless of this action).
> >> >The tc block provides a simple abstraction, but do you think it is
> >> >enough? Alternative is to use a list of ports given to mirred: it
> >> >allows us to group ports from different tc blocks or even just a
> >> >subset of what is in a tc block - but it will require a lot more code
> >> >to express such functionality.
> >>
> >> Again, you attach filter to either dev or block. If you extend mirred
> >> redirect to accept the same 2 types of target, I think it would be best.
> >
> >The difference here between filter and action here is that you don't
> >really have an option for filters: you either attach it to either dev
> >or block, or you create an entire new class of objects, say,
> >"blockfilter", all while retaining the same filters, parameters, etc.
> >I'm not aware of a single filter that behaves differently over a block
> >than a netdev.
>
> Why do you talk about different behaviour? Nobody suggested that. I have

Because if mirred gets updated to support blocks, that's what will
happen. It will have options that only make sense for netdev or for
blocks.

> no idea what you mean by "blockfilter".

Well, it's described above. It was just an example.

>
>
>
> >
> >But for actions, there is the option, and despite the fact that both
>
> Which option?

To create a new action.

>
>
> >"output packets", the semantics are not that close. It actually
> >helps with parameter parsing, documentation (man pages), testing (as
> >use and test cases can be more easily tracked) and perhaps more
> >importantly: if I don't want this feature, I can disable the new
> >module.
> >
> >Later someone will say "hey why not have a hash_dst_selector", so it
> >can implement a load balancer through the block output? And mirred,
> >once a simple use case (with an already complex implementation),
> >becomes a partial implementation of bonding then. :)
> >
> >In short, I'm not sure if having the user to fiddle through a maze of
> >options that only work in mode A or B or work differently is better
> >than having more specialized actions (which can and should reuse code).
>
> Sure, you can have "blockmirredredirect" that would only to the
> redirection for block target. No problem. I don't see why it can't be
> just a case of mirred redirect, but if people want that separate, why
> not.
>
> My problem with the action "blockcast" is that it somehow works with
> configuration of an entity the filter is attached to:
> blockX->filterY->blockcastZ
>
> Z goes all the way down to blockX to figure out where to redirect the
> packet. If that is not odd, then I don't know what else is.

Maybe fried eggs with chocolate. :-D
Just joking..

>
> Has other consequences, like what happens if the filter is not attached
> to block, but dev directly? What happens is blockcast action is shared
> among filter? Etc.

Good points!

>
> Configuration should be action property. That is my point.

Makes sense. Whoever is adding such filter, already knows the block,
and can add the parameter to the action as well. Making it automatic
is not helping out that much in the end.

  Marcelo


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-10-10 11:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-05 18:42 [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Victor Nogueira
2023-10-05 18:42 ` [PATCH net-next v4 1/3] net/sched: Introduce tc block netdev tracking infra Victor Nogueira
2023-10-05 18:42 ` [PATCH net-next v4 2/3] net/sched: cls_api: Expose tc block to the datapath Victor Nogueira
2023-10-05 18:42 ` [PATCH net-next v4 3/3] net/sched: act_blockcast: Introduce blockcast tc action Victor Nogueira
2023-10-06 12:59 ` [PATCH net-next v4 0/3] net/sched: Introduce tc block ports tracking and use Jiri Pirko
2023-10-06 15:37   ` Jamal Hadi Salim
2023-10-06 16:50     ` Jiri Pirko
2023-10-06 19:06       ` Jamal Hadi Salim
2023-10-06 22:25         ` Jakub Kicinski
2023-10-06 23:00           ` Jamal Hadi Salim
2023-10-07 10:20             ` Jiri Pirko
2023-10-07 11:06               ` Jamal Hadi Salim
2023-10-07 12:43                 ` Jiri Pirko
2023-10-07 14:09                   ` Jamal Hadi Salim
2023-10-07 17:20                     ` Jiri Pirko
2023-10-08 12:38                       ` Jamal Hadi Salim
2023-10-09 20:54                       ` Marcelo Ricardo Leitner
2023-10-10  7:41                         ` Jiri Pirko
2023-10-10 11:54                           ` Marcelo Ricardo Leitner
2023-10-07 10:22         ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.