All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/3] TC: Introduce qevents
@ 2020-05-26 17:10 Petr Machata
  2020-05-26 17:10 ` [RFC PATCH net-next 1/3] net: sched: Introduce helpers for qevent blocks Petr Machata
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

The Spectrum hardware allows execution of one of several actions as a
result of queue management events: tail-dropping, early-dropping, marking a
packet, or passing a configured latency threshold or buffer size. Such
packets can be mirrored, trapped, or sampled.

Modeling the action to be taken as simply a TC action is very attractive,
but it is not obvious where to put these actions. At least with ECN marking
one could imagine a tree of qdiscs and classifiers that effectively
accomplishes this task, albeit in an impractically complex manner. But
there is just no way to match on dropped-ness of a packet, let alone
dropped-ness due to a particular reason.

To allow configuring user-defined actions as a result of inner workings of
a qdisc, this patch set introduces a concept of qevents. Those are attach
points for TC blocks, where filters can be put that are executed as the
packet hits well-defined points in the qdisc algorithms. The attached
blocks can be shared, in a manner similar to clsact ingress and egress
blocks, arbitrary classifiers with arbitrary actions can be put on them,
etc.

For example:

# tc qdisc add dev eth0 root handle 1: \
	red limit 500K avpkt 1K qevent early block 10
# tc filter add block 10 \
	matchall action mirred egress mirror dev eth1

Patch #1 of this set introduces several helpers to allow easy and uniform
addition of qevents to qdiscs. The following two patches, #2 and #3, then
add two qevents to the RED qdisc: "early" qevent fires when a packet is
early-dropped; "mark" qevent, when it is ECN-marked.

This patch set does not deal with offloading. The idea there is that a
driver will be able to figure out that a given block is used in qevent
context by looking at binder type. A future patch-set will add a qdisc
pointer to struct flow_block_offload, which a driver will be able to
consult to glean the TC or other relevant attributes.

Petr Machata (3):
  net: sched: Introduce helpers for qevent blocks
  net: sched: sch_red: Split init and change callbacks
  net: sched: sch_red: Add qevents "early" and "mark"

 include/net/flow_offload.h     |   2 +
 include/net/pkt_cls.h          |  48 +++++++++++++++
 include/uapi/linux/pkt_sched.h |   2 +
 net/sched/cls_api.c            | 107 +++++++++++++++++++++++++++++++++
 net/sched/sch_red.c            | 100 ++++++++++++++++++++++++++----
 5 files changed, 247 insertions(+), 12 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 1/3] net: sched: Introduce helpers for qevent blocks
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
@ 2020-05-26 17:10 ` Petr Machata
  2020-05-26 17:10 ` [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks Petr Machata
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

Qevents are attach points for TC blocks, where filters can be put that are
executed when "interesting events" take place in a qdisc. The data to keep
and the functions to invoke to maintain a qevent will be largely the same
between qevents. Therefore introduce sched-wide helpers for qevent
management.

Currently, similarly to ingress and egress blocks of clsact pseudo-qdisc,
blocks attachment cannot be changed after the qdisc is created. To that
end, add a helper tcf_qevent_validate_change(), which verifies whether
block index attribute is not attached, or if it is, whether its value
matches the current one (i.e. there is no material change).

The function tcf_qevent_handle() is supposed to be invoked when qdisc hits
the "interesting event" corresponding to this block.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 include/net/pkt_cls.h |  48 +++++++++++++++++++
 net/sched/cls_api.c   | 107 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ed65619cbc47..efb20e3c2c98 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -32,6 +32,13 @@ struct tcf_block_ext_info {
 	u32 block_index;
 };
 
+struct tcf_qevent {
+	int			attr_name;
+	struct tcf_block	*block;
+	struct tcf_block_ext_info info;
+	struct tcf_proto __rcu *filter_chain;
+};
+
 struct tcf_block_cb;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
@@ -552,6 +559,47 @@ int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
 			  void *cb_priv, u32 *flags, unsigned int *in_hw_count);
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
+#ifdef CONFIG_NET_CLS_ACT
+int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+		    enum flow_block_binder_type binder_type,
+		    struct nlattr *block_index_attr,
+		    struct netlink_ext_ack *extack);
+void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch);
+int tcf_qevent_validate_change(struct tcf_qevent *qe,
+			       struct nlattr *block_index_attr,
+			       struct netlink_ext_ack *extack);
+struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch,
+				  struct sk_buff *skb, struct sk_buff **to_free,
+				  int *ret);
+#else
+static inline int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+				  enum flow_block_binder_type binder_type,
+				  struct nlattr *block_index_attr,
+				  struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static inline void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch)
+{
+}
+
+static inline int tcf_qevent_validate_change(struct tcf_qevent *qe,
+					     struct nlattr *block_index_attr,
+					     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static inline struct sk_buff *
+tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch,
+		  struct sk_buff *skb, struct sk_buff **to_free,
+		  int *ret)
+{
+	return skb;
+}
+#endif
+
 struct tc_cls_u32_knode {
 	struct tcf_exts *exts;
 	struct tcf_result *res;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 752d608f4442..f95a5eee9279 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3801,6 +3801,113 @@ unsigned int tcf_exts_num_actions(struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_num_actions);
 
+static int tcf_qevent_parse_block_index(struct nlattr *block_index_attr,
+					u32 *p_block_index,
+					struct netlink_ext_ack *extack)
+{
+	*p_block_index = nla_get_u32(block_index_attr);
+	if (!*p_block_index) {
+		NL_SET_ERR_MSG(extack, "Block number may not be zero");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+		    enum flow_block_binder_type binder_type,
+		    struct nlattr *block_index_attr,
+		    struct netlink_ext_ack *extack)
+{
+	u32 block_index;
+	int err;
+
+	if (!block_index_attr)
+		return 0;
+
+	err = tcf_qevent_parse_block_index(block_index_attr, &block_index,
+					   extack);
+	if (err)
+		return err;
+
+	if (!block_index)
+		return 0;
+
+	qe->info.binder_type = binder_type;
+	qe->info.chain_head_change = tcf_chain_head_change_dflt;
+	qe->info.chain_head_change_priv = &qe->filter_chain;
+	qe->info.block_index = block_index;
+
+	return tcf_block_get_ext(&qe->block, sch, &qe->info, extack);
+}
+EXPORT_SYMBOL(tcf_qevent_init);
+
+void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch)
+{
+	if (qe->info.block_index)
+		tcf_block_put_ext(qe->block, sch, &qe->info);
+}
+EXPORT_SYMBOL(tcf_qevent_destroy);
+
+int tcf_qevent_validate_change(struct tcf_qevent *qe,
+			       struct nlattr *block_index_attr,
+			       struct netlink_ext_ack *extack)
+{
+	u32 block_index;
+	int err;
+
+	if (!block_index_attr)
+		return 0;
+
+	err = tcf_qevent_parse_block_index(block_index_attr, &block_index,
+					   extack);
+	if (err)
+		return err;
+
+	/* Bounce newly-configured block or change in block. */
+	if (block_index != qe->info.block_index) {
+		NL_SET_ERR_MSG(extack, "Change of blocks is not supported");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(tcf_qevent_validate_change);
+
+struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch,
+				  struct sk_buff *skb, struct sk_buff **to_free,
+				  int *ret)
+{
+	struct tcf_result cl_res;
+	struct tcf_proto *fl;
+
+	if (!qe->info.block_index)
+		return skb;
+
+	fl = rcu_dereference_bh(qe->filter_chain);
+
+	switch (tcf_classify(skb, fl, &cl_res, false)) {
+	case TC_ACT_SHOT:
+		qdisc_qstats_drop(sch);
+		__qdisc_drop(skb, to_free);
+		*ret = __NET_XMIT_BYPASS;
+		return NULL;
+	case TC_ACT_STOLEN:
+	case TC_ACT_QUEUED:
+	case TC_ACT_TRAP:
+		__qdisc_drop(skb, to_free);
+		*ret = __NET_XMIT_STOLEN;
+		return NULL;
+	case TC_ACT_REDIRECT:
+		skb_do_redirect(skb);
+		*ret = __NET_XMIT_STOLEN;
+		return NULL;
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(tcf_qevent_handle);
+
 static __net_init int tcf_net_init(struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
  2020-05-26 17:10 ` [RFC PATCH net-next 1/3] net: sched: Introduce helpers for qevent blocks Petr Machata
@ 2020-05-26 17:10 ` Petr Machata
  2020-05-26 18:32   ` Jakub Kicinski
  2020-05-26 17:10 ` [RFC PATCH net-next 3/3] net: sched: sch_red: Add qevents "early" and "mark" Petr Machata
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

In the following patches, RED will get two qevents. The implementation will
be clearer if the callback for change is not a pure subset of the callback
for init. Split the two and promote attribute parsing to the callbacks
themselves from the common code, because it will be handy there.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 net/sched/sch_red.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 555a1b9e467f..c52a40ad5e59 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -215,12 +215,11 @@ static const struct nla_policy red_policy[TCA_RED_MAX + 1] = {
 	[TCA_RED_FLAGS] = NLA_POLICY_BITFIELD32(TC_RED_SUPPORTED_FLAGS),
 };
 
-static int red_change(struct Qdisc *sch, struct nlattr *opt,
-		      struct netlink_ext_ack *extack)
+static int __red_change(struct Qdisc *sch, struct nlattr **tb,
+			struct netlink_ext_ack *extack)
 {
 	struct Qdisc *old_child = NULL, *child = NULL;
 	struct red_sched_data *q = qdisc_priv(sch);
-	struct nlattr *tb[TCA_RED_MAX + 1];
 	struct nla_bitfield32 flags_bf;
 	struct tc_red_qopt *ctl;
 	unsigned char userbits;
@@ -228,14 +227,6 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
 	int err;
 	u32 max_P;
 
-	if (opt == NULL)
-		return -EINVAL;
-
-	err = nla_parse_nested_deprecated(tb, TCA_RED_MAX, opt, red_policy,
-					  NULL);
-	if (err < 0)
-		return err;
-
 	if (tb[TCA_RED_PARMS] == NULL ||
 	    tb[TCA_RED_STAB] == NULL)
 		return -EINVAL;
@@ -323,11 +314,39 @@ static int red_init(struct Qdisc *sch, struct nlattr *opt,
 		    struct netlink_ext_ack *extack)
 {
 	struct red_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_RED_MAX + 1];
+	int err;
+
+	if (!opt)
+		return -EINVAL;
+
+	err = nla_parse_nested_deprecated(tb, TCA_RED_MAX, opt, red_policy,
+					  extack);
+	if (err < 0)
+		return err;
 
 	q->qdisc = &noop_qdisc;
 	q->sch = sch;
 	timer_setup(&q->adapt_timer, red_adaptative_timer, 0);
-	return red_change(sch, opt, extack);
+	return __red_change(sch, tb, extack);
+}
+
+static int red_change(struct Qdisc *sch, struct nlattr *opt,
+		      struct netlink_ext_ack *extack)
+{
+	struct red_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_RED_MAX + 1];
+	int err;
+
+	if (!opt)
+		return -EINVAL;
+
+	err = nla_parse_nested_deprecated(tb, TCA_RED_MAX, opt, red_policy,
+					  extack);
+	if (err < 0)
+		return err;
+
+	return __red_change(sch, tb, extack);
 }
 
 static int red_dump_offload_stats(struct Qdisc *sch)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 3/3] net: sched: sch_red: Add qevents "early" and "mark"
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
  2020-05-26 17:10 ` [RFC PATCH net-next 1/3] net: sched: Introduce helpers for qevent blocks Petr Machata
  2020-05-26 17:10 ` [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks Petr Machata
@ 2020-05-26 17:10 ` Petr Machata
  2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

In order to allow acting on dropped and/or ECN-marked packets, add two new
qevents to the RED qdisc: "early" and "mark". Filters attached at "early"
block are executed as packets are early-dropped, those attached at the
"mark" block are executed as packets are ECN-marked.

Two new attributes are introduced: TCA_RED_EARLY_BLOCK with the block index
for the "early" qevent, and TCA_RED_MARK_BLOCK for the "mark" qevent.
Absence of these attributes signifies "don't care": no block is allocated
in that case, or the existing blocks are left intact in case of the change
callback.

For purposes of offloading, blocks attached to these qevents appear with
newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY and
FLOW_BLOCK_BINDER_TYPE_RED_MARK.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 include/net/flow_offload.h     |  2 ++
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_red.c            | 59 +++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 4001ffb04f0d..635d2bb57550 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -416,6 +416,8 @@ enum flow_block_binder_type {
 	FLOW_BLOCK_BINDER_TYPE_UNSPEC,
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
+	FLOW_BLOCK_BINDER_TYPE_RED_EARLY,
+	FLOW_BLOCK_BINDER_TYPE_RED_MARK,
 };
 
 struct flow_block {
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a95f3ae7ab37..ff3f4830e049 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -257,6 +257,8 @@ enum {
 	TCA_RED_STAB,
 	TCA_RED_MAX_P,
 	TCA_RED_FLAGS,		/* bitfield32 */
+	TCA_RED_EARLY_BLOCK,	/* u32 */
+	TCA_RED_MARK_BLOCK,	/* u32 */
 	__TCA_RED_MAX,
 };
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index c52a40ad5e59..0c6a6429ca02 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -46,6 +46,8 @@ struct red_sched_data {
 	struct red_vars		vars;
 	struct red_stats	stats;
 	struct Qdisc		*qdisc;
+	struct tcf_qevent	qe_early;
+	struct tcf_qevent	qe_mark;
 };
 
 #define TC_RED_SUPPORTED_FLAGS (TC_RED_HISTORIC_FLAGS | TC_RED_NODROP)
@@ -92,6 +94,10 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		if (INET_ECN_set_ce(skb)) {
 			q->stats.prob_mark++;
+			skb = tcf_qevent_handle(&q->qe_mark, sch,
+						skb, to_free, &ret);
+			if (!skb)
+				return NET_XMIT_CN | ret;
 		} else if (!red_use_nodrop(q)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
@@ -109,6 +115,10 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 		if (INET_ECN_set_ce(skb)) {
 			q->stats.forced_mark++;
+			skb = tcf_qevent_handle(&q->qe_mark, sch,
+						skb, to_free, &ret);
+			if (!skb)
+				return NET_XMIT_CN | ret;
 		} else if (!red_use_nodrop(q)) {
 			q->stats.forced_drop++;
 			goto congestion_drop;
@@ -129,6 +139,11 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return ret;
 
 congestion_drop:
+	skb = tcf_qevent_handle(&q->qe_early, sch,
+				skb, to_free, &ret);
+	if (!skb)
+		return NET_XMIT_CN | ret;
+
 	qdisc_drop(skb, sch, to_free);
 	return NET_XMIT_CN;
 }
@@ -202,6 +217,8 @@ static void red_destroy(struct Qdisc *sch)
 {
 	struct red_sched_data *q = qdisc_priv(sch);
 
+	tcf_qevent_destroy(&q->qe_mark, sch);
+	tcf_qevent_destroy(&q->qe_early, sch);
 	del_timer_sync(&q->adapt_timer);
 	red_offload(sch, false);
 	qdisc_put(q->qdisc);
@@ -213,6 +230,8 @@ static const struct nla_policy red_policy[TCA_RED_MAX + 1] = {
 	[TCA_RED_STAB]	= { .len = RED_STAB_SIZE },
 	[TCA_RED_MAX_P] = { .type = NLA_U32 },
 	[TCA_RED_FLAGS] = NLA_POLICY_BITFIELD32(TC_RED_SUPPORTED_FLAGS),
+	[TCA_RED_EARLY_BLOCK] = { .type = NLA_U32 },
+	[TCA_RED_MARK_BLOCK] = { .type = NLA_U32 },
 };
 
 static int __red_change(struct Qdisc *sch, struct nlattr **tb,
@@ -328,7 +347,35 @@ static int red_init(struct Qdisc *sch, struct nlattr *opt,
 	q->qdisc = &noop_qdisc;
 	q->sch = sch;
 	timer_setup(&q->adapt_timer, red_adaptative_timer, 0);
-	return __red_change(sch, tb, extack);
+
+	q->qe_early.attr_name = TCA_RED_EARLY_BLOCK;
+	q->qe_mark.attr_name = TCA_RED_MARK_BLOCK;
+
+	err = __red_change(sch, tb, extack);
+	if (err)
+		return err;
+
+	err = tcf_qevent_init(&q->qe_early, sch,
+			      FLOW_BLOCK_BINDER_TYPE_RED_EARLY,
+			      tb[TCA_RED_EARLY_BLOCK], extack);
+	if (err)
+		goto err_early_init;
+
+	err = tcf_qevent_init(&q->qe_mark, sch,
+			      FLOW_BLOCK_BINDER_TYPE_RED_MARK,
+			      tb[TCA_RED_MARK_BLOCK], extack);
+	if (err)
+		goto err_mark_init;
+
+	return 0;
+
+err_mark_init:
+	tcf_qevent_destroy(&q->qe_early, sch);
+err_early_init:
+	del_timer_sync(&q->adapt_timer);
+	red_offload(sch, false);
+	qdisc_put(q->qdisc);
+	return err;
 }
 
 static int red_change(struct Qdisc *sch, struct nlattr *opt,
@@ -346,6 +393,16 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
 	if (err < 0)
 		return err;
 
+	err = tcf_qevent_validate_change(&q->qe_early, tb[TCA_RED_EARLY_BLOCK],
+					 extack);
+	if (err)
+		return err;
+
+	err = tcf_qevent_validate_change(&q->qe_mark, tb[TCA_RED_MARK_BLOCK],
+					 extack);
+	if (err)
+		return err;
+
 	return __red_change(sch, tb, extack);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
                   ` (2 preceding siblings ...)
  2020-05-26 17:10 ` [RFC PATCH net-next 3/3] net: sched: sch_red: Add qevents "early" and "mark" Petr Machata
@ 2020-05-26 17:10 ` Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 2/4] tc: Add helpers to support qevent parsing Petr Machata
                     ` (2 more replies)
  2020-05-27  4:09 ` [RFC PATCH net-next 0/3] TC: Introduce qevents Cong Wang
  2020-05-27 15:19 ` Vladimir Oltean
  5 siblings, 3 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 include/uapi/linux/pkt_sched.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 0c02737c..203272df 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -257,6 +257,8 @@ enum {
 	TCA_RED_STAB,
 	TCA_RED_MAX_P,
 	TCA_RED_FLAGS,		/* bitfield32 */
+	TCA_RED_EARLY_BLOCK,	/* u32 */
+	TCA_RED_MARK_BLOCK,	/* u32 */
 	__TCA_RED_MAX,
 };
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH iproute2-next 2/4] tc: Add helpers to support qevent parsing
  2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
@ 2020-05-26 17:10   ` Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 3/4] man: tc: Describe qevents Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 4/4] tc: q_red: Add support for qevents "mark" and "early" Petr Machata
  2 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

Introduce a set of helpers to make it easy to add support for qevents into
qdisc.

The idea behind this is that qevent types will be generally reused between
qdiscs, rather than each having a completely idiosyncratic set of qevents.
The qevent module holds functions for parsing of these common qevent types,
and for dispatching to one of the parsers based on the qevent name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 tc/Makefile    |  1 +
 tc/tc_qevent.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tc/tc_qevent.h | 34 +++++++++++++++++++++
 3 files changed, 118 insertions(+)
 create mode 100644 tc/tc_qevent.c
 create mode 100644 tc/tc_qevent.h

diff --git a/tc/Makefile b/tc/Makefile
index e31cbc12..10567f33 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -121,6 +121,7 @@ TCLIB += tc_red.o
 TCLIB += tc_cbq.o
 TCLIB += tc_estimator.o
 TCLIB += tc_stab.o
+TCLIB += tc_qevent.o
 
 CFLAGS += -DCONFIG_GACT -DCONFIG_GACT_PROB
 ifneq ($(IPT_LIB_DIR),)
diff --git a/tc/tc_qevent.c b/tc/tc_qevent.c
new file mode 100644
index 00000000..f2b22778
--- /dev/null
+++ b/tc/tc_qevent.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+
+/*
+ * Helpers for handling qevents.
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include "tc_qevent.h"
+#include "utils.h"
+
+int qevent_parse(struct qevent_util *qevents, int *p_argc, char ***p_argv)
+{
+	char **argv = *p_argv;
+	int argc = *p_argc;
+	const char *name = *argv;
+	int err;
+
+	for (; qevents->id; qevents++) {
+		if (strcmp(name, qevents->id) == 0) {
+			NEXT_ARG();
+			err = qevents->parse_qevent(&argc, &argv,
+						    qevents->data);
+			if (err)
+				return err;
+
+			*p_argc = argc;
+			*p_argv = argv;
+			return 0;
+		}
+	}
+
+	fprintf(stderr, "Unknown qevent `%s'\n", name);
+	return -1;
+}
+
+static int parse_block_idx(const char *arg, struct qevent_base *qeb)
+{
+	if (qeb->block_idx) {
+		fprintf(stderr, "Qevent block index already specified\n");
+		return -1;
+	}
+
+	if (get_unsigned(&qeb->block_idx, arg, 10) || !qeb->block_idx) {
+		fprintf(stderr, "Illegal qevent block index\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+int qevent_parse_plain(int *p_argc, char ***p_argv, void *p_data)
+{
+	struct qevent_plain *qe = p_data;
+	char **argv = *p_argv;
+	int argc = *p_argc;
+
+	if (qe->base.block_idx) {
+		fprintf(stderr, "Duplicate qevent\n");
+		return -1;
+	}
+
+	while (argc > 0) {
+		if (strcmp(*argv, "block") == 0) {
+			NEXT_ARG();
+			if (parse_block_idx(*argv, &qe->base))
+				return -1;
+		} else {
+			break;
+		}
+		NEXT_ARG_FWD();
+	}
+
+	if (!qe->base.block_idx) {
+		fprintf(stderr, "Unspecified qevent block index\n");
+		return -1;
+	}
+
+	*p_argc = argc;
+	*p_argv = argv;
+	return 0;
+}
diff --git a/tc/tc_qevent.h b/tc/tc_qevent.h
new file mode 100644
index 00000000..548f3fc6
--- /dev/null
+++ b/tc/tc_qevent.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TC_QEVENT_H_
+#define _TC_QEVENT_H_
+
+#include <linux/types.h>
+
+struct qevent_base {
+	__u32 block_idx;
+};
+
+struct qevent_util {
+	const char *id;
+	int (*parse_qevent)(int *argc, char ***argv, void *data);
+	void *data;
+};
+
+#define QEVENT(_name, _parser, _data)					\
+	{								\
+		.id = _name,						\
+		.parse_qevent = qevent_parse_##_parser,			\
+		.data = ({						\
+				struct qevent_##_parser *__data = _data; \
+				__data;					\
+		}),							\
+	}
+
+int qevent_parse(struct qevent_util *qevents, int *p_argc, char ***p_argv);
+
+struct qevent_plain {
+	struct qevent_base base;
+};
+int qevent_parse_plain(int *p_argc, char ***p_argv, void *p_data);
+
+#endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH iproute2-next 3/4] man: tc: Describe qevents
  2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 2/4] tc: Add helpers to support qevent parsing Petr Machata
@ 2020-05-26 17:10   ` Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 4/4] tc: q_red: Add support for qevents "mark" and "early" Petr Machata
  2 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

Add some general remarks about qevents.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 man/man8/tc.8 | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index e8e0cd0f..970440f6 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -254,6 +254,25 @@ Traffic control filter that matches every packet. See
 .BR tc-matchall (8)
 for details.
 
+.SH QEVENTS
+Qdiscs may invoke user-configured actions when certain interesting events
+take place in the qdisc. Each qevent can either be unused, or can have a
+block attached to it. To this block are then attached filters using the "tc
+block BLOCK_IDX" syntax. The block is executed when the qevent associated
+with the attachment point takes place. For example, packet could be
+dropped, or delayed, etc., depending on the qdisc and the qevent in
+question.
+
+For example:
+.PP
+.RS
+tc qdisc add dev eth0 root handle 1: red limit 500K avpkt 1K \\
+   qevent early block 10
+.RE
+.RS
+tc filter add block 10 matchall action mirred egress mirror dev eth1
+.RE
+
 .SH CLASSLESS QDISCS
 The classless qdiscs are:
 .TP
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH iproute2-next 4/4] tc: q_red: Add support for qevents "mark" and "early"
  2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 2/4] tc: Add helpers to support qevent parsing Petr Machata
  2020-05-26 17:10   ` [RFC PATCH iproute2-next 3/4] man: tc: Describe qevents Petr Machata
@ 2020-05-26 17:10   ` Petr Machata
  2 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-26 17:10 UTC (permalink / raw)
  To: netdev; +Cc: Jakub Kicinski, Eric Dumazet, jhs, jiri, idosch, Petr Machata

The "early" qevent matches packets that have been early-dropped. The
"mark" qevent matches packets that have been ECN-marked.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 man/man8/tc-red.8 | 18 +++++++++++++++++-
 tc/q_red.c        | 23 ++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-red.8 b/man/man8/tc-red.8
index b5aaa986..e74dd330 100644
--- a/man/man8/tc-red.8
+++ b/man/man8/tc-red.8
@@ -17,7 +17,11 @@ packets
 rate
 .B ] [ probability
 chance
-.B ] [ adaptive ]
+.B ] [ adaptive ] [ qevent early block
+index
+.B ] [ qevent mark block
+index
+.B ]
 
 .SH DESCRIPTION
 Random Early Detection is a classless qdisc which manages its queue size
@@ -134,6 +138,18 @@ Goal of Adaptive RED is to make 'probability' dynamic value between 1% and 50% t
 .B (max - min) / 2
 .fi
 
+.SH QEVENTS
+See tc (8) for some general notes about qevents. The RED qdisc supports the
+following qevents:
+
+.TP
+early
+The associated block is executed when packets are early-dropped. This includes
+non-ECT packets in ECN mode.
+.TP
+mark
+The associated block is executed when packets are marked in ECN mode.
+
 .SH EXAMPLE
 
 .P
diff --git a/tc/q_red.c b/tc/q_red.c
index 53181c82..7e7dfa05 100644
--- a/tc/q_red.c
+++ b/tc/q_red.c
@@ -22,6 +22,7 @@
 
 #include "utils.h"
 #include "tc_util.h"
+#include "tc_qevent.h"
 
 #include "tc_red.h"
 
@@ -30,7 +31,8 @@ static void explain(void)
 	fprintf(stderr,
 		"Usage: ... red	limit BYTES [min BYTES] [max BYTES] avpkt BYTES [burst PACKETS]\n"
 		"		[adaptive] [probability PROBABILITY] [bandwidth KBPS]\n"
-		"		[ecn] [harddrop] [nodrop]\n");
+		"		[ecn] [harddrop] [nodrop]\n"
+		"		[qevent early block IDX] [qevent mark block IDX]\n");
 }
 
 #define RED_SUPPORTED_FLAGS (TC_RED_HISTORIC_FLAGS | TC_RED_NODROP)
@@ -38,6 +40,14 @@ static void explain(void)
 static int red_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			 struct nlmsghdr *n, const char *dev)
 {
+	struct qevent_plain qe_early = {};
+	struct qevent_plain qe_mark = {};
+	struct qevent_util qevents[] = {
+		QEVENT("early", plain, &qe_early),
+		QEVENT("mark", plain, &qe_mark),
+		{},
+	};
+
 	struct nla_bitfield32 flags_bf = {
 		.selector = RED_SUPPORTED_FLAGS,
 	};
@@ -109,6 +119,11 @@ static int red_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			flags_bf.value |= TC_RED_ADAPTATIVE;
 		} else if (strcmp(*argv, "adaptive") == 0) {
 			flags_bf.value |= TC_RED_ADAPTATIVE;
+		} else if (matches(*argv, "qevent") == 0) {
+			NEXT_ARG();
+			if (qevent_parse(qevents, &argc, &argv))
+				return -1;
+			continue;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -162,6 +177,12 @@ static int red_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	max_P = probability * pow(2, 32);
 	addattr_l(n, 1024, TCA_RED_MAX_P, &max_P, sizeof(max_P));
 	addattr_l(n, 1024, TCA_RED_FLAGS, &flags_bf, sizeof(flags_bf));
+	if (qe_early.base.block_idx)
+		addattr32(n, 1024, TCA_RED_EARLY_BLOCK,
+			  qe_early.base.block_idx);
+	if (qe_mark.base.block_idx)
+		addattr32(n, 1024, TCA_RED_MARK_BLOCK,
+			  qe_mark.base.block_idx);
 	addattr_nest_end(n, tail);
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks
  2020-05-26 17:10 ` [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks Petr Machata
@ 2020-05-26 18:32   ` Jakub Kicinski
  2020-05-27 15:23     ` Petr Machata
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2020-05-26 18:32 UTC (permalink / raw)
  To: Petr Machata; +Cc: netdev, Eric Dumazet, jhs, jiri, idosch

On Tue, 26 May 2020 20:10:06 +0300 Petr Machata wrote:
> +static int red_change(struct Qdisc *sch, struct nlattr *opt,
> +		      struct netlink_ext_ack *extack)
> +{
> +	struct red_sched_data *q = qdisc_priv(sch);

net/sched/sch_red.c: In function red_change:
net/sched/sch_red.c:337:25: warning: unused variable q [-Wunused-variable]
  337 |  struct red_sched_data *q = qdisc_priv(sch);
      |                         ^

Needs to go to the next patch.

> +	struct nlattr *tb[TCA_RED_MAX + 1];
> +	int err;
> +
> +	if (!opt)
> +		return -EINVAL;
> +
> +	err = nla_parse_nested_deprecated(tb, TCA_RED_MAX, opt, red_policy,
> +					  extack);
> +	if (err < 0)
> +		return err;
> +
> +	return __red_change(sch, tb, extack);
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
                   ` (3 preceding siblings ...)
  2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
@ 2020-05-27  4:09 ` Cong Wang
  2020-05-27  9:56   ` Petr Machata
  2020-06-01 13:40   ` Jiri Pirko
  2020-05-27 15:19 ` Vladimir Oltean
  5 siblings, 2 replies; 26+ messages in thread
From: Cong Wang @ 2020-05-27  4:09 UTC (permalink / raw)
  To: Petr Machata
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>
> The Spectrum hardware allows execution of one of several actions as a
> result of queue management events: tail-dropping, early-dropping, marking a
> packet, or passing a configured latency threshold or buffer size. Such
> packets can be mirrored, trapped, or sampled.
>
> Modeling the action to be taken as simply a TC action is very attractive,
> but it is not obvious where to put these actions. At least with ECN marking
> one could imagine a tree of qdiscs and classifiers that effectively
> accomplishes this task, albeit in an impractically complex manner. But
> there is just no way to match on dropped-ness of a packet, let alone
> dropped-ness due to a particular reason.
>
> To allow configuring user-defined actions as a result of inner workings of
> a qdisc, this patch set introduces a concept of qevents. Those are attach
> points for TC blocks, where filters can be put that are executed as the
> packet hits well-defined points in the qdisc algorithms. The attached
> blocks can be shared, in a manner similar to clsact ingress and egress
> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> etc.

This concept does not fit well into qdisc, essentially you still want to
install filters (and actions) somewhere on qdisc, but currently all filters
are executed at enqueue, basically you want to execute them at other
pre-defined locations too, for example early drop.

So, perhaps adding a "position" in tc filter is better? Something like:

tc qdisc add dev x root handle 1: ... # same as before
tc filter add dev x parent 1:0 position early_drop matchall action....

And obviously default position must be "enqueue". Makes sense?

(The word "position" may be not accurate, but hope you get my point
here.)

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-27  4:09 ` [RFC PATCH net-next 0/3] TC: Introduce qevents Cong Wang
@ 2020-05-27  9:56   ` Petr Machata
  2020-05-28  4:00     ` Cong Wang
  2020-06-01 13:40   ` Jiri Pirko
  1 sibling, 1 reply; 26+ messages in thread
From: Petr Machata @ 2020-05-27  9:56 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel


Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>> The Spectrum hardware allows execution of one of several actions as a
>> result of queue management events: tail-dropping, early-dropping, marking a
>> packet, or passing a configured latency threshold or buffer size. Such
>> packets can be mirrored, trapped, or sampled.
>>
>> Modeling the action to be taken as simply a TC action is very attractive,
>> but it is not obvious where to put these actions. At least with ECN marking
>> one could imagine a tree of qdiscs and classifiers that effectively
>> accomplishes this task, albeit in an impractically complex manner. But
>> there is just no way to match on dropped-ness of a packet, let alone
>> dropped-ness due to a particular reason.
>>
>> To allow configuring user-defined actions as a result of inner workings of
>> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> points for TC blocks, where filters can be put that are executed as the
>> packet hits well-defined points in the qdisc algorithms. The attached
>> blocks can be shared, in a manner similar to clsact ingress and egress
>> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> etc.
>
> This concept does not fit well into qdisc, essentially you still want to
> install filters (and actions) somewhere on qdisc, but currently all filters
> are executed at enqueue, basically you want to execute them at other
> pre-defined locations too, for example early drop.
>
> So, perhaps adding a "position" in tc filter is better? Something like:
>
> tc qdisc add dev x root handle 1: ... # same as before
> tc filter add dev x parent 1:0 position early_drop matchall action....

Position would just be a chain index.

> And obviously default position must be "enqueue". Makes sense?

Chain 0.

So if I understand the proposal correctly: a qdisc has a classification
block (cl_ops->tcf_block). Qevents then reference a chain on that
classification block.

If the above is correct, I disagree that this is a better model. RED
does not need to classify anything, modelling this as a classification
block is wrong. It should be another block. (Also shareable, because as
an operator you likely want to treat all, say, early drops the same, and
therefore to add just one rule for all 128 or so of your qdiscs.)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
                   ` (4 preceding siblings ...)
  2020-05-27  4:09 ` [RFC PATCH net-next 0/3] TC: Introduce qevents Cong Wang
@ 2020-05-27 15:19 ` Vladimir Oltean
  2020-05-27 16:25   ` Petr Machata
  5 siblings, 1 reply; 26+ messages in thread
From: Vladimir Oltean @ 2020-05-27 15:19 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, Jakub Kicinski, Eric Dumazet, Jamal Hadi Salim,
	Jiri Pirko, Ido Schimmel

Hi Petr,

On Tue, 26 May 2020 at 20:11, Petr Machata <petrm@mellanox.com> wrote:
>
> The Spectrum hardware allows execution of one of several actions as a
> result of queue management events: tail-dropping, early-dropping, marking a
> packet, or passing a configured latency threshold or buffer size. Such
> packets can be mirrored, trapped, or sampled.
>
> Modeling the action to be taken as simply a TC action is very attractive,
> but it is not obvious where to put these actions. At least with ECN marking
> one could imagine a tree of qdiscs and classifiers that effectively
> accomplishes this task, albeit in an impractically complex manner. But
> there is just no way to match on dropped-ness of a packet, let alone
> dropped-ness due to a particular reason.
>
> To allow configuring user-defined actions as a result of inner workings of
> a qdisc, this patch set introduces a concept of qevents. Those are attach
> points for TC blocks, where filters can be put that are executed as the
> packet hits well-defined points in the qdisc algorithms. The attached
> blocks can be shared, in a manner similar to clsact ingress and egress
> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> etc.
>
> For example:
>
> # tc qdisc add dev eth0 root handle 1: \
>         red limit 500K avpkt 1K qevent early block 10
> # tc filter add block 10 \
>         matchall action mirred egress mirror dev eth1
>
> Patch #1 of this set introduces several helpers to allow easy and uniform
> addition of qevents to qdiscs. The following two patches, #2 and #3, then
> add two qevents to the RED qdisc: "early" qevent fires when a packet is
> early-dropped; "mark" qevent, when it is ECN-marked.
>
> This patch set does not deal with offloading. The idea there is that a
> driver will be able to figure out that a given block is used in qevent
> context by looking at binder type. A future patch-set will add a qdisc
> pointer to struct flow_block_offload, which a driver will be able to
> consult to glean the TC or other relevant attributes.
>
> Petr Machata (3):
>   net: sched: Introduce helpers for qevent blocks
>   net: sched: sch_red: Split init and change callbacks
>   net: sched: sch_red: Add qevents "early" and "mark"
>
>  include/net/flow_offload.h     |   2 +
>  include/net/pkt_cls.h          |  48 +++++++++++++++
>  include/uapi/linux/pkt_sched.h |   2 +
>  net/sched/cls_api.c            | 107 +++++++++++++++++++++++++++++++++
>  net/sched/sch_red.c            | 100 ++++++++++++++++++++++++++----
>  5 files changed, 247 insertions(+), 12 deletions(-)
>
> --
> 2.20.1
>

I only took a cursory glance at your patches. Can these "qevents" be
added to code outside of the packet scheduler, like to the bridge, for
example? Or can the bridge mark the packets somehow, and then any
generic qdisc be able to recognize this mark without specific code?
A very common use case which is currently not possible to implement is
to rate-limit flooded (broadcast, unknown unicast, unknown multicast)
traffic. Can your "qevents" be used to describe this, or must it be
described separately?

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks
  2020-05-26 18:32   ` Jakub Kicinski
@ 2020-05-27 15:23     ` Petr Machata
  0 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-27 15:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Eric Dumazet, jhs, jiri, idosch


Jakub Kicinski <kuba@kernel.org> writes:

> On Tue, 26 May 2020 20:10:06 +0300 Petr Machata wrote:
>> +static int red_change(struct Qdisc *sch, struct nlattr *opt,
>> +		      struct netlink_ext_ack *extack)
>> +{
>> +	struct red_sched_data *q = qdisc_priv(sch);
>
> net/sched/sch_red.c: In function red_change:
> net/sched/sch_red.c:337:25: warning: unused variable q [-Wunused-variable]
>   337 |  struct red_sched_data *q = qdisc_priv(sch);
>       |                         ^
>
> Needs to go to the next patch.

Indeed. I neglected to do my usual "git rebase -x" dance :-/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-27 15:19 ` Vladimir Oltean
@ 2020-05-27 16:25   ` Petr Machata
  0 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-27 16:25 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Jakub Kicinski, Eric Dumazet, Jamal Hadi Salim,
	Jiri Pirko, Ido Schimmel


Vladimir Oltean <olteanv@gmail.com> writes:

> I only took a cursory glance at your patches. Can these "qevents" be
> added to code outside of the packet scheduler, like to the bridge, for
> example? Or can the bridge mark the packets somehow, and then any
> generic qdisc be able to recognize this mark without specific code?
> A very common use case which is currently not possible to implement is
> to rate-limit flooded (broadcast, unknown unicast, unknown multicast)
> traffic. Can your "qevents" be used to describe this, or must it be
> described separately?

You mean something like a "flood" qevent? In principle nothing prevents
this, but it does not strike me as a very good fit. These events are
meant to be used on qdiscs, hence the "q" in the name. I am not sure it
makes sense to reuse them for bridge traffic policing.

Peeking in 802.1Q, I see "Managed objects for per-stream filtering and
policing". If that's related, it seems like it would be conservative to
model the policing directly in the bridge, instead of this round-about
through TC.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-27  9:56   ` Petr Machata
@ 2020-05-28  4:00     ` Cong Wang
  2020-05-28  9:48       ` Petr Machata
  0 siblings, 1 reply; 26+ messages in thread
From: Cong Wang @ 2020-05-28  4:00 UTC (permalink / raw)
  To: Petr Machata
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
> >>
> >> The Spectrum hardware allows execution of one of several actions as a
> >> result of queue management events: tail-dropping, early-dropping, marking a
> >> packet, or passing a configured latency threshold or buffer size. Such
> >> packets can be mirrored, trapped, or sampled.
> >>
> >> Modeling the action to be taken as simply a TC action is very attractive,
> >> but it is not obvious where to put these actions. At least with ECN marking
> >> one could imagine a tree of qdiscs and classifiers that effectively
> >> accomplishes this task, albeit in an impractically complex manner. But
> >> there is just no way to match on dropped-ness of a packet, let alone
> >> dropped-ness due to a particular reason.
> >>
> >> To allow configuring user-defined actions as a result of inner workings of
> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
> >> points for TC blocks, where filters can be put that are executed as the
> >> packet hits well-defined points in the qdisc algorithms. The attached
> >> blocks can be shared, in a manner similar to clsact ingress and egress
> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
> >> etc.
> >
> > This concept does not fit well into qdisc, essentially you still want to
> > install filters (and actions) somewhere on qdisc, but currently all filters
> > are executed at enqueue, basically you want to execute them at other
> > pre-defined locations too, for example early drop.
> >
> > So, perhaps adding a "position" in tc filter is better? Something like:
> >
> > tc qdisc add dev x root handle 1: ... # same as before
> > tc filter add dev x parent 1:0 position early_drop matchall action....
>
> Position would just be a chain index.

Why? By position, I mean a place where we _execute_ tc filters on
a qdisc, currently there is only "enqueue". I don't see how this is
close to a chain which is basically a group of filters.


>
> > And obviously default position must be "enqueue". Makes sense?
>
> Chain 0.
>
> So if I understand the proposal correctly: a qdisc has a classification
> block (cl_ops->tcf_block). Qevents then reference a chain on that
> classification block.

No, I am suggesting to replace your qevents with position, because
as I said it does not fit well there.

>
> If the above is correct, I disagree that this is a better model. RED
> does not need to classify anything, modelling this as a classification
> block is wrong. It should be another block. (Also shareable, because as
> an operator you likely want to treat all, say, early drops the same, and
> therefore to add just one rule for all 128 or so of your qdiscs.)

You can still choose not to classify anything by using matchall. No
one is saying you have to do classification.

If you want to jump to a block, you can just use a goto action like
normal. So your above example can be replaced with:

# tc qdisc add dev eth0 root handle 1: \
        red limit 500K avpkt 1K

# tc filter add dev eth0 parent 1:0 position early_drop matchall \
   action goto chain 10

# tc chain add dev eth0 index 10 ...

See the difference?

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-28  4:00     ` Cong Wang
@ 2020-05-28  9:48       ` Petr Machata
  2020-05-30  4:48         ` Cong Wang
  2020-06-01 13:35         ` Jiri Pirko
  0 siblings, 2 replies; 26+ messages in thread
From: Petr Machata @ 2020-05-28  9:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel


Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>>
>> Cong Wang <xiyou.wangcong@gmail.com> writes:
>>
>> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>> >>
>> >> The Spectrum hardware allows execution of one of several actions as a
>> >> result of queue management events: tail-dropping, early-dropping, marking a
>> >> packet, or passing a configured latency threshold or buffer size. Such
>> >> packets can be mirrored, trapped, or sampled.
>> >>
>> >> Modeling the action to be taken as simply a TC action is very attractive,
>> >> but it is not obvious where to put these actions. At least with ECN marking
>> >> one could imagine a tree of qdiscs and classifiers that effectively
>> >> accomplishes this task, albeit in an impractically complex manner. But
>> >> there is just no way to match on dropped-ness of a packet, let alone
>> >> dropped-ness due to a particular reason.
>> >>
>> >> To allow configuring user-defined actions as a result of inner workings of
>> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> >> points for TC blocks, where filters can be put that are executed as the
>> >> packet hits well-defined points in the qdisc algorithms. The attached
>> >> blocks can be shared, in a manner similar to clsact ingress and egress
>> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> >> etc.
>> >
>> > This concept does not fit well into qdisc, essentially you still want to
>> > install filters (and actions) somewhere on qdisc, but currently all filters
>> > are executed at enqueue, basically you want to execute them at other
>> > pre-defined locations too, for example early drop.
>> >
>> > So, perhaps adding a "position" in tc filter is better? Something like:
>> >
>> > tc qdisc add dev x root handle 1: ... # same as before
>> > tc filter add dev x parent 1:0 position early_drop matchall action....
>>
>> Position would just be a chain index.
>
> Why? By position, I mean a place where we _execute_ tc filters on
> a qdisc, currently there is only "enqueue". I don't see how this is
> close to a chain which is basically a group of filters.

OK, I misunderstood what you mean.

So you propose to have further division within the block? To have sort
of namespaces within blocks or chains, where depending on the context,
only filters in the corresponding namespace are executed?

>>
>> > And obviously default position must be "enqueue". Makes sense?
>>
>> Chain 0.
>>
>> So if I understand the proposal correctly: a qdisc has a classification
>> block (cl_ops->tcf_block). Qevents then reference a chain on that
>> classification block.
>
> No, I am suggesting to replace your qevents with position, because
> as I said it does not fit well there.
>
>>
>> If the above is correct, I disagree that this is a better model. RED
>> does not need to classify anything, modelling this as a classification
>> block is wrong. It should be another block. (Also shareable, because as
>> an operator you likely want to treat all, say, early drops the same, and
>> therefore to add just one rule for all 128 or so of your qdiscs.)
>
> You can still choose not to classify anything by using matchall. No
> one is saying you have to do classification.

The point here is filter reuse, not classification.

> If you want to jump to a block, you can just use a goto action like

I don't think you can jump to a block. You can jump to a chain within
the same block.

> normal. So your above example can be replaced with:
>
> # tc qdisc add dev eth0 root handle 1: \
>         red limit 500K avpkt 1K
>
> # tc filter add dev eth0 parent 1:0 position early_drop matchall \
>    action goto chain 10
>
> # tc chain add dev eth0 index 10 ...
>
> See the difference?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-28  9:48       ` Petr Machata
@ 2020-05-30  4:48         ` Cong Wang
  2020-05-30  8:55           ` Petr Machata
  2020-06-01 13:35         ` Jiri Pirko
  1 sibling, 1 reply; 26+ messages in thread
From: Cong Wang @ 2020-05-30  4:48 UTC (permalink / raw)
  To: Petr Machata
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
> So you propose to have further division within the block? To have sort
> of namespaces within blocks or chains, where depending on the context,
> only filters in the corresponding namespace are executed?

What I suggest is to let filters (or chain or block) decide where
they belong to, because I think that fit naturally.

What you suggest is to let qdisc's decide where they put filters,
which looks odd to me.

Hope it is all clear now.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-30  4:48         ` Cong Wang
@ 2020-05-30  8:55           ` Petr Machata
  2020-06-01 20:01             ` Cong Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Petr Machata @ 2020-05-30  8:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel


Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
>> So you propose to have further division within the block? To have sort
>> of namespaces within blocks or chains, where depending on the context,
>> only filters in the corresponding namespace are executed?
>
> What I suggest is to let filters (or chain or block) decide where
> they belong to, because I think that fit naturally.

So filters would have this attribute that marks them for execution in
the qevent context?

Does "goto chain" keep this position? I.e. are the executed filters from
the "position" context or not? If they keep the position, then this new
system can be equivalently modeled as simply a block binding point.

If "goto chain" loses the position, that is just a block binding point
and a chain to start on, instead of the default zero.

So introducing the position does not seem to allow us to model anything
that could not be modeled before. But it is a more complicated system.

> What you suggest is to let qdisc's decide where they put filters,
> which looks odd to me.

Ultimately the qdisc decides what qevents to expose. Qevents are closely
tied to the inner workings of a qdisc algorithm, they can't be probed as
modules the way qdiscs, filters and actions can. If a user wishes to
make use of them, they will have to let qdiscs "dictate" where to put
the filters one way or another.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-28  9:48       ` Petr Machata
  2020-05-30  4:48         ` Cong Wang
@ 2020-06-01 13:35         ` Jiri Pirko
  1 sibling, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2020-06-01 13:35 UTC (permalink / raw)
  To: Petr Machata
  Cc: Cong Wang, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

Thu, May 28, 2020 at 11:48:50AM CEST, petrm@mellanox.com wrote:
>
>Cong Wang <xiyou.wangcong@gmail.com> writes:
>
>> On Wed, May 27, 2020 at 2:56 AM Petr Machata <petrm@mellanox.com> wrote:
>>>
>>>
>>> Cong Wang <xiyou.wangcong@gmail.com> writes:
>>>
>>> > On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>> >>
>>> >> The Spectrum hardware allows execution of one of several actions as a
>>> >> result of queue management events: tail-dropping, early-dropping, marking a
>>> >> packet, or passing a configured latency threshold or buffer size. Such
>>> >> packets can be mirrored, trapped, or sampled.
>>> >>
>>> >> Modeling the action to be taken as simply a TC action is very attractive,
>>> >> but it is not obvious where to put these actions. At least with ECN marking
>>> >> one could imagine a tree of qdiscs and classifiers that effectively
>>> >> accomplishes this task, albeit in an impractically complex manner. But
>>> >> there is just no way to match on dropped-ness of a packet, let alone
>>> >> dropped-ness due to a particular reason.
>>> >>
>>> >> To allow configuring user-defined actions as a result of inner workings of
>>> >> a qdisc, this patch set introduces a concept of qevents. Those are attach
>>> >> points for TC blocks, where filters can be put that are executed as the
>>> >> packet hits well-defined points in the qdisc algorithms. The attached
>>> >> blocks can be shared, in a manner similar to clsact ingress and egress
>>> >> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>>> >> etc.
>>> >
>>> > This concept does not fit well into qdisc, essentially you still want to
>>> > install filters (and actions) somewhere on qdisc, but currently all filters
>>> > are executed at enqueue, basically you want to execute them at other
>>> > pre-defined locations too, for example early drop.
>>> >
>>> > So, perhaps adding a "position" in tc filter is better? Something like:
>>> >
>>> > tc qdisc add dev x root handle 1: ... # same as before
>>> > tc filter add dev x parent 1:0 position early_drop matchall action....
>>>
>>> Position would just be a chain index.
>>
>> Why? By position, I mean a place where we _execute_ tc filters on
>> a qdisc, currently there is only "enqueue". I don't see how this is
>> close to a chain which is basically a group of filters.
>
>OK, I misunderstood what you mean.
>
>So you propose to have further division within the block? To have sort
>of namespaces within blocks or chains, where depending on the context,
>only filters in the corresponding namespace are executed?

Please take the block as a whole entity. It has one entry ->chain0
processing. The gotos to other chains should be contained within the
block.

Please don't divide the block. If you want to process the block from a
different entry point, please process it as a whole.


>
>>>
>>> > And obviously default position must be "enqueue". Makes sense?
>>>
>>> Chain 0.
>>>
>>> So if I understand the proposal correctly: a qdisc has a classification
>>> block (cl_ops->tcf_block). Qevents then reference a chain on that
>>> classification block.
>>
>> No, I am suggesting to replace your qevents with position, because
>> as I said it does not fit well there.
>>
>>>
>>> If the above is correct, I disagree that this is a better model. RED
>>> does not need to classify anything, modelling this as a classification
>>> block is wrong. It should be another block. (Also shareable, because as
>>> an operator you likely want to treat all, say, early drops the same, and
>>> therefore to add just one rule for all 128 or so of your qdiscs.)
>>
>> You can still choose not to classify anything by using matchall. No
>> one is saying you have to do classification.
>
>The point here is filter reuse, not classification.
>
>> If you want to jump to a block, you can just use a goto action like
>
>I don't think you can jump to a block. You can jump to a chain within
>the same block.

Correct.

entrypoint->
   block X
     chain 0
     chain A
     chain B
     chain ...

There's no goto/jump out of the block.

>
>> normal. So your above example can be replaced with:
>>
>> # tc qdisc add dev eth0 root handle 1: \
>>         red limit 500K avpkt 1K
>>
>> # tc filter add dev eth0 parent 1:0 position early_drop matchall \
>>    action goto chain 10
>>
>> # tc chain add dev eth0 index 10 ...
>>
>> See the difference?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-27  4:09 ` [RFC PATCH net-next 0/3] TC: Introduce qevents Cong Wang
  2020-05-27  9:56   ` Petr Machata
@ 2020-06-01 13:40   ` Jiri Pirko
  2020-06-01 19:50     ` Cong Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2020-06-01 13:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: Petr Machata, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

Wed, May 27, 2020 at 06:09:03AM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, May 26, 2020 at 10:11 AM Petr Machata <petrm@mellanox.com> wrote:
>>
>> The Spectrum hardware allows execution of one of several actions as a
>> result of queue management events: tail-dropping, early-dropping, marking a
>> packet, or passing a configured latency threshold or buffer size. Such
>> packets can be mirrored, trapped, or sampled.
>>
>> Modeling the action to be taken as simply a TC action is very attractive,
>> but it is not obvious where to put these actions. At least with ECN marking
>> one could imagine a tree of qdiscs and classifiers that effectively
>> accomplishes this task, albeit in an impractically complex manner. But
>> there is just no way to match on dropped-ness of a packet, let alone
>> dropped-ness due to a particular reason.
>>
>> To allow configuring user-defined actions as a result of inner workings of
>> a qdisc, this patch set introduces a concept of qevents. Those are attach
>> points for TC blocks, where filters can be put that are executed as the
>> packet hits well-defined points in the qdisc algorithms. The attached
>> blocks can be shared, in a manner similar to clsact ingress and egress
>> blocks, arbitrary classifiers with arbitrary actions can be put on them,
>> etc.
>
>This concept does not fit well into qdisc, essentially you still want to
>install filters (and actions) somewhere on qdisc, but currently all filters
>are executed at enqueue, basically you want to execute them at other
>pre-defined locations too, for example early drop.
>
>So, perhaps adding a "position" in tc filter is better? Something like:
>
>tc qdisc add dev x root handle 1: ... # same as before
>tc filter add dev x parent 1:0 position early_drop matchall action....
>
>And obviously default position must be "enqueue". Makes sense?


Well, if you look at the examples in the cover letter, I think that they
are showing something very similar you are talking about:

# tc qdisc add dev eth0 root handle 1: \
        red limit 500K avpkt 1K qevent early block 10
# tc filter add block 10 \
        matchall action mirred egress mirror dev eth1


The first command just says "early drop position should be processed by
block 10"

The second command just adds a filter to the block 10.



We have this concept of blocks, we use them in "tc filter" as a handle.

The block as a unit could be attached to be processed not only to
"enqueue" but to anything else, like some qdisc stage.

Looks quite neat to me.



>
>(The word "position" may be not accurate, but hope you get my point
>here.)
>
>Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-06-01 13:40   ` Jiri Pirko
@ 2020-06-01 19:50     ` Cong Wang
  2020-06-01 22:37       ` Petr Machata
  2020-06-02  6:05       ` Jiri Pirko
  0 siblings, 2 replies; 26+ messages in thread
From: Cong Wang @ 2020-06-01 19:50 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Petr Machata, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
> The first command just says "early drop position should be processed by
> block 10"
>
> The second command just adds a filter to the block 10.

This is exactly why it looks odd to me, because it _reads_ like
'tc qdisc' creates the block to hold tc filters... I think tc filters should
create whatever placeholder for themselves.

I know in memory block (or chain or filters) are stored in qdisc, but
it is still different to me who initiates the creation.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-05-30  8:55           ` Petr Machata
@ 2020-06-01 20:01             ` Cong Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Cong Wang @ 2020-06-01 20:01 UTC (permalink / raw)
  To: Petr Machata
  Cc: Linux Kernel Network Developers, Jakub Kicinski, Eric Dumazet,
	Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Sat, May 30, 2020 at 1:55 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Thu, May 28, 2020 at 2:48 AM Petr Machata <petrm@mellanox.com> wrote:
> >> So you propose to have further division within the block? To have sort
> >> of namespaces within blocks or chains, where depending on the context,
> >> only filters in the corresponding namespace are executed?
> >
> > What I suggest is to let filters (or chain or block) decide where
> > they belong to, because I think that fit naturally.
>
> So filters would have this attribute that marks them for execution in
> the qevent context?

If you view it as position rather than qevent, sure, we already need to
specify the "context" for tc filter creations anyway, "dev... parent ..." is
is context to locate the filter placeholders. This is why it makes sense
to add one more piece, say "position", that is "dev... parent... position...".

It is you who calls it qevent which of course looks like only belong to
qdisc's.

>
> Ultimately the qdisc decides what qevents to expose. Qevents are closely
> tied to the inner workings of a qdisc algorithm, they can't be probed as
> modules the way qdiscs, filters and actions can. If a user wishes to
> make use of them, they will have to let qdiscs "dictate" where to put
> the filters one way or another.

For a specific event like early drop, sure. But if we think it generally,
"enqueue" has the same problem, there are a few qdisc's don't even
support filtering (noop). We can just return ENOSUPP anyway.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-06-01 19:50     ` Cong Wang
@ 2020-06-01 22:37       ` Petr Machata
  2020-06-02  6:05       ` Jiri Pirko
  1 sibling, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-06-01 22:37 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel


Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> The first command just says "early drop position should be processed by
>> block 10"
>>
>> The second command just adds a filter to the block 10.

> This is exactly why it looks odd to me, because it _reads_ like
> 'tc qdisc' creates the block to hold tc filters... I think tc filters should
> create whatever placeholder for themselves.

Look at clsact. It creates blocks in exactly the same way.

> I know in memory block (or chain or filters) are stored in qdisc, but
> it is still different to me who initiates the creation.

The block binding mechanics are not new. The patch just reuses them. If
you are unhappy about how this is currently done, I too would see merit
in creating a block explicitly, like with chains. But it has nothing to
do with this patchset, which would just naturally pick up whatever this
new mechanic is.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-06-01 19:50     ` Cong Wang
  2020-06-01 22:37       ` Petr Machata
@ 2020-06-02  6:05       ` Jiri Pirko
  2020-06-03  7:05         ` Cong Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2020-06-02  6:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: Petr Machata, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

Mon, Jun 01, 2020 at 09:50:23PM CEST, xiyou.wangcong@gmail.com wrote:
>On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> The first command just says "early drop position should be processed by
>> block 10"
>>
>> The second command just adds a filter to the block 10.
>
>This is exactly why it looks odd to me, because it _reads_ like
>'tc qdisc' creates the block to hold tc filters... I think tc filters should
>create whatever placeholder for themselves.

That is how it is done already today. We have the block object. The
instances are created separatelly (clsact for example), user may specify
the block id to identify the block. Then the user may use this block id
to add/remove filters directly to/from the block.

What you propose with "position" on the other hand look unnatural for
me. It would slice the exising blocks in some way. Currently, the block
has 1 clearly defined entrypoint. With "positions", all of the sudden
there would be many of them? I can't really imagine how that is supposed
to work :/


>
>I know in memory block (or chain or filters) are stored in qdisc, but
>it is still different to me who initiates the creation.

Qdiscs create blocks.

>
>Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-06-02  6:05       ` Jiri Pirko
@ 2020-06-03  7:05         ` Cong Wang
  2020-06-03 10:08           ` Petr Machata
  0 siblings, 1 reply; 26+ messages in thread
From: Cong Wang @ 2020-06-03  7:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Petr Machata, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel

On Mon, Jun 1, 2020 at 11:05 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Mon, Jun 01, 2020 at 09:50:23PM CEST, xiyou.wangcong@gmail.com wrote:
> >On Mon, Jun 1, 2020 at 6:40 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >> The first command just says "early drop position should be processed by
> >> block 10"
> >>
> >> The second command just adds a filter to the block 10.
> >
> >This is exactly why it looks odd to me, because it _reads_ like
> >'tc qdisc' creates the block to hold tc filters... I think tc filters should
> >create whatever placeholder for themselves.
>
> That is how it is done already today. We have the block object. The
> instances are created separatelly (clsact for example), user may specify
> the block id to identify the block. Then the user may use this block id
> to add/remove filters directly to/from the block.
>
> What you propose with "position" on the other hand look unnatural for
> me. It would slice the exising blocks in some way. Currently, the block
> has 1 clearly defined entrypoint. With "positions", all of the sudden
> there would be many of them? I can't really imagine how that is supposed
> to work :/

I imagine we could introduce multiple blocks for a qdisc.

Currently we have:

struct some_qdisc_data {
  struct tcf_block *block;
};

Maybe we can extend it to:

struct some_qdisc_data {
  struct tcf_block *blocks[3];
};

#define ENQUEUE 0
#define DEQUEUE 1
#define DROP 2

static struct tcf_block *foo_tcf_block(struct Qdisc *sch, unsigned long cl,
                                            struct netlink_ext_ack
*extack, int position)
{
        struct some_qdisc_data *q = qdisc_priv(sch);

        if (cl)
                return NULL;
        return q->block[position];
}


Just some scratches on my mind.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/3] TC: Introduce qevents
  2020-06-03  7:05         ` Cong Wang
@ 2020-06-03 10:08           ` Petr Machata
  0 siblings, 0 replies; 26+ messages in thread
From: Petr Machata @ 2020-06-03 10:08 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jakub Kicinski,
	Eric Dumazet, Jamal Hadi Salim, Jiri Pirko, Ido Schimmel


Cong Wang <xiyou.wangcong@gmail.com> writes:

> I imagine we could introduce multiple blocks for a qdisc.

Yes, and that's what the patchset does. If you look at struct
tcf_qevent, it is just some block bookkeeping and an attribute name.

> Currently we have:
>
> struct some_qdisc_data {
>   struct tcf_block *block;
> };
>
> Maybe we can extend it to:
>
> struct some_qdisc_data {
>   struct tcf_block *blocks[3];

Yeah, except not all qdiscs will implement all qevents, so let's instead
make it a handful of fields, like in the patchset.

> };
>
> #define ENQUEUE 0
> #define DEQUEUE 1
> #define DROP 2
>
> static struct tcf_block *foo_tcf_block(struct Qdisc *sch, unsigned long cl,
>                                             struct netlink_ext_ack
> *extack, int position)
> {
>         struct some_qdisc_data *q = qdisc_priv(sch);
>
>         if (cl)
>                 return NULL;
>         return q->block[position];
> }

Interestingly, this is close to my original approach, pre-RFC. But there
needs to be this global list of all existing qevents. On its own, that's
a negative--at least it's an extra uAPI to maintain. What does it bring?

It theoretically allows one to refer to blocks symbolically, through
binding point coordinates (dev D parent P qevent Q) not by indices
(block B). But then one block could be referenced by several different
coordinates, which is confusing. That is the reason TC disallows editing
filters on shared blocks. Qevents should be the same.

What else is there?

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-06-03 10:08 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26 17:10 [RFC PATCH net-next 0/3] TC: Introduce qevents Petr Machata
2020-05-26 17:10 ` [RFC PATCH net-next 1/3] net: sched: Introduce helpers for qevent blocks Petr Machata
2020-05-26 17:10 ` [RFC PATCH net-next 2/3] net: sched: sch_red: Split init and change callbacks Petr Machata
2020-05-26 18:32   ` Jakub Kicinski
2020-05-27 15:23     ` Petr Machata
2020-05-26 17:10 ` [RFC PATCH net-next 3/3] net: sched: sch_red: Add qevents "early" and "mark" Petr Machata
2020-05-26 17:10 ` [RFC PATCH iproute2-next 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
2020-05-26 17:10   ` [RFC PATCH iproute2-next 2/4] tc: Add helpers to support qevent parsing Petr Machata
2020-05-26 17:10   ` [RFC PATCH iproute2-next 3/4] man: tc: Describe qevents Petr Machata
2020-05-26 17:10   ` [RFC PATCH iproute2-next 4/4] tc: q_red: Add support for qevents "mark" and "early" Petr Machata
2020-05-27  4:09 ` [RFC PATCH net-next 0/3] TC: Introduce qevents Cong Wang
2020-05-27  9:56   ` Petr Machata
2020-05-28  4:00     ` Cong Wang
2020-05-28  9:48       ` Petr Machata
2020-05-30  4:48         ` Cong Wang
2020-05-30  8:55           ` Petr Machata
2020-06-01 20:01             ` Cong Wang
2020-06-01 13:35         ` Jiri Pirko
2020-06-01 13:40   ` Jiri Pirko
2020-06-01 19:50     ` Cong Wang
2020-06-01 22:37       ` Petr Machata
2020-06-02  6:05       ` Jiri Pirko
2020-06-03  7:05         ` Cong Wang
2020-06-03 10:08           ` Petr Machata
2020-05-27 15:19 ` Vladimir Oltean
2020-05-27 16:25   ` Petr Machata

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.