Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
@ 2020-05-15 11:40 Vlad Buslov
  2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-15 11:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner,
	kuba, Vlad Buslov

Output rate of current upstream kernel TC filter dump implementation if
relatively low (~100k rules/sec depending on configuration). This
constraint impacts performance of software switch implementation that
rely on TC for their datapath implementation and periodically call TC
filter dump to update rules stats. Moreover, TC filter dump output a lot
of static data that don't change during the filter lifecycle (filter
key, specific action details, etc.) which constitutes significant
portion of payload on resulting netlink packets and increases amount of
syscalls necessary to dump all filters on particular Qdisc. In order to
significantly improve filter dump rate this patch sets implement new
mode of TC filter dump operation named "terse dump" mode. In this mode
only parameters necessary to identify the filter (handle, action cookie,
etc.) and data that can change during filter lifecycle (filter flags,
action stats, etc.) are preserved in dump output while everything else
is omitted.

Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
individual classifier support (new tcf_proto_ops->terse_dump()
callback). Support for action terse dump is implemented in act API and
don't require changing individual action implementations.

The following table provides performance comparison between regular
filter dump and new terse dump mode for two classifier-action profiles:
one minimal config with L2 flower classifier and single gact action and
another heavier config with L2+5tuple flower classifier with
tunnel_key+mirred actions.

 Classifier-action type      |        dump |  terse dump | X improvement 
                             | (rules/sec) | (rules/sec) |               
-----------------------------+-------------+-------------+---------------
 L2 with gact                |       141.8 |       293.2 |          2.07 
 L2+5tuple tunnel_key+mirred |        76.4 |       198.8 |          2.60 

Benchmark details: to measure the rate tc filter dump and terse dump
commands are invoked on ingress Qdisc that have one million filters
configured using following commands.

> time sudo tc -s filter show dev ens1f0 ingress >/dev/null

> time sudo tc -s filter show terse dev ens1f0 ingress >/dev/null

Value in results table is calculated by dividing 1000000 total rules by
"real" time reported by time command.

Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory

Vlad Buslov (4):
  net: sched: introduce terse dump flag
  net: sched: implement terse dump support in act
  net: sched: cls_flower: implement terse dump support
  selftests: implement flower classifier terse dump tests

 include/net/act_api.h                         |  2 +-
 include/net/pkt_cls.h                         |  1 +
 include/net/sch_generic.h                     |  4 ++
 include/uapi/linux/rtnetlink.h                |  6 ++
 net/sched/act_api.c                           | 30 +++++++--
 net/sched/cls_api.c                           | 67 ++++++++++++++++---
 net/sched/cls_flower.c                        | 43 ++++++++++++
 .../tc-testing/tc-tests/filters/tests.json    | 38 +++++++++++
 8 files changed, 174 insertions(+), 17 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next v2 1/4] net: sched: introduce terse dump flag
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
@ 2020-05-15 11:40 ` Vlad Buslov
  2020-05-15 11:51   ` Jiri Pirko
  2020-05-15 11:40 ` [PATCH net-next v2 2/4] net: sched: implement terse dump support in act Vlad Buslov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-15 11:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner,
	kuba, Vlad Buslov

Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
is intended to be used to improve performance of TC filter dump when
userland only needs to obtain stats and not the whole classifier/action
data. Extend struct tcf_proto_ops with new terse_dump() callback that must
be defined by supporting classifier implementations.

Support of the options in specific classifiers and actions is
implemented in following patches in the series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---

Notes:
    Changes V1 -> V2:
    
    - Use terse dump flag value directly when defining dump policy array
    instead of first assigning it allowed flag values static constant u32.

 include/net/sch_generic.h      |  4 ++++
 include/uapi/linux/rtnetlink.h |  6 ++++++
 net/sched/cls_api.c            | 39 +++++++++++++++++++++++++++-------
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ab87a8b86a32..c510b03b9751 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -330,6 +330,10 @@ struct tcf_proto_ops {
 	int			(*dump)(struct net*, struct tcf_proto*, void *,
 					struct sk_buff *skb, struct tcmsg*,
 					bool);
+	int			(*terse_dump)(struct net *net,
+					      struct tcf_proto *tp, void *fh,
+					      struct sk_buff *skb,
+					      struct tcmsg *t, bool rtnl_held);
 	int			(*tmplt_dump)(struct sk_buff *skb,
 					      struct net *net,
 					      void *tmplt_priv);
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 4a8c5b745157..073e71ef6bdd 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -609,11 +609,17 @@ enum {
 	TCA_HW_OFFLOAD,
 	TCA_INGRESS_BLOCK,
 	TCA_EGRESS_BLOCK,
+	TCA_DUMP_FLAGS,
 	__TCA_MAX
 };
 
 #define TCA_MAX (__TCA_MAX - 1)
 
+#define TCA_DUMP_FLAGS_TERSE (1 << 0) /* Means that in dump user gets only basic
+				       * data necessary to identify the objects
+				       * (handle, cookie, etc.) and stats.
+				       */
+
 #define TCA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcmsg))))
 #define TCA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcmsg))
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 299b963c796e..cb2c10e0fee5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1851,7 +1851,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 			 struct tcf_proto *tp, struct tcf_block *block,
 			 struct Qdisc *q, u32 parent, void *fh,
 			 u32 portid, u32 seq, u16 flags, int event,
-			 bool rtnl_held)
+			 bool terse_dump, bool rtnl_held)
 {
 	struct tcmsg *tcm;
 	struct nlmsghdr  *nlh;
@@ -1878,6 +1878,14 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 		goto nla_put_failure;
 	if (!fh) {
 		tcm->tcm_handle = 0;
+	} else if (terse_dump) {
+		if (tp->ops->terse_dump) {
+			if (tp->ops->terse_dump(net, tp, fh, skb, tcm,
+						rtnl_held) < 0)
+				goto nla_put_failure;
+		} else {
+			goto cls_op_not_supp;
+		}
 	} else {
 		if (tp->ops->dump &&
 		    tp->ops->dump(net, tp, fh, skb, tcm, rtnl_held) < 0)
@@ -1888,6 +1896,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 
 out_nlmsg_trim:
 nla_put_failure:
+cls_op_not_supp:
 	nlmsg_trim(skb, b);
 	return -1;
 }
@@ -1908,7 +1917,7 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
 			  n->nlmsg_seq, n->nlmsg_flags, event,
-			  rtnl_held) <= 0) {
+			  false, rtnl_held) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -1940,7 +1949,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
 			  n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER,
-			  rtnl_held) <= 0) {
+			  false, rtnl_held) <= 0) {
 		NL_SET_ERR_MSG(extack, "Failed to build del event notification");
 		kfree_skb(skb);
 		return -EINVAL;
@@ -2501,6 +2510,7 @@ struct tcf_dump_args {
 	struct tcf_block *block;
 	struct Qdisc *q;
 	u32 parent;
+	bool terse_dump;
 };
 
 static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
@@ -2511,12 +2521,12 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
 	return tcf_fill_node(net, a->skb, tp, a->block, a->q, a->parent,
 			     n, NETLINK_CB(a->cb->skb).portid,
 			     a->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-			     RTM_NEWTFILTER, true);
+			     RTM_NEWTFILTER, a->terse_dump, true);
 }
 
 static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 			   struct sk_buff *skb, struct netlink_callback *cb,
-			   long index_start, long *p_index)
+			   long index_start, long *p_index, bool terse)
 {
 	struct net *net = sock_net(skb->sk);
 	struct tcf_block *block = chain->block;
@@ -2545,7 +2555,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 			if (tcf_fill_node(net, skb, tp, block, q, parent, NULL,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
-					  RTM_NEWTFILTER, true) <= 0)
+					  RTM_NEWTFILTER, false, true) <= 0)
 				goto errout;
 			cb->args[1] = 1;
 		}
@@ -2561,6 +2571,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		arg.w.skip = cb->args[1] - 1;
 		arg.w.count = 0;
 		arg.w.cookie = cb->args[2];
+		arg.terse_dump = terse;
 		tp->ops->walk(tp, &arg.w, true);
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
@@ -2574,6 +2585,10 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 	return false;
 }
 
+static const struct nla_policy tcf_tfilter_dump_policy[TCA_MAX + 1] = {
+	[TCA_DUMP_FLAGS] = NLA_POLICY_BITFIELD32(TCA_DUMP_FLAGS_TERSE),
+};
+
 /* called with RTNL */
 static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 {
@@ -2583,6 +2598,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	struct Qdisc *q = NULL;
 	struct tcf_block *block;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
+	bool terse_dump = false;
 	long index_start;
 	long index;
 	u32 parent;
@@ -2592,10 +2608,17 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		return skb->len;
 
 	err = nlmsg_parse_deprecated(cb->nlh, sizeof(*tcm), tca, TCA_MAX,
-				     NULL, cb->extack);
+				     tcf_tfilter_dump_policy, cb->extack);
 	if (err)
 		return err;
 
+	if (tca[TCA_DUMP_FLAGS]) {
+		struct nla_bitfield32 flags =
+			nla_get_bitfield32(tca[TCA_DUMP_FLAGS]);
+
+		terse_dump = flags.value & TCA_DUMP_FLAGS_TERSE;
+	}
+
 	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
 		block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
 		if (!block)
@@ -2653,7 +2676,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		    nla_get_u32(tca[TCA_CHAIN]) != chain->index)
 			continue;
 		if (!tcf_chain_dump(chain, q, parent, skb, cb,
-				    index_start, &index)) {
+				    index_start, &index, terse_dump)) {
 			tcf_chain_put(chain);
 			err = -EMSGSIZE;
 			break;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next v2 2/4] net: sched: implement terse dump support in act
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
  2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
@ 2020-05-15 11:40 ` Vlad Buslov
  2020-05-15 11:51   ` Jiri Pirko
  2020-05-15 11:40 ` [PATCH net-next v2 3/4] net: sched: cls_flower: implement terse dump support Vlad Buslov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-15 11:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner,
	kuba, Vlad Buslov

Extend tcf_action_dump() with boolean argument 'terse' that is used to
request terse-mode action dump. In terse mode only essential data needed to
identify particular action (action kind, cookie, etc.) and its stats is put
to resulting skb and everything else is omitted. Implement
tcf_exts_terse_dump() helper in cls API that is intended to be used to
request terse dump of all exts (actions) attached to the filter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---

Notes:
    Changes V1 -> V2:
    
    - Remove redundant newline after tcf_exts_terse_dump() function.

 include/net/act_api.h |  2 +-
 include/net/pkt_cls.h |  1 +
 net/sched/act_api.c   | 30 +++++++++++++++++++++++-------
 net/sched/cls_api.c   | 28 +++++++++++++++++++++++++++-
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c24d7643548e..1b4bfc4437be 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -193,7 +193,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    bool rtnl_held,
 				    struct netlink_ext_ack *extack);
 int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind,
-		    int ref);
+		    int ref, bool terse);
 int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int);
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 04aa0649f3b0..ed65619cbc47 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -325,6 +325,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
 void tcf_exts_destroy(struct tcf_exts *exts);
 void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src);
 int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts);
+int tcf_exts_terse_dump(struct sk_buff *skb, struct tcf_exts *exts);
 int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts);
 
 /**
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index fbbec2e562f5..8ac7eb0a8309 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -766,12 +766,10 @@ tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 	return a->ops->dump(skb, a, bind, ref);
 }
 
-int
-tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+static int
+tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a)
 {
-	int err = -EINVAL;
 	unsigned char *b = skb_tail_pointer(skb);
-	struct nlattr *nest;
 	struct tc_cookie *cookie;
 
 	if (nla_put_string(skb, TCA_KIND, a->ops->kind))
@@ -789,6 +787,23 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 	}
 	rcu_read_unlock();
 
+	return 0;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+int
+tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+{
+	int err = -EINVAL;
+	unsigned char *b = skb_tail_pointer(skb);
+	struct nlattr *nest;
+
+	if (tcf_action_dump_terse(skb, a))
+		goto nla_put_failure;
+
 	if (a->hw_stats != TCA_ACT_HW_STATS_ANY &&
 	    nla_put_bitfield32(skb, TCA_ACT_HW_STATS,
 			       a->hw_stats, TCA_ACT_HW_STATS_ANY))
@@ -820,7 +835,7 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 EXPORT_SYMBOL(tcf_action_dump_1);
 
 int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[],
-		    int bind, int ref)
+		    int bind, int ref, bool terse)
 {
 	struct tc_action *a;
 	int err = -EINVAL, i;
@@ -831,7 +846,8 @@ int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[],
 		nest = nla_nest_start_noflag(skb, i + 1);
 		if (nest == NULL)
 			goto nla_put_failure;
-		err = tcf_action_dump_1(skb, a, bind, ref);
+		err = terse ? tcf_action_dump_terse(skb, a) :
+			tcf_action_dump_1(skb, a, bind, ref);
 		if (err < 0)
 			goto errout;
 		nla_nest_end(skb, nest);
@@ -1133,7 +1149,7 @@ static int tca_get_fill(struct sk_buff *skb, struct tc_action *actions[],
 	if (!nest)
 		goto out_nlmsg_trim;
 
-	if (tcf_action_dump(skb, actions, bind, ref) < 0)
+	if (tcf_action_dump(skb, actions, bind, ref, false) < 0)
 		goto out_nlmsg_trim;
 
 	nla_nest_end(skb, nest);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cb2c10e0fee5..752d608f4442 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3179,7 +3179,8 @@ int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts)
 			if (nest == NULL)
 				goto nla_put_failure;
 
-			if (tcf_action_dump(skb, exts->actions, 0, 0) < 0)
+			if (tcf_action_dump(skb, exts->actions, 0, 0, false)
+			    < 0)
 				goto nla_put_failure;
 			nla_nest_end(skb, nest);
 		} else if (exts->police) {
@@ -3203,6 +3204,31 @@ int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_dump);
 
+int tcf_exts_terse_dump(struct sk_buff *skb, struct tcf_exts *exts)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	struct nlattr *nest;
+
+	if (!exts->action || !tcf_exts_has_actions(exts))
+		return 0;
+
+	nest = nla_nest_start_noflag(skb, exts->action);
+	if (!nest)
+		goto nla_put_failure;
+
+	if (tcf_action_dump(skb, exts->actions, 0, 0, true) < 0)
+		goto nla_put_failure;
+	nla_nest_end(skb, nest);
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+#else
+	return 0;
+#endif
+}
+EXPORT_SYMBOL(tcf_exts_terse_dump);
 
 int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
 {
-- 
2.21.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next v2 3/4] net: sched: cls_flower: implement terse dump support
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
  2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
  2020-05-15 11:40 ` [PATCH net-next v2 2/4] net: sched: implement terse dump support in act Vlad Buslov
@ 2020-05-15 11:40 ` Vlad Buslov
  2020-05-15 11:40 ` [PATCH net-next v2 4/4] selftests: implement flower classifier terse dump tests Vlad Buslov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-15 11:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner,
	kuba, Vlad Buslov, Jiri Pirko

Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only
dump handle, flags and action data in terse mode.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 74a0febcafb8..0c574700da75 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -2768,6 +2768,48 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	return -1;
 }
 
+static int fl_terse_dump(struct net *net, struct tcf_proto *tp, void *fh,
+			 struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
+{
+	struct cls_fl_filter *f = fh;
+	struct nlattr *nest;
+	bool skip_hw;
+
+	if (!f)
+		return skb->len;
+
+	t->tcm_handle = f->handle;
+
+	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	spin_lock(&tp->lock);
+
+	skip_hw = tc_skip_hw(f->flags);
+
+	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
+		goto nla_put_failure_locked;
+
+	spin_unlock(&tp->lock);
+
+	if (!skip_hw)
+		fl_hw_update_stats(tp, f, rtnl_held);
+
+	if (tcf_exts_terse_dump(skb, &f->exts))
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	return skb->len;
+
+nla_put_failure_locked:
+	spin_unlock(&tp->lock);
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static int fl_tmplt_dump(struct sk_buff *skb, struct net *net, void *tmplt_priv)
 {
 	struct fl_flow_tmplt *tmplt = tmplt_priv;
@@ -2832,6 +2874,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.hw_add		= fl_hw_add,
 	.hw_del		= fl_hw_del,
 	.dump		= fl_dump,
+	.terse_dump	= fl_terse_dump,
 	.bind_class	= fl_bind_class,
 	.tmplt_create	= fl_tmplt_create,
 	.tmplt_destroy	= fl_tmplt_destroy,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH net-next v2 4/4] selftests: implement flower classifier terse dump tests
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
                   ` (2 preceding siblings ...)
  2020-05-15 11:40 ` [PATCH net-next v2 3/4] net: sched: cls_flower: implement terse dump support Vlad Buslov
@ 2020-05-15 11:40 ` Vlad Buslov
  2020-05-15 17:04 ` [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Jakub Kicinski
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-15 11:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner,
	kuba, Vlad Buslov

Implement two basic tests to verify terse dump functionality of flower
classifier:

- Test that verifies that terse dump works.

- Test that verifies that terse dump doesn't print filter key.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 .../tc-testing/tc-tests/filters/tests.json    | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
index 12aa4bc1f6a0..bb543bf69d69 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
@@ -87,5 +87,43 @@
         "teardown": [
             "$TC qdisc del dev $DEV2 ingress"
         ]
+    },
+    {
+        "id": "7c65",
+        "name": "Add flower filter and then terse dump it",
+        "category": [
+            "filter",
+            "flower"
+        ],
+        "setup": [
+            "$TC qdisc add dev $DEV2 ingress"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV2 protocol ip pref 1 ingress flower dst_mac e4:11:22:11:4a:51 action drop",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show terse dev $DEV2 ingress",
+        "matchPattern": "filter protocol ip pref 1 flower.*handle",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DEV2 ingress"
+        ]
+    },
+    {
+        "id": "d45e",
+        "name": "Add flower filter and verify that terse dump doesn't output filter key",
+        "category": [
+            "filter",
+            "flower"
+        ],
+        "setup": [
+            "$TC qdisc add dev $DEV2 ingress"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DEV2 protocol ip pref 1 ingress flower dst_mac e4:11:22:11:4a:51 action drop",
+        "expExitCode": "0",
+        "verifyCmd": "$TC filter show terse dev $DEV2 ingress",
+        "matchPattern": "  dst_mac e4:11:22:11:4a:51",
+        "matchCount": "0",
+        "teardown": [
+            "$TC qdisc del dev $DEV2 ingress"
+        ]
     }
 ]
-- 
2.21.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 2/4] net: sched: implement terse dump support in act
  2020-05-15 11:40 ` [PATCH net-next v2 2/4] net: sched: implement terse dump support in act Vlad Buslov
@ 2020-05-15 11:51   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2020-05-15 11:51 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, dcaratti, marcelo.leitner, kuba

Fri, May 15, 2020 at 01:40:12PM CEST, vladbu@mellanox.com wrote:
>Extend tcf_action_dump() with boolean argument 'terse' that is used to
>request terse-mode action dump. In terse mode only essential data needed to
>identify particular action (action kind, cookie, etc.) and its stats is put
>to resulting skb and everything else is omitted. Implement
>tcf_exts_terse_dump() helper in cls API that is intended to be used to
>request terse dump of all exts (actions) attached to the filter.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 1/4] net: sched: introduce terse dump flag
  2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
@ 2020-05-15 11:51   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2020-05-15 11:51 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, dcaratti, marcelo.leitner, kuba

Fri, May 15, 2020 at 01:40:11PM CEST, vladbu@mellanox.com wrote:
>Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
>filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
>is intended to be used to improve performance of TC filter dump when
>userland only needs to obtain stats and not the whole classifier/action
>data. Extend struct tcf_proto_ops with new terse_dump() callback that must
>be defined by supporting classifier implementations.
>
>Support of the options in specific classifiers and actions is
>implemented in following patches in the series.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
                   ` (3 preceding siblings ...)
  2020-05-15 11:40 ` [PATCH net-next v2 4/4] selftests: implement flower classifier terse dump tests Vlad Buslov
@ 2020-05-15 17:04 ` Jakub Kicinski
  2020-05-18  6:46   ` Vlad Buslov
  2020-05-15 17:25 ` David Miller
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2020-05-15 17:04 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner

On Fri, 15 May 2020 14:40:10 +0300 Vlad Buslov wrote:
> Output rate of current upstream kernel TC filter dump implementation if
> relatively low (~100k rules/sec depending on configuration). This
> constraint impacts performance of software switch implementation that
> rely on TC for their datapath implementation and periodically call TC
> filter dump to update rules stats. Moreover, TC filter dump output a lot
> of static data that don't change during the filter lifecycle (filter
> key, specific action details, etc.) which constitutes significant
> portion of payload on resulting netlink packets and increases amount of
> syscalls necessary to dump all filters on particular Qdisc. In order to
> significantly improve filter dump rate this patch sets implement new
> mode of TC filter dump operation named "terse dump" mode. In this mode
> only parameters necessary to identify the filter (handle, action cookie,
> etc.) and data that can change during filter lifecycle (filter flags,
> action stats, etc.) are preserved in dump output while everything else
> is omitted.

Please keep the review tags you already got when making minor changes.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
                   ` (4 preceding siblings ...)
  2020-05-15 17:04 ` [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Jakub Kicinski
@ 2020-05-15 17:25 ` David Miller
  2020-05-18  6:50   ` Vlad Buslov
  2020-05-17 19:13 ` Cong Wang
  2020-05-18 15:37 ` Edward Cree
  7 siblings, 1 reply; 31+ messages in thread
From: David Miller @ 2020-05-15 17:25 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner, kuba

From: Vlad Buslov <vladbu@mellanox.com>
Date: Fri, 15 May 2020 14:40:10 +0300

> Output rate of current upstream kernel TC filter dump implementation if
> relatively low (~100k rules/sec depending on configuration). This
> constraint impacts performance of software switch implementation that
> rely on TC for their datapath implementation and periodically call TC
> filter dump to update rules stats. Moreover, TC filter dump output a lot
> of static data that don't change during the filter lifecycle (filter
> key, specific action details, etc.) which constitutes significant
> portion of payload on resulting netlink packets and increases amount of
> syscalls necessary to dump all filters on particular Qdisc. In order to
> significantly improve filter dump rate this patch sets implement new
> mode of TC filter dump operation named "terse dump" mode. In this mode
> only parameters necessary to identify the filter (handle, action cookie,
> etc.) and data that can change during filter lifecycle (filter flags,
> action stats, etc.) are preserved in dump output while everything else
> is omitted.
> 
> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
> individual classifier support (new tcf_proto_ops->terse_dump()
> callback). Support for action terse dump is implemented in act API and
> don't require changing individual action implementations.
 ...

This looks fine, so series applied.

But really if people just want an efficient stats dump there is probably
a better way to efficiently encode just the IDs and STATs.  Maybe even
put the stats in pages that userland can mmap() and avoid all of this
system call overhead and locking altogether.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
                   ` (5 preceding siblings ...)
  2020-05-15 17:25 ` David Miller
@ 2020-05-17 19:13 ` Cong Wang
  2020-05-18  6:44   ` Vlad Buslov
  2020-05-18 15:37 ` Edward Cree
  7 siblings, 1 reply; 31+ messages in thread
From: Cong Wang @ 2020-05-17 19:13 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Davide Caratti, Marcelo Ricardo Leitner,
	Jakub Kicinski

On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Output rate of current upstream kernel TC filter dump implementation if
> relatively low (~100k rules/sec depending on configuration). This
> constraint impacts performance of software switch implementation that
> rely on TC for their datapath implementation and periodically call TC
> filter dump to update rules stats. Moreover, TC filter dump output a lot
> of static data that don't change during the filter lifecycle (filter
> key, specific action details, etc.) which constitutes significant
> portion of payload on resulting netlink packets and increases amount of
> syscalls necessary to dump all filters on particular Qdisc. In order to
> significantly improve filter dump rate this patch sets implement new
> mode of TC filter dump operation named "terse dump" mode. In this mode
> only parameters necessary to identify the filter (handle, action cookie,
> etc.) and data that can change during filter lifecycle (filter flags,
> action stats, etc.) are preserved in dump output while everything else
> is omitted.
>
> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
> individual classifier support (new tcf_proto_ops->terse_dump()
> callback). Support for action terse dump is implemented in act API and
> don't require changing individual action implementations.

Sorry for being late.

Why terse dump needs a new ops if it only dumps a subset of the
regular dump? That is, why not just pass a boolean flag to regular
->dump() implementation?

I guess that might break user-space ABI? At least some netlink
attributes are not always dumped anyway, so it does not look like
a problem?

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-17 19:13 ` Cong Wang
@ 2020-05-18  6:44   ` Vlad Buslov
  2020-05-18 18:46     ` Cong Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-18  6:44 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Vlad Buslov, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Davide Caratti, Marcelo Ricardo Leitner,
	Jakub Kicinski


On Sun 17 May 2020 at 22:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Output rate of current upstream kernel TC filter dump implementation if
>> relatively low (~100k rules/sec depending on configuration). This
>> constraint impacts performance of software switch implementation that
>> rely on TC for their datapath implementation and periodically call TC
>> filter dump to update rules stats. Moreover, TC filter dump output a lot
>> of static data that don't change during the filter lifecycle (filter
>> key, specific action details, etc.) which constitutes significant
>> portion of payload on resulting netlink packets and increases amount of
>> syscalls necessary to dump all filters on particular Qdisc. In order to
>> significantly improve filter dump rate this patch sets implement new
>> mode of TC filter dump operation named "terse dump" mode. In this mode
>> only parameters necessary to identify the filter (handle, action cookie,
>> etc.) and data that can change during filter lifecycle (filter flags,
>> action stats, etc.) are preserved in dump output while everything else
>> is omitted.
>>
>> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
>> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
>> individual classifier support (new tcf_proto_ops->terse_dump()
>> callback). Support for action terse dump is implemented in act API and
>> don't require changing individual action implementations.
>
> Sorry for being late.
>
> Why terse dump needs a new ops if it only dumps a subset of the
> regular dump? That is, why not just pass a boolean flag to regular
> ->dump() implementation?
>
> I guess that might break user-space ABI? At least some netlink
> attributes are not always dumped anyway, so it does not look like
> a problem?
>
> Thanks.

Hi Cong,

I considered adding a flag to ->dump() callback but decided against it
for following reasons:

- It complicates fl_dump() code by adding additional conditionals. Not a
  big problem but it seemed better for me to have a standalone callback
  because with combined implementation it is even hard to deduce what
  does terse dump actually output.

- My initial implementation just called regular dump for classifiers
  that don't support terse dump, but in internal review Jiri insisted
  that cls API should fail if it can't satisfy user's request and having
  dedicated callback allows implementation to return an error if
  classifier doesn't define ->terse_dump(). With flag approach it would
  be not trivial to determine if implementation actually uses the flag.
  I guess I could have added new tcf_proto_ops->flags value to designate
  terse dump support, but checking for dedicated callback existence
  seemed like obvious approach.

Regards,
Vlad

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 17:04 ` [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Jakub Kicinski
@ 2020-05-18  6:46   ` Vlad Buslov
  0 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-18  6:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, netdev, davem, jhs, xiyou.wangcong, jiri, dcaratti,
	marcelo.leitner


On Fri 15 May 2020 at 20:04, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri, 15 May 2020 14:40:10 +0300 Vlad Buslov wrote:
>> Output rate of current upstream kernel TC filter dump implementation if
>> relatively low (~100k rules/sec depending on configuration). This
>> constraint impacts performance of software switch implementation that
>> rely on TC for their datapath implementation and periodically call TC
>> filter dump to update rules stats. Moreover, TC filter dump output a lot
>> of static data that don't change during the filter lifecycle (filter
>> key, specific action details, etc.) which constitutes significant
>> portion of payload on resulting netlink packets and increases amount of
>> syscalls necessary to dump all filters on particular Qdisc. In order to
>> significantly improve filter dump rate this patch sets implement new
>> mode of TC filter dump operation named "terse dump" mode. In this mode
>> only parameters necessary to identify the filter (handle, action cookie,
>> etc.) and data that can change during filter lifecycle (filter flags,
>> action stats, etc.) are preserved in dump output while everything else
>> is omitted.
>
> Please keep the review tags you already got when making minor changes.

Thanks. I generally hesitant to retain other people's tags when making
any changes to the code. I'll retain your tags when making trivial
changes from now on.

>
> Reviewed-by: Jakub Kicinski <kuba@kernel.org>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 17:25 ` David Miller
@ 2020-05-18  6:50   ` Vlad Buslov
  0 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-18  6:50 UTC (permalink / raw)
  To: David Miller
  Cc: vladbu, netdev, jhs, xiyou.wangcong, jiri, dcaratti,
	marcelo.leitner, kuba


On Fri 15 May 2020 at 20:25, David Miller <davem@davemloft.net> wrote:
> From: Vlad Buslov <vladbu@mellanox.com>
> Date: Fri, 15 May 2020 14:40:10 +0300
>
>> Output rate of current upstream kernel TC filter dump implementation if
>> relatively low (~100k rules/sec depending on configuration). This
>> constraint impacts performance of software switch implementation that
>> rely on TC for their datapath implementation and periodically call TC
>> filter dump to update rules stats. Moreover, TC filter dump output a lot
>> of static data that don't change during the filter lifecycle (filter
>> key, specific action details, etc.) which constitutes significant
>> portion of payload on resulting netlink packets and increases amount of
>> syscalls necessary to dump all filters on particular Qdisc. In order to
>> significantly improve filter dump rate this patch sets implement new
>> mode of TC filter dump operation named "terse dump" mode. In this mode
>> only parameters necessary to identify the filter (handle, action cookie,
>> etc.) and data that can change during filter lifecycle (filter flags,
>> action stats, etc.) are preserved in dump output while everything else
>> is omitted.
>>
>> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
>> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
>> individual classifier support (new tcf_proto_ops->terse_dump()
>> callback). Support for action terse dump is implemented in act API and
>> don't require changing individual action implementations.
>  ...
>
> This looks fine, so series applied.
>
> But really if people just want an efficient stats dump there is probably
> a better way to efficiently encode just the IDs and STATs.  Maybe even
> put the stats in pages that userland can mmap() and avoid all of this
> system call overhead and locking altogether.

Thanks! Adding such API will be my next step, if terse dump performance
proves insufficient.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
                   ` (6 preceding siblings ...)
  2020-05-17 19:13 ` Cong Wang
@ 2020-05-18 15:37 ` Edward Cree
  2020-05-18 18:50   ` Cong Wang
                     ` (2 more replies)
  7 siblings, 3 replies; 31+ messages in thread
From: Edward Cree @ 2020-05-18 15:37 UTC (permalink / raw)
  To: Vlad Buslov, netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, dcaratti, marcelo.leitner, kuba

On 15/05/2020 12:40, Vlad Buslov wrote:
> In order to
> significantly improve filter dump rate this patch sets implement new
> mode of TC filter dump operation named "terse dump" mode. In this mode
> only parameters necessary to identify the filter (handle, action cookie,
> etc.) and data that can change during filter lifecycle (filter flags,
> action stats, etc.) are preserved in dump output while everything else
> is omitted.
I realise I'm a bit late, but isn't this the kind of policy that shouldn't
 be hard-coded in the kernel?  I.e. if next year it turns out that some
 user needs one parameter that's been omitted here, but not the whole dump,
 are they going to want to add another mode to the uapi?
Should this not instead have been done as a set of flags to specify which
 pieces of information the caller wanted in the dump, rather than a mode
 flag selecting a pre-defined set?

-ed

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-18  6:44   ` Vlad Buslov
@ 2020-05-18 18:46     ` Cong Wang
  2020-05-19  9:10       ` Vlad Buslov
  0 siblings, 1 reply; 31+ messages in thread
From: Cong Wang @ 2020-05-18 18:46 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jiri Pirko, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Davide Caratti, Marcelo Ricardo Leitner,
	Jakub Kicinski

On Sun, May 17, 2020 at 11:44 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Sun 17 May 2020 at 22:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Output rate of current upstream kernel TC filter dump implementation if
> >> relatively low (~100k rules/sec depending on configuration). This
> >> constraint impacts performance of software switch implementation that
> >> rely on TC for their datapath implementation and periodically call TC
> >> filter dump to update rules stats. Moreover, TC filter dump output a lot
> >> of static data that don't change during the filter lifecycle (filter
> >> key, specific action details, etc.) which constitutes significant
> >> portion of payload on resulting netlink packets and increases amount of
> >> syscalls necessary to dump all filters on particular Qdisc. In order to
> >> significantly improve filter dump rate this patch sets implement new
> >> mode of TC filter dump operation named "terse dump" mode. In this mode
> >> only parameters necessary to identify the filter (handle, action cookie,
> >> etc.) and data that can change during filter lifecycle (filter flags,
> >> action stats, etc.) are preserved in dump output while everything else
> >> is omitted.
> >>
> >> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
> >> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
> >> individual classifier support (new tcf_proto_ops->terse_dump()
> >> callback). Support for action terse dump is implemented in act API and
> >> don't require changing individual action implementations.
> >
> > Sorry for being late.
> >
> > Why terse dump needs a new ops if it only dumps a subset of the
> > regular dump? That is, why not just pass a boolean flag to regular
> > ->dump() implementation?
> >
> > I guess that might break user-space ABI? At least some netlink
> > attributes are not always dumped anyway, so it does not look like
> > a problem?
> >
> > Thanks.
>
> Hi Cong,
>
> I considered adding a flag to ->dump() callback but decided against it
> for following reasons:
>
> - It complicates fl_dump() code by adding additional conditionals. Not a
>   big problem but it seemed better for me to have a standalone callback
>   because with combined implementation it is even hard to deduce what
>   does terse dump actually output.

This is not a problem, at least you can add a big if in fl_dump(),
something like:

if (terse) {
  // do terse dump
  return 0;
}
// normal dump

>
> - My initial implementation just called regular dump for classifiers
>   that don't support terse dump, but in internal review Jiri insisted
>   that cls API should fail if it can't satisfy user's request and having
>   dedicated callback allows implementation to return an error if
>   classifier doesn't define ->terse_dump(). With flag approach it would
>   be not trivial to determine if implementation actually uses the flag.

Hmm? For those not support terse dump, we can just do:

if (terse)
  return -EOPNOTSUPP;
// normal dump goes here

You just have to pass 'terse' flag to all implementations and let them
to decide whether to support it or not.


>   I guess I could have added new tcf_proto_ops->flags value to designate
>   terse dump support, but checking for dedicated callback existence
>   seemed like obvious approach.

This does not look necessary, as long as we can just pass the flag
down to each ->dump().

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-18 15:37 ` Edward Cree
@ 2020-05-18 18:50   ` Cong Wang
  2020-05-19  9:04   ` Vlad Buslov
  2020-05-21 14:36   ` Vlad Buslov
  2 siblings, 0 replies; 31+ messages in thread
From: Cong Wang @ 2020-05-18 18:50 UTC (permalink / raw)
  To: Edward Cree
  Cc: Vlad Buslov, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski

On Mon, May 18, 2020 at 8:38 AM Edward Cree <ecree@solarflare.com> wrote:
>
> On 15/05/2020 12:40, Vlad Buslov wrote:
> > In order to
> > significantly improve filter dump rate this patch sets implement new
> > mode of TC filter dump operation named "terse dump" mode. In this mode
> > only parameters necessary to identify the filter (handle, action cookie,
> > etc.) and data that can change during filter lifecycle (filter flags,
> > action stats, etc.) are preserved in dump output while everything else
> > is omitted.
> I realise I'm a bit late, but isn't this the kind of policy that shouldn't
>  be hard-coded in the kernel?  I.e. if next year it turns out that some
>  user needs one parameter that's been omitted here, but not the whole dump,
>  are they going to want to add another mode to the uapi?
> Should this not instead have been done as a set of flags to specify which
>  pieces of information the caller wanted in the dump, rather than a mode
>  flag selecting a pre-defined set?

Excellent point!

I agree, this is more elegant, although potentially needs more work.

I am not sure whether we can simply pass those flags to cb->args[],
if not, that will need more work at netlink layer.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-18 15:37 ` Edward Cree
  2020-05-18 18:50   ` Cong Wang
@ 2020-05-19  9:04   ` Vlad Buslov
  2020-05-19 14:30     ` Edward Cree
  2020-05-19 18:58     ` Cong Wang
  2020-05-21 14:36   ` Vlad Buslov
  2 siblings, 2 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-19  9:04 UTC (permalink / raw)
  To: Edward Cree
  Cc: Vlad Buslov, netdev, davem, jhs, xiyou.wangcong, jiri, dcaratti,
	marcelo.leitner, kuba


On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
> On 15/05/2020 12:40, Vlad Buslov wrote:
>> In order to
>> significantly improve filter dump rate this patch sets implement new
>> mode of TC filter dump operation named "terse dump" mode. In this mode
>> only parameters necessary to identify the filter (handle, action cookie,
>> etc.) and data that can change during filter lifecycle (filter flags,
>> action stats, etc.) are preserved in dump output while everything else
>> is omitted.
> I realise I'm a bit late, but isn't this the kind of policy that shouldn't
>  be hard-coded in the kernel?  I.e. if next year it turns out that some
>  user needs one parameter that's been omitted here, but not the whole dump,
>  are they going to want to add another mode to the uapi?

Why not just extend terse dump? I won't break user land unless you are
removing something from it.

> Should this not instead have been done as a set of flags to specify which
>  pieces of information the caller wanted in the dump, rather than a mode
>  flag selecting a pre-defined set?
>
> -ed

I considered that approach initially but decided against it for
following reasons:

- Generic data is covered by current terse dump implementation.
  Everything else will be act or cls specific which would result long
  list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
  TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
  TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
  these would require a lot of dedicated logic in act and cls dump
  callbacks. Also, it would be quite a challenge to test all possible
  combinations.

- It is hard to come up with proper validation for such implementation.
  In case of terse dump I just return an error if classifier doesn't
  implement the callback (and since current implementation only outputs
  generic action info, it doesn't even require support from
  action-specific dump callbacks). But, for example, how do we validate
  a case where user sets some flower and tunnel_key act dump flags from
  previous paragraph, but Qdisc contains some other classifier? Or
  flower classifier points to other types of actions? Or when flower
  classifier has and tunnel_key actions but also mirred? Should the
  implementation return an error on encountering any classifier or
  action that doesn't have any flags set for its type or just print all
  data like regular dump? What if user asks to dump some specific option
  that wasn't set for particular filter of action instance?

Overall, the more I think about such implementation the more it looks
like a mess to me.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-18 18:46     ` Cong Wang
@ 2020-05-19  9:10       ` Vlad Buslov
  2020-05-19 18:39         ` Cong Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-19  9:10 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Jiri Pirko, Linux Kernel Network Developers,
	David Miller, Jamal Hadi Salim, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski


On Mon 18 May 2020 at 21:46, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, May 17, 2020 at 11:44 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Sun 17 May 2020 at 22:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Output rate of current upstream kernel TC filter dump implementation if
>> >> relatively low (~100k rules/sec depending on configuration). This
>> >> constraint impacts performance of software switch implementation that
>> >> rely on TC for their datapath implementation and periodically call TC
>> >> filter dump to update rules stats. Moreover, TC filter dump output a lot
>> >> of static data that don't change during the filter lifecycle (filter
>> >> key, specific action details, etc.) which constitutes significant
>> >> portion of payload on resulting netlink packets and increases amount of
>> >> syscalls necessary to dump all filters on particular Qdisc. In order to
>> >> significantly improve filter dump rate this patch sets implement new
>> >> mode of TC filter dump operation named "terse dump" mode. In this mode
>> >> only parameters necessary to identify the filter (handle, action cookie,
>> >> etc.) and data that can change during filter lifecycle (filter flags,
>> >> action stats, etc.) are preserved in dump output while everything else
>> >> is omitted.
>> >>
>> >> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
>> >> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
>> >> individual classifier support (new tcf_proto_ops->terse_dump()
>> >> callback). Support for action terse dump is implemented in act API and
>> >> don't require changing individual action implementations.
>> >
>> > Sorry for being late.
>> >
>> > Why terse dump needs a new ops if it only dumps a subset of the
>> > regular dump? That is, why not just pass a boolean flag to regular
>> > ->dump() implementation?
>> >
>> > I guess that might break user-space ABI? At least some netlink
>> > attributes are not always dumped anyway, so it does not look like
>> > a problem?
>> >
>> > Thanks.
>>
>> Hi Cong,
>>
>> I considered adding a flag to ->dump() callback but decided against it
>> for following reasons:
>>
>> - It complicates fl_dump() code by adding additional conditionals. Not a
>>   big problem but it seemed better for me to have a standalone callback
>>   because with combined implementation it is even hard to deduce what
>>   does terse dump actually output.
>
> This is not a problem, at least you can add a big if in fl_dump(),
> something like:
>
> if (terse) {
>   // do terse dump
>   return 0;
> }
> // normal dump

That is what I was trying to prevent with my implementation: having big
"superfunctions" that implement multiple things with branching. Why not
just have dedicated callbacks that do exactly one thing?

>
>>
>> - My initial implementation just called regular dump for classifiers
>>   that don't support terse dump, but in internal review Jiri insisted
>>   that cls API should fail if it can't satisfy user's request and having
>>   dedicated callback allows implementation to return an error if
>>   classifier doesn't define ->terse_dump(). With flag approach it would
>>   be not trivial to determine if implementation actually uses the flag.
>
> Hmm? For those not support terse dump, we can just do:
>
> if (terse)
>   return -EOPNOTSUPP;
> // normal dump goes here
>
> You just have to pass 'terse' flag to all implementations and let them
> to decide whether to support it or not.

But why duplicate the same code to all existing cls dump implementations
instead of having such check nicely implemented in cls API (via callback
existence or a flag)?

>
>
>>   I guess I could have added new tcf_proto_ops->flags value to designate
>>   terse dump support, but checking for dedicated callback existence
>>   seemed like obvious approach.
>
> This does not look necessary, as long as we can just pass the flag
> down to each ->dump().
>
> Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19  9:04   ` Vlad Buslov
@ 2020-05-19 14:30     ` Edward Cree
  2020-05-19 15:17       ` Vlad Buslov
  2020-05-19 18:58     ` Cong Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Edward Cree @ 2020-05-19 14:30 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, jiri, dcaratti,
	marcelo.leitner, kuba

On 19/05/2020 10:04, Vlad Buslov wrote:
> On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
>> I.e. if next year it turns out that some
>>  user needs one parameter that's been omitted here, but not the whole dump,
>>  are they going to want to add another mode to the uapi?
> Why not just extend terse dump? I won't break user land unless you are
> removing something from it.
But then all terse dump users pay the performance cost for thatone
 app's extra need.

> - Generic data is covered by current terse dump implementation.
>   Everything else will be act or cls specific
Fair point.
I don't suppose something something BPF mumble solve this? I haven't
 been following the BPF dumping work in detail but it sounds like it
 might be a cheap way to get the 'more performant next step' that
 was mentioned in the subthread with David.  Just a thought.

-ed

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19 14:30     ` Edward Cree
@ 2020-05-19 15:17       ` Vlad Buslov
  0 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-19 15:17 UTC (permalink / raw)
  To: Edward Cree
  Cc: Vlad Buslov, netdev, davem, jhs, xiyou.wangcong, jiri, dcaratti,
	marcelo.leitner, kuba


On Tue 19 May 2020 at 17:30, Edward Cree <ecree@solarflare.com> wrote:
> On 19/05/2020 10:04, Vlad Buslov wrote:
>> On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
>>> I.e. if next year it turns out that some
>>>  user needs one parameter that's been omitted here, but not the whole dump,
>>>  are they going to want to add another mode to the uapi?
>> Why not just extend terse dump? I won't break user land unless you are
>> removing something from it.
> But then all terse dump users pay the performance cost for thatone
>  app's extra need.

Yes. However reducing amount of data per filter is only part of
optimization. Skipping fl_dump_key() in flower and completely omitting
calling any of the tc_action_ops callbacks is also a major improvement.
So as long as outputting new parameter doesn't require calling one of
those (and it shouldn't, otherwise the parameter represent something
very cls or action type specific) it shouldn't impact performance
significantly.

>
>> - Generic data is covered by current terse dump implementation.
>>   Everything else will be act or cls specific
> Fair point.
> I don't suppose something something BPF mumble solve this? I haven't
>  been following the BPF dumping work in detail but it sounds like it
>  might be a cheap way to get the 'more performant next step' that
>  was mentioned in the subthread with David.  Just a thought.

I'm very interested in ideas how to use BPF to improve dump performance
so any comments/suggestions from resident BPF experts are welcome. Don't
think it will as fast as what David suggested though.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19  9:10       ` Vlad Buslov
@ 2020-05-19 18:39         ` Cong Wang
  2020-05-20  7:33           ` Vlad Buslov
  0 siblings, 1 reply; 31+ messages in thread
From: Cong Wang @ 2020-05-19 18:39 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jiri Pirko, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Davide Caratti, Marcelo Ricardo Leitner,
	Jakub Kicinski

On Tue, May 19, 2020 at 2:10 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Mon 18 May 2020 at 21:46, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Sun, May 17, 2020 at 11:44 PM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Sun 17 May 2020 at 22:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> Output rate of current upstream kernel TC filter dump implementation if
> >> >> relatively low (~100k rules/sec depending on configuration). This
> >> >> constraint impacts performance of software switch implementation that
> >> >> rely on TC for their datapath implementation and periodically call TC
> >> >> filter dump to update rules stats. Moreover, TC filter dump output a lot
> >> >> of static data that don't change during the filter lifecycle (filter
> >> >> key, specific action details, etc.) which constitutes significant
> >> >> portion of payload on resulting netlink packets and increases amount of
> >> >> syscalls necessary to dump all filters on particular Qdisc. In order to
> >> >> significantly improve filter dump rate this patch sets implement new
> >> >> mode of TC filter dump operation named "terse dump" mode. In this mode
> >> >> only parameters necessary to identify the filter (handle, action cookie,
> >> >> etc.) and data that can change during filter lifecycle (filter flags,
> >> >> action stats, etc.) are preserved in dump output while everything else
> >> >> is omitted.
> >> >>
> >> >> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
> >> >> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
> >> >> individual classifier support (new tcf_proto_ops->terse_dump()
> >> >> callback). Support for action terse dump is implemented in act API and
> >> >> don't require changing individual action implementations.
> >> >
> >> > Sorry for being late.
> >> >
> >> > Why terse dump needs a new ops if it only dumps a subset of the
> >> > regular dump? That is, why not just pass a boolean flag to regular
> >> > ->dump() implementation?
> >> >
> >> > I guess that might break user-space ABI? At least some netlink
> >> > attributes are not always dumped anyway, so it does not look like
> >> > a problem?
> >> >
> >> > Thanks.
> >>
> >> Hi Cong,
> >>
> >> I considered adding a flag to ->dump() callback but decided against it
> >> for following reasons:
> >>
> >> - It complicates fl_dump() code by adding additional conditionals. Not a
> >>   big problem but it seemed better for me to have a standalone callback
> >>   because with combined implementation it is even hard to deduce what
> >>   does terse dump actually output.
> >
> > This is not a problem, at least you can add a big if in fl_dump(),
> > something like:
> >
> > if (terse) {
> >   // do terse dump
> >   return 0;
> > }
> > // normal dump
>
> That is what I was trying to prevent with my implementation: having big
> "superfunctions" that implement multiple things with branching. Why not
> just have dedicated callbacks that do exactly one thing?

1. Saving one unnecessary ops.
2. Easier to trace the code: all dumpings are in one implementation.

>
> >
> >>
> >> - My initial implementation just called regular dump for classifiers
> >>   that don't support terse dump, but in internal review Jiri insisted
> >>   that cls API should fail if it can't satisfy user's request and having
> >>   dedicated callback allows implementation to return an error if
> >>   classifier doesn't define ->terse_dump(). With flag approach it would
> >>   be not trivial to determine if implementation actually uses the flag.
> >
> > Hmm? For those not support terse dump, we can just do:
> >
> > if (terse)
> >   return -EOPNOTSUPP;
> > // normal dump goes here
> >
> > You just have to pass 'terse' flag to all implementations and let them
> > to decide whether to support it or not.
>
> But why duplicate the same code to all existing cls dump implementations
> instead of having such check nicely implemented in cls API (via callback
> existence or a flag)?

I am confused, your fl_terse_dump() also has duplication with fl_dump()...

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19  9:04   ` Vlad Buslov
  2020-05-19 14:30     ` Edward Cree
@ 2020-05-19 18:58     ` Cong Wang
  2020-05-20  7:24       ` Vlad Buslov
  1 sibling, 1 reply; 31+ messages in thread
From: Cong Wang @ 2020-05-19 18:58 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Edward Cree, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski

On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> I considered that approach initially but decided against it for
> following reasons:
>
> - Generic data is covered by current terse dump implementation.
>   Everything else will be act or cls specific which would result long
>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
>   these would require a lot of dedicated logic in act and cls dump
>   callbacks. Also, it would be quite a challenge to test all possible
>   combinations.

Well, if you consider netlink dump as a database query, what Edward
proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
than "select * from cls_db".

No one said it is easy to implement, it is just more elegant than you
select a hardcoded set of columns for the user.

Think about it, what if another user wants a less terse dump but still
not a full dump? Would you implement ops->terse_dump2()? Or
what if people still think your terse dump is not as terse as she wants?
ops->mini_dump()? How many ops's we would end having?


>
> - It is hard to come up with proper validation for such implementation.
>   In case of terse dump I just return an error if classifier doesn't
>   implement the callback (and since current implementation only outputs
>   generic action info, it doesn't even require support from
>   action-specific dump callbacks). But, for example, how do we validate
>   a case where user sets some flower and tunnel_key act dump flags from
>   previous paragraph, but Qdisc contains some other classifier? Or
>   flower classifier points to other types of actions? Or when flower
>   classifier has and tunnel_key actions but also mirred? Should the

Each action should be able to dump selectively too. If you think it
as a database, it is just a different table with different schemas.


>   implementation return an error on encountering any classifier or
>   action that doesn't have any flags set for its type or just print all
>   data like regular dump? What if user asks to dump some specific option
>   that wasn't set for particular filter of action instance?

Undefined or error.

>
> Overall, the more I think about such implementation the more it looks
> like a mess to me.

This is what I think about your current implementation. You know once
we add this we can't remove it any longer, right? This is why we should
make it right and better in the first place, not after carrying it for even one
release.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19 18:58     ` Cong Wang
@ 2020-05-20  7:24       ` Vlad Buslov
  2020-05-22 19:33         ` Cong Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-20  7:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Edward Cree, Linux Kernel Network Developers,
	David Miller, Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski


On Tue 19 May 2020 at 21:58, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> I considered that approach initially but decided against it for
>> following reasons:
>>
>> - Generic data is covered by current terse dump implementation.
>>   Everything else will be act or cls specific which would result long
>>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
>>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
>>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
>>   these would require a lot of dedicated logic in act and cls dump
>>   callbacks. Also, it would be quite a challenge to test all possible
>>   combinations.
>
> Well, if you consider netlink dump as a database query, what Edward
> proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
> than "select * from cls_db".
>
> No one said it is easy to implement, it is just more elegant than you
> select a hardcoded set of columns for the user.

As I explained to Edward, having denser netlink packets with more
filters per packet is only part of optimization. Another part is not
executing some code at all. Consider fl_dump_key() which is 200 lines
function with bunch of conditionals like that:

static int fl_dump_key(struct sk_buff *skb, struct net *net,
		       struct fl_flow_key *key, struct fl_flow_key *mask)
{
	if (mask->meta.ingress_ifindex) {
		struct net_device *dev;

		dev = __dev_get_by_index(net, key->meta.ingress_ifindex);
		if (dev && nla_put_string(skb, TCA_FLOWER_INDEV, dev->name))
			goto nla_put_failure;
	}

	if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
			    mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
			    sizeof(key->eth.dst)) ||
	    fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
			    mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
			    sizeof(key->eth.src)) ||
	    fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
			    &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
			    sizeof(key->basic.n_proto)))
		goto nla_put_failure;

	if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
		goto nla_put_failure;

	if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
			     TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
		goto nla_put_failure;
    ...


Now imagine all of these are extended with additional if (flags &
TCA_DUMP_XXX). All gains from not outputting some other minor stuff into
netlink packet will be negated by it.


>
> Think about it, what if another user wants a less terse dump but still
> not a full dump? Would you implement ops->terse_dump2()? Or
> what if people still think your terse dump is not as terse as she wants?
> ops->mini_dump()? How many ops's we would end having?

User can discard whatever he doesn't need in user land code. The goal of
this change is performance optimization, not designing a generic
kernel-space data filtering mechanism.

>
>
>>
>> - It is hard to come up with proper validation for such implementation.
>>   In case of terse dump I just return an error if classifier doesn't
>>   implement the callback (and since current implementation only outputs
>>   generic action info, it doesn't even require support from
>>   action-specific dump callbacks). But, for example, how do we validate
>>   a case where user sets some flower and tunnel_key act dump flags from
>>   previous paragraph, but Qdisc contains some other classifier? Or
>>   flower classifier points to other types of actions? Or when flower
>>   classifier has and tunnel_key actions but also mirred? Should the
>
> Each action should be able to dump selectively too. If you think it
> as a database, it is just a different table with different schemas.

How is designing custom SQL-like query language (according to your
example at the beginning of the mail) for filter dump is going to
improve performance? If there is a way to do it in fast a generic manner
with BPF, then I'm very interested to hear the details. But adding
hundred more hardcoded conditionals is just not a solution considering
main motivations for this change is performance.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-19 18:39         ` Cong Wang
@ 2020-05-20  7:33           ` Vlad Buslov
  0 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-20  7:33 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Jiri Pirko, Linux Kernel Network Developers,
	David Miller, Jamal Hadi Salim, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski


On Tue 19 May 2020 at 21:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, May 19, 2020 at 2:10 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Mon 18 May 2020 at 21:46, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Sun, May 17, 2020 at 11:44 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >>
>> >> On Sun 17 May 2020 at 22:13, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Fri, May 15, 2020 at 4:40 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> >>
>> >> >> Output rate of current upstream kernel TC filter dump implementation if
>> >> >> relatively low (~100k rules/sec depending on configuration). This
>> >> >> constraint impacts performance of software switch implementation that
>> >> >> rely on TC for their datapath implementation and periodically call TC
>> >> >> filter dump to update rules stats. Moreover, TC filter dump output a lot
>> >> >> of static data that don't change during the filter lifecycle (filter
>> >> >> key, specific action details, etc.) which constitutes significant
>> >> >> portion of payload on resulting netlink packets and increases amount of
>> >> >> syscalls necessary to dump all filters on particular Qdisc. In order to
>> >> >> significantly improve filter dump rate this patch sets implement new
>> >> >> mode of TC filter dump operation named "terse dump" mode. In this mode
>> >> >> only parameters necessary to identify the filter (handle, action cookie,
>> >> >> etc.) and data that can change during filter lifecycle (filter flags,
>> >> >> action stats, etc.) are preserved in dump output while everything else
>> >> >> is omitted.
>> >> >>
>> >> >> Userspace API is implemented using new TCA_DUMP_FLAGS tlv with only
>> >> >> available flag value TCA_DUMP_FLAGS_TERSE. Internally, new API requires
>> >> >> individual classifier support (new tcf_proto_ops->terse_dump()
>> >> >> callback). Support for action terse dump is implemented in act API and
>> >> >> don't require changing individual action implementations.
>> >> >
>> >> > Sorry for being late.
>> >> >
>> >> > Why terse dump needs a new ops if it only dumps a subset of the
>> >> > regular dump? That is, why not just pass a boolean flag to regular
>> >> > ->dump() implementation?
>> >> >
>> >> > I guess that might break user-space ABI? At least some netlink
>> >> > attributes are not always dumped anyway, so it does not look like
>> >> > a problem?
>> >> >
>> >> > Thanks.
>> >>
>> >> Hi Cong,
>> >>
>> >> I considered adding a flag to ->dump() callback but decided against it
>> >> for following reasons:
>> >>
>> >> - It complicates fl_dump() code by adding additional conditionals. Not a
>> >>   big problem but it seemed better for me to have a standalone callback
>> >>   because with combined implementation it is even hard to deduce what
>> >>   does terse dump actually output.
>> >
>> > This is not a problem, at least you can add a big if in fl_dump(),
>> > something like:
>> >
>> > if (terse) {
>> >   // do terse dump
>> >   return 0;
>> > }
>> > // normal dump
>>
>> That is what I was trying to prevent with my implementation: having big
>> "superfunctions" that implement multiple things with branching. Why not
>> just have dedicated callbacks that do exactly one thing?
>
> 1. Saving one unnecessary ops.
> 2. Easier to trace the code: all dumpings are in one implementation.

Okay, I see your point.

>
>>
>> >
>> >>
>> >> - My initial implementation just called regular dump for classifiers
>> >>   that don't support terse dump, but in internal review Jiri insisted
>> >>   that cls API should fail if it can't satisfy user's request and having
>> >>   dedicated callback allows implementation to return an error if
>> >>   classifier doesn't define ->terse_dump(). With flag approach it would
>> >>   be not trivial to determine if implementation actually uses the flag.
>> >
>> > Hmm? For those not support terse dump, we can just do:
>> >
>> > if (terse)
>> >   return -EOPNOTSUPP;
>> > // normal dump goes here
>> >
>> > You just have to pass 'terse' flag to all implementations and let them
>> > to decide whether to support it or not.
>>
>> But why duplicate the same code to all existing cls dump implementations
>> instead of having such check nicely implemented in cls API (via callback
>> existence or a flag)?
>
> I am confused, your fl_terse_dump() also has duplication with fl_dump()...
>
> Thanks.

I meant duplicating the "if terse not supported return -EOPNOTSUPP" in
dump callback of every classifier implementation. With current
implementation cls API handles such case by checking whether classifier
implementation has ->terse_dump() defined and returns error otherwise.
This can also be achieved by having a new classifier flag, in case we
decide to proceed with folding both dump and terse_dump into single
->dump(bool terse) callback.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-18 15:37 ` Edward Cree
  2020-05-18 18:50   ` Cong Wang
  2020-05-19  9:04   ` Vlad Buslov
@ 2020-05-21 14:36   ` Vlad Buslov
  2020-05-21 17:02     ` Jakub Kicinski
  2020-05-22 19:41     ` Cong Wang
  2 siblings, 2 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-21 14:36 UTC (permalink / raw)
  To: Edward Cree, xiyou.wangcong
  Cc: Vlad Buslov, netdev, davem, jhs, jiri, dcaratti, marcelo.leitner, kuba

Hi Edward, Cong,

On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
> On 15/05/2020 12:40, Vlad Buslov wrote:
>> In order to
>> significantly improve filter dump rate this patch sets implement new
>> mode of TC filter dump operation named "terse dump" mode. In this mode
>> only parameters necessary to identify the filter (handle, action cookie,
>> etc.) and data that can change during filter lifecycle (filter flags,
>> action stats, etc.) are preserved in dump output while everything else
>> is omitted.
> I realise I'm a bit late, but isn't this the kind of policy that shouldn't
>  be hard-coded in the kernel?  I.e. if next year it turns out that some
>  user needs one parameter that's been omitted here, but not the whole dump,
>  are they going to want to add another mode to the uapi?
> Should this not instead have been done as a set of flags to specify which
>  pieces of information the caller wanted in the dump, rather than a mode
>  flag selecting a pre-defined set?
>
> -ed

I've been thinking some more about this. While the idea of making
fine-grained dump where user controls exact contents field-by-field is
unfeasible due to performance considerations, we can try to come up with
something more coarse-grained but not fully hardcoded (like current terse
dump implementation). Something like having a set of flags that allows
to skip output of groups of attributes.

For example, CLS_SKIP_KEY flag would skip the whole expensive classifier
key dump without having to go through all 200 lines of conditionals in
fl_dump_key() while ACT_SKIP_OPTIONS would skip outputting TCA_OPTIONS
compound attribute (and expensive call to tc_action_ops->dump()). This
approach would also leave the door open for further more fine-grained
flags, if the need arises. For example, new flags
CLS_SKIP_KEY_{L2,L3,L4} can be introduced to more precisely control
which parts of cls key should be skipped.

The main drawback of such approach is that it is impossible to come up
with universal set of flags that would be applicable for all
classifiers. Key (in some form) is applicable to most classifiers, but
it still doesn't make sense for matchall or bpf. Some classifiers have
'flags', some don't. Hardware-offloaded classifiers have in_hw_count.
Considering this, initial set of flags will be somewhat flower-centric.

What do you think?

Regards,
Vlad

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-21 14:36   ` Vlad Buslov
@ 2020-05-21 17:02     ` Jakub Kicinski
  2020-05-22 16:16       ` Vlad Buslov
  2020-05-22 19:41     ` Cong Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2020-05-21 17:02 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Edward Cree, xiyou.wangcong, netdev, davem, jhs, jiri, dcaratti,
	marcelo.leitner

On Thu, 21 May 2020 17:36:12 +0300 Vlad Buslov wrote:
> Hi Edward, Cong,
> 
> On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
> > On 15/05/2020 12:40, Vlad Buslov wrote:  
> >> In order to
> >> significantly improve filter dump rate this patch sets implement new
> >> mode of TC filter dump operation named "terse dump" mode. In this mode
> >> only parameters necessary to identify the filter (handle, action cookie,
> >> etc.) and data that can change during filter lifecycle (filter flags,
> >> action stats, etc.) are preserved in dump output while everything else
> >> is omitted.  
> > I realise I'm a bit late, but isn't this the kind of policy that shouldn't
> >  be hard-coded in the kernel?  I.e. if next year it turns out that some
> >  user needs one parameter that's been omitted here, but not the whole dump,
> >  are they going to want to add another mode to the uapi?
> > Should this not instead have been done as a set of flags to specify which
> >  pieces of information the caller wanted in the dump, rather than a mode
> >  flag selecting a pre-defined set?
> >
> > -ed  
> 
> I've been thinking some more about this. While the idea of making
> fine-grained dump where user controls exact contents field-by-field is
> unfeasible due to performance considerations, we can try to come up with
> something more coarse-grained but not fully hardcoded (like current terse
> dump implementation). Something like having a set of flags that allows
> to skip output of groups of attributes.
> 
> For example, CLS_SKIP_KEY flag would skip the whole expensive classifier
> key dump without having to go through all 200 lines of conditionals in

Do you really need to dump classifiers? If you care about stats 
the actions could be sufficient if the offload code was fixed
appropriately... Sorry I had to say that.

> fl_dump_key() while ACT_SKIP_OPTIONS would skip outputting TCA_OPTIONS
> compound attribute (and expensive call to tc_action_ops->dump()). This
> approach would also leave the door open for further more fine-grained
> flags, if the need arises. For example, new flags
> CLS_SKIP_KEY_{L2,L3,L4} can be introduced to more precisely control
> which parts of cls key should be skipped.

L2, L3, etc. will be meaningless for a lot of classifiers.

> The main drawback of such approach is that it is impossible to come up
> with universal set of flags that would be applicable for all
> classifiers. Key (in some form) is applicable to most classifiers, but
> it still doesn't make sense for matchall or bpf. Some classifiers have
> 'flags', some don't. Hardware-offloaded classifiers have in_hw_count.
> Considering this, initial set of flags will be somewhat flower-centric.
> 
> What do you think?

Simplest heuristic is to dump everything that can't get changed without
a notification. Which I think you're quite close to already..

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-21 17:02     ` Jakub Kicinski
@ 2020-05-22 16:16       ` Vlad Buslov
  2020-05-23 11:06         ` Jamal Hadi Salim
  0 siblings, 1 reply; 31+ messages in thread
From: Vlad Buslov @ 2020-05-22 16:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, Edward Cree, xiyou.wangcong, netdev, davem, jhs,
	jiri, dcaratti, marcelo.leitner

On Thu 21 May 2020 at 20:02, Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 21 May 2020 17:36:12 +0300 Vlad Buslov wrote:
>> Hi Edward, Cong,
>> 
>> On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
>> > On 15/05/2020 12:40, Vlad Buslov wrote:  
>> >> In order to
>> >> significantly improve filter dump rate this patch sets implement new
>> >> mode of TC filter dump operation named "terse dump" mode. In this mode
>> >> only parameters necessary to identify the filter (handle, action cookie,
>> >> etc.) and data that can change during filter lifecycle (filter flags,
>> >> action stats, etc.) are preserved in dump output while everything else
>> >> is omitted.  
>> > I realise I'm a bit late, but isn't this the kind of policy that shouldn't
>> >  be hard-coded in the kernel?  I.e. if next year it turns out that some
>> >  user needs one parameter that's been omitted here, but not the whole dump,
>> >  are they going to want to add another mode to the uapi?
>> > Should this not instead have been done as a set of flags to specify which
>> >  pieces of information the caller wanted in the dump, rather than a mode
>> >  flag selecting a pre-defined set?
>> >
>> > -ed  
>> 
>> I've been thinking some more about this. While the idea of making
>> fine-grained dump where user controls exact contents field-by-field is
>> unfeasible due to performance considerations, we can try to come up with
>> something more coarse-grained but not fully hardcoded (like current terse
>> dump implementation). Something like having a set of flags that allows
>> to skip output of groups of attributes.
>> 
>> For example, CLS_SKIP_KEY flag would skip the whole expensive classifier
>> key dump without having to go through all 200 lines of conditionals in
>
> Do you really need to dump classifiers? If you care about stats 
> the actions could be sufficient if the offload code was fixed
> appropriately... Sorry I had to say that.

Technically I need neither classifier nor action. All I need is cookie
and stats of single terminating action attached to filter (redirect,
drop, etc.). This can be achieved by making terse dump output that data
for last extension on filter. However, when I discussed my initial terse
dump idea with Jamal he asked me not to ossify such behavior to allow
for implementation of offloaded shared actions in future.

Speaking about shared action offload, I remember looking at some RFC
patches by Edward implementing such functionality and allowing action
stats update directly from act, as opposed to current design that relies
on classifier to update action stats from hardware. Is that what you
mean by appropriately fixing offload code? With such implementation,
just dumping relevant action types would also work. My only concern is
that the only way to dump actions is per-namespace as opposed to
per-Qdisc of filters, which would clash with any other cls/act users on
same machine/hypervisor.


[...]


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-20  7:24       ` Vlad Buslov
@ 2020-05-22 19:33         ` Cong Wang
  2020-05-25 11:38           ` Vlad Buslov
  0 siblings, 1 reply; 31+ messages in thread
From: Cong Wang @ 2020-05-22 19:33 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Edward Cree, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski

On Wed, May 20, 2020 at 12:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Tue 19 May 2020 at 21:58, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> I considered that approach initially but decided against it for
> >> following reasons:
> >>
> >> - Generic data is covered by current terse dump implementation.
> >>   Everything else will be act or cls specific which would result long
> >>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
> >>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
> >>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
> >>   these would require a lot of dedicated logic in act and cls dump
> >>   callbacks. Also, it would be quite a challenge to test all possible
> >>   combinations.
> >
> > Well, if you consider netlink dump as a database query, what Edward
> > proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
> > than "select * from cls_db".
> >
> > No one said it is easy to implement, it is just more elegant than you
> > select a hardcoded set of columns for the user.
>
> As I explained to Edward, having denser netlink packets with more
> filters per packet is only part of optimization. Another part is not
> executing some code at all. Consider fl_dump_key() which is 200 lines
> function with bunch of conditionals like that:
>
> static int fl_dump_key(struct sk_buff *skb, struct net *net,
>                        struct fl_flow_key *key, struct fl_flow_key *mask)
> {
>         if (mask->meta.ingress_ifindex) {
>                 struct net_device *dev;
>
>                 dev = __dev_get_by_index(net, key->meta.ingress_ifindex);
>                 if (dev && nla_put_string(skb, TCA_FLOWER_INDEV, dev->name))
>                         goto nla_put_failure;
>         }
>
>         if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
>                             mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
>                             sizeof(key->eth.dst)) ||
>             fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
>                             mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
>                             sizeof(key->eth.src)) ||
>             fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
>                             &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
>                             sizeof(key->basic.n_proto)))
>                 goto nla_put_failure;
>
>         if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
>                 goto nla_put_failure;
>
>         if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
>                              TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
>                 goto nla_put_failure;
>     ...
>
>
> Now imagine all of these are extended with additional if (flags &
> TCA_DUMP_XXX). All gains from not outputting some other minor stuff into
> netlink packet will be negated by it.

Interesting, are you saying a bit test is as expensive as appending
an actual netlink attribution to the dumping? I am surprised.


>
>
> >
> > Think about it, what if another user wants a less terse dump but still
> > not a full dump? Would you implement ops->terse_dump2()? Or
> > what if people still think your terse dump is not as terse as she wants?
> > ops->mini_dump()? How many ops's we would end having?
>
> User can discard whatever he doesn't need in user land code. The goal of
> this change is performance optimization, not designing a generic
> kernel-space data filtering mechanism.

You optimize the performance by reducing the dump size, which is
already effectively a data filtering. This doesn't have to be your goal,
you are implementing it anyway.


>
> >
> >
> >>
> >> - It is hard to come up with proper validation for such implementation.
> >>   In case of terse dump I just return an error if classifier doesn't
> >>   implement the callback (and since current implementation only outputs
> >>   generic action info, it doesn't even require support from
> >>   action-specific dump callbacks). But, for example, how do we validate
> >>   a case where user sets some flower and tunnel_key act dump flags from
> >>   previous paragraph, but Qdisc contains some other classifier? Or
> >>   flower classifier points to other types of actions? Or when flower
> >>   classifier has and tunnel_key actions but also mirred? Should the
> >
> > Each action should be able to dump selectively too. If you think it
> > as a database, it is just a different table with different schemas.
>
> How is designing custom SQL-like query language (according to your
> example at the beginning of the mail) for filter dump is going to
> improve performance? If there is a way to do it in fast a generic manner
> with BPF, then I'm very interested to hear the details. But adding
> hundred more hardcoded conditionals is just not a solution considering
> main motivations for this change is performance.

I still wonder how a bit test is as expensive as you claim, it does
not look like you actually measure it. This of course depends on the
size of the dump, but if you look at other netlink dump in kernel,
not just tc filters, we already dump a lot of attributes per record.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-21 14:36   ` Vlad Buslov
  2020-05-21 17:02     ` Jakub Kicinski
@ 2020-05-22 19:41     ` Cong Wang
  1 sibling, 0 replies; 31+ messages in thread
From: Cong Wang @ 2020-05-22 19:41 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Edward Cree, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski

On Thu, May 21, 2020 at 7:36 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Hi Edward, Cong,
>
> On Mon 18 May 2020 at 18:37, Edward Cree <ecree@solarflare.com> wrote:
> > On 15/05/2020 12:40, Vlad Buslov wrote:
> >> In order to
> >> significantly improve filter dump rate this patch sets implement new
> >> mode of TC filter dump operation named "terse dump" mode. In this mode
> >> only parameters necessary to identify the filter (handle, action cookie,
> >> etc.) and data that can change during filter lifecycle (filter flags,
> >> action stats, etc.) are preserved in dump output while everything else
> >> is omitted.
> > I realise I'm a bit late, but isn't this the kind of policy that shouldn't
> >  be hard-coded in the kernel?  I.e. if next year it turns out that some
> >  user needs one parameter that's been omitted here, but not the whole dump,
> >  are they going to want to add another mode to the uapi?
> > Should this not instead have been done as a set of flags to specify which
> >  pieces of information the caller wanted in the dump, rather than a mode
> >  flag selecting a pre-defined set?
> >
> > -ed
>
> I've been thinking some more about this. While the idea of making
> fine-grained dump where user controls exact contents field-by-field is
> unfeasible due to performance considerations, we can try to come up with
> something more coarse-grained but not fully hardcoded (like current terse
> dump implementation). Something like having a set of flags that allows
> to skip output of groups of attributes.
>
> For example, CLS_SKIP_KEY flag would skip the whole expensive classifier
> key dump without having to go through all 200 lines of conditionals in
> fl_dump_key() while ACT_SKIP_OPTIONS would skip outputting TCA_OPTIONS
> compound attribute (and expensive call to tc_action_ops->dump()). This
> approach would also leave the door open for further more fine-grained
> flags, if the need arises. For example, new flags
> CLS_SKIP_KEY_{L2,L3,L4} can be introduced to more precisely control
> which parts of cls key should be skipped.
>
> The main drawback of such approach is that it is impossible to come up
> with universal set of flags that would be applicable for all
> classifiers. Key (in some form) is applicable to most classifiers, but
> it still doesn't make sense for matchall or bpf. Some classifiers have
> 'flags', some don't. Hardware-offloaded classifiers have in_hw_count.
> Considering this, initial set of flags will be somewhat flower-centric.
>
> What do you think?

This looks like a reverse filtering to me, so essentially the same.
Please give me some time to think about this, it is definitely not
easy.

The only thing I worry is that once you add terse dump, we cannot
simply remove it any more. (Otherwise I wouldn't even want to push
you on this.)

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-22 16:16       ` Vlad Buslov
@ 2020-05-23 11:06         ` Jamal Hadi Salim
  0 siblings, 0 replies; 31+ messages in thread
From: Jamal Hadi Salim @ 2020-05-23 11:06 UTC (permalink / raw)
  To: Vlad Buslov, Jakub Kicinski
  Cc: Vlad Buslov, Edward Cree, xiyou.wangcong, netdev, davem, jiri,
	dcaratti, marcelo.leitner

On 2020-05-22 12:16 p.m., Vlad Buslov wrote:
> On Thu 21 May 2020 at 20:02, Jakub Kicinski <kuba@kernel.org> wrote:
>> On Thu, 21 May 2020 17:36:12 +0300 Vlad Buslov wrote:
>>> Hi Edward, Cong,
>>>

>> Do you really need to dump classifiers? If you care about stats
>> the actions could be sufficient if the offload code was fixed
>> appropriately... Sorry I had to say that.
> 
> Technically I need neither classifier nor action. All I need is cookie
> and stats of single terminating action attached to filter (redirect,
> drop, etc.). This can be achieved by making terse dump output that data
> for last extension on filter. However, when I discussed my initial terse
> dump idea with Jamal he asked me not to ossify such behavior to allow
> for implementation of offloaded shared actions in future.
> 


Trying to recollect our discussion (please forgive me if i am
rehashing). old skule hardware model typically is ACL style with one
action - therefore concept of tying a counter with with
a classifier is common.

Other hardware i am familiar with tends to have a table of counters.
More an array with indices.
In the shared case using the same counter index from multiple
tables implies it is shared. Note "old skule" does not have
a concept of sharing.

So i was more worried about assuming the "old skule" model
at the expense of other hardware models.
We should be able to dump different tables from hardware.
Mostly these could be tables of actions. And counter tables
look like a gact action.

 From s/w:
If what is needed is to just dump explicit stats a gact
action with a cookie and an index would suffice.
i.e tc match foobar \
action continue cookie blah index 15 \
action ...
action mirred redirect ...

of course this now adds extra cycles in the s/w datapath but
advantage is it means you can cheaply either get
individual counters (get action gact index 15) or dump all gact actions
and filter in user space for your cookie. Or introduce a dump
cookie filter in the kernel (similar to the "time since" action
dump filter).

cheers,
jamal

> Speaking about shared action offload, I remember looking at some RFC
> patches by Edward implementing such functionality and allowing action
> stats update directly from act, as opposed to current design that relies
> on classifier to update action stats from hardware. Is that what you
> mean by appropriately fixing offload code? With such implementation,
> just dumping relevant action types would also work. My only concern is
> that the only way to dump actions is per-namespace as opposed to
> per-Qdisc of filters, which would clash with any other cls/act users on
> same machine/hypervisor.
> 
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
  2020-05-22 19:33         ` Cong Wang
@ 2020-05-25 11:38           ` Vlad Buslov
  0 siblings, 0 replies; 31+ messages in thread
From: Vlad Buslov @ 2020-05-25 11:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Edward Cree, Linux Kernel Network Developers,
	David Miller, Jamal Hadi Salim, Jiri Pirko, Davide Caratti,
	Marcelo Ricardo Leitner, Jakub Kicinski


[-- Attachment #1: Type: text/plain, Size: 6249 bytes --]

On Fri 22 May 2020 at 22:33, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, May 20, 2020 at 12:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Tue 19 May 2020 at 21:58, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> I considered that approach initially but decided against it for
>> >> following reasons:
>> >>
>> >> - Generic data is covered by current terse dump implementation.
>> >>   Everything else will be act or cls specific which would result long
>> >>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
>> >>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
>> >>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
>> >>   these would require a lot of dedicated logic in act and cls dump
>> >>   callbacks. Also, it would be quite a challenge to test all possible
>> >>   combinations.
>> >
>> > Well, if you consider netlink dump as a database query, what Edward
>> > proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
>> > than "select * from cls_db".
>> >
>> > No one said it is easy to implement, it is just more elegant than you
>> > select a hardcoded set of columns for the user.
>>
>> As I explained to Edward, having denser netlink packets with more
>> filters per packet is only part of optimization. Another part is not
>> executing some code at all. Consider fl_dump_key() which is 200 lines
>> function with bunch of conditionals like that:
>>
>> static int fl_dump_key(struct sk_buff *skb, struct net *net,
>>                        struct fl_flow_key *key, struct fl_flow_key *mask)
>> {
>>         if (mask->meta.ingress_ifindex) {
>>                 struct net_device *dev;
>>
>>                 dev = __dev_get_by_index(net, key->meta.ingress_ifindex);
>>                 if (dev && nla_put_string(skb, TCA_FLOWER_INDEV, dev->name))
>>                         goto nla_put_failure;
>>         }
>>
>>         if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
>>                             mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
>>                             sizeof(key->eth.dst)) ||
>>             fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
>>                             mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
>>                             sizeof(key->eth.src)) ||
>>             fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
>>                             &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
>>                             sizeof(key->basic.n_proto)))
>>                 goto nla_put_failure;
>>
>>         if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
>>                 goto nla_put_failure;
>>
>>         if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
>>                              TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
>>                 goto nla_put_failure;
>>     ...
>>
>>
>> Now imagine all of these are extended with additional if (flags &
>> TCA_DUMP_XXX). All gains from not outputting some other minor stuff into
>> netlink packet will be negated by it.
>
> Interesting, are you saying a bit test is as expensive as appending
> an actual netlink attribution to the dumping? I am surprised.

It is not just adding a clause to all those conditionals. Some functions
are not called at all with current terse dump design. In the case of
fl_dump_key() it is just a bunch of conditionals (and maybe price of
cache misses to access struct fl_flow_key in a first place). In case of
tc_action_ops->dump() it is also obtaining a spinlock, some atomic ops,
etc. But I agree, "negated" is too strong of a word, "significantly
impacted" is more correct.

>
>
>>
>>
>> >
>> > Think about it, what if another user wants a less terse dump but still
>> > not a full dump? Would you implement ops->terse_dump2()? Or
>> > what if people still think your terse dump is not as terse as she wants?
>> > ops->mini_dump()? How many ops's we would end having?
>>
>> User can discard whatever he doesn't need in user land code. The goal of
>> this change is performance optimization, not designing a generic
>> kernel-space data filtering mechanism.
>
> You optimize the performance by reducing the dump size, which is
> already effectively a data filtering. This doesn't have to be your goal,
> you are implementing it anyway.
>
>
>>
>> >
>> >
>> >>
>> >> - It is hard to come up with proper validation for such implementation.
>> >>   In case of terse dump I just return an error if classifier doesn't
>> >>   implement the callback (and since current implementation only outputs
>> >>   generic action info, it doesn't even require support from
>> >>   action-specific dump callbacks). But, for example, how do we validate
>> >>   a case where user sets some flower and tunnel_key act dump flags from
>> >>   previous paragraph, but Qdisc contains some other classifier? Or
>> >>   flower classifier points to other types of actions? Or when flower
>> >>   classifier has and tunnel_key actions but also mirred? Should the
>> >
>> > Each action should be able to dump selectively too. If you think it
>> > as a database, it is just a different table with different schemas.
>>
>> How is designing custom SQL-like query language (according to your
>> example at the beginning of the mail) for filter dump is going to
>> improve performance? If there is a way to do it in fast a generic manner
>> with BPF, then I'm very interested to hear the details. But adding
>> hundred more hardcoded conditionals is just not a solution considering
>> main motivations for this change is performance.
>
> I still wonder how a bit test is as expensive as you claim, it does
> not look like you actually measure it. This of course depends on the
> size of the dump, but if you look at other netlink dump in kernel,
> not just tc filters, we already dump a lot of attributes per record.
>
> Thanks.

I agree that I didn't specify which parts of the change constitute what
fraction of the dump rate increase. Lets stage a simple test to verify
the cost of calling just two functions (fl_dump_key() and
tc_act_ops->dump() callback) and instantly discarding their results from
packet (patch attached).

[-- Attachment #2: Test patch --]
[-- Type: text/plain, Size: 1771 bytes --]

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 8ac7eb0a8309..267ee76d3ddb 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -771,6 +771,9 @@ tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a)
 {
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tc_cookie *cookie;
+	unsigned char *c;
+	struct nlattr *nest;
+	int err;
 
 	if (nla_put_string(skb, TCA_KIND, a->ops->kind))
 		goto nla_put_failure;
@@ -787,6 +790,16 @@ tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a)
 	}
 	rcu_read_unlock();
 
+	c = skb_tail_pointer(skb);
+	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+	err = tcf_action_dump_old(skb, a, 0, 0);
+	if (err > 0) {
+		nla_nest_end(skb, nest);
+	}
+	nlmsg_trim(skb, c);
+
 	return 0;
 
 nla_put_failure:
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0c574700da75..1bc6294c5c9b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -2771,8 +2771,10 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 static int fl_terse_dump(struct net *net, struct tcf_proto *tp, void *fh,
 			 struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
+	struct fl_flow_key *key, *mask;
 	struct cls_fl_filter *f = fh;
 	struct nlattr *nest;
+	unsigned char *b;
 	bool skip_hw;
 
 	if (!f)
@@ -2786,8 +2788,15 @@ static int fl_terse_dump(struct net *net, struct tcf_proto *tp, void *fh,
 
 	spin_lock(&tp->lock);
 
+	key = &f->key;
+	mask = &f->mask->key;
 	skip_hw = tc_skip_hw(f->flags);
 
+	b = skb_tail_pointer(skb);
+	if (fl_dump_key(skb, net, key, mask))
+		goto nla_put_failure_locked;
+	nlmsg_trim(skb, b);
+
 	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
 		goto nla_put_failure_locked;
 

[-- Attachment #3: Type: text/plain, Size: 947 bytes --]


Result for terse dumping 1m simple rules (flower with L2 key + gact
drop) on current net-next:

$ time sudo tc -s filter show terse dev ens1f0 ingress >/dev/null

real    0m3.445s
user    0m2.087s
sys     0m1.298s

With patch applied:

$ time sudo tc -s filter show terse dev ens1f0_0 ingress >/dev/null

real    0m5.035s
user    0m3.289s
sys     0m1.687s

As we can see this leads to 30% overhead in kernel space execution time.

Now with more complex rules (flower 5tuple + act tunnel key + act
mirred) on current net-next:

$ time sudo tc -s filter show terse dev ens1f0 ingress >/dev/null

real    0m4.052s
user    0m2.065s
sys     0m1.937s

Same rules with patch applied:

$ time sudo tc -s filter show terse dev ens1f0_0 ingress >/dev/null

real    0m6.346s
user    0m3.166s
sys     0m3.108s

With more complex rules performance impact on kernel space execution
time get more severe (60%). Overall, this looks like significant
degradation.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, back to index

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
2020-05-15 11:51   ` Jiri Pirko
2020-05-15 11:40 ` [PATCH net-next v2 2/4] net: sched: implement terse dump support in act Vlad Buslov
2020-05-15 11:51   ` Jiri Pirko
2020-05-15 11:40 ` [PATCH net-next v2 3/4] net: sched: cls_flower: implement terse dump support Vlad Buslov
2020-05-15 11:40 ` [PATCH net-next v2 4/4] selftests: implement flower classifier terse dump tests Vlad Buslov
2020-05-15 17:04 ` [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Jakub Kicinski
2020-05-18  6:46   ` Vlad Buslov
2020-05-15 17:25 ` David Miller
2020-05-18  6:50   ` Vlad Buslov
2020-05-17 19:13 ` Cong Wang
2020-05-18  6:44   ` Vlad Buslov
2020-05-18 18:46     ` Cong Wang
2020-05-19  9:10       ` Vlad Buslov
2020-05-19 18:39         ` Cong Wang
2020-05-20  7:33           ` Vlad Buslov
2020-05-18 15:37 ` Edward Cree
2020-05-18 18:50   ` Cong Wang
2020-05-19  9:04   ` Vlad Buslov
2020-05-19 14:30     ` Edward Cree
2020-05-19 15:17       ` Vlad Buslov
2020-05-19 18:58     ` Cong Wang
2020-05-20  7:24       ` Vlad Buslov
2020-05-22 19:33         ` Cong Wang
2020-05-25 11:38           ` Vlad Buslov
2020-05-21 14:36   ` Vlad Buslov
2020-05-21 17:02     ` Jakub Kicinski
2020-05-22 16:16       ` Vlad Buslov
2020-05-23 11:06         ` Jamal Hadi Salim
2020-05-22 19:41     ` Cong Wang

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git