netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/4] ] TC datapath hash api
@ 2020-07-11 21:28 Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-11 21:28 UTC (permalink / raw)
  To: netdev; +Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Ariel Levkovich

Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

To compute the hash value, the api offers 2 methods:
1. Linux implementation of an asymmetric hash algorithm
which is performed on the L4 headers of the packet.
This method is usable via an extention to act_skbedit and
allows user to provide a basis value to be included in
the computation.

2. User provided bpf program that implements
a hash computation algorithm. This option is usable
via a new type of tc action - action_hash.

Through both methods, the hash value is calculated
and stored in the skb->hash field so it can be matched
later as a key in the cls flower classifier.
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

For hash calculation:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash object-file <file> section <hash_section>\
action goto chain 2

Or:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action skbedit hash asym_l4 basis <basis> \
action goto chain 2

Matching on hash result:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2


v2 -> v3:
 *Split hash algorithm option into 2 different actions.
  Asym_l4 available via act_skbedit and bpf via new act_hash.

Ariel Levkovich (4):
  net/sched: Add skb->hash field editing via act_skbedit
  net/sched: Introduce action hash
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h                 |   4 +
 include/net/act_api.h                  |   2 +
 include/net/flow_dissector.h           |   9 +
 include/net/tc_act/tc_hash.h           |  20 ++
 include/net/tc_act/tc_skbedit.h        |   2 +
 include/uapi/linux/pkt_cls.h           |   4 +
 include/uapi/linux/tc_act/tc_hash.h    |  25 ++
 include/uapi/linux/tc_act/tc_skbedit.h |   7 +
 net/core/flow_dissector.c              |  17 ++
 net/sched/Kconfig                      |  11 +
 net/sched/Makefile                     |   1 +
 net/sched/act_hash.c                   | 348 +++++++++++++++++++++++++
 net/sched/act_skbedit.c                |  38 +++
 net/sched/cls_api.c                    |   1 +
 net/sched/cls_flower.c                 |  16 ++
 15 files changed, 505 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

-- 
2.25.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit
  2020-07-11 21:28 [PATCH net-next v3 0/4] ] TC datapath hash api Ariel Levkovich
@ 2020-07-11 21:28 ` Ariel Levkovich
  2020-07-13 17:11   ` Davide Caratti
  2020-07-15  1:14   ` David Ahern
  2020-07-11 21:28 ` [PATCH net-next v3 2/4] net/sched: Introduce action hash Ariel Levkovich
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-11 21:28 UTC (permalink / raw)
  To: netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Ariel Levkovich,
	Jiri Pirko

Extend act_skbedit api to allow writing into skb->hash
field.

To modify skb->hash user selects the hash algorithm
to use for the hash computation and can provide a
hash basis value to be used in the calculation.
The hash value will be calculated on the packet in the
datapath and will be set into skb->hash field.

Current implementation supports only the asymmetric l4 hash
algorithm that first checks whether the skb->hash was already
set with l4 hash value (possibly by the device driver) and uses
the existing value. If hash value wasn't set, it computes the
hash value in place using the kernel implementation of the
Jenkins hash algorithm.

Usage example:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action skbedit hash asym_l4 basis 5 \
action goto chain 2

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_skbedit.h        |  2 ++
 include/uapi/linux/tc_act/tc_skbedit.h |  7 +++++
 net/sched/act_skbedit.c                | 38 ++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..44a8a4625556 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -18,6 +18,8 @@ struct tcf_skbedit_params {
 	u32 mask;
 	u16 queue_mapping;
 	u16 ptype;
+	u32 hash_alg;
+	u32 hash_basis;
 	struct rcu_head rcu;
 };
 
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..5877811b093b 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,11 @@
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_HASH			0x40
+
+enum {
+	TCA_SKBEDIT_HASH_ALG_ASYM_L4,
+};
 
 struct tc_skbedit {
 	tc_gen;
@@ -45,6 +50,8 @@ enum {
 	TCA_SKBEDIT_PTYPE,
 	TCA_SKBEDIT_MASK,
 	TCA_SKBEDIT_FLAGS,
+	TCA_SKBEDIT_HASH,
+	TCA_SKBEDIT_HASH_BASIS,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b125b2be4467..2cc66c798afb 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -66,6 +66,20 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 	}
 	if (params->flags & SKBEDIT_F_PTYPE)
 		skb->pkt_type = params->ptype;
+
+	if (params->flags & SKBEDIT_F_HASH) {
+		u32 hash;
+
+		hash = skb_get_hash(skb);
+		/* If a hash basis was provided, add it into
+		 * hash calculation here and re-set skb->hash
+		 * to the new result with sw_hash indication
+		 * and keeping the l4 hash indication.
+		 */
+		hash = jhash_1word(hash, params->hash_basis);
+		__skb_set_sw_hash(skb, hash, skb->l4_hash);
+	}
+
 	return action;
 
 err:
@@ -91,6 +105,8 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
 	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
+	[TCA_SKBEDIT_HASH]		= { .len = sizeof(u32) },
+	[TCA_SKBEDIT_HASH_BASIS]	= { .len = sizeof(u32) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -107,6 +123,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u32 hash_alg, hash_basis = 0;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 			flags |= SKBEDIT_F_INHERITDSFIELD;
 	}
 
+	if (tb[TCA_SKBEDIT_HASH] != NULL) {
+		hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
+		if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
+			return -EINVAL;
+
+		flags |= SKBEDIT_F_HASH;
+
+		if (tb[TCA_SKBEDIT_HASH_BASIS])
+			hash_basis = nla_get_u32(tb[TCA_SKBEDIT_HASH_BASIS]);
+	}
+
 	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
 	index = parm->index;
 	err = tcf_idr_check_alloc(tn, &index, a, bind);
@@ -213,6 +241,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->mask = 0xffffffff;
 	if (flags & SKBEDIT_F_MASK)
 		params_new->mask = *mask;
+	if (flags & SKBEDIT_F_HASH) {
+		params_new->hash_alg = hash_alg;
+		params_new->hash_basis = hash_basis;
+	}
 
 	spin_lock_bh(&d->tcf_lock);
 	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -276,6 +308,12 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;
+	if (params->flags & SKBEDIT_F_HASH) {
+		if (nla_put_u32(skb, TCA_SKBEDIT_HASH, params->hash_alg))
+			goto nla_put_failure;
+		if (nla_put_u32(skb, TCA_SKBEDIT_HASH_BASIS, params->hash_basis))
+			goto nla_put_failure;
+	}
 
 	tcf_tm_dump(&t, &d->tcf_tm);
 	if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-11 21:28 [PATCH net-next v3 0/4] ] TC datapath hash api Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
@ 2020-07-11 21:28 ` Ariel Levkovich
  2020-07-13 22:04   ` Cong Wang
  2020-07-11 21:28 ` [PATCH net-next v3 3/4] net/flow_dissector: add packet hash dissection Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 4/4] net/sched: cls_flower: Add hash info to flow classification Ariel Levkovich
  3 siblings, 1 reply; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-11 21:28 UTC (permalink / raw)
  To: netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Ariel Levkovich,
	Jiri Pirko

Allow user to set a packet's hash value using a bpf program.

The user provided BPF program is required to compute and return
a hash value for the packet which is then stored in skb->hash.

Using this action to set the skb->hash is an alternative to setting
it with act_skbedit and can be useful for future HW offload support
when the HW hash function is different then the kernel's hash
function.
By using a bpg program that emulates the HW hash function user
can ensure hash consistency between the SW and the HW.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash object-file <bpf file> section <hash_comp_section>\
action goto chain 2

Matching on the result:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v1 -> v2:
 *Handle egress case for bpf hash properly.
 *Check for valid bpf fd before referencing it.
 *Fixed missing unlocking of tcf_lock.

v2 -> v3:
 *Move hash algorithm asym_l4 to act_skbedit.
  This action only supports bpf option now.
 
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/act_api.h               |   2 +
 include/net/tc_act/tc_hash.h        |  20 ++
 include/uapi/linux/pkt_cls.h        |   1 +
 include/uapi/linux/tc_act/tc_hash.h |  25 ++
 net/sched/Kconfig                   |  11 +
 net/sched/Makefile                  |   1 +
 net/sched/act_hash.c                | 348 ++++++++++++++++++++++++++++
 net/sched/cls_api.c                 |   1 +
 8 files changed, 409 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..b7e5d060bd2f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -12,6 +12,8 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
+#define ACT_BPF_NAME_LEN	256
+
 struct tcf_idrinfo {
 	struct mutex	lock;
 	struct idr	action_idr;
diff --git a/include/net/tc_act/tc_hash.h b/include/net/tc_act/tc_hash.h
new file mode 100644
index 000000000000..4a9e36664813
--- /dev/null
+++ b/include/net/tc_act/tc_hash.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __NET_TC_HASH_H
+#define __NET_TC_HASH_H
+
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_hash.h>
+
+struct tcf_hash_params {
+	struct bpf_prog	*prog;
+	const char *bpf_name;
+	struct rcu_head	rcu;
+};
+
+struct tcf_hash {
+	struct tc_action common;
+	struct tcf_hash_params __rcu *hash_p;
+};
+#define to_hash(a) ((struct tcf_hash *)a)
+
+#endif /* __NET_TC_HASH_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7576209d96f9..2fd93389d091 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -135,6 +135,7 @@ enum tca_id {
 	TCA_ID_MPLS,
 	TCA_ID_CT,
 	TCA_ID_GATE,
+	TCA_ID_HASH,
 	/* other actions go here */
 	__TCA_ID_MAX = 255
 };
diff --git a/include/uapi/linux/tc_act/tc_hash.h b/include/uapi/linux/tc_act/tc_hash.h
new file mode 100644
index 000000000000..08937f097ed7
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_hash.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __LINUX_TC_HASH_H
+#define __LINUX_TC_HASH_H
+
+#include <linux/pkt_cls.h>
+
+enum {
+	TCA_HASH_UNSPEC,
+	TCA_HASH_TM,
+	TCA_HASH_PARMS,
+	TCA_HASH_BPF_FD,
+	TCA_HASH_BPF_NAME,
+	TCA_HASH_BPF_ID,
+	TCA_HASH_BPF_TAG,
+	TCA_HASH_PAD,
+	__TCA_HASH_MAX,
+};
+#define TCA_HASH_MAX (__TCA_HASH_MAX - 1)
+
+struct tc_hash {
+	tc_gen;
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 84badf00647e..e9725bb40f4f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -993,6 +993,17 @@ config NET_ACT_GATE
 	  To compile this code as a module, choose M here: the
 	  module will be called act_gate.
 
+config NET_ACT_HASH
+	tristate "Hash calculation action"
+	depends on NET_CLS_ACT
+	help
+	  Say Y here to perform hash calculation on packet headers.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_hash.
+
 config NET_IFE_SKBMARK
 	tristate "Support to encoding decoding skb mark on IFE action"
 	depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a98f9e..2d1415fb57db 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
+obj-$(CONFIG_NET_ACT_HASH)      += act_hash.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)	+= act_meta_skbtcindex.o
diff --git a/net/sched/act_hash.c b/net/sched/act_hash.c
new file mode 100644
index 000000000000..88e7c9f9a2d3
--- /dev/null
+++ b/net/sched/act_hash.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* -
+ * net/sched/act_hash.c  Hash calculation action
+ *
+ * Author:   Ariel Levkovich <lariel@mellanox.com>
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/filter.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
+#include <linux/tc_act/tc_hash.h>
+#include <net/tc_act/tc_hash.h>
+
+#define ACT_HASH_BPF_NAME_LEN	256
+
+static unsigned int hash_net_id;
+static struct tc_action_ops act_hash_ops;
+
+static int tcf_hash_act(struct sk_buff *skb, const struct tc_action *a,
+			struct tcf_result *res)
+{
+	bool at_ingress = skb_at_tc_ingress(skb);
+	struct tcf_hash *h = to_hash(a);
+	struct tcf_hash_params *p;
+	int action;
+	u32 hash;
+
+	tcf_lastuse_update(&h->tcf_tm);
+	tcf_action_update_bstats(&h->common, skb);
+
+	p = rcu_dereference_bh(h->hash_p);
+
+	action = READ_ONCE(h->tcf_action);
+
+	if (at_ingress) {
+		__skb_push(skb, skb->mac_len);
+		bpf_compute_data_pointers(skb);
+		hash = BPF_PROG_RUN(p->prog, skb);
+		__skb_pull(skb, skb->mac_len);
+	} else {
+		bpf_compute_data_pointers(skb);
+		hash = BPF_PROG_RUN(p->prog, skb);
+	}
+
+	/* The BPF program hash function type is
+	 * unknown so only the sw hash bit is set.
+	 */
+	__skb_set_sw_hash(skb, hash, false);
+
+	return action;
+}
+
+static const struct nla_policy hash_policy[TCA_HASH_MAX + 1] = {
+	[TCA_HASH_PARMS]	= { .type = NLA_EXACT_LEN, .len = sizeof(struct tc_hash) },
+	[TCA_HASH_BPF_FD]	= { .type = NLA_U32 },
+	[TCA_HASH_BPF_NAME]	= { .type = NLA_NUL_STRING,
+				    .len = ACT_HASH_BPF_NAME_LEN },
+};
+
+static int tcf_hash_bpf_init(struct nlattr **tb, struct tcf_hash_params *params)
+{
+	struct bpf_prog *fp;
+	char *name = NULL;
+	u32 bpf_fd;
+
+	bpf_fd = nla_get_u32(tb[TCA_HASH_BPF_FD]);
+
+	fp = bpf_prog_get_type(bpf_fd, BPF_PROG_TYPE_SCHED_ACT);
+	if (IS_ERR(fp))
+		return PTR_ERR(fp);
+
+	if (tb[TCA_HASH_BPF_NAME]) {
+		name = nla_memdup(tb[TCA_HASH_BPF_NAME], GFP_KERNEL);
+		if (!name) {
+			bpf_prog_put(fp);
+			return -ENOMEM;
+		}
+	}
+
+	params->bpf_name = name;
+	params->prog = fp;
+
+	return 0;
+}
+
+static void tcf_hash_bpf_cleanup(struct tcf_hash_params *params)
+{
+	if (params->prog)
+		bpf_prog_put(params->prog);
+
+	kfree(params->bpf_name);
+}
+
+static int tcf_hash_init(struct net *net, struct nlattr *nla,
+			 struct nlattr *est, struct tc_action **a,
+			 int replace, int bind, bool rtnl_held,
+			 struct tcf_proto *tp, u32 flags,
+			 struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, hash_net_id);
+	struct tcf_hash_params *params, old;
+	struct nlattr *tb[TCA_HASH_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_hash_params *p = NULL;
+	struct tc_hash *parm;
+	struct tcf_hash *h;
+	int err, res = 0;
+	u32 index;
+
+	if (!nla) {
+		NL_SET_ERR_MSG_MOD(extack, "Hash requires attributes to be passed");
+		return -EINVAL;
+	}
+
+	err = nla_parse_nested(tb, TCA_HASH_MAX, nla, hash_policy, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_HASH_PARMS]) {
+		NL_SET_ERR_MSG_MOD(extack, "Missing required hash parameters");
+		return -EINVAL;
+	}
+	parm = nla_data(tb[TCA_HASH_PARMS]);
+	index = parm->index;
+
+	err = tcf_idr_check_alloc(tn, &index, a, bind);
+	if (err < 0)
+		return err;
+
+	if (!err) {
+		err = tcf_idr_create_from_flags(tn, index, est, a,
+						&act_hash_ops, bind, flags);
+		if (err) {
+			tcf_idr_cleanup(tn, index);
+			return err;
+		}
+		res = ACT_P_CREATED;
+	} else {
+		if (bind)
+			return 0;
+
+		if (!replace) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	}
+	err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	h = to_hash(*a);
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (unlikely(!p)) {
+		err = -ENOMEM;
+		goto cleanup;
+	}
+
+	if (!tb[TCA_HASH_BPF_FD]) {
+		NL_SET_ERR_MSG_MOD(extack, "Missing hash bpf fd");
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	spin_lock_bh(&h->tcf_lock);
+
+	if (res != ACT_P_CREATED) {
+		params = rcu_dereference_protected(h->hash_p, 1);
+		old.prog = params->prog;
+		old.bpf_name = params->bpf_name;
+	}
+
+	err = tcf_hash_bpf_init(tb, p);
+	if (err)
+		goto unlock;
+
+	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+	p = rcu_replace_pointer(h->hash_p, p,
+				lockdep_is_held(&h->tcf_lock));
+	spin_unlock_bh(&h->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (p)
+		kfree_rcu(p, rcu);
+
+	if (res == ACT_P_CREATED) {
+		tcf_idr_insert(tn, *a);
+	} else {
+		synchronize_rcu();
+		tcf_hash_bpf_cleanup(&old);
+	}
+
+	return res;
+
+unlock:
+	spin_unlock_bh(&h->tcf_lock);
+
+cleanup:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	kfree(p);
+
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static void tcf_hash_cleanup(struct tc_action *a)
+{
+	struct tcf_hash *h = to_hash(a);
+	struct tcf_hash_params *p;
+
+	p = rcu_dereference_protected(h->hash_p, 1);
+	if (p) {
+		tcf_hash_bpf_cleanup(p);
+		kfree_rcu(p, rcu);
+	}
+}
+
+static int tcf_hash_dump(struct sk_buff *skb, struct tc_action *a,
+			 int bind, int ref)
+{
+	unsigned char *tp = skb_tail_pointer(skb);
+	struct tcf_hash *h = to_hash(a);
+	struct tcf_hash_params *p;
+	struct tc_hash opt = {
+		.index    = h->tcf_index,
+		.refcnt   = refcount_read(&h->tcf_refcnt) - ref,
+		.bindcnt  = atomic_read(&h->tcf_bindcnt) - bind,
+	};
+	struct nlattr *nla;
+	struct tcf_t tm;
+
+	spin_lock_bh(&h->tcf_lock);
+	opt.action = h->tcf_action;
+	p = rcu_dereference_protected(h->hash_p, lockdep_is_held(&h->tcf_lock));
+
+	if (nla_put(skb, TCA_HASH_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (p->bpf_name && nla_put_string(skb, TCA_HASH_BPF_NAME, p->bpf_name))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_HASH_BPF_ID, p->prog->aux->id))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_HASH_BPF_TAG, sizeof(p->prog->tag));
+	if (!nla)
+		goto nla_put_failure;
+
+	memcpy(nla_data(nla), p->prog->tag, nla_len(nla));
+
+	tcf_tm_dump(&tm, &h->tcf_tm);
+	if (nla_put_64bit(skb, TCA_HASH_TM, sizeof(tm), &tm,
+			  TCA_HASH_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&h->tcf_lock);
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&h->tcf_lock);
+	nlmsg_trim(skb, tp);
+	return -1;
+}
+
+static int tcf_hash_walker(struct net *net, struct sk_buff *skb,
+			   struct netlink_callback *cb, int type,
+			   const struct tc_action_ops *ops,
+			   struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, hash_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_hash_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, hash_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static void tcf_hash_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+				  u64 lastuse, bool hw)
+{
+	struct tcf_hash *h = to_hash(a);
+
+	tcf_action_update_stats(a, bytes, packets, false, hw);
+	h->tcf_tm.lastuse = max_t(u64, h->tcf_tm.lastuse, lastuse);
+}
+
+static struct tc_action_ops act_hash_ops = {
+	.kind		=       "hash",
+	.id		=       TCA_ID_HASH,
+	.owner		=       THIS_MODULE,
+	.act		=       tcf_hash_act,
+	.dump		=       tcf_hash_dump,
+	.init		=       tcf_hash_init,
+	.cleanup	=       tcf_hash_cleanup,
+	.walk		=       tcf_hash_walker,
+	.lookup		=       tcf_hash_search,
+	.stats_update	=       tcf_hash_stats_update,
+	.size		=       sizeof(struct tcf_hash),
+};
+
+static __net_init int hash_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, hash_net_id);
+
+	return tc_action_net_init(net, tn, &act_hash_ops);
+}
+
+static void __net_exit hash_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, hash_net_id);
+}
+
+static struct pernet_operations hash_net_ops = {
+	.init = hash_init_net,
+	.exit_batch = hash_exit_net,
+	.id = &hash_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init hash_init_module(void)
+{
+	return tcf_register_action(&act_hash_ops, &hash_net_ops);
+}
+
+static void __exit hash_cleanup_module(void)
+{
+	tcf_unregister_action(&act_hash_ops, &hash_net_ops);
+}
+
+module_init(hash_init_module);
+module_exit(hash_cleanup_module);
+
+MODULE_AUTHOR("Ariel Levkovich <lariel@mellanox.com>");
+MODULE_DESCRIPTION("Packet hash action");
+MODULE_LICENSE("GPL v2");
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a00a203b2ef5..6d7eb249e557 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -40,6 +40,7 @@
 #include <net/tc_act/tc_ct.h>
 #include <net/tc_act/tc_mpls.h>
 #include <net/tc_act/tc_gate.h>
+#include <net/tc_act/tc_hash.h>
 #include <net/flow_offload.h>
 
 extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 3/4] net/flow_dissector: add packet hash dissection
  2020-07-11 21:28 [PATCH net-next v3 0/4] ] TC datapath hash api Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 2/4] net/sched: Introduce action hash Ariel Levkovich
@ 2020-07-11 21:28 ` Ariel Levkovich
  2020-07-11 21:28 ` [PATCH net-next v3 4/4] net/sched: cls_flower: Add hash info to flow classification Ariel Levkovich
  3 siblings, 0 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-11 21:28 UTC (permalink / raw)
  To: netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Ariel Levkovich,
	Jiri Pirko

Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/skbuff.h       |  4 ++++
 include/net/flow_dissector.h |  9 +++++++++
 net/core/flow_dissector.c    | 17 +++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0c0377fc00c2..beb7fe2c7809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1342,6 +1342,10 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 			     struct flow_dissector *flow_dissector,
 			     void *target_container);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+			   struct flow_dissector *flow_dissector,
+			   void *target_container);
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
 	if (!skb->l4_hash && !skb->sw_hash)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..5cc0540ce3f7 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -243,6 +243,14 @@ struct flow_dissector_key_ct {
 	u32	ct_labels[4];
 };
 
+/**
+ * struct flow_dissector_key_hash:
+ * @hash: hash value
+ */
+struct flow_dissector_key_hash {
+	u32 hash;
+};
+
 enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -271,6 +279,7 @@ enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
 	FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
 	FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
+	FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
 
 	FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..c114f0e3ef4f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -392,6 +392,23 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_flow_dissect_tunnel_info);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+			   struct flow_dissector *flow_dissector,
+			   void *target_container)
+{
+	struct flow_dissector_key_hash *key;
+
+	if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_HASH))
+		return;
+
+	key = skb_flow_dissector_target(flow_dissector,
+					FLOW_DISSECTOR_KEY_HASH,
+					target_container);
+
+	key->hash = skb_get_hash_raw(skb);
+}
+EXPORT_SYMBOL(skb_flow_dissect_hash);
+
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 4/4] net/sched: cls_flower: Add hash info to flow classification
  2020-07-11 21:28 [PATCH net-next v3 0/4] ] TC datapath hash api Ariel Levkovich
                   ` (2 preceding siblings ...)
  2020-07-11 21:28 ` [PATCH net-next v3 3/4] net/flow_dissector: add packet hash dissection Ariel Levkovich
@ 2020-07-11 21:28 ` Ariel Levkovich
  3 siblings, 0 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-11 21:28 UTC (permalink / raw)
  To: netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Ariel Levkovich,
	Jiri Pirko

Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c       | 16 ++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 2fd93389d091..ef145320ee99 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -579,6 +579,9 @@ enum {
 
 	TCA_FLOWER_KEY_MPLS_OPTS,
 
+	TCA_FLOWER_KEY_HASH,		/* u32 */
+	TCA_FLOWER_KEY_HASH_MASK,	/* u32 */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..ff739e0d86fc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -64,6 +64,7 @@ struct fl_flow_key {
 		};
 	} tp_range;
 	struct flow_dissector_key_ct ct;
+	struct flow_dissector_key_hash hash;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -318,6 +319,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
 				    fl_ct_info_to_flower_map,
 				    ARRAY_SIZE(fl_ct_info_to_flower_map));
+		skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
 		skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
 		f = fl_mask_lookup(mask, &skb_key);
@@ -694,6 +696,9 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
 	[TCA_FLOWER_KEY_CT_LABELS_MASK]	= { .type = NLA_BINARY,
 					    .len = 128 / BITS_PER_BYTE },
 	[TCA_FLOWER_FLAGS]		= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_HASH]		= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_HASH_MASK]	= { .type = NLA_U32 },
+
 };
 
 static const struct nla_policy
@@ -1625,6 +1630,10 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
 	fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+	fl_set_key_val(tb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+		       &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+		       sizeof(key->hash.hash));
+
 	if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
 		ret = fl_set_enc_opt(tb, key, mask, extack);
 		if (ret)
@@ -1739,6 +1748,8 @@ static void fl_init_dissector(struct flow_dissector *dissector,
 			     FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
 	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 			     FLOW_DISSECTOR_KEY_CT, ct);
+	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+			     FLOW_DISSECTOR_KEY_HASH, hash);
 
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2959,6 +2970,11 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
 	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
 		goto nla_put_failure;
 
+	if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+			     &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+			     sizeof(key->hash.hash)))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit
  2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
@ 2020-07-13 17:11   ` Davide Caratti
  2020-07-15 14:25     ` Ariel Levkovich
  2020-07-15  1:14   ` David Ahern
  1 sibling, 1 reply; 14+ messages in thread
From: Davide Caratti @ 2020-07-13 17:11 UTC (permalink / raw)
  To: Ariel Levkovich, netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Jiri Pirko

On Sun, 2020-07-12 at 00:28 +0300, Ariel Levkovich wrote:
> Extend act_skbedit api to allow writing into skb->hash
> field.
> 
[...]

> Usage example:
> 
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 0 proto ip \
> flower ip_proto tcp \
> action skbedit hash asym_l4 basis 5 \
> action goto chain 2

hello Ariel, thanks for the patch!

> Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/tc_act/tc_skbedit.h        |  2 ++
>  include/uapi/linux/tc_act/tc_skbedit.h |  7 +++++
>  net/sched/act_skbedit.c                | 38 ++++++++++++++++++++++++++
>  3 files changed, 47 insertions(+)

this diffstat is ok for l4 hash calculation :)

> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 00bfee70609e..44a8a4625556 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -18,6 +18,8 @@ struct tcf_skbedit_params {
>  	u32 mask;
>  	u16 queue_mapping;
>  	u16 ptype;
> +	u32 hash_alg;
> +	u32 hash_basis;
>  	struct rcu_head rcu;
>  };
>  
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 800e93377218..5877811b093b 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -29,6 +29,11 @@
>  #define SKBEDIT_F_PTYPE			0x8
>  #define SKBEDIT_F_MASK			0x10
>  #define SKBEDIT_F_INHERITDSFIELD	0x20
> +#define SKBEDIT_F_HASH			0x40
> +
> +enum {
> +	TCA_SKBEDIT_HASH_ALG_ASYM_L4,
> +};

nit:

it's a common practice, when specifying enums in the uAPI, to set the
first value  "UNSPEC", and last one as "MAX":

enum {
	TCA_SKBEDIT_HASH_ALG_UNSPEC,
	TCA_SKBEDIT_HASH_ALG_ASYM_L4,
	__TCA_SKBEDIT_HASH_ALG_MAX
};

see below the rationale:

>  struct tc_skbedit {
>  	tc_gen;
> @@ -45,6 +50,8 @@ enum {
>  	TCA_SKBEDIT_PTYPE,
>  	TCA_SKBEDIT_MASK,
>  	TCA_SKBEDIT_FLAGS,
> +	TCA_SKBEDIT_HASH,
> +	TCA_SKBEDIT_HASH_BASIS,
>  	__TCA_SKBEDIT_MAX
>  };
>  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index b125b2be4467..2cc66c798afb 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -66,6 +66,20 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>  	}
>  	if (params->flags & SKBEDIT_F_PTYPE)
>  		skb->pkt_type = params->ptype;
> +
> +	if (params->flags & SKBEDIT_F_HASH) {
> +		u32 hash;
> +
> +		hash = skb_get_hash(skb);
> +		/* If a hash basis was provided, add it into
> +		 * hash calculation here and re-set skb->hash
> +		 * to the new result with sw_hash indication
> +		 * and keeping the l4 hash indication.
> +		 */
> +		hash = jhash_1word(hash, params->hash_basis);
> +		__skb_set_sw_hash(skb, hash, skb->l4_hash);
> +	}

in this way you don't need to define a value in 'flags'
(SKBEDIT_F_HASH), you just need to check if params->hash_alg is not
zero:
	if (params->hash_alg) {
		....
	}

>  	return action;
>  
>  err:
> @@ -91,6 +105,8 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>  	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
>  	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
>  	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
> +	[TCA_SKBEDIT_HASH]		= { .len = sizeof(u32) },
> +	[TCA_SKBEDIT_HASH_BASIS]	= { .len = sizeof(u32) },
>  };
>  
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -107,6 +123,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  	struct tcf_skbedit *d;
>  	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>  	u16 *queue_mapping = NULL, *ptype = NULL;
> +	u32 hash_alg, hash_basis = 0;
>  	bool exists = false;
>  	int ret = 0, err;
>  	u32 index;
> @@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  			flags |= SKBEDIT_F_INHERITDSFIELD;
>  	}
>  
> +	if (tb[TCA_SKBEDIT_HASH] != NULL) {
> +		hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
> +		if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
> +			return -EINVAL;

moreover, even without doing the strict validation, when somebody in the
future will extend the uAPI, he will not need to change the line above.
The following test will validate all good values of hash_alg:

	if (!hash_alg || hash_alg >= __TCA_SKBEDIT_HASH_ALG_MAX) {
		NL_SET_ERR_MSG_MOD(extack, "hash_alg is out of range");
		return -EINVAL;
 	}

WDYT?

thanks!
-- 
davide



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-11 21:28 ` [PATCH net-next v3 2/4] net/sched: Introduce action hash Ariel Levkovich
@ 2020-07-13 22:04   ` Cong Wang
  2020-07-14  3:17     ` Ariel Levkovich
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2020-07-13 22:04 UTC (permalink / raw)
  To: Ariel Levkovich
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Daniel Borkmann,
	Jiri Pirko

On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>
> Allow user to set a packet's hash value using a bpf program.
>
> The user provided BPF program is required to compute and return
> a hash value for the packet which is then stored in skb->hash.

Can be done by act_bpf, right?

>
> Using this action to set the skb->hash is an alternative to setting
> it with act_skbedit and can be useful for future HW offload support
> when the HW hash function is different then the kernel's hash
> function.
> By using a bpg program that emulates the HW hash function user
> can ensure hash consistency between the SW and the HW.

It sounds weird that the sole reason to add a new action is
because of HW offloading. What prevents us extending the
existing actions to support HW offloading?

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-13 22:04   ` Cong Wang
@ 2020-07-14  3:17     ` Ariel Levkovich
  2020-07-15  6:12       ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-14  3:17 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Daniel Borkmann,
	Jiri Pirko

On 7/13/20 6:04 PM, Cong Wang wrote:
> On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>> Allow user to set a packet's hash value using a bpf program.
>>
>> The user provided BPF program is required to compute and return
>> a hash value for the packet which is then stored in skb->hash.
> Can be done by act_bpf, right?

Right. We already agreed on that.

Nevertheless, as I mentioned, act_bpf is not offloadable.

Device driver has no clue what the program does.

>
>> Using this action to set the skb->hash is an alternative to setting
>> it with act_skbedit and can be useful for future HW offload support
>> when the HW hash function is different then the kernel's hash
>> function.
>> By using a bpg program that emulates the HW hash function user
>> can ensure hash consistency between the SW and the HW.
> It sounds weird that the sole reason to add a new action is
> because of HW offloading. What prevents us extending the
> existing actions to support HW offloading?
>
> Thanks.
Something like adding a parameter to act_bpf that provides information 
on the program

and what it does?


Ariel



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit
  2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
  2020-07-13 17:11   ` Davide Caratti
@ 2020-07-15  1:14   ` David Ahern
  1 sibling, 0 replies; 14+ messages in thread
From: David Ahern @ 2020-07-15  1:14 UTC (permalink / raw)
  To: Ariel Levkovich, netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Jiri Pirko

On 7/11/20 3:28 PM, Ariel Levkovich wrote:
> @@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  			flags |= SKBEDIT_F_INHERITDSFIELD;
>  	}
>  
> +	if (tb[TCA_SKBEDIT_HASH] != NULL) {
> +		hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
> +		if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
> +			return -EINVAL;

tcf_skbedit_init has an extack argument, so fill in a message stating
why it is failing EINVAL.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-14  3:17     ` Ariel Levkovich
@ 2020-07-15  6:12       ` Cong Wang
  2020-07-15 13:30         ` Ariel Levkovich
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2020-07-15  6:12 UTC (permalink / raw)
  To: Ariel Levkovich
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Daniel Borkmann,
	Jiri Pirko

On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>
> On 7/13/20 6:04 PM, Cong Wang wrote:
> > On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich <lariel@mellanox.com> wrote:
> >> Allow user to set a packet's hash value using a bpf program.
> >>
> >> The user provided BPF program is required to compute and return
> >> a hash value for the packet which is then stored in skb->hash.
> > Can be done by act_bpf, right?
>
> Right. We already agreed on that.
>
> Nevertheless, as I mentioned, act_bpf is not offloadable.
>
> Device driver has no clue what the program does.

What about offloading act_skbedit? You care about offloading
the skb->hash computation, not about bpf.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-15  6:12       ` Cong Wang
@ 2020-07-15 13:30         ` Ariel Levkovich
  2020-07-15 18:49           ` Daniel Borkmann
  0 siblings, 1 reply; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-15 13:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Daniel Borkmann,
	Jiri Pirko

On 7/15/20 2:12 AM, Cong Wang wrote:
> On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>> On 7/13/20 6:04 PM, Cong Wang wrote:
>>> On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>>>> Allow user to set a packet's hash value using a bpf program.
>>>>
>>>> The user provided BPF program is required to compute and return
>>>> a hash value for the packet which is then stored in skb->hash.
>>> Can be done by act_bpf, right?
>> Right. We already agreed on that.
>>
>> Nevertheless, as I mentioned, act_bpf is not offloadable.
>>
>> Device driver has no clue what the program does.
> What about offloading act_skbedit? You care about offloading
> the skb->hash computation, not about bpf.
>
> Thanks.


That's true but act_skedit provides (according to the current design) hash

computation using a kernel implemented algorithm.

HW not necessarily can offload this kernel based jhash function and 
therefore

we introduce the bpf option. With bpf the user can provide an implemenation

of a hash function that the HW can actually offload and that way user

maintains consistency between SW hash calculation and HW.

For example, in cases where offload is added dynamically as traffic 
flows, like

in the OVS case, first packets will go to SW and hash will be calculated 
on them

using bpf that emulates the HW hash so that this packet will get the 
same hash

result that it will later get in HW when the flow is offloaded.


If there's a strong objection to adding a new action,

IMO, we can include the bpf option in act_skbedit - action skbedit hash 
bpf <bpf.o>

What do u think?

Thanks,

Ariel



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit
  2020-07-13 17:11   ` Davide Caratti
@ 2020-07-15 14:25     ` Ariel Levkovich
  0 siblings, 0 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-15 14:25 UTC (permalink / raw)
  To: Davide Caratti, netdev
  Cc: jiri, kuba, jhs, xiyou.wangcong, ast, daniel, Jiri Pirko

On 7/13/20 1:11 PM, Davide Caratti wrote:
> On Sun, 2020-07-12 at 00:28 +0300, Ariel Levkovich wrote:
>> Extend act_skbedit api to allow writing into skb->hash
>> field.
>>
> [...]
>
>> Usage example:
>>
>> $ tc filter add dev ens1f0_0 ingress \
>> prio 1 chain 0 proto ip \
>> flower ip_proto tcp \
>> action skbedit hash asym_l4 basis 5 \
>> action goto chain 2
> hello Ariel, thanks for the patch!
>
>> Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>   include/net/tc_act/tc_skbedit.h        |  2 ++
>>   include/uapi/linux/tc_act/tc_skbedit.h |  7 +++++
>>   net/sched/act_skbedit.c                | 38 ++++++++++++++++++++++++++
>>   3 files changed, 47 insertions(+)
> this diffstat is ok for l4 hash calculation :)
>
>> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
>> index 00bfee70609e..44a8a4625556 100644
>> --- a/include/net/tc_act/tc_skbedit.h
>> +++ b/include/net/tc_act/tc_skbedit.h
>> @@ -18,6 +18,8 @@ struct tcf_skbedit_params {
>>   	u32 mask;
>>   	u16 queue_mapping;
>>   	u16 ptype;
>> +	u32 hash_alg;
>> +	u32 hash_basis;
>>   	struct rcu_head rcu;
>>   };
>>   
>> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
>> index 800e93377218..5877811b093b 100644
>> --- a/include/uapi/linux/tc_act/tc_skbedit.h
>> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
>> @@ -29,6 +29,11 @@
>>   #define SKBEDIT_F_PTYPE			0x8
>>   #define SKBEDIT_F_MASK			0x10
>>   #define SKBEDIT_F_INHERITDSFIELD	0x20
>> +#define SKBEDIT_F_HASH			0x40
>> +
>> +enum {
>> +	TCA_SKBEDIT_HASH_ALG_ASYM_L4,
>> +};
> nit:
>
> it's a common practice, when specifying enums in the uAPI, to set the
> first value  "UNSPEC", and last one as "MAX":
>
> enum {
> 	TCA_SKBEDIT_HASH_ALG_UNSPEC,
> 	TCA_SKBEDIT_HASH_ALG_ASYM_L4,
> 	__TCA_SKBEDIT_HASH_ALG_MAX
> };
>
> see below the rationale:


Agree. Missed that. Actual enums should start at 1.

>
>>   struct tc_skbedit {
>>   	tc_gen;
>> @@ -45,6 +50,8 @@ enum {
>>   	TCA_SKBEDIT_PTYPE,
>>   	TCA_SKBEDIT_MASK,
>>   	TCA_SKBEDIT_FLAGS,
>> +	TCA_SKBEDIT_HASH,
>> +	TCA_SKBEDIT_HASH_BASIS,
>>   	__TCA_SKBEDIT_MAX
>>   };
>>   #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
>> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
>> index b125b2be4467..2cc66c798afb 100644
>> --- a/net/sched/act_skbedit.c
>> +++ b/net/sched/act_skbedit.c
>> @@ -66,6 +66,20 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>>   	}
>>   	if (params->flags & SKBEDIT_F_PTYPE)
>>   		skb->pkt_type = params->ptype;
>> +
>> +	if (params->flags & SKBEDIT_F_HASH) {
>> +		u32 hash;
>> +
>> +		hash = skb_get_hash(skb);
>> +		/* If a hash basis was provided, add it into
>> +		 * hash calculation here and re-set skb->hash
>> +		 * to the new result with sw_hash indication
>> +		 * and keeping the l4 hash indication.
>> +		 */
>> +		hash = jhash_1word(hash, params->hash_basis);
>> +		__skb_set_sw_hash(skb, hash, skb->l4_hash);
>> +	}
> in this way you don't need to define a value in 'flags'
> (SKBEDIT_F_HASH), you just need to check if params->hash_alg is not
> zero:
> 	if (params->hash_alg) {
> 		....
> 	}
>
>>   	return action;
>>   
>>   err:
>> @@ -91,6 +105,8 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>>   	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
>>   	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
>>   	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
>> +	[TCA_SKBEDIT_HASH]		= { .len = sizeof(u32) },
>> +	[TCA_SKBEDIT_HASH_BASIS]	= { .len = sizeof(u32) },
>>   };
>>   
>>   static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>> @@ -107,6 +123,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>>   	struct tcf_skbedit *d;
>>   	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>>   	u16 *queue_mapping = NULL, *ptype = NULL;
>> +	u32 hash_alg, hash_basis = 0;
>>   	bool exists = false;
>>   	int ret = 0, err;
>>   	u32 index;
>> @@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>>   			flags |= SKBEDIT_F_INHERITDSFIELD;
>>   	}
>>   
>> +	if (tb[TCA_SKBEDIT_HASH] != NULL) {
>> +		hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
>> +		if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
>> +			return -EINVAL;
> moreover, even without doing the strict validation, when somebody in the
> future will extend the uAPI, he will not need to change the line above.
> The following test will validate all good values of hash_alg:
>
> 	if (!hash_alg || hash_alg >= __TCA_SKBEDIT_HASH_ALG_MAX) {
> 		NL_SET_ERR_MSG_MOD(extack, "hash_alg is out of range");
> 		return -EINVAL;
>   	}
>
> WDYT?
>
> thanks!

I actually thought about it during implementation. Dropped it eventually.


Thanks for the comments.





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-15 13:30         ` Ariel Levkovich
@ 2020-07-15 18:49           ` Daniel Borkmann
  2020-07-16 15:40             ` Ariel Levkovich
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2020-07-15 18:49 UTC (permalink / raw)
  To: Ariel Levkovich, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Jiri Pirko

On 7/15/20 3:30 PM, Ariel Levkovich wrote:
> On 7/15/20 2:12 AM, Cong Wang wrote:
>> On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>>> On 7/13/20 6:04 PM, Cong Wang wrote:
>>>> On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich <lariel@mellanox.com> wrote:
>>>>> Allow user to set a packet's hash value using a bpf program.
>>>>>
>>>>> The user provided BPF program is required to compute and return
>>>>> a hash value for the packet which is then stored in skb->hash.
>>>> Can be done by act_bpf, right?
>>> Right. We already agreed on that.
>>>
>>> Nevertheless, as I mentioned, act_bpf is not offloadable.
>>>
>>> Device driver has no clue what the program does.
>> What about offloading act_skbedit? You care about offloading
>> the skb->hash computation, not about bpf.
>>
>> Thanks.
> 
> That's true but act_skedit provides (according to the current design) hash
> 
> computation using a kernel implemented algorithm.
> 
> HW not necessarily can offload this kernel based jhash function and therefore
> 
> we introduce the bpf option. With bpf the user can provide an implemenation
> 
> of a hash function that the HW can actually offload and that way user
> 
> maintains consistency between SW hash calculation and HW.
> 
> For example, in cases where offload is added dynamically as traffic flows, like
> 
> in the OVS case, first packets will go to SW and hash will be calculated on them
> 
> using bpf that emulates the HW hash so that this packet will get the same hash
> 
> result that it will later get in HW when the flow is offloaded.
> 
> 
> If there's a strong objection to adding a new action,
> 
> IMO, we can include the bpf option in act_skbedit - action skbedit hash bpf <bpf.o>
> 
> What do u think?

Please don't. From a BPF pov this is all very misleading since it might wrongly suggest
to the user that existing means aka {cls,act}_bpf in tc are not capable of already doing
this. They are capable for several years already though. Also, it is very confusing that
act_hash or 'skbedit hash bpf' can do everything that {cls,act}_bpf can do already, so
much beyond setting a hash value (e.g. you could set tunnel keys etc from there). Given
act_hash is only about offloading but nothing else, did you consider for the BPF alternative
to just use plain old classic BPF given you only need to parse the pkt and calc the hash
val but nothing more?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash
  2020-07-15 18:49           ` Daniel Borkmann
@ 2020-07-16 15:40             ` Ariel Levkovich
  0 siblings, 0 replies; 14+ messages in thread
From: Ariel Levkovich @ 2020-07-16 15:40 UTC (permalink / raw)
  To: Daniel Borkmann, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Alexei Starovoitov, Jiri Pirko

On 7/15/20 2:49 PM, Daniel Borkmann wrote:
> On 7/15/20 3:30 PM, Ariel Levkovich wrote:
>> On 7/15/20 2:12 AM, Cong Wang wrote:
>>> On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich 
>>> <lariel@mellanox.com> wrote:
>>>> On 7/13/20 6:04 PM, Cong Wang wrote:
>>>>> On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich 
>>>>> <lariel@mellanox.com> wrote:
>>>>>> Allow user to set a packet's hash value using a bpf program.
>>>>>>
>>>>>> The user provided BPF program is required to compute and return
>>>>>> a hash value for the packet which is then stored in skb->hash.
>>>>> Can be done by act_bpf, right?
>>>> Right. We already agreed on that.
>>>>
>>>> Nevertheless, as I mentioned, act_bpf is not offloadable.
>>>>
>>>> Device driver has no clue what the program does.
>>> What about offloading act_skbedit? You care about offloading
>>> the skb->hash computation, not about bpf.
>>>
>>> Thanks.
>>
>> That's true but act_skedit provides (according to the current design) 
>> hash
>>
>> computation using a kernel implemented algorithm.
>>
>> HW not necessarily can offload this kernel based jhash function and 
>> therefore
>>
>> we introduce the bpf option. With bpf the user can provide an 
>> implemenation
>>
>> of a hash function that the HW can actually offload and that way user
>>
>> maintains consistency between SW hash calculation and HW.
>>
>> For example, in cases where offload is added dynamically as traffic 
>> flows, like
>>
>> in the OVS case, first packets will go to SW and hash will be 
>> calculated on them
>>
>> using bpf that emulates the HW hash so that this packet will get the 
>> same hash
>>
>> result that it will later get in HW when the flow is offloaded.
>>
>>
>> If there's a strong objection to adding a new action,
>>
>> IMO, we can include the bpf option in act_skbedit - action skbedit 
>> hash bpf <bpf.o>
>>
>> What do u think?
>
> Please don't. From a BPF pov this is all very misleading since it 
> might wrongly suggest
> to the user that existing means aka {cls,act}_bpf in tc are not 
> capable of already doing
> this. They are capable for several years already though. Also, it is 
> very confusing that
> act_hash or 'skbedit hash bpf' can do everything that {cls,act}_bpf 
> can do already, so
> much beyond setting a hash value (e.g. you could set tunnel keys etc 
> from there). Given
> act_hash is only about offloading but nothing else, did you consider 
> for the BPF alternative
> to just use plain old classic BPF given you only need to parse the pkt 
> and calc the hash
> val but nothing more?

You can do almost everything with act_bpf and yet there are explicit 
actions to set a tunnel

key and add/remove MPLS header (and more...).

What do u mean by classic BPF? How will that help with the offload?

It will still go via act_bpf without any indication on what type of 
program is this, won't it?

Thanks,

Ariel



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-16 15:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-11 21:28 [PATCH net-next v3 0/4] ] TC datapath hash api Ariel Levkovich
2020-07-11 21:28 ` [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit Ariel Levkovich
2020-07-13 17:11   ` Davide Caratti
2020-07-15 14:25     ` Ariel Levkovich
2020-07-15  1:14   ` David Ahern
2020-07-11 21:28 ` [PATCH net-next v3 2/4] net/sched: Introduce action hash Ariel Levkovich
2020-07-13 22:04   ` Cong Wang
2020-07-14  3:17     ` Ariel Levkovich
2020-07-15  6:12       ` Cong Wang
2020-07-15 13:30         ` Ariel Levkovich
2020-07-15 18:49           ` Daniel Borkmann
2020-07-16 15:40             ` Ariel Levkovich
2020-07-11 21:28 ` [PATCH net-next v3 3/4] net/flow_dissector: add packet hash dissection Ariel Levkovich
2020-07-11 21:28 ` [PATCH net-next v3 4/4] net/sched: cls_flower: Add hash info to flow classification Ariel Levkovich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).