* [net-next v3 0/2] net: sched: allow user to select txqueue
@ 2021-12-10  2:36 xiangxia.m.yue
  2021-12-10  2:36 ` [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
  2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
  0 siblings, 2 replies; 15+ messages in thread
From: xiangxia.m.yue @ 2021-12-10  2:36 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Patch 1 allows users to select the Tx queue in the clsact hook.
Patch 2 supports skb-hash, classid and cpuid based Tx queue selection.
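
For example (a quick sketch; the device name and the queue range below
are only illustrative, not part of the patches):

$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 egress matchall \
      action skbedit queue_mapping hash-type normal 0 3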

Tonghao Zhang (2):
  net: sched: use queue_mapping to pick tx queue
  net: sched: support hash/classid/cpuid selecting tx queue

 include/linux/netdevice.h              | 21 +++++++
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
 net/core/dev.c                         |  6 +-
 net/sched/act_skbedit.c                | 76 ++++++++++++++++++++++++--
 5 files changed, 107 insertions(+), 5 deletions(-)

-- 
v2:
* 1/2 change skb->tc_skip_txqueue to a per-cpu variable, expand the
  commit message.
* 2/2 optimize the code.
v3:
* 2/2 fix the warning, add the cpuid hash type.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
--
2.27.0


* [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-10  2:36 [net-next v3 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
@ 2021-12-10  2:36 ` xiangxia.m.yue
  2021-12-13 22:58   ` Cong Wang
  2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
  1 sibling, 1 reply; 15+ messages in thread
From: xiangxia.m.yue @ 2021-12-10  2:36 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch fixes the following issue:
* If we install tc filters with act_skbedit in the clsact hook,
  they do not take effect, because netdev_core_pick_tx() overwrites
  queue_mapping.

  $ tc filter ... action skbedit queue_mapping 1

This patch is also useful on its own:
* We can use FQ + EDT to implement efficient policies. Tx queues
  are picked by XPS, the ndo_select_queue of the netdev driver, or
  the skb hash in netdev_core_pick_tx(). In fact, the netdev driver
  and the skb hash are _not_ under our control. XPS uses the CPU map
  to select Tx queues, but in most cases we can't tell which
  pod/container task_struct is running on a given CPU. Instead, we
  can use clsact filters to classify one pod's/container's traffic
  to one Tx queue. Why?

  In a container networking environment, there are two kinds of
  pods/containers/net-namespaces. For the first kind (e.g. P1, P2),
  high throughput is key. To avoid running out of network resources,
  the outbound traffic of these pods is limited; they use or share
  dedicated Tx queues assigned an HTB/TBF/FQ Qdisc. For the other
  kind (e.g. Pn), low latency of data access is key, and their
  traffic is not limited; these pods use or share other dedicated Tx
  queues assigned a FIFO Qdisc. This choice provides two benefits.
  First, contention on the HTB/FQ Qdisc lock is significantly reduced
  since fewer CPUs contend for the same queue. More importantly,
  Qdisc contention can be eliminated completely if each CPU has its
  own FIFO Qdisc for the second kind of pods.

  There must be a mechanism in place to classify traffic per
  pod/container to different Tx queues. Note that clsact runs outside
  of the Qdisc, while a Qdisc can run a classifier to select a
  sub-queue under the Qdisc lock.

  In general, recording the decision in the skb seems a little heavy
  handed. This patch instead introduces a per-CPU variable, as
  suggested by Eric.

  The skip-txqueue flag is cleared after the queue is picked, so the
  forced queue selection only affects the current netdev and not the
  next one in the stack, for example (not a usual case):

  eth0 (macvlan in Pod, skbedit queue_mapping) -> eth0.3 (vlan in Host)
  -> eth0 (ixgbe in Host).

  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | q1        | qn
    v           v           v
  HTB/FQ      HTB/FQ  ...  FIFO
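
  As a usage example, a minimal sketch of how the above layout could
  be configured with tc (the device name, queue count, pod addresses
  and qdisc choices below are illustrative assumptions, not part of
  this patch):

  $ tc qdisc add dev eth0 root handle 1: mq
  $ tc qdisc add dev eth0 parent 1:1 fq     # q0: throughput pod P1
  $ tc qdisc add dev eth0 parent 1:2 fq     # q1: throughput pod P2
  $ tc qdisc add dev eth0 parent 1:8 pfifo  # qn: latency pods (Pn)

  $ tc qdisc add dev eth0 clsact
  $ tc filter add dev eth0 egress protocol ip flower src_ip 10.0.0.1 \
        action skbedit queue_mapping 0
  $ tc filter add dev eth0 egress protocol ip flower src_ip 10.0.0.2 \
        action skbedit queue_mapping 1
  $ tc filter add dev eth0 egress protocol ip flower src_ip 10.0.0.100 \
        action skbedit queue_mapping 7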

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/linux/netdevice.h | 21 +++++++++++++++++++++
 net/core/dev.c            |  6 +++++-
 net/sched/act_skbedit.c   |  4 +++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 65117f01d5f2..64f12a819246 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2997,6 +2997,7 @@ struct softnet_data {
 	/* written and read only by owning cpu: */
 	struct {
 		u16 recursion;
+		u8  skip_txqueue;
 		u8  more;
 	} xmit;
 #ifdef CONFIG_RPS
@@ -4633,6 +4634,26 @@ static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_devi
 	return rc;
 }
 
+static inline void netdev_xmit_skip_txqueue(void)
+{
+	__this_cpu_write(softnet_data.xmit.skip_txqueue, 1);
+}
+
+static inline bool netdev_xmit_txqueue_skipped(void)
+{
+	return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+}
+
+static inline struct netdev_queue *
+netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
+{
+       int qm = skb_get_queue_mapping(skb);
+
+       /* Take effect only on current netdev. */
+       __this_cpu_write(softnet_data.xmit.skip_txqueue, 0);
+       return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
+}
+
 int netdev_class_create_file_ns(const struct class_attribute *class_attr,
 				const void *ns);
 void netdev_class_remove_file_ns(const struct class_attribute *class_attr,
diff --git a/net/core/dev.c b/net/core/dev.c
index aba8acc1238c..a64297a4cc89 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4069,7 +4069,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	else
 		skb_dst_force(skb);
 
-	txq = netdev_core_pick_tx(dev, skb, sb_dev);
+	if (netdev_xmit_txqueue_skipped())
+		txq = netdev_tx_queue_mapping(dev, skb);
+	else
+		txq = netdev_core_pick_tx(dev, skb, sb_dev);
+
 	q = rcu_dereference_bh(txq->qdisc);
 
 	trace_net_dev_queue(skb);
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index d30ecbfc8f84..498feedad70a 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -58,8 +58,10 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping)
+	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+		netdev_xmit_skip_txqueue();
 		skb_set_queue_mapping(skb, params->queue_mapping);
+	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
 		skb->mark |= params->mark & params->mask;
-- 
2.27.0


* [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-10  2:36 [net-next v3 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
  2021-12-10  2:36 ` [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
@ 2021-12-10  2:36 ` xiangxia.m.yue
  2021-12-12  2:19   ` Cong Wang
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: xiangxia.m.yue @ 2021-12-10  2:36 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch allows users to select a queue_mapping range from A to B
and to use the skb hash, cgroup classid, or cpuid to pick a Tx queue
within that range. Packets can then be load-balanced across queues
A to B. Both range bounds are unsigned 16-bit values in decimal
format.

$ tc filter ... action skbedit queue_mapping hash-type normal A B

"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
enhanced with flags:
* SKBEDIT_F_QUEUE_MAPPING_HASH
* SKBEDIT_F_QUEUE_MAPPING_CLASSID
* SKBEDIT_F_QUEUE_MAPPING_CPUID

Use skb->hash, the cgroup classid, or the cpuid to distribute packets.
The same range of Tx queues can then be shared by different flows,
cgroups, or CPUs in a variety of scenarios.

For example, flows F1 may share range R1 with flows F2; the best way
to do that is to set the SKBEDIT_F_QUEUE_MAPPING_HASH flag. If cgroup
C1 shares R1 with cgroups C2 .. Cn, use SKBEDIT_F_QUEUE_MAPPING_CLASSID.
Of course, in other scenarios, C1 can use R1 while Cn uses Rn.
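
As a concrete illustration of the new syntax (only the "hash-type
normal A B" form is spelled out above; the device, the filters, the
8-15 queue range and the classid/cpuid keywords below are assumptions
mirroring the flag names):

$ tc filter add dev eth0 egress protocol ip flower src_ip 10.0.0.1 \
      action skbedit queue_mapping hash-type normal 8 15
$ tc filter add dev eth0 egress matchall \
      action skbedit queue_mapping hash-type classid 8 15
$ tc filter add dev eth0 egress matchall \
      action skbedit queue_mapping hash-type cpuid 8 15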

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
 net/sched/act_skbedit.c                | 74 ++++++++++++++++++++++++--
 3 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..ee96e0fa6566 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@ struct tcf_skbedit_params {
 	u32 mark;
 	u32 mask;
 	u16 queue_mapping;
+	u16 mapping_mod;
 	u16 ptype;
 	struct rcu_head rcu;
 };
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..5642b095d206 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,13 @@
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
+#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
+#define SKBEDIT_F_QUEUE_MAPPING_CPUID	0x100
+
+#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
+					   SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
+					   SKBEDIT_F_QUEUE_MAPPING_CPUID)
 
 struct tc_skbedit {
 	tc_gen;
@@ -45,6 +52,7 @@ enum {
 	TCA_SKBEDIT_PTYPE,
 	TCA_SKBEDIT_MASK,
 	TCA_SKBEDIT_FLAGS,
+	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 498feedad70a..0b0d65d7112e 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
+#include <net/cls_cgroup.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/ip.h>
@@ -23,6 +24,37 @@
 static unsigned int skbedit_net_id;
 static struct tc_action_ops act_skbedit_ops;
 
+static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
+			    struct sk_buff *skb)
+{
+	u16 queue_mapping = params->queue_mapping;
+	u16 mapping_mod = params->mapping_mod;
+	u32 mapping_hash_type = params->flags &
+				SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+	u32 hash = 0;
+
+	if (!mapping_hash_type)
+		return netdev_cap_txqueue(skb->dev, queue_mapping);
+
+	switch (mapping_hash_type) {
+	case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
+		hash = jhash_1word(task_get_classid(skb), 0);
+		break;
+	case SKBEDIT_F_QUEUE_MAPPING_HASH:
+		hash = skb_get_hash(skb);
+		break;
+	case SKBEDIT_F_QUEUE_MAPPING_CPUID:
+		hash = raw_smp_processor_id();
+		break;
+	default:
+		net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
+				     mapping_hash_type);
+	}
+
+	queue_mapping = queue_mapping + hash % mapping_mod;
+	return netdev_cap_txqueue(skb->dev, queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			break;
 		}
 	}
-	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
 		netdev_xmit_skip_txqueue();
-		skb_set_queue_mapping(skb, params->queue_mapping);
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
@@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
 	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
+	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u16 mapping_mod = 0;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 
 	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
 		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+		u64 mapping_hash_type = *pure_flags &
+					SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+		if (mapping_hash_type) {
+			u16 *queue_mapping_max;
+
+			/* Hash types are mutually exclusive. */
+			if (mapping_hash_type & (mapping_hash_type - 1))
+				return -EINVAL;
+
+			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
+				return -EINVAL;
 
+			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
+				return -EINVAL;
+
+			queue_mapping_max =
+				nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
+
+			if (*queue_mapping_max < *queue_mapping)
+				return -EINVAL;
+
+			mapping_mod = *queue_mapping_max - *queue_mapping + 1;
+			flags |= mapping_hash_type;
+		}
 		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
 			flags |= SKBEDIT_F_INHERITDSFIELD;
 	}
@@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->flags = flags;
 	if (flags & SKBEDIT_F_PRIORITY)
 		params_new->priority = *priority;
-	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
 		params_new->queue_mapping = *queue_mapping;
+		params_new->mapping_mod = mapping_mod;
+	}
 	if (flags & SKBEDIT_F_MARK)
 		params_new->mark = *mark;
 	if (flags & SKBEDIT_F_PTYPE)
@@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
 		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
+		if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
+				params->queue_mapping + params->mapping_mod - 1))
+			goto nla_put_failure;
+
+		pure_flags |= params->flags &
+			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+	}
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;
-- 
2.27.0


* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
@ 2021-12-12  2:19   ` Cong Wang
  2021-12-12  2:34     ` Tonghao Zhang
  2021-12-15  1:34   ` Tonghao Zhang
  2021-12-15 16:08   ` Jakub Kicinski
  2 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2021-12-12  2:19 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
>
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
>
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID

With act_bpf you can do all of them... So why do you have to do it
in skbedit?

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-12  2:19   ` Cong Wang
@ 2021-12-12  2:34     ` Tonghao Zhang
  2021-12-13 22:53       ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-12  2:34 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch allows users to select queue_mapping, range
> > from A to B. And users can use skb-hash, cgroup classid
> > and cpuid to select Tx queues. Then we can load balance
> > packets from A to B queue. The range is an unsigned 16bit
> > value in decimal format.
> >
> > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> >
> > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > enhanced with flags:
> > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > * SKBEDIT_F_QUEUE_MAPPING_CPUID
>
> With act_bpf you can do all of them... So why do you have to do it
> in skbedit?
Hi Cong,
This idea is inspired by skbedit queue_mapping, and this patch
enhances skbedit.
We supported this in skbedit first in production. act_bpf can do more
than this; anyway, we can support this in both act_skbedit and
act_bpf. In 1/2, the skip_tx_queue flag was moved from the skb to a
per-cpu variable, as suggested by Eric. We would need another patch
to change that per-cpu variable from bpf; I will post it later.


-- 
Best regards, Tonghao

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-12  2:34     ` Tonghao Zhang
@ 2021-12-13 22:53       ` Cong Wang
  2021-12-14  2:19         ` Tonghao Zhang
  2021-12-14 12:13         ` Jamal Hadi Salim
  0 siblings, 2 replies; 15+ messages in thread
From: Cong Wang @ 2021-12-13 22:53 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > This patch allows users to select queue_mapping, range
> > > from A to B. And users can use skb-hash, cgroup classid
> > > and cpuid to select Tx queues. Then we can load balance
> > > packets from A to B queue. The range is an unsigned 16bit
> > > value in decimal format.
> > >
> > > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> > >
> > > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > > enhanced with flags:
> > > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> >
> > With act_bpf you can do all of them... So why do you have to do it
> > in skbedit?
> Hi Cong
> This idea is inspired by skbedit queue_mapping, and skbedit is
> enhanced by this patch.

This is exactly my question. ;)

> We support this in skbedit firstly in production. act_bpf can do more
> things than this. Anyway we
> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> patch which can change the
> per-cpu var in bpf. I will post this patch later.

The point is that if act_bpf can do it, you don't need to bother with
skbedit at all. More importantly, you are enforcing policies in the
kernel, which is not encouraged. So unless you provide more details,
this patch is not needed at all.

Thanks.

* Re: [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-10  2:36 ` [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
@ 2021-12-13 22:58   ` Cong Wang
  2021-12-14  1:56     ` Tonghao Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2021-12-13 22:58 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch fix issue:
> * If we install tc filters with act_skbedit in clsact hook.
>   It doesn't work, because netdev_core_pick_tx() overwrites
>   queue_mapping.
>
>   $ tc filter ... action skbedit queue_mapping 1

Interesting, but bpf offers a helper (bpf_set_hash()) to overwrite
skb->hash which could _indirectly_ be used to change queue_mapping.
Does it not work?

BTW, for this skbedit action, I guess it is probably because skbedit was
introduced much earlier than clsact.

Thanks.

* Re: [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-13 22:58   ` Cong Wang
@ 2021-12-14  1:56     ` Tonghao Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-14  1:56 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Tue, Dec 14, 2021 at 6:58 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch fix issue:
> > * If we install tc filters with act_skbedit in clsact hook.
> >   It doesn't work, because netdev_core_pick_tx() overwrites
> >   queue_mapping.
> >
> >   $ tc filter ... action skbedit queue_mapping 1
>
> Interesting, but bpf offers a helper (bpf_set_hash()) to overwrite
> skb->hash which could _indirectly_ be used to change queue_mapping.
> Does it not work?
No.
The root cause is that queue_mapping is overwritten by netdev_core_pick_tx().
Tx queues are picked by XPS, the ndo_select_queue of the netdev driver, or
the skb hash in netdev_core_pick_tx(), so changing only the skb hash is not
enough. BTW, this patch fixes the issue and, more importantly, it is useful
(for more details, see my commit message):
* We can use FQ + EDT to implement efficient policies. Tx queues
  are picked by XPS, the ndo_select_queue of the netdev driver, or
  the skb hash in netdev_core_pick_tx(). In fact, the netdev driver
  and the skb hash are _not_ under our control. XPS uses the CPU map
  to select Tx queues, but in most cases we can't tell which
  pod/container task_struct is running on a given CPU. Instead, we
  can use clsact filters to classify one pod's/container's traffic
  to one Tx queue. Why?

  In a container networking environment, there are two kinds of
  pods/containers/net-namespaces. For the first kind (e.g. P1, P2),
  high throughput is key. To avoid running out of network resources,
  the outbound traffic of these pods is limited; they use or share
  dedicated Tx queues assigned an HTB/TBF/FQ Qdisc. For the other
  kind (e.g. Pn), low latency of data access is key, and their
  traffic is not limited; these pods use or share other dedicated Tx
  queues assigned a FIFO Qdisc. This choice provides two benefits.
  First, contention on the HTB/FQ Qdisc lock is significantly reduced
  since fewer CPUs contend for the same queue. More importantly,
  Qdisc contention can be eliminated completely if each CPU has its
  own FIFO Qdisc for the second kind of pods.

  There must be a mechanism in place to classify traffic per
  pod/container to different Tx queues. Note that clsact runs outside
  of the Qdisc, while a Qdisc can run a classifier to select a
  sub-queue under the Qdisc lock.

  In general, recording the decision in the skb seems a little heavy
  handed. This patch instead introduces a per-CPU variable, as
  suggested by Eric.

>
> BTW, for this skbedit action, I guess it is probably because skbedit was
> introduced much earlier than clsact.
Yes
> Thanks.



-- 
Best regards, Tonghao

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-13 22:53       ` Cong Wang
@ 2021-12-14  2:19         ` Tonghao Zhang
  2021-12-14 12:13         ` Jamal Hadi Salim
  1 sibling, 0 replies; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-14  2:19 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Tue, Dec 14, 2021 at 6:53 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> > > >
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > This patch allows users to select queue_mapping, range
> > > > from A to B. And users can use skb-hash, cgroup classid
> > > > and cpuid to select Tx queues. Then we can load balance
> > > > packets from A to B queue. The range is an unsigned 16bit
> > > > value in decimal format.
> > > >
> > > > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> > > >
> > > > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > > > enhanced with flags:
> > > > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > > > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > > > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> > >
> > > With act_bpf you can do all of them... So why do you have to do it
> > > in skbedit?
> > Hi Cong
> > This idea is inspired by skbedit queue_mapping, and skbedit is
> > enhanced by this patch.
>
> This is exactly my question. ;)
>
> > We support this in skbedit firstly in production. act_bpf can do more
> > things than this. Anyway we
> > can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> > skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> > patch which can change the
> > per-cpu var in bpf. I will post this patch later.
>
> The point is if act_bpf can do it, you don't need to bother skbedit at
> all. More importantly, you are enforcing policies in kernel, which is
> not encouraged. So unless you provide more details, this patch is not
> needed at all.
Hi Cong,
1. As I understand it, act_bpf can work with the 1/2 patch (but we
would need another patch to change skip_txqueue from bpf). That is
easy for a kernel developer, but not for other users; the tc command
is a choice that is easy to use.
2. BTW, act_bpf can't replace the other act_xxx actions in the
future, even though it can do more, right?
3. This patch mainly works together with 1/2 and only enhances the
skbedit queue_mapping function; it is not a big change.
4. "Policies"? If selecting a Tx queue from a range (this patch) is
what you mean by "policies", then changing the skb mark, priority,
tc-pedit and other tc actions are "policies" too. We still need the
different tc actions.
>
> Thanks.



-- 
Best regards, Tonghao

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-13 22:53       ` Cong Wang
  2021-12-14  2:19         ` Tonghao Zhang
@ 2021-12-14 12:13         ` Jamal Hadi Salim
  2021-12-15  1:41           ` Tonghao Zhang
  1 sibling, 1 reply; 15+ messages in thread
From: Jamal Hadi Salim @ 2021-12-14 12:13 UTC (permalink / raw)
  To: Cong Wang, Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jiri Pirko, David S. Miller,
	Jakub Kicinski, Jonathan Lemon, Eric Dumazet, Alexander Lobakin,
	Paolo Abeni, Talal Ahmad, Kevin Hao, Ilias Apalodimas, Kees Cook,
	Kumar Kartikeya Dwivedi, Antoine Tenart, Wei Wang, Arnd Bergmann

On 2021-12-13 17:53, Cong Wang wrote:
> On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:

>> We support this in skbedit firstly in production. act_bpf can do more
>> things than this. Anyway we
>> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
>> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
>> patch which can change the
>> per-cpu var in bpf. I will post this patch later.
> 
> The point is if act_bpf can do it, you don't need to bother skbedit at
> all. 

Just a comment on this general statement:
I know of at least one large data centre that won't allow anything
"compiled" to be installed unless it goes through a very long vetting
process (we are talking months).
"Compiled" includes bpf. This is common practice in a few other places,
some more extreme than others - the reasons are typically driven by
either some overzealous program manager or security people. Regardless
of the reasoning, this is a real issue.
Note: none of these data centres have a long process if what is
installed is a bash script expressing policy.
In such a case, upstreaming a feature such as this is more useful.


cheers,
jamal


* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
  2021-12-12  2:19   ` Cong Wang
@ 2021-12-15  1:34   ` Tonghao Zhang
  2021-12-15 16:12     ` Jakub Kicinski
  2021-12-15 16:08   ` Jakub Kicinski
  2 siblings, 1 reply; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-15  1:34 UTC (permalink / raw)
  To: Jakub Kicinski, Eric Dumazet
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Jonathan Lemon, Alexander Lobakin, Paolo Abeni, Talal Ahmad,
	Kevin Hao, Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann,
	Linux Kernel Network Developers

On Fri, Dec 10, 2021 at 10:36 AM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
>
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
>
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID
>
> Use skb->hash, cgroup classid, or cpuid to distribute packets.
> Then same range of tx queues can be shared for different flows,
> cgroups, or CPUs in a variety of scenarios.
>
> For example, flows F1 may share range R1 with flows F2. The best
> way to do that is to set flag to SKBEDIT_F_QUEUE_MAPPING_HASH.
> If cgroup C1 share the R1 with cgroup C2 .. Cn, use the
> SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in some other scenario,
> C1 uses R1, while Cn can use Rn.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Lobakin <alobakin@pm.me>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Talal Ahmad <talalahmad@google.com>
> Cc: Kevin Hao <haokexin@gmail.com>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Cc: Antoine Tenart <atenart@kernel.org>
> Cc: Wei Wang <weiwan@google.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Hi Jakub, Eric,
Do you have any comments? We have discussed this patchset in this thread.
Should I resend it with a more detailed commit message?
> ---
>  include/net/tc_act/tc_skbedit.h        |  1 +
>  include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
>  net/sched/act_skbedit.c                | 74 ++++++++++++++++++++++++--
>  3 files changed, 79 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 00bfee70609e..ee96e0fa6566 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
>         u32 mark;
>         u32 mask;
>         u16 queue_mapping;
> +       u16 mapping_mod;
>         u16 ptype;
>         struct rcu_head rcu;
>  };
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 800e93377218..5642b095d206 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -29,6 +29,13 @@
>  #define SKBEDIT_F_PTYPE                        0x8
>  #define SKBEDIT_F_MASK                 0x10
>  #define SKBEDIT_F_INHERITDSFIELD       0x20
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH   0x40
> +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID        0x80
> +#define SKBEDIT_F_QUEUE_MAPPING_CPUID  0x100
> +
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> +                                          SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> +                                          SKBEDIT_F_QUEUE_MAPPING_CPUID)
>
>  struct tc_skbedit {
>         tc_gen;
> @@ -45,6 +52,7 @@ enum {
>         TCA_SKBEDIT_PTYPE,
>         TCA_SKBEDIT_MASK,
>         TCA_SKBEDIT_FLAGS,
> +       TCA_SKBEDIT_QUEUE_MAPPING_MAX,
>         __TCA_SKBEDIT_MAX
>  };
>  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 498feedad70a..0b0d65d7112e 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
> +#include <net/cls_cgroup.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
>  #include <net/ip.h>
> @@ -23,6 +24,37 @@
>  static unsigned int skbedit_net_id;
>  static struct tc_action_ops act_skbedit_ops;
>
> +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> +                           struct sk_buff *skb)
> +{
> +       u16 queue_mapping = params->queue_mapping;
> +       u16 mapping_mod = params->mapping_mod;
> +       u32 mapping_hash_type = params->flags &
> +                               SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +       u32 hash = 0;
> +
> +       if (!mapping_hash_type)
> +               return netdev_cap_txqueue(skb->dev, queue_mapping);
> +
> +       switch (mapping_hash_type) {
> +       case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> +               hash = jhash_1word(task_get_classid(skb), 0);
> +               break;
> +       case SKBEDIT_F_QUEUE_MAPPING_HASH:
> +               hash = skb_get_hash(skb);
> +               break;
> +       case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> +               hash = raw_smp_processor_id();
> +               break;
> +       default:
> +               net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> +                                    mapping_hash_type);
> +       }
> +
> +       queue_mapping = queue_mapping + hash % mapping_mod;
> +       return netdev_cap_txqueue(skb->dev, queue_mapping);
> +}
> +
>  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>                            struct tcf_result *res)
>  {
> @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>                         break;
>                 }
>         }
> -       if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> -           skb->dev->real_num_tx_queues > params->queue_mapping) {
> +       if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
>                 netdev_xmit_skip_txqueue();
> -               skb_set_queue_mapping(skb, params->queue_mapping);
> +               skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
>         }
>         if (params->flags & SKBEDIT_F_MARK) {
>                 skb->mark &= ~params->mask;
> @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>         [TCA_SKBEDIT_PTYPE]             = { .len = sizeof(u16) },
>         [TCA_SKBEDIT_MASK]              = { .len = sizeof(u32) },
>         [TCA_SKBEDIT_FLAGS]             = { .len = sizeof(u64) },
> +       [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .len = sizeof(u16) },
>  };
>
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>         struct tcf_skbedit *d;
>         u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>         u16 *queue_mapping = NULL, *ptype = NULL;
> +       u16 mapping_mod = 0;
>         bool exists = false;
>         int ret = 0, err;
>         u32 index;
> @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>
>         if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
>                 u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> +               u64 mapping_hash_type = *pure_flags &
> +                                       SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +               if (mapping_hash_type) {
> +                       u16 *queue_mapping_max;
> +
> +                       /* Hash types are mutually exclusive. */
> +                       if (mapping_hash_type & (mapping_hash_type - 1))
> +                               return -EINVAL;
> +
> +                       if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> +                               return -EINVAL;
>
> +                       if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> +                               return -EINVAL;
> +
> +                       queue_mapping_max =
> +                               nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
> +
> +                       if (*queue_mapping_max < *queue_mapping)
> +                               return -EINVAL;
> +
> +                       mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> +                       flags |= mapping_hash_type;
> +               }
>                 if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
>                         flags |= SKBEDIT_F_INHERITDSFIELD;
>         }
> @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>         params_new->flags = flags;
>         if (flags & SKBEDIT_F_PRIORITY)
>                 params_new->priority = *priority;
> -       if (flags & SKBEDIT_F_QUEUE_MAPPING)
> +       if (flags & SKBEDIT_F_QUEUE_MAPPING) {
>                 params_new->queue_mapping = *queue_mapping;
> +               params_new->mapping_mod = mapping_mod;
> +       }
>         if (flags & SKBEDIT_F_MARK)
>                 params_new->mark = *mark;
>         if (flags & SKBEDIT_F_PTYPE)
> @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>                 goto nla_put_failure;
>         if (params->flags & SKBEDIT_F_INHERITDSFIELD)
>                 pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> +       if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> +               if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> +                               params->queue_mapping + params->mapping_mod - 1))
> +                       goto nla_put_failure;
> +
> +               pure_flags |= params->flags &
> +                             SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +       }
>         if (pure_flags != 0 &&
>             nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
>                 goto nla_put_failure;
> --
> 2.27.0
>


-- 
Best regards, Tonghao

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-14 12:13         ` Jamal Hadi Salim
@ 2021-12-15  1:41           ` Tonghao Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-15  1:41 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Cong Wang, Linux Kernel Network Developers, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Tue, Dec 14, 2021 at 8:13 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2021-12-13 17:53, Cong Wang wrote:
> > On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> >> We support this in skbedit firstly in production. act_bpf can do more
> >> things than this. Anyway we
> >> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> >> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> >> patch which can change the
> >> per-cpu var in bpf. I will post this patch later.
> >
> > The point is if act_bpf can do it, you don't need to bother skbedit at
> > all.
>
> Just a comment on this general statement:
> I know of at least one large data centre that wont allow anything
> "compiled" to be installed unless it goes through a very long vetting
> process(we are talking months).
> "Compiled" includes bpf. This is common practise in a few other places
> some extreme more than others - the reasons are typically driven by
> either some overzelous program manager or security people. Regardless
> of the reasoning, this is a real issue.
Yes, I agree with you that
> Note: None of these data centres have a long process if what is
> installed is a bash script expressing policy.
> In such a case an upstream of a feature such as this is more useful.
>
>
> cheers,
> jamal
>


-- 
Best regards, Tonghao

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
  2021-12-12  2:19   ` Cong Wang
  2021-12-15  1:34   ` Tonghao Zhang
@ 2021-12-15 16:08   ` Jakub Kicinski
  2021-12-16 11:29     ` Tonghao Zhang
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2021-12-15 16:08 UTC (permalink / raw)
  To: xiangxia.m.yue
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Jonathan Lemon, Eric Dumazet, Alexander Lobakin, Paolo Abeni,
	Talal Ahmad, Kevin Hao, Ilias Apalodimas, Kees Cook,
	Kumar Kartikeya Dwivedi, Antoine Tenart, Wei Wang, Arnd Bergmann

On Fri, 10 Dec 2021 10:36:26 +0800 xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
> 
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
> 
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID
> 
> Use skb->hash, cgroup classid, or cpuid to distribute packets.
> Then same range of tx queues can be shared for different flows,
> cgroups, or CPUs in a variety of scenarios.
> 
> For example, flows F1 may share range R1 with flows F2. The best
> way to do that is to set flag to SKBEDIT_F_QUEUE_MAPPING_HASH.
> If cgroup C1 share the R1 with cgroup C2 .. Cn, use the
> SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in some other scenario,
> C1 uses R1, while Cn can use Rn.

> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 00bfee70609e..ee96e0fa6566 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
>  	u32 mark;
>  	u32 mask;
>  	u16 queue_mapping;
> +	u16 mapping_mod;
>  	u16 ptype;
>  	struct rcu_head rcu;
>  };
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 800e93377218..5642b095d206 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -29,6 +29,13 @@
>  #define SKBEDIT_F_PTYPE			0x8
>  #define SKBEDIT_F_MASK			0x10
>  #define SKBEDIT_F_INHERITDSFIELD	0x20
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
> +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
> +#define SKBEDIT_F_QUEUE_MAPPING_CPUID	0x100
> +
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> +					   SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> +					   SKBEDIT_F_QUEUE_MAPPING_CPUID)

Any way we can make these defines shorter?

s/QUEUE_MAPPING_/TXQ_/ ?

>  struct tc_skbedit {
>  	tc_gen;
> @@ -45,6 +52,7 @@ enum {
>  	TCA_SKBEDIT_PTYPE,
>  	TCA_SKBEDIT_MASK,
>  	TCA_SKBEDIT_FLAGS,
> +	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
>  	__TCA_SKBEDIT_MAX
>  };
>  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 498feedad70a..0b0d65d7112e 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
> +#include <net/cls_cgroup.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
>  #include <net/ip.h>
> @@ -23,6 +24,37 @@
>  static unsigned int skbedit_net_id;
>  static struct tc_action_ops act_skbedit_ops;
>  
> +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> +			    struct sk_buff *skb)
> +{
> +	u16 queue_mapping = params->queue_mapping;
> +	u16 mapping_mod = params->mapping_mod;
> +	u32 mapping_hash_type = params->flags &
> +				SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Don't do inline init if the init doesn't fit on a line.

> +	u32 hash = 0;
> +
> +	if (!mapping_hash_type)
> +		return netdev_cap_txqueue(skb->dev, queue_mapping);

Make it "case 0:" below?

> +	switch (mapping_hash_type) {
> +	case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> +		hash = jhash_1word(task_get_classid(skb), 0);
> +		break;
> +	case SKBEDIT_F_QUEUE_MAPPING_HASH:
> +		hash = skb_get_hash(skb);
> +		break;
> +	case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> +		hash = raw_smp_processor_id();
> +		break;
> +	default:
> +		net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> +				     mapping_hash_type);
> +	}
> +
> +	queue_mapping = queue_mapping + hash % mapping_mod;
> +	return netdev_cap_txqueue(skb->dev, queue_mapping);
> +}
> +
>  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>  			   struct tcf_result *res)
>  {
> @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>  			break;
>  		}
>  	}
> -	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> -	    skb->dev->real_num_tx_queues > params->queue_mapping) {
> +	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
>  		netdev_xmit_skip_txqueue();
> -		skb_set_queue_mapping(skb, params->queue_mapping);
> +		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
>  	}
>  	if (params->flags & SKBEDIT_F_MARK) {
>  		skb->mark &= ~params->mask;
> @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>  	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
>  	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
>  	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
> +	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },

.type = NLA_U16 ? I think it's okay to use the modern stuff here.

>  };
>  
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  	struct tcf_skbedit *d;
>  	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>  	u16 *queue_mapping = NULL, *ptype = NULL;
> +	u16 mapping_mod = 0;
>  	bool exists = false;
>  	int ret = 0, err;
>  	u32 index;
> @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  
>  	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
>  		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> +		u64 mapping_hash_type = *pure_flags &
> +					SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Again, don't do the init inline, and add a blank line between the
variable declarations and the code.

> +		if (mapping_hash_type) {
> +			u16 *queue_mapping_max;
> +
> +			/* Hash types are mutually exclusive. */
> +			if (mapping_hash_type & (mapping_hash_type - 1))
> +				return -EINVAL;
> +
> +			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> +				return -EINVAL;
>  
> +			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> +				return -EINVAL;

Can be merged with the condition above, unless you add extack, which
perhaps you should.

> +			queue_mapping_max =
> +				nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);

nla_get_u16()

> +			if (*queue_mapping_max < *queue_mapping)
> +				return -EINVAL;
> +
> +			mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> +			flags |= mapping_hash_type;
> +		}
>  		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
>  			flags |= SKBEDIT_F_INHERITDSFIELD;
>  	}
> @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  	params_new->flags = flags;
>  	if (flags & SKBEDIT_F_PRIORITY)
>  		params_new->priority = *priority;
> -	if (flags & SKBEDIT_F_QUEUE_MAPPING)
> +	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
>  		params_new->queue_mapping = *queue_mapping;
> +		params_new->mapping_mod = mapping_mod;
> +	}
>  	if (flags & SKBEDIT_F_MARK)
>  		params_new->mark = *mark;
>  	if (flags & SKBEDIT_F_PTYPE)
> @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>  		goto nla_put_failure;
>  	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
>  		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> +	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> +		if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> +				params->queue_mapping + params->mapping_mod - 1))
> +			goto nla_put_failure;
> +
> +		pure_flags |= params->flags &
> +			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Why "params->flags &" ? Given the if above the flag must be set.

> +	}
>  	if (pure_flags != 0 &&
>  	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
>  		goto nla_put_failure;


* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-15  1:34   ` Tonghao Zhang
@ 2021-12-15 16:12     ` Jakub Kicinski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2021-12-15 16:12 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Eric Dumazet, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jonathan Lemon, Alexander Lobakin, Paolo Abeni,
	Talal Ahmad, Kevin Hao, Ilias Apalodimas, Kees Cook,
	Kumar Kartikeya Dwivedi, Antoine Tenart, Wei Wang, Arnd Bergmann,
	Linux Kernel Network Developers

On Wed, 15 Dec 2021 09:34:15 +0800 Tonghao Zhang wrote:
> Do you have any comment?We talk about this patchset in this thread.
> Should I resend this patchset with more commit message ?

Patch 1 LGTM, minor nits on patch 2, I'd side with Jamal 
in the discussion.

* Re: [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting tx queue
  2021-12-15 16:08   ` Jakub Kicinski
@ 2021-12-16 11:29     ` Tonghao Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Tonghao Zhang @ 2021-12-16 11:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

On Thu, Dec 16, 2021 at 12:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 10 Dec 2021 10:36:26 +0800 xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch allows users to select queue_mapping, range
> > from A to B. And users can use skb-hash, cgroup classid
> > and cpuid to select Tx queues. Then we can load balance
> > packets from A to B queue. The range is an unsigned 16bit
> > value in decimal format.
> >
> > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> >
> > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > enhanced with flags:
> > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> >
> > Use skb->hash, cgroup classid, or cpuid to distribute packets.
> > The same range of tx queues can then be shared by different
> > flows, cgroups, or CPUs in a variety of scenarios.
> >
> > For example, flows F1 may share range R1 with flows F2; the best
> > way to do that is to set the SKBEDIT_F_QUEUE_MAPPING_HASH flag.
> > If cgroups C1 .. Cn share R1, use SKBEDIT_F_QUEUE_MAPPING_CLASSID.
> > Of course, in other scenarios, C1 can use R1 while Cn uses Rn.
>
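To make the A-to-B range above concrete: once a hash value is chosen,
picking a queue in the range is just modular arithmetic, roughly as in
the sketch below (illustration only; the helper name is made up and
this is not the exact kernel code):

  /* Pick a tx queue in [a, b] from a 32-bit hash; assumes b >= a. */
  static u16 pick_txq_in_range(u32 hash, u16 a, u16 b)
  {
          u16 mod = b - a + 1;    /* number of queues in the range */

          return a + hash % mod;  /* always lands in [a, b] */
  }

The hunk below implements the same idea, with params->queue_mapping as
A and mapping_mod as B - A + 1.
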
> > diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> > index 00bfee70609e..ee96e0fa6566 100644
> > --- a/include/net/tc_act/tc_skbedit.h
> > +++ b/include/net/tc_act/tc_skbedit.h
> > @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
> >       u32 mark;
> >       u32 mask;
> >       u16 queue_mapping;
> > +     u16 mapping_mod;
> >       u16 ptype;
> >       struct rcu_head rcu;
> >  };
> > diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> > index 800e93377218..5642b095d206 100644
> > --- a/include/uapi/linux/tc_act/tc_skbedit.h
> > +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> > @@ -29,6 +29,13 @@
> >  #define SKBEDIT_F_PTYPE                      0x8
> >  #define SKBEDIT_F_MASK                       0x10
> >  #define SKBEDIT_F_INHERITDSFIELD     0x20
> > +#define SKBEDIT_F_QUEUE_MAPPING_HASH 0x40
> > +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID      0x80
> > +#define SKBEDIT_F_QUEUE_MAPPING_CPUID        0x100
> > +
> > +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> > +                                        SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> > +                                        SKBEDIT_F_QUEUE_MAPPING_CPUID)
>
> Any way we can make these defines shorter?
>
> s/QUEUE_MAPPING_/TXQ_/ ?
Yes
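Applying that substitution literally, the defines would look roughly
like this (sketch only; the final names may differ):

  #define SKBEDIT_F_TXQ_HASH      0x40
  #define SKBEDIT_F_TXQ_CLASSID   0x80
  #define SKBEDIT_F_TXQ_CPUID     0x100

  #define SKBEDIT_F_TXQ_HASH_MASK (SKBEDIT_F_TXQ_HASH | \
                                   SKBEDIT_F_TXQ_CLASSID | \
                                   SKBEDIT_F_TXQ_CPUID)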
>
> >  struct tc_skbedit {
> >       tc_gen;
> > @@ -45,6 +52,7 @@ enum {
> >       TCA_SKBEDIT_PTYPE,
> >       TCA_SKBEDIT_MASK,
> >       TCA_SKBEDIT_FLAGS,
> > +     TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> >       __TCA_SKBEDIT_MAX
> >  };
> >  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> > diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> > index 498feedad70a..0b0d65d7112e 100644
> > --- a/net/sched/act_skbedit.c
> > +++ b/net/sched/act_skbedit.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/skbuff.h>
> >  #include <linux/rtnetlink.h>
> > +#include <net/cls_cgroup.h>
> >  #include <net/netlink.h>
> >  #include <net/pkt_sched.h>
> >  #include <net/ip.h>
> > @@ -23,6 +24,37 @@
> >  static unsigned int skbedit_net_id;
> >  static struct tc_action_ops act_skbedit_ops;
> >
> > +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> > +                         struct sk_buff *skb)
> > +{
> > +     u16 queue_mapping = params->queue_mapping;
> > +     u16 mapping_mod = params->mapping_mod;
> > +     u32 mapping_hash_type = params->flags &
> > +                             SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Don't do inline init if the init doesn't fit on a line.
OK
> > +     u32 hash = 0;
> > +
> > +     if (!mapping_hash_type)
> > +             return netdev_cap_txqueue(skb->dev, queue_mapping);
>
> Make it "case 0:" below?
Yes
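Folded into the switch as suggested, the logic would look roughly like
this (sketch of the follow-up, not the final code):

  switch (mapping_hash_type) {
  case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
          hash = jhash_1word(task_get_classid(skb), 0);
          break;
  /* ... other hash types handled as below ... */
  case 0:
  default:
          /* no (or unknown) hash type: keep queue_mapping as is */
          return netdev_cap_txqueue(skb->dev, queue_mapping);
  }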
>
> > +     switch (mapping_hash_type) {
> > +     case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> > +             hash = jhash_1word(task_get_classid(skb), 0);
> > +             break;
> > +     case SKBEDIT_F_QUEUE_MAPPING_HASH:
> > +             hash = skb_get_hash(skb);
> > +             break;
> > +     case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> > +             hash = raw_smp_processor_id();
> > +             break;
> > +     default:
> > +             net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> > +                                  mapping_hash_type);
> > +     }
> > +
> > +     queue_mapping = queue_mapping + hash % mapping_mod;
> > +     return netdev_cap_txqueue(skb->dev, queue_mapping);
> > +}
> > +
> >  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
> >                          struct tcf_result *res)
> >  {
> > @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
> >                       break;
> >               }
> >       }
> > -     if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> > -         skb->dev->real_num_tx_queues > params->queue_mapping) {
> > +     if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
> >               netdev_xmit_skip_txqueue();
> > -             skb_set_queue_mapping(skb, params->queue_mapping);
> > +             skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
> >       }
> >       if (params->flags & SKBEDIT_F_MARK) {
> >               skb->mark &= ~params->mask;
> > @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
> >       [TCA_SKBEDIT_PTYPE]             = { .len = sizeof(u16) },
> >       [TCA_SKBEDIT_MASK]              = { .len = sizeof(u32) },
> >       [TCA_SKBEDIT_FLAGS]             = { .len = sizeof(u64) },
> > +     [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .len = sizeof(u16) },
>
> .type = NLA_U16 ? I think it's okay to use the modern stuff here.
TCA_SKBEDIT_QUEUE_MAPPING_MAX and TCA_SKBEDIT_QUEUE_MAPPING carry the
same type of value, so I use the same ".len = sizeof(u16)" style as
TCA_SKBEDIT_QUEUE_MAPPING. I think keeping them consistent is better.
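For reference, the two styles under discussion would look roughly like
this (sketch; only one of the two entries would actually be kept):

  /* style used in this patch, matching TCA_SKBEDIT_QUEUE_MAPPING: */
  [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .len = sizeof(u16) },
  /* "modern" style suggested above: */
  [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .type = NLA_U16 },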
> >  };
> >
> >  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> > @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >       struct tcf_skbedit *d;
> >       u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
> >       u16 *queue_mapping = NULL, *ptype = NULL;
> > +     u16 mapping_mod = 0;
> >       bool exists = false;
> >       int ret = 0, err;
> >       u32 index;
> > @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >
> >       if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
> >               u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> > +             u64 mapping_hash_type = *pure_flags &
> > +                                     SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Again, doing init inline and new line between variables and code
Ok
> > +             if (mapping_hash_type) {
> > +                     u16 *queue_mapping_max;
> > +
> > +                     /* Hash types are mutually exclusive. */
> > +                     if (mapping_hash_type & (mapping_hash_type - 1))
> > +                             return -EINVAL;
> > +
> > +                     if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> > +                             return -EINVAL;
> >
> > +                     if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> > +                             return -EINVAL;
>
> Can be merged with the condition above, unless you add extack, which
> perhaps you should..
Yes, I will merge them and add extack.
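A rough sketch of the merged check with extack (assuming the usual
extack argument available in tcf_skbedit_init; not necessarily the
final wording):

  if (!tb[TCA_SKBEDIT_QUEUE_MAPPING] ||
      !tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]) {
          NL_SET_ERR_MSG_MOD(extack,
                             "Missing required range of queue_mapping");
          return -EINVAL;
  }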
> > +                     queue_mapping_max =
> > +                             nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
>
> nla_get_u16()
>
> > +                     if (*queue_mapping_max < *queue_mapping)
> > +                             return -EINVAL;
> > +
> > +                     mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> > +                     flags |= mapping_hash_type;
> > +             }
> >               if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
> >                       flags |= SKBEDIT_F_INHERITDSFIELD;
> >       }
> > @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >       params_new->flags = flags;
> >       if (flags & SKBEDIT_F_PRIORITY)
> >               params_new->priority = *priority;
> > -     if (flags & SKBEDIT_F_QUEUE_MAPPING)
> > +     if (flags & SKBEDIT_F_QUEUE_MAPPING) {
> >               params_new->queue_mapping = *queue_mapping;
> > +             params_new->mapping_mod = mapping_mod;
> > +     }
> >       if (flags & SKBEDIT_F_MARK)
> >               params_new->mark = *mark;
> >       if (flags & SKBEDIT_F_PTYPE)
> > @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
> >               goto nla_put_failure;
> >       if (params->flags & SKBEDIT_F_INHERITDSFIELD)
> >               pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> > +     if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> > +             if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> > +                             params->queue_mapping + params->mapping_mod - 1))
> > +                     goto nla_put_failure;
> > +
> > +             pure_flags |= params->flags &
> > +                           SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Why "params->flags &" ? Given the if above the flag must be set.
We have already checked "params->flags" above, so we can use it in this block. Thanks!
> > +     }
> >       if (pure_flags != 0 &&
> >           nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
> >               goto nla_put_failure;
>


-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-12-16 11:30 UTC | newest]

Thread overview: 15+ messages
2021-12-10  2:36 [net-next v3 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
2021-12-10  2:36 ` [net-next v3 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
2021-12-13 22:58   ` Cong Wang
2021-12-14  1:56     ` Tonghao Zhang
2021-12-10  2:36 ` [net-next v3 2/2] net: sched: support hash/classid/cpuid selecting " xiangxia.m.yue
2021-12-12  2:19   ` Cong Wang
2021-12-12  2:34     ` Tonghao Zhang
2021-12-13 22:53       ` Cong Wang
2021-12-14  2:19         ` Tonghao Zhang
2021-12-14 12:13         ` Jamal Hadi Salim
2021-12-15  1:41           ` Tonghao Zhang
2021-12-15  1:34   ` Tonghao Zhang
2021-12-15 16:12     ` Jakub Kicinski
2021-12-15 16:08   ` Jakub Kicinski
2021-12-16 11:29     ` Tonghao Zhang
