All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next v1 0/2] net: sched: allow user to select txqueue
@ 2021-12-06  8:05 xiangxia.m.yue
  2021-12-06  8:05 ` [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
  2021-12-06  8:05 ` [net-next v1 2/2] net: sched: support hash/classid selecting queue_mapping xiangxia.m.yue
  0 siblings, 2 replies; 9+ messages in thread
From: xiangxia.m.yue @ 2021-12-06  8:05 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Patch 1 allows user to select txqueue in clsact hook.
Patch 2 supports skb-hash and classid to select rxqueue.

Tonghao Zhang (2):
  net: sched: use queue_mapping to pick tx queue
  net: sched: support hash/classid selecting queue_mapping

 include/linux/skbuff.h                 |  1 +
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  5 +++
 net/core/dev.c                         | 12 +++++--
 net/sched/act_skbedit.c                | 50 +++++++++++++++++++++++---
 5 files changed, 62 insertions(+), 7 deletions(-)

-- 
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
--
2.27.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-06  8:05 [net-next v1 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
@ 2021-12-06  8:05 ` xiangxia.m.yue
  2021-12-06 20:40   ` Jakub Kicinski
  2021-12-06  8:05 ` [net-next v1 2/2] net: sched: support hash/classid selecting queue_mapping xiangxia.m.yue
  1 sibling, 1 reply; 9+ messages in thread
From: xiangxia.m.yue @ 2021-12-06  8:05 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch fix issue:
* If we install tc filters with act_skbedit in clsact hook.
  It doesn't work, because *netdev_core_pick_tx will overwrite
  queue_mapping.

  $ tc filter add dev $NETDEV egress .. action skbedit queue_mapping 1

And this patch is useful:
* In containter networking environment, one kind of pod/containter/
  net-namespace (e.g. P1, P2) which outbound traffic limited, can
  use one specific tx queue which used HTB/TBF Qdisc. But other kind
  of pods (e.g. Pn) can use other specific tx queue too, which used fifio
  Qdisc. Then the lock contention of HTB/TBF Qdisc will not affect Pn.

  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |    MQ
                v
    +-----------+-----------+
    | q0        | q1        | qn
    v           v           v
   HTB         HTB   ...   FIFO

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/linux/skbuff.h  |  1 +
 net/core/dev.c          | 12 +++++++++---
 net/sched/act_skbedit.c |  4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eae4bd3237a4..b6ea4b920409 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -856,6 +856,7 @@ struct sk_buff {
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
+	__u8			tc_skip_txqueue:1;
 	__u8			tc_at_ingress:1;
 #endif
 	__u8			redirected:1;
diff --git a/net/core/dev.c b/net/core/dev.c
index aba8acc1238c..fb9d4eee29ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3975,10 +3975,16 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
 {
 	int queue_index = 0;
 
-#ifdef CONFIG_XPS
-	u32 sender_cpu = skb->sender_cpu - 1;
+#ifdef CONFIG_NET_CLS_ACT
+	if (skb->tc_skip_txqueue) {
+		queue_index = netdev_cap_txqueue(dev,
+						 skb_get_queue_mapping(skb));
+		return netdev_get_tx_queue(dev, queue_index);
+	}
+#endif
 
-	if (sender_cpu >= (u32)NR_CPUS)
+#ifdef CONFIG_XPS
+	if ((skb->sender_cpu - 1) >= (u32)NR_CPUS)
 		skb->sender_cpu = raw_smp_processor_id() + 1;
 #endif
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index d30ecbfc8f84..940091a7c7f0 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -58,8 +58,10 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping)
+	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+		skb->tc_skip_txqueue = 1;
 		skb_set_queue_mapping(skb, params->queue_mapping);
+	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
 		skb->mark |= params->mark & params->mask;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [net-next v1 2/2] net: sched: support hash/classid selecting queue_mapping
  2021-12-06  8:05 [net-next v1 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
  2021-12-06  8:05 ` [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
@ 2021-12-06  8:05 ` xiangxia.m.yue
  1 sibling, 0 replies; 9+ messages in thread
From: xiangxia.m.yue @ 2021-12-06  8:05 UTC (permalink / raw)
  To: netdev
  Cc: Tonghao Zhang, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch allows users to select queue_mapping, range
from A to B. And users can use skb-hash or cgroup classid
to select queues. Then the packets can load balance from A to
B queue.

$ tc filter ... action skbedit queue_mapping hash-type normal 0 4

"skbedit queue_mapping QUEUE_MAPPING"[0] is enhanced with two flags:
SKBEDIT_F_QUEUE_MAPPING_HASH, SKBEDIT_F_QUEUE_MAPPING_CLASSID.
The range is an unsigned 8bit value in decimal format.

[0]: https://man7.org/linux/man-pages/man8/tc-skbedit.8.html

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  5 +++
 net/sched/act_skbedit.c                | 48 +++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..ee96e0fa6566 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@ struct tcf_skbedit_params {
 	u32 mark;
 	u32 mask;
 	u16 queue_mapping;
+	u16 mapping_mod;
 	u16 ptype;
 	struct rcu_head rcu;
 };
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..badb58ec84ef 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,11 @@
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
+#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
+
+#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
+					   SKBEDIT_F_QUEUE_MAPPING_CLASSID)
 
 struct tc_skbedit {
 	tc_gen;
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 940091a7c7f0..9cb65bcce001 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
+#include <net/cls_cgroup.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/ip.h>
@@ -23,6 +24,25 @@
 static unsigned int skbedit_net_id;
 static struct tc_action_ops act_skbedit_ops;
 
+static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
+			    struct sk_buff *skb)
+{
+	u16 queue_mapping = params->queue_mapping;
+	u16 mapping_mod = params->mapping_mod;
+	u32 hash = 0;
+
+	if (!(params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK))
+		return netdev_cap_txqueue(skb->dev, queue_mapping);
+
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_CLASSID)
+		hash = jhash_1word(task_get_classid(skb), 0);
+	else if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH)
+		hash = skb_get_hash(skb);
+
+	queue_mapping= (queue_mapping & 0xff) + hash % mapping_mod;
+	return netdev_cap_txqueue(skb->dev, queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -57,10 +77,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			break;
 		}
 	}
-	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
 		skb->tc_skip_txqueue = 1;
-		skb_set_queue_mapping(skb, params->queue_mapping);
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
@@ -110,6 +129,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u16 mapping_mod = 0;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -157,6 +177,21 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 
 		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
 			flags |= SKBEDIT_F_INHERITDSFIELD;
+		if (*pure_flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
+			u16 max, min;
+
+			if (!queue_mapping)
+				return -EINVAL;
+
+			max = *queue_mapping >> 8;
+			min = *queue_mapping & 0xff;
+			if (max < min)
+				return -EINVAL;
+
+			mapping_mod = max - min + 1;
+			flags |= *pure_flags &
+				 SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+		}
 	}
 
 	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
@@ -206,8 +241,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->flags = flags;
 	if (flags & SKBEDIT_F_PRIORITY)
 		params_new->priority = *priority;
-	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
 		params_new->queue_mapping = *queue_mapping;
+		params_new->mapping_mod = mapping_mod;
+	}
 	if (flags & SKBEDIT_F_MARK)
 		params_new->mark = *mark;
 	if (flags & SKBEDIT_F_PTYPE)
@@ -274,6 +311,9 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
 		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK)
+		pure_flags |= params->flags &
+			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-06  8:05 ` [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
@ 2021-12-06 20:40   ` Jakub Kicinski
  2021-12-07  2:10     ` Tonghao Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2021-12-06 20:40 UTC (permalink / raw)
  To: xiangxia.m.yue
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Jonathan Lemon, Eric Dumazet, Alexander Lobakin, Paolo Abeni,
	Talal Ahmad, Kevin Hao, Ilias Apalodimas, Kees Cook,
	Kumar Kartikeya Dwivedi, Antoine Tenart, Wei Wang, Arnd Bergmann,
	alexander.duyck

On Mon,  6 Dec 2021 16:05:11 +0800 xiangxia.m.yue@gmail.com wrote:
>   +----+      +----+      +----+
>   | P1 |      | P2 |      | Pn |
>   +----+      +----+      +----+
>     |           |           |
>     +-----------+-----------+
>                 |
>                 | clsact/skbedit
>                 |    MQ
>                 v
>     +-----------+-----------+
>     | q0        | q1        | qn
>     v           v           v
>    HTB         HTB   ...   FIFO

The usual suggestion these days is to try to use FQ + EDT to
implement efficient policies. You don't need dedicated qdiscs, 
just modulate transmission time appropriately on egress of the
container.

In general recording the decision in the skb seems a little heavy
handed. We just need to carry the information from the egress hook
to the queue selection a few lines below. Or in fact maybe egress
hook shouldn't be used for this in the first place, and we need 
a more appropriate root qdisc than simple mq?

Not sure. What I am sure of is that you need to fix these warnings:

include/linux/skbuff.h:937: warning: Function parameter or member 'tc_skip_txqueue' not described in 'sk_buff'

ERROR: spaces required around that '=' (ctx:VxW)
#103: FILE: net/sched/act_skbedit.c:42:
+	queue_mapping= (queue_mapping & 0xff) + hash % mapping_mod;
 	             ^

;)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-06 20:40   ` Jakub Kicinski
@ 2021-12-07  2:10     ` Tonghao Zhang
  2021-12-07  2:33       ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Tonghao Zhang @ 2021-12-07  2:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann, Alexander Duyck

On Tue, Dec 7, 2021 at 4:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  6 Dec 2021 16:05:11 +0800 xiangxia.m.yue@gmail.com wrote:
> >   +----+      +----+      +----+
> >   | P1 |      | P2 |      | Pn |
> >   +----+      +----+      +----+
> >     |           |           |
> >     +-----------+-----------+
> >                 |
> >                 | clsact/skbedit
> >                 |    MQ
> >                 v
> >     +-----------+-----------+
> >     | q0        | q1        | qn
> >     v           v           v
> >    HTB         HTB   ...   FIFO
Hi Jakub, thanks for your comments
> The usual suggestion these days is to try to use FQ + EDT to
> implement efficient policies. You don't need dedicated qdiscs,
> just modulate transmission time appropriately on egress of the
> container.
FQ+EDT is good solution. But this patch should be used on another scenario.
1. the containers  which outbound traffic is not limited, want to use
the fifo qdisc.
If this traffic share the FQ/HTB Qdisc, the qdisc lock will affect the
performance and latency.
2. we can support user to select tx queue, range from A to B. skb hash
or cgroup classid is good to do load balance.
patch 2/2: https://patchwork.kernel.org/project/netdevbpf/patch/20211206080512.36610-3-xiangxia.m.yue@gmail.com/

> In general recording the decision in the skb seems a little heavy
> handed. We just need to carry the information from the egress hook
> to the queue selection a few lines below. Or in fact maybe egress
Yes, we can refactor netdev_core_pick_tx to
1. select queue_index and invoke skb_set_queue_mapping, but don't
return the txq.
2. after egress hook, use skb_get_queue_mapping/netdev_get_tx_queue to get txq.
> hook shouldn't be used for this in the first place, and we need
> a more appropriate root qdisc than simple mq?
I have no idea about mq, I think clsact may make the things more flexible.
and act_bpf can also support to change sk queue_mapping. queue_mapping
was included in __sk_buff.

> Not sure. What I am sure of is that you need to fix these warnings:
Ok
> include/linux/skbuff.h:937: warning: Function parameter or member 'tc_skip_txqueue' not described in 'sk_buff'
>
> ERROR: spaces required around that '=' (ctx:VxW)
> #103: FILE: net/sched/act_skbedit.c:42:
> +       queue_mapping= (queue_mapping & 0xff) + hash % mapping_mod;
>                      ^
>
> ;)



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-07  2:10     ` Tonghao Zhang
@ 2021-12-07  2:33       ` Jakub Kicinski
  2021-12-07  3:22         ` Tonghao Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2021-12-07  2:33 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann, Alexander Duyck

On Tue, 7 Dec 2021 10:10:22 +0800 Tonghao Zhang wrote:
> > In general recording the decision in the skb seems a little heavy
> > handed. We just need to carry the information from the egress hook
> > to the queue selection a few lines below. Or in fact maybe egress  
> Yes, we can refactor netdev_core_pick_tx to
> 1. select queue_index and invoke skb_set_queue_mapping, but don't
> return the txq.
> 2. after egress hook, use skb_get_queue_mapping/netdev_get_tx_queue to get txq.

I'm not sure that's what I meant, I meant the information you need to
store does not need to be stored in the skb, you can pass a pointer to
a stack variable to both egress handling and pick_tx.

> > hook shouldn't be used for this in the first place, and we need
> > a more appropriate root qdisc than simple mq?  
> I have no idea about mq, I think clsact may make the things more flexible.
> and act_bpf can also support to change sk queue_mapping. queue_mapping
> was included in __sk_buff.

Qdiscs can run a classifier to select a sub-queue. The advantage of
the classifier run by the Qdisc is that it runs after pick_tx.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-07  2:33       ` Jakub Kicinski
@ 2021-12-07  3:22         ` Tonghao Zhang
  2021-12-07 15:44           ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Tonghao Zhang @ 2021-12-07  3:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann, Alexander Duyck

On Tue, Dec 7, 2021 at 10:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 7 Dec 2021 10:10:22 +0800 Tonghao Zhang wrote:
> > > In general recording the decision in the skb seems a little heavy
> > > handed. We just need to carry the information from the egress hook
> > > to the queue selection a few lines below. Or in fact maybe egress
> > Yes, we can refactor netdev_core_pick_tx to
> > 1. select queue_index and invoke skb_set_queue_mapping, but don't
> > return the txq.
> > 2. after egress hook, use skb_get_queue_mapping/netdev_get_tx_queue to get txq.
>
> I'm not sure that's what I meant, I meant the information you need to
> store does not need to be stored in the skb, you can pass a pointer to
> a stack variable to both egress handling and pick_tx.
Thanks, I got it. I think we store the txq index in skb->queue_mapping
better. because in egress hook,
act_skbedit/act_bpf can change the skb queue_mapping. Then we can
pick_tx depending on queue_mapping.

> > > hook shouldn't be used for this in the first place, and we need
> > > a more appropriate root qdisc than simple mq?
> > I have no idea about mq, I think clsact may make the things more flexible.
> > and act_bpf can also support to change sk queue_mapping. queue_mapping
> > was included in __sk_buff.
>
> Qdiscs can run a classifier to select a sub-queue. The advantage of
> the classifier run by the Qdisc is that it runs after pick_tx.
Yes, we should consider the qdisc lock too. Qdisc lock may affect
performance and latency when running a classifier in Qdisc
and clsact is outside of qdisc.


-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-07  3:22         ` Tonghao Zhang
@ 2021-12-07 15:44           ` Jakub Kicinski
  2021-12-08 14:54             ` Tonghao Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2021-12-07 15:44 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Eric Dumazet,
	Alexander Lobakin, Paolo Abeni, Talal Ahmad, Kevin Hao,
	Ilias Apalodimas, Kees Cook, Kumar Kartikeya Dwivedi,
	Antoine Tenart, Wei Wang, Arnd Bergmann, Alexander Duyck

On Tue, 7 Dec 2021 11:22:28 +0800 Tonghao Zhang wrote:
> On Tue, Dec 7, 2021 at 10:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 7 Dec 2021 10:10:22 +0800 Tonghao Zhang wrote:  
> > > Yes, we can refactor netdev_core_pick_tx to
> > > 1. select queue_index and invoke skb_set_queue_mapping, but don't
> > > return the txq.
> > > 2. after egress hook, use skb_get_queue_mapping/netdev_get_tx_queue to get txq.  
> >
> > I'm not sure that's what I meant, I meant the information you need to
> > store does not need to be stored in the skb, you can pass a pointer to
> > a stack variable to both egress handling and pick_tx.  
> Thanks, I got it. I think we store the txq index in skb->queue_mapping
> better. because in egress hook,
> act_skbedit/act_bpf can change the skb queue_mapping. Then we can
> pick_tx depending on queue_mapping.

Actually Eric pointed out in another thread that xmit_more() is now
done via a per-CPU variable, you can try that instead of plumbing a
variable all the way into actions and back out to pick_tx().

Please make sure to include the analysis of the performance impact 
when the feature is _not_ used in the next version.

> > > I have no idea about mq, I think clsact may make the things more flexible.
> > > and act_bpf can also support to change sk queue_mapping. queue_mapping
> > > was included in __sk_buff.  
> >
> > Qdiscs can run a classifier to select a sub-queue. The advantage of
> > the classifier run by the Qdisc is that it runs after pick_tx.  
> Yes, we should consider the qdisc lock too. Qdisc lock may affect
> performance and latency when running a classifier in Qdisc
> and clsact is outside of qdisc.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue
  2021-12-07 15:44           ` Jakub Kicinski
@ 2021-12-08 14:54             ` Tonghao Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Tonghao Zhang @ 2021-12-08 14:54 UTC (permalink / raw)
  To: Jakub Kicinski, Eric Dumazet
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, David S. Miller, Jonathan Lemon, Alexander Lobakin,
	Paolo Abeni, Talal Ahmad, Kevin Hao, Ilias Apalodimas, Kees Cook,
	Kumar Kartikeya Dwivedi, Antoine Tenart, Wei Wang, Arnd Bergmann,
	Alexander Duyck

On Tue, Dec 7, 2021 at 11:45 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 7 Dec 2021 11:22:28 +0800 Tonghao Zhang wrote:
> > On Tue, Dec 7, 2021 at 10:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Tue, 7 Dec 2021 10:10:22 +0800 Tonghao Zhang wrote:
> > > > Yes, we can refactor netdev_core_pick_tx to
> > > > 1. select queue_index and invoke skb_set_queue_mapping, but don't
> > > > return the txq.
> > > > 2. after egress hook, use skb_get_queue_mapping/netdev_get_tx_queue to get txq.
> > >
> > > I'm not sure that's what I meant, I meant the information you need to
> > > store does not need to be stored in the skb, you can pass a pointer to
> > > a stack variable to both egress handling and pick_tx.
> > Thanks, I got it. I think we store the txq index in skb->queue_mapping
> > better. because in egress hook,
> > act_skbedit/act_bpf can change the skb queue_mapping. Then we can
> > pick_tx depending on queue_mapping.
>
> Actually Eric pointed out in another thread that xmit_more() is now
> done via a per-CPU variable, you can try that instead of plumbing a
> variable all the way into actions and back out to pick_tx().
Thanks Jakub, Eric
v2 is sent, please review
https://patchwork.kernel.org/project/netdevbpf/list/?series=592341

> Please make sure to include the analysis of the performance impact
> when the feature is _not_ used in the next version.
Ok, I updated the commit message. Thanks!

> > > > I have no idea about mq, I think clsact may make the things more flexible.
> > > > and act_bpf can also support to change sk queue_mapping. queue_mapping
> > > > was included in __sk_buff.
> > >
> > > Qdiscs can run a classifier to select a sub-queue. The advantage of
> > > the classifier run by the Qdisc is that it runs after pick_tx.
> > Yes, we should consider the qdisc lock too. Qdisc lock may affect
> > performance and latency when running a classifier in Qdisc
> > and clsact is outside of qdisc.



-- 
Best regards, Tonghao

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-12-08 14:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-06  8:05 [net-next v1 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
2021-12-06  8:05 ` [net-next v1 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
2021-12-06 20:40   ` Jakub Kicinski
2021-12-07  2:10     ` Tonghao Zhang
2021-12-07  2:33       ` Jakub Kicinski
2021-12-07  3:22         ` Tonghao Zhang
2021-12-07 15:44           ` Jakub Kicinski
2021-12-08 14:54             ` Tonghao Zhang
2021-12-06  8:05 ` [net-next v1 2/2] net: sched: support hash/classid selecting queue_mapping xiangxia.m.yue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.