netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] skbuff: Add new tc classify variable
@ 2012-02-07 18:39 Simon Wunderlich
  2012-02-07 18:58 ` David Miller
  2012-02-07 19:05 ` Stephen Hemminger
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Wunderlich @ 2012-02-07 18:39 UTC (permalink / raw)
  To: netdev

The linux traffic control mechanism has different ways to select the
correct class of a qdisc. A common way to do this is to use tc filters
that are directly attached to a qdisc. Another approach is to use the
iptables classify module. The latter one can reduce the amount of work
necessary to process a packet when iptables is already involved in the
packet classification.

The iptables module can be used in the postrouting chain of the mangle
table.

# iptables -F -t mangle
# iptables -X -t mangle
# iptables -t mangle -I POSTROUTING -j CLASSIFY --set-class 1:1337

A simple qdisc with two classes will now route the traffic to 1:1337

# tc qdisc del dev eth0 root
# tc qdisc add dev eth0 root handle 1: htb default 1
# tc class add dev eth0 parent 1: classid 1:1 htb rate 15kbit burst 0kbit
# tc class add dev eth0 parent 1: classid 1:1337 htb rate 250kbit burst 0kbit
# tc -s class show dev eth0

A similar test with an ath9k device will show a complete different
behavior. The default class 1:1 will be used and data is sent through.
This problem seems to be related to the fact that the shared member
variable sk_buff::priority is used to store the tc class id of an
outgoing packet. This variable is also used in other places for
different purposes. An example is the ieee80211_select_queue function
which always overwrites the priority of an outgoing skb.

This conflict can be resolved by an additional member variable
sk_buff::tc_class that is only used for the traffic control
classification.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Marek Lindner <lindner_marek@yahoo.de>
Cc: Sven Eckelmann <sven@narfation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: netfilter@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
---
 include/linux/skbuff.h      |    2 ++
 net/netfilter/xt_CLASSIFY.c |    4 ++--
 net/sched/sch_atm.c         |    4 ++--
 net/sched/sch_cbq.c         |    2 +-
 net/sched/sch_drr.c         |    4 ++--
 net/sched/sch_dsmark.c      |    4 ++--
 net/sched/sch_generic.c     |    2 +-
 net/sched/sch_hfsc.c        |    4 ++--
 net/sched/sch_htb.c         |    6 +++---
 net/sched/sch_prio.c        |    4 ++--
 net/sched/sch_sfq.c         |    8 ++++----
 11 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 50db9b0..3ffc301 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -352,6 +352,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  *	@skb_iif: ifindex of device we arrived on
  *	@tc_index: Traffic control index
+ *	@tc_class: Classification index
  *	@tc_verd: traffic control verdict
  *	@rxhash: the packet hash computed on receive
  *	@queue_mapping: Queue mapping for multiqueue devices
@@ -439,6 +440,7 @@ struct sk_buff {
 
 	int			skb_iif;
 #ifdef CONFIG_NET_SCHED
+	__u32			tc_class;	/* traffic control class */
 	__u16			tc_index;	/* traffic control index */
 #ifdef CONFIG_NET_CLS_ACT
 	__u16			tc_verd;	/* traffic control verdict */
diff --git a/net/netfilter/xt_CLASSIFY.c b/net/netfilter/xt_CLASSIFY.c
index af9c4da..9b71957 100644
--- a/net/netfilter/xt_CLASSIFY.c
+++ b/net/netfilter/xt_CLASSIFY.c
@@ -1,5 +1,5 @@
 /*
- * This is a module which is used for setting the skb->priority field
+ * This is a module which is used for setting the skb->tc_class field
  * of an skb for qdisc classification.
  */
 
@@ -33,7 +33,7 @@ classify_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_classify_target_info *clinfo = par->targinfo;
 
-	skb->priority = clinfo->priority;
+	skb->tc_class = clinfo->priority;
 	return XT_CONTINUE;
 }
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index e25e490..003511a 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -369,8 +369,8 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	pr_debug("atm_tc_enqueue(skb %p,sch %p,[qdisc %p])\n", skb, sch, p);
 	result = TC_POLICE_OK;	/* be nice to gcc */
 	flow = NULL;
-	if (TC_H_MAJ(skb->priority) != sch->handle ||
-	    !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
+	if (TC_H_MAJ(skb->tc_class) != sch->handle ||
+	    !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->tc_class))) {
 		list_for_each_entry(flow, &p->flows, list) {
 			if (flow->filter_list) {
 				result = tc_classify_compat(skb,
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 24d94c0..b1321f9 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -221,7 +221,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct cbq_class *head = &q->link;
 	struct cbq_class **defmap;
 	struct cbq_class *cl = NULL;
-	u32 prio = skb->priority;
+	u32 prio = skb->tc_class;
 	struct tcf_result res;
 
 	/*
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 6b7fe4a..57b1ca2 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -321,8 +321,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct tcf_result res;
 	int result;
 
-	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
-		cl = drr_find_class(sch, skb->priority);
+	if (TC_H_MAJ(skb->tc_class ^ sch->handle) == 0) {
+		cl = drr_find_class(sch, skb->tc_class);
 		if (cl != NULL)
 			return cl;
 	}
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 2c79020..60cfe15 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -224,8 +224,8 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 	}
 
-	if (TC_H_MAJ(skb->priority) == sch->handle)
-		skb->tc_index = TC_H_MIN(skb->priority);
+	if (TC_H_MAJ(skb->tc_class) == sch->handle)
+		skb->tc_index = TC_H_MIN(skb->tc_class);
 	else {
 		struct tcf_result res;
 		int result = tc_classify(skb, p->filter_list, &res);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 67fc573..c14779a 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -449,7 +449,7 @@ static inline struct sk_buff_head *band2list(struct pfifo_fast_priv *priv,
 static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc)
 {
 	if (skb_queue_len(&qdisc->q) < qdisc_dev(qdisc)->tx_queue_len) {
-		int band = prio2band[skb->priority & TC_PRIO_MAX];
+		int band = prio2band[skb->tc_class & TC_PRIO_MAX];
 		struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
 		struct sk_buff_head *list = band2list(priv, band);
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 9bdca2e..2fcc6ef 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1154,8 +1154,8 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct tcf_proto *tcf;
 	int result;
 
-	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0 &&
-	    (cl = hfsc_find_class(skb->priority, sch)) != NULL)
+	if (TC_H_MAJ(skb->tc_class ^ sch->handle) == 0 &&
+	    (cl = hfsc_find_class(skb->tc_class, sch)) != NULL)
 		if (cl->level == 0)
 			return cl;
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 29b942c..ef8dd42 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -197,13 +197,13 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct tcf_proto *tcf;
 	int result;
 
-	/* allow to select class by setting skb->priority to valid classid;
+	/* allow to select class by setting skb->tc_class to valid classid;
 	 * note that nfmark can be used too by attaching filter fw with no
 	 * rules in it
 	 */
-	if (skb->priority == sch->handle)
+	if (skb->tc_class == sch->handle)
 		return HTB_DIRECT;	/* X:0 (direct flow) selected */
-	cl = htb_find(skb->priority, sch);
+	cl = htb_find(skb->tc_class, sch);
 	if (cl && cl->level == 0)
 		return cl;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index b5d56a2..f53a5b99 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -34,12 +34,12 @@ static struct Qdisc *
 prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
-	u32 band = skb->priority;
+	u32 band = skb->tc_class;
 	struct tcf_result res;
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	if (TC_H_MAJ(skb->priority) != sch->handle) {
+	if (TC_H_MAJ(skb->tc_class) != sch->handle) {
 		err = tc_classify(skb, q->filter_list, &res);
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 67494ae..79bc0ac 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -190,10 +190,10 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct tcf_result res;
 	int result;
 
-	if (TC_H_MAJ(skb->priority) == sch->handle &&
-	    TC_H_MIN(skb->priority) > 0 &&
-	    TC_H_MIN(skb->priority) <= q->divisor)
-		return TC_H_MIN(skb->priority);
+	if (TC_H_MAJ(skb->tc_class) == sch->handle &&
+	    TC_H_MIN(skb->tc_class) > 0 &&
+	    TC_H_MIN(skb->tc_class) <= q->divisor)
+		return TC_H_MIN(skb->tc_class);
 
 	if (!q->filter_list) {
 		skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys);
-- 
1.7.8.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 18:39 [PATCH] skbuff: Add new tc classify variable Simon Wunderlich
@ 2012-02-07 18:58 ` David Miller
  2012-02-07 20:16   ` Simon Wunderlich
  2012-02-07 19:05 ` Stephen Hemminger
  1 sibling, 1 reply; 10+ messages in thread
From: David Miller @ 2012-02-07 18:58 UTC (permalink / raw)
  To: simon.wunderlich; +Cc: netdev

From: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
Date: Tue,  7 Feb 2012 19:39:08 +0100

> The linux traffic control mechanism has different ways to select the
> correct class of a qdisc. A common way to do this is to use tc filters
> that are directly attached to a qdisc. Another approach is to use the
> iptables classify module. The latter one can reduce the amount of work
> necessary to process a packet when iptables is already involved in the
> packet classification.

Do not bloat up sk_buff any more.  Add this, and the other existing
tc_* members to the qdisc SKB control block instead.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 18:39 [PATCH] skbuff: Add new tc classify variable Simon Wunderlich
  2012-02-07 18:58 ` David Miller
@ 2012-02-07 19:05 ` Stephen Hemminger
  2012-02-07 19:57   ` Simon Wunderlich
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2012-02-07 19:05 UTC (permalink / raw)
  To: Simon Wunderlich; +Cc: netdev

On Tue,  7 Feb 2012 19:39:08 +0100
Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:

> The linux traffic control mechanism has different ways to select the
> correct class of a qdisc. A common way to do this is to use tc filters
> that are directly attached to a qdisc. Another approach is to use the
> iptables classify module. The latter one can reduce the amount of work
> necessary to process a packet when iptables is already involved in the
> packet classification.
> 
> The iptables module can be used in the postrouting chain of the mangle
> table.
> 
> # iptables -F -t mangle
> # iptables -X -t mangle
> # iptables -t mangle -I POSTROUTING -j CLASSIFY --set-class 1:1337
> 
> A simple qdisc with two classes will now route the traffic to 1:1337
> 
> # tc qdisc del dev eth0 root
> # tc qdisc add dev eth0 root handle 1: htb default 1
> # tc class add dev eth0 parent 1: classid 1:1 htb rate 15kbit burst 0kbit
> # tc class add dev eth0 parent 1: classid 1:1337 htb rate 250kbit burst 0kbit
> # tc -s class show dev eth0
> 
> A similar test with an ath9k device will show a complete different
> behavior. The default class 1:1 will be used and data is sent through.
> This problem seems to be related to the fact that the shared member
> variable sk_buff::priority is used to store the tc class id of an
> outgoing packet. This variable is also used in other places for
> different purposes. An example is the ieee80211_select_queue function
> which always overwrites the priority of an outgoing skb.
> 
> This conflict can be resolved by an additional member variable
> sk_buff::tc_class that is only used for the traffic control
> classification.
> 
> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Jamal Hadi Salim <hadi@cyberus.ca>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: "John W. Linville" <linville@tuxdriver.com>
> Cc: Marek Lindner <lindner_marek@yahoo.de>
> Cc: Sven Eckelmann <sven@narfation.org>
> Cc: netfilter-devel@vger.kernel.org
> Cc: netfilter@vger.kernel.org
> Cc: coreteam@netfilter.org
> Cc: netdev@vger.kernel.org
> Cc: linux-wireless@vger.kernel.org

I don't understand why this is better, we already have mark to do this.
Your method saves adding a tc filter to map mark to classid, but that is hardly
a huge burden.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 19:05 ` Stephen Hemminger
@ 2012-02-07 19:57   ` Simon Wunderlich
  2012-02-07 21:45     ` Stephen Hemminger
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Wunderlich @ 2012-02-07 19:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Simon Wunderlich, netdev

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Tue, Feb 07, 2012 at 11:05:22AM -0800, Stephen Hemminger wrote:
> On Tue,  7 Feb 2012 19:39:08 +0100
> Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:
> 
> I don't understand why this is better, we already have mark to do this.
> Your method saves adding a tc filter to map mark to classid, but that is hardly
> a huge burden.

Unfortunately, it is. We have previously built our trees by setting marks with iptables
and matching the masks with tc and the u32 matcher, but we got a rather big performance
impact as soon as the number of users grow. The target platform are WiFi access points.
By using the proposed patch, the performance stays nearly constant at a growing number
of users.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 18:58 ` David Miller
@ 2012-02-07 20:16   ` Simon Wunderlich
  2012-02-07 20:33     ` Dave Taht
  2012-02-08  9:21     ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Wunderlich @ 2012-02-07 20:16 UTC (permalink / raw)
  To: David Miller
  Cc: simon.wunderlich, netdev, Pablo Neira Ayuso, Patrick McHardy,
	Jamal Hadi Salim, Johannes Berg, John W. Linville, Marek Lindner,
	Sven Eckelmann, netfilter-devel, netfilter, coreteam,
	linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1413 bytes --]

Hello David,

On Tue, Feb 07, 2012 at 01:58:41PM -0500, David Miller wrote:
> From: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
> Date: Tue,  7 Feb 2012 19:39:08 +0100
> 
> > The linux traffic control mechanism has different ways to select the
> > correct class of a qdisc. A common way to do this is to use tc filters
> > that are directly attached to a qdisc. Another approach is to use the
> > iptables classify module. The latter one can reduce the amount of work
> > necessary to process a packet when iptables is already involved in the
> > packet classification.
> 
> Do not bloat up sk_buff any more.  Add this, and the other existing
> tc_* members to the qdisc SKB control block instead.
> 

Thanks for your feedback!

I guess you mean skb->cb, but this is also used within mac80211 for various things
(quoting include/net/mac80211.h):

 * struct ieee80211_tx_info - skb transmit information
 *
 * This structure is placed in skb->cb for three uses:
 *  (1) mac80211 TX control - mac80211 tells the driver what to do
 *  (2) driver internal use (if applicable)
 *  (3) TX status information - driver tells mac80211 what happened

We could give it a try, but we most probably run into conflicts again.

I've messed up the CCs in my initial mail (and just resent it) - sorry about that.
Maybe the mac80211 guys have a suggestion as well :)

Cheers,
	Simon

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 20:16   ` Simon Wunderlich
@ 2012-02-07 20:33     ` Dave Taht
  2012-02-08  9:21     ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Taht @ 2012-02-07 20:33 UTC (permalink / raw)
  To: Simon Wunderlich
  Cc: David Miller, netdev, Pablo Neira Ayuso, Patrick McHardy,
	Jamal Hadi Salim, Johannes Berg, John W. Linville, Marek Lindner,
	Sven Eckelmann, netfilter-devel, netfilter, coreteam,
	linux-wireless

On Tue, Feb 7, 2012 at 7:57 PM, Simon Wunderlich <
simon.wunderlich@s2003.tu-chemnitz.de> wrote:

> On Tue, Feb 07, 2012 at 11:05:22AM -0800, Stephen Hemminger wrote:
> > On Tue,  7 Feb 2012 19:39:08 +0100
> > Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:
> >
> > I don't understand why this is better, we already have mark to do this.
> > Your method saves adding a tc filter to map mark to classid, but that i=
s
> hardly
> > a huge burden.
>
> Unfortunately, it is. We have previously built our trees by setting marks
> with iptables
> and matching the masks with tc and the u32 matcher, but we got a rather
> big performance
> impact as soon as the number of users grow. The target platform are WiFi
> access points.
> By using the proposed patch, the performance stays nearly constant at a
> growing number
> of users.
>

To put things in more general terms, the overspecialized behavior and use
of magic numbers in the wireless stack

... from net/wireless/util.c/*

Given a data frame determine the 802.1p/1d tag to use. */
unsigned int cfg80211_classify8021d(struct sk_buff *skb)
{
        unsigned int dscp;

        /* skb->priority values from 256->263 are magic values to
         * directly indicate a specific 802.1d priority.  This is used
         * to allow 802.1d priority to be passed directly in from VLAN
         * tags, etc.
         */
        if (skb->priority >=3D 256 && skb->priority <=3D 263)
                return skb->priority - 256;

        switch (skb->protocol) {
        case htons(ETH_P_IP):
                dscp =3D ipv4_get_dsfield(ip_hdr(skb)) & 0xfc;
                break;
        case htons(ETH_P_IPV6):
                dscp =3D ipv6_get_dsfield(ipv6_hdr(skb)) & 0xfc;
                break;
        default:
                return 0;
        }

        return dscp >> 5;
}


and the related select queue function calling this
which maps the 4 hardware queues wireless has, onto 4 mq subqdiscs...

overrides most anything you might want to do with iptables doing
classification directly, especially if you want to put things
into different hw queues.

and furthermore makes using other qdiscs than mq, such as the htb rate
limiter, problematic, as well as sch_mq has no ability to put in a top
level filter, anyway because stuff ends up in the 1:1,1:2,1:3,1:4
automagically...

Reducing the namespace to 'marks' vs classes is ok for some uses, I guess,
but how to handle the hw queues and the select_queue problem?

And while I'm at the problems that wireless has, having one (or 4) queues
per actual end-point rather than 4 hardware queues would make it possible
to address the bufferbloat problem with wireless and sanely calculating
aggregation values....

My thought on that was to be tossing a station identifier into the cb, but
that seems to alsobe a cross layer violation.

I like the idea of separating "priority" from "classification" and from
"hw queues" but a unifying metaphor and space in the skb seems
to be THE problem.




--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 19:57   ` Simon Wunderlich
@ 2012-02-07 21:45     ` Stephen Hemminger
  2012-02-08  8:54       ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2012-02-07 21:45 UTC (permalink / raw)
  To: Simon Wunderlich; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1136 bytes --]

On Tue, 7 Feb 2012 20:57:29 +0100
Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:

> On Tue, Feb 07, 2012 at 11:05:22AM -0800, Stephen Hemminger wrote:
> > On Tue,  7 Feb 2012 19:39:08 +0100
> > Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:
> > 
> > I don't understand why this is better, we already have mark to do this.
> > Your method saves adding a tc filter to map mark to classid, but that is hardly
> > a huge burden.
> 
> Unfortunately, it is. We have previously built our trees by setting marks with iptables
> and matching the masks with tc and the u32 matcher, but we got a rather big performance
> impact as soon as the number of users grow. The target platform are WiFi access points.
> By using the proposed patch, the performance stays nearly constant at a growing number
> of users.

The problem then is not the use of marks but having to add a filter rule for
each mark. We need something like u32 hash but for marks to build a table
of mark -> class id.  The current method uses a linear list of filter rules,
which is an obvious bottleneck with lots of classes.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 21:45     ` Stephen Hemminger
@ 2012-02-08  8:54       ` Florian Westphal
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2012-02-08  8:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Simon Wunderlich, netdev

Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Tue, Feb 07, 2012 at 11:05:22AM -0800, Stephen Hemminger wrote:
> > > On Tue,  7 Feb 2012 19:39:08 +0100
> > > Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de> wrote:
> > > 
> > > I don't understand why this is better, we already have mark to do this.
> > > Your method saves adding a tc filter to map mark to classid, but that is hardly
> > > a huge burden.
> > 
> > Unfortunately, it is. We have previously built our trees by setting marks with iptables
> > and matching the masks with tc and the u32 matcher, but we got a rather big performance
> > impact as soon as the number of users grow. The target platform are WiFi access points.
> > By using the proposed patch, the performance stays nearly constant at a growing number
> > of users.
> 
> The problem then is not the use of marks but having to add a filter rule for
> each mark. We need something like u32 hash but for marks to build a table
> of mark -> class id.  The current method uses a linear list of filter rules,
> which is an obvious bottleneck with lots of classes.

Maybe.  But marks might already be used e.g. for policy routing or for
additional metadata (such as encoding higher level data).

And even if you do your 'mark identifies the traffic class' very late in
your iptables ruleset -- as soon as you've got tunnels configured,
the encapsulated packets end up travellign through the ruleset again
with a "classid mark" set -- which might interfere with the ruleset.

IOW, using -j CLASSIFY does have advantages regardless of the "tc mark filter"
performance issue Simon has.

There is also the SO_PRIORITY setsockopt which userspace might be using to pick a
particular traffic class...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-07 20:16   ` Simon Wunderlich
  2012-02-07 20:33     ` Dave Taht
@ 2012-02-08  9:21     ` Eric Dumazet
  2012-02-08 14:48       ` jamal
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-02-08  9:21 UTC (permalink / raw)
  To: Simon Wunderlich
  Cc: David Miller, netdev, Pablo Neira Ayuso, Patrick McHardy,
	Jamal Hadi Salim, Johannes Berg, John W. Linville, Marek Lindner,
	Sven Eckelmann, netfilter-devel, netfilter, coreteam,
	linux-wireless

Le mardi 07 février 2012 à 21:16 +0100, Simon Wunderlich a écrit :
> Hello David,
> 
> On Tue, Feb 07, 2012 at 01:58:41PM -0500, David Miller wrote:
> > From: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
> > Date: Tue,  7 Feb 2012 19:39:08 +0100
> > 
> > > The linux traffic control mechanism has different ways to select the
> > > correct class of a qdisc. A common way to do this is to use tc filters
> > > that are directly attached to a qdisc. Another approach is to use the
> > > iptables classify module. The latter one can reduce the amount of work
> > > necessary to process a packet when iptables is already involved in the
> > > packet classification.
> > 
> > Do not bloat up sk_buff any more.  Add this, and the other existing
> > tc_* members to the qdisc SKB control block instead.
> > 
> 
> Thanks for your feedback!
> 
> I guess you mean skb->cb, but this is also used within mac80211 for various things
> (quoting include/net/mac80211.h):
> 
>  * struct ieee80211_tx_info - skb transmit information
>  *
>  * This structure is placed in skb->cb for three uses:
>  *  (1) mac80211 TX control - mac80211 tells the driver what to do
>  *  (2) driver internal use (if applicable)
>  *  (3) TX status information - driver tells mac80211 what happened
> 
> We could give it a try, but we most probably run into conflicts again.
> 
> I've messed up the CCs in my initial mail (and just resent it) - sorry about that.
> Maybe the mac80211 guys have a suggestion as well :)
> 

I really dont understand what the conflict might be.

As long as skb is in Qdisc layer, Qdisc owns skb->cb[] (a part of it
actually, see current discussions about IB on netdev)

As soon as packet is given to device (mac80211), cb[] can be reused.

Or are you saying tc_class also might be needed by mac80211 in the
future ?



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] skbuff: Add new tc classify variable
  2012-02-08  9:21     ` Eric Dumazet
@ 2012-02-08 14:48       ` jamal
  0 siblings, 0 replies; 10+ messages in thread
From: jamal @ 2012-02-08 14:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Wunderlich, David Miller, netdev, Pablo Neira Ayuso,
	Patrick McHardy, Johannes Berg, John W. Linville, Marek Lindner,
	Sven Eckelmann, netfilter-devel, netfilter, coreteam,
	linux-wireless

skb->cb is the right scratch pad to use.
The problem appears to be the wireless stack abusing some of these
fields (prio for one). And we do have marks in place already which
are 32 bit and global.

 
cheers,
jamal

On Wed, 2012-02-08 at 10:21 +0100, Eric Dumazet wrote:
> Le mardi 07 février 2012 à 21:16 +0100, Simon Wunderlich a écrit :

> 
> I really dont understand what the conflict might be.
> 
> As long as skb is in Qdisc layer, Qdisc owns skb->cb[] (a part of it
> actually, see current discussions about IB on netdev)
> 
> As soon as packet is given to device (mac80211), cb[] can be reused.
> 
> Or are you saying tc_class also might be needed by mac80211 in the
> future ?
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-02-08 14:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-07 18:39 [PATCH] skbuff: Add new tc classify variable Simon Wunderlich
2012-02-07 18:58 ` David Miller
2012-02-07 20:16   ` Simon Wunderlich
2012-02-07 20:33     ` Dave Taht
2012-02-08  9:21     ` Eric Dumazet
2012-02-08 14:48       ` jamal
2012-02-07 19:05 ` Stephen Hemminger
2012-02-07 19:57   ` Simon Wunderlich
2012-02-07 21:45     ` Stephen Hemminger
2012-02-08  8:54       ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).