All of lore.kernel.org
 help / color / mirror / Atom feed
* net_sched 00/07: classful multiqueue dummy scheduler
@ 2009-09-04 16:41 Patrick McHardy
  2009-09-04 16:41 ` net_sched 01/07: fix class grafting errno codes Patrick McHardy
                   ` (8 more replies)
  0 siblings, 9 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

These patches contain a classful multiqueue ("mq") dummy scheduler to fix a
couple of problems with the current multiqueue TC API integration. The
changelogs of patch 05 and 07 contain more details.

The mq scheduler does two things:

- present device TX queues as classes, allowing to attach different qdiscs
  to them, which are grafted to the TX queues

- present accumulated statistics of all device queue root qdiscs

Its used by default for multiqueue devices instead of the regular pfifo_fast
qdisc, but can also be attached manually to restore multiqueue behaviour
after attaching a non-multiqueue (shared) qdisc.

Patches 1-4 contain some preparatory cleanups because I was getting tired
of copying unnecessary checks and dummy functions :)

Patch 5 introduces a dev->qdisc pointer, which points to the root qdisc from
userspace's point of view. This is later used for the mq qdisc, which isn't
actually attached to any device queues. Patch 7 contains the mq scheduler.

I've tested the scheduler with a hacked macvlan version which uses 4 queues,
but since I don't own a multiqueue capable device I couldn't test this on
real hardware.

Any comments and test results welcome :)


 include/linux/netdevice.h |    3 +
 include/net/sch_generic.h |    6 +
 net/core/rtnetlink.c      |    6 +-
 net/sched/Makefile        |    2 +-
 net/sched/cls_api.c       |   10 +-
 net/sched/sch_api.c       |   99 ++++++++-----------
 net/sched/sch_cbq.c       |   38 ++++----
 net/sched/sch_generic.c   |   58 +++++++++--
 net/sched/sch_hfsc.c      |    4 +-
 net/sched/sch_htb.c       |   35 ++++----
 net/sched/sch_ingress.c   |   14 ---
 net/sched/sch_mq.c        |  234 +++++++++++++++++++++++++++++++++++++++++++++
 net/sched/sch_multiq.c    |   33 +------
 net/sched/sch_prio.c      |   32 +------
 net/sched/sch_red.c       |   21 ----
 net/sched/sch_sfq.c       |    7 --
 net/sched/sch_tbf.c       |   22 ----
 17 files changed, 375 insertions(+), 249 deletions(-)
 create mode 100644 net/sched/sch_mq.c

Patrick McHardy (7):
      net_sched: fix class grafting errno codes
      net_sched: make cls_ops->tcf_chain() optional
      net_sched: make cls_ops->change and cls_ops->delete optional
      net_sched: remove some unnecessary checks in classful schedulers
      net_sched: reintroduce dev->qdisc for use by sch_api
      net_sched: move dev_graft_qdisc() to sch_generic.c
      net_sched: add classful multiqueue dummy scheduler

^ permalink raw reply	[flat|nested] 46+ messages in thread

* net_sched 01/07: fix class grafting errno codes
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-04 16:41 ` net_sched 02/07: make cls_ops->tcf_chain() optional Patrick McHardy
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit 1cf555183f0ae7e256381ea8993272c0a321f5b5
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 14:28:10 2009 +0200

    net_sched: fix class grafting errno codes
    
    If the parent qdisc doesn't support classes, use EOPNOTSUPP.
    If the parent class doesn't exist, use ENOENT. Currently EINVAL
    is returned in both cases.
    
    Additionally check whether grafting is supported and remove a now
    unnecessary graft function from sch_ingress.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 24d17ce..bef2d64 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -728,14 +728,14 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 	} else {
 		const struct Qdisc_class_ops *cops = parent->ops->cl_ops;
 
-		err = -EINVAL;
-
-		if (cops) {
+		err = -EOPNOTSUPP;
+		if (cops && cops->graft) {
 			unsigned long cl = cops->get(parent, classid);
 			if (cl) {
 				err = cops->graft(parent, cl, new, &old);
 				cops->put(parent, cl);
-			}
+			} else
+				err = -ENOENT;
 		}
 		if (!err)
 			notify_and_destroy(skb, n, classid, old, new);
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 4a2b773..ace7902 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -22,12 +22,6 @@ struct ingress_qdisc_data {
 
 /* ------------------------- Class/flow operations ------------------------- */
 
-static int ingress_graft(struct Qdisc *sch, unsigned long arg,
-			 struct Qdisc *new, struct Qdisc **old)
-{
-	return -EOPNOTSUPP;
-}
-
 static struct Qdisc *ingress_leaf(struct Qdisc *sch, unsigned long arg)
 {
 	return NULL;
@@ -123,7 +117,6 @@ nla_put_failure:
 }
 
 static const struct Qdisc_class_ops ingress_class_ops = {
-	.graft		=	ingress_graft,
 	.leaf		=	ingress_leaf,
 	.get		=	ingress_get,
 	.put		=	ingress_put,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
  2009-09-04 16:41 ` net_sched 01/07: fix class grafting errno codes Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-05  8:13   ` Jarek Poplawski
  2009-09-04 16:41 ` net_sched 03/07: make cls_ops->change and cls_ops->delete optional Patrick McHardy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit 6ea4233ef8f398289a14a3305d4ed440fb026d43
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 14:28:11 2009 +0200

    net_sched: make cls_ops->tcf_chain() optional
    
    Some qdiscs don't support attaching filters. Handle this centrally in
    cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 09cdcdf..eaa8f43 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -181,6 +181,9 @@ replay:
 	if ((cops = q->ops->cl_ops) == NULL)
 		return -EINVAL;
 
+	if (cops->tcf_chain == NULL)
+		return -EOPNOTSUPP;
+
 	/* Do we search for filter, attached to class? */
 	if (TC_H_MIN(parent)) {
 		cl = cops->get(q, parent);
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 2bdf241..c27b802 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -331,11 +331,6 @@ static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 	}
 }
 
-static struct tcf_proto **red_find_tcf(struct Qdisc *sch, unsigned long cl)
-{
-	return NULL;
-}
-
 static const struct Qdisc_class_ops red_class_ops = {
 	.graft		=	red_graft,
 	.leaf		=	red_leaf,
@@ -344,7 +339,6 @@ static const struct Qdisc_class_ops red_class_ops = {
 	.change		=	red_change_class,
 	.delete		=	red_delete,
 	.walk		=	red_walk,
-	.tcf_chain	=	red_find_tcf,
 	.dump		=	red_dump_class,
 };
 
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index e22dfe8..2890969 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -433,11 +433,6 @@ static void tbf_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 	}
 }
 
-static struct tcf_proto **tbf_find_tcf(struct Qdisc *sch, unsigned long cl)
-{
-	return NULL;
-}
-
 static const struct Qdisc_class_ops tbf_class_ops =
 {
 	.graft		=	tbf_graft,
@@ -447,7 +442,6 @@ static const struct Qdisc_class_ops tbf_class_ops =
 	.change		=	tbf_change_class,
 	.delete		=	tbf_delete,
 	.walk		=	tbf_walk,
-	.tcf_chain	=	tbf_find_tcf,
 	.dump		=	tbf_dump_class,
 };
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 03/07: make cls_ops->change and cls_ops->delete optional
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
  2009-09-04 16:41 ` net_sched 01/07: fix class grafting errno codes Patrick McHardy
  2009-09-04 16:41 ` net_sched 02/07: make cls_ops->tcf_chain() optional Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-04 16:41 ` net_sched 04/07: remove some unnecessary checks in classful schedulers Patrick McHardy
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit c357570bb4fdd3c608dce92174acccb9b5b8163b
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 15:14:58 2009 +0200

    net_sched: make cls_ops->change and cls_ops->delete optional
    
    Some schedulers don't support creating, changing or deleting classes.
    Make the respective callbacks optionally and consistently return
    -EOPNOTSUPP for unsupported operations, instead of currently either
    -EOPNOTSUPP, -ENOSYS or no error.
    
    In case of sch_prio and sch_multiq, the removed operations additionally
    checked for an invalid class. This is not necessary since the class
    argument can only orginate from ->get() or in case of ->change is 0
    for creation of new classes, in which case ->change() incorrectly
    returned -ENOENT.
    
    As a side-effect, this patch fixes a possible (root-only) NULL pointer
    function call in sch_ingress, which didn't implement a so far mandatory
    ->delete() operation.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index bef2d64..166fcca 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1417,7 +1417,9 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 				goto out;
 			break;
 		case RTM_DELTCLASS:
-			err = cops->delete(q, cl);
+			err = -EOPNOTSUPP;
+			if (cops->delete)
+				err = cops->delete(q, cl);
 			if (err == 0)
 				tclass_notify(skb, n, q, cl, RTM_DELTCLASS);
 			goto out;
@@ -1431,7 +1433,9 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	}
 
 	new_cl = cl;
-	err = cops->change(q, clid, pid, tca, &new_cl);
+	err = -EOPNOTSUPP;
+	if (cops->change)
+		err = cops->change(q, clid, pid, tca, &new_cl);
 	if (err == 0)
 		tclass_notify(skb, n, q, new_cl, RTM_NEWTCLASS);
 
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index ace7902..a9e646b 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -42,12 +42,6 @@ static void ingress_put(struct Qdisc *sch, unsigned long cl)
 {
 }
 
-static int ingress_change(struct Qdisc *sch, u32 classid, u32 parent,
-			  struct nlattr **tca, unsigned long *arg)
-{
-	return 0;
-}
-
 static void ingress_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	return;
@@ -120,7 +114,6 @@ static const struct Qdisc_class_ops ingress_class_ops = {
 	.leaf		=	ingress_leaf,
 	.get		=	ingress_get,
 	.put		=	ingress_put,
-	.change		=	ingress_change,
 	.walk		=	ingress_walk,
 	.tcf_chain	=	ingress_find_tcf,
 	.bind_tcf	=	ingress_bind_filter,
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9127312..a0ffe71 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -348,26 +348,6 @@ static void multiq_put(struct Qdisc *q, unsigned long cl)
 	return;
 }
 
-static int multiq_change(struct Qdisc *sch, u32 handle, u32 parent,
-			 struct nlattr **tca, unsigned long *arg)
-{
-	unsigned long cl = *arg;
-	struct multiq_sched_data *q = qdisc_priv(sch);
-
-	if (cl - 1 > q->bands)
-		return -ENOENT;
-	return 0;
-}
-
-static int multiq_delete(struct Qdisc *sch, unsigned long cl)
-{
-	struct multiq_sched_data *q = qdisc_priv(sch);
-	if (cl - 1 > q->bands)
-		return -ENOENT;
-	return 0;
-}
-
-
 static int multiq_dump_class(struct Qdisc *sch, unsigned long cl,
 			     struct sk_buff *skb, struct tcmsg *tcm)
 {
@@ -430,8 +410,6 @@ static const struct Qdisc_class_ops multiq_class_ops = {
 	.leaf		=	multiq_leaf,
 	.get		=	multiq_get,
 	.put		=	multiq_put,
-	.change		=	multiq_change,
-	.delete		=	multiq_delete,
 	.walk		=	multiq_walk,
 	.tcf_chain	=	multiq_find_tcf,
 	.bind_tcf	=	multiq_bind,
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 94cecef..209a4ca 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -311,25 +311,6 @@ static void prio_put(struct Qdisc *q, unsigned long cl)
 	return;
 }
 
-static int prio_change(struct Qdisc *sch, u32 handle, u32 parent, struct nlattr **tca, unsigned long *arg)
-{
-	unsigned long cl = *arg;
-	struct prio_sched_data *q = qdisc_priv(sch);
-
-	if (cl - 1 > q->bands)
-		return -ENOENT;
-	return 0;
-}
-
-static int prio_delete(struct Qdisc *sch, unsigned long cl)
-{
-	struct prio_sched_data *q = qdisc_priv(sch);
-	if (cl - 1 > q->bands)
-		return -ENOENT;
-	return 0;
-}
-
-
 static int prio_dump_class(struct Qdisc *sch, unsigned long cl, struct sk_buff *skb,
 			   struct tcmsg *tcm)
 {
@@ -392,8 +373,6 @@ static const struct Qdisc_class_ops prio_class_ops = {
 	.leaf		=	prio_leaf,
 	.get		=	prio_get,
 	.put		=	prio_put,
-	.change		=	prio_change,
-	.delete		=	prio_delete,
 	.walk		=	prio_walk,
 	.tcf_chain	=	prio_find_tcf,
 	.bind_tcf	=	prio_bind,
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index c27b802..a2c4d1a 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -308,17 +308,6 @@ static void red_put(struct Qdisc *sch, unsigned long arg)
 	return;
 }
 
-static int red_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
-			    struct nlattr **tca, unsigned long *arg)
-{
-	return -ENOSYS;
-}
-
-static int red_delete(struct Qdisc *sch, unsigned long cl)
-{
-	return -ENOSYS;
-}
-
 static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	if (!walker->stop) {
@@ -336,8 +325,6 @@ static const struct Qdisc_class_ops red_class_ops = {
 	.leaf		=	red_leaf,
 	.get		=	red_get,
 	.put		=	red_put,
-	.change		=	red_change_class,
-	.delete		=	red_delete,
 	.walk		=	red_walk,
 	.dump		=	red_dump_class,
 };
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 8706920..cb21380 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -496,12 +496,6 @@ nla_put_failure:
 	return -1;
 }
 
-static int sfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
-			    struct nlattr **tca, unsigned long *arg)
-{
-	return -EOPNOTSUPP;
-}
-
 static unsigned long sfq_get(struct Qdisc *sch, u32 classid)
 {
 	return 0;
@@ -560,7 +554,6 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 
 static const struct Qdisc_class_ops sfq_class_ops = {
 	.get		=	sfq_get,
-	.change		=	sfq_change_class,
 	.tcf_chain	=	sfq_find_tcf,
 	.dump		=	sfq_dump_class,
 	.dump_stats	=	sfq_dump_class_stats,
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 2890969..d904167 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -410,17 +410,6 @@ static void tbf_put(struct Qdisc *sch, unsigned long arg)
 {
 }
 
-static int tbf_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
-			    struct nlattr **tca, unsigned long *arg)
-{
-	return -ENOSYS;
-}
-
-static int tbf_delete(struct Qdisc *sch, unsigned long arg)
-{
-	return -ENOSYS;
-}
-
 static void tbf_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	if (!walker->stop) {
@@ -439,8 +428,6 @@ static const struct Qdisc_class_ops tbf_class_ops =
 	.leaf		=	tbf_leaf,
 	.get		=	tbf_get,
 	.put		=	tbf_put,
-	.change		=	tbf_change_class,
-	.delete		=	tbf_delete,
 	.walk		=	tbf_walk,
 	.dump		=	tbf_dump_class,
 };

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 04/07: remove some unnecessary checks in classful schedulers
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (2 preceding siblings ...)
  2009-09-04 16:41 ` net_sched 03/07: make cls_ops->change and cls_ops->delete optional Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-04 16:41 ` net_sched 05/07: reintroduce dev->qdisc for use by sch_api Patrick McHardy
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit eca9dbb05c5bf47f4bc3162b4f19dcb4dd85acfc
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 16:12:45 2009 +0200

    net_sched: remove some unnecessary checks in classful schedulers
    
    The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
    originate from either ->get() or ->walk() and are always valid.
    
    Remove unnecessary checks.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index d5798e1..5b132c4 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1621,29 +1621,25 @@ static int cbq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct cbq_class *cl = (struct cbq_class*)arg;
 
-	if (cl) {
-		if (new == NULL) {
-			new = qdisc_create_dflt(qdisc_dev(sch), sch->dev_queue,
-						&pfifo_qdisc_ops,
-						cl->common.classid);
-			if (new == NULL)
-				return -ENOBUFS;
-		} else {
+	if (new == NULL) {
+		new = qdisc_create_dflt(qdisc_dev(sch), sch->dev_queue,
+					&pfifo_qdisc_ops, cl->common.classid);
+		if (new == NULL)
+			return -ENOBUFS;
+	} else {
 #ifdef CONFIG_NET_CLS_ACT
-			if (cl->police == TC_POLICE_RECLASSIFY)
-				new->reshape_fail = cbq_reshape_fail;
+		if (cl->police == TC_POLICE_RECLASSIFY)
+			new->reshape_fail = cbq_reshape_fail;
 #endif
-		}
-		sch_tree_lock(sch);
-		*old = cl->q;
-		cl->q = new;
-		qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-		qdisc_reset(*old);
-		sch_tree_unlock(sch);
-
-		return 0;
 	}
-	return -ENOENT;
+	sch_tree_lock(sch);
+	*old = cl->q;
+	cl->q = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+
+	return 0;
 }
 
 static struct Qdisc *
@@ -1651,7 +1647,7 @@ cbq_leaf(struct Qdisc *sch, unsigned long arg)
 {
 	struct cbq_class *cl = (struct cbq_class*)arg;
 
-	return cl ? cl->q : NULL;
+	return cl->q;
 }
 
 static void cbq_qlen_notify(struct Qdisc *sch, unsigned long arg)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index dad0144..375d64c 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1203,8 +1203,6 @@ hfsc_graft_class(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct hfsc_class *cl = (struct hfsc_class *)arg;
 
-	if (cl == NULL)
-		return -ENOENT;
 	if (cl->level > 0)
 		return -EINVAL;
 	if (new == NULL) {
@@ -1228,7 +1226,7 @@ hfsc_class_leaf(struct Qdisc *sch, unsigned long arg)
 {
 	struct hfsc_class *cl = (struct hfsc_class *)arg;
 
-	if (cl != NULL && cl->level == 0)
+	if (cl->level == 0)
 		return cl->qdisc;
 
 	return NULL;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index ec4d463..85acab9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1117,30 +1117,29 @@ static int htb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct htb_class *cl = (struct htb_class *)arg;
 
-	if (cl && !cl->level) {
-		if (new == NULL &&
-		    (new = qdisc_create_dflt(qdisc_dev(sch), sch->dev_queue,
-					     &pfifo_qdisc_ops,
-					     cl->common.classid))
-		    == NULL)
-			return -ENOBUFS;
-		sch_tree_lock(sch);
-		*old = cl->un.leaf.q;
-		cl->un.leaf.q = new;
-		if (*old != NULL) {
-			qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-			qdisc_reset(*old);
-		}
-		sch_tree_unlock(sch);
-		return 0;
+	if (cl->level)
+		return -EINVAL;
+	if (new == NULL &&
+	    (new = qdisc_create_dflt(qdisc_dev(sch), sch->dev_queue,
+				     &pfifo_qdisc_ops,
+				     cl->common.classid)) == NULL)
+		return -ENOBUFS;
+
+	sch_tree_lock(sch);
+	*old = cl->un.leaf.q;
+	cl->un.leaf.q = new;
+	if (*old != NULL) {
+		qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+		qdisc_reset(*old);
 	}
-	return -ENOENT;
+	sch_tree_unlock(sch);
+	return 0;
 }
 
 static struct Qdisc *htb_leaf(struct Qdisc *sch, unsigned long arg)
 {
 	struct htb_class *cl = (struct htb_class *)arg;
-	return (cl && !cl->level) ? cl->un.leaf.q : NULL;
+	return !cl->level ? cl->un.leaf.q : NULL;
 }
 
 static void htb_qlen_notify(struct Qdisc *sch, unsigned long arg)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index a0ffe71..069f81c 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -298,9 +298,6 @@ static int multiq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 	struct multiq_sched_data *q = qdisc_priv(sch);
 	unsigned long band = arg - 1;
 
-	if (band >= q->bands)
-		return -EINVAL;
-
 	if (new == NULL)
 		new = &noop_qdisc;
 
@@ -320,9 +317,6 @@ multiq_leaf(struct Qdisc *sch, unsigned long arg)
 	struct multiq_sched_data *q = qdisc_priv(sch);
 	unsigned long band = arg - 1;
 
-	if (band >= q->bands)
-		return NULL;
-
 	return q->queues[band];
 }
 
@@ -353,11 +347,8 @@ static int multiq_dump_class(struct Qdisc *sch, unsigned long cl,
 {
 	struct multiq_sched_data *q = qdisc_priv(sch);
 
-	if (cl - 1 > q->bands)
-		return -ENOENT;
 	tcm->tcm_handle |= TC_H_MIN(cl);
-	if (q->queues[cl-1])
-		tcm->tcm_info = q->queues[cl-1]->handle;
+	tcm->tcm_info = q->queues[cl-1]->handle;
 	return 0;
 }
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 209a4ca..0f73c41 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -262,9 +262,6 @@ static int prio_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 	struct prio_sched_data *q = qdisc_priv(sch);
 	unsigned long band = arg - 1;
 
-	if (band >= q->bands)
-		return -EINVAL;
-
 	if (new == NULL)
 		new = &noop_qdisc;
 
@@ -284,9 +281,6 @@ prio_leaf(struct Qdisc *sch, unsigned long arg)
 	struct prio_sched_data *q = qdisc_priv(sch);
 	unsigned long band = arg - 1;
 
-	if (band >= q->bands)
-		return NULL;
-
 	return q->queues[band];
 }
 
@@ -316,11 +310,8 @@ static int prio_dump_class(struct Qdisc *sch, unsigned long cl, struct sk_buff *
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 
-	if (cl - 1 > q->bands)
-		return -ENOENT;
 	tcm->tcm_handle |= TC_H_MIN(cl);
-	if (q->queues[cl-1])
-		tcm->tcm_info = q->queues[cl-1]->handle;
+	tcm->tcm_info = q->queues[cl-1]->handle;
 	return 0;
 }
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index a2c4d1a..072cdf4 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -268,8 +268,6 @@ static int red_dump_class(struct Qdisc *sch, unsigned long cl,
 {
 	struct red_sched_data *q = qdisc_priv(sch);
 
-	if (cl != 1)
-		return -ENOENT;
 	tcm->tcm_handle |= TC_H_MIN(1);
 	tcm->tcm_info = q->qdisc->handle;
 	return 0;
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index d904167..8fb8107 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -368,9 +368,6 @@ static int tbf_dump_class(struct Qdisc *sch, unsigned long cl,
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
-	if (cl != 1) 	/* only one class */
-		return -ENOENT;
-
 	tcm->tcm_handle |= TC_H_MIN(1);
 	tcm->tcm_info = q->qdisc->handle;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 05/07: reintroduce dev->qdisc for use by sch_api
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (3 preceding siblings ...)
  2009-09-04 16:41 ` net_sched 04/07: remove some unnecessary checks in classful schedulers Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-06 18:57   ` Jarek Poplawski
  2009-09-04 16:41 ` net_sched 06/07: move dev_graft_qdisc() to sch_generic.c Patrick McHardy
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit 57a016350a3d85dc351ab90ce91e4dc49ce2183a
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 16:12:45 2009 +0200

    net_sched: reintroduce dev->qdisc for use by sch_api
    
    Currently the multiqueue integration with the qdisc API suffers from
    a few problems:
    
    - with multiple queues, all root qdiscs use the same handle. This means
      they can't be exposed to userspace in a backwards compatible fashion.
    
    - all API operations always refer to queue number 0. Newly created
      qdiscs are automatically shared between all queues, its not possible
      to address individual queues or restore multiqueue behaviour once a
      shared qdisc has been attached.
    
    - Dumps only contain the root qdisc of queue 0, in case of non-shared
      qdiscs this means the statistics are incomplete.
    
    This patch reintroduces dev->qdisc, which points to the (single) root qdisc
    from userspace's point of view. Currently it either points to the first
    (non-shared) default qdisc, or a qdisc shared between all queues. The
    following patches will introduce a classful dummy qdisc, which will be used
    as root qdisc and contain the per-queue qdiscs as children.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 121cbad..a44118b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -832,6 +832,9 @@ struct net_device
 	/* Number of TX queues currently active in device  */
 	unsigned int		real_num_tx_queues;
 
+	/* root qdisc from userspace point of view */
+	struct Qdisc		*qdisc;
+
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 	spinlock_t		tx_global_lock;
 /*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bbcba2a..eb42873 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -606,7 +606,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags)
 {
-	struct netdev_queue *txq;
 	struct ifinfomsg *ifm;
 	struct nlmsghdr *nlh;
 	const struct net_device_stats *stats;
@@ -637,9 +636,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (dev->master)
 		NLA_PUT_U32(skb, IFLA_MASTER, dev->master->ifindex);
 
-	txq = netdev_get_tx_queue(dev, 0);
-	if (txq->qdisc_sleeping)
-		NLA_PUT_STRING(skb, IFLA_QDISC, txq->qdisc_sleeping->ops->id);
+	if (dev->qdisc)
+		NLA_PUT_STRING(skb, IFLA_QDISC, dev->qdisc->ops->id);
 
 	if (dev->ifalias)
 		NLA_PUT_STRING(skb, IFLA_IFALIAS, dev->ifalias);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index eaa8f43..8cbc66f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -168,8 +168,7 @@ replay:
 
 	/* Find qdisc */
 	if (!parent) {
-		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, 0);
-		q = dev_queue->qdisc_sleeping;
+		q = dev->qdisc;
 		parent = q->handle;
 	} else {
 		q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent));
@@ -408,7 +407,6 @@ static int tcf_node_dump(struct tcf_proto *tp, unsigned long n,
 static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
-	struct netdev_queue *dev_queue;
 	int t;
 	int s_t;
 	struct net_device *dev;
@@ -427,9 +425,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	if ((dev = dev_get_by_index(&init_net, tcm->tcm_ifindex)) == NULL)
 		return skb->len;
 
-	dev_queue = netdev_get_tx_queue(dev, 0);
 	if (!tcm->tcm_parent)
-		q = dev_queue->qdisc_sleeping;
+		q = dev->qdisc;
 	else
 		q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent));
 	if (!q)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 166fcca..8aa9a0c 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -207,7 +207,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 static void qdisc_list_add(struct Qdisc *q)
 {
 	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS))
-		list_add_tail(&q->list, &qdisc_root_sleeping(q)->list);
+		list_add_tail(&q->list, &qdisc_dev(q)->qdisc->list);
 }
 
 void qdisc_list_del(struct Qdisc *q)
@@ -219,17 +219,11 @@ EXPORT_SYMBOL(qdisc_list_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
-	unsigned int i;
 	struct Qdisc *q;
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		struct Qdisc *txq_root = txq->qdisc_sleeping;
-
-		q = qdisc_match_from_root(txq_root, handle);
-		if (q)
-			goto out;
-	}
+	q = qdisc_match_from_root(dev->qdisc, handle);
+	if (q)
+		goto out;
 
 	q = qdisc_match_from_root(dev->rx_queue.qdisc_sleeping, handle);
 out:
@@ -720,9 +714,14 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 			if (new && i > 0)
 				atomic_inc(&new->refcnt);
 
-			notify_and_destroy(skb, n, classid, old, new);
+			qdisc_destroy(old);
 		}
 
+		notify_and_destroy(skb, n, classid, dev->qdisc, new);
+		if (new)
+			atomic_inc(&new->refcnt);
+		dev->qdisc = new ? : &noop_qdisc;
+
 		if (dev->flags & IFF_UP)
 			dev_activate(dev);
 	} else {
@@ -974,9 +973,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 				q = dev->rx_queue.qdisc_sleeping;
 			}
 		} else {
-			struct netdev_queue *dev_queue;
-			dev_queue = netdev_get_tx_queue(dev, 0);
-			q = dev_queue->qdisc_sleeping;
+			q = dev->qdisc;
 		}
 		if (!q)
 			return -ENOENT;
@@ -1044,9 +1041,7 @@ replay:
 				q = dev->rx_queue.qdisc_sleeping;
 			}
 		} else {
-			struct netdev_queue *dev_queue;
-			dev_queue = netdev_get_tx_queue(dev, 0);
-			q = dev_queue->qdisc_sleeping;
+			q = dev->qdisc;
 		}
 
 		/* It may be default qdisc, ignore it */
@@ -1291,8 +1286,7 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
 			s_q_idx = 0;
 		q_idx = 0;
 
-		dev_queue = netdev_get_tx_queue(dev, 0);
-		if (tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb, &q_idx, s_q_idx) < 0)
+		if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
 			goto done;
 
 		dev_queue = &dev->rx_queue;
@@ -1323,7 +1317,6 @@ done:
 static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
-	struct netdev_queue *dev_queue;
 	struct tcmsg *tcm = NLMSG_DATA(n);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct net_device *dev;
@@ -1361,7 +1354,6 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 
 	/* Step 1. Determine qdisc handle X:0 */
 
-	dev_queue = netdev_get_tx_queue(dev, 0);
 	if (pid != TC_H_ROOT) {
 		u32 qid1 = TC_H_MAJ(pid);
 
@@ -1372,7 +1364,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 		} else if (qid1) {
 			qid = qid1;
 		} else if (qid == 0)
-			qid = dev_queue->qdisc_sleeping->handle;
+			qid = dev->qdisc->handle;
 
 		/* Now qid is genuine qdisc handle consistent
 		   both with parent and child.
@@ -1383,7 +1375,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 			pid = TC_H_MAKE(qid, pid);
 	} else {
 		if (qid == 0)
-			qid = dev_queue->qdisc_sleeping->handle;
+			qid = dev->qdisc->handle;
 	}
 
 	/* OK. Locate qdisc */
@@ -1588,8 +1580,7 @@ static int tc_dump_tclass(struct sk_buff *skb, struct netlink_callback *cb)
 	s_t = cb->args[0];
 	t = 0;
 
-	dev_queue = netdev_get_tx_queue(dev, 0);
-	if (tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb, &t, s_t) < 0)
+	if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
 		goto done;
 
 	dev_queue = &dev->rx_queue;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6128e6f..a91f079 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -623,19 +623,6 @@ void qdisc_destroy(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_destroy);
 
-static bool dev_all_qdisc_sleeping_noop(struct net_device *dev)
-{
-	unsigned int i;
-
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-
-		if (txq->qdisc_sleeping != &noop_qdisc)
-			return false;
-	}
-	return true;
-}
-
 static void attach_one_default_qdisc(struct net_device *dev,
 				     struct netdev_queue *dev_queue,
 				     void *_unused)
@@ -677,6 +664,7 @@ static void transition_one_qdisc(struct net_device *dev,
 
 void dev_activate(struct net_device *dev)
 {
+	struct netdev_queue *txq;
 	int need_watchdog;
 
 	/* No queueing discipline is attached to device;
@@ -685,9 +673,14 @@ void dev_activate(struct net_device *dev)
 	   virtual interfaces
 	 */
 
-	if (dev_all_qdisc_sleeping_noop(dev))
+	if (dev->qdisc == &noop_qdisc) {
 		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
 
+		txq = netdev_get_tx_queue(dev, 0);
+		dev->qdisc = txq->qdisc_sleeping;
+		atomic_inc(&dev->qdisc->refcnt);
+	}
+
 	if (!netif_carrier_ok(dev))
 		/* Delay activation until next carrier-on event */
 		return;
@@ -777,6 +770,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 
 void dev_init_scheduler(struct net_device *dev)
 {
+	dev->qdisc = &noop_qdisc;
 	netdev_for_each_tx_queue(dev, dev_init_scheduler_queue, &noop_qdisc);
 	dev_init_scheduler_queue(dev, &dev->rx_queue, &noop_qdisc);
 
@@ -802,5 +796,8 @@ void dev_shutdown(struct net_device *dev)
 {
 	netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
 	shutdown_scheduler_queue(dev, &dev->rx_queue, &noop_qdisc);
+	qdisc_destroy(dev->qdisc);
+	dev->qdisc = &noop_qdisc;
+
 	WARN_ON(timer_pending(&dev->watchdog_timer));
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 06/07: move dev_graft_qdisc() to sch_generic.c
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (4 preceding siblings ...)
  2009-09-04 16:41 ` net_sched 05/07: reintroduce dev->qdisc for use by sch_api Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-04 16:41 ` net_sched 07/07: add classful multiqueue dummy scheduler Patrick McHardy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit 7d0411697d850bcabf79bdc5bce9bf140fb317ef
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 16:12:45 2009 +0200

    net_sched: move dev_graft_qdisc() to sch_generic.c
    
    It will be used in a following patch by the multiqueue qdisc.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a48a4cc..a92dc62 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -302,6 +302,8 @@ extern void dev_init_scheduler(struct net_device *dev);
 extern void dev_shutdown(struct net_device *dev);
 extern void dev_activate(struct net_device *dev);
 extern void dev_deactivate(struct net_device *dev);
+extern struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
+				     struct Qdisc *qdisc);
 extern void qdisc_reset(struct Qdisc *qdisc);
 extern void qdisc_destroy(struct Qdisc *qdisc);
 extern void qdisc_tree_decrease_qlen(struct Qdisc *qdisc, unsigned int n);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 8aa9a0c..d71f12b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -610,32 +610,6 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
 	return i>0 ? autohandle : 0;
 }
 
-/* Attach toplevel qdisc to device queue. */
-
-static struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
-				     struct Qdisc *qdisc)
-{
-	struct Qdisc *oqdisc = dev_queue->qdisc_sleeping;
-	spinlock_t *root_lock;
-
-	root_lock = qdisc_lock(oqdisc);
-	spin_lock_bh(root_lock);
-
-	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
-	/* ... and graft new one */
-	if (qdisc == NULL)
-		qdisc = &noop_qdisc;
-	dev_queue->qdisc_sleeping = qdisc;
-	rcu_assign_pointer(dev_queue->qdisc, &noop_qdisc);
-
-	spin_unlock_bh(root_lock);
-
-	return oqdisc;
-}
-
 void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
 {
 	const struct Qdisc_class_ops *cops;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a91f079..e7c47ce 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -623,6 +623,31 @@ void qdisc_destroy(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_destroy);
 
+/* Attach toplevel qdisc to device queue. */
+struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
+			      struct Qdisc *qdisc)
+{
+	struct Qdisc *oqdisc = dev_queue->qdisc_sleeping;
+	spinlock_t *root_lock;
+
+	root_lock = qdisc_lock(oqdisc);
+	spin_lock_bh(root_lock);
+
+	/* Prune old scheduler */
+	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
+		qdisc_reset(oqdisc);
+
+	/* ... and graft new one */
+	if (qdisc == NULL)
+		qdisc = &noop_qdisc;
+	dev_queue->qdisc_sleeping = qdisc;
+	rcu_assign_pointer(dev_queue->qdisc, &noop_qdisc);
+
+	spin_unlock_bh(root_lock);
+
+	return oqdisc;
+}
+
 static void attach_one_default_qdisc(struct net_device *dev,
 				     struct netdev_queue *dev_queue,
 				     void *_unused)

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (5 preceding siblings ...)
  2009-09-04 16:41 ` net_sched 06/07: move dev_graft_qdisc() to sch_generic.c Patrick McHardy
@ 2009-09-04 16:41 ` Patrick McHardy
  2009-09-06 20:04   ` Jarek Poplawski
  2009-09-04 16:42 ` net_sched 00/07: " Patrick McHardy
  2009-09-05  7:27 ` David Miller
  8 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:41 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

commit f114d0f02c9e72fea7bbc4d28a113946183fc65f
Author: Patrick McHardy <kaber@trash.net>
Date:   Fri Sep 4 18:25:04 2009 +0200

    net_sched: add classful multiqueue dummy scheduler
    
    This patch adds a classful dummy scheduler which can be used as root qdisc
    for multiqueue devices and exposes each device queue as a child class.
    
    This allows to address queues individually and graft them similar to regular
    classes. Additionally it presents an accumulated view of the statistics of
    all real root qdiscs in the dummy root.
    
    Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
    
    - cl_ops->select_queue selects the tx queue number for new child classes.
    
    - qdisc_ops->attach() overrides root qdisc device grafting to attach
      non-shared qdiscs to the queues.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a92dc62..9c69585 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -80,6 +80,7 @@ struct Qdisc
 struct Qdisc_class_ops
 {
 	/* Child qdisc manipulation */
+	unsigned int		(*select_queue)(struct Qdisc *, struct tcmsg *);
 	int			(*graft)(struct Qdisc *, unsigned long cl,
 					struct Qdisc *, struct Qdisc **);
 	struct Qdisc *		(*leaf)(struct Qdisc *, unsigned long cl);
@@ -122,6 +123,7 @@ struct Qdisc_ops
 	void			(*reset)(struct Qdisc *);
 	void			(*destroy)(struct Qdisc *);
 	int			(*change)(struct Qdisc *, struct nlattr *arg);
+	void			(*attach)(struct Qdisc *);
 
 	int			(*dump)(struct Qdisc *, struct sk_buff *);
 	int			(*dump_stats)(struct Qdisc *, struct gnet_dump *);
@@ -255,6 +257,8 @@ static inline void sch_tree_unlock(struct Qdisc *q)
 
 extern struct Qdisc noop_qdisc;
 extern struct Qdisc_ops noop_qdisc_ops;
+extern struct Qdisc_ops pfifo_fast_ops;
+extern struct Qdisc_ops mq_qdisc_ops;
 
 struct Qdisc_class_common
 {
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 54d950c..f14e71b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the Linux Traffic Control Unit.
 #
 
-obj-y	:= sch_generic.o
+obj-y	:= sch_generic.o sch_mq.o
 
 obj-$(CONFIG_NET_SCHED)		+= sch_api.o sch_blackhole.o
 obj-$(CONFIG_NET_CLS)		+= cls_api.o
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d71f12b..2a78d54 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -678,6 +678,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		if (dev->flags & IFF_UP)
 			dev_deactivate(dev);
 
+		if (new && new->ops->attach) {
+			new->ops->attach(new);
+			num_q = 0;
+		}
+
 		for (i = 0; i < num_q; i++) {
 			struct netdev_queue *dev_queue = &dev->rx_queue;
 
@@ -692,7 +697,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		}
 
 		notify_and_destroy(skb, n, classid, dev->qdisc, new);
-		if (new)
+		if (new && !new->ops->attach)
 			atomic_inc(&new->refcnt);
 		dev->qdisc = new ? : &noop_qdisc;
 
@@ -1095,10 +1100,16 @@ create_n_graft:
 		q = qdisc_create(dev, &dev->rx_queue,
 				 tcm->tcm_parent, tcm->tcm_parent,
 				 tca, &err);
-	else
-		q = qdisc_create(dev, netdev_get_tx_queue(dev, 0),
+	else {
+		unsigned int ntx = 0;
+
+		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
+			ntx = p->ops->cl_ops->select_queue(p, tcm);
+
+		q = qdisc_create(dev, netdev_get_tx_queue(dev, ntx),
 				 tcm->tcm_parent, tcm->tcm_handle,
 				 tca, &err);
+	}
 	if (q == NULL) {
 		if (err == -EAGAIN)
 			goto replay;
@@ -1674,6 +1685,7 @@ static int __init pktsched_init(void)
 {
 	register_qdisc(&pfifo_qdisc_ops);
 	register_qdisc(&bfifo_qdisc_ops);
+	register_qdisc(&mq_qdisc_ops);
 	proc_net_fops_create(&init_net, "psched", 0, &psched_fops);
 
 	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e7c47ce..4ae6aa5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -514,7 +514,7 @@ static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
 	return 0;
 }
 
-static struct Qdisc_ops pfifo_fast_ops __read_mostly = {
+struct Qdisc_ops pfifo_fast_ops __read_mostly = {
 	.id		=	"pfifo_fast",
 	.priv_size	=	sizeof(struct pfifo_fast_priv),
 	.enqueue	=	pfifo_fast_enqueue,
@@ -670,6 +670,26 @@ static void attach_one_default_qdisc(struct net_device *dev,
 	dev_queue->qdisc_sleeping = qdisc;
 }
 
+static void attach_default_qdiscs(struct net_device *dev)
+{
+	struct netdev_queue *txq;
+	struct Qdisc *qdisc;
+
+	txq = netdev_get_tx_queue(dev, 0);
+
+	if (!netif_is_multiqueue(dev) || dev->tx_queue_len == 0) {
+		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
+		dev->qdisc = txq->qdisc_sleeping;
+		atomic_inc(&dev->qdisc->refcnt);
+	} else {
+		qdisc = qdisc_create_dflt(dev, txq, &mq_qdisc_ops, TC_H_ROOT);
+		if (qdisc) {
+			qdisc->ops->attach(qdisc);
+			dev->qdisc = qdisc;
+		}
+	}
+}
+
 static void transition_one_qdisc(struct net_device *dev,
 				 struct netdev_queue *dev_queue,
 				 void *_need_watchdog)
@@ -689,7 +709,6 @@ static void transition_one_qdisc(struct net_device *dev,
 
 void dev_activate(struct net_device *dev)
 {
-	struct netdev_queue *txq;
 	int need_watchdog;
 
 	/* No queueing discipline is attached to device;
@@ -698,13 +717,8 @@ void dev_activate(struct net_device *dev)
 	   virtual interfaces
 	 */
 
-	if (dev->qdisc == &noop_qdisc) {
-		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
-
-		txq = netdev_get_tx_queue(dev, 0);
-		dev->qdisc = txq->qdisc_sleeping;
-		atomic_inc(&dev->qdisc->refcnt);
-	}
+	if (dev->qdisc == &noop_qdisc)
+		attach_default_qdiscs(dev);
 
 	if (!netif_carrier_ok(dev))
 		/* Delay activation until next carrier-on event */
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
new file mode 100644
index 0000000..5e453fd
--- /dev/null
+++ b/net/sched/sch_mq.c
@@ -0,0 +1,234 @@
+/*
+ * net/sched/sch_mq.c		Classful multiqueue dummy scheduler
+ *
+ * Copyright (c) 2009 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+
+struct mq_sched {
+	struct Qdisc		**qdiscs;
+};
+
+static void mq_destroy(struct Qdisc *sch)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	struct mq_sched *priv = qdisc_priv(sch);
+	unsigned int ntx;
+
+	if (priv->qdiscs)
+		return;
+	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
+		qdisc_destroy(priv->qdiscs[ntx]);
+	kfree(priv->qdiscs);
+}
+
+static int mq_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	struct mq_sched *priv = qdisc_priv(sch);
+	struct netdev_queue *dev_queue;
+	struct Qdisc *qdisc;
+	unsigned int ntx;
+
+	if (sch->parent != TC_H_ROOT)
+		return -EOPNOTSUPP;
+
+	if (!netif_is_multiqueue(dev))
+		return -EOPNOTSUPP;
+
+	/* pre-allocate qdiscs, attachment can't fail */
+	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
+			       GFP_KERNEL);
+	if (priv->qdiscs == NULL)
+		return -ENOMEM;
+
+	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		dev_queue = netdev_get_tx_queue(dev, ntx);
+		qdisc = qdisc_create_dflt(dev, dev_queue, &pfifo_fast_ops,
+					  TC_H_MAKE(TC_H_MAJ(sch->handle),
+						    TC_H_MIN(ntx + 1)));
+		if (qdisc == NULL)
+			goto err;
+		qdisc->flags |= TCQ_F_CAN_BYPASS;
+		priv->qdiscs[ntx] = qdisc;
+	}
+
+	return 0;
+
+err:
+	mq_destroy(sch);
+	return -ENOMEM;
+}
+
+static void mq_attach(struct Qdisc *sch)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	struct mq_sched *priv = qdisc_priv(sch);
+	struct Qdisc *qdisc;
+	unsigned int ntx;
+
+	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		qdisc = priv->qdiscs[ntx];
+		qdisc = dev_graft_qdisc(qdisc->dev_queue, qdisc);
+		if (qdisc)
+			qdisc_destroy(qdisc);
+	}
+	kfree(priv->qdiscs);
+	priv->qdiscs = NULL;
+}
+
+static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	struct Qdisc *qdisc;
+	unsigned int ntx;
+
+	sch->q.qlen = 0;
+	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	memset(&sch->qstats, 0, sizeof(sch->qstats));
+
+	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
+		spin_lock_bh(qdisc_lock(qdisc));
+		sch->q.qlen		+= qdisc->q.qlen;
+		sch->bstats.bytes	+= qdisc->bstats.bytes;
+		sch->bstats.packets	+= qdisc->bstats.packets;
+		sch->qstats.qlen	+= qdisc->qstats.qlen;
+		sch->qstats.backlog	+= qdisc->qstats.backlog;
+		sch->qstats.drops	+= qdisc->qstats.drops;
+		sch->qstats.requeues	+= qdisc->qstats.requeues;
+		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+		spin_unlock_bh(qdisc_lock(qdisc));
+	}
+	return 0;
+}
+
+static struct netdev_queue *mq_queue_get(struct Qdisc *sch, unsigned long cl)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	unsigned long ntx = cl - 1;
+
+	if (ntx >= dev->num_tx_queues)
+		return NULL;
+	return netdev_get_tx_queue(dev, ntx);
+}
+
+static unsigned int mq_select_queue(struct Qdisc *sch, struct tcmsg *tcm)
+{
+	unsigned int ntx = TC_H_MIN(tcm->tcm_parent);
+
+	if (!mq_queue_get(sch, ntx))
+		return 0;
+	return ntx - 1;
+}
+
+static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+		    struct Qdisc **old)
+{
+	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
+	struct net_device *dev = qdisc_dev(sch);
+
+	if (dev->flags & IFF_UP)
+		dev_deactivate(dev);
+
+	*old = dev_graft_qdisc(dev_queue, new);
+
+	if (dev->flags & IFF_UP)
+		dev_activate(dev);
+	return 0;
+}
+
+static struct Qdisc *mq_leaf(struct Qdisc *sch, unsigned long cl)
+{
+	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
+
+	return dev_queue->qdisc_sleeping;
+}
+
+static unsigned long mq_get(struct Qdisc *sch, u32 classid)
+{
+	unsigned int ntx = TC_H_MIN(classid);
+
+	if (!mq_queue_get(sch, ntx))
+		return 0;
+	return ntx;
+}
+
+static void mq_put(struct Qdisc *sch, unsigned long cl)
+{
+	return;
+}
+
+static int mq_dump_class(struct Qdisc *sch, unsigned long cl,
+			 struct sk_buff *skb, struct tcmsg *tcm)
+{
+	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
+
+	tcm->tcm_parent = TC_H_ROOT;
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+	return 0;
+}
+
+static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+			       struct gnet_dump *d)
+{
+	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
+
+	sch = dev_queue->qdisc_sleeping;
+	if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
+	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
+		return -1;
+	return 0;
+}
+
+static void mq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	struct net_device *dev = qdisc_dev(sch);
+	unsigned int ntx;
+
+	if (arg->stop)
+		return;
+
+	arg->count = arg->skip;
+	for (ntx = arg->skip; ntx < dev->num_tx_queues; ntx++) {
+		if (arg->fn(sch, ntx + 1, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops mq_class_ops = {
+	.select_queue	= mq_select_queue,
+	.graft		= mq_graft,
+	.leaf		= mq_leaf,
+	.get		= mq_get,
+	.put		= mq_put,
+	.walk		= mq_walk,
+	.dump		= mq_dump_class,
+	.dump_stats	= mq_dump_class_stats,
+};
+
+struct Qdisc_ops mq_qdisc_ops __read_mostly = {
+	.cl_ops		= &mq_class_ops,
+	.id		= "mq",
+	.priv_size	= sizeof(struct mq_sched),
+	.init		= mq_init,
+	.destroy	= mq_destroy,
+	.attach		= mq_attach,
+	.dump		= mq_dump,
+	.owner		= THIS_MODULE,
+};

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (6 preceding siblings ...)
  2009-09-04 16:41 ` net_sched 07/07: add classful multiqueue dummy scheduler Patrick McHardy
@ 2009-09-04 16:42 ` Patrick McHardy
  2009-09-07  8:50   ` David Miller
  2009-09-05  7:27 ` David Miller
  8 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-04 16:42 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

Patrick McHardy wrote:
> These patches contain a classful multiqueue ("mq") dummy scheduler to fix a
> couple of problems with the current multiqueue TC API integration. The
> changelogs of patch 05 and 07 contain more details.
> 
> The mq scheduler does two things:
> 
> - present device TX queues as classes, allowing to attach different qdiscs
>   to them, which are grafted to the TX queues
> 
> - present accumulated statistics of all device queue root qdiscs
> 
> Its used by default for multiqueue devices instead of the regular pfifo_fast
> qdisc, but can also be attached manually to restore multiqueue behaviour
> after attaching a non-multiqueue (shared) qdisc.
> 
> Patches 1-4 contain some preparatory cleanups because I was getting tired
> of copying unnecessary checks and dummy functions :)
> 
> Patch 5 introduces a dev->qdisc pointer, which points to the root qdisc from
> userspace's point of view. This is later used for the mq qdisc, which isn't
> actually attached to any device queues. Patch 7 contains the mq scheduler.
> 
> I've tested the scheduler with a hacked macvlan version which uses 4 queues,
> but since I don't own a multiqueue capable device I couldn't test this on
> real hardware.

And for reference, this is the script I used for testing:


[-- Attachment #2: test.sh --]
[-- Type: application/x-sh, Size: 928 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
                   ` (7 preceding siblings ...)
  2009-09-04 16:42 ` net_sched 00/07: " Patrick McHardy
@ 2009-09-05  7:27 ` David Miller
  2009-09-05 17:02   ` Patrick McHardy
  8 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2009-09-05  7:27 UTC (permalink / raw)
  To: kaber; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>
Date: Fri,  4 Sep 2009 18:41:12 +0200 (MEST)

> Any comments and test results welcome :)

This looks really nice.  I have them already checked into my
local net-next-2.6 tree and will push them out after I do
some multiqueue testing with NIU.

Thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-04 16:41 ` net_sched 02/07: make cls_ops->tcf_chain() optional Patrick McHardy
@ 2009-09-05  8:13   ` Jarek Poplawski
  2009-09-05 11:57     ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-05  8:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

Patrick McHardy wrote, On 09/04/2009 06:41 PM:

> commit 6ea4233ef8f398289a14a3305d4ed440fb026d43
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Fri Sep 4 14:28:11 2009 +0200
> 
>     net_sched: make cls_ops->tcf_chain() optional
>     
>     Some qdiscs don't support attaching filters. Handle this centrally in
>     cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
>     
>     Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 09cdcdf..eaa8f43 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -181,6 +181,9 @@ replay:
>  	if ((cops = q->ops->cl_ops) == NULL)
>  		return -EINVAL;
>  
> +	if (cops->tcf_chain == NULL)
> +		return -EOPNOTSUPP;
> +


You should probably repeat this in tc_dump_tfilter.

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-05  8:13   ` Jarek Poplawski
@ 2009-09-05 11:57     ` Jarek Poplawski
  2009-09-05 12:32       ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-05 11:57 UTC (permalink / raw)
  Cc: Patrick McHardy, netdev

Jarek Poplawski wrote, On 09/05/2009 10:13 AM:

> Patrick McHardy wrote, On 09/04/2009 06:41 PM:
> 
>> commit 6ea4233ef8f398289a14a3305d4ed440fb026d43
>> Author: Patrick McHardy <kaber@trash.net>
>> Date:   Fri Sep 4 14:28:11 2009 +0200
>>
>>     net_sched: make cls_ops->tcf_chain() optional
>>     
>>     Some qdiscs don't support attaching filters. Handle this centrally in
>>     cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
>>     
>>     Signed-off-by: Patrick McHardy <kaber@trash.net>
>>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 09cdcdf..eaa8f43 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -181,6 +181,9 @@ replay:
>>  	if ((cops = q->ops->cl_ops) == NULL)
>>  		return -EINVAL;
>>  
>> +	if (cops->tcf_chain == NULL)
>> +		return -EOPNOTSUPP;
>> +
> 
> 
> You should probably repeat this in tc_dump_tfilter.
 

...In case somebody finds the way to list a filter before
adding it. ;-) But, since it's quite unlikely, let's foget it.

Sorry,

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-05 11:57     ` Jarek Poplawski
@ 2009-09-05 12:32       ` Jarek Poplawski
  2009-09-05 17:03         ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-05 12:32 UTC (permalink / raw)
  Cc: Patrick McHardy, netdev

Jarek Poplawski wrote, On 09/05/2009 01:57 PM:

> Jarek Poplawski wrote, On 09/05/2009 10:13 AM:
> 
>> Patrick McHardy wrote, On 09/04/2009 06:41 PM:
>>
>>> commit 6ea4233ef8f398289a14a3305d4ed440fb026d43
>>> Author: Patrick McHardy <kaber@trash.net>
>>> Date:   Fri Sep 4 14:28:11 2009 +0200
>>>
>>>     net_sched: make cls_ops->tcf_chain() optional
>>>     
>>>     Some qdiscs don't support attaching filters. Handle this centrally in
>>>     cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
>>>     
>>>     Signed-off-by: Patrick McHardy <kaber@trash.net>
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 09cdcdf..eaa8f43 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -181,6 +181,9 @@ replay:
>>>  	if ((cops = q->ops->cl_ops) == NULL)
>>>  		return -EINVAL;
>>>  
>>> +	if (cops->tcf_chain == NULL)
>>> +		return -EOPNOTSUPP;
>>> +
>>
>> You should probably repeat this in tc_dump_tfilter.
>  
> 
> ...In case somebody finds the way to list a filter before
> adding it. ;-) But, since it's quite unlikely, let's foget it.


...or simply tries to do it instead of meditating the code.
So this change is definitely needed in tc_dump_tfilter too.

Sorry to myself,
Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-05  7:27 ` David Miller
@ 2009-09-05 17:02   ` Patrick McHardy
  2009-09-06  9:01     ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-05 17:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 405 bytes --]

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Fri,  4 Sep 2009 18:41:12 +0200 (MEST)
> 
>> Any comments and test results welcome :)
> 
> This looks really nice.  I have them already checked into my
> local net-next-2.6 tree and will push them out after I do
> some multiqueue testing with NIU.

Thanks. Attached is a small fix on top hat fixes inverted logic
in mq_destroy().






[-- Attachment #2: x --]
[-- Type: text/plain, Size: 417 bytes --]

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 5e453fd..c84dec9 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -26,7 +26,7 @@ static void mq_destroy(struct Qdisc *sch)
 	struct mq_sched *priv = qdisc_priv(sch);
 	unsigned int ntx;
 
-	if (priv->qdiscs)
+	if (!priv->qdiscs)
 		return;
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
 		qdisc_destroy(priv->qdiscs[ntx]);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-05 12:32       ` Jarek Poplawski
@ 2009-09-05 17:03         ` Patrick McHardy
  2009-09-06  9:06           ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-05 17:03 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> Jarek Poplawski wrote, On 09/05/2009 01:57 PM:
> 
>>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>> index 09cdcdf..eaa8f43 100644
>>>> --- a/net/sched/cls_api.c
>>>> +++ b/net/sched/cls_api.c
>>>> @@ -181,6 +181,9 @@ replay:
>>>>  	if ((cops = q->ops->cl_ops) == NULL)
>>>>  		return -EINVAL;
>>>>  
>>>> +	if (cops->tcf_chain == NULL)
>>>> +		return -EOPNOTSUPP;
>>>> +
>>> You should probably repeat this in tc_dump_tfilter.
>>  
>>
>> ...In case somebody finds the way to list a filter before
>> adding it. ;-) But, since it's quite unlikely, let's foget it.
> 
> 
> ...or simply tries to do it instead of meditating the code.
> So this change is definitely needed in tc_dump_tfilter too.

Thanks Jarek. I'm on my way out the door, but I'll fix that tommorrow.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-05 17:02   ` Patrick McHardy
@ 2009-09-06  9:01     ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2009-09-06  9:01 UTC (permalink / raw)
  To: kaber; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>
Date: Sat, 05 Sep 2009 19:02:00 +0200

> David Miller wrote:
>> From: Patrick McHardy <kaber@trash.net>
>> Date: Fri,  4 Sep 2009 18:41:12 +0200 (MEST)
>> 
>>> Any comments and test results welcome :)
>> 
>> This looks really nice.  I have them already checked into my
>> local net-next-2.6 tree and will push them out after I do
>> some multiqueue testing with NIU.
> 
> Thanks. Attached is a small fix on top hat fixes inverted logic
> in mq_destroy().

I've intesgrated this into patch 7, thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 02/07: make cls_ops->tcf_chain() optional
  2009-09-05 17:03         ` Patrick McHardy
@ 2009-09-06  9:06           ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2009-09-06  9:06 UTC (permalink / raw)
  To: kaber; +Cc: jarkao2, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Sat, 05 Sep 2009 19:03:02 +0200

> Jarek Poplawski wrote:
>> Jarek Poplawski wrote, On 09/05/2009 01:57 PM:
>> 
>>>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>>> index 09cdcdf..eaa8f43 100644
>>>>> --- a/net/sched/cls_api.c
>>>>> +++ b/net/sched/cls_api.c
>>>>> @@ -181,6 +181,9 @@ replay:
>>>>>  	if ((cops = q->ops->cl_ops) == NULL)
>>>>>  		return -EINVAL;
>>>>>  
>>>>> +	if (cops->tcf_chain == NULL)
>>>>> +		return -EOPNOTSUPP;
>>>>> +
>>>> You should probably repeat this in tc_dump_tfilter.
>>>  
>>>
>>> ...In case somebody finds the way to list a filter before
>>> adding it. ;-) But, since it's quite unlikely, let's foget it.
>> 
>> 
>> ...or simply tries to do it instead of meditating the code.
>> So this change is definitely needed in tc_dump_tfilter too.
> 
> Thanks Jarek. I'm on my way out the door, but I'll fix that tommorrow.

I'll add the following to patch 2:

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8cbc66f..6a53694 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -433,6 +433,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		goto out;
 	if ((cops = q->ops->cl_ops) == NULL)
 		goto errout;
+	if (cops->tcf_chain == NULL)
+		goto errout;
 	if (TC_H_MIN(tcm->tcm_parent)) {
 		cl = cops->get(q, tcm->tcm_parent);
 		if (cl == 0)

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: net_sched 05/07: reintroduce dev->qdisc for use by sch_api
  2009-09-04 16:41 ` net_sched 05/07: reintroduce dev->qdisc for use by sch_api Patrick McHardy
@ 2009-09-06 18:57   ` Jarek Poplawski
  2009-09-07 13:16     ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-06 18:57 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

Patrick McHardy wrote, On 09/04/2009 06:41 PM:

> commit 57a016350a3d85dc351ab90ce91e4dc49ce2183a
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Fri Sep 4 16:12:45 2009 +0200
> 
>     net_sched: reintroduce dev->qdisc for use by sch_api
>     
>     Currently the multiqueue integration with the qdisc API suffers from
>     a few problems:
>     
>     - with multiple queues, all root qdiscs use the same handle. This means
>       they can't be exposed to userspace in a backwards compatible fashion.
>     
>     - all API operations always refer to queue number 0. Newly created
>       qdiscs are automatically shared between all queues, its not possible
>       to address individual queues or restore multiqueue behaviour once a
>       shared qdisc has been attached.
>     
>     - Dumps only contain the root qdisc of queue 0, in case of non-shared
>       qdiscs this means the statistics are incomplete.
>     
>     This patch reintroduces dev->qdisc, which points to the (single) root qdisc
>     from userspace's point of view. Currently it either points to the first
>     (non-shared) default qdisc, or a qdisc shared between all queues. The
>     following patches will introduce a classful dummy qdisc, which will be used
>     as root qdisc and contain the per-queue qdiscs as children.
...
> @@ -1323,7 +1317,6 @@ done:
>  static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
>  {
>  	struct net *net = sock_net(skb->sk);
> -	struct netdev_queue *dev_queue;
>  	struct tcmsg *tcm = NLMSG_DATA(n);
>  	struct nlattr *tca[TCA_MAX + 1];
>  	struct net_device *dev;
> @@ -1361,7 +1354,6 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
>  
>  	/* Step 1. Determine qdisc handle X:0 */
>  
> -	dev_queue = netdev_get_tx_queue(dev, 0);
>  	if (pid != TC_H_ROOT) {
>  		u32 qid1 = TC_H_MAJ(pid);
>  
> @@ -1372,7 +1364,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
>  		} else if (qid1) {
>  			qid = qid1;
>  		} else if (qid == 0)
> -			qid = dev_queue->qdisc_sleeping->handle;
> +			qid = dev->qdisc->handle;
>  
>  		/* Now qid is genuine qdisc handle consistent
>  		   both with parent and child.
> @@ -1383,7 +1375,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
>  			pid = TC_H_MAKE(qid, pid);
>  	} else {
>  		if (qid == 0)
> -			qid = dev_queue->qdisc_sleeping->handle;
> +			qid = dev->qdisc->handle;

Probably I miss something, but in mq root case it seems to never do
anything we need. If so, it could be the example of possible issues
elsewhere.

I thought this mq virtual root qdisc could be done more transparently
and invisible for the current code, but it seems, in your
implementation some pointers like this, or parent ids (especially
TC_H_ROOT) might be different, and even if it works OK, needs a lot of
verification. So, my question is, if it's really necessary.

Jarek P.
>  	}
>  
>  	/* OK. Locate qdisc */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-04 16:41 ` net_sched 07/07: add classful multiqueue dummy scheduler Patrick McHardy
@ 2009-09-06 20:04   ` Jarek Poplawski
  2009-09-07 13:27     ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-06 20:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

Patrick McHardy wrote, On 09/04/2009 06:41 PM:

> commit f114d0f02c9e72fea7bbc4d28a113946183fc65f
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Fri Sep 4 18:25:04 2009 +0200
> 
>     net_sched: add classful multiqueue dummy scheduler
>     
>     This patch adds a classful dummy scheduler which can be used as root qdisc
>     for multiqueue devices and exposes each device queue as a child class.
>     
>     This allows to address queues individually and graft them similar to regular
>     classes. Additionally it presents an accumulated view of the statistics of
>     all real root qdiscs in the dummy root.
>     
>     Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
>     
>     - cl_ops->select_queue selects the tx queue number for new child classes.
>     
>     - qdisc_ops->attach() overrides root qdisc device grafting to attach
>       non-shared qdiscs to the queues.
>     
>     Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index a92dc62..9c69585 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -80,6 +80,7 @@ struct Qdisc
>  struct Qdisc_class_ops
>  {
>  	/* Child qdisc manipulation */
> +	unsigned int		(*select_queue)(struct Qdisc *, struct tcmsg *);
>  	int			(*graft)(struct Qdisc *, unsigned long cl,
>  					struct Qdisc *, struct Qdisc **);
>  	struct Qdisc *		(*leaf)(struct Qdisc *, unsigned long cl);
> @@ -122,6 +123,7 @@ struct Qdisc_ops
>  	void			(*reset)(struct Qdisc *);
>  	void			(*destroy)(struct Qdisc *);
>  	int			(*change)(struct Qdisc *, struct nlattr *arg);
> +	void			(*attach)(struct Qdisc *);

Probably it's a matter of taste, but I wonder why these two methods
used only by one qdisc in max 2 places can't be functions instead
(maybe even static in case of select_queue)? (And this mq sched could
be tested with some flag instead of ->attach, I guess.)

>  
>  	int			(*dump)(struct Qdisc *, struct sk_buff *);
>  	int			(*dump_stats)(struct Qdisc *, struct gnet_dump *);
> @@ -255,6 +257,8 @@ static inline void sch_tree_unlock(struct Qdisc *q)
>  
>  extern struct Qdisc noop_qdisc;
>  extern struct Qdisc_ops noop_qdisc_ops;
> +extern struct Qdisc_ops pfifo_fast_ops;
> +extern struct Qdisc_ops mq_qdisc_ops;
>  
>  struct Qdisc_class_common
>  {
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 54d950c..f14e71b 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -2,7 +2,7 @@
>  # Makefile for the Linux Traffic Control Unit.
>  #
>  
> -obj-y	:= sch_generic.o
> +obj-y	:= sch_generic.o sch_mq.o
>  
>  obj-$(CONFIG_NET_SCHED)		+= sch_api.o sch_blackhole.o
>  obj-$(CONFIG_NET_CLS)		+= cls_api.o
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index d71f12b..2a78d54 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -678,6 +678,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>  		if (dev->flags & IFF_UP)
>  			dev_deactivate(dev);
>  
> +		if (new && new->ops->attach) {
> +			new->ops->attach(new);
> +			num_q = 0;
> +		}
> +

Actually, I wonder if it's not cleaner to let replace all qdiscs with
noops below like in qdisc delete case, and do this attaching in one
place only (dev_activate).

>  		for (i = 0; i < num_q; i++) {
>  			struct netdev_queue *dev_queue = &dev->rx_queue;
>  
> @@ -692,7 +697,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>  		}
>  
>  		notify_and_destroy(skb, n, classid, dev->qdisc, new);
> -		if (new)
> +		if (new && !new->ops->attach)
>  			atomic_inc(&new->refcnt);
>  		dev->qdisc = new ? : &noop_qdisc;
>  
> @@ -1095,10 +1100,16 @@ create_n_graft:
>  		q = qdisc_create(dev, &dev->rx_queue,
>  				 tcm->tcm_parent, tcm->tcm_parent,
>  				 tca, &err);
> -	else
> -		q = qdisc_create(dev, netdev_get_tx_queue(dev, 0),
> +	else {
> +		unsigned int ntx = 0;
> +
> +		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
> +			ntx = p->ops->cl_ops->select_queue(p, tcm);

So, this if could be probably made shorter with a common function, but
the main point is: this probably works only for qdiscs having mq as a
parent, and not below.

> +
> +		q = qdisc_create(dev, netdev_get_tx_queue(dev, ntx),
>  				 tcm->tcm_parent, tcm->tcm_handle,
>  				 tca, &err);
> +	}
>  	if (q == NULL) {
>  		if (err == -EAGAIN)
>  			goto replay;
> @@ -1674,6 +1685,7 @@ static int __init pktsched_init(void)
>  {
>  	register_qdisc(&pfifo_qdisc_ops);
>  	register_qdisc(&bfifo_qdisc_ops);
> +	register_qdisc(&mq_qdisc_ops);
>  	proc_net_fops_create(&init_net, "psched", 0, &psched_fops);
>  
>  	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index e7c47ce..4ae6aa5 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -514,7 +514,7 @@ static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
>  	return 0;
>  }
>  
> -static struct Qdisc_ops pfifo_fast_ops __read_mostly = {
> +struct Qdisc_ops pfifo_fast_ops __read_mostly = {
>  	.id		=	"pfifo_fast",
>  	.priv_size	=	sizeof(struct pfifo_fast_priv),
>  	.enqueue	=	pfifo_fast_enqueue,
> @@ -670,6 +670,26 @@ static void attach_one_default_qdisc(struct net_device *dev,
>  	dev_queue->qdisc_sleeping = qdisc;
>  }
>  
> +static void attach_default_qdiscs(struct net_device *dev)
> +{
> +	struct netdev_queue *txq;
> +	struct Qdisc *qdisc;
> +
> +	txq = netdev_get_tx_queue(dev, 0);
> +
> +	if (!netif_is_multiqueue(dev) || dev->tx_queue_len == 0) {
> +		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
> +		dev->qdisc = txq->qdisc_sleeping;
> +		atomic_inc(&dev->qdisc->refcnt);
> +	} else {
> +		qdisc = qdisc_create_dflt(dev, txq, &mq_qdisc_ops, TC_H_ROOT);
> +		if (qdisc) {
> +			qdisc->ops->attach(qdisc);
> +			dev->qdisc = qdisc;
> +		}
> +	}
> +}
> +
>  static void transition_one_qdisc(struct net_device *dev,
>  				 struct netdev_queue *dev_queue,
>  				 void *_need_watchdog)
> @@ -689,7 +709,6 @@ static void transition_one_qdisc(struct net_device *dev,
>  
>  void dev_activate(struct net_device *dev)
>  {
> -	struct netdev_queue *txq;
>  	int need_watchdog;
>  
>  	/* No queueing discipline is attached to device;
> @@ -698,13 +717,8 @@ void dev_activate(struct net_device *dev)
>  	   virtual interfaces
>  	 */
>  
> -	if (dev->qdisc == &noop_qdisc) {
> -		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
> -
> -		txq = netdev_get_tx_queue(dev, 0);
> -		dev->qdisc = txq->qdisc_sleeping;
> -		atomic_inc(&dev->qdisc->refcnt);
> -	}
> +	if (dev->qdisc == &noop_qdisc)
> +		attach_default_qdiscs(dev);
>  
>  	if (!netif_carrier_ok(dev))
>  		/* Delay activation until next carrier-on event */
> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
> new file mode 100644
> index 0000000..5e453fd
> --- /dev/null
> +++ b/net/sched/sch_mq.c
> @@ -0,0 +1,234 @@
> +/*
> + * net/sched/sch_mq.c		Classful multiqueue dummy scheduler
> + *
> + * Copyright (c) 2009 Patrick McHardy <kaber@trash.net>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <net/netlink.h>
> +#include <net/pkt_sched.h>
> +
> +struct mq_sched {
> +	struct Qdisc		**qdiscs;
> +};
> +
> +static void mq_destroy(struct Qdisc *sch)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	struct mq_sched *priv = qdisc_priv(sch);
> +	unsigned int ntx;
> +
> +	if (priv->qdiscs)
> +		return;
> +	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
> +		qdisc_destroy(priv->qdiscs[ntx]);
> +	kfree(priv->qdiscs);
> +}
> +
> +static int mq_init(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	struct mq_sched *priv = qdisc_priv(sch);
> +	struct netdev_queue *dev_queue;
> +	struct Qdisc *qdisc;
> +	unsigned int ntx;
> +
> +	if (sch->parent != TC_H_ROOT)
> +		return -EOPNOTSUPP;
> +
> +	if (!netif_is_multiqueue(dev))
> +		return -EOPNOTSUPP;
> +
> +	/* pre-allocate qdiscs, attachment can't fail */
> +	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
> +			       GFP_KERNEL);

I guess we could avoid this at all or at least to do it in one step with
current ->attach.

> +	if (priv->qdiscs == NULL)
> +		return -ENOMEM;
> +
> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> +		dev_queue = netdev_get_tx_queue(dev, ntx);
> +		qdisc = qdisc_create_dflt(dev, dev_queue, &pfifo_fast_ops,
> +					  TC_H_MAKE(TC_H_MAJ(sch->handle),
> +						    TC_H_MIN(ntx + 1)));

As I wrote in 05/07 comment, I wonder if we really can't achieve this
with old TC_H_ROOT parentid, and maybe some mapping while dumping to
the userspace only. Another possibility would be considering a new
kind of root (mqroot?) to tell precisely, where a new qdisc should be
added.

> +		if (qdisc == NULL)
> +			goto err;
> +		qdisc->flags |= TCQ_F_CAN_BYPASS;
> +		priv->qdiscs[ntx] = qdisc;
> +	}
> +
> +	return 0;
> +
> +err:
> +	mq_destroy(sch);
> +	return -ENOMEM;
> +}
> +
> +static void mq_attach(struct Qdisc *sch)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	struct mq_sched *priv = qdisc_priv(sch);
> +	struct Qdisc *qdisc;
> +	unsigned int ntx;
> +
> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> +		qdisc = priv->qdiscs[ntx];
> +		qdisc = dev_graft_qdisc(qdisc->dev_queue, qdisc);
> +		if (qdisc)
> +			qdisc_destroy(qdisc);
> +	}
> +	kfree(priv->qdiscs);
> +	priv->qdiscs = NULL;
> +}
> +
> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	struct Qdisc *qdisc;
> +	unsigned int ntx;
> +
> +	sch->q.qlen = 0;
> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
> +
> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
> +		spin_lock_bh(qdisc_lock(qdisc));
> +		sch->q.qlen		+= qdisc->q.qlen;
> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
> +		sch->bstats.packets	+= qdisc->bstats.packets;
> +		sch->qstats.qlen	+= qdisc->qstats.qlen;

Like in Christoph's case, we should probably use q.qlen instead.

Thanks,
Jarek P.

> +		sch->qstats.backlog	+= qdisc->qstats.backlog;
> +		sch->qstats.drops	+= qdisc->qstats.drops;
> +		sch->qstats.requeues	+= qdisc->qstats.requeues;
> +		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
> +		spin_unlock_bh(qdisc_lock(qdisc));
> +	}
> +	return 0;
> +}
> +
> +static struct netdev_queue *mq_queue_get(struct Qdisc *sch, unsigned long cl)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	unsigned long ntx = cl - 1;
> +
> +	if (ntx >= dev->num_tx_queues)
> +		return NULL;
> +	return netdev_get_tx_queue(dev, ntx);
> +}
> +
> +static unsigned int mq_select_queue(struct Qdisc *sch, struct tcmsg *tcm)
> +{
> +	unsigned int ntx = TC_H_MIN(tcm->tcm_parent);
> +
> +	if (!mq_queue_get(sch, ntx))
> +		return 0;
> +	return ntx - 1;
> +}
> +
> +static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
> +		    struct Qdisc **old)
> +{
> +	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
> +	struct net_device *dev = qdisc_dev(sch);
> +
> +	if (dev->flags & IFF_UP)
> +		dev_deactivate(dev);
> +
> +	*old = dev_graft_qdisc(dev_queue, new);
> +
> +	if (dev->flags & IFF_UP)
> +		dev_activate(dev);
> +	return 0;
> +}
> +
> +static struct Qdisc *mq_leaf(struct Qdisc *sch, unsigned long cl)
> +{
> +	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
> +
> +	return dev_queue->qdisc_sleeping;
> +}
> +
> +static unsigned long mq_get(struct Qdisc *sch, u32 classid)
> +{
> +	unsigned int ntx = TC_H_MIN(classid);
> +
> +	if (!mq_queue_get(sch, ntx))
> +		return 0;
> +	return ntx;
> +}
> +
> +static void mq_put(struct Qdisc *sch, unsigned long cl)
> +{
> +	return;
> +}
> +
> +static int mq_dump_class(struct Qdisc *sch, unsigned long cl,
> +			 struct sk_buff *skb, struct tcmsg *tcm)
> +{
> +	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
> +
> +	tcm->tcm_parent = TC_H_ROOT;
> +	tcm->tcm_handle |= TC_H_MIN(cl);
> +	tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
> +	return 0;
> +}
> +
> +static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
> +			       struct gnet_dump *d)
> +{
> +	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
> +
> +	sch = dev_queue->qdisc_sleeping;
> +	if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
> +	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
> +		return -1;
> +	return 0;
> +}
> +
> +static void mq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
> +{
> +	struct net_device *dev = qdisc_dev(sch);
> +	unsigned int ntx;
> +
> +	if (arg->stop)
> +		return;
> +
> +	arg->count = arg->skip;
> +	for (ntx = arg->skip; ntx < dev->num_tx_queues; ntx++) {
> +		if (arg->fn(sch, ntx + 1, arg) < 0) {
> +			arg->stop = 1;
> +			break;
> +		}
> +		arg->count++;
> +	}
> +}
> +
> +static const struct Qdisc_class_ops mq_class_ops = {
> +	.select_queue	= mq_select_queue,
> +	.graft		= mq_graft,
> +	.leaf		= mq_leaf,
> +	.get		= mq_get,
> +	.put		= mq_put,
> +	.walk		= mq_walk,
> +	.dump		= mq_dump_class,
> +	.dump_stats	= mq_dump_class_stats,
> +};
> +
> +struct Qdisc_ops mq_qdisc_ops __read_mostly = {
> +	.cl_ops		= &mq_class_ops,
> +	.id		= "mq",
> +	.priv_size	= sizeof(struct mq_sched),
> +	.init		= mq_init,
> +	.destroy	= mq_destroy,
> +	.attach		= mq_attach,
> +	.dump		= mq_dump,
> +	.owner		= THIS_MODULE,
> +};
> --

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-04 16:42 ` net_sched 00/07: " Patrick McHardy
@ 2009-09-07  8:50   ` David Miller
  2009-09-07  9:46     ` Jarek Poplawski
  2009-09-07 13:00     ` Eric Dumazet
  0 siblings, 2 replies; 46+ messages in thread
From: David Miller @ 2009-09-07  8:50 UTC (permalink / raw)
  To: kaber; +Cc: netdev


I gave these patches a very basic bashing with NIU, and it
seems to work from what I've tried.

I know that Jarek has expressed some questions about the callback
scheme used by the new mq classful qdisc, as well as some other
issues, but we can refine this using followon patches.

For now I'm pushing this out so that it gets wider testing.

Thanks everyone!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07  8:50   ` David Miller
@ 2009-09-07  9:46     ` Jarek Poplawski
  2009-09-07 13:00     ` Eric Dumazet
  1 sibling, 0 replies; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-07  9:46 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, netdev

On 07-09-2009 10:50, David Miller wrote:
> I gave these patches a very basic bashing with NIU, and it
> seems to work from what I've tried.
> 
> I know that Jarek has expressed some questions about the callback
> scheme used by the new mq classful qdisc, as well as some other
> issues, but we can refine this using followon patches.
> 
> For now I'm pushing this out so that it gets wider testing.

Sure, it should make the further discussion easier (at least until
a new backward compatibilty starts to matter ;-).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07  8:50   ` David Miller
  2009-09-07  9:46     ` Jarek Poplawski
@ 2009-09-07 13:00     ` Eric Dumazet
  2009-09-07 13:29       ` Patrick McHardy
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2009-09-07 13:00 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, netdev

David Miller a écrit :
> I gave these patches a very basic bashing with NIU, and it
> seems to work from what I've tried.
> 
> I know that Jarek has expressed some questions about the callback
> scheme used by the new mq classful qdisc, as well as some other
> issues, but we can refine this using followon patches.
> 
> For now I'm pushing this out so that it gets wider testing.
> 
> Thanks everyone!

Very interesting :)

Had very litle time to test this, but got problems very fast, if rate estimator configured.

(Here, eth2 maps to tg3, that uses a num_tx_queues of 5, even on non multiqueue device)
So its real_num_tx_queues is 1, but we can play with tc and mq

# tc qdisc replace dev eth2 handle 1: root estimator 1sec 8sec mq

# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 62414 bytes 401 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5456bit 4pps backlog 0b 0p requeues 0

# tc qdisc replace dev eth2 parent 1:1 estimator 1sec 8sec pfifo
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 12984 bytes 88 pkt (dropped 0, overlimits 0 requeues 0)
 rate 4368bit 4pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 12984 bytes 88 pkt (dropped 0, overlimits 0 requeues 0)
 rate 9624bit 8pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 36781 bytes 244 pkt (dropped 0, overlimits 0 requeues 0)
 rate 34360Mbit 205872pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 36781 bytes 244 pkt (dropped 0, overlimits 0 requeues 0)
 rate 19824bit 16pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 52663 bytes 348 pkt (dropped 0, overlimits 0 requeues 0)
 rate 17457Mbit 105605pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 52663 bytes 348 pkt (dropped 0, overlimits 0 requeues 0)
 rate 22560bit 19pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 71775 bytes 473 pkt (dropped 0, overlimits 0 requeues 0)
 rate 11838Mbit 47402pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 71775 bytes 473 pkt (dropped 0, overlimits 0 requeues 0)
 rate 23880bit 20pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 94755 bytes 623 pkt (dropped 0, overlimits 0 requeues 0)
 rate 3562Mbit 18621pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 94755 bytes 623 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24440bit 20pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 112501 bytes 741 pkt (dropped 0, overlimits 0 requeues 0)
 rate 734270Kbit 9562pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 112501 bytes 741 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24632bit 20pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 127137 bytes 836 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25390Mbit 4913pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 127137 bytes 836 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24960bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 150745 bytes 992 pkt (dropped 0, overlimits 0 requeues 0)
 rate 6212Mbit 1693pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 150745 bytes 992 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25032bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 174008 bytes 1144 pkt (dropped 0, overlimits 0 requeues 0)
 rate 29377Mbit 674pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 174008 bytes 1144 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24904bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 185976 bytes 1224 pkt (dropped 0, overlimits 0 requeues 0)
 rate 13093Mbit 408pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 185976 bytes 1224 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25288bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 232272 bytes 1530 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5196Mbit 57pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 232272 bytes 1530 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24784bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 300071 bytes 1977 pkt (dropped 0, overlimits 0 requeues 0)
 rate 8988Mbit 6pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 300071 bytes 1977 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24432bit 20pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 377495 bytes 2490 pkt (dropped 0, overlimits 0 requeues 0)
 rate 20429Mbit 2pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 377495 bytes 2490 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24520bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 448070 bytes 2958 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5726Mbit 4pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 448070 bytes 2958 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24576bit 20pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 519926 bytes 3435 pkt (dropped 0, overlimits 0 requeues 0)
 rate 265505Kbit 3pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 519926 bytes 3435 pkt (dropped 0, overlimits 0 requeues 0)
 rate 24920bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 522242 bytes 3449 pkt (dropped 0, overlimits 0 requeues 0)
 rate 232389Kbit 62pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 522242 bytes 3449 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25304bit 21pps backlog 0b 0p requeues 0
# tc -s -d qdisc show dev eth2
qdisc mq 1: root
 Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
 rate 177925Kbit 49pps backlog 0b 0p requeues 0
qdisc pfifo 8001: parent 1:1 limit 1000p
 Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25400bit 21pps backlog 0b 0p requeues 0

<<<crash>>>

(On another term I had a "ping -i 0.1 192.168.20.120" that gave :

2009/08/07 14:53:42.498 64 bytes from 192.168.20.120: icmp_seq=1982 ttl=64 time=0.126 ms
2009/08/07 14:53:42.598 64 bytes from 192.168.20.120: icmp_seq=1983 ttl=64 time=0.118 ms
2009/08/07 14:53:42.698 64 bytes from 192.168.20.120: icmp_seq=1984 ttl=64 time=0.114 ms
2009/08/07 14:53:42.798 64 bytes from 192.168.20.120: icmp_seq=1985 ttl=64 time=0.123 ms
2009/08/07 14:53:42.898 64 bytes from 192.168.20.120: icmp_seq=1986 ttl=64 time=0.126 ms
2009/08/07 14:53:42.998 64 bytes from 192.168.20.120: icmp_seq=1987 ttl=64 time=0.119 ms
2009/08/07 14:53:43.098 64 bytes from 192.168.20.120: icmp_seq=1988 ttl=64 time=0.122 ms
2009/08/07 14:53:43.198 64 bytes from 192.168.20.120: icmp_seq=1989 ttl=64 time=0.119 ms
2009/08/07 14:53:43.298 64 bytes from 192.168.20.120: icmp_seq=1990 ttl=64 time=0.117 ms
2009/08/07 14:53:43.398 64 bytes from 192.168.20.120: icmp_seq=1991 ttl=64 time=0.117 ms
ping: sendmsg: No buffer space available



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 05/07: reintroduce dev->qdisc for use by sch_api
  2009-09-06 18:57   ` Jarek Poplawski
@ 2009-09-07 13:16     ` Patrick McHardy
  2009-09-07 16:49       ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 13:16 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
>> @@ -1383,7 +1375,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
>>  			pid = TC_H_MAKE(qid, pid);
>>  	} else {
>>  		if (qid == 0)
>> -			qid = dev_queue->qdisc_sleeping->handle;
>> +			qid = dev->qdisc->handle;
> 
> Probably I miss something, but in mq root case it seems to never do
> anything we need. If so, it could be the example of possible issues
> elsewhere.

Sorry, I'm not sure what you're saying ..

> I thought this mq virtual root qdisc could be done more transparently
> and invisible for the current code, but it seems, in your
> implementation some pointers like this, or parent ids (especially
> TC_H_ROOT) might be different, and even if it works OK, needs a lot of
> verification. So, my question is, if it's really necessary.

Same here.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-06 20:04   ` Jarek Poplawski
@ 2009-09-07 13:27     ` Patrick McHardy
  2009-09-07 18:22       ` Jarek Poplawski
  2009-09-07 19:24       ` Jarek Poplawski
  0 siblings, 2 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 13:27 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
>>  struct Qdisc_class_ops
>>  {
>>  	/* Child qdisc manipulation */
>> +	unsigned int		(*select_queue)(struct Qdisc *, struct tcmsg *);
>>  	int			(*graft)(struct Qdisc *, unsigned long cl,
>>  					struct Qdisc *, struct Qdisc **);
>>  	struct Qdisc *		(*leaf)(struct Qdisc *, unsigned long cl);
>> @@ -122,6 +123,7 @@ struct Qdisc_ops
>>  	void			(*reset)(struct Qdisc *);
>>  	void			(*destroy)(struct Qdisc *);
>>  	int			(*change)(struct Qdisc *, struct nlattr *arg);
>> +	void			(*attach)(struct Qdisc *);
> 
> Probably it's a matter of taste, but I wonder why these two methods
> used only by one qdisc in max 2 places can't be functions instead
> (maybe even static in case of select_queue)? (And this mq sched could
> be tested with some flag instead of ->attach, I guess.)

Yes, we could also use normal functions. Either way is fine with me.

>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> index d71f12b..2a78d54 100644
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -678,6 +678,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>>  		if (dev->flags & IFF_UP)
>>  			dev_deactivate(dev);
>>  
>> +		if (new && new->ops->attach) {
>> +			new->ops->attach(new);
>> +			num_q = 0;
>> +		}
>> +
> 
> Actually, I wonder if it's not cleaner to let replace all qdiscs with
> noops below like in qdisc delete case, and do this attaching in one
> place only (dev_activate).

I don't think that would work since dev_activate() allocates its own
qdiscs, which use different handles than those specified by userspace.
We also need the new qdisc for notifications. It would be a nice
cleanup however if you can make it work.

>> @@ -1095,10 +1100,16 @@ create_n_graft:
>>  		q = qdisc_create(dev, &dev->rx_queue,
>>  				 tcm->tcm_parent, tcm->tcm_parent,
>>  				 tca, &err);
>> -	else
>> -		q = qdisc_create(dev, netdev_get_tx_queue(dev, 0),
>> +	else {
>> +		unsigned int ntx = 0;
>> +
>> +		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
>> +			ntx = p->ops->cl_ops->select_queue(p, tcm);
> 
> So, this if could be probably made shorter with a common function, but
> the main point is: this probably works only for qdiscs having mq as a
> parent, and not below.

Yes. mq can only be attached to the root however, so its not
possible to use it as a child qdisc.

>> +static int mq_init(struct Qdisc *sch, struct nlattr *opt)
>> +{
>> +	struct net_device *dev = qdisc_dev(sch);
>> +	struct mq_sched *priv = qdisc_priv(sch);
>> +	struct netdev_queue *dev_queue;
>> +	struct Qdisc *qdisc;
>> +	unsigned int ntx;
>> +
>> +	if (sch->parent != TC_H_ROOT)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (!netif_is_multiqueue(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	/* pre-allocate qdiscs, attachment can't fail */
>> +	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
>> +			       GFP_KERNEL);
> 
> I guess we could avoid this at all or at least to do it in one step with
> current ->attach.

It seemed easier this way, but I don't care much where its done exactly.

>> +	if (priv->qdiscs == NULL)
>> +		return -ENOMEM;
>> +
>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>> +		dev_queue = netdev_get_tx_queue(dev, ntx);
>> +		qdisc = qdisc_create_dflt(dev, dev_queue, &pfifo_fast_ops,
>> +					  TC_H_MAKE(TC_H_MAJ(sch->handle),
>> +						    TC_H_MIN(ntx + 1)));
> 
> As I wrote in 05/07 comment, I wonder if we really can't achieve this
> with old TC_H_ROOT parentid, and maybe some mapping while dumping to
> the userspace only.

I don't see the advantage.

> Another possibility would be considering a new
> kind of root (mqroot?) to tell precisely, where a new qdisc should be
> added.

That's what mq is doing.

>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
>> +{
>> +	struct net_device *dev = qdisc_dev(sch);
>> +	struct Qdisc *qdisc;
>> +	unsigned int ntx;
>> +
>> +	sch->q.qlen = 0;
>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
>> +
>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>> +		spin_lock_bh(qdisc_lock(qdisc));
>> +		sch->q.qlen		+= qdisc->q.qlen;
>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
> 
> Like in Christoph's case, we should probably use q.qlen instead.

Its done a few lines above. This simply sums up all members of qstats.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 13:00     ` Eric Dumazet
@ 2009-09-07 13:29       ` Patrick McHardy
  2009-09-07 14:23         ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 13:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Eric Dumazet wrote:
> David Miller a écrit :
>> I gave these patches a very basic bashing with NIU, and it
>> seems to work from what I've tried.
>>
>> I know that Jarek has expressed some questions about the callback
>> scheme used by the new mq classful qdisc, as well as some other
>> issues, but we can refine this using followon patches.
>>
>> For now I'm pushing this out so that it gets wider testing.
>>
>> Thanks everyone!
> 
> Very interesting :)
> 
> Had very litle time to test this, but got problems very fast, if rate estimator configured.

I didn't test that, but I'll look into it.

> qdisc mq 1: root
>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>  rate 177925Kbit 49pps backlog 0b 0p requeues 0
> qdisc pfifo 8001: parent 1:1 limit 1000p
>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>  rate 25400bit 21pps backlog 0b 0p requeues 0
> 
> <<<crash>>>

Did you capture the crash?

> (On another term I had a "ping -i 0.1 192.168.20.120" that gave :
> 
> 2009/08/07 14:53:42.498 64 bytes from 192.168.20.120: icmp_seq=1982 ttl=64 time=0.126 ms
> 2009/08/07 14:53:42.598 64 bytes from 192.168.20.120: icmp_seq=1983 ttl=64 time=0.118 ms
> 2009/08/07 14:53:42.698 64 bytes from 192.168.20.120: icmp_seq=1984 ttl=64 time=0.114 ms
> 2009/08/07 14:53:42.798 64 bytes from 192.168.20.120: icmp_seq=1985 ttl=64 time=0.123 ms
> 2009/08/07 14:53:42.898 64 bytes from 192.168.20.120: icmp_seq=1986 ttl=64 time=0.126 ms
> 2009/08/07 14:53:42.998 64 bytes from 192.168.20.120: icmp_seq=1987 ttl=64 time=0.119 ms
> 2009/08/07 14:53:43.098 64 bytes from 192.168.20.120: icmp_seq=1988 ttl=64 time=0.122 ms
> 2009/08/07 14:53:43.198 64 bytes from 192.168.20.120: icmp_seq=1989 ttl=64 time=0.119 ms
> 2009/08/07 14:53:43.298 64 bytes from 192.168.20.120: icmp_seq=1990 ttl=64 time=0.117 ms
> 2009/08/07 14:53:43.398 64 bytes from 192.168.20.120: icmp_seq=1991 ttl=64 time=0.117 ms
> ping: sendmsg: No buffer space available

Was this also with rate estimators? No buffer space available
indicates that some class/qdisc isn't dequeued or the packets
are leaking, so the output of tc -s -d qdisc show ... might be
helpful.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 13:29       ` Patrick McHardy
@ 2009-09-07 14:23         ` Patrick McHardy
  2009-09-07 17:21           ` Eric Dumazet
  2009-09-08  9:31           ` David Miller
  0 siblings, 2 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 14:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 2247 bytes --]

Patrick McHardy wrote:
> Eric Dumazet wrote:
>> Had very litle time to test this, but got problems very fast, if rate estimator configured.
> 
> I didn't test that, but I'll look into it.
> 
>> qdisc mq 1: root
>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>  rate 177925Kbit 49pps backlog 0b 0p requeues 0
>> qdisc pfifo 8001: parent 1:1 limit 1000p
>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>  rate 25400bit 21pps backlog 0b 0p requeues 0
>>
>> <<<crash>>>
> 
> Did you capture the crash?
> 
>> (On another term I had a "ping -i 0.1 192.168.20.120" that gave :
>>
>> 2009/08/07 14:53:42.498 64 bytes from 192.168.20.120: icmp_seq=1982 ttl=64 time=0.126 ms
>> 2009/08/07 14:53:42.598 64 bytes from 192.168.20.120: icmp_seq=1983 ttl=64 time=0.118 ms
>> 2009/08/07 14:53:42.698 64 bytes from 192.168.20.120: icmp_seq=1984 ttl=64 time=0.114 ms
>> 2009/08/07 14:53:42.798 64 bytes from 192.168.20.120: icmp_seq=1985 ttl=64 time=0.123 ms
>> 2009/08/07 14:53:42.898 64 bytes from 192.168.20.120: icmp_seq=1986 ttl=64 time=0.126 ms
>> 2009/08/07 14:53:42.998 64 bytes from 192.168.20.120: icmp_seq=1987 ttl=64 time=0.119 ms
>> 2009/08/07 14:53:43.098 64 bytes from 192.168.20.120: icmp_seq=1988 ttl=64 time=0.122 ms
>> 2009/08/07 14:53:43.198 64 bytes from 192.168.20.120: icmp_seq=1989 ttl=64 time=0.119 ms
>> 2009/08/07 14:53:43.298 64 bytes from 192.168.20.120: icmp_seq=1990 ttl=64 time=0.117 ms
>> 2009/08/07 14:53:43.398 64 bytes from 192.168.20.120: icmp_seq=1991 ttl=64 time=0.117 ms
>> ping: sendmsg: No buffer space available
> 
> Was this also with rate estimators? No buffer space available
> indicates that some class/qdisc isn't dequeued or the packets
> are leaking, so the output of tc -s -d qdisc show ... might be
> helpful.

I figured out the bug, which is likely responsible for both
problems. When grafting a mq class and creating a rate estimator,
the new qdisc is not attached to the device queue yet and also
doesn't have TC_H_ROOT as parent, so qdisc_create() selects
qdisc_root_sleeping_lock() for the estimator, which belongs to
the qdisc that is getting replaced.

This is a patch I used for testing, but I'll come up with
something more elegant (I hope) as a final fix :)



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1497 bytes --]

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2a78d54..428eb34 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -732,7 +732,8 @@ static struct lock_class_key qdisc_rx_lock;
  */
 
 static struct Qdisc *
-qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
+qdisc_create(struct net_device *dev, struct Qdisc *p,
+	     struct netdev_queue *dev_queue,
 	     u32 parent, u32 handle, struct nlattr **tca, int *errp)
 {
 	int err;
@@ -810,8 +811,9 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 		if (tca[TCA_RATE]) {
 			spinlock_t *root_lock;
 
-			if ((sch->parent != TC_H_ROOT) &&
-			    !(sch->flags & TCQ_F_INGRESS))
+			if (((sch->parent != TC_H_ROOT) &&
+			     !(sch->flags & TCQ_F_INGRESS)) &&
+			    (!p || !p->ops->attach))
 				root_lock = qdisc_root_sleeping_lock(sch);
 			else
 				root_lock = qdisc_lock(sch);
@@ -1097,7 +1099,7 @@ create_n_graft:
 	if (!(n->nlmsg_flags&NLM_F_CREATE))
 		return -ENOENT;
 	if (clid == TC_H_INGRESS)
-		q = qdisc_create(dev, &dev->rx_queue,
+		q = qdisc_create(dev, p, &dev->rx_queue,
 				 tcm->tcm_parent, tcm->tcm_parent,
 				 tca, &err);
 	else {
@@ -1106,7 +1108,7 @@ create_n_graft:
 		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
 			ntx = p->ops->cl_ops->select_queue(p, tcm);
 
-		q = qdisc_create(dev, netdev_get_tx_queue(dev, ntx),
+		q = qdisc_create(dev, p, netdev_get_tx_queue(dev, ntx),
 				 tcm->tcm_parent, tcm->tcm_handle,
 				 tca, &err);
 	}

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: net_sched 05/07: reintroduce dev->qdisc for use by sch_api
  2009-09-07 13:16     ` Patrick McHardy
@ 2009-09-07 16:49       ` Jarek Poplawski
  0 siblings, 0 replies; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-07 16:49 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

On Mon, Sep 07, 2009 at 03:16:29PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> >> @@ -1383,7 +1375,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
> >>  			pid = TC_H_MAKE(qid, pid);
> >>  	} else {
> >>  		if (qid == 0)
> >> -			qid = dev_queue->qdisc_sleeping->handle;
> >> +			qid = dev->qdisc->handle;
> > 
> > Probably I miss something, but in mq root case it seems to never do
> > anything we need. If so, it could be the example of possible issues
> > elsewhere.
> 
> Sorry, I'm not sure what you're saying ..
> 
> > I thought this mq virtual root qdisc could be done more transparently
> > and invisible for the current code, but it seems, in your
> > implementation some pointers like this, or parent ids (especially
> > TC_H_ROOT) might be different, and even if it works OK, needs a lot of
> > verification. So, my question is, if it's really necessary.
> 
> Same here.

Nevermind! I simply had a dream there could be preserved some old
meaning of "root" etc. within a queue but it doesn't make a sense with
this kind of interface.

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 14:23         ` Patrick McHardy
@ 2009-09-07 17:21           ` Eric Dumazet
  2009-09-07 17:28             ` Patrick McHardy
  2009-09-08  9:31           ` David Miller
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2009-09-07 17:21 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

Patrick McHardy a écrit :
> Patrick McHardy wrote:
>> Eric Dumazet wrote:
>>> Had very litle time to test this, but got problems very fast, if rate estimator configured.
>> I didn't test that, but I'll look into it.
>>
>>> qdisc mq 1: root
>>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>>  rate 177925Kbit 49pps backlog 0b 0p requeues 0
>>> qdisc pfifo 8001: parent 1:1 limit 1000p
>>>  Sent 528702 bytes 3491 pkt (dropped 0, overlimits 0 requeues 0)
>>>  rate 25400bit 21pps backlog 0b 0p requeues 0
>>>
>>> <<<crash>>>
>> Did you capture the crash?

No, in fact it was a freeze.

>>
>>> (On another term I had a "ping -i 0.1 192.168.20.120" that gave :
>>>
>>> 2009/08/07 14:53:42.498 64 bytes from 192.168.20.120: icmp_seq=1982 ttl=64 time=0.126 ms
>>> 2009/08/07 14:53:42.598 64 bytes from 192.168.20.120: icmp_seq=1983 ttl=64 time=0.118 ms
>>> 2009/08/07 14:53:42.698 64 bytes from 192.168.20.120: icmp_seq=1984 ttl=64 time=0.114 ms
>>> 2009/08/07 14:53:42.798 64 bytes from 192.168.20.120: icmp_seq=1985 ttl=64 time=0.123 ms
>>> 2009/08/07 14:53:42.898 64 bytes from 192.168.20.120: icmp_seq=1986 ttl=64 time=0.126 ms
>>> 2009/08/07 14:53:42.998 64 bytes from 192.168.20.120: icmp_seq=1987 ttl=64 time=0.119 ms
>>> 2009/08/07 14:53:43.098 64 bytes from 192.168.20.120: icmp_seq=1988 ttl=64 time=0.122 ms
>>> 2009/08/07 14:53:43.198 64 bytes from 192.168.20.120: icmp_seq=1989 ttl=64 time=0.119 ms
>>> 2009/08/07 14:53:43.298 64 bytes from 192.168.20.120: icmp_seq=1990 ttl=64 time=0.117 ms
>>> 2009/08/07 14:53:43.398 64 bytes from 192.168.20.120: icmp_seq=1991 ttl=64 time=0.117 ms
>>> ping: sendmsg: No buffer space available
>> Was this also with rate estimators? No buffer space available
>> indicates that some class/qdisc isn't dequeued or the packets
>> are leaking, so the output of tc -s -d qdisc show ... might be
>> helpful.
> 
> I figured out the bug, which is likely responsible for both
> problems. When grafting a mq class and creating a rate estimator,
> the new qdisc is not attached to the device queue yet and also
> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
> qdisc_root_sleeping_lock() for the estimator, which belongs to
> the qdisc that is getting replaced.
> 
> This is a patch I used for testing, but I'll come up with
> something more elegant (I hope) as a final fix :)

Yes, this was the problem, and your patch fixed it.

Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)

Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
Sep  7 16:37:55 erd kernel: [  217.056911]
Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
Sep  7 16:37:55 erd kernel: [  217.057184]
Sep  7 16:37:55 erd kernel: [  217.057259] Bytes b4 0xf6e62250:  d9 04 00 00 fc 6f fb ff 5a 5a 5a 5a 5a 5a 5a 5a Ù...üoûÿZZZZZZZZ
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62260:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62270:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62280:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62290:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622b0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 76 76 6b 6b kkkkkkkkkkkkvvkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622c0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e622f0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62300:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62310:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62320:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62330:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62340:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Sep  7 16:37:55 erd kernel: [  217.057771]   Object 0xf6e62350:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk¥
Sep  7 16:37:55 erd kernel: [  217.057771]  Redzone 0xf6e62360:  bb bb bb bb                                     »»»»
Sep  7 16:37:55 erd kernel: [  217.057771]  Padding 0xf6e62388:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Sep  7 16:37:55 erd kernel: [  217.057771] Pid: 5334, comm: bash Not tainted 2.6.31-rc5-04006-gedfbc1d-dirty #188
Sep  7 16:37:55 erd kernel: [  217.057771] Call Trace:
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a6d5f>] print_trailer+0xcf/0x120
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a6e69>] check_bytes_and_report+0xb9/0xe0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a7097>] check_object+0x1b7/0x200
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a89b6>] __slab_alloc+0x3d6/0x5a0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02a9602>] __kmalloc+0x172/0x180
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e4c02>] ? load_elf_binary+0x122/0x1550
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e4c02>] load_elf_binary+0x122/0x1550
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c035655e>] ? strrchr+0xe/0x30
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02e2644>] ? load_misc_binary+0x64/0x420
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029190f>] ? page_address+0xcf/0xf0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c0291aac>] ? kmap_high+0x1c/0x1e0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029190f>] ? page_address+0xcf/0xf0
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c029194a>] ? kunmap_high+0x1a/0x90
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02b20d7>] search_binary_handler+0xa7/0x240
Sep  7 16:37:55 erd kernel: [  217.057771]  [<c02b3686>] do_execve+0x2e6/0x3c0
Sep  7 16:37:56 erd kernel: [  217.057771]  [<c0201638>] sys_execve+0x28/0x60
Sep  7 16:37:56 erd kernel: [  217.057771]  [<c0202d08>] sysenter_do_call+0x12/0x26
Sep  7 16:37:56 erd kernel: [  217.057771] FIX kmalloc-256: Restoring 0xf6e622bc-0xf6e622bd=0x6b
Sep  7 16:37:56 erd kernel: [  217.057771]
Sep  7 16:37:56 erd kernel: [  217.057771] FIX kmalloc-256: Marking all objects used

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 17:21           ` Eric Dumazet
@ 2009-09-07 17:28             ` Patrick McHardy
  2009-09-07 17:30               ` Eric Dumazet
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 17:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Eric Dumazet wrote:
>> I figured out the bug, which is likely responsible for both
>> problems. When grafting a mq class and creating a rate estimator,
>> the new qdisc is not attached to the device queue yet and also
>> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
>> qdisc_root_sleeping_lock() for the estimator, which belongs to
>> the qdisc that is getting replaced.
>>
>> This is a patch I used for testing, but I'll come up with
>> something more elegant (I hope) as a final fix :)
> 
> Yes, this was the problem, and your patch fixed it.

Thanks for testing.

> Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)
> 
> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
> Sep  7 16:37:55 erd kernel: [  217.056911]
> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
> Sep  7 16:37:55 erd kernel: [  217.057184]

I'm unable to reproduce this. Could you send me the commands you
used that lead to this?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 17:28             ` Patrick McHardy
@ 2009-09-07 17:30               ` Eric Dumazet
  2009-09-07 17:33                 ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2009-09-07 17:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>>> I figured out the bug, which is likely responsible for both
>>> problems. When grafting a mq class and creating a rate estimator,
>>> the new qdisc is not attached to the device queue yet and also
>>> doesn't have TC_H_ROOT as parent, so qdisc_create() selects
>>> qdisc_root_sleeping_lock() for the estimator, which belongs to
>>> the qdisc that is getting replaced.
>>>
>>> This is a patch I used for testing, but I'll come up with
>>> something more elegant (I hope) as a final fix :)
>> Yes, this was the problem, and your patch fixed it.
> 
> Thanks for testing.
> 
>> Now adding CONFIG_SLUB_DEBUG_ON=y for next tries :)
>>
>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>> Sep  7 16:37:55 erd kernel: [  217.056911]
>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>> Sep  7 16:37:55 erd kernel: [  217.057184]
> 
> I'm unable to reproduce this. Could you send me the commands you
> used that lead to this?
> 

Sorry, this was *before* your last patch.

I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.

enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 17:30               ` Eric Dumazet
@ 2009-09-07 17:33                 ` Patrick McHardy
  2009-09-07 17:38                   ` Eric Dumazet
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Eric Dumazet wrote:
> Patrick McHardy a écrit :
>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>> I'm unable to reproduce this. Could you send me the commands you
>> used that lead to this?
>>
> 
> Sorry, this was *before* your last patch.
> 
> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
> 
> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)

OK, that probably explains it, the spinlock operations were operating
on already freed memory.

I'll do some more testing and will send the final patch if no
other problems show up.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 17:33                 ` Patrick McHardy
@ 2009-09-07 17:38                   ` Eric Dumazet
  2009-09-07 17:46                     ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2009-09-07 17:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Patrick McHardy a écrit :
>>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>>> I'm unable to reproduce this. Could you send me the commands you
>>> used that lead to this?
>>>
>> Sorry, this was *before* your last patch.
>>
>> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
>>
>> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)
> 
> OK, that probably explains it, the spinlock operations were operating
> on already freed memory.
> 
> I'll do some more testing and will send the final patch if no
> other problems show up.

BTW, you may ignore rate estimation requests on the mq root, since its stats
are updated only by user request, when doing a "tc -s -q qdisc" command, while
estimator is fired by a cyclic timer...



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 17:38                   ` Eric Dumazet
@ 2009-09-07 17:46                     ` Patrick McHardy
  0 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-07 17:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> Eric Dumazet wrote:
>>> Patrick McHardy a écrit :
>>>>> Sep  7 16:37:55 erd kernel: [  217.056813] =============================================================================
>>>>> Sep  7 16:37:55 erd kernel: [  217.056865] BUG kmalloc-256: Poison overwritten
>>>>> Sep  7 16:37:55 erd kernel: [  217.056910] -----------------------------------------------------------------------------
>>>>> Sep  7 16:37:55 erd kernel: [  217.056911]
>>>>> Sep  7 16:37:55 erd kernel: [  217.056990] INFO: 0xf6e622bc-0xf6e622bd. First byte 0x76 instead of 0x6b
>>>>> Sep  7 16:37:55 erd kernel: [  217.057049] INFO: Allocated in qdisc_alloc+0x1b/0x80 age=154593 cpu=2 pid=5165
>>>>> Sep  7 16:37:55 erd kernel: [  217.057094] INFO: Freed in qdisc_destroy+0x88/0xa0 age=139186 cpu=4 pid=5173
>>>>> Sep  7 16:37:55 erd kernel: [  217.057139] INFO: Slab 0xc16ddc40 objects=26 used=6 fp=0xf6e62260 flags=0x28040c3
>>>>> Sep  7 16:37:55 erd kernel: [  217.057184] INFO: Object 0xf6e62260 @offset=608 fp=0xf6e62850
>>>>> Sep  7 16:37:55 erd kernel: [  217.057184]
>>>> I'm unable to reproduce this. Could you send me the commands you
>>>> used that lead to this?
>>>>
>>> Sorry, this was *before* your last patch.
>>>
>>> I tried to have more information, because I was not able to get console messages at crash time on this remote dev machine.
>>>
>>> enabling SLUB checks got some hint of what the problem was (using memory block after its freeing by qdisc_destroy)
>> OK, that probably explains it, the spinlock operations were operating
>> on already freed memory.
>>
>> I'll do some more testing and will send the final patch if no
>> other problems show up.
> 
> BTW, you may ignore rate estimation requests on the mq root, since its stats
> are updated only by user request, when doing a "tc -s -q qdisc" command, while
> estimator is fired by a cyclic timer...

Yes, that's probably the cleanest solution. I was considering
cloning the root estimator to the real qdiscs and summing them
up, but for now I think I'll rather disable them on the mq root
completely.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-07 13:27     ` Patrick McHardy
@ 2009-09-07 18:22       ` Jarek Poplawski
  2009-09-07 19:24       ` Jarek Poplawski
  1 sibling, 0 replies; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-07 18:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
...
> >> @@ -1095,10 +1100,16 @@ create_n_graft:
> >>  		q = qdisc_create(dev, &dev->rx_queue,
> >>  				 tcm->tcm_parent, tcm->tcm_parent,
> >>  				 tca, &err);
> >> -	else
> >> -		q = qdisc_create(dev, netdev_get_tx_queue(dev, 0),
> >> +	else {
> >> +		unsigned int ntx = 0;
> >> +
> >> +		if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
> >> +			ntx = p->ops->cl_ops->select_queue(p, tcm);
> > 
> > So, this if could be probably made shorter with a common function, but
> > the main point is: this probably works only for qdiscs having mq as a
> > parent, and not below.
> 
> Yes. mq can only be attached to the root however, so its not
> possible to use it as a child qdisc.

I mean this ->select_queue() works OK for a child qdisc of mq, e.g.
htb, but not for a child qdisc of this htb qdisc, e.g. sfq.

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-07 13:27     ` Patrick McHardy
  2009-09-07 18:22       ` Jarek Poplawski
@ 2009-09-07 19:24       ` Jarek Poplawski
  2009-09-07 19:49         ` Eric Dumazet
  2009-09-09 16:01         ` Patrick McHardy
  1 sibling, 2 replies; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-07 19:24 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
...
> >> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
> >> +{
> >> +	struct net_device *dev = qdisc_dev(sch);
> >> +	struct Qdisc *qdisc;
> >> +	unsigned int ntx;
> >> +
> >> +	sch->q.qlen = 0;
> >> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
> >> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
> >> +
> >> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> >> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
> >> +		spin_lock_bh(qdisc_lock(qdisc));
> >> +		sch->q.qlen		+= qdisc->q.qlen;
> >> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
> >> +		sch->bstats.packets	+= qdisc->bstats.packets;
> >> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
> > 
> > Like in Christoph's case, we should probably use q.qlen instead.
> 
> Its done a few lines above. This simply sums up all members of qstats.

AFAICS these members are updated only in tc_fill_qdisc, starting from
the root, so they might be not up-to-date at the moment, unless I miss
something.

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-07 19:24       ` Jarek Poplawski
@ 2009-09-07 19:49         ` Eric Dumazet
  2009-09-09 16:02           ` Patrick McHardy
  2009-09-09 16:01         ` Patrick McHardy
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Dumazet @ 2009-09-07 19:49 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Patrick McHardy, netdev

Jarek Poplawski a écrit :
> On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
> ...
>>>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
>>>> +{
>>>> +	struct net_device *dev = qdisc_dev(sch);
>>>> +	struct Qdisc *qdisc;
>>>> +	unsigned int ntx;
>>>> +
>>>> +	sch->q.qlen = 0;
>>>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
>>>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
>>>> +
>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>>>> +		spin_lock_bh(qdisc_lock(qdisc));
>>>> +		sch->q.qlen		+= qdisc->q.qlen;
>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
>>> Like in Christoph's case, we should probably use q.qlen instead.
>> Its done a few lines above. This simply sums up all members of qstats.
> 
> AFAICS these members are updated only in tc_fill_qdisc, starting from
> the root, so they might be not up-to-date at the moment, unless I miss
> something.
> 

Yes, we might need an q->ops->update_stats(struct Qdisc *sch) method, and
to recursively call it from mq_update_stats()


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-07 14:23         ` Patrick McHardy
  2009-09-07 17:21           ` Eric Dumazet
@ 2009-09-08  9:31           ` David Miller
  2009-09-08 15:53             ` Patrick McHardy
  1 sibling, 1 reply; 46+ messages in thread
From: David Miller @ 2009-09-08  9:31 UTC (permalink / raw)
  To: kaber; +Cc: eric.dumazet, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 07 Sep 2009 16:23:27 +0200

> This is a patch I used for testing, but I'll come up with
> something more elegant (I hope) as a final fix :)

Thanks for figuring this out Patrick.

Let me know when you have a final patch.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 00/07: classful multiqueue dummy scheduler
  2009-09-08  9:31           ` David Miller
@ 2009-09-08 15:53             ` Patrick McHardy
  0 siblings, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-08 15:53 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Mon, 07 Sep 2009 16:23:27 +0200
>
>   
>> This is a patch I used for testing, but I'll come up with
>> something more elegant (I hope) as a final fix :)
>>     
>
> Thanks for figuring this out Patrick.
>
> Let me know when you have a final patch
>   

Will do. I'm having some trouble with my test system, so might take until
tommorrow.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-07 19:24       ` Jarek Poplawski
  2009-09-07 19:49         ` Eric Dumazet
@ 2009-09-09 16:01         ` Patrick McHardy
  1 sibling, 0 replies; 46+ messages in thread
From: Patrick McHardy @ 2009-09-09 16:01 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
> ...
>>>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
>>>> +{
>>>> +	struct net_device *dev = qdisc_dev(sch);
>>>> +	struct Qdisc *qdisc;
>>>> +	unsigned int ntx;
>>>> +
>>>> +	sch->q.qlen = 0;
>>>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
>>>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
>>>> +
>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>>>> +		spin_lock_bh(qdisc_lock(qdisc));
>>>> +		sch->q.qlen		+= qdisc->q.qlen;
>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
>>> Like in Christoph's case, we should probably use q.qlen instead.
>> Its done a few lines above. This simply sums up all members of qstats.
> 
> AFAICS these members are updated only in tc_fill_qdisc, starting from
> the root, so they might be not up-to-date at the moment, unless I miss
> something.

Right. Its overwritten again in tc_fill_qdisc with the proper
value contained in sch->q.qlen however, so the final value
dumped to userspace is correct. So we can simply remove the
qstats.qlen handling in mq_dump().


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-07 19:49         ` Eric Dumazet
@ 2009-09-09 16:02           ` Patrick McHardy
  2009-09-09 19:52             ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-09 16:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, netdev

Eric Dumazet wrote:
> Jarek Poplawski a écrit :
>> On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
>>> Jarek Poplawski wrote:
>> ...
>>>>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
>>>>> +{
>>>>> +	struct net_device *dev = qdisc_dev(sch);
>>>>> +	struct Qdisc *qdisc;
>>>>> +	unsigned int ntx;
>>>>> +
>>>>> +	sch->q.qlen = 0;
>>>>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
>>>>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
>>>>> +
>>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>>>>> +		spin_lock_bh(qdisc_lock(qdisc));
>>>>> +		sch->q.qlen		+= qdisc->q.qlen;
>>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
>>>> Like in Christoph's case, we should probably use q.qlen instead.
>>> Its done a few lines above. This simply sums up all members of qstats.
>> AFAICS these members are updated only in tc_fill_qdisc, starting from
>> the root, so they might be not up-to-date at the moment, unless I miss
>> something.
>
> Yes, we might need an q->ops->update_stats(struct Qdisc *sch) method, and
> to recursively call it from mq_update_stats()

Unless I'm missing something, that shouldn't be necessary since
sch->q.qlen contains the correct sum of all child qdiscs and
this is used by tc_fill_qdisc to update qstats.qlen.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-09 16:02           ` Patrick McHardy
@ 2009-09-09 19:52             ` Jarek Poplawski
  2009-09-10 11:28               ` Patrick McHardy
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-09 19:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, netdev

On Wed, Sep 09, 2009 at 06:02:59PM +0200, Patrick McHardy wrote:
> Eric Dumazet wrote:
> > Jarek Poplawski a écrit :
> >> On Mon, Sep 07, 2009 at 03:27:37PM +0200, Patrick McHardy wrote:
> >>> Jarek Poplawski wrote:
> >> ...
> >>>>> +static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
> >>>>> +{
> >>>>> +	struct net_device *dev = qdisc_dev(sch);
> >>>>> +	struct Qdisc *qdisc;
> >>>>> +	unsigned int ntx;
> >>>>> +
> >>>>> +	sch->q.qlen = 0;
> >>>>> +	memset(&sch->bstats, 0, sizeof(sch->bstats));
> >>>>> +	memset(&sch->qstats, 0, sizeof(sch->qstats));
> >>>>> +
> >>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> >>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
> >>>>> +		spin_lock_bh(qdisc_lock(qdisc));
> >>>>> +		sch->q.qlen		+= qdisc->q.qlen;
> >>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
> >>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
> >>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
> >>>> Like in Christoph's case, we should probably use q.qlen instead.
> >>> Its done a few lines above. This simply sums up all members of qstats.
> >> AFAICS these members are updated only in tc_fill_qdisc, starting from
> >> the root, so they might be not up-to-date at the moment, unless I miss
> >> something.
> >
> > Yes, we might need an q->ops->update_stats(struct Qdisc *sch) method, and
> > to recursively call it from mq_update_stats()
> 
> Unless I'm missing something, that shouldn't be necessary since
> sch->q.qlen contains the correct sum of all child qdiscs and
> this is used by tc_fill_qdisc to update qstats.qlen.

You're perfectly right! (And the code is perfectly misleading.;-)

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-09 19:52             ` Jarek Poplawski
@ 2009-09-10 11:28               ` Patrick McHardy
  2009-09-11 21:38                 ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: Patrick McHardy @ 2009-09-10 11:28 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, netdev

Jarek Poplawski wrote:
> On Wed, Sep 09, 2009 at 06:02:59PM +0200, Patrick McHardy wrote:
>>>>>>> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
>>>>>>> +		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
>>>>>>> +		spin_lock_bh(qdisc_lock(qdisc));
>>>>>>> +		sch->q.qlen		+= qdisc->q.qlen;
>>>>>>> +		sch->bstats.bytes	+= qdisc->bstats.bytes;
>>>>>>> +		sch->bstats.packets	+= qdisc->bstats.packets;
>>>>>>> +		sch->qstats.qlen	+= qdisc->qstats.qlen;
>>>>>> Like in Christoph's case, we should probably use q.qlen instead.
>>>>> Its done a few lines above. This simply sums up all members of qstats.
>>>> AFAICS these members are updated only in tc_fill_qdisc, starting from
>>>> the root, so they might be not up-to-date at the moment, unless I miss
>>>> something.
>>> Yes, we might need an q->ops->update_stats(struct Qdisc *sch) method, and
>>> to recursively call it from mq_update_stats()
>> Unless I'm missing something, that shouldn't be necessary since
>> sch->q.qlen contains the correct sum of all child qdiscs and
>> this is used by tc_fill_qdisc to update qstats.qlen.
> 
> You're perfectly right! (And the code is perfectly misleading.;-)

I'll remove the misleading (and unnecessary) line of code, thanks Jarek.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-10 11:28               ` Patrick McHardy
@ 2009-09-11 21:38                 ` Jarek Poplawski
  2009-09-11 22:10                   ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-11 21:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, netdev, David Miller

On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
...
> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.

Btw, I guess David owes you one classful(!) dummy(?) scheduler...

Jarek P.

commit 6ec1c69a8f6492fd25722f4762721921da074c12
Author: David S. Miller <davem@davemloft.net>
Date:   Sun Sep 6 01:58:51 2009 -0700

    net_sched: add classful multiqueue dummy scheduler
    
    This patch adds a classful dummy scheduler which can be used as root qdisc
    for multiqueue devices and exposes each device queue as a child class.
    
    This allows to address queues individually and graft them similar to regular
    classes. Additionally it presents an accumulated view of the statistics of
    all real root qdiscs in the dummy root.
    
    Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
    
    - cl_ops->select_queue selects the tx queue number for new child classes.
    
    - qdisc_ops->attach() overrides root qdisc device grafting to attach
      non-shared qdiscs to the queues.
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
new file mode 100644
index 0000000..c84dec9
--- /dev/null
+++ b/net/sched/sch_mq.c


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-11 21:38                 ` Jarek Poplawski
@ 2009-09-11 22:10                   ` David Miller
  2009-09-11 22:21                     ` Jarek Poplawski
  0 siblings, 1 reply; 46+ messages in thread
From: David Miller @ 2009-09-11 22:10 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, eric.dumazet, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 11 Sep 2009 23:38:13 +0200

> On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
> ...
>> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.
> 
> Btw, I guess David owes you one classful(!) dummy(?) scheduler...

Did I forget to add the sch_mq.c file to the tree?  What are
you saying? :-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-11 22:10                   ` David Miller
@ 2009-09-11 22:21                     ` Jarek Poplawski
  2009-09-11 22:27                       ` David Miller
  0 siblings, 1 reply; 46+ messages in thread
From: Jarek Poplawski @ 2009-09-11 22:21 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, eric.dumazet, netdev

On Fri, Sep 11, 2009 at 03:10:39PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Fri, 11 Sep 2009 23:38:13 +0200
> 
> > On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
> > ...
> >> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.
> > 
> > Btw, I guess David owes you one classful(!) dummy(?) scheduler...
> 
> Did I forget to add the sch_mq.c file to the tree?  What are
> you saying? :-)

commit 6ec1c69a8f6492fd25722f4762721921da074c12
Author: David S. Miller <davem@davemloft.net>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?! ;-)

Jarek P.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: net_sched 07/07: add classful multiqueue dummy scheduler
  2009-09-11 22:21                     ` Jarek Poplawski
@ 2009-09-11 22:27                       ` David Miller
  0 siblings, 0 replies; 46+ messages in thread
From: David Miller @ 2009-09-11 22:27 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, eric.dumazet, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sat, 12 Sep 2009 00:21:22 +0200

> On Fri, Sep 11, 2009 at 03:10:39PM -0700, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Fri, 11 Sep 2009 23:38:13 +0200
>> 
>> > On Thu, Sep 10, 2009 at 01:28:59PM +0200, Patrick McHardy wrote:
>> > ...
>> >> I'll remove the misleading (and unnecessary) line of code, thanks Jarek.
>> > 
>> > Btw, I guess David owes you one classful(!) dummy(?) scheduler...
>> 
>> Did I forget to add the sch_mq.c file to the tree?  What are
>> you saying? :-)
> 
> commit 6ec1c69a8f6492fd25722f4762721921da074c12
> Author: David S. Miller <davem@davemloft.net>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?! ;-)

:-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2009-09-11 22:27 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-04 16:41 net_sched 00/07: classful multiqueue dummy scheduler Patrick McHardy
2009-09-04 16:41 ` net_sched 01/07: fix class grafting errno codes Patrick McHardy
2009-09-04 16:41 ` net_sched 02/07: make cls_ops->tcf_chain() optional Patrick McHardy
2009-09-05  8:13   ` Jarek Poplawski
2009-09-05 11:57     ` Jarek Poplawski
2009-09-05 12:32       ` Jarek Poplawski
2009-09-05 17:03         ` Patrick McHardy
2009-09-06  9:06           ` David Miller
2009-09-04 16:41 ` net_sched 03/07: make cls_ops->change and cls_ops->delete optional Patrick McHardy
2009-09-04 16:41 ` net_sched 04/07: remove some unnecessary checks in classful schedulers Patrick McHardy
2009-09-04 16:41 ` net_sched 05/07: reintroduce dev->qdisc for use by sch_api Patrick McHardy
2009-09-06 18:57   ` Jarek Poplawski
2009-09-07 13:16     ` Patrick McHardy
2009-09-07 16:49       ` Jarek Poplawski
2009-09-04 16:41 ` net_sched 06/07: move dev_graft_qdisc() to sch_generic.c Patrick McHardy
2009-09-04 16:41 ` net_sched 07/07: add classful multiqueue dummy scheduler Patrick McHardy
2009-09-06 20:04   ` Jarek Poplawski
2009-09-07 13:27     ` Patrick McHardy
2009-09-07 18:22       ` Jarek Poplawski
2009-09-07 19:24       ` Jarek Poplawski
2009-09-07 19:49         ` Eric Dumazet
2009-09-09 16:02           ` Patrick McHardy
2009-09-09 19:52             ` Jarek Poplawski
2009-09-10 11:28               ` Patrick McHardy
2009-09-11 21:38                 ` Jarek Poplawski
2009-09-11 22:10                   ` David Miller
2009-09-11 22:21                     ` Jarek Poplawski
2009-09-11 22:27                       ` David Miller
2009-09-09 16:01         ` Patrick McHardy
2009-09-04 16:42 ` net_sched 00/07: " Patrick McHardy
2009-09-07  8:50   ` David Miller
2009-09-07  9:46     ` Jarek Poplawski
2009-09-07 13:00     ` Eric Dumazet
2009-09-07 13:29       ` Patrick McHardy
2009-09-07 14:23         ` Patrick McHardy
2009-09-07 17:21           ` Eric Dumazet
2009-09-07 17:28             ` Patrick McHardy
2009-09-07 17:30               ` Eric Dumazet
2009-09-07 17:33                 ` Patrick McHardy
2009-09-07 17:38                   ` Eric Dumazet
2009-09-07 17:46                     ` Patrick McHardy
2009-09-08  9:31           ` David Miller
2009-09-08 15:53             ` Patrick McHardy
2009-09-05  7:27 ` David Miller
2009-09-05 17:02   ` Patrick McHardy
2009-09-06  9:01     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.