All of lore.kernel.org
 help / color / mirror / Atom feed
* CBQ broken in 2.6
@ 2010-01-27 11:05 Anton Ivanov
  2010-01-27 11:56 ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Ivanov @ 2010-01-27 11:05 UTC (permalink / raw)
  To: linux-kernel

Hi list,

I have been looking at CBQ being broken in 2.6 (Debian BUG 534430).

The bug is very simple and it makes it unusable for all practical
purposes - BOUNDED classes always borrow from their parent. As a result
it impossible to:

1. Set and enforce a bandwidth limit to a class with a CBQ tree
2. Use CBQ to do QoS at any place but the actual link. For example - put
a linux box after behind a DSL modem and tell it to QoS at the bandwidth
at which the modem is configured.

I have now looked through the code of cbq_sched.c (in 2.6.33-rc5 and
2.6.26 as shipped by Debian).

In both pieces of code the variable in the class structure for CBQ
cl->borrow is used to signal if borrowing is allowed. If borrowing is
not allowed (BOUNDED class) it is set to NULL.

However the routine which parses the class options and defines the class
cbq_change_class() sets it to the parent class and never resets it to
NULL for bounded classes.

Am I missing something here? 

Please cc me as I am not subscribed to the list.

Best Regards,

-- 
   Understanding is a three-edged sword:
            your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk
WWW:     http://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-27 11:05 CBQ broken in 2.6 Anton Ivanov
@ 2010-01-27 11:56 ` David Miller
  2010-01-27 12:28   ` Anton Ivanov
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2010-01-27 11:56 UTC (permalink / raw)
  To: anton.ivanov; +Cc: linux-kernel, netdev

From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
Date: Wed, 27 Jan 2010 11:05:47 +0000

> I have been looking at CBQ being broken in 2.6 (Debian BUG 534430).

Please direct your question to netdev@vger.kernel.org which is
where the networking experts hang out.

Added to CC:

> The bug is very simple and it makes it unusable for all practical
> purposes - BOUNDED classes always borrow from their parent. As a result
> it impossible to:
> 
> 1. Set and enforce a bandwidth limit to a class with a CBQ tree
> 2. Use CBQ to do QoS at any place but the actual link. For example - put
> a linux box after behind a DSL modem and tell it to QoS at the bandwidth
> at which the modem is configured.
> 
> I have now looked through the code of cbq_sched.c (in 2.6.33-rc5 and
> 2.6.26 as shipped by Debian).

I think you mean net/sched/sch_cbq.c, there is no cbq_sched.c
anywhere in the source tree.

> In both pieces of code the variable in the class structure for CBQ
> cl->borrow is used to signal if borrowing is allowed. If borrowing is
> not allowed (BOUNDED class) it is set to NULL.
> 
> However the routine which parses the class options and defines the class
> cbq_change_class() sets it to the parent class and never resets it to
> NULL for bounded classes.
> 
> Am I missing something here? 
> 
> Please cc me as I am not subscribed to the list.
> 
> Best Regards,
> 
> -- 
>    Understanding is a three-edged sword:
>             your side, their side, and the truth. --Kosh Naranek
> 
> A. R. Ivanov
> E-mail:  anton.ivanov@kot-begemot.co.uk
> WWW:     http://www.kot-begemot.co.uk/
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-27 11:56 ` David Miller
@ 2010-01-27 12:28   ` Anton Ivanov
  2010-01-28 12:53     ` Jarek Poplawski
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Ivanov @ 2010-01-27 12:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Thanks it is indeed sch_cbq.c,

I have been going through the code in the meantime and found a place
where cl->tparent can be modified further down in the routine in
question. There is an invocation of cbq_set_lss() around the end of
cbq_change_class() which can do that.

I am rebuilding the kernel for my CBQ box with a few printks at the
moment to see if it modified there or not.

In any case here is the tell-tale symptom:

class cbq 1:16 parent 1: leaf 76: rate 5600Kbit (bounded,isolated) prio
2
 Sent 162051 bytes 925 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
  borrowed 201 overactions 0 avgidle 78 undertime 0

That is a bounded class. Its borrowed should be always 0 no matter what.
That is basically a broken CBQ implementation.

An elementary network test suite shows the same result - it is being
allowed to borrow.

I am happy to send the whole config if necessary if someone wants to
look at it.

Brgds,

[snip]

-- 
   Understanding is a three-edged sword:
            your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk
WWW:     http://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-27 12:28   ` Anton Ivanov
@ 2010-01-28 12:53     ` Jarek Poplawski
  2010-01-28 13:34       ` Anton Ivanov
  0 siblings, 1 reply; 10+ messages in thread
From: Jarek Poplawski @ 2010-01-28 12:53 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: David Miller, netdev

On 27-01-2010 13:28, Anton Ivanov wrote:
> Thanks it is indeed sch_cbq.c,
> 
> I have been going through the code in the meantime and found a place
> where cl->tparent can be modified further down in the routine in
> question. There is an invocation of cbq_set_lss() around the end of
> cbq_change_class() which can do that.
> 
> I am rebuilding the kernel for my CBQ box with a few printks at the
> moment to see if it modified there or not.
> 
> In any case here is the tell-tale symptom:
> 
> class cbq 1:16 parent 1: leaf 76: rate 5600Kbit (bounded,isolated) prio
> 2
>  Sent 162051 bytes 925 pkt (dropped 0, overlimits 0 requeues 0)
>  rate 0bit 0pps backlog 0b 0p requeues 0
>   borrowed 201 overactions 0 avgidle 78 undertime 0
> 
> That is a bounded class. Its borrowed should be always 0 no matter what.
> That is basically a broken CBQ implementation.
> 
> An elementary network test suite shows the same result - it is being
> allowed to borrow.
> 
> I am happy to send the whole config if necessary if someone wants to
> look at it.

Was this class created as bounded or changed later? Did it have any
child?

Jarek P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 12:53     ` Jarek Poplawski
@ 2010-01-28 13:34       ` Anton Ivanov
  2010-01-28 18:45         ` Jarek Poplawski
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Ivanov @ 2010-01-28 13:34 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev

Hi Jarek,

The class was created as bounded and never changed. I do a full reload
of the hierarchy including a root qdisc delete via a script instead of
trying to adjust it. 

It has 7 or so children and one sibling. It is parented to root.

I have recompiled sch_cbq.c with a number of printks in strategic
places. I have also added some debug output to tc. I have also rewritten
my test suite.

The results are as follows

Based on the printks cbq_set_lss and the cbq_change_class functions
correctly.

It indeed sets cl->borrow and cl->share to NULL as expected.

Based on results from the fixed test suite it also works as expected,
just with much lower precision than what I used to get from 2.6.9 and
2.6.18.

However, it still returns complete bonkers for stats. 

Example (same class, I just moved the hierarchy around a bit trying to
get a better fix on this so it is now 1:15).

class cbq 1:15 parent 1: leaf 76: rate 5600Kbit (bounded) prio 1
 Sent 10920592 bytes 14420 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
  borrowed 8311 overactions 0 avgidle 78 undertime 0

It is bounded, but according to stats it has borrowed and has no
overactions. 

If the stats are to be believed it is a bug somewhere which is eluding
me (and my test suite and printks are wrong somehow). 

Alternatively, it may be a bug with the stats themselves which once
again I do not see in the actual sch_cbq.c file.

Brgds,


On Thu, 2010-01-28 at 12:53 +0000, Jarek Poplawski wrote:
> On 27-01-2010 13:28, Anton Ivanov wrote:
> > Thanks it is indeed sch_cbq.c,
> > 
> > I have been going through the code in the meantime and found a place
> > where cl->tparent can be modified further down in the routine in
> > question. There is an invocation of cbq_set_lss() around the end of
> > cbq_change_class() which can do that.
> > 
> > I am rebuilding the kernel for my CBQ box with a few printks at the
> > moment to see if it modified there or not.
> > 
> > In any case here is the tell-tale symptom:
> > 
> > class cbq 1:16 parent 1: leaf 76: rate 5600Kbit (bounded,isolated) prio
> > 2
> >  Sent 162051 bytes 925 pkt (dropped 0, overlimits 0 requeues 0)
> >  rate 0bit 0pps backlog 0b 0p requeues 0
> >   borrowed 201 overactions 0 avgidle 78 undertime 0
> > 
> > That is a bounded class. Its borrowed should be always 0 no matter what.
> > That is basically a broken CBQ implementation.
> > 
> > An elementary network test suite shows the same result - it is being
> > allowed to borrow.
> > 
> > I am happy to send the whole config if necessary if someone wants to
> > look at it.
> 
> Was this class created as bounded or changed later? Did it have any
> child?
> 
> Jarek P.
> 
-- 
   Understanding is a three-edged sword:
            your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk
WWW:     http://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 13:34       ` Anton Ivanov
@ 2010-01-28 18:45         ` Jarek Poplawski
  2010-01-28 19:48           ` Anton Ivanov
  0 siblings, 1 reply; 10+ messages in thread
From: Jarek Poplawski @ 2010-01-28 18:45 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: David Miller, netdev

On Thu, Jan 28, 2010 at 01:34:56PM +0000, Anton Ivanov wrote:
> Hi Jarek,
Hi Anton,

> 
> The class was created as bounded and never changed. I do a full reload
> of the hierarchy including a root qdisc delete via a script instead of
> trying to adjust it. 
> 
> It has 7 or so children and one sibling. It is parented to root.
> 
> I have recompiled sch_cbq.c with a number of printks in strategic
> places. I have also added some debug output to tc. I have also rewritten
> my test suite.
> 
> The results are as follows
> 
> Based on the printks cbq_set_lss and the cbq_change_class functions
> correctly.
> 
> It indeed sets cl->borrow and cl->share to NULL as expected.
> 
> Based on results from the fixed test suite it also works as expected,
> just with much lower precision than what I used to get from 2.6.9 and
> 2.6.18.
> 
> However, it still returns complete bonkers for stats. 
> 
> Example (same class, I just moved the hierarchy around a bit trying to
> get a better fix on this so it is now 1:15).
> 
> class cbq 1:15 parent 1: leaf 76: rate 5600Kbit (bounded) prio 1
>  Sent 10920592 bytes 14420 pkt (dropped 0, overlimits 0 requeues 0)
>  rate 0bit 0pps backlog 0b 0p requeues 0
>   borrowed 8311 overactions 0 avgidle 78 undertime 0
> 
> It is bounded, but according to stats it has borrowed and has no
> overactions. 
> 
> If the stats are to be believed it is a bug somewhere which is eluding
> me (and my test suite and printks are wrong somehow). 
> 
> Alternatively, it may be a bug with the stats themselves which once
> again I do not see in the actual sch_cbq.c file.

For now I could only find some explanation in this place:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/sch_cbq.c;h=3846d65bc03ec7a7de7e6ca674dc6c2b255dc570;hb=HEAD

820 static __inline__ struct sk_buff *
821 cbq_dequeue_prio(struct Qdisc *sch, int prio)
822 {
...
863                         if (borrow != cl) {
864 #ifndef CBQ_XSTATS_BORROWS_BYTES
865                                 borrow->xstats.borrows++;
866                                 cl->xstats.borrows++;
867 #else
868                                 borrow->xstats.borrows += qdisc_pkt_len(skb);
869                                 cl->xstats.borrows += qdisc_pkt_len(skb);
870 #endif

So, xstats.borrows is updated for lender as well, which might look
wrong/right/funny (depending on your political principles ;-).

It could be your case if the class above has an unbounded child class.
Otherwise, it needs more searching. Btw, I'm not the CBQ expert to
verify (without learning the specs) these borrowing relations. (As a
matter of fact, within a few years I didn't find here many traces of
such (active) experts, so my recommendation would be HTB or HFSC
unless you really know what you're doing ;-)

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 18:45         ` Jarek Poplawski
@ 2010-01-28 19:48           ` Anton Ivanov
  2010-01-28 21:21             ` Jarek Poplawski
  2010-01-28 21:27             ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Ivanov @ 2010-01-28 19:48 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev

Hi Jarek,

[snip]


> > Alternatively, it may be a bug with the stats themselves which once
> > again I do not see in the actual sch_cbq.c file.
> 
> For now I could only find some explanation in this place:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/sch_cbq.c;h=3846d65bc03ec7a7de7e6ca674dc6c2b255dc570;hb=HEAD
> 
> 820 static __inline__ struct sk_buff *
> 821 cbq_dequeue_prio(struct Qdisc *sch, int prio)
> 822 {
> ...
> 863                         if (borrow != cl) {
> 864 #ifndef CBQ_XSTATS_BORROWS_BYTES
> 865                                 borrow->xstats.borrows++;
> 866                                 cl->xstats.borrows++;
> 867 #else
> 868                                 borrow->xstats.borrows += qdisc_pkt_len(skb);
> 869                                 cl->xstats.borrows += qdisc_pkt_len(skb);
> 870 #endif
> 
> So, xstats.borrows is updated for lender as well, which might look
> wrong/right/funny (depending on your political principles ;-).

OK. I will look at it. The traffic is actually in a subclass (1:18) so
this may explain it. It definitely does not fit my "political
principles". Borrow should mean how many times the class has borrowed
and overactions should mean the whole lot of overs - delay, borrow,
drop, etc. At least that were the semantics on BSD and other
implementations I dealt with before Linux. 

My problem as well is that my C has always sucked bricks and I use it
once a year when I run into a problem like that. So it takes me a while
to trace it and figure it out.

> 
> It could be your case if the class above has an unbounded child class.
> Otherwise, it needs more searching. Btw, I'm not the CBQ expert to
> verify (without learning the specs) these borrowing relations. (As a
> matter of fact, within a few years I didn't find here many traces of
> such (active) experts, so my recommendation would be HTB or HFSC
> unless you really know what you're doing ;-)

Well, I know CBQ reasonably well from a "network engineer" perspective.
I used to run it production in an ISP 1998 and after that in an SMB
2002-2007. I have also been running it in lab and on my home network for
10+ years. Mostly on BSD though. There was even a point where I
maintained a fork of the patches before they went into the main tree.
The sole reason I ditched BSD was that they plugged it into the pf
framework and it became too big of a hassle to port/maintain the
configs.

I know what I am doing and HTB is not what I want. It is a good algo for
expressing classic commerical relationships (CIR/PIR). It is worse than
CBQ for "sharing" where all of the apps that share are your own and the
scheduler/estimator is good.

Thanks for all the help. 

> 
> Regards,
> Jarek P.
> 
-- 
   Understanding is a three-edged sword:
            your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk
WWW:     http://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 19:48           ` Anton Ivanov
@ 2010-01-28 21:21             ` Jarek Poplawski
  2010-01-28 21:27             ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Jarek Poplawski @ 2010-01-28 21:21 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: David Miller, netdev

On Thu, Jan 28, 2010 at 07:48:55PM +0000, Anton Ivanov wrote:
> Hi Jarek,
Hi Anton,
...
> > So, xstats.borrows is updated for lender as well, which might look
> > wrong/right/funny (depending on your political principles ;-).
> 
> OK. I will look at it. The traffic is actually in a subclass (1:18) so
> this may explain it. It definitely does not fit my "political
> principles". Borrow should mean how many times the class has borrowed
> and overactions should mean the whole lot of overs - delay, borrow,
> drop, etc. At least that were the semantics on BSD and other
> implementations I dealt with before Linux. 

Of course I was joking, and no need to explain this is at least
misleading, but if it's old enough it might be treated as part of user
API. Probably somebody wanted to save an additional variable, and btw.
in some languages there is one verb only, something like here:

"bounded
	Signifies that this class will not borrow bandwidth from its siblings.

 isolated
        Means that this class will not borrow bandwidth to its siblings"

[from man(8)tc-cbq]

So, consider it documented ;-)

> 
> My problem as well is that my C has always sucked bricks and I use it
> once a year when I run into a problem like that. So it takes me a while
> to trace it and figure it out.
> 
> > 
> > It could be your case if the class above has an unbounded child class.
> > Otherwise, it needs more searching. Btw, I'm not the CBQ expert to
> > verify (without learning the specs) these borrowing relations. (As a
> > matter of fact, within a few years I didn't find here many traces of
> > such (active) experts, so my recommendation would be HTB or HFSC
> > unless you really know what you're doing ;-)
> 
> Well, I know CBQ reasonably well from a "network engineer" perspective.
> I used to run it production in an ISP 1998 and after that in an SMB
> 2002-2007. I have also been running it in lab and on my home network for
> 10+ years. Mostly on BSD though. There was even a point where I
> maintained a fork of the patches before they went into the main tree.
> The sole reason I ditched BSD was that they plugged it into the pf
> framework and it became too big of a hassle to port/maintain the
> configs.
> 
> I know what I am doing and HTB is not what I want. It is a good algo for
> expressing classic commerical relationships (CIR/PIR). It is worse than
> CBQ for "sharing" where all of the apps that share are your own and the
> scheduler/estimator is good.

Well, congratulations! But, please, remember this nice, but a very
complex tool is rarely improved/fixed now, mainly not to break some
backward compatibility, while not much reporters/testers either. But,
of course, any feedback is welcome...

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 19:48           ` Anton Ivanov
  2010-01-28 21:21             ` Jarek Poplawski
@ 2010-01-28 21:27             ` Eric Dumazet
  2010-01-29 12:25               ` Anton Ivanov
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2010-01-28 21:27 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Jarek Poplawski, David Miller, netdev

Le jeudi 28 janvier 2010 à 19:48 +0000, Anton Ivanov a écrit :

> Well, I know CBQ reasonably well from a "network engineer" perspective.
> I used to run it production in an ISP 1998 and after that in an SMB
> 2002-2007. I have also been running it in lab and on my home network for
> 10+ years. Mostly on BSD though. There was even a point where I
> maintained a fork of the patches before they went into the main tree.
> The sole reason I ditched BSD was that they plugged it into the pf
> framework and it became too big of a hassle to port/maintain the
> configs.
> 
> I know what I am doing and HTB is not what I want. It is a good algo for
> expressing classic commerical relationships (CIR/PIR). It is worse than
> CBQ for "sharing" where all of the apps that share are your own and the
> scheduler/estimator is good.
> 
> Thanks for all the help. 

I am using CBQ myself with recent kernels and never found it
'borrowing', could you post a copy of your rules, or better, a subset of
them desmonstrating the problem ?




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CBQ broken in 2.6
  2010-01-28 21:27             ` Eric Dumazet
@ 2010-01-29 12:25               ` Anton Ivanov
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Ivanov @ 2010-01-29 12:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jarek Poplawski, David Miller, netdev

[snip]

> > Thanks for all the help. 
> 
> I am using CBQ myself with recent kernels and never found it
> 'borrowing', could you post a copy of your rules, or better, a subset of
> them desmonstrating the problem ?

Actually after going through it several times and looking at the code
Jarek pointed out it doesn't. 

Just the stats are very confusing and precision is not particularly
great.

It says "borrowed" while actually it is the child which has borrowed
from this class, not the class itself borrowing. The class has been
borrowed from, not it itself borrowed. 

As nobody else has complained about the stats they should be probably
left as is according to the "least surprise" principle.

As far as the precision becoming worse over the last 15 or so minor
revision 2.6.9 to 2.6.26 that cannot be helped. On my hardware it was so
bad that I had it confused with not working at all at some point. I
ended up sticking aggressive RED leafs on the biggest "offenders" and
this has gotten my config to a working state for now. I am not happy
with it, but it is better than no QoS at all.

> 
> 
> 
> 
-- 
   Understanding is a three-edged sword:
            your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.ivanov@kot-begemot.co.uk
WWW:     http://www.kot-begemot.co.uk/



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-01-29 12:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-27 11:05 CBQ broken in 2.6 Anton Ivanov
2010-01-27 11:56 ` David Miller
2010-01-27 12:28   ` Anton Ivanov
2010-01-28 12:53     ` Jarek Poplawski
2010-01-28 13:34       ` Anton Ivanov
2010-01-28 18:45         ` Jarek Poplawski
2010-01-28 19:48           ` Anton Ivanov
2010-01-28 21:21             ` Jarek Poplawski
2010-01-28 21:27             ` Eric Dumazet
2010-01-29 12:25               ` Anton Ivanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.