[PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
@ 2008-12-09 10:21 Jarek Poplawski
  2008-12-09 10:28 ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-09 10:21 UTC (permalink / raw)
  To: David Miller; +Cc: Martin Devera, Patrick McHardy, netdev

Next event time should consider jiffies used for recounting. Otherwise
qdisc_watchdog_schedule() triggers hrtimer immediately with the event
in the past, and may cause very high ksoftirqd cpu usage (if highres
is on). This patch charges jiffies globally, so we can skip this in
htb_do_events().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
 net/sched/sch_htb.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index d6eb4a7..102866d 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -685,8 +685,11 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie (including above) */
-	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
+	/*
+	 * Too much load - let's continue on next jiffie.
+	 * (Used jiffies are charged later.)
+	 */
+	return q->now + 1;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -845,6 +848,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	struct htb_sched *q = qdisc_priv(sch);
 	int level;
 	psched_time_t next_event;
+	unsigned long start_at;
 
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
@@ -857,6 +861,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	if (!sch->q.qlen)
 		goto fin;
 	q->now = psched_get_time();
+	start_at = jiffies;
 
 	next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
 
@@ -889,6 +894,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		}
 	}
 	sch->qstats.overlimits++;
+	/* charge used jiffies */
+	start_at = jiffies - start_at;
+	if (start_at > 0)
+		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
+
 	qdisc_watchdog_schedule(&q->watchdog, next_event);
 fin:
 	return skb;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 10:21 [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
@ 2008-12-09 10:28 ` Patrick McHardy
  2008-12-09 11:32   ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2008-12-09 10:28 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Martin Devera, netdev

Jarek Poplawski wrote:
> Next event time should consider jiffies used for recounting. Otherwise
> qdisc_watchdog_schedule() triggers hrtimer immediately with the event
> in the past, and may cause very high ksoftirqd cpu usage (if highres
> is on). This patch charges jiffies globally, so we can skip this in
> htb_do_events().
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> ---
>  net/sched/sch_htb.c |   14 ++++++++++++--
>  1 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index d6eb4a7..102866d 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -685,8 +685,11 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
>  		if (cl->cmode != HTB_CAN_SEND)
>  			htb_add_to_wait_tree(q, cl, diff);
>  	}
> -	/* too much load - let's continue on next jiffie (including above) */
> -	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
> +	/*
> +	 * Too much load - let's continue on next jiffie.
> +	 * (Used jiffies are charged later.)
> +	 */
> +	return q->now + 1;
>  

This (including the last patch) is really confusing - q->now doesn't
contain HZ values but psched ticks. Could you describe the overall
algorithm you're trying to implement please?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 10:28 ` Patrick McHardy
@ 2008-12-09 11:32   ` Jarek Poplawski
  2008-12-09 12:25     ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-09 11:32 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Martin Devera, netdev

On Tue, Dec 09, 2008 at 11:28:44AM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Next event time should consider jiffies used for recounting. Otherwise
>> qdisc_watchdog_schedule() triggers hrtimer immediately with the event
>> in the past, and may cause very high ksoftirqd cpu usage (if highres
>> is on). This patch charges jiffies globally, so we can skip this in
>> htb_do_events().
>>
>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>> ---
>>  net/sched/sch_htb.c |   14 ++++++++++++--
>>  1 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>> index d6eb4a7..102866d 100644
>> --- a/net/sched/sch_htb.c
>> +++ b/net/sched/sch_htb.c
>> @@ -685,8 +685,11 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
>>  		if (cl->cmode != HTB_CAN_SEND)
>>  			htb_add_to_wait_tree(q, cl, diff);
>>  	}
>> -	/* too much load - let's continue on next jiffie (including above) */
>> -	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
>> +	/*
>> +	 * Too much load - let's continue on next jiffie.
>> +	 * (Used jiffies are charged later.)
>> +	 */
>> +	return q->now + 1;
>>  
>
> This (including the last patch) is really confusing - q->now doesn't
> contain HZ values but psched ticks. Could you describe the overall
> algorithm you're trying to implement please?

The algorithm is we want to "continue on the next jiffie". We know
we've lost here a lot of time (~2 jiffies), and this will be added
later. Since these jiffies are not precise enough wrt. psched ticks
or ktime, and we will add around 2000 (for HZ 1000) psched ticks
anyway this +1 here simply doesn't matter and can mean "a bit after
q->now".

We can try to do this more precisely with additional psched_get_time(),
instead of jiffies, but my "tests" didn't show any advantage. What
matters is to avoid longer scheduling with the past time.

This case can probably happen only with very low rate limits for a
large number of classes, and I guess they simply need some spare time,
not necessarily at psched tick precision.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 11:32   ` Jarek Poplawski
@ 2008-12-09 12:25     ` Patrick McHardy
  2008-12-09 13:08       ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2008-12-09 12:25 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Martin Devera, netdev

Jarek Poplawski wrote:
> On Tue, Dec 09, 2008 at 11:28:44AM +0100, Patrick McHardy wrote:
>>> -	/* too much load - let's continue on next jiffie (including above) */
>>> -	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
>>> +	/*
>>> +	 * Too much load - let's continue on next jiffie.
>>> +	 * (Used jiffies are charged later.)
>>> +	 */
>>> +	return q->now + 1;
>>>  
>> This (including the last patch) is really confusing - q->now doesn't
>> contain HZ values but psched ticks. Could you describe the overall
>> algorithm you're trying to implement please?
> 
> The algorithm is we want to "continue on the next jiffie". We know
> we've lost here a lot of time (~2 jiffies), and this will be added
> later. Since these jiffies are not precise enough wrt. psched ticks
> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
> anyway this +1 here simply doesn't matter and can mean "a bit after
> q->now".

This might as well return q->now, no? The elapsed time is added
on top later anyways. But anyways, I think both the approach and
the patch are wrong.

	/* charge used jiffies */
	start_at = jiffies - start_at;
	if (start_at > 0)
		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;

What relationship does the duration it ran for has to the time it
should run at again?

The focus on jiffies is wrong IMO, the reason why we get high
load is because the CPU can't keep up, delaying things even
longer is not going to help get the work done. The only reason to
look at jiffies is because its a cheap indication that it has
ran for too long and we should give other tasks a change to run
as well, but it should continue immediately after it did that.
So all it should do is make sure that the watchdog is scheduled
with a very small positive delay.

As for the implementation: the increase in delay (the snippet
above) is also done in the case that no packets were available
for other reasons (throttling), in which case we might needlessly
delay for an extra jiffie if jiffies wrapped while it tried to
dequeue.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 12:25     ` Patrick McHardy
@ 2008-12-09 13:08       ` Jarek Poplawski
  2008-12-09 13:20         ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-09 13:08 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Martin Devera, netdev

On Tue, Dec 09, 2008 at 01:25:15PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Tue, Dec 09, 2008 at 11:28:44AM +0100, Patrick McHardy wrote:
>>>> -	/* too much load - let's continue on next jiffie (including above) */
>>>> -	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
>>>> +	/*
>>>> +	 * Too much load - let's continue on next jiffie.
>>>> +	 * (Used jiffies are charged later.)
>>>> +	 */
>>>> +	return q->now + 1;
>>>>  
>>> This (including the last patch) is really confusing - q->now doesn't
>>> contain HZ values but psched ticks. Could you describe the overall
>>> algorithm you're trying to implement please?
>>
>> The algorithm is we want to "continue on the next jiffie". We know
>> we've lost here a lot of time (~2 jiffies), and this will be added
>> later. Since these jiffies are not precise enough wrt. psched ticks
>> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
>> anyway this +1 here simply doesn't matter and can mean "a bit after
>> q->now".
>
> This might as well return q->now, no?

Yes, but IMHO it looks worse, considering the problem here (we want to
avoid scheduling in the past).

> The elapsed time is added
> on top later anyways. But anyways, I think both the approach and
> the patch are wrong.
>
> 	/* charge used jiffies */
> 	start_at = jiffies - start_at;
> 	if (start_at > 0)
> 		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
>
> What relationship does the duration it ran for has to the time it
> should run at again?

The scheduling times won't be in the past mostly and hrtimers won't
trigger too soon, but approximately around we really need and can
afford a new try without stopping everything else.

> The focus on jiffies is wrong IMO, the reason why we get high
> load is because the CPU can't keep up, delaying things even
> longer is not going to help get the work done. The only reason to
> look at jiffies is because its a cheap indication that it has
> ran for too long and we should give other tasks a change to run
> as well, but it should continue immediately after it did that.
> So all it should do is make sure that the watchdog is scheduled
> with a very small positive delay.

This needs additional psched_get_time(), and as I've written before
there is no apparent advantage in problematic cases, but this would
add more overhead for common cases.

> As for the implementation: the increase in delay (the snippet
> above) is also done in the case that no packets were available
> for other reasons (throttling), in which case we might needlessly
> delay for an extra jiffie if jiffies wrapped while it tried to
> dequeue.

But in another similar cases there could be no change in jiffies, but
almost a jiffie used for counting, so wrong schedule time as well.
Approximatly this all should be fine, and it still can be tuned later.
IMHO, this all should not affect "common" cases, which are expected to
use less then jiffie here.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 13:08       ` Jarek Poplawski
@ 2008-12-09 13:20         ` Patrick McHardy
  2008-12-09 14:45           ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2008-12-09 13:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Martin Devera, netdev

Jarek Poplawski wrote:
> On Tue, Dec 09, 2008 at 01:25:15PM +0100, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
>>> The algorithm is we want to "continue on the next jiffie". We know
>>> we've lost here a lot of time (~2 jiffies), and this will be added
>>> later. Since these jiffies are not precise enough wrt. psched ticks
>>> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
>>> anyway this +1 here simply doesn't matter and can mean "a bit after
>>> q->now".
>> This might as well return q->now, no?
> 
> Yes, but IMHO it looks worse, considering the problem here (we want to
> avoid scheduling in the past).

I guess its a matter of taste.

>> The elapsed time is added
>> on top later anyways. But anyways, I think both the approach and
>> the patch are wrong.
>>
>> 	/* charge used jiffies */
>> 	start_at = jiffies - start_at;
>> 	if (start_at > 0)
>> 		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
>>
>> What relationship does the duration it ran for has to the time it
>> should run at again?
> 
> The scheduling times won't be in the past mostly and hrtimers won't
> trigger too soon, but approximately around we really need and can
> afford a new try without stopping everything else.

Sure. But it also won't be in the past if we simply add .. lets say
the current uptime in ms. My point was that there's absolutely no
relationship between those two times and combining them just to
get a value thats not in the past is wrong. Especially considering
*why* we want a value in the future and what we'll get from that
calculation.

>> The focus on jiffies is wrong IMO, the reason why we get high
>> load is because the CPU can't keep up, delaying things even
>> longer is not going to help get the work done. The only reason to
>> look at jiffies is because its a cheap indication that it has
>> ran for too long and we should give other tasks a change to run
>> as well, but it should continue immediately after it did that.
>> So all it should do is make sure that the watchdog is scheduled
>> with a very small positive delay.
> 
> This needs additional psched_get_time(), and as I've written before
> there is no apparent advantage in problematic cases, but this would
> add more overhead for common cases.

htb_do_events() exceeding two jiffies is fortunately not a common
case. You (incorrectly) made the calculation somewhat of a common
case by also adding to the delay if the inner classes simply throttled
and already returned the exact delay they want.

Much better (again considering what we want to achieve here) would
be to not use the hrtimer watchdog at all. We want to give lower
priority tasks a chance to run, so ideally we'd use a low priority
task for wakeup.

>> As for the implementation: the increase in delay (the snippet
>> above) is also done in the case that no packets were available
>> for other reasons (throttling), in which case we might needlessly
>> delay for an extra jiffie if jiffies wrapped while it tried to
>> dequeue.
> 
> But in another similar cases there could be no change in jiffies, but
> almost a jiffie used for counting, so wrong schedule time as well.

Its not "wrong". We don't want to delay. Its a courtesy to the
remaining system.

> Approximatly this all should be fine, and it still can be tuned later.
> IMHO, this all should not affect "common" cases, which are expected to
> use less then jiffie here.

Jiffies might wrap even if it only took only a few nanoseconds.
And its not fine, in the case of throttled classes there's no
reason to add extra delay *at all*.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 13:20         ` Patrick McHardy
@ 2008-12-09 14:45           ` Jarek Poplawski
  2008-12-09 14:56             ` Patrick McHardy
  2008-12-10  6:35             ` [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() David Miller
  0 siblings, 2 replies; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-09 14:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Martin Devera, netdev

To David Miller: David don't apply yet - this patch needs change.

Patrick, read below:

On Tue, Dec 09, 2008 at 02:20:00PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Tue, Dec 09, 2008 at 01:25:15PM +0100, Patrick McHardy wrote:
>>> Jarek Poplawski wrote:
>>>> The algorithm is we want to "continue on the next jiffie". We know
>>>> we've lost here a lot of time (~2 jiffies), and this will be added
>>>> later. Since these jiffies are not precise enough wrt. psched ticks
>>>> or ktime, and we will add around 2000 (for HZ 1000) psched ticks
>>>> anyway this +1 here simply doesn't matter and can mean "a bit after
>>>> q->now".
>>> This might as well return q->now, no?
>>
>> Yes, but IMHO it looks worse, considering the problem here (we want to
>> avoid scheduling in the past).
>
> I guess its a matter of taste.

Exactly. (And could be changed.)

>
>>> The elapsed time is added
>>> on top later anyways. But anyways, I think both the approach and
>>> the patch are wrong.
>>>
>>> 	/* charge used jiffies */
>>> 	start_at = jiffies - start_at;
>>> 	if (start_at > 0)
>>> 		next_event += start_at * PSCHED_TICKS_PER_SEC / HZ;
>>>
>>> What relationship does the duration it ran for has to the time it
>>> should run at again?
>>
>> The scheduling times won't be in the past mostly and hrtimers won't
>> trigger too soon, but approximately around we really need and can
>> afford a new try without stopping everything else.
>
> Sure. But it also won't be in the past if we simply add .. lets say
> the current uptime in ms. My point was that there's absolutely no
> relationship between those two times and combining them just to
> get a value thats not in the past is wrong. Especially considering
> *why* we want a value in the future and what we'll get from that
> calculation.
>
>>> The focus on jiffies is wrong IMO, the reason why we get high
>>> load is because the CPU can't keep up, delaying things even
>>> longer is not going to help get the work done. The only reason to
>>> look at jiffies is because its a cheap indication that it has
>>> ran for too long and we should give other tasks a change to run
>>> as well, but it should continue immediately after it did that.
>>> So all it should do is make sure that the watchdog is scheduled
>>> with a very small positive delay.
>>
>> This needs additional psched_get_time(), and as I've written before
>> there is no apparent advantage in problematic cases, but this would
>> add more overhead for common cases.
>
> htb_do_events() exceeding two jiffies is fortunately not a common
> case. You (incorrectly) made the calculation somewhat of a common
> case by also adding to the delay if the inner classes simply throttled
> and already returned the exact delay they want.

I see! You are right and this needs fixing.

> Much better (again considering what we want to achieve here) would
> be to not use the hrtimer watchdog at all. We want to give lower
> priority tasks a chance to run, so ideally we'd use a low priority
> task for wakeup.

I'm not sure I get ot right: for precise scheduling hrtimers look
useful. Do you really mean "at all"?
 
>
>>> As for the implementation: the increase in delay (the snippet
>>> above) is also done in the case that no packets were available
>>> for other reasons (throttling), in which case we might needlessly
>>> delay for an extra jiffie if jiffies wrapped while it tried to
>>> dequeue.
>>
>> But in another similar cases there could be no change in jiffies, but
>> almost a jiffie used for counting, so wrong schedule time as well.
>
> Its not "wrong". We don't want to delay. Its a courtesy to the
> remaining system.

In this case it's probably self-courtesy too: this ksoftirqd takes
most of the time and it's useless.

>
>> Approximatly this all should be fine, and it still can be tuned later.
>> IMHO, this all should not affect "common" cases, which are expected to
>> use less then jiffie here.
>
> Jiffies might wrap even if it only took only a few nanoseconds.
> And its not fine, in the case of throttled classes there's no
> reason to add extra delay *at all*.

Yes, you are right with this. I can try too fix this tomorrow, unless
you prefer to send your version of this patch.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 14:45           ` Jarek Poplawski
@ 2008-12-09 14:56             ` Patrick McHardy
  2008-12-10 10:52               ` [PATCH 8/6] " Jarek Poplawski
  2009-01-12 10:17               ` [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Jarek Poplawski
  2008-12-10  6:35             ` [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() David Miller
  1 sibling, 2 replies; 40+ messages in thread
From: Patrick McHardy @ 2008-12-09 14:56 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Martin Devera, netdev

Jarek Poplawski wrote:
> To David Miller: David don't apply yet - this patch needs change.
> 
> Patrick, read below:
> 
> On Tue, Dec 09, 2008 at 02:20:00PM +0100, Patrick McHardy wrote:
>>>>
>>>> The focus on jiffies is wrong IMO, the reason why we get high
>>>> load is because the CPU can't keep up, delaying things even
>>>> longer is not going to help get the work done. The only reason to
>>>> look at jiffies is because its a cheap indication that it has
>>>> ran for too long and we should give other tasks a change to run
>>>> as well, but it should continue immediately after it did that.
>>>> So all it should do is make sure that the watchdog is scheduled
>>>> with a very small positive delay.
 >>>>
>>> This needs additional psched_get_time(), and as I've written before
>>> there is no apparent advantage in problematic cases, but this would
>>> add more overhead for common cases.
 >>>
>> htb_do_events() exceeding two jiffies is fortunately not a common
>> case. You (incorrectly) made the calculation somewhat of a common
>> case by also adding to the delay if the inner classes simply throttled
>> and already returned the exact delay they want.
> 
> I see! You are right and this needs fixing.
> 
>> Much better (again considering what we want to achieve here) would
>> be to not use the hrtimer watchdog at all. We want to give lower
>> priority tasks a chance to run, so ideally we'd use a low priority
>> task for wakeup.
> 
> I'm not sure I get ot right: for precise scheduling hrtimers look
> useful. Do you really mean "at all"?

I meant "at all" for the wakeup after we've decided HTB has too much
work to do at once. A work queue seems better suited since that makes
sure we allow other processes to run, but don't wait unnecessarily
long when there is no other work.

>>>> As for the implementation: the increase in delay (the snippet
>>>> above) is also done in the case that no packets were available
>>>> for other reasons (throttling), in which case we might needlessly
>>>> delay for an extra jiffie if jiffies wrapped while it tried to
>>>> dequeue.
 >>>>
>>> But in another similar cases there could be no change in jiffies, but
>>> almost a jiffie used for counting, so wrong schedule time as well.
>> Its not "wrong". We don't want to delay. Its a courtesy to the
>> remaining system.
> 
> In this case it's probably self-courtesy too: this ksoftirqd takes
> most of the time and it's useless.

Well, it calls back to HTB, which continues to do real work. But
leaving HTB, scheduling a timer just to be called immediately again
is useless overhead, I agree.

>>> Approximatly this all should be fine, and it still can be tuned later.
>>> IMHO, this all should not affect "common" cases, which are expected to
>>> use less then jiffie here.
 >>>
>> Jiffies might wrap even if it only took only a few nanoseconds.
>> And its not fine, in the case of throttled classes there's no
>> reason to add extra delay *at all*.
> 
> Yes, you are right with this. I can try too fix this tomorrow, unless
> you prefer to send your version of this patch.

I don't have a version of my own, so please go ahead :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 14:45           ` Jarek Poplawski
  2008-12-09 14:56             ` Patrick McHardy
@ 2008-12-10  6:35             ` David Miller
  2008-12-10  9:11               ` Jarek Poplawski
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2008-12-10  6:35 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 9 Dec 2008 14:45:34 +0000

> To David Miller: David don't apply yet - this patch needs change.

What I've done is apply patches 5 and 6 since those are
fine and independent of these timer issues.

So once you work this stuff out please resubmit the rest.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-10  6:35             ` [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() David Miller
@ 2008-12-10  9:11               ` Jarek Poplawski
  2008-12-10  9:14                 ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-10  9:11 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, devik, netdev

On Tue, Dec 09, 2008 at 10:35:45PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 9 Dec 2008 14:45:34 +0000
> 
> > To David Miller: David don't apply yet - this patch needs change.
> 
> What I've done is apply patches 5 and 6 since those are
> fine and independent of these timer issues.

Very nice of you, but IMHO patches 1 and 4 are OK too.

I'm withdrawing patches 2 and 3, and will try to discuss this with
Patrick more, but this could stay like this too.

> So once you work this stuff out please resubmit the rest.

Of course, I can resubmit 1 and 4 if you think it's really needed.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-10  9:11               ` Jarek Poplawski
@ 2008-12-10  9:14                 ` David Miller
  2008-12-10  9:35                   ` [PATCH 7/6] " Jarek Poplawski
  2009-01-12 10:16                   ` [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Jarek Poplawski
  0 siblings, 2 replies; 40+ messages in thread
From: David Miller @ 2008-12-10  9:14 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 10 Dec 2008 09:11:26 +0000

> On Tue, Dec 09, 2008 at 10:35:45PM -0800, David Miller wrote:
> > So once you work this stuff out please resubmit the rest.
> 
> Of course, I can resubmit 1 and 4 if you think it's really needed.

Please do, there were dependencies.  In fact patch 2 changes
the very change you made in patch 1.

I understand it's logical steps, but you could just do 1 and
2 in the same patch and it'd be OK.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-10  9:14                 ` David Miller
@ 2008-12-10  9:35                   ` Jarek Poplawski
  2008-12-10 14:38                     ` Patrick McHardy
  2008-12-16 23:57                     ` David Miller
  2009-01-12 10:16                   ` [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Jarek Poplawski
  1 sibling, 2 replies; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-10  9:35 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, devik, netdev

On Wed, Dec 10, 2008 at 01:14:31AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 10 Dec 2008 09:11:26 +0000
> 
> > On Tue, Dec 09, 2008 at 10:35:45PM -0800, David Miller wrote:
> > > So once you work this stuff out please resubmit the rest.
> > 
> > Of course, I can resubmit 1 and 4 if you think it's really needed.
> 
> Please do, there were dependencies.  In fact patch 2 changes
> the very change you made in patch 1.

But patch 2 is nonexistent. (I've withdrawn this...)

> 
> I understand it's logical steps, but you could just do 1 and
> 2 in the same patch and it'd be OK.

OK, so I withdraw 1 and 4 too, and let's say this is No 7 (2in1).

Sorry for this mess,
Jarek P.

------------->
pkt_sched: sch_htb: Consider used jiffies in htb_do_events()

Next event time should consider jiffies used for recounting. Otherwise
qdisc_watchdog_schedule() triggers hrtimer immediately with the event
in the past, and may cause very high ksoftirqd cpu usage (if highres
is on).

There is also removed checking "event" for zero in htb_dequeue(): it's
always true in this place.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_htb.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 5070643..9ca8a26 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -685,8 +685,8 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie */
-	return q->now + PSCHED_TICKS_PER_SEC / HZ;
+	/* too much load - let's continue on next jiffie (including above) */
+	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -873,7 +873,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		} else
 			event = q->near_ev_cache[level];
 
-		if (event && next_event > event)
+		if (next_event > event)
 			next_event = event;
 
 		m = ~q->row_mask[level];

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 8/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-09 14:56             ` Patrick McHardy
@ 2008-12-10 10:52               ` Jarek Poplawski
  2009-01-12 10:17               ` [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Jarek Poplawski
  1 sibling, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-10 10:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Martin Devera, netdev

On Tue, Dec 09, 2008 at 03:56:09PM +0100, Patrick McHardy wrote:
...
> I meant "at all" for the wakeup after we've decided HTB has too much
> work to do at once. A work queue seems better suited since that makes
> sure we allow other processes to run, but don't wait unnecessarily
> long when there is no other work.

Maybe I miss your point, but IMHO the too much work case isn't a
problem here: there is a lot of time wasted in this case, so e.g.
additional psched_get_time() and "safe" rescheduling is possible -
no need to complicate it with workqueues etc. (but current patch 7
should be enough). I'm concerned with e.g. a config doing often
htb_do_events() and htb_dequeue_tree() in ~1 jiffie and
rescheduling for 1/2 jiffie (or 1/2 vs. 1/4 etc.), so mainly in
the past. hrtimers beahave with this really different from timers.

>>> Jiffies might wrap even if it only took only a few nanoseconds.
>>> And its not fine, in the case of throttled classes there's no
>>> reason to add extra delay *at all*.
>>
>> Yes, you are right with this. I can try too fix this tomorrow, unless
>> you prefer to send your version of this patch.
>
> I don't have a version of my own, so please go ahead :)

Alas I haven't found how we can fix it generally without any such costs,
so I think I've to leave this problem for now. Then only this "over 2
jiffies" case should be fixed here (current patch 7), plus I propose the
patch below to additionally skip doing events generally after 2 jiffies.

Thanks,
Jarek P.

-------------> (redone old patch 3 - to apply on top of patch 7)

pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies

Currently htb_do_events() breaks events recounting for a level after 2
jiffies, but there is no reason to repeat this for next levels and
increase delays even more (with softirqs disabled). htb_dequeue_tree()
can add to this too, btw. In such a case q->now time is invalid anyway.

Thanks to Patrick McHardy for spotting an error around earlier version
of this patch.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_htb.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9ca8a26..2f0f0b0 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -661,12 +661,13 @@ static void htb_charge_class(struct htb_sched *q, struct htb_class *cl,
  * next pending event (0 for no event in pq).
  * Note: Applied are events whose have cl->pq_key <= q->now.
  */
-static psched_time_t htb_do_events(struct htb_sched *q, int level)
+static psched_time_t htb_do_events(struct htb_sched *q, int level,
+				   unsigned long start)
 {
 	/* don't run for longer than 2 jiffies; 2 is used instead of
 	   1 to simplify things when jiffy is going to be incremented
 	   too soon */
-	unsigned long stop_at = jiffies + 2;
+	unsigned long stop_at = start + 2;
 	while (time_before(jiffies, stop_at)) {
 		struct htb_class *cl;
 		long diff;
@@ -845,6 +846,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	struct htb_sched *q = qdisc_priv(sch);
 	int level;
 	psched_time_t next_event;
+	unsigned long start_at;
 
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
@@ -857,6 +859,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	if (!sch->q.qlen)
 		goto fin;
 	q->now = psched_get_time();
+	start_at = jiffies;
 
 	next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
 
@@ -866,7 +869,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		psched_time_t event;
 
 		if (q->now >= q->near_ev_cache[level]) {
-			event = htb_do_events(q, level);
+			event = htb_do_events(q, level, start_at);
 			if (!event)
 				event = q->now + PSCHED_TICKS_PER_SEC;
 			q->near_ev_cache[level] = event;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-10  9:35                   ` [PATCH 7/6] " Jarek Poplawski
@ 2008-12-10 14:38                     ` Patrick McHardy
  2008-12-16 23:57                     ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: Patrick McHardy @ 2008-12-10 14:38 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, devik, netdev

Jarek Poplawski wrote:
> On Wed, Dec 10, 2008 at 01:14:31AM -0800, David Miller wrote:
>   
>
>> I understand it's logical steps, but you could just do 1 and
>> 2 in the same patch and it'd be OK.
>>     
>
> OK, so I withdraw 1 and 4 too, and let's say this is No 7 (2in1).
>   


Just FYI, I'm travelling today, so I'll need until tommorrow to
review this.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-10  9:35                   ` [PATCH 7/6] " Jarek Poplawski
  2008-12-10 14:38                     ` Patrick McHardy
@ 2008-12-16 23:57                     ` David Miller
  2008-12-17  7:03                       ` Jarek Poplawski
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2008-12-16 23:57 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev


Patrick, I know you said you were travelling and very busy,
but Jarek has been waiting patiently for you to review
these two newly respun patches.

These are just rotting in my patch queue and I'd like to
do something with them.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-16 23:57                     ` David Miller
@ 2008-12-17  7:03                       ` Jarek Poplawski
  2008-12-17  7:38                         ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2008-12-17  7:03 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, devik, netdev

On Tue, Dec 16, 2008 at 03:57:22PM -0800, David Miller wrote:
> 
> Patrick, I know you said you were travelling and very busy,
> but Jarek has been waiting patiently for you to review
> these two newly respun patches.
> 
> These are just rotting in my patch queue and I'd like to
> do something with them.

I guess Patrick has some doubts, but since these patches are not
critical, let's say I'll resend them within a few weeks.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-17  7:03                       ` Jarek Poplawski
@ 2008-12-17  7:38                         ` David Miller
  2009-01-12  6:56                           ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2008-12-17  7:38 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 17 Dec 2008 07:03:03 +0000

> On Tue, Dec 16, 2008 at 03:57:22PM -0800, David Miller wrote:
> > 
> > Patrick, I know you said you were travelling and very busy,
> > but Jarek has been waiting patiently for you to review
> > these two newly respun patches.
> > 
> > These are just rotting in my patch queue and I'd like to
> > do something with them.
> 
> I guess Patrick has some doubts, but since these patches are not
> critical, let's say I'll resend them within a few weeks.

Fair enough, I'll mark them deferred on patchwork

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2008-12-17  7:38                         ` David Miller
@ 2009-01-12  6:56                           ` Patrick McHardy
  2009-01-12 10:10                             ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2009-01-12  6:56 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, devik, netdev

David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 17 Dec 2008 07:03:03 +0000
> 
>> On Tue, Dec 16, 2008 at 03:57:22PM -0800, David Miller wrote:
>>> Patrick, I know you said you were travelling and very busy,
>>> but Jarek has been waiting patiently for you to review
>>> these two newly respun patches.
>>>
>>> These are just rotting in my patch queue and I'd like to
>>> do something with them.
>> I guess Patrick has some doubts, but since these patches are not
>> critical, let's say I'll resend them within a few weeks.
> 
> Fair enough, I'll mark them deferred on patchwork

Sorry, I dropped the ball on this one. I still think scheduling
a work-queue or something else running in process context to
kick the queue once the scheduler had a chance to run would
be a better solution. But Jarek's patches are an improvement
to the current situation, so no objections from me.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12  6:56                           ` Patrick McHardy
@ 2009-01-12 10:10                             ` Jarek Poplawski
  2009-01-12 10:22                               ` Patrick McHardy
                                                 ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 10:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On Mon, Jan 12, 2009 at 07:56:37AM +0100, Patrick McHardy wrote:
> David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Wed, 17 Dec 2008 07:03:03 +0000
>>
>>> On Tue, Dec 16, 2008 at 03:57:22PM -0800, David Miller wrote:
>>>> Patrick, I know you said you were travelling and very busy,
>>>> but Jarek has been waiting patiently for you to review
>>>> these two newly respun patches.
>>>>
>>>> These are just rotting in my patch queue and I'd like to
>>>> do something with them.
>>> I guess Patrick has some doubts, but since these patches are not
>>> critical, let's say I'll resend them within a few weeks.
>>
>> Fair enough, I'll mark them deferred on patchwork
>
> Sorry, I dropped the ball on this one. I still think scheduling
> a work-queue or something else running in process context to
> kick the queue once the scheduler had a chance to run would
> be a better solution. But Jarek's patches are an improvement
> to the current situation, so no objections from me.
>

Thanks for the review Patrick. As I wrote before, I'm not against
using a workqueue here: it's logically better, but I still think
this place is rather exception, so I'm not convinced we should
care so much adding better solution, but also some overhead when
cancelling this workqueue. But if it really bothers you, please
confirm, and I'll do it. BTW, I wonder if adding the old "too many
events" warning back wouldn't be more useful here.

David, I'm not sure you can still track these patches, so I'll
soon resend them (2 patches: #7/6 and 8/6).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events()
  2008-12-10  9:14                 ` David Miller
  2008-12-10  9:35                   ` [PATCH 7/6] " Jarek Poplawski
@ 2009-01-12 10:16                   ` Jarek Poplawski
  2009-01-13  5:54                     ` David Miller
  1 sibling, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 10:16 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, devik, netdev

(resend)

pkt_sched: sch_htb: Consider used jiffies in htb_do_events()

Next event time should consider jiffies used for recounting. Otherwise
qdisc_watchdog_schedule() triggers hrtimer immediately with the event
in the past, and may cause very high ksoftirqd cpu usage (if highres
is on).

There is also removed checking "event" for zero in htb_dequeue(): it's
always true in this place.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_htb.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 5070643..9ca8a26 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -685,8 +685,8 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie */
-	return q->now + PSCHED_TICKS_PER_SEC / HZ;
+	/* too much load - let's continue on next jiffie (including above) */
+	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -873,7 +873,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		} else
 			event = q->near_ev_cache[level];
 
-		if (event && next_event > event)
+		if (next_event > event)
 			next_event = event;
 
 		m = ~q->row_mask[level];

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies
  2008-12-09 14:56             ` Patrick McHardy
  2008-12-10 10:52               ` [PATCH 8/6] " Jarek Poplawski
@ 2009-01-12 10:17               ` Jarek Poplawski
  2009-01-13  5:54                 ` David Miller
  1 sibling, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 10:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Martin Devera, netdev

(resend)

pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies

Currently htb_do_events() breaks events recounting for a level after 2
jiffies, but there is no reason to repeat this for next levels and
increase delays even more (with softirqs disabled). htb_dequeue_tree()
can add to this too, btw. In such a case q->now time is invalid anyway.

Thanks to Patrick McHardy for spotting an error around earlier version
of this patch.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_htb.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9ca8a26..2f0f0b0 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -661,12 +661,13 @@ static void htb_charge_class(struct htb_sched *q, struct htb_class *cl,
  * next pending event (0 for no event in pq).
  * Note: Applied are events whose have cl->pq_key <= q->now.
  */
-static psched_time_t htb_do_events(struct htb_sched *q, int level)
+static psched_time_t htb_do_events(struct htb_sched *q, int level,
+				   unsigned long start)
 {
 	/* don't run for longer than 2 jiffies; 2 is used instead of
 	   1 to simplify things when jiffy is going to be incremented
 	   too soon */
-	unsigned long stop_at = jiffies + 2;
+	unsigned long stop_at = start + 2;
 	while (time_before(jiffies, stop_at)) {
 		struct htb_class *cl;
 		long diff;
@@ -845,6 +846,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	struct htb_sched *q = qdisc_priv(sch);
 	int level;
 	psched_time_t next_event;
+	unsigned long start_at;
 
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
@@ -857,6 +859,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	if (!sch->q.qlen)
 		goto fin;
 	q->now = psched_get_time();
+	start_at = jiffies;
 
 	next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
 
@@ -866,7 +869,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		psched_time_t event;
 
 		if (q->now >= q->near_ev_cache[level]) {
-			event = htb_do_events(q, level);
+			event = htb_do_events(q, level, start_at);
 			if (!event)
 				event = q->now + PSCHED_TICKS_PER_SEC;
 			q->near_ev_cache[level] = event;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:10                             ` Jarek Poplawski
@ 2009-01-12 10:22                               ` Patrick McHardy
  2009-01-12 11:08                                 ` Jarek Poplawski
                                                   ` (2 more replies)
  2009-01-12 10:29                               ` Jarek Poplawski
  2009-01-12 10:32                               ` David Miller
  2 siblings, 3 replies; 40+ messages in thread
From: Patrick McHardy @ 2009-01-12 10:22 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, devik, netdev

Jarek Poplawski wrote:
> On Mon, Jan 12, 2009 at 07:56:37AM +0100, Patrick McHardy wrote:
>> Sorry, I dropped the ball on this one. I still think scheduling
>> a work-queue or something else running in process context to
>> kick the queue once the scheduler had a chance to run would
>> be a better solution. But Jarek's patches are an improvement
>> to the current situation, so no objections from me.
>>
> 
> Thanks for the review Patrick. As I wrote before, I'm not against
> using a workqueue here: it's logically better, but I still think
> this place is rather exception, so I'm not convinced we should
> care so much adding better solution, but also some overhead when
> cancelling this workqueue. But if it really bothers you, please
> confirm, and I'll do it.

It doesn't bother me :) I just think its the technical better
and also most likely code-wise cleaner solution to this problem.
Cancellation wouldn't be necessary since an unnecessary
netif_schedule() doesn't really matter.

It you don't mind adding the workqueue, I certainly would prefer
it, but I'm also fine with this patch. I don't have a HTB setup
or a testcase for this specific case, otherwise I'd simply do it
myself.

> BTW, I wonder if adding the old "too many
> events" warning back wouldn't be more useful here.

It would be good to notify the user and also have some indication
for this case when looking into bug reports. A counter or a (single)
printk would both be fine.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:10                             ` Jarek Poplawski
  2009-01-12 10:22                               ` Patrick McHardy
@ 2009-01-12 10:29                               ` Jarek Poplawski
  2009-01-12 10:32                               ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 10:29 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On Mon, Jan 12, 2009 at 10:10:47AM +0000, Jarek Poplawski wrote:
...
> David, I'm not sure you can still track these patches, so I'll
> soon resend them (2 patches: #7/6 and 8/6).

Maybe I should have mention this: there is nothing more except these 2
patches to merge from this thread.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:10                             ` Jarek Poplawski
  2009-01-12 10:22                               ` Patrick McHardy
  2009-01-12 10:29                               ` Jarek Poplawski
@ 2009-01-12 10:32                               ` David Miller
  2009-01-12 10:59                                 ` Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2009-01-12 10:32 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 12 Jan 2009 10:10:47 +0000

> David, I'm not sure you can still track these patches, so I'll
> soon resend them (2 patches: #7/6 and 8/6).

Thanks, please do that.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:32                               ` David Miller
@ 2009-01-12 10:59                                 ` Jarek Poplawski
  2009-01-12 11:04                                   ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 10:59 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, devik, netdev

On Mon, Jan 12, 2009 at 02:32:50AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 12 Jan 2009 10:10:47 +0000
> 
> > David, I'm not sure you can still track these patches, so I'll
> > soon resend them (2 patches: #7/6 and 8/6).
> 
> Thanks, please do that.

I hope you've received them.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:59                                 ` Jarek Poplawski
@ 2009-01-12 11:04                                   ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-01-12 11:04 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 12 Jan 2009 10:59:03 +0000

> On Mon, Jan 12, 2009 at 02:32:50AM -0800, David Miller wrote:
> > From: Jarek Poplawski <jarkao2@gmail.com>
> > Date: Mon, 12 Jan 2009 10:10:47 +0000
> > 
> > > David, I'm not sure you can still track these patches, so I'll
> > > soon resend them (2 patches: #7/6 and 8/6).
> > 
> > Thanks, please do that.
> 
> I hope you've received them.

I did, thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:22                               ` Patrick McHardy
@ 2009-01-12 11:08                                 ` Jarek Poplawski
  2009-01-12 13:10                                   ` Patrick McHardy
  2009-01-28 12:52                                 ` [PATCH net-next] pkt_sched: sch_htb: Warn on too many events Jarek Poplawski
  2009-01-28 13:23                                 ` [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-12 11:08 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On Mon, Jan 12, 2009 at 11:22:47AM +0100, Patrick McHardy wrote:
...
> It doesn't bother me :) I just think its the technical better
> and also most likely code-wise cleaner solution to this problem.
> Cancellation wouldn't be necessary since an unnecessary
> netif_schedule() doesn't really matter.

Hmm... Do you mean during destroying... It's probably not very long,
but deleting and creating qdiscs especially for some virtual devices
can take longer I guess.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 11:08                                 ` Jarek Poplawski
@ 2009-01-12 13:10                                   ` Patrick McHardy
  0 siblings, 0 replies; 40+ messages in thread
From: Patrick McHardy @ 2009-01-12 13:10 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, devik, netdev

Jarek Poplawski wrote:
> On Mon, Jan 12, 2009 at 11:22:47AM +0100, Patrick McHardy wrote:
> ...
>> It doesn't bother me :) I just think its the technical better
>> and also most likely code-wise cleaner solution to this problem.
>> Cancellation wouldn't be necessary since an unnecessary
>> netif_schedule() doesn't really matter.
> 
> Hmm... Do you mean during destroying... It's probably not very long,
> but deleting and creating qdiscs especially for some virtual devices
> can take longer I guess.

I was referring to your statement "but also some overhead when
cancelling this workqueue" and assumed you meant during packet
processing. In the destruction path it really doesn't matter at
all.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events()
  2009-01-12 10:16                   ` [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Jarek Poplawski
@ 2009-01-13  5:54                     ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-01-13  5:54 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 12 Jan 2009 10:16:13 +0000

> pkt_sched: sch_htb: Consider used jiffies in htb_do_events()
> 
> Next event time should consider jiffies used for recounting. Otherwise
> qdisc_watchdog_schedule() triggers hrtimer immediately with the event
> in the past, and may cause very high ksoftirqd cpu usage (if highres
> is on).
> 
> There is also removed checking "event" for zero in htb_dequeue(): it's
> always true in this place.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies
  2009-01-12 10:17               ` [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Jarek Poplawski
@ 2009-01-13  5:54                 ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-01-13  5:54 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 12 Jan 2009 10:17:46 +0000

> pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies
> 
> Currently htb_do_events() breaks events recounting for a level after 2
> jiffies, but there is no reason to repeat this for next levels and
> increase delays even more (with softirqs disabled). htb_dequeue_tree()
> can add to this too, btw. In such a case q->now time is invalid anyway.
> 
> Thanks to Patrick McHardy for spotting an error around earlier version
> of this patch.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Also applied, thanks Jarek.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-12 10:22                               ` Patrick McHardy
  2009-01-12 11:08                                 ` Jarek Poplawski
@ 2009-01-28 12:52                                 ` Jarek Poplawski
  2009-01-28 16:18                                   ` Patrick McHardy
  2009-01-28 13:23                                 ` [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-28 12:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On 12-01-2009 11:22, Patrick McHardy wrote:
> Jarek Poplawski wrote:
...
>> BTW, I wonder if adding the old "too many
>> events" warning back wouldn't be more useful here.
> 
> It would be good to notify the user and also have some indication
> for this case when looking into bug reports. A counter or a (single)
> printk would both be fine.

Here is my proposal. (I assume single "work conserving" warn should be
enough too - if somebody doesn't fix it for one class, they probably
know what they are doing, and don't need more.)

Jarek P.
-------------->
pkt_sched: sch_htb: Warn on too many events.

Let's get some info on possible config problems. This patch brings
back an old warning, but it's printed only once now. BTW a "class
isn't work conserving" warning is also limited to once per qdisc
instead of per class.

With feedback from Patrick McHardy <kaber@trash.net>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/sched/sch_htb.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2f0f0b0..6fbb3e3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -114,8 +114,6 @@ struct htb_class {
 	struct tcf_proto *filter_list;
 	int filter_cnt;
 
-	int warned;		/* only one warning about non work conserving .. */
-
 	/* token bucket parameters */
 	struct qdisc_rate_table *rate;	/* rate table of the class itself */
 	struct qdisc_rate_table *ceil;	/* ceiling rate (limits borrows too) */
@@ -155,6 +153,10 @@ struct htb_sched {
 	int direct_qlen;	/* max qlen of above */
 
 	long direct_pkts;
+
+#define HTB_WARN_NONCONSERVING	0x1
+#define HTB_WARN_TOOMANYEVENTS	0x2
+	int warned;	/* only one warning about non work conserving etc. */
 };
 
 /* find class in global hash table using given handle */
@@ -687,6 +689,10 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level,
 			htb_add_to_wait_tree(q, cl, diff);
 	}
 	/* too much load - let's continue on next jiffie (including above) */
+	if (!(q->warned & HTB_WARN_TOOMANYEVENTS)) {
+		printk(KERN_WARNING "htb: too many events!\n");
+		q->warned |= HTB_WARN_TOOMANYEVENTS;
+	}
 	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
 }
 
@@ -809,11 +815,11 @@ next:
 		skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
 		if (likely(skb != NULL))
 			break;
-		if (!cl->warned) {
+		if (!(q->warned & HTB_WARN_NONCONSERVING)) {
 			printk(KERN_WARNING
 			       "htb: class %X isn't work conserving ?!\n",
 			       cl->common.classid);
-			cl->warned = 1;
+			q->warned |= HTB_WARN_NONCONSERVING;
 		}
 
 		htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-12 10:22                               ` Patrick McHardy
  2009-01-12 11:08                                 ` Jarek Poplawski
  2009-01-28 12:52                                 ` [PATCH net-next] pkt_sched: sch_htb: Warn on too many events Jarek Poplawski
@ 2009-01-28 13:23                                 ` Jarek Poplawski
  2009-01-28 16:20                                   ` Patrick McHardy
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-28 13:23 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On 12-01-2009 11:22, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Mon, Jan 12, 2009 at 07:56:37AM +0100, Patrick McHardy wrote:
>>> Sorry, I dropped the ball on this one. I still think scheduling
>>> a work-queue or something else running in process context to
>>> kick the queue once the scheduler had a chance to run would
>>> be a better solution. But Jarek's patches are an improvement
>>> to the current situation, so no objections from me.
>>>
>> Thanks for the review Patrick. As I wrote before, I'm not against
>> using a workqueue here: it's logically better, but I still think
>> this place is rather exception, so I'm not convinced we should
>> care so much adding better solution, but also some overhead when
>> cancelling this workqueue. But if it really bothers you, please
>> confirm, and I'll do it.
> 
> It doesn't bother me :) I just think its the technical better
> and also most likely code-wise cleaner solution to this problem.
> Cancellation wouldn't be necessary since an unnecessary
> netif_schedule() doesn't really matter.
> 
> It you don't mind adding the workqueue, I certainly would prefer
> it, but I'm also fine with this patch. I don't have a HTB setup
> or a testcase for this specific case, otherwise I'd simply do it
> myself.

Here is an example of this workqueue. I hope I didn't miss your point,
but since I didn't find much difference in testing, I'd prefer not to
sign-off/merge this yet, at least until there are many reports on
"too many events" problem, and somebody finds it useful.

Thanks,
Jarek P.

--- (for example only)

diff -Nurp b/net/sched/sch_htb.c c/net/sched/sch_htb.c
--- b/net/sched/sch_htb.c	2009-01-13 20:20:47.000000000 +0100
+++ c/net/sched/sch_htb.c	2009-01-13 21:32:17.000000000 +0100
@@ -35,6 +35,7 @@
 #include <linux/list.h>
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
+#include <linux/workqueue.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 
@@ -157,6 +158,7 @@ struct htb_sched {
 #define HTB_WARN_NONCONSERVING	0x1
 #define HTB_WARN_TOOMANYEVENTS	0x2
 	int warned;	/* only one warning about non work conserving etc. */
+	struct work_struct work;
 };
 
 /* find class in global hash table using given handle */
@@ -660,7 +662,7 @@ static void htb_charge_class(struct htb_
  * htb_do_events - make mode changes to classes at the level
  *
  * Scans event queue for pending events and applies them. Returns time of
- * next pending event (0 for no event in pq).
+ * next pending event (0 for no event in pq, q->now for too many events).
  * Note: Applied are events whose have cl->pq_key <= q->now.
  */
 static psched_time_t htb_do_events(struct htb_sched *q, int level,
@@ -688,12 +690,14 @@ static psched_time_t htb_do_events(struc
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie (including above) */
+
+	/* too much load - let's continue after a break for scheduling */
 	if (!(q->warned & HTB_WARN_TOOMANYEVENTS)) {
 		printk(KERN_WARNING "htb: too many events!\n");
 		q->warned |= HTB_WARN_TOOMANYEVENTS;
 	}
-	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
+
+	return q->now;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -898,7 +902,10 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
-	qdisc_watchdog_schedule(&q->watchdog, next_event);
+	if (likely(next_event > q->now))
+		qdisc_watchdog_schedule(&q->watchdog, next_event);
+	else
+		schedule_work(&q->work);
 fin:
 	return skb;
 }
@@ -968,6 +975,14 @@ static const struct nla_policy htb_polic
 	[TCA_HTB_RTAB]	= { .type = NLA_BINARY, .len = TC_RTAB_SIZE },
 };
 
+static void htb_work_func(struct work_struct *work)
+{
+	struct htb_sched *q = container_of(work, struct htb_sched, work);
+	struct Qdisc *sch = q->watchdog.qdisc;
+
+	__netif_schedule(qdisc_root(sch));
+}
+
 static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct htb_sched *q = qdisc_priv(sch);
@@ -1002,6 +1017,7 @@ static int htb_init(struct Qdisc *sch, s
 		INIT_LIST_HEAD(q->drops + i);
 
 	qdisc_watchdog_init(&q->watchdog, sch);
+	INIT_WORK(&q->work, htb_work_func);
 	skb_queue_head_init(&q->direct_queue);
 
 	q->direct_qlen = qdisc_dev(sch)->tx_queue_len;
@@ -1194,7 +1210,6 @@ static void htb_destroy_class(struct Qdi
 	kfree(cl);
 }
 
-/* always caled under BH & queue lock */
 static void htb_destroy(struct Qdisc *sch)
 {
 	struct htb_sched *q = qdisc_priv(sch);
@@ -1202,6 +1217,7 @@ static void htb_destroy(struct Qdisc *sc
 	struct htb_class *cl;
 	unsigned int i;
 
+	cancel_work_sync(&q->work);
 	qdisc_watchdog_cancel(&q->watchdog);
 	/* This line used to be after htb_destroy_class call below
 	   and surprisingly it worked in 2.4. But it must precede it

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-28 12:52                                 ` [PATCH net-next] pkt_sched: sch_htb: Warn on too many events Jarek Poplawski
@ 2009-01-28 16:18                                   ` Patrick McHardy
  2009-01-30 10:17                                     ` [PATCH 1/3 v2 " Jarek Poplawski
                                                       ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Patrick McHardy @ 2009-01-28 16:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, devik, netdev

Jarek Poplawski wrote:
> pkt_sched: sch_htb: Warn on too many events.
> 
> Let's get some info on possible config problems. This patch brings
> back an old warning, but it's printed only once now. BTW a "class
> isn't work conserving" warning is also limited to once per qdisc
> instead of per class.
> 
> With feedback from Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> ---
> 
>  		skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
>  		if (likely(skb != NULL))
>  			break;
> -		if (!cl->warned) {
> +		if (!(q->warned & HTB_WARN_NONCONSERVING)) {
>  			printk(KERN_WARNING
>  			       "htb: class %X isn't work conserving ?!\n",
>  			       cl->common.classid);
> -			cl->warned = 1;
> +			q->warned |= HTB_WARN_NONCONSERVING;

How about making this flag and the warning message (in a out-of-line
function) globally available? Other qdiscs (f.i. HFSC) can't deal with
inner non-work-conserving qdiscs as well.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue()
  2009-01-28 13:23                                 ` [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
@ 2009-01-28 16:20                                   ` Patrick McHardy
  0 siblings, 0 replies; 40+ messages in thread
From: Patrick McHardy @ 2009-01-28 16:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, devik, netdev

Jarek Poplawski wrote:
> On 12-01-2009 11:22, Patrick McHardy wrote:
>> It you don't mind adding the workqueue, I certainly would prefer
>> it, but I'm also fine with this patch. I don't have a HTB setup
>> or a testcase for this specific case, otherwise I'd simply do it
>> myself.
> 
> Here is an example of this workqueue. I hope I didn't miss your point,
> but since I didn't find much difference in testing, I'd prefer not to
> sign-off/merge this yet, at least until there are many reports on
> "too many events" problem, and somebody finds it useful.

No, this seems to be exactly what I meant. The differnce - yeah, it
shouldn't make much, mainly wake up the qdisc earlier (but not too
early) after "too many events" occured _and_ no further enqueue
events wake up the qdisc anyways.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3 v2 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-28 16:18                                   ` Patrick McHardy
@ 2009-01-30 10:17                                     ` Jarek Poplawski
  2009-02-01  9:13                                       ` David Miller
  2009-01-30 10:17                                     ` [PATCH 2/3 " Jarek Poplawski
  2009-01-30 10:17                                     ` [PATCH 3/3 " Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-30 10:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

On Wed, Jan 28, 2009 at 05:18:12PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> pkt_sched: sch_htb: Warn on too many events.
...
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.

OK, thanks,
Jarek P.
------------------> take 2: PATCH 1/3
pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler.

Patrick McHardy <kaber@trash.net> suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.

This patch uses qdisc->flags field of "suspected" child qdisc.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

diff -Nurp a/include/net/pkt_sched.h b/include/net/pkt_sched.h
--- a/include/net/pkt_sched.h	2009-01-02 21:21:37.000000000 +0100
+++ b/include/net/pkt_sched.h	2009-01-29 22:57:55.000000000 +0100
@@ -85,6 +85,7 @@ extern struct qdisc_rate_table *qdisc_ge
 		struct nlattr *tab);
 extern void qdisc_put_rtab(struct qdisc_rate_table *tab);
 extern void qdisc_put_stab(struct qdisc_size_table *tab);
+extern void qdisc_warn_nonwc(char *txt, struct Qdisc *qdisc);
 
 extern void __qdisc_run(struct Qdisc *q);
 
diff -Nurp a/include/net/sch_generic.h b/include/net/sch_generic.h
--- a/include/net/sch_generic.h	2009-01-20 18:43:03.000000000 +0100
+++ b/include/net/sch_generic.h	2009-01-29 22:36:33.000000000 +0100
@@ -42,9 +42,10 @@ struct Qdisc
 	int 			(*enqueue)(struct sk_buff *skb, struct Qdisc *dev);
 	struct sk_buff *	(*dequeue)(struct Qdisc *dev);
 	unsigned		flags;
-#define TCQ_F_BUILTIN	1
-#define TCQ_F_THROTTLED	2
-#define TCQ_F_INGRESS	4
+#define TCQ_F_BUILTIN		1
+#define TCQ_F_THROTTLED		2
+#define TCQ_F_INGRESS		4
+#define TCQ_F_WARN_NONWC	(1 << 16)
 	int			padded;
 	struct Qdisc_ops	*ops;
 	struct qdisc_size_table	*stab;
diff -Nurp a/net/sched/sch_api.c b/net/sched/sch_api.c
--- a/net/sched/sch_api.c	2009-01-20 18:43:10.000000000 +0100
+++ b/net/sched/sch_api.c	2009-01-30 00:35:19.000000000 +0100
@@ -444,6 +444,17 @@ out:
 }
 EXPORT_SYMBOL(qdisc_calculate_pkt_len);
 
+void qdisc_warn_nonwc(char *txt, struct Qdisc *qdisc)
+{
+	if (!(qdisc->flags & TCQ_F_WARN_NONWC)) {
+		printk(KERN_WARNING
+		       "%s: %s qdisc %X: is non-work-conserving?\n",
+		       txt, qdisc->ops->id, qdisc->handle >> 16);
+		qdisc->flags |= TCQ_F_WARN_NONWC;
+	}
+}
+EXPORT_SYMBOL(qdisc_warn_nonwc);
+
 static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
 {
 	struct qdisc_watchdog *wd = container_of(timer, struct qdisc_watchdog,
diff -Nurp a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
--- a/net/sched/sch_hfsc.c	2009-01-20 18:43:10.000000000 +0100
+++ b/net/sched/sch_hfsc.c	2009-01-29 23:19:57.000000000 +0100
@@ -887,8 +887,7 @@ qdisc_peek_len(struct Qdisc *sch)
 
 	skb = sch->ops->peek(sch);
 	if (skb == NULL) {
-		if (net_ratelimit())
-			printk("qdisc_peek_len: non work-conserving qdisc ?\n");
+		qdisc_warn_nonwc("qdisc_peek_len", sch);
 		return 0;
 	}
 	len = qdisc_pkt_len(skb);
@@ -1642,8 +1641,7 @@ hfsc_dequeue(struct Qdisc *sch)
 
 	skb = qdisc_dequeue_peeked(cl->qdisc);
 	if (skb == NULL) {
-		if (net_ratelimit())
-			printk("HFSC: Non-work-conserving qdisc ?\n");
+		qdisc_warn_nonwc("HFSC", cl->qdisc);
 		return NULL;
 	}
 
diff -Nurp a/net/sched/sch_htb.c b/net/sched/sch_htb.c
--- a/net/sched/sch_htb.c	2009-01-12 21:09:35.000000000 +0100
+++ b/net/sched/sch_htb.c	2009-01-29 23:07:42.000000000 +0100
@@ -114,8 +114,6 @@ struct htb_class {
 	struct tcf_proto *filter_list;
 	int filter_cnt;
 
-	int warned;		/* only one warning about non work conserving .. */
-
 	/* token bucket parameters */
 	struct qdisc_rate_table *rate;	/* rate table of the class itself */
 	struct qdisc_rate_table *ceil;	/* ceiling rate (limits borrows too) */
@@ -809,13 +807,8 @@ next:
 		skb = cl->un.leaf.q->dequeue(cl->un.leaf.q);
 		if (likely(skb != NULL))
 			break;
-		if (!cl->warned) {
-			printk(KERN_WARNING
-			       "htb: class %X isn't work conserving ?!\n",
-			       cl->common.classid);
-			cl->warned = 1;
-		}
 
+		qdisc_warn_nonwc("htb", cl->un.leaf.q);
 		htb_next_rb_node((level ? cl->parent->un.inner.ptr : q->
 				  ptr[0]) + prio);
 		cl = htb_lookup_leaf(q->row[level] + prio, prio,

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-28 16:18                                   ` Patrick McHardy
  2009-01-30 10:17                                     ` [PATCH 1/3 v2 " Jarek Poplawski
@ 2009-01-30 10:17                                     ` Jarek Poplawski
  2009-02-01  9:13                                       ` David Miller
  2009-01-30 10:17                                     ` [PATCH 3/3 " Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-30 10:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

------------------> PATCH 2/3
pkt_sched: sch_htb: Warn on too many events.

Let's get some info on possible config problems. This patch brings
back an old warning, but is printed only once now.

With feedback from Patrick McHardy <kaber@trash.net>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

diff -Nurp b/net/sched/sch_htb.c c/net/sched/sch_htb.c
--- b/net/sched/sch_htb.c	2009-01-29 22:07:42.000000000 +0000
+++ c/net/sched/sch_htb.c	2009-01-30 08:48:41.000000000 +0000
@@ -153,6 +153,9 @@ struct htb_sched {
 	int direct_qlen;	/* max qlen of above */
 
 	long direct_pkts;
+
+#define HTB_WARN_TOOMANYEVENTS	0x1
+	unsigned int warned;	/* only one warning */
 };
 
 /* find class in global hash table using given handle */
@@ -685,6 +688,10 @@ static psched_time_t htb_do_events(struc
 			htb_add_to_wait_tree(q, cl, diff);
 	}
 	/* too much load - let's continue on next jiffie (including above) */
+	if (!(q->warned & HTB_WARN_TOOMANYEVENTS)) {
+		printk(KERN_WARNING "htb: too many events!\n");
+		q->warned |= HTB_WARN_TOOMANYEVENTS;
+	}
 	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
 }
 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/3 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-28 16:18                                   ` Patrick McHardy
  2009-01-30 10:17                                     ` [PATCH 1/3 v2 " Jarek Poplawski
  2009-01-30 10:17                                     ` [PATCH 2/3 " Jarek Poplawski
@ 2009-01-30 10:17                                     ` Jarek Poplawski
  2009-02-01  9:13                                       ` David Miller
  2 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2009-01-30 10:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, devik, netdev

(Changed In-Reply-To)
On Wed, Jan 28, 2009 at 05:20:36PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On 12-01-2009 11:22, Patrick McHardy wrote:
>>> It you don't mind adding the workqueue, I certainly would prefer
>>> it, but I'm also fine with this patch. I don't have a HTB setup
>>> or a testcase for this specific case, otherwise I'd simply do it
>>> myself.
>>
>> Here is an example of this workqueue. I hope I didn't miss your point,
>> but since I didn't find much difference in testing, I'd prefer not to
>> sign-off/merge this yet, at least until there are many reports on
>> "too many events" problem, and somebody finds it useful.
>
> No, this seems to be exactly what I meant. The differnce - yeah, it
> shouldn't make much, mainly wake up the qdisc earlier (but not too
> early) after "too many events" occured _and_ no further enqueue
> events wake up the qdisc anyways.

OK, thanks,
Jarek P.
-------------> PATCH 3/3
pkt_sched: sch_htb: Use workqueue to schedule after too many events.

Patrick McHardy <kaber@trash.net> suggested using a workqueue instead
of hrtimers to trigger netif_schedule() when there is a problem with
setting exact time of this event: 'The differnce - yeah, it shouldn't
make much, mainly wake up the qdisc earlier (but not too early) after
"too many events" occured _and_ no further enqueue events wake up the
qdisc anyways.'

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
diff -Nurp c/net/sched/sch_htb.c d/net/sched/sch_htb.c
--- c/net/sched/sch_htb.c	2009-01-30 08:48:41.000000000 +0000
+++ d/net/sched/sch_htb.c	2009-01-30 09:00:06.000000000 +0000
@@ -35,6 +35,7 @@
 #include <linux/list.h>
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
+#include <linux/workqueue.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 
@@ -156,6 +157,7 @@ struct htb_sched {
 
 #define HTB_WARN_TOOMANYEVENTS	0x1
 	unsigned int warned;	/* only one warning */
+	struct work_struct work;
 };
 
 /* find class in global hash table using given handle */
@@ -659,7 +661,7 @@ static void htb_charge_class(struct htb_
  * htb_do_events - make mode changes to classes at the level
  *
  * Scans event queue for pending events and applies them. Returns time of
- * next pending event (0 for no event in pq).
+ * next pending event (0 for no event in pq, q->now for too many events).
  * Note: Applied are events whose have cl->pq_key <= q->now.
  */
 static psched_time_t htb_do_events(struct htb_sched *q, int level,
@@ -687,12 +689,14 @@ static psched_time_t htb_do_events(struc
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie (including above) */
+
+	/* too much load - let's continue after a break for scheduling */
 	if (!(q->warned & HTB_WARN_TOOMANYEVENTS)) {
 		printk(KERN_WARNING "htb: too many events!\n");
 		q->warned |= HTB_WARN_TOOMANYEVENTS;
 	}
-	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
+
+	return q->now;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -892,7 +896,10 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
-	qdisc_watchdog_schedule(&q->watchdog, next_event);
+	if (likely(next_event > q->now))
+		qdisc_watchdog_schedule(&q->watchdog, next_event);
+	else
+		schedule_work(&q->work);
 fin:
 	return skb;
 }
@@ -962,6 +969,14 @@ static const struct nla_policy htb_polic
 	[TCA_HTB_RTAB]	= { .type = NLA_BINARY, .len = TC_RTAB_SIZE },
 };
 
+static void htb_work_func(struct work_struct *work)
+{
+	struct htb_sched *q = container_of(work, struct htb_sched, work);
+	struct Qdisc *sch = q->watchdog.qdisc;
+
+	__netif_schedule(qdisc_root(sch));
+}
+
 static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct htb_sched *q = qdisc_priv(sch);
@@ -996,6 +1011,7 @@ static int htb_init(struct Qdisc *sch, s
 		INIT_LIST_HEAD(q->drops + i);
 
 	qdisc_watchdog_init(&q->watchdog, sch);
+	INIT_WORK(&q->work, htb_work_func);
 	skb_queue_head_init(&q->direct_queue);
 
 	q->direct_qlen = qdisc_dev(sch)->tx_queue_len;
@@ -1188,7 +1204,6 @@ static void htb_destroy_class(struct Qdi
 	kfree(cl);
 }
 
-/* always caled under BH & queue lock */
 static void htb_destroy(struct Qdisc *sch)
 {
 	struct htb_sched *q = qdisc_priv(sch);
@@ -1196,6 +1211,7 @@ static void htb_destroy(struct Qdisc *sc
 	struct htb_class *cl;
 	unsigned int i;
 
+	cancel_work_sync(&q->work);
 	qdisc_watchdog_cancel(&q->watchdog);
 	/* This line used to be after htb_destroy_class call below
 	   and surprisingly it worked in 2.4. But it must precede it

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3 v2 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-30 10:17                                     ` [PATCH 1/3 v2 " Jarek Poplawski
@ 2009-02-01  9:13                                       ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-02-01  9:13 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 30 Jan 2009 10:17:01 +0000

> pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler.
> 
> Patrick McHardy <kaber@trash.net> suggested:
> > How about making this flag and the warning message (in a out-of-line
> > function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> > inner non-work-conserving qdiscs as well.
> 
> This patch uses qdisc->flags field of "suspected" child qdisc.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/3 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-30 10:17                                     ` [PATCH 2/3 " Jarek Poplawski
@ 2009-02-01  9:13                                       ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-02-01  9:13 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 30 Jan 2009 10:17:15 +0000

> pkt_sched: sch_htb: Warn on too many events.
> 
> Let's get some info on possible config problems. This patch brings
> back an old warning, but is printed only once now.
> 
> With feedback from Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/3 net-next] pkt_sched: sch_htb: Warn on too many events.
  2009-01-30 10:17                                     ` [PATCH 3/3 " Jarek Poplawski
@ 2009-02-01  9:13                                       ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2009-02-01  9:13 UTC (permalink / raw)
  To: jarkao2; +Cc: kaber, devik, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 30 Jan 2009 10:17:25 +0000

> pkt_sched: sch_htb: Use workqueue to schedule after too many events.
> 
> Patrick McHardy <kaber@trash.net> suggested using a workqueue instead
> of hrtimers to trigger netif_schedule() when there is a problem with
> setting exact time of this event: 'The differnce - yeah, it shouldn't
> make much, mainly wake up the qdisc earlier (but not too early) after
> "too many events" occured _and_ no further enqueue events wake up the
> qdisc anyways.'
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2009-02-01  9:13 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-09 10:21 [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
2008-12-09 10:28 ` Patrick McHardy
2008-12-09 11:32   ` Jarek Poplawski
2008-12-09 12:25     ` Patrick McHardy
2008-12-09 13:08       ` Jarek Poplawski
2008-12-09 13:20         ` Patrick McHardy
2008-12-09 14:45           ` Jarek Poplawski
2008-12-09 14:56             ` Patrick McHardy
2008-12-10 10:52               ` [PATCH 8/6] " Jarek Poplawski
2009-01-12 10:17               ` [PATCH 8/6 resend] pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Jarek Poplawski
2009-01-13  5:54                 ` David Miller
2008-12-10  6:35             ` [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() David Miller
2008-12-10  9:11               ` Jarek Poplawski
2008-12-10  9:14                 ` David Miller
2008-12-10  9:35                   ` [PATCH 7/6] " Jarek Poplawski
2008-12-10 14:38                     ` Patrick McHardy
2008-12-16 23:57                     ` David Miller
2008-12-17  7:03                       ` Jarek Poplawski
2008-12-17  7:38                         ` David Miller
2009-01-12  6:56                           ` Patrick McHardy
2009-01-12 10:10                             ` Jarek Poplawski
2009-01-12 10:22                               ` Patrick McHardy
2009-01-12 11:08                                 ` Jarek Poplawski
2009-01-12 13:10                                   ` Patrick McHardy
2009-01-28 12:52                                 ` [PATCH net-next] pkt_sched: sch_htb: Warn on too many events Jarek Poplawski
2009-01-28 16:18                                   ` Patrick McHardy
2009-01-30 10:17                                     ` [PATCH 1/3 v2 " Jarek Poplawski
2009-02-01  9:13                                       ` David Miller
2009-01-30 10:17                                     ` [PATCH 2/3 " Jarek Poplawski
2009-02-01  9:13                                       ` David Miller
2009-01-30 10:17                                     ` [PATCH 3/3 " Jarek Poplawski
2009-02-01  9:13                                       ` David Miller
2009-01-28 13:23                                 ` [PATCH 7/6] Re: [PATCH 2/6] pkt_sched: sch_htb: Consider used jiffies in htb_dequeue() Jarek Poplawski
2009-01-28 16:20                                   ` Patrick McHardy
2009-01-12 10:29                               ` Jarek Poplawski
2009-01-12 10:32                               ` David Miller
2009-01-12 10:59                                 ` Jarek Poplawski
2009-01-12 11:04                                   ` David Miller
2009-01-12 10:16                   ` [PATCH 7/6 resend] pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Jarek Poplawski
2009-01-13  5:54                     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.