* deadlocks if use htb
@ 2008-10-10  5:44 Badalian Vyacheslav
  2008-10-10  7:56 ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-10  5:44 UTC (permalink / raw)
  To: netdev

Hello all!

Please take a look at this if you have time:
http://bugzilla.kernel.org/show_bug.cgi?id=11718

We get deadlocks on a few PCs about once a week.
I can test any patches to detect and fix the problem.
I am now testing a 2.6.27-rc kernel on one PC.

What information do you need to debug the problem, quickly or in full?

Thanks for your work!

Badalian Vyacheslav aka Slavon
BIG TELECOM CLOSED JSC, Moscow
Network Administrator



* Re: deadlocks if use htb
  2008-10-10  5:44 deadlocks if use htb Badalian Vyacheslav
@ 2008-10-10  7:56 ` Jarek Poplawski
  2008-10-10  8:46   ` Badalian Vyacheslav
                     ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Jarek Poplawski @ 2008-10-10  7:56 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev

On 10-10-2008 07:44, Badalian Vyacheslav wrote:
> Hello all!

Hello Slavon,

> 
> Please look to if you have time:
> http://bugzilla.kernel.org/show_bug.cgi?id=11718
> 
> We have deadlocks at few PC one times in week.
> I can test any patches to detect and fix problem.
> Now i test 2.6.27-rc kernel at one PC.

A similar bug was reported by Denys Fedoryshchenko, but it wasn't fully
diagnosed. Anyway, it looks hardware dependent. The patch below
can sometimes help. 2.6.27 may have this fixed too (in some other way).

Jarek P.

(some offsets are OK when patching 2.6.26)
---

 net/sched/sch_htb.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 30c999c..ff9e965 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -162,6 +162,7 @@ struct htb_sched {
 
 	int rate2quantum;	/* quant = rate / rate2quantum */
 	psched_time_t now;	/* cached dequeue time */
+	psched_time_t next_watchdog;
 	struct qdisc_watchdog watchdog;
 
 	/* non shaped skbs; let them go directly thru */
@@ -920,7 +921,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		}
 	}
 	sch->qstats.overlimits++;
-	qdisc_watchdog_schedule(&q->watchdog, next_event);
+	if (q->next_watchdog < q->now || next_event <=
+	     q->next_watchdog - PSCHED_TICKS_PER_SEC / HZ) {
+		qdisc_watchdog_schedule(&q->watchdog, next_event);
+		q->next_watchdog = next_event;
+	}
 fin:
 	return skb;
 }
@@ -973,6 +978,7 @@ static void htb_reset(struct Qdisc *sch)
 		}
 	}
 	qdisc_watchdog_cancel(&q->watchdog);
+	q->next_watchdog = 0;
 	__skb_queue_purge(&q->direct_queue);
 	sch->q.qlen = 0;
 	memset(q->row, 0, sizeof(q->row));
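
For readers following the patch, here is a minimal user-space sketch of the
check it adds in htb_dequeue(). It is not kernel code: `ticks_t`,
`struct wd_state`, `should_rearm()`, `maybe_schedule()` and `rearm_count`
are hypothetical names introduced for illustration, and a signed 64-bit
tick type is assumed. The idea shown is only that the watchdog timer is
restarted when the previously armed expiry is already in the past, or when
the new event is earlier than that expiry by at least a given slack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t ticks_t;	/* assumed stand-in for psched_time_t */

/* Hypothetical stand-in for the htb_sched fields the check reads. */
struct wd_state {
	ticks_t now;		/* cached dequeue time */
	ticks_t next_watchdog;	/* expiry of the last armed timer, 0 = none */
	int rearm_count;	/* how many times the timer was (re)started */
};

/*
 * Same condition as in the patch: rearm if the last armed expiry has
 * already passed, or if next_event is earlier than it by >= slack ticks.
 */
static bool should_rearm(const struct wd_state *s, ticks_t next_event,
			 ticks_t slack)
{
	return s->next_watchdog < s->now ||
	       next_event <= s->next_watchdog - slack;
}

/* Stand-in for the patched block: count a rearm instead of starting a timer. */
static bool maybe_schedule(struct wd_state *s, ticks_t next_event,
			   ticks_t slack)
{
	assert(slack >= 0);
	if (!should_rearm(s, next_event, slack))
		return false;
	s->next_watchdog = next_event;	/* remember what we armed */
	s->rearm_count++;
	return true;
}
```

With a slack of one jiffy, a dequeue that asks for an event only slightly
earlier (or later) than the one already armed leaves the timer alone, so
the timer is restarted less often, at the cost of less exact wake-ups.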


* Re: deadlocks if use htb
  2008-10-10  7:56 ` Jarek Poplawski
@ 2008-10-10  8:46   ` Badalian Vyacheslav
  2008-10-10  8:52   ` Badalian Vyacheslav
  2008-10-10 12:32   ` Patrick McHardy
  2 siblings, 0 replies; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-10  8:46 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> On 10-10-2008 07:44, Badalian Vyacheslav wrote:
>   
>> Hello all!
>>     
>
> Hello Slavon,
>
>   
>> Please look to if you have time:
>> http://bugzilla.kernel.org/show_bug.cgi?id=11718
>>
>> We have deadlocks at few PC one times in week.
>> I can test any patches to detect and fix problem.
>> Now i test 2.6.27-rc kernel at one PC.
>>     
>
> A similar bug was reported by Denys Fedoryshchenko but it wasn't fully
> diagnosed. Anyway it looks like hardware dependent. The patch below
> can sometimes help. 2.6.27 may have this fixed too (some other way).
>
>   
Got it on 2.6.27 too, just now!


[ 6951.841662] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01fde4c,
registers:
[ 6951.841662] Modules linked in: sch_sfq sch_htb netconsole e1000
i2c_i801 e1000e i2c_core
[ 6951.841662]
[ 6951.841662] Pid: 0, comm: swapper Not tainted (2.6.27-fw #1)
[ 6951.841662] EIP: 0060:[<c01fde4c>] EFLAGS: 00000092 CPU: 3
[ 6951.841662] EIP is at __rb_rotate_right+0xc/0x70
[ 6951.841662] EAX: f70c3c68 EBX: f70c3c68 ECX: f70c3c68 EDX: c202c134
[ 6951.841662] ESI: f70c3c68 EDI: f70c3c68 EBP: c202c134 ESP: f785fc2c
[ 6951.841662]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 6951.841662] Process swapper (pid: 0, ti=f785e000 task=f7832940
task.ti=f785e000)
[ 6951.841662] Stack: f70c3c68 f70c3c68 f70c3c68 c01fdf41 f70c3c68
00000000 c202c12c c202c134
[ 6951.841662]        c013a91f f70c3c68 c202c12c c202212c c045b100
c013ae0a 00000000 c013d63d
[ 6951.841662]        9a011800 00000652 00000001 00000282 00000652
f70c3c68 00000000 00000000
[ 6951.841662] Call Trace:
[ 6951.841662]  [<c01fdf41>] rb_insert_color+0x91/0xc0
[ 6951.841662]  [<c013a91f>] enqueue_hrtimer+0x5f/0x80
[ 6951.841662]  [<c013ae0a>] hrtimer_start+0xaa/0x130
[ 6951.841662]  [<c013d63d>] getnstimeofday+0x3d/0xe0
[ 6951.841662]  [<c02de83d>] qdisc_watchdog_schedule+0x3d/0x50
[ 6951.841662]  [<f88ac343>] htb_dequeue+0x683/0x7b0 [sch_htb]
[ 6951.841662]  [<c02ce692>] dev_hard_start_xmit+0x1d2/0x2c0
[ 6951.841662]  [<c02dc87a>] __qdisc_run+0x13a/0x1d0
[ 6951.841662]  [<c02d0ed7>] dev_queue_xmit+0x227/0x4f0
[ 6951.841662]  [<c02f29ff>] ip_finish_output+0x11f/0x280
[ 6951.841662]  [<c02f00e0>] ip_forward+0x290/0x310
[ 6951.841662]  [<c02efe35>] ip_forward_finish+0x25/0x40
[ 6951.841662]  [<c02ee9a2>] ip_rcv_finish+0x122/0x360
[ 6951.841662]  [<c02c8cc6>] __alloc_skb+0x36/0x120
[ 6951.841662]  [<c02c9d02>] __netdev_alloc_skb+0x22/0x50
[ 6951.841662]  [<c02eee20>] ip_rcv+0x0/0x290
[ 6951.841662]  [<c02ce064>] netif_receive_skb+0x274/0x4d0
[ 6951.841662]  [<c0108b1a>] nommu_map_single+0x2a/0x60
[ 6951.841662]  [<f883be39>] e1000_receive_skb+0x49/0x80 [e1000e]
[ 6951.841662]  [<f883e84c>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[ 6951.841662]  [<f883b3ad>] e1000_clean+0x1bd/0x570 [e1000e]
[ 6951.841662]  [<c02d03bc>] net_rx_action+0x13c/0x200
[ 6951.841662]  [<c0129b72>] __do_softirq+0x82/0x100
[ 6951.841662]  [<c0129c27>] do_softirq+0x37/0x40
[ 6951.841662]  [<c0106060>] do_IRQ+0x40/0x80
[ 6951.841662]  [<c01134c7>] smp_apic_timer_interrupt+0x57/0x90
[ 6951.841662]  [<c010457f>] common_interrupt+0x23/0x28
[ 6951.841662]  [<c0109aa2>] mwait_idle+0x32/0x40
[ 6951.841662]  [<c01026c8>] cpu_idle+0x48/0xe0
[ 6951.841662]  =======================
[ 6951.841662] Code: 24 08 83 e0 03 09 d0 89 03 8b 1c 24 83 c4 0c c3 89
56 08 eb e3 8d 76 00 8d bc 27 00 00 00 00 83 ec 0c 89 1c 24 89 c3 89 7c
24 08 <89> d7 89 74 24 04 8b 50 08 8b 30 8b 4a 04 83 e6 fc 85 c9 89 48





* Re: deadlocks if use htb
  2008-10-10  7:56 ` Jarek Poplawski
  2008-10-10  8:46   ` Badalian Vyacheslav
@ 2008-10-10  8:52   ` Badalian Vyacheslav
  2008-10-10  9:04     ` Jarek Poplawski
  2008-10-10 12:32   ` Patrick McHardy
  2 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-10  8:52 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Oh... sorry. I misunderstood you... I will patch 2.6.27 with this patch
and test it...


* Re: deadlocks if use htb
  2008-10-10  8:52   ` Badalian Vyacheslav
@ 2008-10-10  9:04     ` Jarek Poplawski
  2008-10-10  9:51       ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2008-10-10  9:04 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev

On Fri, Oct 10, 2008 at 12:52:58PM +0400, Badalian Vyacheslav wrote:
> Oh... sorry. I Wrong you understand... i patch 2.6.27 with this patch
> and will test it...

No, you understood it right. But it seems the 2.6.27 fix doesn't work
for you. So, yes, try this patch with 2.6.26 or 2.6.27.

Jarek P.



* Re: deadlocks if use htb
  2008-10-10  9:04     ` Jarek Poplawski
@ 2008-10-10  9:51       ` Jarek Poplawski
  2008-10-16  8:28         ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2008-10-10  9:51 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev

On Fri, Oct 10, 2008 at 09:04:26AM +0000, Jarek Poplawski wrote:
> On Fri, Oct 10, 2008 at 12:52:58PM +0400, Badalian Vyacheslav wrote:
> > Oh... sorry. I Wrong you understand... i patch 2.6.27 with this patch
> > and will test it...
> 
> No, you understood it right. But it seems 2.6.27 fix doesn't work for
> you. So, yes, try this patch with 2.6.26 or 2.6.27.

There is also a parameter you could try (without this patch):

modprobe sch_htb htb_hysteresis=1

Jarek P.


* Re: deadlocks if use htb
  2008-10-10  7:56 ` Jarek Poplawski
  2008-10-10  8:46   ` Badalian Vyacheslav
  2008-10-10  8:52   ` Badalian Vyacheslav
@ 2008-10-10 12:32   ` Patrick McHardy
  2008-10-10 12:34     ` Patrick McHardy
  2 siblings, 1 reply; 65+ messages in thread
From: Patrick McHardy @ 2008-10-10 12:32 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Badalian Vyacheslav, netdev, Denys

Jarek Poplawski wrote:
> On 10-10-2008 07:44, Badalian Vyacheslav wrote:
>> Hello all!
> 
> Hello Slavon,
> 
>> Please look to if you have time:
>> http://bugzilla.kernel.org/show_bug.cgi?id=11718
>>
>> We have deadlocks at few PC one times in week.
>> I can test any patches to detect and fix problem.
>> Now i test 2.6.27-rc kernel at one PC.
> 
> A similar bug was reported by Denys Fedoryshchenko but it wasn't fully
> diagnosed. Anyway it looks like hardware dependent. The patch below
> can sometimes help. 2.6.27 may have this fixed too (some other way).

I doubt it's hardware related. What's happening is that the hrtimer
insertion gets into an endless loop because the rb tree (or node)
apparently has a loop. I went through the qdiscs' use of hrtimers
again, but can't spot any error there.
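
To illustrate why a loop in the tree links turns an insertion into a
lockup, here is a simplified user-space sketch. It is not the kernel's
rb_insert_color(): `struct node`, `climb_while_red()` and the step bound
are hypothetical, and the corruption modelled (a node that is its own
parent) is only one assumed way such a loop could look. An insertion fixup
climbs toward the root while the parent is red; if the parent links form a
cycle, that climb never reaches the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified node: only the parent link and colour matter here. */
struct node {
	struct node *parent;
	bool red;
};

/*
 * Climb from n toward the root while the parent is red, moving to the
 * grandparent each time as a recolouring fixup step does. Returns the
 * number of steps taken, or -1 once max_steps is exceeded, which for a
 * tree of bounded height suggests the parent links contain a loop.
 */
static int climb_while_red(const struct node *n, int max_steps)
{
	int steps = 0;

	assert(max_steps > 0);
	while (n->parent && n->parent->red) {
		if (steps++ >= max_steps)
			return -1;
		/* go to the grandparent, or stop at a parent that is the root */
		n = n->parent->parent ? n->parent->parent : n->parent;
	}
	return steps;
}
```

Without the step bound the second case below would spin forever, which is
the behaviour the NMI watchdog reports; the real code has no such bound.
For a healthy chain the climb ends at the root; for a node whose parent
pointer refers to itself, `climb_while_red()` returns -1.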

Denys, did your systems also have CONFIG_HIGH_RES_TIMERS=n?


* Re: deadlocks if use htb
  2008-10-10 12:32   ` Patrick McHardy
@ 2008-10-10 12:34     ` Patrick McHardy
  2008-10-10 12:54       ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Patrick McHardy @ 2008-10-10 12:34 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: Jarek Poplawski, netdev, Denys

Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On 10-10-2008 07:44, Badalian Vyacheslav wrote:
>>> Hello all!
>>
>> Hello Slavon,
>>
>>> Please look to if you have time:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=11718
>>>
>>> We have deadlocks at few PC one times in week.
>>> I can test any patches to detect and fix problem.
>>> Now i test 2.6.27-rc kernel at one PC.
>>
>> A similar bug was reported by Denys Fedoryshchenko but it wasn't fully
>> diagnosed. Anyway it looks like hardware dependent. The patch below
>> can sometimes help. 2.6.27 may have this fixed too (some other way).
> 
> I doubt its hardware related, whats happening is that the hrtimer
> insertion gets into an endless loop because the rb tree (or node)
> apparently has a loop. I went through the qdiscs' use of hrtimers
> again, but can't spot any error there.
> 
> Denys, did your systems also have CONFIG_HIGH_RES_TIMERS=n?

Badalian, please try enabling CONFIG_DEBUG_OBJECTS and post the
results, if any.


* Re: deadlocks if use htb
  2008-10-10 12:34     ` Patrick McHardy
@ 2008-10-10 12:54       ` Badalian Vyacheslav
  0 siblings, 0 replies; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-10 12:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, netdev, Denys

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> Jarek Poplawski wrote:
>>> On 10-10-2008 07:44, Badalian Vyacheslav wrote:
>>>> Hello all!
>>>
>>> Hello Slavon,
>>>
>>>> Please look to if you have time:
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=11718
>>>>
>>>> We have deadlocks at few PC one times in week.
>>>> I can test any patches to detect and fix problem.
>>>> Now i test 2.6.27-rc kernel at one PC.
>>>
>>> A similar bug was reported by Denys Fedoryshchenko but it wasn't fully
>>> diagnosed. Anyway it looks like hardware dependent. The patch below
>>> can sometimes help. 2.6.27 may have this fixed too (some other way).
>>
>> I doubt its hardware related, whats happening is that the hrtimer
>> insertion gets into an endless loop because the rb tree (or node)
>> apparently has a loop. I went through the qdiscs' use of hrtimers
>> again, but can't spot any error there.
>>
>> Denys, did your systems also have CONFIG_HIGH_RES_TIMERS=n?
>
> Badalian, please try enabling CONFIG_DEBUG_OBJECTS and post the
> results, if any.

I have results with both CONFIG_HIGH_RES_TIMERS=n and
CONFIG_HIGH_RES_TIMERS=y.
OK, I have recompiled the kernel; now I simply wait for a crash to reboot =)
These are the PCs I have now:

1. 2.6.27 with the patch
2. 2.6.26.6 with htb_hysteresis=1 and CONFIG_DEBUG_OBJECTS=n
3. 2.6.26.6 with htb_hysteresis=1 and CONFIG_DEBUG_OBJECTS=y (waiting for
a crash to reboot)
4. 2 servers that deadlocked and did not reboot after the panic (2.6.26.5
kernel); someone has to drive to them to reboot them
5. 4 PCs with 2.6.24-rc7-git2 that do the same shaping but have no
crashes (they are on other hardware)


* Re: deadlocks if use htb
  2008-10-10  9:51       ` Jarek Poplawski
@ 2008-10-16  8:28         ` Badalian Vyacheslav
  2008-10-16  8:40           ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-16  8:28 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> On Fri, Oct 10, 2008 at 09:04:26AM +0000, Jarek Poplawski wrote:
>   
>> On Fri, Oct 10, 2008 at 12:52:58PM +0400, Badalian Vyacheslav wrote:
>>     
>>> Oh... sorry. I Wrong you understand... i patch 2.6.27 with this patch
>>> and will test it...
>>>       
>> No, you understood it right. But it seems 2.6.27 fix doesn't work for
>> you. So, yes, try this patch with 2.6.26 or 2.6.27.
>>     
>
> There is also a parameter you could try (without this patch):
>
> modprobe sch_htb htb_hysteresis=1
>
> Jarek P.
>
>   
>>>>> Please look to if you have time:
>>>>> http://bugzilla.kernel.org/show_bug.cgi?id=11718
>>>>>
>>>>> We have deadlocks at few PC one times in week.
>>>>> I can test any patches to detect and fix problem.
>>>>> Now i test 2.6.27-rc kernel at one PC.
>>>>>     
>>>>>           
>>>> A similar bug was reported by Denys Fedoryshchenko but it wasn't fully
>>>> diagnosed. Anyway it looks like hardware dependent. The patch below
>>>> can sometimes help. 2.6.27 may have this fixed too (some other way).
>>>>
>>>> Jarek P.
>>>>         
>
>   
Sorry for the late answer.

We had trouble with the power in our server room. That is over now and I
will test all of this again.

With the patch + htb_hysteresis=0, and with htb_hysteresis=1 without the
patch, all PCs worked fine for 2 days and 18 hours. After that we had the
power failure.... =(

Thanks


* Re: deadlocks if use htb
  2008-10-16  8:28         ` Badalian Vyacheslav
@ 2008-10-16  8:40           ` Jarek Poplawski
  2008-10-22  6:06             ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2008-10-16  8:40 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev

On Thu, Oct 16, 2008 at 12:28:46PM +0400, Badalian Vyacheslav wrote:
...
> Sorry for long answer.
> 
> We have troubles with power in our server place. Now its gone and i will
> test again all this.
> 
> With patch +  htb_hysteresis=0 and htb_hysteresis=1 without patch all PC
> work done 2 days and 18 hours. After this we have power crash.... =(

No need to hurry: you've written it doesn't happen every day. Better try
to make sure there is really a difference after any of these changes.

Thanks,
Jarek P.


* Re: deadlocks if use htb
  2008-10-16  8:40           ` Jarek Poplawski
@ 2008-10-22  6:06             ` Badalian Vyacheslav
  2008-10-22  7:02               ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-10-22  6:06 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Hello!
I have more information.

Statistics of the PCs:
1. 2.6.26.6, Dynamic Ticks, HiResTimer, 1000HZ, htb_hysteresis=0 -
crashed 1d 18h ago
2. 2.6.26.5, HZ300, no Dynamic Ticks, no HiResTimer, htb_hysteresis=0 -
uptime 5d 17h (no crashes for now, but it crashed some time ago with
htb_hysteresis=1)
3. 2.6.27, 1000HZ, no Dynamic Ticks, no HiResTimer, htb_hysteresis=0 +
PATCH - uptime 5d 17h (no crashes for now, but it crashed some time ago
without the patch)

I also attach the crash log of the last crash on PC 1:

[10610.110729] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01fd939,
registers:
[10610.110729] Modules linked in: netconsole e1000e i2c_i801 i2c_core e1000
[10610.110729]
[10610.110729] Pid: 0, comm: swapper Not tainted (2.6.26.6-fw #1)
[10610.110729] EIP: 0060:[<c01fd939>] EFLAGS: 00000082 CPU: 1
[10610.110729] EIP is at rb_insert_color+0x19/0xc0
[10610.110729] EAX: f6c23ca4 EBX: f6c23ca4 ECX: 00000000 EDX: f6c23ca4
[10610.110729] ESI: f6c23ca4 EDI: f6c23ca4 EBP: c20190e0 ESP: f7c4dc98
[10610.110729]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[10610.110729] Process swapper (pid: 0, ti=f7c4c000 task=f7c314a0
task.ti=f7c4c000)
[10610.110729] Stack: f6c23ca8 f6c23ca4 f6c23ca4 00000000 c013b672
c20190e0 00000001 c20190d8
[10610.110729]        c20190d8 f6c23ca4 c202d0d8 c04470a0 c013bd4d
00000000 f7c4dcf4 7c491000
[10610.110729]        000009a5 00000001 00000286 f6c23800 ffffffff
00000000 00000000 c02d407e
[10610.110729] Call Trace:
[10610.110729]  [<c013b672>] enqueue_hrtimer+0x72/0xf0
[10610.110729]  [<c013bd4d>] hrtimer_start+0xad/0x150
[10610.110729]  [<c02d407e>] qdisc_watchdog_schedule+0x1e/0x30
[10610.110729]  [<c02d9826>] htb_dequeue+0x6a6/0x810
[10610.110729]  [<c02d3f72>] tc_classify+0x42/0x90
[10610.110729]  [<c02dab22>] sfq_enqueue+0x22/0x230
[10610.110729]  [<c02d9c40>] htb_enqueue+0x0/0x1e0
[10610.110729]  [<c02d2efc>] __qdisc_run+0x19c/0x1d0
[10610.110729]  [<c02d9c40>] htb_enqueue+0x0/0x1e0
[10610.110729]  [<c02c7737>] dev_queue_xmit+0x267/0x380
[10610.110729]  [<c02e8ab0>] ip_forward_finish+0x0/0x40
[10610.110729]  [<c02eb65f>] ip_finish_output+0x11f/0x280
[10610.110729]  [<c02e8d7f>] ip_forward+0x28f/0x2d0
[10610.110729]  [<c02e8ad5>] ip_forward_finish+0x25/0x40
[10610.110729]  [<c02e7612>] ip_rcv_finish+0x122/0x360
[10610.110729]  [<c02bfa87>] __alloc_skb+0x57/0x120
[10610.110729]  [<c0109c8a>] nommu_map_single+0x2a/0x60
[10610.110729]  [<c02e7a90>] ip_rcv+0x0/0x290
[10610.110729]  [<c02c45cb>] netif_receive_skb+0x26b/0x470
[10610.110729]  [<f886c75d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[10610.110729]  [<f886f9cc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[10610.110729]  [<f886bf69>] e1000_clean+0x49/0x1f0 [e1000e]
[10610.110729]  [<c02c69d8>] net_rx_action+0xf8/0x1b0
[10610.110729]  [<c012a922>] __do_softirq+0x82/0x100
[10610.110729]  [<c012a9d7>] do_softirq+0x37/0x40
[10610.110729]  [<c012ad27>] irq_exit+0x57/0x80
[10610.110729]  [<c0107120>] do_IRQ+0x40/0x80
[10610.110729]  [<c0114097>] smp_apic_timer_interrupt+0x57/0x90
[10610.110729]  [<c01055a3>] common_interrupt+0x23/0x28
[10610.110729]  [<c010a602>] mwait_idle+0x32/0x40
[10610.110729]  [<c010a5d0>] mwait_idle+0x0/0x40
[10610.110729]  [<c01036f3>] cpu_idle+0x53/0xc0
[10610.110729]  =======================
[10610.110729] Code: c4 0c c3 89 56 04 eb e3 8d 76 00 8d bc 27 00 00 00
00 55 89 d5 57 89 c7 56 53 90 8d b4 26 00 00 00 00 8b 1f 83 e3 fc 74 32
8b 03 <89> d9 a8 01 75 2a 89 c6 83 e6 fc 8b 56 08 39 d3 74 45 85 d2 74

Thanks!
Best regards, Badalian Vyacheslav


* Re: deadlocks if use htb
  2008-10-22  6:06             ` Badalian Vyacheslav
@ 2008-10-22  7:02               ` Jarek Poplawski
  2008-12-10 15:14                 ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2008-10-22  7:02 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, linux-kernel

On Wed, Oct 22, 2008 at 10:06:58AM +0400, Badalian Vyacheslav wrote:
> Hello!
Hi!

> I get more information.
> 
> Statistics of PC:
> 1. 2.6.26.6 Dunamic Timer, HiResTimer, 1000HZ, htb_hysteresis=0 -
> crashed 1d 18h ago
> 2. 2.6.26.5 HZ300, NO Dunamic Timer, No HiResTimer, htb_hysteresis=0 -
> uptime 5d 17h (no crashes for now, but it crashed some time ago with
> htb_hysteresis=1)

So it looks like the htb_hysteresis change might not be enough, which is
a pity because it would be the easiest thing to "reverse" in the kernel.

> 3. 2.6.27, 1000HZ, NO Dunamic Timer, No HiResTimer, htb_hysteresis=0 +
> PATCH - uptime 5d 17h (no crashes for now, but it crashed some time ago
> without patch)

So I guess this patch makes some difference, but it could also be hard
to convince people to fix it this way, because it makes scheduling
less exact.

Alas, I still have no idea of the real reason. IMHO this should rather
be debugged by the hrtimers/NMI people, but there was no response the
last time I Cc-ed them. Anyway, I have added linux-kernel to Cc again here.

I attach below a slightly modified version of the previous patch,
which allows more exact scheduling, but could be more vulnerable to
this bug. Alas, even if it works, it still might not be the final
solution to this problem.

> Also attach crash log of lash crash PC 1:
> 
> [10610.110729] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01fd939,
> registers:
> [10610.110729] Modules linked in: netconsole e1000e i2c_i801 i2c_core e1000
> [10610.110729]
> [10610.110729] Pid: 0, comm: swapper Not tainted (2.6.26.6-fw #1)
> [10610.110729] EIP: 0060:[<c01fd939>] EFLAGS: 00000082 CPU: 1
> [10610.110729] EIP is at rb_insert_color+0x19/0xc0

Thanks,
Jarek P.

--- (testing patch #2)

 net/sched/sch_htb.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 30c999c..ff9e965 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -162,6 +162,7 @@ struct htb_sched {
 
 	int rate2quantum;	/* quant = rate / rate2quantum */
 	psched_time_t now;	/* cached dequeue time */
+	psched_time_t next_watchdog;
 	struct qdisc_watchdog watchdog;
 
 	/* non shaped skbs; let them go directly thru */
@@ -920,7 +921,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		}
 	}
 	sch->qstats.overlimits++;
-	qdisc_watchdog_schedule(&q->watchdog, next_event);
+	if (q->next_watchdog < q->now || next_event <=
+	     q->next_watchdog - PSCHED_TICKS_PER_SEC / (10 * HZ)) {
+		qdisc_watchdog_schedule(&q->watchdog, next_event);
+		q->next_watchdog = next_event;
+	}
 fin:
 	return skb;
 }
@@ -973,6 +978,7 @@ static void htb_reset(struct Qdisc *sch)
 		}
 	}
 	qdisc_watchdog_cancel(&q->watchdog);
+	q->next_watchdog = 0;
 	__skb_queue_purge(&q->direct_queue);
 	sch->q.qlen = 0;
 	memset(q->row, 0, sizeof(q->row));
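
The only change from the first patch is the slack term, which shrinks from
one jiffy to a tenth of a jiffy. A minimal sketch of that arithmetic
follows. It rests on an assumption worth checking against the tree being
patched: that psched ticks are nanoseconds shifted right by 10, giving
976562 ticks per second. `ASSUMED_PSCHED_TICKS_PER_SEC`, `slack_patch1()`
and `slack_patch2()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one psched tick = 1024 ns, so 1 s = 1000000000 >> 10 ticks. */
#define ASSUMED_PSCHED_TICKS_PER_SEC (1000000000LL >> 10)

/* Slack in the first patch: PSCHED_TICKS_PER_SEC / HZ (one jiffy). */
static int64_t slack_patch1(int hz)
{
	assert(hz > 0);
	return ASSUMED_PSCHED_TICKS_PER_SEC / hz;
}

/* Slack in testing patch #2: PSCHED_TICKS_PER_SEC / (10 * HZ). */
static int64_t slack_patch2(int hz)
{
	assert(hz > 0);
	return ASSUMED_PSCHED_TICKS_PER_SEC / (10 * hz);
}
```

Under that assumption, HZ=1000 gives a slack of 976 ticks (about 1 ms)
with the first patch and 97 ticks (about 0.1 ms) with patch #2, so the
timer is restarted for smaller changes in the next event time.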


* Re: deadlocks if use htb
  2008-10-22  7:02               ` Jarek Poplawski
@ 2008-12-10 15:14                 ` Badalian Vyacheslav
  2008-12-11  8:46                   ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-12-10 15:14 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, linux-kernel

Hello again! Sorry for the long absence.
I was away from this work for a long time.

May we return to this bug?
The servers run the latest stable kernel, 2.6.27.8:
HZ=1000, HR=off, DynamicTicks=off, hysteresis=1
Sorry, they are not patched; I did not apply the update. Do you have fresh
patches or ideas for tests?
The debug options "Debug object operations" and "Debug timer objects" are
also on, but I do not see any additional information in the dump.

Thanks!

The dump:
[ 1500.932813] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c0208ed9,
registers:
[ 1500.932813] Modules linked in: cls_u32 sch_sfq sch_htb netconsole
e1000e e1000 i2c_i801
[ 1500.932813]
[ 1500.932813] Pid: 0, comm: swapper Not tainted (2.6.27-gentoo-r5-fw #1)
[ 1500.932813] EIP: 0060:[<c0208ed9>] EFLAGS: 00000082 CPU: 3
[ 1500.932813] EIP is at rb_insert_color+0x29/0xf0
[ 1500.932813] EAX: f3391c68 EBX: f3391c68 ECX: 00000000 EDX: f3391c68
[ 1500.932813] ESI: 00000000 EDI: f3391c68 EBP: f3391c68 ESP: f785fc1c
[ 1500.932813]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1500.932813] Process swapper (pid: 0, ti=f785e000 task=f7832940
task.ti=f785e000)
[ 1500.932813] Stack: c202c134 f3391c68 f3391c6c 00000000 c202c134
c013d5a2 c202c12c f3391c68
[ 1500.932813]        c202c12c c202212c c0480100 c013dba9 00000000
002dcd3c 76886800 0000015d
[ 1500.932813]        00000001 00000282 0000015d f3391c68 f3391800
f3391880 c02ff53d 00000000
[ 1500.932813] Call Trace:
[ 1500.932813]  [<c013d5a2>] enqueue_hrtimer+0x72/0x90
[ 1500.932813]  [<c013dba9>] hrtimer_start+0x99/0x150
[ 1500.932813]  [<c02ff53d>] qdisc_watchdog_schedule+0x3d/0x50
[ 1500.932813]  [<f88a379a>] htb_dequeue+0x6aa/0x7d0 [sch_htb]
[ 1500.932813]  [<c02ee6a2>] dev_hard_start_xmit+0x242/0x2d0
[ 1500.932813]  [<c02fccfc>] __qdisc_run+0x13c/0x1e0
[ 1500.932813]  [<c02eea81>] dev_queue_xmit+0x241/0x570
[ 1500.932813]  [<c03133a4>] ip_finish_output+0x184/0x260
[ 1500.932813]  [<c030f6a6>] ip_forward_finish+0x26/0x40
[ 1500.932813]  [<c030e1a6>] ip_rcv_finish+0x156/0x340
[ 1500.932813]  [<c02e93e2>] __netdev_alloc_skb+0x22/0x50
[ 1500.932813]  [<c02e8934>] __alloc_skb+0x34/0x120
[ 1500.932813]  [<c02e93e2>] __netdev_alloc_skb+0x22/0x50
[ 1500.932813]  [<c02edef7>] netif_receive_skb+0x227/0x4c0
[ 1500.932813]  [<f8866db6>] e1000_receive_skb+0x46/0x80 [e1000e]
[ 1500.932813]  [<f8867030>] e1000_clean_rx_irq+0x240/0x340 [e1000e]
[ 1500.932813]  [<f8865277>] e1000_clean+0x1f7/0x5b0 [e1000e]
[ 1500.932813]  [<c0121daa>] run_rebalance_domains+0x8a/0x520
[ 1500.932813]  [<c011d6b7>] __dequeue_entity+0x57/0xc0
[ 1500.932813]  [<c02ec69c>] net_rx_action+0xcc/0x1e0
[ 1500.932813]  [<c012b842>] __do_softirq+0x82/0xf0
[ 1500.932813]  [<c012b8ed>] do_softirq+0x3d/0x50
[ 1500.932813]  [<c0106090>] do_IRQ+0x40/0x70
[ 1500.932813]  [<c010462b>] common_interrupt+0x23/0x28
[ 1500.932813]  [<c01099ef>] mwait_idle+0x2f/0x40
[ 1500.932813]  [<c0102c13>] cpu_idle+0x73/0xf0
[ 1500.932813]  =======================
[ 1500.932813] Code: 76 00 55 89 c5 57 56 53 83 ec 04 89 14 24 8d 74 26
00 8b 45 00 89 c3 83 e3 fc 74 36 8b 13 f6 c2 01 75 2f 89 d7 83 e7 fc 8b
77 08 <39> de 74 53 85 f6 74 2f 8b 06 a8 01 75 29 83 c8 01 89 fd 89 06



> On Wed, Oct 22, 2008 at 10:06:58AM +0400, Badalian Vyacheslav wrote:
>   
>> Hello!
>>     
> Hi!
>
>   
>> I get more information.
>>
>> Statistics of PC:
>> 1. 2.6.26.6 Dunamic Timer, HiResTimer, 1000HZ, htb_hysteresis=0 -
>> crashed 1d 18h ago
>> 2. 2.6.26.5 HZ300, NO Dunamic Timer, No HiResTimer, htb_hysteresis=0 -
>> uptime 5d 17h (no crashes for now, but it crashed some time ago with
>> htb_hysteresis=1)
>>     
>
> So it looks like htb_hysteresis change could be not enough, and it's
> a pity because this could be easiest to "reverse" in the kernel.
>
>   
>> 3. 2.6.27, 1000HZ, NO Dunamic Timer, No HiResTimer, htb_hysteresis=0 +
>> PATCH - uptime 5d 17h (no crashes for now, but it crashed some time ago
>> without patch)
>>     
>
> So I guess this patch seems to make some difference, but it also could
> be hard to convince people to fix it this way, because it makes
> scheduling less exact.
>
> Alas I have still no idea of the real reason. IMHO this should be 
> rather debugged by hrtimers/NMI people, but there were no respose last
> time I Cc-ed them - anyway I added linux-kernel to Cc again here.
>
> I attach below a slightly modified version of the previous patch,
> which lets for more exact scheduling, but could be more vulnerable for
> this bug. Alas, even if it works, it still could be not the final
> solution of this problem.
>
>   
>> Also attach crash log of lash crash PC 1:
>>
>> [10610.110729] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01fd939,
>> registers:
>> [10610.110729] Modules linked in: netconsole e1000e i2c_i801 i2c_core e1000
>> [10610.110729]
>> [10610.110729] Pid: 0, comm: swapper Not tainted (2.6.26.6-fw #1)
>> [10610.110729] EIP: 0060:[<c01fd939>] EFLAGS: 00000082 CPU: 1
>> [10610.110729] EIP is at rb_insert_color+0x19/0xc0
>>     
>
> Thanks,
> Jarek P.
>
> --- (testing patch #2)
>
>  net/sched/sch_htb.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 30c999c..ff9e965 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -162,6 +162,7 @@ struct htb_sched {
>  
>  	int rate2quantum;	/* quant = rate / rate2quantum */
>  	psched_time_t now;	/* cached dequeue time */
> +	psched_time_t next_watchdog;
>  	struct qdisc_watchdog watchdog;
>  
>  	/* non shaped skbs; let them go directly thru */
> @@ -920,7 +921,11 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
>  		}
>  	}
>  	sch->qstats.overlimits++;
> -	qdisc_watchdog_schedule(&q->watchdog, next_event);
> +	if (q->next_watchdog < q->now || next_event <=
> +	     q->next_watchdog - PSCHED_TICKS_PER_SEC / (10 * HZ)) {
> +		qdisc_watchdog_schedule(&q->watchdog, next_event);
> +		q->next_watchdog = next_event;
> +	}
>  fin:
>  	return skb;
>  }
> @@ -973,6 +978,7 @@ static void htb_reset(struct Qdisc *sch)
>  		}
>  	}
>  	qdisc_watchdog_cancel(&q->watchdog);
> +	q->next_watchdog = 0;
>  	__skb_queue_purge(&q->direct_queue);
>  	sch->q.qlen = 0;
>  	memset(q->row, 0, sizeof(q->row));
>
>   
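
For readers following the thread, the re-arm condition added by testing
patch #2 can be modeled in user space. This is only a sketch under
assumptions: should_rearm, TICKS_PER_SEC, HZ_MODEL and SLACK are
hypothetical stand-ins chosen for illustration (the values are not the
kernel's), and psched_time_t is assumed to be an unsigned 64-bit tick
counter. The arithmetic mirrors the two comparisons in the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ticks are an unsigned 64-bit counter. */
typedef uint64_t psched_time_t;

/* Hypothetical illustration values, not the kernel's real constants. */
#define TICKS_PER_SEC 1000000ULL
#define HZ_MODEL      1000ULL
#define SLACK         (TICKS_PER_SEC / (10 * HZ_MODEL)) /* a tenth of a jiffy */

/*
 * Returns true when the watchdog timer should be (re)programmed:
 *  - the previously programmed expiry is already in the past, or
 *  - the new event is earlier than that expiry by at least SLACK ticks.
 * Otherwise the pending timer is left alone.
 */
static bool should_rearm(psched_time_t now, psched_time_t next_watchdog,
                         psched_time_t next_event)
{
    if (next_watchdog < now)
        return true;
    return next_event <= next_watchdog - SLACK;
}
```

With these values SLACK is 100 ticks, so should_rearm(1000, 5000, 4950)
is false (the timer pending at 5000 is kept), while
should_rearm(1000, 5000, 4900) is true. Skipping small re-arms like this
reduces timer churn at the cost of firing slightly late, which is the
"less exact" scheduling mentioned in the quoted text.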


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-10 15:14                 ` Badalian Vyacheslav
@ 2008-12-11  8:46                   ` Jarek Poplawski
  2008-12-15 11:13                     ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2008-12-11  8:46 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, linux-kernel

On Wed, Dec 10, 2008 at 06:14:28PM +0300, Badalian Vyacheslav wrote:
> Hello again! Sorry for the long absence.

Hi!

> I was away from this work for a long time.
> 
> May we return to this bug?
> Servers are on the latest stable kernel, 2.6.27.8
> HZ=1000, HR=off, DynamicTicks=off, hysteresis=1
> Sorry, they are not patched; I have not updated them. Do you have fresh
> patches or ideas for tests?

Not many, but I can come up with some if you are willing to test them...
I attach below a patch which combines 2 patches I sent yesterday to
netdev (PATCH 7/6 and 8/6) vs. 2.6.27.7 (named testing patch #3 here).

You can still try the testing patch #2 I sent previously (quoted below)
with or without this new #3 patch.

Thanks,
Jarek P.

> Also, the "Debug object operations" and "Debug timer objects" options
> are on, but I do not see any additional information in the dump.
> 
> Thanks!
> 
> The dump:
> [ 1500.932813] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c0208ed9,
> registers:
> [ 1500.932813] Modules linked in: cls_u32 sch_sfq sch_htb netconsole
> e1000e e1000 i2c_i801
> [ 1500.932813]
> [ 1500.932813] Pid: 0, comm: swapper Not tainted (2.6.27-gentoo-r5-fw #1)
> [ 1500.932813] EIP: 0060:[<c0208ed9>] EFLAGS: 00000082 CPU: 3
> [ 1500.932813] EIP is at rb_insert_color+0x29/0xf0
...
> > --- (testing patch #2)
> >
> >  net/sched/sch_htb.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > index 30c999c..ff9e965 100644


-----------> (testing patch #3)

diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
--- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
+++ b2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:20:27.000000000 +0000
@@ -696,12 +696,13 @@ static void htb_charge_class(struct htb_
  * next pending event (0 for no event in pq).
  * Note: Applied are events whose have cl->pq_key <= q->now.
  */
-static psched_time_t htb_do_events(struct htb_sched *q, int level)
+static psched_time_t htb_do_events(struct htb_sched *q, int level,
+				   unsigned long start)
 {
 	/* don't run for longer than 2 jiffies; 2 is used instead of
 	   1 to simplify things when jiffy is going to be incremented
 	   too soon */
-	unsigned long stop_at = jiffies + 2;
+	unsigned long stop_at = start + 2;
 	while (time_before(jiffies, stop_at)) {
 		struct htb_class *cl;
 		long diff;
@@ -720,8 +721,8 @@ static psched_time_t htb_do_events(struc
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	/* too much load - let's continue on next jiffie */
-	return q->now + PSCHED_TICKS_PER_SEC / HZ;
+	/* too much load - let's continue on next jiffie (including above) */
+	return q->now + 2 * PSCHED_TICKS_PER_SEC / HZ;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL
@@ -880,6 +881,7 @@ static struct sk_buff *htb_dequeue(struc
 	struct htb_sched *q = qdisc_priv(sch);
 	int level;
 	psched_time_t next_event;
+	unsigned long start_at;
 
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
@@ -892,6 +894,7 @@ static struct sk_buff *htb_dequeue(struc
 	if (!sch->q.qlen)
 		goto fin;
 	q->now = psched_get_time();
+	start_at = jiffies;
 
 	next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
 	q->nwc_hit = 0;
@@ -901,14 +904,14 @@ static struct sk_buff *htb_dequeue(struc
 		psched_time_t event;
 
 		if (q->now >= q->near_ev_cache[level]) {
-			event = htb_do_events(q, level);
+			event = htb_do_events(q, level, start_at);
 			if (!event)
 				event = q->now + PSCHED_TICKS_PER_SEC;
 			q->near_ev_cache[level] = event;
 		} else
 			event = q->near_ev_cache[level];
 
-		if (event && next_event > event)
+		if (next_event > event)
 			next_event = event;
 
 		m = ~q->row_mask[level];
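
The time limit in this patch relies on a wraparound-safe comparison of
jiffies values. A minimal user-space sketch of the idea follows;
tick_before and within_budget are hypothetical names, and tick_before
only imitates what the kernel's time_before() does. The point it
illustrates is that the start value is sampled once by the caller, so
every level processed during one dequeue shares the same 2-jiffy budget.

```c
#include <stdbool.h>

/*
 * Wraparound-safe "a is before b" for a free-running tick counter.
 * The unsigned difference is reinterpreted as signed, which relies on
 * the usual two's-complement conversion that common compilers provide.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * True while fewer than 2 ticks have elapsed since 'start'.
 * 'start' is taken once by the caller and passed down, so repeated
 * calls do not each get a fresh budget.
 */
static bool within_budget(unsigned long now, unsigned long start)
{
    return tick_before(now, start + 2);
}
```

The comparison stays correct when the counter wraps: with start at the
maximum unsigned long value, tick 0 is still inside the budget and
tick 1 is not.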

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-11  8:46                   ` Jarek Poplawski
@ 2008-12-15 11:13                     ` Jarek Poplawski
  2008-12-16  7:37                       ` Badalian Vyacheslav
  2008-12-18  6:43                       ` Badalian Vyacheslav
  0 siblings, 2 replies; 65+ messages in thread
From: Jarek Poplawski @ 2008-12-15 11:13 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, linux-kernel

On Thu, Dec 11, 2008 at 08:46:06AM +0000, Jarek Poplawski wrote:
> On Wed, Dec 10, 2008 at 06:14:28PM +0300, Badalian Vyacheslav wrote:
> > Hello again! Sorry for long away.
> 
> Hi!
> 
> > I was go away from this work for long time.
> > 
> > May we return to this bug?
> > Servers at last stable kernel 2.6.27.8
> > HZ=1000, HR=off, DynamicTicks=off, hysteresis=1
> > Sorry - no patched, update do not i. Do you have fresh patches or ideas
> > for tests?
> 
> Not much, but I can have if you only are willing to test them...
> I attach below a patch which combines 2 patches I sent yesterday to
> netdev (PATCH 7/6 and 8/6) vs. 2.6.27.7 (named testing patch #3 here).
> 
> You can still try the testing patch #2 I sent previously (quoted below)
> with or without this new #3 patch.
> 

Here is another idea worth checking (instead of patch #2).

Jarek P.

--- (testing patch #4)

diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
--- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
+++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
@@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
+	qdisc_watchdog_cancel(&q->watchdog);
 	qdisc_watchdog_schedule(&q->watchdog, next_event);
 fin:
 	return skb;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-15 11:13                     ` Jarek Poplawski
@ 2008-12-16  7:37                       ` Badalian Vyacheslav
  2008-12-18  6:43                       ` Badalian Vyacheslav
  1 sibling, 0 replies; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-12-16  7:37 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, linux-kernel

I am now testing patches #2 and #3 together. Five PCs have been running
without an oops for 5 days.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-15 11:13                     ` Jarek Poplawski
  2008-12-16  7:37                       ` Badalian Vyacheslav
@ 2008-12-18  6:43                       ` Badalian Vyacheslav
  2008-12-18  8:17                         ` Jarek Poplawski
  1 sibling, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-12-18  6:43 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, linux-kernel

Hello,
Result: patches #2 + #3 = 7 days of uptime without crashes.
May I revert these patches and try the new patch alone?

Thanks!



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-18  6:43                       ` Badalian Vyacheslav
@ 2008-12-18  8:17                         ` Jarek Poplawski
  2008-12-18 11:23                           ` Badalian Vyacheslav
  2009-01-14  2:43                           ` Chris Caputo
  0 siblings, 2 replies; 65+ messages in thread
From: Jarek Poplawski @ 2008-12-18  8:17 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, linux-kernel

On Thu, Dec 18, 2008 at 09:43:51AM +0300, Badalian Vyacheslav wrote:
> Hello
> result: Patch 2+3 = uptime 7 days without crashes.
> May i revert patches and try single new patch?

Here is my current opinion on this bug:

1) I'm almost sure it's not an htb bug but an hrtimers bug (some race),

2) the htb patches you've tested are not "the proper" way of fixing
   it; I see substantial changes in the hrtimers code in the "-tip"
   tree (probably for 2.6.29), which the hrtimers maintainers will
   probably advise you to try, and I guess that's not easy on a
   production system.

So, it's up to you:

1) since these patches work for you, you can stop testing and stay
   on these patched kernels until 2.6.29 (I can propose this #2
   patch as a temporary fix then),

2) out of curiosity you could try patch #4 alone on one box first
   (after reverting at least patch #2), but again: if it works, it
   could only be treated as a temporary hack and an alternative to #2.

Thanks,
Jarek P.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-18  8:17                         ` Jarek Poplawski
@ 2008-12-18 11:23                           ` Badalian Vyacheslav
  2008-12-18 11:37                             ` Jarek Poplawski
  2009-01-14  2:43                           ` Chris Caputo
  1 sibling, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2008-12-18 11:23 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, linux-kernel

Thanks for everything, Jarek!

Vyacheslav Badalian



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-18 11:23                           ` Badalian Vyacheslav
@ 2008-12-18 11:37                             ` Jarek Poplawski
  0 siblings, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2008-12-18 11:37 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, linux-kernel

On Thu, Dec 18, 2008 at 02:23:33PM +0300, Badalian Vyacheslav wrote:
> Thanks for all Jarek!

The same to you, Vyacheslav, for the very valuable reporting and testing!

Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2008-12-18  8:17                         ` Jarek Poplawski
  2008-12-18 11:23                           ` Badalian Vyacheslav
@ 2009-01-14  2:43                           ` Chris Caputo
  2009-01-14  6:39                             ` Jarek Poplawski
  1 sibling, 1 reply; 65+ messages in thread
From: Chris Caputo @ 2009-01-14  2:43 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: Jarek Poplawski, Badalian Vyacheslav

On Thu, 18 Dec 2008, Jarek Poplawski wrote:
> On Thu, Dec 18, 2008 at 09:43:51AM +0300, Badalian Vyacheslav wrote:
> > Hello
> > result: Patch 2+3 = uptime 7 days without crashes.
> > May i revert patches and try single new patch?
> 
> Here is my current opinion on this bug:
> 
> 1) I'm almost sure it's not a htb, but hrtimers bug (some race),
> 
> 2) the htb patches you've tested are not "the proper" way of fixing
>    it; I see substantial changes in hrtimers code in the "-tip" tree
>    (probably for 2.6.29), which, probably, you'll be advised by
>    hrtimers maintainers to try, and I guess, it's not easy on a
>    production system,
> 
> So, it's up to you:
> 
> 1) since these patches work for you, you can stop with testing and
>    wait with these patched kernels until 2.6.29 (I can propose this
>    #2 patch as a temporary fix then),
> 
> 2) for curiosity you could try this patch #4 alone on one box first
>    (after reverting at least patch #2), but again: if it works, it
>    could be only treated as a temporary hack, and alternative of #2.

I think I am hitting the same issue discussed in this thread and:

 http://bugzilla.kernel.org/show_bug.cgi?id=11718

and wanted to share my data.

My system:

 2x Xeon @ 3.06 GHz
 CONFIG_HZ=250
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_NO_HZ=y
 (please let me know if more details are useful)

With 2.6.27.10, after around 30 minutes to an hour, the server tends to
hang with:

 Pid: 0, comm: swapper Not tainted (2.6.27.10 #1)
 EIP: 0060:[<c023199f>] EFLAGS: 00000286 CPU: 0
 EIP is at run_hrtimer_pending+0x31/0xb8
 EAX: f6c1d468 EBX: c039b2ae ECX: 00000002 EDX: 00000001
 ESI: f6c1d468 EDI: c1807e20 EBP: c0587f28 ESP: c0587f18
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 CR0: 8005003b CR2: 09a36358 CR3: 36d64000 CR4: 000006d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
  [<c0231a3c>] run_hrtimer_softirq+0x16/0x18
  [<c0223e6d>] __do_softirq+0x64/0xcd
  [<c0223f0b>] do_softirq+0x35/0x3a
  [<c0224199>] irq_exit+0x38/0x6d
  [<c020eefb>] smp_apic_timer_interrupt+0x71/0x82
  [<c0203640>] apic_timer_interrupt+0x28/0x30
  [<c0207f9d>] ? default_idle+0x2d/0x42
  [<c0201946>] cpu_idle+0xca/0xea
  [<c045194e>] rest_init+0x4e/0x50

One time with 2.6.27.10:

 BUG: unable to handle kernel NULL pointer dereference at 000000
  [<c0239b6b>] smp_call_function+0x12/0x14
  [<c020df20>] native_smp_send_stop+0x1b/0x28
  [<c0220304>] panic+0x47/0xdf
  [<c0203ac2>] oops_end+0x5d/0x71
  [<c0203f9f>] die+0x57/0x5f
  [<c02129a7>] do_page_fault+0x474/0x529
  [<c0212533>] ? do_page_fault+0x0/0x529
  [<c045eb3a>] error_code+0x72/0x78
  [<c024007b>] ? cgroup_get_sb+0x20/0x2b4
  [<c02f223c>] ? rb_erase+0x118/0x241
  [<c023177d>] __remove_hrtimer+0x5f/0x67
  [<c039b2ae>] ? qdisc_watchdog+0x0/0x1c
  [<c023199a>] run_hrtimer_pending+0x2c/0xb8
  [<c0231a3c>] run_hrtimer_softirq+0x16/0x18
  [<c0223e6d>] __do_softirq+0x64/0xcd
  [<c0223f0b>] do_softirq+0x35/0x3a
  [<c0224199>] irq_exit+0x38/0x6d
  [<c020eefb>] smp_apic_timer_interrupt+0x71/0x82
  [<c0203640>] apic_timer_interrupt+0x28/0x30
  [<c0207f9d>] ? default_idle+0x2d/0x42
  [<c0201946>] cpu_idle+0xca/0xea
  [<c045194e>] rest_init+0x4e/0x50
  =======================
 ---[ end trace 818de8d3237477a5 ]---
 Rebooting in 10 seconds..

With 2.6.27.10 plus what is being called patches #2 and #3 in this thread, 
the server ran for a day or so before hanging again:

 Pid: 0, comm: swapper Not tainted (2.6.27.10 #3)
 EIP: 0060:[<c023199f>] EFLAGS: 00000206 CPU: 0
 EIP is at run_hrtimer_pending+0x31/0xb8
 EAX: f73f8470 EBX: c039b312 ECX: 00000002 EDX: 00000001
 ESI: f73f8470 EDI: c1807e20 EBP: c0589f20 ESP: c0589f10
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 CR0: 8005003b CR2: 08ca3358 CR3: 005ca000 CR4: 000006d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
  [<c0231a3c>] run_hrtimer_softirq+0x16/0x18
  [<c0223e6d>] __do_softirq+0x64/0xcd
  [<c0223f0b>] do_softirq+0x35/0x3a
  [<c0224199>] irq_exit+0x38/0x6d
  [<c02052b3>] do_IRQ+0x5c/0x75
  [<c0203553>] common_interrupt+0x23/0x28
  [<c0207f9d>] ? default_idle+0x2d/0x42
  [<c0201946>] cpu_idle+0xca/0xea
  [<c045225e>] rest_init+0x4e/0x50

With 2.6.28 (without the patches from this thread) after about an hour the 
system hung again.  "Show Regs" indicated:

 Pid: 0, comm: swapper Not tainted (2.6.28 #1)
 EIP: 0060:[<c03977bd>] EFLAGS: 00000247 CPU: 0
 EIP is at __netif_schedule+0xc/0x39
 EAX: f6934600 EBX: 00000000 ECX: f6934600 EDX: f6864000
 ESI: f6864488 EDI: c1807480 EBP: c05c1f08 ESP: c05c1f04
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 CR0: 8005003b CR2: 09968358 CR3: 365cd000 CR4: 000006d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
 Call Trace:
  [<c03a669d>] qdisc_watchdog+0x18/0x1c
  [<c02335c3>] run_hrtimer_pending+0x56/0xda
  [<c03a6685>] ? qdisc_watchdog+0x0/0x1c
  [<c023365d>] run_hrtimer_softirq+0x16/0x18
  [<c0225b1a>] __do_softirq+0x7a/0x11c
  [<c0225bf1>] do_softirq+0x35/0x3a
  [<c0225eb5>] irq_exit+0x38/0x6d
  [<c020f719>] smp_apic_timer_interrupt+0x71/0x7f
  [<c0203858>] apic_timer_interrupt+0x28/0x30
  [<c0208360>] ? default_idle+0x2d/0x42
  [<c0201938>] cpu_idle+0x6b/0x84
  [<c046b292>] rest_init+0x4e/0x50

Per Jarek's suggestion in bugzilla, I ran 2.6.28 plus Peter Zijlstra's 
"hrtimer: removing all ur callback modes" patches dated 2008-11-25, 
2008-12-04 and 2008-12-08.  Uptime was 2 days 22 hours before I hit what 
appears to be an unrelated bug related to the IPv6 FIB.  (Separately 
reported with subject 'panic with 2.6.28 while doing "ip -6 route"'.)

Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14  2:43                           ` Chris Caputo
@ 2009-01-14  6:39                             ` Jarek Poplawski
  2009-01-14 12:17                               ` Denys Fedoryschenko
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14  6:39 UTC (permalink / raw)
  To: Chris Caputo
  Cc: netdev, linux-kernel, Badalian Vyacheslav, Denys Fedoryschenko,
	Peter Zijlstra

On Wed, Jan 14, 2009 at 02:43:13AM +0000, Chris Caputo wrote:
...
> Per Jarek's suggestion in bugzilla, I ran 2.6.28 plus Peter Zijlstra's 
> "hrtimer: removing all ur callback modes" patches dated 2008-11-25, 
> 2008-12-04 and 2008-12-08.  Uptime was 2 days 22 hours before I hit what 
> appears to be an unrelated bug related to the IPv6 FIB.  (Separately 
> reported with subject 'panic with 2.6.28 while doing "ip -6 route"'.)

Chris, this is very good news, thanks for testing this!

Peter, great work! IMHO these patches should go to -stable (at
least 2.6.28).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14  6:39                             ` Jarek Poplawski
@ 2009-01-14 12:17                               ` Denys Fedoryschenko
  2009-01-14 12:36                                 ` Jarek Poplawski
  2009-01-14 12:50                                 ` Peter Zijlstra
  0 siblings, 2 replies; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-14 12:17 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Chris Caputo, netdev, linux-kernel, Badalian Vyacheslav, Peter Zijlstra

I will try those patches too. Three minutes after I got this message, my
router crashed :-) after running for around 17 hours :-(
There are 3 of them, by the way, including 2 fixes.

hrtimer: removing all ur callback modes
hrtimer: removing all ur callback modes, fix hotplug
hrtimer: removing all ur callback modes, fix




^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14 12:17                               ` Denys Fedoryschenko
@ 2009-01-14 12:36                                 ` Jarek Poplawski
  2009-01-14 12:41                                   ` Denys Fedoryschenko
  2009-01-14 12:50                                 ` Peter Zijlstra
  1 sibling, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 12:36 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Chris Caputo, netdev, linux-kernel, Badalian Vyacheslav, Peter Zijlstra

On Wed, Jan 14, 2009 at 02:17:58PM +0200, Denys Fedoryschenko wrote:
> I will try that patches too, when i got this message, after 3 minutes i got 
> crash of my router :-) after working around 17 hours :-(
> There is 3 of them by the way, 2 fixes also.
> 
> hrtimer: removing all ur callback modes
> hrtimer: removing all ur callback modes, fix hotplug
> hrtimer: removing all ur callback modes, fix

Yes, it's a very nasty bug which, BTW, you were the first to report
AFAIK, but I hope it's really gone with these patches.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14 12:36                                 ` Jarek Poplawski
@ 2009-01-14 12:41                                   ` Denys Fedoryschenko
  0 siblings, 0 replies; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-14 12:41 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Chris Caputo, netdev, linux-kernel, Badalian Vyacheslav, Peter Zijlstra

On Wednesday 14 January 2009 14:36:23 Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 02:17:58PM +0200, Denys Fedoryschenko wrote:
> > I will try that patches too, when i got this message, after 3 minutes i
> > got crash of my router :-) after working around 17 hours :-(
> > There is 3 of them by the way, 2 fixes also.
> >
> > hrtimer: removing all ur callback modes
> > hrtimer: removing all ur callback modes, fix hotplug
> > hrtimer: removing all ur callback modes, fix
>
> Yes, it's a very nasty bug which, BTW you reported the first AFAIK,
> but I hope it's really gone with these patches.
>
If it is really a fix, it must go to mainline ASAP, because a great many
people are hitting this bug and complaining that "Linux with htb is not
stable".
But I think it is not just a fix :-(

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14 12:17                               ` Denys Fedoryschenko
  2009-01-14 12:36                                 ` Jarek Poplawski
@ 2009-01-14 12:50                                 ` Peter Zijlstra
  2009-01-14 13:04                                   ` Jarek Poplawski
                                                     ` (2 more replies)
  1 sibling, 3 replies; 65+ messages in thread
From: Peter Zijlstra @ 2009-01-14 12:50 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Jarek Poplawski, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, 2009-01-14 at 14:17 +0200, Denys Fedoryschenko wrote:
> I will try that patches too, when i got this message, after 3 minutes i got 
> crash of my router :-) after working around 17 hours :-(
> There is 3 of them by the way, 2 fixes also.
> 
> hrtimer: removing all ur callback modes
> hrtimer: removing all ur callback modes, fix hotplug
> hrtimer: removing all ur callback modes, fix

I'm afraid it's a bit more than that:

ca109491f612aab5c8152207631c0444f63da97f
37810659ea7d9572c5ac284ade272f806ef8f788
a0a99b227da57f81319dd239bc4de811b0f530ec
b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
d5fd43c4ae04523e1dcd7794f9c511b289851350
731a55ba0f17064f85903b7bf8e24849ec6cfa20
a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
e3f1d883740b09e5116d4d4e30a6a6987264a83c
82c5b7b527ccc4b5d3cf832437e842f9d2920a79

and

6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0

Also, all this is rather invasive and large, so I'm not sure it will
meet the -stable criteria.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-14 12:50                                 ` Peter Zijlstra
@ 2009-01-14 13:04                                   ` Jarek Poplawski
  2009-01-14 13:05                                   ` Denys Fedoryschenko
  2009-01-14 18:02                                   ` Chris Caputo
  2 siblings, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 01:50:04PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 14:17 +0200, Denys Fedoryschenko wrote:
> > I will try that patches too, when i got this message, after 3 minutes i got 
> > crash of my router :-) after working around 17 hours :-(
> > There is 3 of them by the way, 2 fixes also.
> > 
> > hrtimer: removing all ur callback modes
> > hrtimer: removing all ur callback modes, fix hotplug
> > hrtimer: removing all ur callback modes, fix
> 
> I'm afraid its a bit more than that:
> 
> ca109491f612aab5c8152207631c0444f63da97f
> 37810659ea7d9572c5ac284ade272f806ef8f788
> a0a99b227da57f81319dd239bc4de811b0f530ec
> b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> d5fd43c4ae04523e1dcd7794f9c511b289851350
> 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> e3f1d883740b09e5116d4d4e30a6a6987264a83c
> 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
> 
> and
> 
> 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
> 
> Also, all this is rather invasive and large, so I'm not sure it will
> meet the -stable criteria.
> 

Earlier I thought it's very hard to hit, but after Chris reported it
triggers within hours, IMHO some minimal fix is necessary.

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 12:50                                 ` Peter Zijlstra
  2009-01-14 13:04                                   ` Jarek Poplawski
@ 2009-01-14 13:05                                   ` Denys Fedoryschenko
  2009-01-14 13:12                                     ` Jarek Poplawski
  2009-01-14 18:02                                   ` Chris Caputo
  2 siblings, 1 reply; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-14 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jarek Poplawski, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

> I'm afraid its a bit more than that:
>
> ca109491f612aab5c8152207631c0444f63da97f
> 37810659ea7d9572c5ac284ade272f806ef8f788
> a0a99b227da57f81319dd239bc4de811b0f530ec
> b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> d5fd43c4ae04523e1dcd7794f9c511b289851350
> 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> e3f1d883740b09e5116d4d4e30a6a6987264a83c
> 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
>
> and
>
> 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
>
> Also, all this is rather invasive and large, so I'm not sure it will
> meet the -stable criteria.

Then I will try to run 2.6.29-rc1-git4. It is kind of risky to run rc1 on a
loaded router, but I am hitting this hrtimers bug hard, so it seems I have no
choice. And I cannot disable hrtimers, because shaper precision is bad without
them.
Thanks a lot for the info and patches :-)



* Re: deadlocks if use htb
  2009-01-14 13:05                                   ` Denys Fedoryschenko
@ 2009-01-14 13:12                                     ` Jarek Poplawski
  2009-01-14 13:15                                       ` Peter Zijlstra
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 13:12 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Peter Zijlstra, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 03:05:46PM +0200, Denys Fedoryschenko wrote:
> > I'm afraid its a bit more than that:
> >
> > ca109491f612aab5c8152207631c0444f63da97f
> > 37810659ea7d9572c5ac284ade272f806ef8f788
> > a0a99b227da57f81319dd239bc4de811b0f530ec
> > b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> > 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> > d5fd43c4ae04523e1dcd7794f9c511b289851350
> > 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> > a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> > e3f1d883740b09e5116d4d4e30a6a6987264a83c
> > 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
> >
> > and
> >
> > 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
> >
> > Also, all this is rather invasive and large, so I'm not sure it will
> > meet the -stable criteria.
> 
> Then I will try to run 2.6.29-rc1-git4. It is kind of risky to run rc1 on a
> loaded router, but I am hitting this hrtimers bug hard, so it seems I have no
> choice. And I cannot disable hrtimers, because shaper precision is bad without
> them.
> Thanks a lot for the info and patches :-)
> 

Denys, since these three patches worked for Chris, I guess it's safer
to stay with 2.6.28 anyway (or my patches mentioned earlier in this
thread). rc1 is really not stable...

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 13:12                                     ` Jarek Poplawski
@ 2009-01-14 13:15                                       ` Peter Zijlstra
  2009-01-14 13:19                                         ` Denys Fedoryschenko
  2009-01-14 13:26                                         ` Jarek Poplawski
  0 siblings, 2 replies; 65+ messages in thread
From: Peter Zijlstra @ 2009-01-14 13:15 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, 2009-01-14 at 13:12 +0000, Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 03:05:46PM +0200, Denys Fedoryschenko wrote:
> > > I'm afraid its a bit more than that:
> > >
> > > ca109491f612aab5c8152207631c0444f63da97f
> > > 37810659ea7d9572c5ac284ade272f806ef8f788
> > > a0a99b227da57f81319dd239bc4de811b0f530ec
> > > b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> > > 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> > > d5fd43c4ae04523e1dcd7794f9c511b289851350
> > > 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> > > a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> > > e3f1d883740b09e5116d4d4e30a6a6987264a83c
> > > 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
> > >
> > > and
> > >
> > > 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
> > >
> > > Also, all this is rather invasive and large, so I'm not sure it will
> > > meet the -stable criteria.
> > 
> > Then I will try to run 2.6.29-rc1-git4. It is kind of risky to run rc1 on a
> > loaded router, but I am hitting this hrtimers bug hard, so it seems I have no
> > choice. And I cannot disable hrtimers, because shaper precision is bad without
> > them.
> > Thanks a lot for the info and patches :-)
> > 
> 
> Denys, since these three patches worked for Chris, I guess it's safer
> to stay with 2.6.28 anyway (or my patches mentioned earlier in this
> thread). rc1 is really not stable...

Yeah, but those three patches have some holes in them still, so I cannot
recommend just patching in those three.



* Re: deadlocks if use htb
  2009-01-14 13:15                                       ` Peter Zijlstra
@ 2009-01-14 13:19                                         ` Denys Fedoryschenko
  2009-01-14 13:26                                         ` Jarek Poplawski
  1 sibling, 0 replies; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-14 13:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jarek Poplawski, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wednesday 14 January 2009 15:15:29 Peter Zijlstra wrote:
> >
> > Denys, since these three patches worked for Chris, I guess it's safer
> > to stay with 2.6.28 anyway (or my patches mentioned earlier in this
> > thread). rc1 is really not stable...
>
> Yeah, but those three patches have some holes in them still, so I cannot
> recommend just patching in those three.
I will test rc1-git4 on a few less important units that have redundancy, and
then decide. I have already applied these 3 patches, but I prefer a complete
solution. If I hit any bugs, I will at least do my best to report them and in
that way help to fix them :-)


* Re: deadlocks if use htb
  2009-01-14 13:15                                       ` Peter Zijlstra
  2009-01-14 13:19                                         ` Denys Fedoryschenko
@ 2009-01-14 13:26                                         ` Jarek Poplawski
  2009-01-14 13:32                                           ` Peter Zijlstra
  1 sibling, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 13:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 02:15:29PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 13:12 +0000, Jarek Poplawski wrote:
> > On Wed, Jan 14, 2009 at 03:05:46PM +0200, Denys Fedoryschenko wrote:
> > > > I'm afraid its a bit more than that:
> > > >
> > > > ca109491f612aab5c8152207631c0444f63da97f
> > > > 37810659ea7d9572c5ac284ade272f806ef8f788
> > > > a0a99b227da57f81319dd239bc4de811b0f530ec
> > > > b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> > > > 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> > > > d5fd43c4ae04523e1dcd7794f9c511b289851350
> > > > 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> > > > a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> > > > e3f1d883740b09e5116d4d4e30a6a6987264a83c
> > > > 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
> > > >
> > > > and
> > > >
> > > > 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
> > > >
> > > > Also, all this is rather invasive and large, so I'm not sure it will
> > > > meet the -stable criteria.
> > > 
> > > Then I will try to run 2.6.29-rc1-git4. It is kind of risky to run rc1 on a
> > > loaded router, but I am hitting this hrtimers bug hard, so it seems I have no
> > > choice. And I cannot disable hrtimers, because shaper precision is bad without
> > > them.
> > > Thanks a lot for the info and patches :-)
> > > 
> > 
> > Denys, since these three patches worked for Chris, I guess it's safer
> > to stay with 2.6.28 anyway (or my patches mentioned earlier in this
> > thread). rc1 is really not stable...
> 
> Yeah, but those three patches have some holes in them still, so I cannot
> recommend just patching in those three.
> 

OK, I hope Denys can apply more, but what about others? Without any
patches the hole seems to be much bigger.

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 13:26                                         ` Jarek Poplawski
@ 2009-01-14 13:32                                           ` Peter Zijlstra
  2009-01-14 13:57                                             ` Jarek Poplawski
  2009-01-14 14:13                                             ` Jarek Poplawski
  0 siblings, 2 replies; 65+ messages in thread
From: Peter Zijlstra @ 2009-01-14 13:32 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, 2009-01-14 at 13:26 +0000, Jarek Poplawski wrote:

> OK, I hope Denys can apply more, but what about others? Without any
> patches the hole seems to be much bigger.

OK, I read most of this thread on netdev, but didn't find a clear clue
on the specific hrtimer insertion race.

Do you have any clear ideas or should I poke at the htb/hrtimer code a
little?



* Re: deadlocks if use htb
  2009-01-14 13:32                                           ` Peter Zijlstra
@ 2009-01-14 13:57                                             ` Jarek Poplawski
  2009-01-14 14:13                                             ` Jarek Poplawski
  1 sibling, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 02:32:26PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 13:26 +0000, Jarek Poplawski wrote:
> 
> > OK, I hope Denys can apply more, but what about others? Without any
> > patches the hole seems to be much bigger.
> 
> OK, I read most of this thread on netdev, but didn't find a clear clue
> on the specific hrtimer insertion race.
> 
> Do you have any clear ideas or should I poke at the htb/hrtimer code a
> little?
> 

I didn't diagnose this entirely, and it has been "a few" days since I
wrote my opinion on this, so I have forgotten the details, but I guess
the most suspicious place is adding an hrtimer while it's already on the
list, maybe even two times in a row. I guess it's not removed from the
rbtree in some cases. (I stopped searching after seeing you removed
this list or something...)

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 13:32                                           ` Peter Zijlstra
  2009-01-14 13:57                                             ` Jarek Poplawski
@ 2009-01-14 14:13                                             ` Jarek Poplawski
  2009-01-14 14:28                                               ` Peter Zijlstra
  1 sibling, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 02:32:26PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 13:26 +0000, Jarek Poplawski wrote:
> 
> > OK, I hope Denys can apply more, but what about others? Without any
> > patches the hole seems to be much bigger.
> 
> OK, I read most of this thread on netdev, but didn't find a clear clue
> on the specific hrtimer insertion race.

There is something at the beginning of this thread, plus earlier
threads mostly with Denys as sender, and "htb bug" in the subject.

> 
> Do you have any clear ideas or should I poke at the htb/hrtimer code a
> little?
> 

...And htb code is htb_dequeue(): qdisc_watchdog_schedule().

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 14:13                                             ` Jarek Poplawski
@ 2009-01-14 14:28                                               ` Peter Zijlstra
  2009-01-14 14:39                                                 ` Jarek Poplawski
  2009-01-15  9:01                                                 ` Jarek Poplawski
  0 siblings, 2 replies; 65+ messages in thread
From: Peter Zijlstra @ 2009-01-14 14:28 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, 2009-01-14 at 14:13 +0000, Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 02:32:26PM +0100, Peter Zijlstra wrote:
> > On Wed, 2009-01-14 at 13:26 +0000, Jarek Poplawski wrote:
> > 
> > > OK, I hope Denys can apply more, but what about others? Without any
> > > patches the hole seems to be much bigger.
> > 
> > OK, I read most of this thread on netdev, but didn't find a clear clue
> > on the specific hrtimer insertion race.
> 
> There is something at the beginning of this thread, plus earlier
> threads mostly with Denys as sender, and "htb bug" in the subject.
> 
> > 
> > Do you have any clear ideas or should I poke at the htb/hrtimer code a
> > little?
> > 
> 
> ...And htb code is htb_dequeue(): qdisc_watchdog_schedule().

Right, found all that...

Can't spot anything obviously wrong though.. hrtimer_start*() does
remove_hrtimer() which checks STATE_ENQUEUED, STATE_PENDING and pulls it
off the relevant list before it continues the enqueue.

However a loop in enqueue_hrtimer() would suggest a corrupted RB-tree,
but I cannot find an RB-op that doesn't hold base-lock.

Hohumm..
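
For readers following along, here is a minimal userspace sketch of why a corrupted tree would show up as a loop in an insert path. It is not kernel code: `struct node` and `bounded_descent()` are hypothetical names, and the tree is a plain binary search tree rather than the kernel's rbtree. It only illustrates that a descent like the one an enqueue performs never terminates once a child link points back at its own node, and that bounding the walk exposes the corruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; the kernel's rb_node looks different. */
struct node {
	long key;
	struct node *left;
	struct node *right;
};

/*
 * Walk down the tree the way an insert would look for a free slot,
 * but give up after max_steps nodes.
 * Returns the number of nodes visited, or -1 if the bound was hit,
 * which in a small tree suggests a cycle in the links.
 */
static int bounded_descent(const struct node *root, long key, int max_steps)
{
	const struct node *n = root;
	int steps = 0;

	assert(max_steps > 0);
	while (n) {
		if (steps++ >= max_steps)
			return -1;
		n = key < n->key ? n->left : n->right;
	}
	return steps;
}
```

On a healthy two-node tree the walk ends after two steps. If a node's `left` pointer is made to point at the node itself, the same call returns -1 instead of spinning forever, which is the kind of check a debugging patch could add around the enqueue.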



* Re: deadlocks if use htb
  2009-01-14 14:28                                               ` Peter Zijlstra
@ 2009-01-14 14:39                                                 ` Jarek Poplawski
  2009-01-15  9:01                                                 ` Jarek Poplawski
  1 sibling, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-14 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 03:28:03PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 14:13 +0000, Jarek Poplawski wrote:
> > On Wed, Jan 14, 2009 at 02:32:26PM +0100, Peter Zijlstra wrote:
> > > On Wed, 2009-01-14 at 13:26 +0000, Jarek Poplawski wrote:
> > > 
> > > > OK, I hope Denys can apply more, but what about others? Without any
> > > > patches the hole seems to be much bigger.
> > > 
> > > OK, I read most of this thread on netdev, but didn't find a clear clue
> > > on the specific hrtimer insertion race.
> > 
> > There is something at the beginning of this thread, plus earlier
> > threads mostly with Denys as sender, and "htb bug" in the subject.
> > 
> > > 
> > > Do you have any clear ideas or should I poke at the htb/hrtimer code a
> > > little?
> > > 
> > 
> > ...And htb code is htb_dequeue(): qdisc_watchdog_schedule().
> 
> Right, found all that...
> 
> Can't spot anything obviously wrong though.. hrtimer_start*() does
> remove_hrtimer() which checks STATE_ENQUEUED, STATE_PENDING and pulls it
> off the relevant list before it continues the enqueue.
> 
> However a loop in enqueue_hrtimer() would suggest a corrupted RB-tree,
> but I cannot find an RB-op that doesn't hold base-lock.
> 

Anyway, I think some trace in the rbtree seemed to show the parent being the
same as the node, if I didn't mess anything up.

Jarek P.


* Re: deadlocks if use htb
  2009-01-14 12:50                                 ` Peter Zijlstra
  2009-01-14 13:04                                   ` Jarek Poplawski
  2009-01-14 13:05                                   ` Denys Fedoryschenko
@ 2009-01-14 18:02                                   ` Chris Caputo
  2009-01-15  6:53                                     ` Jarek Poplawski
  2 siblings, 1 reply; 65+ messages in thread
From: Chris Caputo @ 2009-01-14 18:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Jarek Poplawski, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, 14 Jan 2009, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 14:17 +0200, Denys Fedoryschenko wrote:
> > I will try that patches too, when i got this message, after 3 minutes i got 
> > crash of my router :-) after working around 17 hours :-(
> > There is 3 of them by the way, 2 fixes also.
> > 
> > hrtimer: removing all ur callback modes
> > hrtimer: removing all ur callback modes, fix hotplug
> > hrtimer: removing all ur callback modes, fix
> 
> I'm afraid its a bit more than that:
> 
> ca109491f612aab5c8152207631c0444f63da97f
> 37810659ea7d9572c5ac284ade272f806ef8f788
> a0a99b227da57f81319dd239bc4de811b0f530ec
> b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
> 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
> d5fd43c4ae04523e1dcd7794f9c511b289851350
> 731a55ba0f17064f85903b7bf8e24849ec6cfa20
> a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
> e3f1d883740b09e5116d4d4e30a6a6987264a83c
> 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
> 
> and
> 
> 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0
> 
> Also, all this is rather invasive and large, so I'm not sure it will
> meet the -stable criteria.

In my testing (another 12 hours of uptime, BTW), the three patches I am 
using are:

  ca109491f612aab5c8152207631c0444f63da97f
  37810659ea7d9572c5ac284ade272f806ef8f788
  a0a99b227da57f81319dd239bc4de811b0f530ec

I have not tried the other patches.

That said, I would not recommend just the three for -stable unless they 
get a much wider amount of testing, on multiple platforms.  I don't see 
that as likely to happen, plus Peter says they are incomplete, so maybe it 
is just best to recommend that 2.6.28 users getting crashes while using 
HTB try these specific patches at first, and then the rest of the patches 
if they do not work.

Thanks,
Chris


* Re: deadlocks if use htb
  2009-01-14 18:02                                   ` Chris Caputo
@ 2009-01-15  6:53                                     ` Jarek Poplawski
  2009-01-15  7:12                                       ` Badalian Vyacheslav
                                                         ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  6:53 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Peter Zijlstra, Denys Fedoryschenko, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
...
> That said, I would not recommend just the three for -stable unless they 
> get a much wider amount of testing, on multiple platforms.  I don't see 
> that as likely to happen, plus Peter says they are incomplete, so maybe it 
> is just best to recommend that 2.6.28 users getting crashes while using 
> HTB try these specific patches at first, and then the rest of the patches 
> if they do not work.

The main problem is my patches, at least the tested ones, harm htb's
exactness, and I doubt I could convince anybody to merge them, at
least before your case. It was only reported by two users here (plus
one more on private), and looked like something very rare. After your
report it looks much more necessary.

If there is nothing better, I can recommend it, but IMHO the best
candidate for this is the testing patch #4 from this thread, which
alas wasn't even tested... So, Chris, if you could give it a try in
the meantime (without any other patches)?

Thanks,
Jarek P.

(resend testing patch #4 - for 2.6.27 or 2.6.28)
---

diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
--- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
+++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
@@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
+	qdisc_watchdog_cancel(&q->watchdog);
 	qdisc_watchdog_schedule(&q->watchdog, next_event);
 fin:
 	return skb;
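
The idea behind the one-line change above can be sketched in userspace. This is a toy model under stated assumptions, not the qdisc watchdog or hrtimer implementation: `toy_timer`, `toy_queue`, `toy_cancel()`, `toy_schedule()` and `toy_rearm()` are all hypothetical names. It shows the general "cancel, then schedule" pattern: if the timer is always unlinked before it is linked again, it cannot end up on a queue twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a timer that may sit on at most one singly linked queue. */
struct toy_timer {
	bool queued;		/* true while linked into the queue */
	long expires;
	struct toy_timer *next;
};

struct toy_queue {
	struct toy_timer *head;
	size_t len;
};

/* Unlink the timer if it is queued; harmless if it is not. */
static void toy_cancel(struct toy_queue *q, struct toy_timer *t)
{
	struct toy_timer **pp;

	if (!t->queued)
		return;
	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
	}
	t->next = NULL;
	t->queued = false;
	q->len--;
}

/* Link the timer. Linking an already queued node would corrupt the list. */
static void toy_schedule(struct toy_queue *q, struct toy_timer *t, long expires)
{
	assert(!t->queued);
	t->expires = expires;
	t->next = q->head;
	q->head = t;
	t->queued = true;
	q->len++;
}

/* The pattern from the patch: always cancel before scheduling again. */
static void toy_rearm(struct toy_queue *q, struct toy_timer *t, long expires)
{
	toy_cancel(q, t);
	toy_schedule(q, t, expires);
}
```

Calling `toy_rearm()` twice in a row on the same timer leaves exactly one entry on the queue with the later expiry. Whether the real patch closes the race discussed in this thread is a separate question; the sketch only shows why an explicit cancel makes a repeated schedule safe in a single-threaded model.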


* Re: deadlocks if use htb
  2009-01-15  6:53                                     ` Jarek Poplawski
@ 2009-01-15  7:12                                       ` Badalian Vyacheslav
  2009-01-15  8:09                                         ` Jarek Poplawski
  2009-01-15  7:26                                       ` Chris Caputo
  2009-01-19  5:46                                       ` David Miller
  2 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2009-01-15  7:12 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Chris Caputo, Peter Zijlstra, Denys Fedoryschenko, netdev,
	linux-kernel, Thomas Gleixner


> On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
> ...
>   
>> That said, I would not recommend just the three for -stable unless they 
>> get a much wider amount of testing, on multiple platforms.  I don't see 
>> that as likely to happen, plus Peter says they are incomplete, so maybe it 
>> is just best to recommend that 2.6.28 users getting crashes while using 
>> HTB try these specific patches at first, and then the rest of the patches 
>> if they do not work.
>>     
>
> The main problem is my patches, at least the tested ones, harm htb's
> exactness, and I doubt I could convince anybody to merge them, at
> least before your case. It was only reported by two users here (plus
> one more on private), and looked like something very rare. After your
> report it looks much more necessary.
>
> If there is nothing better, I can recommend it, but IMHO the best
> candidate for this is the testing patch #4 from this thread, which
> alas wasn't even tested... So, Chris, if you could give it a try in
> the meantime (without any other patches)?
>
> Thanks,
> Jarek P.
>
> (resend testing patch #4 - for 2.6.27 or 2.6.28)
> ---
>
> diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
>  		}
>  	}
>  	sch->qstats.overlimits++;
> +	qdisc_watchdog_cancel(&q->watchdog);
>  	qdisc_watchdog_schedule(&q->watchdog, next_event);
>  fin:
>  	return skb;
>
>   
Hello all.
I can also add this, maybe it helps:
On old kernels my servers go to 100% soft interrupt if traffic is more than
600mbs. Without your patches, on the new kernel I get crashes only on PCs with
heavy network load (more than 400mbs-500mbs). Servers that carry 100-200 mbs
have not crashed for a long time.
I remember that I did not test patch #4, because you said that it's only
another temporary fix and the main problem is in hrtimer, but I tried
turning on HiRes and Dynamic Ticks in the kernel and it did not help me.
Best regards, Slavon


* Re: deadlocks if use htb
  2009-01-15  6:53                                     ` Jarek Poplawski
  2009-01-15  7:12                                       ` Badalian Vyacheslav
@ 2009-01-15  7:26                                       ` Chris Caputo
  2009-01-15  7:54                                         ` Jarek Poplawski
  2009-01-19  5:46                                       ` David Miller
  2 siblings, 1 reply; 65+ messages in thread
From: Chris Caputo @ 2009-01-15  7:26 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Peter Zijlstra, Denys Fedoryschenko, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
> ...
> > That said, I would not recommend just the three for -stable unless they 
> > get a much wider amount of testing, on multiple platforms.  I don't see 
> > that as likely to happen, plus Peter says they are incomplete, so maybe it 
> > is just best to recommend that 2.6.28 users getting crashes while using 
> > HTB try these specific patches at first, and then the rest of the patches 
> > if they do not work.
> 
> The main problem is my patches, at least the tested ones, harm htb's
> exactness, and I doubt I could convince anybody to merge them, at
> least before your case. It was only reported by two users here (plus
> one more on private), and looked like something very rare. After your
> report it looks much more necessary.
> 
> If there is nothing better, I can recommend it, but IMHO the best
> candidate for this is the testing patch #4 from this thread, which
> alas wasn't even tested... So, Chris, if you could give it a try in
> the meantime (without any other patches)?
> 
> Thanks,
> Jarek P.
> 
> (resend testing patch #4 - for 2.6.27 or 2.6.28)
> ---
> 
> diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
>  		}
>  	}
>  	sch->qstats.overlimits++;
> +	qdisc_watchdog_cancel(&q->watchdog);
>  	qdisc_watchdog_schedule(&q->watchdog, next_event);
>  fin:
>  	return skb;

I hope you can forgive me, but since I have something that works on a 
production machine, I am not in a position in which I can test this.

Chris


* Re: deadlocks if use htb
  2009-01-15  7:26                                       ` Chris Caputo
@ 2009-01-15  7:54                                         ` Jarek Poplawski
  2009-01-15  9:45                                           ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  7:54 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Peter Zijlstra, Denys Fedoryschenko, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, Jan 15, 2009 at 07:26:28AM +0000, Chris Caputo wrote:
> On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> > On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
> > ...
> > > That said, I would not recommend just the three for -stable unless they 
> > > get a much wider amount of testing, on multiple platforms.  I don't see 
> > > that as likely to happen, plus Peter says they are incomplete, so maybe it 
> > > is just best to recommend that 2.6.28 users getting crashes while using 
> > > HTB try these specific patches at first, and then the rest of the patches 
> > > if they do not work.
...
> I hope you can forgive me, but since I have something that works on a 
> production machine, I am not in a position in which I can test this.

So, IOW, you don't recommend Peter's patches but prefer to use them,
and recommend my patches, but you are afraid of using them...
Of course I forgive you!-)

Jarek P.


* Re: deadlocks if use htb
  2009-01-15  7:12                                       ` Badalian Vyacheslav
@ 2009-01-15  8:09                                         ` Jarek Poplawski
  2009-01-15  9:01                                           ` Denys Fedoryschenko
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  8:09 UTC (permalink / raw)
  To: Badalian Vyacheslav
  Cc: Chris Caputo, Peter Zijlstra, Denys Fedoryschenko, netdev,
	linux-kernel, Thomas Gleixner

On Thu, Jan 15, 2009 at 10:12:20AM +0300, Badalian Vyacheslav wrote:
...
> Hello all.
> I can also add this, maybe it helps:
> On old kernels my servers go to 100% soft interrupt if traffic is more than
> 600mbs. Without your patches, on the new kernel I get crashes only on PCs with
> heavy network load (more than 400mbs-500mbs). Servers that carry 100-200 mbs
> have not crashed for a long time.
> I remember that I did not test patch #4, because you said that it's only
> another temporary fix and the main problem is in hrtimer, but I tried
> turning on HiRes and Dynamic Ticks in the kernel and it did not help me.

Slavon, I'm very glad of your testing and reporting, and I never
expected you to test anything on production when something else works
for you. On the other hand, what works for you could be unacceptable
for other users, who don't have such crashes, and prefer better
exactness. So it's hard to recommend such patches if there are not
enough people interested in them.

Best regards,
Jarek P.


* Re: deadlocks if use htb
  2009-01-15  8:09                                         ` Jarek Poplawski
@ 2009-01-15  9:01                                           ` Denys Fedoryschenko
  2009-01-15  9:06                                             ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-15  9:01 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

On Thursday 15 January 2009 10:09:21 Jarek Poplawski wrote:
> Slavon, I'm very glad of your testing and reporting, and I never
> expected you to test anything on production when something else works
> for you. On the other hand, what works for you could be unacceptable
> for other users, who don't have such crashes, and prefer better
> exactness. So it's hard to recommend such patches if there are not
> enough people interested in them.
>
> Best regards,
> Jarek P.
Jarek, can you give me the exact name of the patch or a link to it?
I will test it on production.
And I am interested in finding out what the problem is.


* Re: deadlocks if use htb
  2009-01-14 14:28                                               ` Peter Zijlstra
  2009-01-14 14:39                                                 ` Jarek Poplawski
@ 2009-01-15  9:01                                                 ` Jarek Poplawski
  2009-01-15 10:46                                                   ` Peter Zijlstra
  1 sibling, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Wed, Jan 14, 2009 at 03:28:03PM +0100, Peter Zijlstra wrote:
...
> Right, found all that...
> 
> Can't spot anything obviously wrong though.. hrtimer_start*() does
> remove_hrtimer() which checks STATE_ENQUEUED, STATE_PENDING and pulls it
> off the relevant list before it continues the enqueue.
> 
> However a loop in enqueue_hrtimer() would suggest a corrupted RB-tree,
> but I cannot find an RB-op that doesn't hold base-lock.
> 

I revisited it yesterday, and if I'm not missing something, a scenario
similar to this is possible:

cpu1:				cpu2:

run_hrtimer_pending
spin_unlock
restart = fn(timer)

				hrtimer_start
				enqueue_hrtimer

				hrtimer_start
				remove_hrtimer
				(the HRTIMER_STATE_CALLBACK is removed)

				switch_hrtimer_base
spin_lock
(not this hrtimer's anymore)
__remove_hrtimer
list_add_tail			enqueue_hrtimer
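
As a rough illustration of the interleaving above (a userspace model, not
kernel code): the sketch below only tracks the state word. The names
model_timer, model_begin_callback, model_remote_start and
model_callback_running, and the numeric bit values, are assumptions made
for this example. It shows how a second start on cpu2 can drop the
"callback running" bit while cpu1 is still inside fn(timer).

```c
#include <assert.h>
#include <stdbool.h>

/* State bits named after the ones in this thread; values are illustrative. */
enum {
	M_INACTIVE = 0x00,
	M_ENQUEUED = 0x01,	/* queued in the rbtree */
	M_CALLBACK = 0x02,	/* callback function is running */
	M_PENDING  = 0x04,	/* queued on the pending list */
};

struct model_timer {
	unsigned long state;
};

/* cpu1: timer taken off its queue and marked as running its callback. */
static void model_begin_callback(struct model_timer *t)
{
	assert(t);
	t->state = M_CALLBACK;
}

/* cpu2: a start that first removes a queued timer, then enqueues it. */
static void model_remote_start(struct model_timer *t)
{
	assert(t);
	if (t->state & (M_ENQUEUED | M_PENDING))
		t->state = M_INACTIVE;	/* removal resets the state: M_CALLBACK is lost */
	t->state |= M_ENQUEUED;
}

/* True while the model still believes the callback is running. */
static bool model_callback_running(const struct model_timer *t)
{
	return (t->state & M_CALLBACK) != 0;
}
```

After model_begin_callback() and one model_remote_start() the state is
M_CALLBACK | M_ENQUEUED; after a second model_remote_start() only
M_ENQUEUED is left, which is the point marked "(the HRTIMER_STATE_CALLBACK
is removed)" in the diagram.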


Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:01                                           ` Denys Fedoryschenko
@ 2009-01-15  9:06                                             ` Jarek Poplawski
  2009-01-15  9:40                                               ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  9:06 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

On Thu, Jan 15, 2009 at 11:01:14AM +0200, Denys Fedoryschenko wrote:
...
> Jarek, can you give me the exact name of the patch, or a link to it?
> I will test it in production.
> I am also interested in finding out what the problem is.

If there is nothing better I could recommend the patch below for
-stable (2.6.28), when it's tested.

Thanks,
Jarek P.
---

diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
--- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
+++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
@@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
+	qdisc_watchdog_cancel(&q->watchdog);
 	qdisc_watchdog_schedule(&q->watchdog, next_event);
 fin:
 	return skb;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:06                                             ` Jarek Poplawski
@ 2009-01-15  9:40                                               ` Badalian Vyacheslav
  2009-01-15  9:54                                                 ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2009-01-15  9:40 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Denys Fedoryschenko, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

Jarek Poplawski wrote:
> On Thu, Jan 15, 2009 at 11:01:14AM +0200, Denys Fedoryschenko wrote:
> ...
>   
>> Jarek, can you give me the exact name of the patch, or a link to it?
>> I will test it in production.
>> I am also interested in finding out what the problem is.
>>     
>
> If there is nothing better I could recommend the patch below for
> -stable (2.6.28), when it's tested.
>
> Thanks,
> Jarek P.
> ---
>
> diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
>  		}
>  	}
>  	sch->qstats.overlimits++;
> +	qdisc_watchdog_cancel(&q->watchdog);
>  	qdisc_watchdog_schedule(&q->watchdog, next_event);
>  fin:
>  	return skb;
>
>   

Jarek, I can easily test patch 4 without 2+3 on a heavily loaded
production server if needed. I use dynamic routing, so if a server
crashes, traffic moves to another PC after a few seconds. I did not test
it because you said it is not needed if 3+2 help me. I will apply it
today and test it on a few servers.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  7:54                                         ` Jarek Poplawski
@ 2009-01-15  9:45                                           ` Jarek Poplawski
  2009-01-15 12:00                                             ` Chris Caputo
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  9:45 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Peter Zijlstra, Denys Fedoryschenko, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, Jan 15, 2009 at 07:54:50AM +0000, Jarek Poplawski wrote:
> On Thu, Jan 15, 2009 at 07:26:28AM +0000, Chris Caputo wrote:
> > On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> > > On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
> > > ...
> > > > That said, I would not recommend just the three for -stable unless they 
> > > > get a much wider amount of testing, on multiple platforms.  I don't see 
> > > > that as likely to happen, plus Peter says they are incomplete, so maybe it 
> > > > is just best to recommend that 2.6.28 users getting crashes while using 
> > > > HTB try these specific patches at first, and then the rest of the patches 
> > > > if they do not work.
> ...
> > I hope you can forgive me, but since I have something that works on a 
> > production machine, I am not in a position in which I can test this.
> 
> So, IOW, you don't recommend Peter's patches but prefer to use them,
> and recommend my patches, but you are afraid of using them...

Sorry, Chris! It looks like I misunderstood, and you still recommend
Peter's patches. (But I guess there are users relying on official kernel
versions.)

I hope you can forgive me,
Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:40                                               ` Badalian Vyacheslav
@ 2009-01-15  9:54                                                 ` Jarek Poplawski
  2009-01-15  9:57                                                   ` Denys Fedoryschenko
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15  9:54 UTC (permalink / raw)
  To: Badalian Vyacheslav
  Cc: Denys Fedoryschenko, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

On Thu, Jan 15, 2009 at 12:40:59PM +0300, Badalian Vyacheslav wrote:
...
> Jarek, I can easily test patch 4 without 2+3 on a heavily loaded
> production server if needed. I use dynamic routing, so if a server
> crashes, traffic moves to another PC after a few seconds. I did not test
> it because you said it is not needed if 3+2 help me. I will apply it
> today and test it on a few servers.

Yes, I thought it was so rare that we could wait till 2.6.29, while you
had this patched, and Denys seemed not to see it after changing his
server. But I have changed my mind after seeing Chris's report...

I'm not sure whether Peter will recommend something better yet, so I
guess you could hold off on this testing until we have his opinion.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:54                                                 ` Jarek Poplawski
@ 2009-01-15  9:57                                                   ` Denys Fedoryschenko
  2009-01-15 10:06                                                     ` Jarek Poplawski
  2009-01-15 10:10                                                     ` Jarek Poplawski
  0 siblings, 2 replies; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-15  9:57 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

I didn't notice the bug only because all that time I was running without
hrtimers (because of nmi_watchdog).

On Thursday 15 January 2009 11:54:54 Jarek Poplawski wrote:
> On Thu, Jan 15, 2009 at 12:40:59PM +0300, Badalian Vyacheslav wrote:
> ...
>
> > Jarek, I can easily test patch 4 without 2+3 on a heavily loaded
> > production server if needed. I use dynamic routing, so if a server
> > crashes, traffic moves to another PC after a few seconds. I did not
> > test it because you said it is not needed if 3+2 help me. I will apply
> > it today and test it on a few servers.
>
> Yes, I thought it was so rare that we could wait till 2.6.29, while you
> had this patched, and Denys seemed not to see it after changing his
> server. But I have changed my mind after seeing Chris's report...
>
> I'm not sure whether Peter will recommend something better yet, so I
> guess you could hold off on this testing until we have his opinion.
>
> Thanks,
> Jarek P.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:57                                                   ` Denys Fedoryschenko
@ 2009-01-15 10:06                                                     ` Jarek Poplawski
  2009-01-15 10:10                                                     ` Jarek Poplawski
  1 sibling, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15 10:06 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

On Thu, Jan 15, 2009 at 11:57:13AM +0200, Denys Fedoryschenko wrote:
> I didn't notice the bug only because all that time I was running without
> hrtimers (because of nmi_watchdog).
> 

Yes, it's strange: I think Slavon reported it happened without hrtimers
too.

Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:57                                                   ` Denys Fedoryschenko
  2009-01-15 10:06                                                     ` Jarek Poplawski
@ 2009-01-15 10:10                                                     ` Jarek Poplawski
  2009-01-15 10:40                                                       ` Denys Fedoryschenko
  1 sibling, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15 10:10 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

On Thu, Jan 15, 2009 at 11:57:13AM +0200, Denys Fedoryschenko wrote:
> I didn't notice the bug only because all that time I was running without
> hrtimers (because of nmi_watchdog).
> 

Denys, maybe I forgot something, but I think you reported this bug
with nmi_watchdog too...

Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15 10:10                                                     ` Jarek Poplawski
@ 2009-01-15 10:40                                                       ` Denys Fedoryschenko
  0 siblings, 0 replies; 65+ messages in thread
From: Denys Fedoryschenko @ 2009-01-15 10:40 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Badalian Vyacheslav, Chris Caputo, Peter Zijlstra, netdev,
	linux-kernel, Thomas Gleixner

Yes, you are correct, that was on different hardware. That's strange.

On Thursday 15 January 2009 12:10:15 Jarek Poplawski wrote:
>
> Denys, maybe I forgot something, but I think you reported this bug
> with nmi_watchdog too...
>
> Jarek P.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:01                                                 ` Jarek Poplawski
@ 2009-01-15 10:46                                                   ` Peter Zijlstra
  2009-01-15 10:54                                                     ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Zijlstra @ 2009-01-15 10:46 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, 2009-01-15 at 09:01 +0000, Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 03:28:03PM +0100, Peter Zijlstra wrote:
> ....
> > Right, found all that...
> > 
> > Can't spot anything obviously wrong though.. hrtimer_start*() does
> > remove_hrtimer() which checks STATE_ENQUEUED, STATE_PENDING and pulls it
> > off the relevant list before it continues the enqueue.
> > 
> > However a loop in enqueue_hrtimer() would suggest a corrupted RB-tree,
> > but I cannot find an RB-op that doesn't hold base-lock.
> > 
> 
> I revisited it yesterday, and unless I'm missing something, a scenario
> similar to this is possible:
> 
> cpu1:				cpu2:
> 
> run_hrtimer_pending
> spin_unlock
> restart = fn(timer)
> 
> 				hrtimer_start
> 				enqueue_hrtimer
> 
> 				hrtimer_start
> 				remove_hrtimer
> 				(the HRTIMER_STATE_CALLBACK is removed)
> 
> 				switch_hrtimer_base
> spin_lock
> (not this hrtimer's anymore)
> __remove_hrtimer
> list_add_tail			enqueue_hrtimer
> 

(looking at .28 code)

run_hrtimer_pending() reads like:

while (pending timers) {
  __remove_hrtimer(timer, HRTIMER_STATE_CALLBACK);
  spin_unlock(&cpu_base->lock);

  fn(timer);

  spin_lock(&cpu_base->lock);
  timer->state &= ~HRTIMER_STATE_CALLBACK; // _should_ result in HRTIMER_STATE_INACTIVE
  if (HRTIMER_RESTART)
    re-queue
  else if (timer->state != INACTIVE) {
    // so another cpu re-queued this timer _while_ we were executing it.
    if (timer is first && !reprogram) {
      __remove_hrtimer(timer, HRTIMER_STATE_PENDING);
      list_add_tail(timer, &cb_pending);
    }
  } 
}

So in the window where we drop the lock, one can, as you said, have
another cpu requeue the timer, but the rb_entry and list_entry are free,
so it should not cause the data corruption we're seeing.
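
To make the bookkeeping after fn(timer) easier to follow, here is a small
userspace model of just that step (not the kernel code; model_timer,
model_after_callback, the outcome names and the bit values are all made up
for this sketch). It clears the callback bit and then classifies what is
left in the state word, the same way the pseudocode above does.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits named after the ones in this thread; values are illustrative. */
enum {
	M_INACTIVE = 0x00,
	M_ENQUEUED = 0x01,
	M_CALLBACK = 0x02,
	M_PENDING  = 0x04,
};

struct model_timer {
	unsigned long state;
};

/* What the loop decides once the lock has been retaken. */
enum model_outcome {
	MODEL_DONE,			/* timer is idle again */
	MODEL_RESTART,			/* callback asked to be re-queued */
	MODEL_REQUEUED_ELSEWHERE,	/* someone queued it during the callback */
};

static enum model_outcome model_after_callback(struct model_timer *t,
					       bool restart)
{
	assert(t);
	/* Drop the "callback running" bit; normally this leaves M_INACTIVE. */
	t->state &= ~(unsigned long)M_CALLBACK;

	if (restart)
		return MODEL_RESTART;
	if (t->state != M_INACTIVE)
		return MODEL_REQUEUED_ELSEWHERE;
	return MODEL_DONE;
}
```

With state == M_CALLBACK the result is MODEL_DONE; with
M_CALLBACK | M_ENQUEUED (another cpu queued the timer in the unlocked
window) it is MODEL_REQUEUED_ELSEWHERE and M_ENQUEUED stays set.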




^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15 10:46                                                   ` Peter Zijlstra
@ 2009-01-15 10:54                                                     ` Jarek Poplawski
  0 siblings, 0 replies; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15 10:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Denys Fedoryschenko, Chris Caputo, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, Jan 15, 2009 at 11:46:48AM +0100, Peter Zijlstra wrote:
> On Thu, 2009-01-15 at 09:01 +0000, Jarek Poplawski wrote:
...
> > spin_lock
> > (not this hrtimer's anymore)
> > __remove_hrtimer
> > list_add_tail			enqueue_hrtimer
> > 
> 
> (looking at .28 code)
> 
> run_hrtimer_pending() reads like:
> 
> while (pending timers) {
>   __remove_hrtimer(timer, HRTIMER_STATE_CALLBACK);
>   spin_unlock(&cpu_base->lock);
> 
>   fn(timer);
> 
>   spin_lock(&cpu_base->lock);
>   timer->state &= ~HRTIMER_STATE_CALLBACK; // _should_ result in HRTIMER_STATE_INACTIVE
>   if (HRTIMER_RESTART)
>     re-queue
>   else if (timer->state != INACTIVE) {
>     // so another cpu re-queued this timer _while_ we were executing it.
>     if (timer is first && !reprogram) {
>       __remove_hrtimer(timer, HRTIMER_STATE_PENDING);
>       list_add_tail(timer, &cb_pending);
>     }
>   } 
> }
> 
> So in the window where we drop the lock, one can, as you said, have
> another cpu requeue the timer, but the rb_entry and list_entry are free,
> so it should not cause the data corruption we're seeing.
> 

Can't they be enqueued to the list (without a lock) and rbtree at the
same time? Then removing is done for the list only?

Jarek P. 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  9:45                                           ` Jarek Poplawski
@ 2009-01-15 12:00                                             ` Chris Caputo
  2009-01-15 12:18                                               ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Chris Caputo @ 2009-01-15 12:00 UTC (permalink / raw)
  To: Jarek Poplawski, Denys Fedoryschenko
  Cc: Peter Zijlstra, netdev, linux-kernel, Badalian Vyacheslav,
	Thomas Gleixner

On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> On Thu, Jan 15, 2009 at 07:54:50AM +0000, Jarek Poplawski wrote:
> > On Thu, Jan 15, 2009 at 07:26:28AM +0000, Chris Caputo wrote:
> > > On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> > > > On Wed, Jan 14, 2009 at 06:02:04PM +0000, Chris Caputo wrote:
> > > > ...
> > > > > That said, I would not recommend just the three for -stable unless they 
> > > > > get a much wider amount of testing, on multiple platforms.  I don't see 
> > > > > that as likely to happen, plus Peter says they are incomplete, so maybe it 
> > > > > is just best to recommend that 2.6.28 users getting crashes while using 
> > > > > HTB try these specific patches at first, and then the rest of the patches 
> > > > > if they do not work.
> > ...
> > > I hope you can forgive me, but since I have something that works on a 
> > > production machine, I am not in a position in which I can test this.
> > 
> > So, IOW, you don't recommend Peter's patches but prefer to use them,
> > and recommend my patches, but you are afraid of using them...
> 
> Sorry, Chris! It looks like I misunderstood, and you still recommend
> Peter's patches. (But I guess there are users relying on official kernel
> versions.)
> 
> I hope you can forgive me,

:-)  No worries.

Just to be clear, by "these specific patches at first", I mean the 
"hrtimer: removing all ur callback modes" ones:

 ca109491f612aab5c8152207631c0444f63da97f
 37810659ea7d9572c5ac284ade272f806ef8f788
 a0a99b227da57f81319dd239bc4de811b0f530ec

By "rest of the patches", I mean the above plus the others created by
Peter for hrtimers.  Full list:

 ca109491f612aab5c8152207631c0444f63da97f
 37810659ea7d9572c5ac284ade272f806ef8f788
 a0a99b227da57f81319dd239bc4de811b0f530ec
 b2e3c0adec918ea22b6c9d7c76193dd3aaba9bd4
 8bdec955b0da2ffbd10eb9b200651dd1f9e366f2
 d5fd43c4ae04523e1dcd7794f9c511b289851350
 731a55ba0f17064f85903b7bf8e24849ec6cfa20
 a6037b61c2f5fc99c57c15b26d7cfa58bbb34008
 e3f1d883740b09e5116d4d4e30a6a6987264a83c
 82c5b7b527ccc4b5d3cf832437e842f9d2920a79
 6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0

The first 3 git patches listed above are working for me so far, so I have 
not had a need to try the others.

On Thu, Jan 15, 2009 at 11:57:13AM +0200, Denys Fedoryschenko wrote:
> I didn't notice the bug only because all that time I was running without
> hrtimers (because of nmi_watchdog).

nmi_watchdog 1 or 2?  I'm running with 2 which I've read does not disable 
hrtimers, right?

I mention it because my original HTB/hrtimers-related crash on .28 was 
without nmi_watchdog altogether.  I added both "nmi_watchdog=panic,2" and 
Peter's first 3 patches (ca1, 378, a0a) at the same time, to produce an 
apparently stable system.

I wish I had a repro scenario in a non-production environment, so I could 
help out further with this.  If I did, I would test without nmi_watchdog 
while trying just 2.6.28 and Jarek's #4 patch.

Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15 12:00                                             ` Chris Caputo
@ 2009-01-15 12:18                                               ` Jarek Poplawski
  2009-01-15 13:53                                                 ` Chris Caputo
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-15 12:18 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Denys Fedoryschenko, Peter Zijlstra, netdev, linux-kernel,
	Badalian Vyacheslav, Thomas Gleixner

On Thu, Jan 15, 2009 at 12:00:50PM +0000, Chris Caputo wrote:
...
> I wish I had a repro scenario in a non-production environment, so I could 
> help out further with this.  If I did, I would test without nmi_watchdog 
> while trying just 2.6.28 and Jarek's #4 patch.

I think this watchdog doesn't matter too much yet. Probably it's more
about hardware (smp, maybe kind of hw timer), and high traffic vs. htb
rules.

Jarek P.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15 12:18                                               ` Jarek Poplawski
@ 2009-01-15 13:53                                                 ` Chris Caputo
  2009-01-16  6:51                                                   ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Chris Caputo @ 2009-01-15 13:53 UTC (permalink / raw)
  To: Jarek Poplawski, Thomas Gleixner
  Cc: Denys Fedoryschenko, Peter Zijlstra, netdev, linux-kernel,
	Badalian Vyacheslav

On Thu, 15 Jan 2009, Jarek Poplawski wrote:
> On Thu, Jan 15, 2009 at 12:00:50PM +0000, Chris Caputo wrote:
> ...
> > I wish I had a repro scenario in a non-production environment, so I could 
> > help out further with this.  If I did, I would test without nmi_watchdog 
> > while trying just 2.6.28 and Jarek's #4 patch.
> 
> I think this watchdog doesn't matter too much yet. Probably it's more
> about hardware (smp, maybe kind of hw timer), and high traffic vs. htb
> rules.

Per Thomas' comment in http://bugzilla.kernel.org/show_bug.cgi?id=10944 
regarding broken hrtimers:

--
  nmi_watchdog=1 disables the local apic timers and therefore highres/nohz

  Try to use nmi_watchdog=2 instead.
--
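
For anyone comparing test setups, a small helper along these lines could
pull the mode out of a saved kernel command line. This is only a sketch:
model_nmi_watchdog_mode and model_highres_suspect are hypothetical names,
and the "[panic,]N" token format is assumed from the
"nmi_watchdog=panic,2" setting mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the numeric mode from an "nmi_watchdog=[panic,]N" token in a
 * kernel command line string, or -1 if the option is absent or malformed.
 */
static int model_nmi_watchdog_mode(const char *cmdline)
{
	static const char key[] = "nmi_watchdog=";
	const char *p = cmdline;

	assert(cmdline);
	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a token. */
		if (p == cmdline || p[-1] == ' ') {
			const char *v = p + strlen(key);
			char *end;
			long n;

			if (strncmp(v, "panic,", 6) == 0)
				v += 6;
			n = strtol(v, &end, 10);
			if (end != v &&
			    (*end == '\0' || *end == ' ' || *end == '\n'))
				return (int)n;
			return -1;
		}
		p += strlen(key);
	}
	return -1;
}

/* Per the comment quoted above, mode 1 is the one that disables highres/nohz. */
static int model_highres_suspect(const char *cmdline)
{
	return model_nmi_watchdog_mode(cmdline) == 1;
}
```

Feeding it the contents of /proc/cmdline from each test box would show
which reports were really made with highres/nohz active.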

Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15 13:53                                                 ` Chris Caputo
@ 2009-01-16  6:51                                                   ` Badalian Vyacheslav
  0 siblings, 0 replies; 65+ messages in thread
From: Badalian Vyacheslav @ 2009-01-16  6:51 UTC (permalink / raw)
  To: Chris Caputo
  Cc: Jarek Poplawski, Thomas Gleixner, Denys Fedoryschenko,
	Peter Zijlstra, netdev, linux-kernel

Chris Caputo wrote:
> On Thu, 15 Jan 2009, Jarek Poplawski wrote:
>   
>> On Thu, Jan 15, 2009 at 12:00:50PM +0000, Chris Caputo wrote:
>> ...
>>     
>>> I wish I had a repro scenario in a non-production environment, so I could 
>>> help out further with this.  If I did, I would test without nmi_watchdog 
>>> while trying just 2.6.28 and Jarek's #4 patch.
>>>       
>> I think this watchdog doesn't matter too much yet. Probably it's more
>> about hardware (smp, maybe kind of hw timer), and high traffic vs. htb
>> rules.
>>     
>
> Per Thomas' comment in http://bugzilla.kernel.org/show_bug.cgi?id=10944 
> regarding broken hrtimers:
>
> --
>   nmi_watchdog=1 disables the local apic timers and therefore highres/nohz
>
>   Try to use nmi_watchdog=2 instead.
> --
>
> Chris
>
>
>   
Oh... after the first deadlocks I added nmi_watchdog=1 to the kernel
command line and tested with it. Does this mean the highres/nohz setting
made no difference, and that my message where I said I tested the bug
with highres/nohz on and off is wrong?


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: deadlocks if use htb
  2009-01-15  6:53                                     ` Jarek Poplawski
  2009-01-15  7:12                                       ` Badalian Vyacheslav
  2009-01-15  7:26                                       ` Chris Caputo
@ 2009-01-19  5:46                                       ` David Miller
  2009-01-19  6:57                                         ` [PATCH] " Jarek Poplawski
  2 siblings, 1 reply; 65+ messages in thread
From: David Miller @ 2009-01-19  5:46 UTC (permalink / raw)
  To: jarkao2; +Cc: ccaputo, a.p.zijlstra, denys, netdev, linux-kernel, slavon, tglx

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 15 Jan 2009 06:53:22 +0000

> (resend testing patch #4 - for 2.6.27 or 2.6.28)

Jarek, if you deem that this is in fact what we should
submit for -stable please give me a submission with
a suitable commit message and signoffs, and I will queue
it up for -stable.

Thanks.

> diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
>  		}
>  	}
>  	sch->qstats.overlimits++;
> +	qdisc_watchdog_cancel(&q->watchdog);
>  	qdisc_watchdog_schedule(&q->watchdog, next_event);
>  fin:
>  	return skb;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH] Re: deadlocks if use htb
  2009-01-19  5:46                                       ` David Miller
@ 2009-01-19  6:57                                         ` Jarek Poplawski
  2009-01-19  7:42                                           ` Badalian Vyacheslav
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-19  6:57 UTC (permalink / raw)
  To: David Miller
  Cc: ccaputo, a.p.zijlstra, denys, netdev, linux-kernel, slavon, tglx

On Sun, Jan 18, 2009 at 09:46:04PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 15 Jan 2009 06:53:22 +0000
> 
> > (resend testing patch #4 - for 2.6.27 or 2.6.28)
> 
> Jarek, if you deem that this is in fact what we should
> submit for -stable please give me a submission with
> a suitable commit message and signoffs, and I will queue
> it up for -stable.

It looks like this is needed, but I think it's better to wait another
2 or 3 days for a "Tested-by" from Denys and/or Vyacheslav.
(I hoped they would rather test some hrtimers patch, but it looks like
Peter was busy.)

Thanks,
Jarek P.
-----------------> (needed only for -stables: 2.6.28 and older)

pkt_sched: sch_htb: Fix deadlock in hrtimers triggered by HTB

Most probably there is a (still unproven) race in hrtimers (before
2.6.29 kernels), which causes a corruption of hrtimers rbtree. This
patch doesn't fix it, but should let HTB avoid triggering the bug.

Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
Reported-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
--- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
+++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
@@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
 		}
 	}
 	sch->qstats.overlimits++;
+	qdisc_watchdog_cancel(&q->watchdog);
 	qdisc_watchdog_schedule(&q->watchdog, next_event);
 fin:
 	return skb;
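
The effect of the one-line change can be pictured with a toy model (not
kernel code; model_watchdog, model_start, model_cancel and the two
model_schedule_* helpers are names invented for this sketch). It counts
how often a start finds the timer already armed, which is the
remove-and-requeue path the patch is meant to keep HTB away from.

```c
#include <assert.h>
#include <stdbool.h>

struct model_watchdog {
	bool armed;		/* stand-in for "timer is queued" */
	long expires;		/* next event time, arbitrary units */
	unsigned int rearm_hits;	/* starts that found the timer armed */
};

/* Arm the timer; an already armed timer has to be removed and re-queued. */
static void model_start(struct model_watchdog *wd, long expires)
{
	assert(wd);
	if (wd->armed)
		wd->rearm_hits++;
	wd->armed = true;
	wd->expires = expires;
}

/* Disarm the timer. */
static void model_cancel(struct model_watchdog *wd)
{
	assert(wd);
	wd->armed = false;
}

/* Shape of the call before the patch: schedule on top of whatever is there. */
static void model_schedule_unpatched(struct model_watchdog *wd, long next_event)
{
	model_start(wd, next_event);
}

/* Shape of the call after the patch: cancel first, then schedule. */
static void model_schedule_patched(struct model_watchdog *wd, long next_event)
{
	model_cancel(wd);
	model_start(wd, next_event);
}
```

Two back-to-back model_schedule_unpatched() calls leave rearm_hits at 1,
while two model_schedule_patched() calls leave it at 0; in both cases the
last expiry wins.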

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] Re: deadlocks if use htb
  2009-01-19  6:57                                         ` [PATCH] " Jarek Poplawski
@ 2009-01-19  7:42                                           ` Badalian Vyacheslav
  2009-01-19  7:57                                             ` Jarek Poplawski
  0 siblings, 1 reply; 65+ messages in thread
From: Badalian Vyacheslav @ 2009-01-19  7:42 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, a.p.zijlstra, denys, netdev, linux-kernel, tglx

Tested for 2 days (the weekend is the stress period for this bug because
of the heavy traffic) on 3 servers. Everything runs normally. A complete
test needs up to 7 days, but I think (IMHO) this patch does not break any
functionality and can safely be added to stable. Patches #2+3 help 100%
for me.
> On Sun, Jan 18, 2009 at 09:46:04PM -0800, David Miller wrote:
>   
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 15 Jan 2009 06:53:22 +0000
>>
>>     
>>> (resend testing patch #4 - for 2.6.27 or 2.6.28)
>>>       
>> Jarek, if you deem that this is in fact what we should
>> submit for -stable please give me a submission with
>> a suitable commit message and signoffs, and I will queue
>> it up for -stable.
>>     
>
> It looks like this is needed, but I think it's better to wait another
> 2 or 3 days for a "Tested-by" from Denys and/or Vyacheslav.
> (I hoped they would rather test some hrtimers patch, but it looks like
> Peter was busy.)
>
> Thanks,
> Jarek P.
> -----------------> (needed only for -stables: 2.6.28 and older)
>
> pkt_sched: sch_htb: Fix deadlock in hrtimers triggered by HTB
>
> Most probably there is a (still unproven) race in hrtimers (before
> 2.6.29 kernels), which causes a corruption of hrtimers rbtree. This
> patch doesn't fix it, but should let HTB avoid triggering the bug.
>
> Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
> Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
> Reported-by: Chris Caputo <ccaputo@alt.net>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> ---
>
> diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
>  		}
>  	}
>  	sch->qstats.overlimits++;
> +	qdisc_watchdog_cancel(&q->watchdog);
>  	qdisc_watchdog_schedule(&q->watchdog, next_event);
>  fin:
>  	return skb;
>
>   


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] Re: deadlocks if use htb
  2009-01-19  7:42                                           ` Badalian Vyacheslav
@ 2009-01-19  7:57                                             ` Jarek Poplawski
  2009-01-20  1:29                                               ` David Miller
  0 siblings, 1 reply; 65+ messages in thread
From: Jarek Poplawski @ 2009-01-19  7:57 UTC (permalink / raw)
  To: Badalian Vyacheslav
  Cc: David Miller, a.p.zijlstra, denys, netdev, linux-kernel, tglx

On Mon, Jan 19, 2009 at 10:42:29AM +0300, Badalian Vyacheslav wrote:
> Tested for 2 days (the weekend is the stress period for this bug because
> of the heavy traffic) on 3 servers. Everything runs normally. A complete
> test needs up to 7 days, but I think (IMHO) this patch does not break any
> functionality and can safely be added to stable. Patches #2+3 help 100%
> for me.

Thank you very much, Slavon! (2+3 are of course better tested, but
I hope this #4 is more appropriate for -stable.)

Jarek P.

> > On Sun, Jan 18, 2009 at 09:46:04PM -0800, David Miller wrote:
> >   
> >> From: Jarek Poplawski <jarkao2@gmail.com>
> >> Date: Thu, 15 Jan 2009 06:53:22 +0000
> >>
> >>     
> >>> (resend testing patch #4 - for 2.6.27 or 2.6.28)
> >>>       
> >> Jarek, if you deem that this is in fact what we should
> >> submit for -stable please give me a submission with
> >> a suitable commit message and signoffs, and I will queue
> >> it up for -stable.
> >>     
> >
> > It looks like this is needed, but I think it's better to wait another
> > 2 or 3 days for a "Tested-by" from Denys and/or Vyacheslav.
> > (I hoped they would rather test some hrtimers patch, but it looks like
> > Peter was busy.)
> >
> > Thanks,
> > Jarek P.
> > -----------------> (needed only for -stables: 2.6.28 and older)
> >
> > pkt_sched: sch_htb: Fix deadlock in hrtimers triggered by HTB
> >
> > Most probably there is a (still unproven) race in hrtimers (before
> > 2.6.29 kernels), which causes a corruption of hrtimers rbtree. This
> > patch doesn't fix it, but should let HTB avoid triggering the bug.
> >
> > Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
> > Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
> > Reported-by: Chris Caputo <ccaputo@alt.net>
> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> > ---
> >
> > diff -Nurp a2.6.27.7/net/sched/sch_htb.c b2.6.27.7/net/sched/sch_htb.c
> > --- a2.6.27.7/net/sched/sch_htb.c	2008-12-11 08:16:16.000000000 +0000
> > +++ b2.6.27.7/net/sched/sch_htb.c	2008-12-15 10:44:32.000000000 +0000
> > @@ -924,6 +924,7 @@ static struct sk_buff *htb_dequeue(struc
> >  		}
> >  	}
> >  	sch->qstats.overlimits++;
> > +	qdisc_watchdog_cancel(&q->watchdog);
> >  	qdisc_watchdog_schedule(&q->watchdog, next_event);
> >  fin:
> >  	return skb;
> >
> >   
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] Re: deadlocks if use htb
  2009-01-19  7:57                                             ` Jarek Poplawski
@ 2009-01-20  1:29                                               ` David Miller
  0 siblings, 0 replies; 65+ messages in thread
From: David Miller @ 2009-01-20  1:29 UTC (permalink / raw)
  To: jarkao2; +Cc: slavon, a.p.zijlstra, denys, netdev, linux-kernel, tglx

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 19 Jan 2009 07:57:33 +0000

> On Mon, Jan 19, 2009 at 10:42:29AM +0300, Badalian Vyacheslav wrote:
> > Tested for 2 days (the weekend is the stress period for this bug because
> > of the heavy traffic) on 3 servers. Everything runs normally. A complete
> > test needs up to 7 days, but I think (IMHO) this patch does not break any
> > functionality and can safely be added to stable. Patches #2+3 help 100%
> > for me.
> 
> Thank you very much, Slavon! (2+3 are of course better tested, but
> I hope this #4 is more appropriate for -stable.)

I've queued this up, thanks everyone.

^ permalink raw reply	[flat|nested] 65+ messages in thread
