All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC/T: Possible fix for bcm43xx periodic work bug
@ 2006-09-07 18:17 Larry Finger
       [not found] ` <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
  2006-09-08  9:42 ` Erik Mouw
  0 siblings, 2 replies; 10+ messages in thread
From: Larry Finger @ 2006-09-07 18:17 UTC (permalink / raw)
  To: netdev; +Cc: Bcm43xx-dev, Michael Buesch, Stefano Brivio

Hi all,

I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
fixed, I hope to get the fix into mainline before release of the bug into the stable series.

I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the 
default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler 
weren't causing us to "just miss".

To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set 
BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per 
minute. Now failures would occur in minutes instead of hours.

Operating from the premise that the DMA needed some time to reach the idle state after the MAC was 
suspended, I tried various delays, but nothing worked.

Then I decided to test the premise that the problem was associated with shutting down and restarting 
the network. That lead to the current patch, which has run for what is effectively 100 times longer 
than previous versions.

Larry
-------------------------------------------------------------------


Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
  		 * be preemtible.
  		 */
  		mutex_lock(&bcm->mutex);
-		netif_stop_queue(bcm->net_dev);
-		synchronize_net();
  		spin_lock_irqsave(&bcm->irq_lock, flags);
  		bcm43xx_mac_suspend(bcm);
  		if (bcm43xx_using_pio(bcm))
@@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
  		if (bcm43xx_using_pio(bcm))
  			bcm43xx_pio_thaw_txqueues(bcm);
  		bcm43xx_mac_enable(bcm);
-		netif_wake_queue(bcm->net_dev);
  	}
  	mmiowb();
  	spin_unlock_irqrestore(&bcm->irq_lock, flags);



----------------------------------------------------------

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found] ` <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
@ 2006-09-07 20:39   ` Michael Buesch
       [not found]     ` <200609072239.20074.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
       [not found]     ` <45016F63.9000200@lwfinger.net>
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Buesch @ 2006-09-07 20:39 UTC (permalink / raw)
  To: Larry Finger
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w, Stefano Brivio

On Thursday 07 September 2006 20:17, Larry Finger wrote:
> Hi all,
> 
> I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
> to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
> fixed, I hope to get the fix into mainline before release of the bug into the stable series.
> 
> I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the 
> default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler 
> weren't causing us to "just miss".
> 
> To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set 
> BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per 
> minute. Now failures would occur in minutes instead of hours.
> 
> Operating from the premise that the DMA needed some time to reach the idle state after the MAC was 
> suspended, I tried various delays, but nothing worked.
> 
> Then I decided to test the premise that the problem was associated with shutting down and restarting 
> the network. That lead to the current patch, which has run for what is effectively 100 times longer 
> than previous versions.
> 
> Larry
> -------------------------------------------------------------------
> 
> 
> Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> ===================================================================
> --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
>   		 * be preemtible.
>   		 */
>   		mutex_lock(&bcm->mutex);
> -		netif_stop_queue(bcm->net_dev);
> -		synchronize_net();
>   		spin_lock_irqsave(&bcm->irq_lock, flags);
>   		bcm43xx_mac_suspend(bcm);
>   		if (bcm43xx_using_pio(bcm))
> @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
>   		if (bcm43xx_using_pio(bcm))
>   			bcm43xx_pio_thaw_txqueues(bcm);
>   		bcm43xx_mac_enable(bcm);
> -		netif_wake_queue(bcm->net_dev);
>   	}
>   	mmiowb();
>   	spin_unlock_irqrestore(&bcm->irq_lock, flags);

The real question is: Why does this patch help?

Let's explain it. We don't stop networking just for fun there.
While executing long preemptible periodic work, we must ensure
that the TX path into the driver is not entered. It's the same
reason why we disable IRQs in the first place. We can't take the
mutex in the TX path and the IRQ handler. (That are the only places
where we can't take the mutex).
Short: We must stop netif here.
The question is: Why does stopping netif queue cause a watchdog
trigger here? The maximum time it can take for the periodic
work inside of the critical section is about 0.2sec. So the queue
is stopped for about 0.2sec max. Why does the watchdog trigger?
Any idea from some networking guru?
Could synchronize_net() take over 5sec in some worst case? Why?
Questions over questions :D

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found]     ` <200609072239.20074.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
@ 2006-09-07 22:08       ` Larry Finger
  0 siblings, 0 replies; 10+ messages in thread
From: Larry Finger @ 2006-09-07 22:08 UTC (permalink / raw)
  To: Michael Buesch
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w, Stefano Brivio

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

Michael Buesch wrote:
> 
> The real question is: Why does this patch help?
> 
> Let's explain it. We don't stop networking just for fun there.
> While executing long preemptible periodic work, we must ensure
> that the TX path into the driver is not entered. It's the same
> reason why we disable IRQs in the first place. We can't take the
> mutex in the TX path and the IRQ handler. (That are the only places
> where we can't take the mutex).
> Short: We must stop netif here.
> The question is: Why does stopping netif queue cause a watchdog
> trigger here? The maximum time it can take for the periodic
> work inside of the critical section is about 0.2sec. So the queue
> is stopped for about 0.2sec max. Why does the watchdog trigger?
> Any idea from some networking guru?
> Could synchronize_net() take over 5sec in some worst case? Why?
> Questions over questions :D
> 

To check if it takes more than 5 seconds, I restored the original network disabling code and 
increased the timeout to 30 seconds. If this works without error, I'll try to margin the time. I'm 
still running that branch every second.

Larry


Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st
  	SET_MODULE_OWNER(net_dev);
  	SET_NETDEV_DEV(net_dev, &pdev->dev);

+	net_dev->watchdog_timeo = 30 * HZ;
  	net_dev->open = bcm43xx_net_open;
  	net_dev->stop = bcm43xx_net_stop;
  	net_dev->get_stats = bcm43xx_net_get_stats;



[-- Attachment #2: patch_periodic --]
[-- Type: text/plain, Size: 1054 bytes --]


Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3207,7 +3207,7 @@ static void do_periodic_work(struct bcm4
 		bcm43xx_periodic_every15sec(bcm);
 	bcm->periodic_state = state + 1;
 
-	schedule_delayed_work(&bcm->periodic_work, HZ * 15);
+	schedule_delayed_work(&bcm->periodic_work, HZ * 1);
 }
 
 /* Estimate a "Badness" value based on the periodic work
@@ -3227,7 +3227,7 @@ static int estimate_periodic_work_badnes
 	if (state % 1 == 0) /* every 15 sec */
 		badness += 1;
 
-#define BADNESS_LIMIT	4
+#define BADNESS_LIMIT	0
 	return badness;
 }
 
@@ -4147,6 +4147,7 @@ static int __devinit bcm43xx_init_one(st
 	SET_MODULE_OWNER(net_dev);
 	SET_NETDEV_DEV(net_dev, &pdev->dev);
 
+	net_dev->watchdog_timeo = 30 * HZ;
 	net_dev->open = bcm43xx_net_open;
 	net_dev->stop = bcm43xx_net_stop;
 	net_dev->get_stats = bcm43xx_net_get_stats;


[-- Attachment #3: Type: text/plain, Size: 179 bytes --]

_______________________________________________
Bcm43xx-dev mailing list
Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
  2006-09-07 18:17 RFC/T: Possible fix for bcm43xx periodic work bug Larry Finger
       [not found] ` <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
@ 2006-09-08  9:42 ` Erik Mouw
  2006-09-08  9:45   ` Michael Buesch
  1 sibling, 1 reply; 10+ messages in thread
From: Erik Mouw @ 2006-09-08  9:42 UTC (permalink / raw)
  To: Larry Finger; +Cc: netdev, Bcm43xx-dev, Michael Buesch, Stefano Brivio

On Thu, Sep 07, 2006 at 01:17:05PM -0500, Larry Finger wrote:
> I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
> to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
> fixed, I hope to get the fix into mainline before release of the bug into the stable series.

FWIW, I finally tracked down the bug that hangs my laptop to the
bcm43xx driver. At first I got the impression it was the cpufreq code
(which has been flaky in early 2.6.18-rc kernels), but after disabling
that my laptop still crashed. After that I disabled preempt cause I got
a couple of lockdep warnings when I had it enabled. That didn't make a
difference, laptop still hangs after some time (runs a couple of hours
at most). Right now I'm on wired network and my laptop finally doesn't
hang anymore (up for two days).

The hang is very hard to trigger (i.e.: it happens at random, I see no
pattern) and locks up the machine completely. I've tried to capture
kernel messages through serial console, but that doesn't work (lock up
before any messages are printed).

This is with any 2.6.18-rc kernel without additional patches or
proprietary modules, 2.6.17 works ok.

Output from lspci:

0000:06:00.0 Network controller: Broadcom Corporation BCM4318 [AirForce
One 54g] 802.11g Wireless LAN Controller (rev 02)
        Subsystem: Linksys WPC54G-EU version 3 [Wireless-G Notebook Adapter]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at 36000000 (32-bit, non-prefetchable) [size=8K]
00: e4 14 18 43 06 00 00 00 02 00 80 02 00 40 00 00
10: 00 00 00 36 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 37 17 48 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

Is the patch you just posted supposed to fix the kind of problems I
have, or do I have to look elsewhere?


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
  2006-09-08  9:42 ` Erik Mouw
@ 2006-09-08  9:45   ` Michael Buesch
  2006-09-08 13:23     ` Erik Mouw
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Buesch @ 2006-09-08  9:45 UTC (permalink / raw)
  To: Erik Mouw; +Cc: netdev, Bcm43xx-dev, Stefano Brivio, Larry Finger

On Friday 08 September 2006 11:42, Erik Mouw wrote:
> On Thu, Sep 07, 2006 at 01:17:05PM -0500, Larry Finger wrote:
> > I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
> > to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
> > fixed, I hope to get the fix into mainline before release of the bug into the stable series.
> 
> FWIW, I finally tracked down the bug that hangs my laptop to the
> bcm43xx driver. At first I got the impression it was the cpufreq code
> (which has been flaky in early 2.6.18-rc kernels), but after disabling
> that my laptop still crashed. After that I disabled preempt cause I got
> a couple of lockdep warnings when I had it enabled. That didn't make a
> difference, laptop still hangs after some time (runs a couple of hours
> at most). Right now I'm on wired network and my laptop finally doesn't
> hang anymore (up for two days).
> 
> The hang is very hard to trigger (i.e.: it happens at random, I see no
> pattern) and locks up the machine completely. I've tried to capture
> kernel messages through serial console, but that doesn't work (lock up
> before any messages are printed).
> 
> This is with any 2.6.18-rc kernel without additional patches or
> proprietary modules, 2.6.17 works ok.

The crash is fixed in wireless-2.6.
The actual cause of the controller restart not. So at least it
does not crash anymore.

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
  2006-09-08  9:45   ` Michael Buesch
@ 2006-09-08 13:23     ` Erik Mouw
       [not found]       ` <20060908132317.GS26916-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Erik Mouw @ 2006-09-08 13:23 UTC (permalink / raw)
  To: Michael Buesch; +Cc: netdev, Larry Finger, Bcm43xx-dev, Stefano Brivio

On Fri, Sep 08, 2006 at 11:45:27AM +0200, Michael Buesch wrote:
> The crash is fixed in wireless-2.6.
> The actual cause of the controller restart not. So at least it
> does not crash anymore.

Thanks for the information, pulled wireless-2.6 and recompiling kernel.
If this really fixes the problem, can we try to get it merged before
2.6.18 closes? I don't know if vanilla 2.6.18-rc6 locks up on other
hardware as well, but if it does it would be a major regression against
2.6.17.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found]       ` <20060908132317.GS26916-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
@ 2006-09-08 13:36         ` Larry Finger
       [not found]           ` <450171E4.8070901-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Larry Finger @ 2006-09-08 13:36 UTC (permalink / raw)
  To: Erik Mouw
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, John Linville,
	Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w, Michael Buesch,
	Stefano Brivio

Erik Mouw wrote:
> On Fri, Sep 08, 2006 at 11:45:27AM +0200, Michael Buesch wrote:
>> The crash is fixed in wireless-2.6.
>> The actual cause of the controller restart not. So at least it
>> does not crash anymore.
> 
> Thanks for the information, pulled wireless-2.6 and recompiling kernel.
> If this really fixes the problem, can we try to get it merged before
> 2.6.18 closes? I don't know if vanilla 2.6.18-rc6 locks up on other
> hardware as well, but if it does it would be a major regression against
> 2.6.17.

I'm trying. At the moment, I'm testing a patch against 2.6.18-rc6. Once it works here, I'll be 
putting it out for testing.

Larry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found]       ` <45016F63.9000200-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
@ 2006-09-08 20:42         ` Michael Buesch
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Buesch @ 2006-09-08 20:42 UTC (permalink / raw)
  To: Larry Finger; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Broadcom Linux

On Friday 08 September 2006 15:25, Larry Finger wrote:
> Michael Buesch wrote:
> > On Thursday 07 September 2006 20:17, Larry Finger wrote:
> >> Hi all,
> >>
> >> I think I have a fix for the bcm43xx bug that leads to NETDEV WATCHDOG tx timeouts and would like it
> >> to get as much testing as possible as this bug affects V2.6.18-rcX. If the problem is truly
> >> fixed, I hope to get the fix into mainline before release of the bug into the stable series.
> >>
> >> I got the idea for the fix when I discovered that the timeout interval used for the watchdog is the 
> >> default value of 5 seconds. Obviously, the few milliseconds used in the periodic work handler 
> >> weren't causing us to "just miss".
> >>
> >> To exacerbate the problem, I changed the repeat timer for periodic work from 15 to 1 sec. I also set 
> >> BADNESS_LIMIT to 0. As a result, I was running the problem code once per second instead of once per 
> >> minute. Now failures would occur in minutes instead of hours.
> >>
> >> Operating from the premise that the DMA needed some time to reach the idle state after the MAC was 
> >> suspended, I tried various delays, but nothing worked.
> >>
> >> Then I decided to test the premise that the problem was associated with shutting down and restarting 
> >> the network. That lead to the current patch, which has run for what is effectively 100 times longer 
> >> than previous versions.
> >>
> >> Larry
> >> -------------------------------------------------------------------
> >>
> >>
> >> Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> >> ===================================================================
> >> --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> >> +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
> >> @@ -3244,8 +3244,6 @@ static void bcm43xx_periodic_work_handle
> >>   		 * be preemtible.
> >>   		 */
> >>   		mutex_lock(&bcm->mutex);
> >> -		netif_stop_queue(bcm->net_dev);
> >> -		synchronize_net();
> >>   		spin_lock_irqsave(&bcm->irq_lock, flags);
> >>   		bcm43xx_mac_suspend(bcm);
> >>   		if (bcm43xx_using_pio(bcm))
> >> @@ -3270,7 +3268,6 @@ static void bcm43xx_periodic_work_handle
> >>   		if (bcm43xx_using_pio(bcm))
> >>   			bcm43xx_pio_thaw_txqueues(bcm);
> >>   		bcm43xx_mac_enable(bcm);
> >> -		netif_wake_queue(bcm->net_dev);
> >>   	}
> >>   	mmiowb();
> >>   	spin_unlock_irqrestore(&bcm->irq_lock, flags);
> > 
> > The real question is: Why does this patch help?
> > 
> > Let's explain it. We don't stop networking just for fun there.
> > While executing long preemptible periodic work, we must ensure
> > that the TX path into the driver is not entered. It's the same
> > reason why we disable IRQs in the first place. We can't take the
> > mutex in the TX path and the IRQ handler. (That are the only places
> > where we can't take the mutex).
> > Short: We must stop netif here.
> > The question is: Why does stopping netif queue cause a watchdog
> > trigger here? The maximum time it can take for the periodic
> > work inside of the critical section is about 0.2sec. So the queue
> > is stopped for about 0.2sec max. Why does the watchdog trigger?
> > Any idea from some networking guru?
> > Could synchronize_net() take over 5sec in some worst case? Why?
> > Questions over questions :D
> 
> This may be a stupid question, but does the synchronize_net call belong?
> 
> The reason I ask is because the code for synchronize_net is

synchronize_net() ensures that all currently running TX handlers
complete before returning from synchronize_net(). That's what I have
been told.
So what we do is: Disable TX queue and wait for any running TX queue
to finish. We must do this to make sure no TX handler can run after
the sync. I previously explained the exact reasons.

> When I look through the mutex_lock code, I don't find any rcu code. What did I miss?

This is bot about synchronoizing bcm43xx, but the net layer.

> BTW, I still  
> got NETDEV tx timeouts with a 30 second timeout.

So, well. I don't think synchronize_net can take 30 seconds.
That would be a big bug in the net layer.

> I'm currently testing with the netif_stop_queue and netif_wake_queue statements restored, but 
> without the synchronize_net. It has run for over 11 hours with the accelerated testing, which would 
> correspond to almost 4 weeks at regular rates.

Well, we can brute-force it to death, or we can ask someone who actually has
a clue what is going on. Some networking guru, please help. :)

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found]           ` <450171E4.8070901-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
@ 2006-09-11 10:52             ` Erik Mouw
       [not found]               ` <20060911105202.GA7903-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Erik Mouw @ 2006-09-11 10:52 UTC (permalink / raw)
  To: Larry Finger
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Stefano Brivio, John Linville,
	Michael Buesch, Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w

On Fri, Sep 08, 2006 at 08:36:36AM -0500, Larry Finger wrote:
> Erik Mouw wrote:
> > Thanks for the information, pulled wireless-2.6 and recompiling kernel.
> > If this really fixes the problem, can we try to get it merged before
> > 2.6.18 closes? I don't know if vanilla 2.6.18-rc6 locks up on other
> > hardware as well, but if it does it would be a major regression against
> > 2.6.17.
> 
> I'm trying. At the moment, I'm testing a patch against 2.6.18-rc6. Once it works here, I'll be 
> putting it out for testing.

I suppose that's the patch in "[PATCH] fix for system lockups in
2.6.18-rcX caused by bcm43xx"?

FWIW: 2.6.18-rc6+wireless-2.6 runs well, my laptop didn't lock up over
the weekend.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RFC/T: Possible fix for bcm43xx periodic work bug
       [not found]               ` <20060911105202.GA7903-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
@ 2006-09-11 14:37                 ` Larry Finger
  0 siblings, 0 replies; 10+ messages in thread
From: Larry Finger @ 2006-09-11 14:37 UTC (permalink / raw)
  To: Erik Mouw
  Cc: Bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, Stefano Brivio, Michael Buesch,
	John Linville

Erik Mouw wrote:
> On Fri, Sep 08, 2006 at 08:36:36AM -0500, Larry Finger wrote:
>> Erik Mouw wrote:
>>> Thanks for the information, pulled wireless-2.6 and recompiling kernel.
>>> If this really fixes the problem, can we try to get it merged before
>>> 2.6.18 closes? I don't know if vanilla 2.6.18-rc6 locks up on other
>>> hardware as well, but if it does it would be a major regression against
>>> 2.6.17.
>> I'm trying. At the moment, I'm testing a patch against 2.6.18-rc6. Once it works here, I'll be 
>> putting it out for testing.
> 
> I suppose that's the patch in "[PATCH] fix for system lockups in
> 2.6.18-rcX caused by bcm43xx"?

Yes.
> 
> FWIW: 2.6.18-rc6+wireless-2.6 runs well, my laptop didn't lock up over
> the weekend.

It seems to hurt some hardware much worse than others. My BCM4306 Rev 2 could
not run without failure longer than ~6 hours, and sometimes as short as 10 minutes,
before the patch. I've now completed the equivalent of 3 weeks without a failure
(I'm running the preemptible periodic work every second instead of every minute).

Larry

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-09-11 14:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-07 18:17 RFC/T: Possible fix for bcm43xx periodic work bug Larry Finger
     [not found] ` <45006221.4090603-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-07 20:39   ` Michael Buesch
     [not found]     ` <200609072239.20074.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
2006-09-07 22:08       ` Larry Finger
     [not found]     ` <45016F63.9000200@lwfinger.net>
     [not found]       ` <45016F63.9000200-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-08 20:42         ` Michael Buesch
2006-09-08  9:42 ` Erik Mouw
2006-09-08  9:45   ` Michael Buesch
2006-09-08 13:23     ` Erik Mouw
     [not found]       ` <20060908132317.GS26916-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
2006-09-08 13:36         ` Larry Finger
     [not found]           ` <450171E4.8070901-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2006-09-11 10:52             ` Erik Mouw
     [not found]               ` <20060911105202.GA7903-7jKlSzr1t1+rFW3l5+5NieqUGfbH9hYC@public.gmane.org>
2006-09-11 14:37                 ` Larry Finger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.