* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
@ 2011-05-19 10:23 Camilo Mesias
  2011-05-19 14:20 ` Peter Stuge
  2011-05-19 15:21 ` Ben Greear
  0 siblings, 2 replies; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 10:23 UTC (permalink / raw)
  To: ath9k-devel

Hi,

I'm running Fedora 15 and found it unstable compared to F14.
Specifically it would randomly lock up (needing power cycle) with no
message of relevance in the logs. Also it wouldn't suspend, reboot or
hibernate, and the rf-kill switch action would also lock up the
machine. After some difficulties I blacklisted ath9k and the machine
worked perfectly.

After searching I found a message 'ath9k causes lockups since kernel
2.6.35' dated Feb 25 2011 on this list. I take it this is the same
problem as we have the same symptoms and hardware - HP mini 311c with
Atheros AR9285 PCI-Express (rev 01).

Please help to diagnose and fix the problem, I would be interested in
trying development drivers and upping the debug levels as necessary.

-Cam

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 10:23 [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38 Camilo Mesias
@ 2011-05-19 14:20 ` Peter Stuge
  2011-05-19 15:21 ` Ben Greear
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Stuge @ 2011-05-19 14:20 UTC (permalink / raw)
  To: ath9k-devel

Hi Cam,

Camilo Mesias wrote:
> Please help to diagnose and fix the problem, I would be interested in
> trying development drivers and upping the debug levels as necessary.

I'm afraid the bug ping might not have much effect. The Atheros
developers seem to focus on issues they are able to reproduce, which
is of course difficult when some problem depends on external factors,
as is often the case. But hopefully. You could try wireless-testing
and you could look on linux-wireless for patches. There are nearly no
patches on this list. (I don't understand that, but oh well.)


//Peter

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 10:23 [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38 Camilo Mesias
  2011-05-19 14:20 ` Peter Stuge
@ 2011-05-19 15:21 ` Ben Greear
  2011-05-19 17:15   ` Camilo Mesias
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Greear @ 2011-05-19 15:21 UTC (permalink / raw)
  To: ath9k-devel

On 05/19/2011 03:23 AM, Camilo Mesias wrote:
> Hi,
>
> I'm running Fedora 15 and found it unstable compared to F14.
> Specifically it would randomly lock up (needing power cycle) with no
> message of relevance in the logs. Also it wouldn't suspend, reboot or
> hibernate, and the rf-kill switch action would also lock up the
> machine. After some difficulties I blacklisted ath9k and the machine
> worked perfectly.
>
> After searching I found a message 'ath9k causes lockups since kernel
> 2.6.35' dated Feb 25 2011 on this list. I take it this is the same
> problem as we have the same symptoms and hardware - HP mini 311c with
> Atheros AR9285 PCI-Express (rev 01).

Try yanking your NIC and see if the problems persist?

Ath9k has its issues, but it's surely not the only thing that
can lock up a system, and it usually spews all sorts of warnings/errors
before it does lock up a system.

And recently it's been much more stable, so try a 2.6.39 kernel perhaps?

Thanks,
Ben

>
> Please help to diagnose and fix the problem, I would be interested in
> trying development drivers and upping the debug levels as necessary.
>
> -Cam


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 15:21 ` Ben Greear
@ 2011-05-19 17:15   ` Camilo Mesias
  2011-05-19 17:22     ` Ben Greear
  0 siblings, 1 reply; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 17:15 UTC (permalink / raw)
  To: ath9k-devel

Hi,

On Thu, May 19, 2011 at 4:21 PM, Ben Greear <greearb@candelatech.com> wrote:
> Try yanking your NIC and see if the problems persist?

Just to be clear, do you mean to remove the mini-pci express card from
the netbook and run without it? I understand the driver wouldn't get
loaded in that case anyway, so it would be a hardware equivalent to
blacklisting the ath9k module (which completely cures the problem)

> Ath9k has its issues, but it's surely not the only thing that
> can lock up a system, and it usually spews all sorts of warnings/errors
> before it does lock up a system.

Not in this case it seems. Is there a way to increase the debug level?

> And recently it's been much more stable, so try a 2.6.39 kernel perhaps?

I'll have a look in koji and see if I can grab any newer kernels.

Who understands the suspend process and what happens to the driver's
threads and resources? Is it documented anywhere?

-Cam

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 17:15   ` Camilo Mesias
@ 2011-05-19 17:22     ` Ben Greear
  2011-05-19 18:22       ` Camilo Mesias
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2011-05-19 17:22 UTC (permalink / raw)
  To: ath9k-devel

On 05/19/2011 10:15 AM, Camilo Mesias wrote:
> Hi,
>
> On Thu, May 19, 2011 at 4:21 PM, Ben Greear <greearb@candelatech.com> wrote:
>> Try yanking your NIC and see if the problems persist?
>
> Just to be clear, do you mean to remove the mini-pci express card from
> the netbook and run without it? I understand the driver wouldn't get
> loaded in that case anyway, so it would be a hardware equivalent to
> blacklisting the ath9k module (which completely cures the problem)

Ahh, I didn't realize you had blacklisted the module.  That does seem to
implicate ath9k.

>
>> Ath9k has its issues, but it's surely not the only thing that
>> can lock up a system, and it usually spews all sorts of warnings/errors
>> before it does lock up a system.
>
> Not in this case it seems. Is there a way to increase the debug level?

Have you tried enabling sysrq and seeing if sysrq can print any CPU backtraces?

http://www.mjmwired.net/kernel/Documentation/sysrq.txt
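
Per sysrq.txt, the value in /proc/sys/kernel/sysrq is 0 (off), 1 (all
functions on) or a bitmask, and the all-CPU backtrace (Alt-SysRq-l)
should fall under the "debugging dumps" bit (8). A minimal sketch of
that check in Python, assuming the documented bit values; the function
name is made up for illustration:

```python
# Decide whether a given kernel.sysrq value permits debugging dumps
# (task lists, CPU backtraces). Bit value 8 is taken from sysrq.txt.
SYSRQ_ENABLE_DUMP = 8


def sysrq_allows_dumps(value: int) -> bool:
    """Return True if this sysrq setting lets dump requests through."""
    if value == 1:          # 1 means "enable all functions"
        return True
    return bool(value & SYSRQ_ENABLE_DUMP)


# Example values: 0 = disabled, 16 = sync only, 8 = dumps only.
for setting in (0, 1, 8, 16):
    print(setting, sysrq_allows_dumps(setting))
```

On a live system the number would come from reading
/proc/sys/kernel/sysrq; writing 1 to that file (as root) enables
everything.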

If you can re-compile your kernel, make sure you have the
LOCKUP_DETECTOR enabled, and perhaps PROVE_LOCKING and other
mutex debugging.
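
A quick way to see which of these are already set in a given kernel
config is to scan the config text. The sketch below is only an
illustration: the helper names are hypothetical, and
CONFIG_DEBUG_MUTEXES is an assumed stand-in for "other mutex
debugging". On Fedora the config would usually be under /boot, which
is also an assumption here.

```python
import re
from typing import Dict, Iterable, List


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to their values; '# ... is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(text: str, wanted: Iterable[str]) -> List[str]:
    """Return the wanted options that are not built in (=y)."""
    options = parse_kconfig(text)
    return [name for name in wanted if options.get(name) != "y"]


WANTED = ["CONFIG_LOCKUP_DETECTOR", "CONFIG_PROVE_LOCKING", "CONFIG_DEBUG_MUTEXES"]

# Made-up sample config fragment, not taken from any real kernel build.
sample = """\
CONFIG_LOCKUP_DETECTOR=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_DEBUG_MUTEXES=y
"""
print(missing_options(sample, WANTED))
```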

>
>> And recently it's been much more stable, so try a 2.6.39 kernel perhaps?
>
> I'll have a look in koji and see if I can grab any newer kernels.
>
> Who understands the suspend process and what happens to the driver's
> threads and resources? Is it documented anywhere?

I may have missed your initial report.  Does this happen only
when you suspend?  Can you give as many details as possible about
how you can cause this bug to happen (and not happen)?

Thanks,
Ben

>
> -Cam


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 17:22     ` Ben Greear
@ 2011-05-19 18:22       ` Camilo Mesias
  2011-05-19 19:21         ` Camilo Mesias
  0 siblings, 1 reply; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 18:22 UTC (permalink / raw)
  To: ath9k-devel

Hi,

thanks for the suggestions,

On Thu, May 19, 2011 at 6:22 PM, Ben Greear <greearb@candelatech.com> wrote:
>
> Have you tried enabling sysrq and seeing if sysrq can print any CPU
> backtraces?

I enabled it and took a backtrace (to prove it was working), then tried
to suspend. The machine hung hard and wouldn't respond to sysrq :(

> If you can re-compile your kernel, make sure you have the
> LOCKUP_DETECTOR enabled, and perhaps PROVE_LOCKING and other
> mutex debugging.

OK I think I'll try random newer kernels from koji then try rebuilding.

> I may have missed your initial report.  Does this happen only
> when you suspend?  Can you give as many details as possible about
> how you can cause this bug to happen (and not happen)?

There is some info at
https://bugzilla.redhat.com/show_bug.cgi?id=697157
Basically, the system sometimes locks up all on its own, and will also
lock up with the same symptoms if I try to suspend, log out, reboot,
etc.

-Cam

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 18:22       ` Camilo Mesias
@ 2011-05-19 19:21         ` Camilo Mesias
  2011-05-19 19:27           ` Ben Greear
  0 siblings, 1 reply; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 19:21 UTC (permalink / raw)
  To: ath9k-devel

Hi,

I tried a few more things. Another reliable way to lock up the
system is to run rmmod ath9k.
Surely that should be straightforward to debug?

I tried a 2.6.39 kernel and the behaviour was essentially the same.

I added ath9k options debug=0xffffffff to an ath9k.conf file in
modprobe.d but didn't get any extra debug that I could see. Maybe
that's the wrong way to do it?
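
For reference, modprobe.d(5) puts the keyword first, i.e.
"options <module> <param>=<value>", so a line that starts with the
module name would most likely be ignored. Whether the mask then
produces any output presumably also depends on the driver having been
built with its debug support, which is not verified here. A small
sketch that builds such a line; the helper name is hypothetical:

```python
import re


def options_line(module: str, **params: str) -> str:
    """Build a modprobe.d 'options' line for one module."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", module):
        raise ValueError(f"unexpected module name: {module!r}")
    if not params:
        raise ValueError("at least one parameter is required")
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


# The debug mask is the one quoted in this message.
line = options_line("ath9k", debug="0xffffffff")
print(line)  # options ath9k debug=0xffffffff
```

The resulting line would go into a file such as ath9k.conf under
modprobe.d and take effect the next time the module is loaded.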

-Cam

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 19:21         ` Camilo Mesias
@ 2011-05-19 19:27           ` Ben Greear
  2011-05-19 19:58             ` Camilo Mesias
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2011-05-19 19:27 UTC (permalink / raw)
  To: ath9k-devel

On 05/19/2011 12:21 PM, Camilo Mesias wrote:
> Hi,
>
> I tried a few more things, I found another reliable way to lock up the
> system is to try rmmod ath9k
> Surely that should be straightforward to debug?
>
> I tried a 2.6.39 kernel and the behaviour was essentially the same.
>
> I added ath9k options debug=0xffffffff to an ath9k.conf file in
> modprobe.d but didn't get any extra debug that I could see. Maybe
> that's the wrong way to do it?

We do rmmod a fair bit here, and haven't seen problems lately.

Do you happen to have another system you could try that NIC in?

Ben

>
> -Cam


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 19:27           ` Ben Greear
@ 2011-05-19 19:58             ` Camilo Mesias
  2011-05-19 20:34               ` Ben Greear
  0 siblings, 1 reply; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 19:58 UTC (permalink / raw)
  To: ath9k-devel

Hmm possibly, although I think the other systems might not be pci-e. I don't
think it's hw failure because the same system works fine under F14 (I have
two hard disks and can swap). Also another guy with the same netbook
reported the same problem - my first message on this list mentions the topic
and date.

-Cam

On 19 May 2011 20:28, "Ben Greear" <greearb@candelatech.com> wrote:
> On 05/19/2011 12:21 PM, Camilo Mesias wrote:
>> Hi,
>>
>> I tried a few more things, I found another ...
>
> We do rmmod a fair bit here, and haven't seen problems lately.
>
> Do you happen to have another system you could try that NIC in?
>
> Ben
>
>> -Cam
>
> -- 
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 19:58             ` Camilo Mesias
@ 2011-05-19 20:34               ` Ben Greear
  2011-05-19 22:35                 ` Camilo Mesias
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2011-05-19 20:34 UTC (permalink / raw)
  To: ath9k-devel

On 05/19/2011 12:58 PM, Camilo Mesias wrote:
> Hmm possibly, although I think the other systems might not be pci-e. I
> don't think it's hw failure because the same system works fine under F14
> (I have two hard disks and can swap). Also another guy with the same
> netbook reported the same problem - my first message on this list
> mentions the topic and date.

Well, you could try doing a git bisect between the working kernel
and 2.6.38.

It would still be good to know if the same problem happened
in another type of machine.  If so, any of us would have a chance
to reproduce it...but if not, then it's something special about
the motherboard/bus/bios/whatever.
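
The idea behind a bisect is a binary search over the history between
a known-good and a known-bad revision, so the number of kernels to
build and test grows only with the logarithm of the number of
commits. The Python sketch below simulates that search on a made-up
list of revisions; it is not git itself, and it assumes that every
revision after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad revision.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):   # e.g. "does suspend hang on this build?"
            bad = mid
        else:
            good = mid
    return revisions[bad]


# Hypothetical history of eight revisions; the lockup appears at "r5".
history = [f"r{i}" for i in range(8)]
culprit = first_bad(history, lambda rev: int(rev[1:]) >= 5)
print(culprit)  # r5
```

In the real workflow each is_bad() call corresponds to building and
booting one kernel and marking it with "git bisect good" or
"git bisect bad".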

Thanks,
Ben

>
> -Cam
>
>> On 19 May 2011 20:28, "Ben Greear" <greearb@candelatech.com> wrote:
>>
>> On 05/19/2011 12:21 PM, Camilo Mesias wrote:
>> >
>> > Hi,
>> >
>> > I tried a few more things, I found another ...
>>
>> We do rmmod a fair bit here, and haven't seen problems lately.
>>
>> Do you happen to have another system you could try that NIC in?
>>
>> Ben
>>
>>
>>     -Cam
>>
>>
>>
>> --
>>
>> Ben Greear <greearb@candelatech.com>
>>
>> Candela Technologies Inc http://www.candelatech.com
>>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* [ath9k-devel] ath9k implicated in lockups, kernel 2.6.38
  2011-05-19 20:34               ` Ben Greear
@ 2011-05-19 22:35                 ` Camilo Mesias
  0 siblings, 0 replies; 11+ messages in thread
From: Camilo Mesias @ 2011-05-19 22:35 UTC (permalink / raw)
  To: ath9k-devel

I'll spend some time on it this weekend; there are a few machines that
might be viable for swapping. The git bisect thing sounds interesting
too (I haven't used git before but I'm interested).

I'm trying to think whether there is anything special about this
machine. Wireless-wise, the rfkill switch affects Bluetooth and wifi
simultaneously; it is a single button with a built-in bi-colour LED. I
wonder if that could cause races or locking issues... I might try
blacklisting bluetooth and see what happens.

Thanks for all the input so far.

-Cam
