linux-wireless.vger.kernel.org archive mirror
* [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
@ 2010-08-18 22:01 Rafael J. Wysocki
  2010-08-19  1:00 ` Luis R. Rodriguez
  2010-08-19  8:11 ` Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-18 22:01 UTC (permalink / raw)
  To: Luis Rodriguez, Tejun Heo
  Cc: LKML, Linux-pm mailing list, linux-wireless, ath9k-devel, Maciej Rutecki

Hi,

While testing 2.6.36-rc1 (with a couple of fixes on top) I noticed that the ath9k
driver didn't work after resume from suspend to RAM.  An attempt to unload the
driver using rmmod caused the BUG_ON() in kernel/workqueue.c:2844 to trigger.

I wonder if that regression is a result of the recent workqueue changes?

Thanks,
Rafael


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-18 22:01 [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One Rafael J. Wysocki
@ 2010-08-19  1:00 ` Luis R. Rodriguez
  2010-08-19 13:55   ` Rafael J. Wysocki
  2010-08-19  8:11 ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-08-19  1:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tejun Heo, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

On Wed, Aug 18, 2010 at 3:01 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Hi,
>
> While testing 2.6.36-rc1 (with a couple of fixes on top)

Which couple of fixes?

> I noticed that the ath9k
> driver didn't work after resume from suspend to RAM.

To rule out whether it's an ath9k issue, you can try
compat-wireless-2.6.36-rc1 from here:

http://wireless.kernel.org/en/users/Download/stable/

and install it on an older kernel. You can use ./scripts/driver-select
to enable only ath9k for compilation.

I've been using pm-suspend on this release for a few days now without
any issue but I am using an AR9003 chipset. What chipset are you
using? Can you provide the dmesg output upon module load?

>  An attempt to unload the
> driver using rmmod caused the BUG_ON() in kernel/workqueue.c:2844 to trigger.

That's a bug, a regression likely.

> I wonder if that regression is a result of the recent workqueue changes?

Yeah very likely.

  Luis


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-18 22:01 [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One Rafael J. Wysocki
  2010-08-19  1:00 ` Luis R. Rodriguez
@ 2010-08-19  8:11 ` Tejun Heo
  2010-08-19 14:05   ` Rafael J. Wysocki
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-08-19  8:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

Hello, Rafael.

On 08/19/2010 12:01 AM, Rafael J. Wysocki wrote:
> While testing 2.6.36-rc1 (with a couple of fixes on top) I noticed
> that the ath9k driver didn't work after resume from suspend to RAM.
> An attempt to unload the driver using rmmod caused the BUG_ON() in
> kernel/workqueue.c:2844 to trigger.

That BUG_ON() triggers if destroy_workqueue() is called while work
items are still pending on the workqueue.  Can you please trigger
stack traces after resume and post it?

Thanks.

-- 
tejun
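
The condition described above can be sketched with a small userspace
toy model. This is not the kernel's workqueue code: every toy_* name is
hypothetical, and the self-re-arming handler is only one assumed way for
a work item to still be pending when the queue is torn down.

```c
/* Userspace toy model, not kernel code. Every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORKS 8

struct toy_wq;

struct toy_work {
	void (*fn)(struct toy_work *w);
	struct toy_wq *wq;	/* queue the item was last queued on */
	bool rearm;		/* true: the handler queues itself again */
	bool pending;
};

struct toy_wq {
	struct toy_work *items[TOY_MAX_WORKS];
	size_t nr_pending;
};

/* Queue a work item; refuses an item that is already pending. */
static bool toy_queue_work(struct toy_wq *wq, struct toy_work *w)
{
	assert(wq != NULL && w != NULL);
	if (w->pending || wq->nr_pending == TOY_MAX_WORKS)
		return false;
	w->pending = true;
	w->wq = wq;
	wq->items[wq->nr_pending++] = w;
	return true;
}

/* Run only the items that were queued when the flush started. */
static void toy_flush(struct toy_wq *wq)
{
	struct toy_work *batch[TOY_MAX_WORKS];
	size_t n = wq->nr_pending;

	for (size_t i = 0; i < n; i++)
		batch[i] = wq->items[i];
	wq->nr_pending = 0;

	for (size_t i = 0; i < n; i++) {
		batch[i]->pending = false;
		if (batch[i]->fn)
			batch[i]->fn(batch[i]);	/* may re-queue itself */
	}
}

/* Handler that re-arms itself until told to stop. */
static void toy_rearming_fn(struct toy_work *w)
{
	if (w->rearm)
		toy_queue_work(w->wq, w);
}

/* Stop re-arming and drop the item if it is still queued. */
static void toy_cancel(struct toy_wq *wq, struct toy_work *w)
{
	w->rearm = false;
	for (size_t i = 0; i < wq->nr_pending; i++) {
		if (wq->items[i] == w) {
			wq->items[i] = wq->items[--wq->nr_pending];
			w->pending = false;
			return;
		}
	}
}

/* Returns 0 on a clean teardown, -1 where the kernel would hit BUG_ON(). */
static int toy_destroy(struct toy_wq *wq)
{
	toy_flush(wq);
	return wq->nr_pending == 0 ? 0 : -1;
}
```

In this model, toy_destroy() fails while the handler keeps re-arming
itself, and succeeds once toy_cancel() has been called first. That is
the teardown order a driver is expected to follow before destroying its
workqueue.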


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19  1:00 ` Luis R. Rodriguez
@ 2010-08-19 13:55   ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-19 13:55 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

On Thursday, August 19, 2010, Luis R. Rodriguez wrote:
> On Wed, Aug 18, 2010 at 3:01 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Hi,
> >
> > While testing 2.6.36-rc1 (with a couple of fixes on top)
> 
> Which couple of fixes?

AMD boot fix, HID suspend fix and shmem fix (two of them have already been
merged).

> > I noticed that the ath9k
> > driver didn't work after resume from suspend to RAM.
> 
> To rule out whether it's an ath9k issue, you can try
> compat-wireless-2.6.36-rc1 from here:
> 
> http://wireless.kernel.org/en/users/Download/stable/
> 
> and install it on an older kernel. You can use ./scripts/driver-select
> to enable only ath9k for compilation.

Well, that sounds a bit complicated and even if I know it's not ath9k,
that's not going to help me find the real source of the problem.

> I've been using pm-suspend on this release for a few days now without
> any issue but I am using an AR9003 chipset. What chipset are you
> using? Can you provide the dmesg output upon module load?

Sure.

[    9.680128] ath9k 0000:09:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    9.689487] ath9k 0000:09:00.0: setting latency timer to 64
[    9.706383] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    9.805043] hda_codec: ALC272X: BIOS auto-probing.
[    9.819389] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:14.2/input/input8
[   10.155052] ath: EEPROM regdomain: 0x65
[   10.155058] ath: EEPROM indicates we should expect a direct regpair map
[   10.155066] ath: Country alpha2 being used: 00
[   10.155070] ath: Regpair used: 0x65
[   10.179762] phy0: Selected rate control algorithm 'ath9k_rate_control'
[   10.181657] Registered led device: ath9k-phy0::radio
[   10.181890] Registered led device: ath9k-phy0::assoc
[   10.182139] Registered led device: ath9k-phy0::tx
[   10.182339] Registered led device: ath9k-phy0::rx

lspci says it's:

09:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

Thanks,
Rafael


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19  8:11 ` Tejun Heo
@ 2010-08-19 14:05   ` Rafael J. Wysocki
  2010-08-19 14:19     ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-19 14:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

On Thursday, August 19, 2010, Tejun Heo wrote:
> Hello, Rafael.
> 
> On 08/19/2010 12:01 AM, Rafael J. Wysocki wrote:
> > While testing 2.6.36-rc1 (with a couple of fixes on top) I noticed
> > that the ath9k driver didn't work after resume from suspend to RAM.
> > An attempt to unload the driver using rmmod caused the BUG_ON() in
> > kernel/workqueue.c:2844 to trigger.
> 
> That BUG_ON() triggers if destroy_workqueue() is called while work
> items are still pending on the workqueue.  Can you please trigger
> stack traces after resume and post it?

Do you mean sysrq-t?

Rafael


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 14:05   ` Rafael J. Wysocki
@ 2010-08-19 14:19     ` Tejun Heo
  2010-08-19 14:23       ` Tejun Heo
  2010-08-19 20:31       ` Rafael J. Wysocki
  0 siblings, 2 replies; 11+ messages in thread
From: Tejun Heo @ 2010-08-19 14:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

Hello,

On 08/19/2010 04:05 PM, Rafael J. Wysocki wrote:
> On Thursday, August 19, 2010, Tejun Heo wrote:
>> Hello, Rafael.
>>
>> On 08/19/2010 12:01 AM, Rafael J. Wysocki wrote:
>>> While testing 2.6.36-rc1 (with a couple of fixes on top) I noticed
>>> that the ath9k driver didn't work after resume from suspend to RAM.
>>> An attempt to unload the driver using rmmod caused the BUG_ON() in
>>> kernel/workqueue.c:2844 to trigger.
>>
>> That BUG_ON() triggers if destroy_workqueue() is called while work
>> items are still pending on the workqueue.  Can you please trigger
>> stack traces after resume and post it?
> 
> Do you mean sysrq-t?

Yeah, I'm a bit confused regarding what's going on.  I thought the
most likely cause is thawing failing to kick a frozen workqueue into
working state but then flush_workqueue() which is called from
destroy_workqueue() should have hung too, that is, unless
flush_workqueue() is broken too.  If flush_workqueue() is not broken,
then it could be that workqueue itself isn't at fault and works are
being scheduled and executed fine for the workqueue ath9k is using but
the driver doesn't work for another reason.

Also, the BUG_ON() being triggered means either flush_workqueue() is
broken or the driver is failing to stop works on the workqueue from
being requeued before calling destroy_workqueue().  So, finding out
the following would be great:

* While the driver isn't working, do a sysrq-t and see whether any
  worker is executing a work for ath9k.

* Repeat it several times and see whether the work is stuck or making
  progress and/or executing on different workers.

Thanks.

-- 
tejun
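
The sysrq-t steps requested above can also be triggered from a program
instead of the keyboard. The sketch below assumes a Linux system where
/proc/sysrq-trigger is available and the caller is root.
toy_write_sysrq() is a hypothetical helper name, not an existing API.

```c
/* Sketch only: toy_write_sysrq() is a hypothetical helper name. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write one sysrq command character to the trigger file at `path`.
 * On a real system the path is /proc/sysrq-trigger, and writing 't'
 * asks the kernel to dump task states to the kernel log.
 * Returns 0 on success or a negative errno value.
 */
static int toy_write_sysrq(const char *path, char key)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, &key, 1);
	int err = (n == 1) ? 0 : (n < 0 ? -errno : -EIO);

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

Calling toy_write_sysrq("/proc/sysrq-trigger", 't') several times, a
few seconds apart, and comparing the ath9k-related entries in dmesg
would show whether a work item is stuck or making progress.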


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 14:19     ` Tejun Heo
@ 2010-08-19 14:23       ` Tejun Heo
  2010-08-19 20:17         ` Rafael J. Wysocki
  2010-08-19 20:31       ` Rafael J. Wysocki
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-08-19 14:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

Oh, can you also please attach log of the BUG()?

Thanks.

-- 
tejun


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 14:23       ` Tejun Heo
@ 2010-08-19 20:17         ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-19 20:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

On Thursday, August 19, 2010, Tejun Heo wrote:
> Oh, can you also please attach log of the BUG()?

That's difficult, because I have no way to collect it after it's happened.

I can try to convert it to WARN_ON or rewrite the call trace by hand.

Or I may try to make a photo. :-)

Thanks,
Rafael
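
Why converting the check to WARN_ON would help can be shown with a
userspace illustration. In the kernel, WARN_ON() logs a trace, returns
the tested condition and lets execution continue, so the log can still
be saved afterwards. The TOY_WARN_ON macro and toy_* functions below are
hypothetical stand-ins, not the kernel macros.

```c
/* Userspace illustration, not the kernel macros. TOY_ and toy_ names are hypothetical. */
#include <stdbool.h>
#include <stdio.h>

static unsigned int toy_warn_count;

/* Log the failed condition, keep running, and hand the condition back. */
static bool toy_warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		toy_warn_count++;
		fprintf(stderr, "WARNING: at %s:%d: %s\n", file, line, expr);
	}
	return cond;
}

#define TOY_WARN_ON(cond) toy_warn_on((cond), #cond, __FILE__, __LINE__)

/* Teardown check that reports leftover work instead of halting. */
static int toy_check_idle(int nr_active)
{
	if (TOY_WARN_ON(nr_active != 0))
		return -1;	/* the caller can still unwind and save the log */
	return 0;
}
```

A check written this way reports the problem once per call and leaves
the system usable enough to copy the warning out of the log.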


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 14:19     ` Tejun Heo
  2010-08-19 14:23       ` Tejun Heo
@ 2010-08-19 20:31       ` Rafael J. Wysocki
  2010-08-19 20:42         ` Luis R. Rodriguez
  1 sibling, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-19 20:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis Rodriguez, LKML, Linux-pm mailing list, linux-wireless,
	ath9k-devel, Maciej Rutecki

On Thursday, August 19, 2010, Tejun Heo wrote:
> Hello,
> 
> On 08/19/2010 04:05 PM, Rafael J. Wysocki wrote:
> > On Thursday, August 19, 2010, Tejun Heo wrote:
> >> Hello, Rafael.
> >>
> >> On 08/19/2010 12:01 AM, Rafael J. Wysocki wrote:
> >>> While testing 2.6.36-rc1 (with a couple of fixes on top) I noticed
> >>> that the ath9k driver didn't work after resume from suspend to RAM.
> >>> An attempt to unload the driver using rmmod caused the BUG_ON() in
> >>> kernel/workqueue.c:2844 to trigger.
> >>
> >> That BUG_ON() triggers if destroy_workqueue() is called while work
> >> items are still pending on the workqueue.  Can you please trigger
> >> stack traces after resume and post it?
> > 
> > Do you mean sysrq-t?
> 
> Yeah, I'm a bit confused regarding what's going on.  I thought the
> most likely cause is thawing failing to kick a frozen workqueue into
> working state but then flush_workqueue() which is called from
> destroy_workqueue() should have hung too, that is, unless
> flush_workqueue() is broken too.  If flush_workqueue() is not broken,
> then it could be that workqueue itself isn't at fault and works are
> being scheduled and executed fine for the workqueue ath9k is using but
> the driver doesn't work for another reason.
> 
> Also, the BUG_ON() being triggered means either flush_workqueue() is
> broken or the driver is failing to stop works on the workqueue from
> being requeued before calling destroy_workqueue().  So, finding out
> the following would be great:
> 
> * While the driver isn't working, do a sysrq-t and see whether any
>   worker is executing a work for ath9k.
> 
> * Repeat it several times and see whether the work is stuck or making
>   progress and/or executing on different workers.

Actually, I'm unable to reproduce the resume issue with current mainline
(HEAD = 763008c4357b73c8d18396dfd8d79dc58fa3f99d), so I guess it either is
a race (or another timing issue), or it's been fixed by one of the patches on
top of -rc1.

I'll let you know if I see it again.

Thanks,
Rafael


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 20:31       ` Rafael J. Wysocki
@ 2010-08-19 20:42         ` Luis R. Rodriguez
  2010-08-19 21:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2010-08-19 20:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tejun Heo, Luis Rodriguez, LKML, Linux-pm mailing list,
	linux-wireless, ath9k-devel, Maciej Rutecki

On Thu, Aug 19, 2010 at 01:31:01PM -0700, Rafael J. Wysocki wrote:
> Actually, I'm unable to reproduce the resume issue with current mainline
> (HEAD = 763008c4357b73c8d18396dfd8d79dc58fa3f99d), so I guess it either is
> a race (or another timing issue), or it's been fixed by one of the patches on
> top of -rc1.
> 
> I'll let you know if I see it again.

To be clear, this is a non-issue now until further notice, ACK?

  Luis


* Re: [Regression, 2.6.36-rc1] ath9k resume problem on Acer Ferrari One
  2010-08-19 20:42         ` Luis R. Rodriguez
@ 2010-08-19 21:07           ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-08-19 21:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, Luis Rodriguez, LKML, Linux-pm mailing list,
	linux-wireless, ath9k-devel, Maciej Rutecki

On Thursday, August 19, 2010, Luis R. Rodriguez wrote:
> On Thu, Aug 19, 2010 at 01:31:01PM -0700, Rafael J. Wysocki wrote:
> > Actually, I'm unable to reproduce the resume issue with current mainline
> > (HEAD = 763008c4357b73c8d18396dfd8d79dc58fa3f99d), so I guess it either is
> > a race (or another timing issue), or it's been fixed by one of the patches on
> > top of -rc1.
> > 
> > I'll let you know if I see it again.
> 
> To be clear, this is a non-issue now until further notice, ACK?

Yep.

Rafael

