linux-wireless.vger.kernel.org archive mirror
From: Ben Greear <greearb@candelatech.com>
To: "Luis R. Rodriguez" <mcgrof@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>,
	Tejun Heo <tj@kernel.org>,
	linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211: Fix deadlock in ieee80211_do_stop.
Date: Wed, 08 Dec 2010 10:28:23 -0800
Message-ID: <4CFFCE47.8040305@candelatech.com>
In-Reply-To: <4CFFCC31.1050408@candelatech.com>

On 12/08/2010 10:19 AM, Ben Greear wrote:
> On 12/08/2010 09:36 AM, Ben Greear wrote:
>> On 11/19/2010 02:27 PM, Luis R. Rodriguez wrote:
>>> On Fri, Nov 19, 2010 at 12:55 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>> On 11/19/2010 09:57 AM, Johannes Berg wrote:
>>>>>
>>>>> On Fri, 2010-11-19 at 15:34 +0100, Tejun Heo wrote:
>>>>>
>>>>>> Awesome. :-)
>>>>>>
>>>>>> Ben, if you have trouble generating a full trace, please let me know
>>>>>> if there's something I can buy which isn't too expensive to
>>>>>> reproduce the problem. I would be happy to track it down myself.
>>>>>
>>>>> Maybe you can try Ben's setup in kvm (or directly on your box if you
>>>>> like) with mac80211_hwsim. From a mac80211 POV it should be almost
>>>>> equivalent, although it'll do different memory allocation patterns etc.
>>>>
>>>> I tried manually backing out my patch, and now I can no longer
>>>> reproduce the problem. Maybe something in -rc2 fixed it, or maybe
>>>> some changes to my environment just made it harder to hit.
>>>>
>>>> If you see no logical reason why calling flush_work with RTNL held
>>>> would cause trouble, then I guess we can just leave the code as is
>>>> for now.
>>>>
>>>> If you do want to play with this yourself, I think any ath5k type
>>>> adapter with 64+ virtual stations configured would be a valid test
>>>> case. My
>>>> application calls ifdown/ifup on them a few times after being created
>>>> and then generates traffic (and gathers stats, calls 'iwconfig', etc).
>>>> As configured in the original scenario that reproduced the problem,
>>>> the STAs had no encryption and were all associating with a single AP.
>>>> wpa_supplicant was not being used.
>>>
>>> FWIW, I had to do similar tests before and Ben offered up a perl
>>> script to do something similar to what his proprietary app does upon
>>> device bring up. I've modified it just a bit and you can find it here:
>>>
>>> http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/poo.pl
>>
>> Well, I backed out my work-around patch yesterday, and then let
>> the system run overnight. This morning it is mostly dead, spewing
>> OOM errors and with a bunch of 'sh' processes using maximum amount
>> of CPU, blocked on trying to acquire rtnl.
>>
>> There is one 'ip' process that appears to hold rtnl and is trying
>> to call ieee80211_do_stop, which is probably blocked down in
>> the work-queue logic just like last time. Lots of worker processes
>> attempting to grab rtnl (and many other processes as well.)
>>
>> Lockdep was disabled because an attempt was made to load a proprietary
>> module of mine, but the module doesn't actually load due to a symbol
>> mismatch (it's compiled against a non-debug kernel).
>>
>> If the lockdep info is critical, I can attempt to reproduce with
>> my module completely removed from the file system so it cannot attempt
>> to load, but it seems like last time the 'sysrq t' was of more interest
>> anyway.
>>
>> I have uploaded what I believe is a full 'sysrq t' output, interspersed
>> with OOM warnings that are constantly spewing to the console,
>> here:
>>
>> http://www.candelatech.com/~greearb/minicom_ath9k_log.txt
>
> And here's a log with lockdep enabled:
>
> http://www.candelatech.com/~greearb/minicom_ath9k_log2.txt
>
> The sysrq output starts at line 1346 in this file.
>
> Seems I have a decent environment for reproducing this today,
> in case you have any debug you'd like me to add.

And one more thing: it seems it doesn't always block forever.
The system in that last trace actually recovered after a
minute or two, though it periodically enters the blocked
state again.

I'm going to re-add my hack, but will be happy to remove
it and test more if you guys want to help debug the problem.
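
For anyone following along, the suspected pattern (one task holds rtnl and
waits for queued work, while that queued work itself needs rtnl) can be
sketched in userspace. This is only an analogy in Python, not mac80211 code:
the names `rtnl`, `work_item` and `stop_interface` are made up for
illustration, and the timeout exists only so the demo terminates. It says
nothing about how flush_work itself behaves.

```python
import threading

# Stand-in for the RTNL mutex (hypothetical name, illustration only).
rtnl = threading.Lock()


def work_item(done):
    # The queued work needs the same lock the stopping task may be holding.
    with rtnl:
        done.set()


def stop_interface(hold_lock_while_flushing, timeout=0.2):
    """Return True if the simulated flush finished within the timeout."""
    done = threading.Event()
    rtnl.acquire()
    worker = threading.Thread(target=work_item, args=(done,))
    worker.start()
    if not hold_lock_while_flushing:
        # Drop the lock before waiting, so the work item can make progress.
        rtnl.release()
    # Simulated "flush": wait for the work item, but give up after timeout.
    flushed = done.wait(timeout)
    if hold_lock_while_flushing:
        # Only now can the worker take the lock and finish.
        rtnl.release()
    worker.join()
    return flushed


if __name__ == "__main__":
    print("flush with lock held:   ", stop_interface(True))
    print("flush with lock dropped:", stop_interface(False))
```

With the lock held across the wait, the simulated flush cannot complete
until the waiter gives up; with the lock dropped first, it completes
right away.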

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

