linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211:  Fix deadlock in ieee80211_do_stop.
Date: Sat, 13 Nov 2010 11:34:03 +0100	[thread overview]
Message-ID: <4CDE699B.70401@kernel.org> (raw)
In-Reply-To: <1289596096.3736.13.camel@jlt3.sipsolutions.net>

Hello,

On 11/12/2010 10:08 PM, Johannes Berg wrote:
> On Fri, 2010-11-12 at 12:57 -0800, Ben Greear wrote:
> 
>>> However, I also don't think it should be necessary to do this.
>>> sdata->work is always queued on local->workqueue, which is created using
>>> alloc_ordered_workqueue(), and there is no work on this workqueue that
>>> uses the RTNL. Therefore, even flushing the entire workqueue must work,
>>> unless alloc_ordered_workqueue() has no such guarantee any more -- which
>>> I would consider to be a bug in the new workqueue framework.

alloc_ordered_workqueue() itself doesn't have forward progress
guarantee under memory pressure.  You'll need to add WQ_MEM_RECLAIM
for that, but I don't really think that's the problem here as it isn't
in the memory reclaim path.

>> The problem appears (to me) to be that the flush_work() attempts
>> to wait for the worker to complete it's current task.  The worker can
>> be doing a completely separate task (ie, wireless_nlevent_process),
>> but that task can never complete because do_stop() holds rtnl
>> and the task-in-progress may block on acquiring rtnl.
>>
>> So, flush_work() cannot make any progress.
>>
>> The stack-traces for hung programs I originally posted seem
>> to agree with this analysis.

Can you please attach the whole thing?

>> So far, I reproduced the bug around 20 times in a row witout the patch,
>> and since I added this patch, I have two good runs in a row, so it definitely
>> has an affect.
>>
>> If my assumptions are correct, it would seem to unsafe to EVER
>> call flush_work() while holding rtnl (or indeed, any other lock
>> that any other work could possibly require).
> 
> Well then in that case all I'm saying is that we have a bug in the
> workqueue code, because it used to be allowed to flush a workqueue that
> never had any work items on it that grabbed the RTNL themselves. I
> actually added lockdep detection for the case where you _did_, but since
> I'm sure we don't have anything that grabs the RTNL on our workqueue,
> this should be OK.

Yeah, that would be a pretty big flaw in the workqueue code and we
would be seeing the system collapsing in whole lot more places.  It's
likely a more subtle issue which is livelocking or fork-bombing
workqueue.  New workqueue code maintains compatibility with the old
implementation but sometimes does triggers problems which were masked
due to the statically allocated execution resource before.  So, there
are few cases where the user code needs to be adjusted (e.g. xfs
needed some tweaking).

Anyways, let's look into what's going on.

Thanks.

-- 
tejun

  parent reply	other threads:[~2010-11-13 10:34 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-12 20:07 [PATCH] mac80211: Fix deadlock in ieee80211_do_stop greearb
2010-11-12 20:08 ` Luis R. Rodriguez
2010-11-12 20:16   ` Ben Greear
2010-11-12 20:49 ` Johannes Berg
2010-11-12 20:57   ` Ben Greear
2010-11-12 21:08     ` Johannes Berg
2010-11-12 21:51       ` Ben Greear
2010-11-13 10:34       ` Tejun Heo [this message]
2010-11-15 21:16         ` Ben Greear
2010-11-16 14:19           ` Tejun Heo
2010-11-16 16:51             ` Ben Greear
2010-11-17  8:55               ` Tejun Heo
2010-11-17 17:37                 ` Ben Greear
2010-11-16 17:40             ` Johannes Berg
2010-11-17  8:47               ` Tejun Heo
2010-11-17 18:53                 ` Johannes Berg
2010-11-17 18:59                   ` Ben Greear
2010-11-17 19:03                     ` Johannes Berg
2010-11-18  6:34                   ` Tejun Heo
2010-11-18  7:07                     ` Johannes Berg
2010-11-18  7:22                       ` Tejun Heo
2010-11-18 16:59                         ` Johannes Berg
2010-11-19 14:34                           ` Tejun Heo
2010-11-19 17:57                             ` Johannes Berg
2010-11-19 20:55                               ` Ben Greear
2010-11-19 22:27                                 ` Luis R. Rodriguez
2010-12-08 17:36                                   ` Ben Greear
2010-12-08 18:19                                     ` Ben Greear
2010-12-08 18:28                                       ` Ben Greear
2010-12-09 14:34                                         ` Tejun Heo
2010-12-09 14:42                                           ` Johannes Berg
2010-12-09 14:46                                             ` Tejun Heo
2010-12-09 16:17                                               ` Tejun Heo
     [not found]                                                 ` <4D0156F6.4000306@candelate ch.com>
2010-12-09 17:27                                                 ` Ben Greear
2010-12-09 22:23                                                 ` Ben Greear
2010-12-10 15:11                                                   ` Tejun Heo
2010-12-10 16:35                                                     ` Ben Greear
2010-11-18 17:55                         ` Ben Greear
2010-11-18 18:04                           ` Tejun Heo
2010-11-18 18:11                             ` Ben Greear
2010-11-17 20:13             ` Ben Greear

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CDE699B.70401@kernel.org \
    --to=tj@kernel.org \
    --cc=greearb@candelatech.com \
    --cc=johannes@sipsolutions.net \
    --cc=linux-wireless@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).