linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* cfg80211_disconnected memory leak
@ 2012-07-31 17:36 Daniel Drake
  2012-08-01 17:39 ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-07-31 17:36 UTC (permalink / raw)
  To: linux-wireless, johannes

Hi,

After doing insmod/rmmod of libertas, kmemleak found:

unreferenced object 0xe90f1398 (size 64):
  comm "rmmod", pid 836, jiffies 4294944467 (age 34.620s)
  hex dump (first 32 bytes):
    58 cd 24 e9 58 cd 24 e9 02 00 00 00 c0 13 0f e9  X.$.X.$.........
    00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<b0747a00>] kmemleak_alloc+0x26/0x44
    [<b049eff1>] __kmalloc+0xf3/0x176
    [<b0740aae>] cfg80211_disconnected+0x3e/0xc8
    [<b06008c9>] lbs_disconnect+0x73/0x86
    [<b0600955>] lbs_cfg_disconnect+0x79/0x88
    [<b0741c20>] __cfg80211_disconnect+0xf5/0x148
    [<b072cd8f>] cfg80211_netdev_notifier_call+0x253/0x452
    [<b075781b>] notifier_call_chain+0x2a/0x4b
    [<b04344f6>] __raw_notifier_call_chain+0x13/0x15
    [<b0434509>] raw_notifier_call_chain+0x11/0x13
    [<b06a51ff>] call_netdevice_notifiers+0x41/0x48
    [<b06a5247>] __dev_close_many+0x41/0x8b
    [<b06a5323>] dev_close_many+0x58/0xa8
    [<b06a5401>] rollback_registered_many+0x8e/0x1f8
    [<b06a5593>] rollback_registered+0x28/0x34
    [<b06a66b0>] unregister_netdevice_queue+0x51/0x6e

By adding some printks I have found that cfg80211_disconnected() does
indeed queue an event to be processed in cfg80211_wq on the eth0
device, but by the time cfg80211_process_rdev_events() is called, eth0
is no longer present in the rdev's netdev_list, so the event never
gets processed (or freed).

Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-07-31 17:36 cfg80211_disconnected memory leak Daniel Drake
@ 2012-08-01 17:39 ` Johannes Berg
  2012-08-01 21:04   ` Daniel Drake
  2012-08-01 23:22   ` Daniel Drake
  0 siblings, 2 replies; 9+ messages in thread
From: Johannes Berg @ 2012-08-01 17:39 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-wireless

Hi,

> unreferenced object 0xe90f1398 (size 64):

>   backtrace:
>     [<b0747a00>] kmemleak_alloc+0x26/0x44
>     [<b049eff1>] __kmalloc+0xf3/0x176
>     [<b0740aae>] cfg80211_disconnected+0x3e/0xc8
>     [<b06008c9>] lbs_disconnect+0x73/0x86
>     [<b0600955>] lbs_cfg_disconnect+0x79/0x88
>     [<b0741c20>] __cfg80211_disconnect+0xf5/0x148
>     [<b072cd8f>] cfg80211_netdev_notifier_call+0x253/0x452

> By adding some printks I have found that cfg80211_disconnected() does
> indeed queue an event to be processed in cfg80211_wq on the eth0
> device, but by the time cfg80211_process_rdev_events() is called, eth0
> is no longer present in the rdev's netdev_list, so the event never
> gets processed (or freed).

This is very odd. What version of the kernel is this?

The strange thing is that we call __cfg80211_disconnect() from the
netdev notifier with NETDEV_GOING_DOWN. This will allocate and queue the
work item as you found. The next thing that happens should be
NETDEV_DOWN, which will cause us to dev_hold() the device and then queue
the cleanup work. The cleanup work must run for us to dev_put() the
device, so that it can only be unregistered after that runs. Then,
finally, we get NETDEV_UNREGISTER which removes it from the list.

Now note that the work item we queue in __cfg80211_disconnect() is
queued *before* the cleanup work, therefore it should also run before
the cleanup work since the workqueue is singlethreaded.

Hence I have no idea how this comes about.

johannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-01 17:39 ` Johannes Berg
@ 2012-08-01 21:04   ` Daniel Drake
  2012-08-01 23:22   ` Daniel Drake
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Drake @ 2012-08-01 21:04 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless

On Wed, Aug 1, 2012 at 11:39 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> This is very odd. What version of the kernel is this?

3.3, and just reproduced on 3.5.

I'll have a closer look later in the week.

Thanks
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-01 17:39 ` Johannes Berg
  2012-08-01 21:04   ` Daniel Drake
@ 2012-08-01 23:22   ` Daniel Drake
  2012-08-02  8:02     ` Johannes Berg
  1 sibling, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-01 23:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless

On Wed, Aug 1, 2012 at 11:39 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> The strange thing is that we call __cfg80211_disconnect() from the
> netdev notifier with NETDEV_GOING_DOWN. This will allocate and queue the
> work item as you found. The next thing that happens should be
> NETDEV_DOWN, which will cause us to dev_hold() the device and then queue
> the cleanup work. The cleanup work must run for us to dev_put() the
> device, so that it can only be unregistered after that runs. Then,
> finally, we get NETDEV_UNREGISTER which removes it from the list.
>
> Now note that the work item we queue in __cfg80211_disconnect() is
> queued *before* the cleanup work, therefore it should also run before
> the cleanup work since the workqueue is singlethreaded.

Here is what happens:

NETDEV_GOING_DOWN
cfg80211_disconnected() called, disconnect event work queued
NETDEV_DOWN
cleanup work queued
NETDEV_UNREGISTER
*** cfg80211_netdev_notifier_call now calls: list_del_rcu(&wdev->list);
disconnect even work runs, calls cfg80211_process_rdev_events() but
the wdev is already removed from rdev->netdev_list as above
cleanup work runs

The bit I marked with *** is what is causing the difficulties - it
runs before the work items do.

Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-01 23:22   ` Daniel Drake
@ 2012-08-02  8:02     ` Johannes Berg
  2012-08-02 16:11       ` Daniel Drake
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2012-08-02  8:02 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-wireless

On Wed, 2012-08-01 at 17:22 -0600, Daniel Drake wrote:

> Here is what happens:
> 
> NETDEV_GOING_DOWN
> cfg80211_disconnected() called, disconnect event work queued
> NETDEV_DOWN
> cleanup work queued
> NETDEV_UNREGISTER
> *** cfg80211_netdev_notifier_call now calls: list_del_rcu(&wdev->list);
> disconnect even work runs, calls cfg80211_process_rdev_events() but
> the wdev is already removed from rdev->netdev_list as above
> cleanup work runs
> 
> The bit I marked with *** is what is causing the difficulties - it
> runs before the work items do.

Oh, hm. I didn't think it could unregister before we give up our
reference, but I guess that makes sense after all.

I'm not sure there's an easy way to fix it other than making the driver
not call cfg80211_disconnected() in case the disconnect was requested by
cfg80211 -- that call isn't needed and will not do anything at all, but
I'm not sure how easy that would be in the driver?

johannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-02  8:02     ` Johannes Berg
@ 2012-08-02 16:11       ` Daniel Drake
  2012-08-02 16:26         ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-02 16:11 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless

On Thu, Aug 2, 2012 at 2:02 AM, Johannes Berg <johannes@sipsolutions.net> wrote:
> Oh, hm. I didn't think it could unregister before we give up our
> reference, but I guess that makes sense after all.
>
> I'm not sure there's an easy way to fix it other than making the driver
> not call cfg80211_disconnected() in case the disconnect was requested by
> cfg80211 -- that call isn't needed and will not do anything at all, but
> I'm not sure how easy that would be in the driver?

I guess you've considered clearing all the pending work before
removing a netdev from the rdev's list?


I think a driver modification would be easy, if it is the right solution.

lbs_disconnect() is the function that calls cfg80211_disconnected().
We only ever call this in 2 contexts:

1. From our cfg80211_ops.disconnect handler - you say this isn't needed
2. From the netdev ndo_stop handler - I guess it is also not necessary
to inform cfg80211 that we have disconnected at this point, it is kind
of obvious..?

So completely removing the call to cfg80211_disconnected() may be the
right option here, is that what you recommend?

Thanks
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-02 16:11       ` Daniel Drake
@ 2012-08-02 16:26         ` Johannes Berg
  2012-08-02 17:25           ` Daniel Drake
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2012-08-02 16:26 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-wireless

On Thu, 2012-08-02 at 10:11 -0600, Daniel Drake wrote:
> On Thu, Aug 2, 2012 at 2:02 AM, Johannes Berg <johannes@sipsolutions.net> wrote:
> > Oh, hm. I didn't think it could unregister before we give up our
> > reference, but I guess that makes sense after all.
> >
> > I'm not sure there's an easy way to fix it other than making the driver
> > not call cfg80211_disconnected() in case the disconnect was requested by
> > cfg80211 -- that call isn't needed and will not do anything at all, but
> > I'm not sure how easy that would be in the driver?
> 
> I guess you've considered clearing all the pending work before
> removing a netdev from the rdev's list?

Yes, but we can't just try to flush the workqueue because of locking
concerns in this area.

Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
case NETDEV_UNREGISTER though, probably after removing from the list.
Maybe you could try that?

> I think a driver modification would be easy, if it is the right solution.
> 
> lbs_disconnect() is the function that calls cfg80211_disconnected().
> We only ever call this in 2 contexts:
> 
> 1. From our cfg80211_ops.disconnect handler - you say this isn't needed
> 2. From the netdev ndo_stop handler - I guess it is also not necessary
> to inform cfg80211 that we have disconnected at this point, it is kind
> of obvious..?

In fact, you won't get to ndo_stop without cfg80211 calling
disconnect(), because it does that from NETDEV_GOING_DOWN.

> So completely removing the call to cfg80211_disconnected() may be the
> right option here, is that what you recommend?

I'm not 100% sure about the API in this area right now though, it's been
a while and I never worked much with this API (rather than the mac80211
one with auth/assoc/disassoc/deauth.)

johannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-02 16:26         ` Johannes Berg
@ 2012-08-02 17:25           ` Daniel Drake
  2012-08-02 19:57             ` Johannes Berg
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-02 17:25 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless

On Thu, Aug 2, 2012 at 10:26 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
> case NETDEV_UNREGISTER though, probably after removing from the list.
> Maybe you could try that?

That solves the issue - confirmed that kmemleak now shuts up, and with
some added printks to confirm event creation and freeing.
Patch coming up, titled:    cfg80211: process pending events when
unregistering net device

Even if libertas isn't quite doing the right thing here, I think this
is the right thing to do. I guess there are other situations, perhaps
more legitimate, where we can reach this point with events in the
queue.

> I'm not 100% sure about the API in this area right now though, it's been
> a while and I never worked much with this API (rather than the mac80211
> one with auth/assoc/disassoc/deauth.)

I think we both feel that removing it is correct. I'll test this when
I find some free time, and if things seem OK i'll post a libertas
patch in addition to the cfg80211 fix.

Thanks
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: cfg80211_disconnected memory leak
  2012-08-02 17:25           ` Daniel Drake
@ 2012-08-02 19:57             ` Johannes Berg
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Berg @ 2012-08-02 19:57 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-wireless

On Thu, 2012-08-02 at 11:25 -0600, Daniel Drake wrote:
> On Thu, Aug 2, 2012 at 10:26 AM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
> > Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
> > case NETDEV_UNREGISTER though, probably after removing from the list.
> > Maybe you could try that?
> 
> That solves the issue - confirmed that kmemleak now shuts up, and with
> some added printks to confirm event creation and freeing.
> Patch coming up, titled:    cfg80211: process pending events when
> unregistering net device

Nice, thanks!

> Even if libertas isn't quite doing the right thing here, I think this
> is the right thing to do. I guess there are other situations, perhaps
> more legitimate, where we can reach this point with events in the
> queue.

Yes, I agree. I just wasn't completely sure this would be OK when I
looked first, and then started wondering why it didn't happen with
mac80211, but that doesn't send the event.

> > I'm not 100% sure about the API in this area right now though, it's been
> > a while and I never worked much with this API (rather than the mac80211
> > one with auth/assoc/disassoc/deauth.)
> 
> I think we both feel that removing it is correct. I'll test this when
> I find some free time, and if things seem OK i'll post a libertas
> patch in addition to the cfg80211 fix.

It shouldn't hurt either way, since if the event is there but we're
already disconnected we'll just ignore it, afaict.

johannes


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-08-02 19:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-31 17:36 cfg80211_disconnected memory leak Daniel Drake
2012-08-01 17:39 ` Johannes Berg
2012-08-01 21:04   ` Daniel Drake
2012-08-01 23:22   ` Daniel Drake
2012-08-02  8:02     ` Johannes Berg
2012-08-02 16:11       ` Daniel Drake
2012-08-02 16:26         ` Johannes Berg
2012-08-02 17:25           ` Daniel Drake
2012-08-02 19:57             ` Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).