* cfg80211_disconnected memory leak
@ 2012-07-31 17:36 Daniel Drake
2012-08-01 17:39 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-07-31 17:36 UTC (permalink / raw)
To: linux-wireless, johannes
Hi,
After doing insmod/rmmod of libertas, kmemleak found:
unreferenced object 0xe90f1398 (size 64):
comm "rmmod", pid 836, jiffies 4294944467 (age 34.620s)
hex dump (first 32 bytes):
58 cd 24 e9 58 cd 24 e9 02 00 00 00 c0 13 0f e9 X.$.X.$.........
00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<b0747a00>] kmemleak_alloc+0x26/0x44
[<b049eff1>] __kmalloc+0xf3/0x176
[<b0740aae>] cfg80211_disconnected+0x3e/0xc8
[<b06008c9>] lbs_disconnect+0x73/0x86
[<b0600955>] lbs_cfg_disconnect+0x79/0x88
[<b0741c20>] __cfg80211_disconnect+0xf5/0x148
[<b072cd8f>] cfg80211_netdev_notifier_call+0x253/0x452
[<b075781b>] notifier_call_chain+0x2a/0x4b
[<b04344f6>] __raw_notifier_call_chain+0x13/0x15
[<b0434509>] raw_notifier_call_chain+0x11/0x13
[<b06a51ff>] call_netdevice_notifiers+0x41/0x48
[<b06a5247>] __dev_close_many+0x41/0x8b
[<b06a5323>] dev_close_many+0x58/0xa8
[<b06a5401>] rollback_registered_many+0x8e/0x1f8
[<b06a5593>] rollback_registered+0x28/0x34
[<b06a66b0>] unregister_netdevice_queue+0x51/0x6e
By adding some printks I have found that cfg80211_disconnected() does
indeed queue an event to be processed in cfg80211_wq on the eth0
device, but by the time cfg80211_process_rdev_events() is called, eth0
is no longer present in the rdev's netdev_list, so the event never
gets processed (or freed).
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-07-31 17:36 cfg80211_disconnected memory leak Daniel Drake
@ 2012-08-01 17:39 ` Johannes Berg
2012-08-01 21:04 ` Daniel Drake
2012-08-01 23:22 ` Daniel Drake
0 siblings, 2 replies; 9+ messages in thread
From: Johannes Berg @ 2012-08-01 17:39 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-wireless
Hi,
> unreferenced object 0xe90f1398 (size 64):
> backtrace:
> [<b0747a00>] kmemleak_alloc+0x26/0x44
> [<b049eff1>] __kmalloc+0xf3/0x176
> [<b0740aae>] cfg80211_disconnected+0x3e/0xc8
> [<b06008c9>] lbs_disconnect+0x73/0x86
> [<b0600955>] lbs_cfg_disconnect+0x79/0x88
> [<b0741c20>] __cfg80211_disconnect+0xf5/0x148
> [<b072cd8f>] cfg80211_netdev_notifier_call+0x253/0x452
> By adding some printks I have found that cfg80211_disconnected() does
> indeed queue an event to be processed in cfg80211_wq on the eth0
> device, but by the time cfg80211_process_rdev_events() is called, eth0
> is no longer present in the rdev's netdev_list, so the event never
> gets processed (or freed).
This is very odd. What version of the kernel is this?
The strange thing is that we call __cfg80211_disconnect() from the
netdev notifier with NETDEV_GOING_DOWN. This will allocate and queue the
work item as you found. The next thing that happens should be
NETDEV_DOWN, which will cause us to dev_hold() the device and then queue
the cleanup work. The cleanup work must run for us to dev_put() the
device, so that it can only be unregistered after that runs. Then,
finally, we get NETDEV_UNREGISTER which removes it from the list.
Now note that the work item we queue in __cfg80211_disconnect() is
queued *before* the cleanup work, therefore it should also run before
the cleanup work since the workqueue is singlethreaded.
Hence I have no idea how this comes about.
johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-01 17:39 ` Johannes Berg
@ 2012-08-01 21:04 ` Daniel Drake
2012-08-01 23:22 ` Daniel Drake
1 sibling, 0 replies; 9+ messages in thread
From: Daniel Drake @ 2012-08-01 21:04 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless
On Wed, Aug 1, 2012 at 11:39 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> This is very odd. What version of the kernel is this?
3.3, and just reproduced on 3.5.
I'll have a closer look later in the week.
Thanks
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-01 17:39 ` Johannes Berg
2012-08-01 21:04 ` Daniel Drake
@ 2012-08-01 23:22 ` Daniel Drake
2012-08-02 8:02 ` Johannes Berg
1 sibling, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-01 23:22 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless
On Wed, Aug 1, 2012 at 11:39 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> The strange thing is that we call __cfg80211_disconnect() from the
> netdev notifier with NETDEV_GOING_DOWN. This will allocate and queue the
> work item as you found. The next thing that happens should be
> NETDEV_DOWN, which will cause us to dev_hold() the device and then queue
> the cleanup work. The cleanup work must run for us to dev_put() the
> device, so that it can only be unregistered after that runs. Then,
> finally, we get NETDEV_UNREGISTER which removes it from the list.
>
> Now note that the work item we queue in __cfg80211_disconnect() is
> queued *before* the cleanup work, therefore it should also run before
> the cleanup work since the workqueue is singlethreaded.
Here is what happens:
NETDEV_GOING_DOWN
cfg80211_disconnected() called, disconnect event work queued
NETDEV_DOWN
cleanup work queued
NETDEV_UNREGISTER
*** cfg80211_netdev_notifier_call now calls: list_del_rcu(&wdev->list);
disconnect even work runs, calls cfg80211_process_rdev_events() but
the wdev is already removed from rdev->netdev_list as above
cleanup work runs
The bit I marked with *** is what is causing the difficulties - it
runs before the work items do.
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-01 23:22 ` Daniel Drake
@ 2012-08-02 8:02 ` Johannes Berg
2012-08-02 16:11 ` Daniel Drake
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2012-08-02 8:02 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-wireless
On Wed, 2012-08-01 at 17:22 -0600, Daniel Drake wrote:
> Here is what happens:
>
> NETDEV_GOING_DOWN
> cfg80211_disconnected() called, disconnect event work queued
> NETDEV_DOWN
> cleanup work queued
> NETDEV_UNREGISTER
> *** cfg80211_netdev_notifier_call now calls: list_del_rcu(&wdev->list);
> disconnect even work runs, calls cfg80211_process_rdev_events() but
> the wdev is already removed from rdev->netdev_list as above
> cleanup work runs
>
> The bit I marked with *** is what is causing the difficulties - it
> runs before the work items do.
Oh, hm. I didn't think it could unregister before we give up our
reference, but I guess that makes sense after all.
I'm not sure there's an easy way to fix it other than making the driver
not call cfg80211_disconnected() in case the disconnect was requested by
cfg80211 -- that call isn't needed and will not do anything at all, but
I'm not sure how easy that would be in the driver?
johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-02 8:02 ` Johannes Berg
@ 2012-08-02 16:11 ` Daniel Drake
2012-08-02 16:26 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-02 16:11 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless
On Thu, Aug 2, 2012 at 2:02 AM, Johannes Berg <johannes@sipsolutions.net> wrote:
> Oh, hm. I didn't think it could unregister before we give up our
> reference, but I guess that makes sense after all.
>
> I'm not sure there's an easy way to fix it other than making the driver
> not call cfg80211_disconnected() in case the disconnect was requested by
> cfg80211 -- that call isn't needed and will not do anything at all, but
> I'm not sure how easy that would be in the driver?
I guess you've considered clearing all the pending work before
removing a netdev from the rdev's list?
I think a driver modification would be easy, if it is the right solution.
lbs_disconnect() is the function that calls cfg80211_disconnected().
We only ever call this in 2 contexts:
1. From our cfg80211_ops.disconnect handler - you say this isn't needed
2. From the netdev ndo_stop handler - I guess it is also not necessary
to inform cfg80211 that we have disconnected at this point, it is kind
of obvious..?
So completely removing the call to cfg80211_disconnected() may be the
right option here, is that what you recommend?
Thanks
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-02 16:11 ` Daniel Drake
@ 2012-08-02 16:26 ` Johannes Berg
2012-08-02 17:25 ` Daniel Drake
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Berg @ 2012-08-02 16:26 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-wireless
On Thu, 2012-08-02 at 10:11 -0600, Daniel Drake wrote:
> On Thu, Aug 2, 2012 at 2:02 AM, Johannes Berg <johannes@sipsolutions.net> wrote:
> > Oh, hm. I didn't think it could unregister before we give up our
> > reference, but I guess that makes sense after all.
> >
> > I'm not sure there's an easy way to fix it other than making the driver
> > not call cfg80211_disconnected() in case the disconnect was requested by
> > cfg80211 -- that call isn't needed and will not do anything at all, but
> > I'm not sure how easy that would be in the driver?
>
> I guess you've considered clearing all the pending work before
> removing a netdev from the rdev's list?
Yes, but we can't just try to flush the workqueue because of locking
concerns in this area.
Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
case NETDEV_UNREGISTER though, probably after removing from the list.
Maybe you could try that?
> I think a driver modification would be easy, if it is the right solution.
>
> lbs_disconnect() is the function that calls cfg80211_disconnected().
> We only ever call this in 2 contexts:
>
> 1. From our cfg80211_ops.disconnect handler - you say this isn't needed
> 2. From the netdev ndo_stop handler - I guess it is also not necessary
> to inform cfg80211 that we have disconnected at this point, it is kind
> of obvious..?
In fact, you won't get to ndo_stop without cfg80211 calling
disconnect(), because it does that from NETDEV_GOING_DOWN.
> So completely removing the call to cfg80211_disconnected() may be the
> right option here, is that what you recommend?
I'm not 100% sure about the API in this area right now though, it's been
a while and I never worked much with this API (rather than the mac80211
one with auth/assoc/disassoc/deauth.)
johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-02 16:26 ` Johannes Berg
@ 2012-08-02 17:25 ` Daniel Drake
2012-08-02 19:57 ` Johannes Berg
0 siblings, 1 reply; 9+ messages in thread
From: Daniel Drake @ 2012-08-02 17:25 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless
On Thu, Aug 2, 2012 at 10:26 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
> case NETDEV_UNREGISTER though, probably after removing from the list.
> Maybe you could try that?
That solves the issue - confirmed that kmemleak now shuts up, and with
some added printks to confirm event creation and freeing.
Patch coming up, titled: cfg80211: process pending events when
unregistering net device
Even if libertas isn't quite doing the right thing here, I think this
is the right thing to do. I guess there are other situations, perhaps
more legitimate, where we can reach this point with events in the
queue.
> I'm not 100% sure about the API in this area right now though, it's been
> a while and I never worked much with this API (rather than the mac80211
> one with auth/assoc/disassoc/deauth.)
I think we both feel that removing it is correct. I'll test this when
I find some free time, and if things seem OK i'll post a libertas
patch in addition to the cfg80211 fix.
Thanks
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: cfg80211_disconnected memory leak
2012-08-02 17:25 ` Daniel Drake
@ 2012-08-02 19:57 ` Johannes Berg
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Berg @ 2012-08-02 19:57 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-wireless
On Thu, 2012-08-02 at 11:25 -0600, Daniel Drake wrote:
> On Thu, Aug 2, 2012 at 10:26 AM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
> > Hmm. Then again, I think we can call cfg80211_process_wdev_events() from
> > case NETDEV_UNREGISTER though, probably after removing from the list.
> > Maybe you could try that?
>
> That solves the issue - confirmed that kmemleak now shuts up, and with
> some added printks to confirm event creation and freeing.
> Patch coming up, titled: cfg80211: process pending events when
> unregistering net device
Nice, thanks!
> Even if libertas isn't quite doing the right thing here, I think this
> is the right thing to do. I guess there are other situations, perhaps
> more legitimate, where we can reach this point with events in the
> queue.
Yes, I agree. I just wasn't completely sure this would be OK when I
looked first, and then started wondering why it didn't happen with
mac80211, but that doesn't send the event.
> > I'm not 100% sure about the API in this area right now though, it's been
> > a while and I never worked much with this API (rather than the mac80211
> > one with auth/assoc/disassoc/deauth.)
>
> I think we both feel that removing it is correct. I'll test this when
> I find some free time, and if things seem OK i'll post a libertas
> patch in addition to the cfg80211 fix.
It shouldn't hurt either way, since if the event is there but we're
already disconnected we'll just ignore it, afaict.
johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-08-02 19:57 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-31 17:36 cfg80211_disconnected memory leak Daniel Drake
2012-08-01 17:39 ` Johannes Berg
2012-08-01 21:04 ` Daniel Drake
2012-08-01 23:22 ` Daniel Drake
2012-08-02 8:02 ` Johannes Berg
2012-08-02 16:11 ` Daniel Drake
2012-08-02 16:26 ` Johannes Berg
2012-08-02 17:25 ` Daniel Drake
2012-08-02 19:57 ` Johannes Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).