All of lore.kernel.org
 help / color / mirror / Atom feed
* Suspicious RCU usage in mac80211
@ 2012-04-10  3:19 Larry Finger
  2012-04-12  3:31 ` Johannes Berg
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-04-10  3:19 UTC (permalink / raw)
  To: Johannes Berg; +Cc: wireless

When running kernel 3.4-rc2 from wireless testing, I got the following logged:

[ 2299.344437] ===============================
[ 2299.344443] [ INFO: suspicious RCU usage. ]
[ 2299.344452] 3.4.0-rc2-wl+ #222 Not tainted
[ 2299.344458] -------------------------------
[ 2299.344466] net/mac80211/sta_info.h:449 suspicious 
rcu_dereference_protected() usage!
[ 2299.344472]
[ 2299.344474] other info that might help us debug this:
[ 2299.344477]
[ 2299.344484]
[ 2299.344486] rcu_scheduler_active = 1, debug_locks = 0
[ 2299.344493] 1 lock held by swapper/1/0:
[ 2299.344499]  #0:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104850a>] 
run_timer_softirq+0xfa/0x6e0
[ 2299.344533]
[ 2299.344535] stack backtrace:
[ 2299.344544] Pid: 0, comm: swapper/1 Not tainted 3.4.0-rc2-wl+ #222
[ 2299.344551] Call Trace:
[ 2299.344557]  <IRQ>  [<ffffffff81092fdd>] lockdep_rcu_suspicious+0xfd/0x130
[ 2299.344651]  [<ffffffffa05db673>] sta_tx_agg_session_timer_expired+0xe3/0x100 
[mac80211]
[ 2299.344664]  [<ffffffff81048595>] run_timer_softirq+0x185/0x6e0
[ 2299.344676]  [<ffffffff8104850a>] ? run_timer_softirq+0xfa/0x6e0
[ 2299.344714]  [<ffffffffa05db590>] ? ieee80211_stop_tx_ba_session+0x240/0x240 
[mac80211]
[ 2299.344732]  [<ffffffff81040156>] __do_softirq+0xc6/0x3f0
[ 2299.344748]  [<ffffffff8139c934>] ? _raw_spin_lock+0x44/0x50
[ 2299.344758]  [<ffffffff8139d0f6>] ? _raw_spin_unlock+0x26/0x40
[ 2299.344771]  [<ffffffff8139f24c>] call_softirq+0x1c/0x30
[ 2299.344785]  [<ffffffff81004785>] do_softirq+0x85/0xc0
[ 2299.344795]  [<ffffffff810407b6>] irq_exit+0xa6/0xe0
[ 2299.344806]  [<ffffffff8100424e>] do_IRQ+0x5e/0xd0
[ 2299.344816]  [<ffffffff8139d56c>] common_interrupt+0x6c/0x6c
[ 2299.344823]  <EOI>  [<ffffffff8100c042>] ? default_idle+0x62/0x4f0
[ 2299.344844]  [<ffffffff8100c040>] ? default_idle+0x60/0x4f0
[ 2299.344854]  [<ffffffff8100c524>] amd_e400_idle+0x54/0x100
[ 2299.344864]  [<ffffffff8100ce96>] cpu_idle+0x96/0xe0
[ 2299.344875]  [<ffffffff8138fbc6>] start_secondary+0x1b7/0x1bb

The wireless driver in use was rtl8192ce.

Larry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-10  3:19 Suspicious RCU usage in mac80211 Larry Finger
@ 2012-04-12  3:31 ` Johannes Berg
  2012-04-12  3:51   ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Berg @ 2012-04-12  3:31 UTC (permalink / raw)
  To: Larry Finger; +Cc: wireless

On Mon, 2012-04-09 at 22:19 -0500, Larry Finger wrote:
> When running kernel 3.4-rc2 from wireless testing, I got the following logged:
> 
> [ 2299.344437] ===============================
> [ 2299.344443] [ INFO: suspicious RCU usage. ]
> [ 2299.344452] 3.4.0-rc2-wl+ #222 Not tainted
> [ 2299.344458] -------------------------------
> [ 2299.344466] net/mac80211/sta_info.h:449 suspicious rcu_dereference_protected() usage!
> [ 2299.344472]
> [ 2299.344474] other info that might help us debug this:
> [ 2299.344477]
> [ 2299.344484]
> [ 2299.344486] rcu_scheduler_active = 1, debug_locks = 0
> [ 2299.344493] 1 lock held by swapper/1/0:
> [ 2299.344499]  #0:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104850a>] 
> run_timer_softirq+0xfa/0x6e0
> [ 2299.344533]
> [ 2299.344535] stack backtrace:
> [ 2299.344544] Pid: 0, comm: swapper/1 Not tainted 3.4.0-rc2-wl+ #222
> [ 2299.344551] Call Trace:
> [ 2299.344557]  <IRQ>  [<ffffffff81092fdd>] lockdep_rcu_suspicious+0xfd/0x130
> [ 2299.344651]  [<ffffffffa05db673>] sta_tx_agg_session_timer_expired+0xe3/0x100 
> [mac80211]

That's odd:

static inline struct tid_ampdu_tx *
rcu_dereference_protected_tid_tx(struct sta_info *sta, int tid)
{
        return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
                                         lockdep_is_held(&sta->lock) ||
                                         lockdep_is_held(&sta->ampdu_mlme.mtx));
} 

sta_tx_agg_session_timer_expired calls ieee80211_stop_tx_ba_session
which does:

        spin_lock_bh(&sta->lock);
        tid_tx = rcu_dereference_protected_tid_tx(sta, tid);


So why would the message happen? Strange.

johannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-12  3:31 ` Johannes Berg
@ 2012-04-12  3:51   ` Larry Finger
  2012-04-12  3:54     ` Johannes Berg
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-04-12  3:51 UTC (permalink / raw)
  To: Johannes Berg; +Cc: wireless

On 04/11/2012 10:31 PM, Johannes Berg wrote:
> On Mon, 2012-04-09 at 22:19 -0500, Larry Finger wrote:
>> When running kernel 3.4-rc2 from wireless testing, I got the following logged:
>>
>> [ 2299.344437] ===============================
>> [ 2299.344443] [ INFO: suspicious RCU usage. ]
>> [ 2299.344452] 3.4.0-rc2-wl+ #222 Not tainted
>> [ 2299.344458] -------------------------------
>> [ 2299.344466] net/mac80211/sta_info.h:449 suspicious rcu_dereference_protected() usage!
>> [ 2299.344472]
>> [ 2299.344474] other info that might help us debug this:
>> [ 2299.344477]
>> [ 2299.344484]
>> [ 2299.344486] rcu_scheduler_active = 1, debug_locks = 0
>> [ 2299.344493] 1 lock held by swapper/1/0:
>> [ 2299.344499]  #0:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104850a>]
>> run_timer_softirq+0xfa/0x6e0
>> [ 2299.344533]
>> [ 2299.344535] stack backtrace:
>> [ 2299.344544] Pid: 0, comm: swapper/1 Not tainted 3.4.0-rc2-wl+ #222
>> [ 2299.344551] Call Trace:
>> [ 2299.344557]<IRQ>   [<ffffffff81092fdd>] lockdep_rcu_suspicious+0xfd/0x130
>> [ 2299.344651]  [<ffffffffa05db673>] sta_tx_agg_session_timer_expired+0xe3/0x100
>> [mac80211]
>
> That's odd:
>
> static inline struct tid_ampdu_tx *
> rcu_dereference_protected_tid_tx(struct sta_info *sta, int tid)
> {
>          return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
>                                           lockdep_is_held(&sta->lock) ||
>                                           lockdep_is_held(&sta->ampdu_mlme.mtx));
> }
>
> sta_tx_agg_session_timer_expired calls ieee80211_stop_tx_ba_session
> which does:
>
>          spin_lock_bh(&sta->lock);
>          tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
>
>
> So why would the message happen? Strange.

When I first saw the thread at 
http://lkml.indiana.edu/hypermail/linux/kernel/1204.1/01446.html, I thought it 
might be about the same problem, but that appears to be restricted to sparc64.

The problem does not happen every time - even more strangeness.

Larry



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-12  3:51   ` Larry Finger
@ 2012-04-12  3:54     ` Johannes Berg
  2012-04-12 15:51       ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Berg @ 2012-04-12  3:54 UTC (permalink / raw)
  To: Larry Finger; +Cc: wireless

On Wed, 2012-04-11 at 22:51 -0500, Larry Finger wrote:
> On 04/11/2012 10:31 PM, Johannes Berg wrote:
> > On Mon, 2012-04-09 at 22:19 -0500, Larry Finger wrote:
> >> When running kernel 3.4-rc2 from wireless testing, I got the following logged:
> >>
> >> [ 2299.344437] ===============================
> >> [ 2299.344443] [ INFO: suspicious RCU usage. ]
> >> [ 2299.344452] 3.4.0-rc2-wl+ #222 Not tainted
> >> [ 2299.344458] -------------------------------
> >> [ 2299.344466] net/mac80211/sta_info.h:449 suspicious rcu_dereference_protected() usage!
> >> [ 2299.344472]
> >> [ 2299.344474] other info that might help us debug this:
> >> [ 2299.344477]
> >> [ 2299.344484]
> >> [ 2299.344486] rcu_scheduler_active = 1, debug_locks = 0
> >> [ 2299.344493] 1 lock held by swapper/1/0:
> >> [ 2299.344499]  #0:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104850a>]
> >> run_timer_softirq+0xfa/0x6e0
> >> [ 2299.344533]
> >> [ 2299.344535] stack backtrace:
> >> [ 2299.344544] Pid: 0, comm: swapper/1 Not tainted 3.4.0-rc2-wl+ #222
> >> [ 2299.344551] Call Trace:
> >> [ 2299.344557]<IRQ>   [<ffffffff81092fdd>] lockdep_rcu_suspicious+0xfd/0x130
> >> [ 2299.344651]  [<ffffffffa05db673>] sta_tx_agg_session_timer_expired+0xe3/0x100
> >> [mac80211]
> >
> > That's odd:
> >
> > static inline struct tid_ampdu_tx *
> > rcu_dereference_protected_tid_tx(struct sta_info *sta, int tid)
> > {
> >          return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
> >                                           lockdep_is_held(&sta->lock) ||
> >                                           lockdep_is_held(&sta->ampdu_mlme.mtx));
> > }
> >
> > sta_tx_agg_session_timer_expired calls ieee80211_stop_tx_ba_session
> > which does:
> >
> >          spin_lock_bh(&sta->lock);
> >          tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
> >
> >
> > So why would the message happen? Strange.
> 
> When I first saw the thread at 
> http://lkml.indiana.edu/hypermail/linux/kernel/1204.1/01446.html, I thought it 
> might be about the same problem, but that appears to be restricted to sparc64.
> 
> The problem does not happen every time - even more strangeness.

What do you mean by "every time"? I wouldn't expect the timer to fire
every time you set up aggregation etc., the timer firing is a special
case to start with.

johannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-12  3:54     ` Johannes Berg
@ 2012-04-12 15:51       ` Larry Finger
  2012-04-12 15:55         ` Johannes Berg
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-04-12 15:51 UTC (permalink / raw)
  To: Johannes Berg; +Cc: wireless

On 04/11/2012 10:54 PM, Johannes Berg wrote:
>
> What do you mean by "every time"? I wouldn't expect the timer to fire
> every time you set up aggregation etc., the timer firing is a special
> case to start with.

I meant it is intermittent, or it required some special setup that I have not 
duplicated, as I have seen the condition only once.

Larry


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-12 15:51       ` Larry Finger
@ 2012-04-12 15:55         ` Johannes Berg
  2012-05-01 14:25           ` Mohammed Shafi
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Berg @ 2012-04-12 15:55 UTC (permalink / raw)
  To: Larry Finger; +Cc: wireless

On Thu, 2012-04-12 at 10:51 -0500, Larry Finger wrote:
> On 04/11/2012 10:54 PM, Johannes Berg wrote:
> >
> > What do you mean by "every time"? I wouldn't expect the timer to fire
> > every time you set up aggregation etc., the timer firing is a special
> > case to start with.
> 
> I meant it is intermittent, or it required some special setup that I have not 
> duplicated, as I have seen the condition only once.

Ok. I suspect that's explained by the fact that the timer is an error
case anyway, so you've probably never seen that error case again. It's
still puzzling though since the code looks fine to me.

johannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-04-12 15:55         ` Johannes Berg
@ 2012-05-01 14:25           ` Mohammed Shafi
  2012-05-01 19:18             ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Mohammed Shafi @ 2012-05-01 14:25 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Larry Finger, wireless

hi Johannes and Larry,

On Thu, Apr 12, 2012 at 9:25 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Thu, 2012-04-12 at 10:51 -0500, Larry Finger wrote:
>> On 04/11/2012 10:54 PM, Johannes Berg wrote:
>> >
>> > What do you mean by "every time"? I wouldn't expect the timer to fire
>> > every time you set up aggregation etc., the timer firing is a special
>> > case to start with.
>>
>> I meant it is intermittent, or it required some special setup that I have not
>> duplicated, as I have seen the condition only once.
>
> Ok. I suspect that's explained by the fact that the timer is an error
> case anyway, so you've probably never seen that error case again. It's
> still puzzling though since the code looks fine to me.
>
> johannes
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


May  1 14:34:51 shafi-laptop kernel: [   83.216284]
===============================
May  1 14:34:51 shafi-laptop kernel: [   83.216286] [ INFO: suspicious
RCU usage. ]
May  1 14:34:51 shafi-laptop kernel: [   83.216289] 3.4.0-rc3-wl #16 Not tainted
May  1 14:34:51 shafi-laptop kernel: [   83.216291]
-------------------------------
May  1 14:34:51 shafi-laptop kernel: [   83.216293]
net/mac80211/sta_info.h:456 suspicious rcu_dereference_protected()
usage!
May  1 14:34:51 shafi-laptop kernel: [   83.216295]
May  1 14:34:51 shafi-laptop kernel: [   83.216296] other info that
might help us debug this:
May  1 14:34:51 shafi-laptop kernel: [   83.216297]
May  1 14:34:51 shafi-laptop kernel: [   83.216299]
May  1 14:34:51 shafi-laptop kernel: [   83.216299]
rcu_scheduler_active = 1, debug_locks = 1
May  1 14:34:51 shafi-laptop kernel: [   83.216302] 2 locks held by kmemleak/52:
May  1 14:34:51 shafi-laptop kernel: [   83.216303]  #0:
(scan_mutex){+.+...}, at: [<c024d343>] kmemleak_scan_thread+0x53/0xc0
May  1 14:34:51 shafi-laptop kernel: [   83.216313]  #1:
(&tid_tx->session_timer){+.-...}, at: [<c0144fdf>]
run_timer_softirq+0xcf/0x400
May  1 14:34:51 shafi-laptop kernel: [   83.216321]
May  1 14:34:51 shafi-laptop kernel: [   83.216322] stack backtrace:
May  1 14:34:51 shafi-laptop kernel: [   83.216325] Pid: 52, comm:
kmemleak Not tainted 3.4.0-rc3-wl #16
May  1 14:34:51 shafi-laptop kernel: [   83.216327] Call Trace:
May  1 14:34:51 shafi-laptop kernel: [   83.216332]  [<c0191726>]
lockdep_rcu_suspicious+0xc6/0x100
May  1 14:34:51 shafi-laptop kernel: [   83.216351]  [<f835b14d>]
sta_tx_agg_session_timer_expired+0xcd/0x100 [mac80211]
May  1 14:34:51 shafi-laptop kernel: [   83.216355]  [<c014506d>]
run_timer_softirq+0x15d/0x400
May  1 14:34:51 shafi-laptop kernel: [   83.216358]  [<c0144fdf>] ?
run_timer_softirq+0xcf/0x400
May  1 14:34:51 shafi-laptop kernel: [   83.216373]  [<f835b080>] ?
sta_addba_resp_timer_expired+0x240/0x240 [mac80211]
May  1 14:34:51 shafi-laptop kernel: [   83.216380]  [<c013d56f>]
__do_softirq+0xaf/0x310
May  1 14:34:51 shafi-laptop kernel: [   83.216384]  [<c013d4c0>] ?
__hrtimer_tasklet_trampoline+0x20/0x20
May  1 14:34:51 shafi-laptop kernel: [   83.216386]  <IRQ>
[<c013cc65>] ? irq_exit+0xb5/0xd0
May  1 14:34:51 shafi-laptop kernel: [   83.216393]  [<c06b74a9>] ?
smp_apic_timer_interrupt+0x59/0x88
May  1 14:34:51 shafi-laptop kernel: [   83.216398]  [<c03d8edc>] ?
trace_hardirqs_off_thunk+0xc/0x10
May  1 14:34:51 shafi-laptop kernel: [   83.216402]  [<c06afc36>] ?
apic_timer_interrupt+0x36/0x40
May  1 14:34:51 shafi-laptop kernel: [   83.216405]  [<c019007b>] ?
rcu_read_lock_sched_held+0x1b/0x70
May  1 14:34:51 shafi-laptop kernel: [   83.216409]  [<c06af58e>] ?
_raw_spin_unlock_irqrestore+0x3e/0x70
May  1 14:34:51 shafi-laptop kernel: [   83.216412]  [<c024bf4b>] ?
scan_gray_list+0x12b/0x180
May  1 14:34:51 shafi-laptop kernel: [   83.216415]  [<c024c815>] ?
kmemleak_scan+0x235/0x8c0
May  1 14:34:51 shafi-laptop kernel: [   83.216418]  [<c024c5e0>] ?
kmemleak_seq_next+0x130/0x130
May  1 14:34:51 shafi-laptop kernel: [   83.216422]  [<c024d2f0>] ?
kmemleak_write+0x450/0x450
May  1 14:34:51 shafi-laptop kernel: [   83.216425]  [<c024d2f0>] ?
kmemleak_write+0x450/0x450
May  1 14:34:51 shafi-laptop kernel: [   83.216428]  [<c024d348>] ?
kmemleak_scan_thread+0x58/0xc0
May  1 14:34:51 shafi-laptop kernel: [   83.216431]  [<c01590a4>] ?
kthread+0x84/0x90
May  1 14:34:51 shafi-laptop kernel: [   83.216435]  [<c0159020>] ?
__init_kthread_worker+0x60/0x60
May  1 14:34:51 shafi-laptop kernel: [   83.216438]  [<c06b7306>] ?
kernel_thread_helper+0x6/0x10
May  1 14:34:54 shafi-laptop kernel: [   86.692061] wlan0: no IPv6
routers present
May  1 14:34:58 shafi-laptop kernel: [   90.691571] cfg80211: Found
new beacon on frequency: 2467 MHz (Ch 12) on phy0

i just looked into the code and compared it with
sta_addba_resp_timer_expired and thought whether we should do
this..

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 5b7053c..e4da258 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -421,21 +421,25 @@ static void
sta_tx_agg_session_timer_expired(unsigned long data)
        struct tid_ampdu_tx *tid_tx;
        unsigned long timeout;

+       rcu_read_lock();
        tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
-       if (!tid_tx)
+       if (!tid_tx) {
+               rcu_read_unlock();
                return;
+       }

        timeout = tid_tx->last_tx + TU_TO_JIFFIES(tid_tx->timeout);
        if (time_is_after_jiffies(timeout)) {
                mod_timer(&tid_tx->session_timer, timeout);
+               rcu_read_unlock();
                return;
        }

 #ifdef CONFIG_MAC80211_HT_DEBUG
        printk(KERN_DEBUG "tx session timer expired on tid %d\n", (u16)*ptid);
 #endif
-
        ieee80211_stop_tx_ba_session(&sta->sta, *ptid);
+       rcu_read_unlock();
 }

 int i



-- 
thanks,
shafi

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-01 14:25           ` Mohammed Shafi
@ 2012-05-01 19:18             ` Larry Finger
  2012-05-02  5:02               ` Mohammed Shafi
  2012-05-02 10:00               ` Catalin Marinas
  0 siblings, 2 replies; 25+ messages in thread
From: Larry Finger @ 2012-05-01 19:18 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: Johannes Berg, wireless, Catalin Marinas

On 05/01/2012 09:25 AM, Mohammed Shafi wrote:
>
> i just looked into the code and compared it with
> sta_addba_resp_timer_expired and thought whether we should do
> this..

Your patch does not help. I still get the following dump in the log:

===============================
[ INFO: suspicious RCU usage. ]
3.4.0-rc5-wl+ #287 Not tainted
-------------------------------
net/mac80211/sta_info.h:456 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
3 locks held by kmemleak/622:
#0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
#1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
run_timer_softirq+0xfa/0x6e0
#2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>] 
sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]

stack backtrace:
Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
Call Trace:
  <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
  [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
  [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
  [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0

As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.

Larry



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-01 19:18             ` Larry Finger
@ 2012-05-02  5:02               ` Mohammed Shafi
  2012-05-02 10:00               ` Catalin Marinas
  1 sibling, 0 replies; 25+ messages in thread
From: Mohammed Shafi @ 2012-05-02  5:02 UTC (permalink / raw)
  To: Larry Finger; +Cc: Johannes Berg, wireless, Catalin Marinas

On Wed, May 2, 2012 at 12:48 AM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> On 05/01/2012 09:25 AM, Mohammed Shafi wrote:
>>
>>
>> i just looked into the code and compared it with
>> sta_addba_resp_timer_expired and thought whether we should do
>> this..
>
>
> Your patch does not help. I still get the following dump in the log:

Larry, thanks for trying it out. then there seems to be something else, which
requires further debugging/understanding.

>
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.4.0-rc5-wl+ #287 Not tainted
> -------------------------------
>
> net/mac80211/sta_info.h:456 suspicious rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 1, debug_locks = 0
> 3 locks held by kmemleak/622:
> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>]
> kmemleak_scan_thread+0x56/0xd0
> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>]
> run_timer_softirq+0xfa/0x6e0
> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>]
> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
>
> stack backtrace:
> Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> Call Trace:
>  <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
>  [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0
> [mac80211]
>  [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
>  [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
>
> As kmemleak seems to be involved, I have added Catalin Marinas to the Cc
> list.
>
> Larry
>
>

-- 
thanks,
shafi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-01 19:18             ` Larry Finger
  2012-05-02  5:02               ` Mohammed Shafi
@ 2012-05-02 10:00               ` Catalin Marinas
  2012-05-02 17:07                 ` Johannes Berg
  1 sibling, 1 reply; 25+ messages in thread
From: Catalin Marinas @ 2012-05-02 10:00 UTC (permalink / raw)
  To: Larry Finger; +Cc: Mohammed Shafi, Johannes Berg, wireless

On Tue, May 01, 2012 at 08:18:54PM +0100, Larry Finger wrote:
> On 05/01/2012 09:25 AM, Mohammed Shafi wrote:
> > i just looked into the code and compared it with
> > sta_addba_resp_timer_expired and thought whether we should do
> > this..
> 
> Your patch does not help. I still get the following dump in the log:
> 
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.4.0-rc5-wl+ #287 Not tainted
> -------------------------------
> net/mac80211/sta_info.h:456 suspicious rcu_dereference_protected() usage!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 3 locks held by kmemleak/622:
> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
> run_timer_softirq+0xfa/0x6e0
> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>] 
> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> 
> stack backtrace:
> Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> Call Trace:
>   <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
>   [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
>   [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
>   [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
> 
> As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.

It doesn't look related to kmemleak. It just happens that
sta_tx_agg_session_timer_expired() was called from a timer when the
kmemleak scanning thread was running (pretty likely as this thread can
take minutes to complete).

Looking at the code and the logs, ieee80211_start_tx_ba_session() calls
rcu_dereference_protected_tid_tx() which calls
rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
kernel log says, none of these locks are held, hence the warning.

Either the rcu_dereference annotation is wrong (introduced by commit
40b275 "mac80211: sparse RCU annotations") or there is something wrong
in the code and the RCU pointer is dereferenced without any of those
locks held.

-- 
Catalin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-02 10:00               ` Catalin Marinas
@ 2012-05-02 17:07                 ` Johannes Berg
  2012-05-02 20:09                   ` Paul E. McKenney
  2012-05-03  3:02                   ` Larry Finger
  0 siblings, 2 replies; 25+ messages in thread
From: Johannes Berg @ 2012-05-02 17:07 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Larry Finger, Mohammed Shafi, wireless, Paul E. McKenney

On Wed, 2012-05-02 at 11:00 +0100, Catalin Marinas wrote:

> > Your patch does not help. I still get the following dump in the log:

The patch is also wrong, we hold the mutex there and can't hold RCU read
lock.

> > #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> > #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
> > run_timer_softirq+0xfa/0x6e0
> > #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>] 
> > sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> > 
> > stack backtrace:
> > Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> > Call Trace:
> >   <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
> >   [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
> >   [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
> >   [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
> > 
> > As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.

> Looking at the code and the logs, ieee80211_start_tx_ba_session() calls

I'm almost certain that ieee80211_start_tx_ba_session() is a bogus
calltrace entry, since we're in a timer and that's not called from a
timer.

> rcu_dereference_protected_tid_tx() which calls
> rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
> lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
> kernel log says, none of these locks are held, hence the warning.

So does that just mean we need to add rcu_read_lock_held() to the
conditions? I thought that wasn't necessary? +Paul.

johannes


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-02 17:07                 ` Johannes Berg
@ 2012-05-02 20:09                   ` Paul E. McKenney
  2012-05-03 18:38                     ` Johannes Berg
  2012-05-03  3:02                   ` Larry Finger
  1 sibling, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2012-05-02 20:09 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Catalin Marinas, Larry Finger, Mohammed Shafi, wireless

On Wed, May 02, 2012 at 07:07:51PM +0200, Johannes Berg wrote:
> On Wed, 2012-05-02 at 11:00 +0100, Catalin Marinas wrote:
> 
> > > Your patch does not help. I still get the following dump in the log:
> 
> The patch is also wrong, we hold the mutex there and can't hold RCU read
> lock.
> 
> > > #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> > > #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
> > > run_timer_softirq+0xfa/0x6e0
> > > #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>] 
> > > sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> > > 
> > > stack backtrace:
> > > Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> > > Call Trace:
> > >   <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
> > >   [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
> > >   [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
> > >   [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
> > > 
> > > As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.
> 
> > Looking at the code and the logs, ieee80211_start_tx_ba_session() calls
> 
> I'm almost certain that ieee80211_start_tx_ba_session() is a bogus
> calltrace entry, since we're in a timer and that's not called from a
> timer.
> 
> > rcu_dereference_protected_tid_tx() which calls
> > rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
> > lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
> > kernel log says, none of these locks are held, hence the warning.
> 
> So does that just mean we need to add rcu_read_lock_held() to the
> conditions? I thought that wasn't necessary? +Paul.

If you are using rcu_dereference_protected(), you really would need
to add rcu_read_lock_held().  Except that it is not legal to use
rcu_dereference_protected() within an RCU read-side critical section
because rcu_dereference_protected() does nothing to protect against
misordering mischief from the compiler and the CPU.  Actually, that
sounds like a useful coccinelle check, now that you mention it.

So you should instead use rcu_dereference_check().  This may be used
in an RCU read-side critical section, but may also be passed things like
(lockdep_is_held(&sta->lock) || lockdep_is_held(&sta->ampdu_mlme.mtx)).

							Thanx, Paul


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-02 17:07                 ` Johannes Berg
  2012-05-02 20:09                   ` Paul E. McKenney
@ 2012-05-03  3:02                   ` Larry Finger
  2012-05-03  8:47                     ` Catalin Marinas
  1 sibling, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-05-03  3:02 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Catalin Marinas, Mohammed Shafi, wireless, Paul E. McKenney

On 05/02/2012 12:07 PM, Johannes Berg wrote:
> On Wed, 2012-05-02 at 11:00 +0100, Catalin Marinas wrote:
>
>>> Your patch does not help. I still get the following dump in the log:
>
> The patch is also wrong, we hold the mutex there and can't hold RCU read
> lock.
>
>>> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
>>> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>]
>>> run_timer_softirq+0xfa/0x6e0
>>> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>]
>>> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
>>>
>>> stack backtrace:
>>> Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
>>> Call Trace:
>>>    <IRQ>   [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
>>>    [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
>>>    [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
>>>    [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
>>>
>>> As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.
>
>> Looking at the code and the logs, ieee80211_start_tx_ba_session() calls
>
> I'm almost certain that ieee80211_start_tx_ba_session() is a bogus
> calltrace entry, since we're in a timer and that's not called from a
> timer.
>
>> rcu_dereference_protected_tid_tx() which calls
>> rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
>> lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
>> kernel log says, none of these locks are held, hence the warning.
>
> So does that just mean we need to add rcu_read_lock_held() to the
> conditions? I thought that wasn't necessary? +Paul.

If you meant the following

Index: wireless-testing-new/net/mac80211/sta_info.h
===================================================================
--- wireless-testing-new.orig/net/mac80211/sta_info.h
+++ wireless-testing-new/net/mac80211/sta_info.h
@@ -453,7 +453,8 @@ rcu_dereference_protected_tid_tx(struct
  {
         return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
                                          lockdep_is_held(&sta->lock) ||
-                                        lockdep_is_held(&sta->ampdu_mlme.mtx));
+                                        lockdep_is_held(&sta->ampdu_mlme.mtx) ||
+                                        rcu_read_lock_held());
  }

  #define STA_HASH_SIZE 256

It does not help.

Larry


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03  3:02                   ` Larry Finger
@ 2012-05-03  8:47                     ` Catalin Marinas
  2012-05-03 16:54                       ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Catalin Marinas @ 2012-05-03  8:47 UTC (permalink / raw)
  To: Larry Finger; +Cc: Johannes Berg, Mohammed Shafi, wireless, Paul E. McKenney

On Thu, May 03, 2012 at 04:02:12AM +0100, Larry Finger wrote:
> On 05/02/2012 12:07 PM, Johannes Berg wrote:
> > On Wed, 2012-05-02 at 11:00 +0100, Catalin Marinas wrote:
> >
> >>> Your patch does not help. I still get the following dump in the log:
> >
> > The patch is also wrong, we hold the mutex there and can't hold RCU read
> > lock.
> >
> >>> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> >>> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>]
> >>> run_timer_softirq+0xfa/0x6e0
> >>> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>]
> >>> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> >>>
> >>> stack backtrace:
> >>> Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> >>> Call Trace:
> >>>    <IRQ>   [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
> >>>    [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
> >>>    [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
> >>>    [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
> >>>
> >>> As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.
> >
> >> Looking at the code and the logs, ieee80211_start_tx_ba_session() calls
> >
> > I'm almost certain that ieee80211_start_tx_ba_session() is a bogus
> > calltrace entry, since we're in a timer and that's not called from a
> > timer.
> >
> >> rcu_dereference_protected_tid_tx() which calls
> >> rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
> >> lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
> >> kernel log says, none of these locks are held, hence the warning.
> >
> > So does that just mean we need to add rcu_read_lock_held() to the
> > conditions? I thought that wasn't necessary? +Paul.
> 
> If you meant the following
> 
> Index: wireless-testing-new/net/mac80211/sta_info.h
> ===================================================================
> --- wireless-testing-new.orig/net/mac80211/sta_info.h
> +++ wireless-testing-new/net/mac80211/sta_info.h
> @@ -453,7 +453,8 @@ rcu_dereference_protected_tid_tx(struct
>   {
>          return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],

IIUC, Paul suggested that you should use rcu_dereference_check() here
instead as the protected one is not safe in this context.

-- 
Catalin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03  8:47                     ` Catalin Marinas
@ 2012-05-03 16:54                       ` Larry Finger
  2012-05-03 17:12                         ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-05-03 16:54 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Johannes Berg, Mohammed Shafi, wireless, Paul E. McKenney

On 05/03/2012 03:47 AM, Catalin Marinas wrote:
>
> IIUC, Paul suggested that you should use rcu_dereference_check() here
> instead as the protected one is not safe in this context.

This patch also fails to fix the problem. Did I do what Paul suggested?


Index: wireless-testing-new/net/mac80211/sta_info.h
===================================================================
--- wireless-testing-new.orig/net/mac80211/sta_info.h
+++ wireless-testing-new/net/mac80211/sta_info.h
@@ -452,7 +452,7 @@ void ieee80211_assign_tid_tx(struct sta_
  static inline struct tid_ampdu_tx *
  rcu_dereference_protected_tid_tx(struct sta_info *sta, int tid)
  {
-       return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
+       return rcu_dereference_check(sta->ampdu_mlme.tid_tx[tid],
                                          lockdep_is_held(&sta->lock) ||
                                          lockdep_is_held(&sta->ampdu_mlme.mtx));
  }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03 16:54                       ` Larry Finger
@ 2012-05-03 17:12                         ` Paul E. McKenney
  2012-05-03 17:46                           ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2012-05-03 17:12 UTC (permalink / raw)
  To: Larry Finger; +Cc: Catalin Marinas, Johannes Berg, Mohammed Shafi, wireless

On Thu, May 03, 2012 at 11:54:29AM -0500, Larry Finger wrote:
> On 05/03/2012 03:47 AM, Catalin Marinas wrote:
> >
> >IIUC, Paul suggested that you should use rcu_dereference_check() here
> >instead as the protected one is not safe in this context.
> 
> This patch also fails to fix the problem. Did I do what Paul suggested?

That is indeed what I suggested!

What locks does lockdep say are held?

							Thanx, Paul

> Index: wireless-testing-new/net/mac80211/sta_info.h
> ===================================================================
> --- wireless-testing-new.orig/net/mac80211/sta_info.h
> +++ wireless-testing-new/net/mac80211/sta_info.h
> @@ -452,7 +452,7 @@ void ieee80211_assign_tid_tx(struct sta_
>  static inline struct tid_ampdu_tx *
>  rcu_dereference_protected_tid_tx(struct sta_info *sta, int tid)
>  {
> -       return rcu_dereference_protected(sta->ampdu_mlme.tid_tx[tid],
> +       return rcu_dereference_check(sta->ampdu_mlme.tid_tx[tid],
>                                          lockdep_is_held(&sta->lock) ||
>                                          lockdep_is_held(&sta->ampdu_mlme.mtx));
>  }
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03 17:12                         ` Paul E. McKenney
@ 2012-05-03 17:46                           ` Larry Finger
  2012-05-03 18:22                             ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-05-03 17:46 UTC (permalink / raw)
  To: paulmck; +Cc: Catalin Marinas, Johannes Berg, Mohammed Shafi, wireless

On 05/03/2012 12:12 PM, Paul E. McKenney wrote:
> On Thu, May 03, 2012 at 11:54:29AM -0500, Larry Finger wrote:
>> On 05/03/2012 03:47 AM, Catalin Marinas wrote:
>>>
>>> IIUC, Paul suggested that you should use rcu_dereference_check() here
>>> instead as the protected one is not safe in this context.
>>
>> This patch also fails to fix the problem. Did I do what Paul suggested?
>
> That is indeed what I suggested!
>
> What locks does lockdep say are held?

It varies from instance to instance. I have seen 1, 2, or 3. Those are the 
following:

#0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
#1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
run_timer_softirq+0xfa/0x6e0
#2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa03c1ff0>] 
sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]


When only 1 lock is held, it is "&tid_tx->session_timer", and that one is held 
in every case. I think that means we need to OR it with the other 
lockdep_is_held() arguments in the rcu_dereference_xxxxx() call, but I did not 
seem to get the syntax right.

Larry

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03 17:46                           ` Larry Finger
@ 2012-05-03 18:22                             ` Paul E. McKenney
  2012-05-03 18:32                               ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2012-05-03 18:22 UTC (permalink / raw)
  To: Larry Finger; +Cc: Catalin Marinas, Johannes Berg, Mohammed Shafi, wireless

On Thu, May 03, 2012 at 12:46:02PM -0500, Larry Finger wrote:
> On 05/03/2012 12:12 PM, Paul E. McKenney wrote:
> >On Thu, May 03, 2012 at 11:54:29AM -0500, Larry Finger wrote:
> >>On 05/03/2012 03:47 AM, Catalin Marinas wrote:
> >>>
> >>>IIUC, Paul suggested that you should use rcu_dereference_check() here
> >>>instead as the protected one is not safe in this context.
> >>
> >>This patch also fails to fix the problem. Did I do what Paul suggested?
> >
> >That is indeed what I suggested!
> >
> >What locks does lockdep say are held?
> 
> It varies from instance to instance. I have seen 1, 2, or 3. Those
> are the following:
> 
> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>]
> run_timer_softirq+0xfa/0x6e0
> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa03c1ff0>]
> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> 
> 
> When only 1 lock is held, it is "&tid_tx->session_timer", and that
> one is held in every case. I think that means we need to OR it with
> the other lockdep_is_held() arguments in the rcu_dereference_xxxxx()
> call, but I did not seem to get the syntax right.

If any one of several locks (call them a, b, and c) is sufficient to
protect the data, and if an rcu_read_lock() also suffices, then something
like the following would work:

	return rcu_dereference_check(sta->ampdu_mlme.tid_tx[tid],
				     lockdep_is_held(&a) ||
				     lockdep_is_held(&b) ||
				     lockdep_is_held(&c));

But I must defer to the developers and maintainers of that code
as to exactly which combinations of locks and RCU are required.
The rcu_read_lock_held() is supplied by rcu_dereference_check(), so I
am surprised that you got a splat that included rcu_read_lock.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03 18:22                             ` Paul E. McKenney
@ 2012-05-03 18:32                               ` Larry Finger
  0 siblings, 0 replies; 25+ messages in thread
From: Larry Finger @ 2012-05-03 18:32 UTC (permalink / raw)
  To: paulmck; +Cc: Catalin Marinas, Johannes Berg, Mohammed Shafi, wireless

On 05/03/2012 01:22 PM, Paul E. McKenney wrote:
> On Thu, May 03, 2012 at 12:46:02PM -0500, Larry Finger wrote:
>> On 05/03/2012 12:12 PM, Paul E. McKenney wrote:
>>> On Thu, May 03, 2012 at 11:54:29AM -0500, Larry Finger wrote:
>>>> On 05/03/2012 03:47 AM, Catalin Marinas wrote:
>>>>>
>>>>> IIUC, Paul suggested that you should use rcu_dereference_check() here
>>>>> instead as the protected one is not safe in this context.
>>>>
>>>> This patch also fails to fix the problem. Did I do what Paul suggested?
>>>
>>> That is indeed what I suggested!
>>>
>>> What locks does lockdep say are held?
>>
>> It varies from instance to instance. I have seen 1, 2, or 3. Those
>> are the following:
>>
>> #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
>> #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>]
>> run_timer_softirq+0xfa/0x6e0
>> #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa03c1ff0>]
>> sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
>>
>>
>> When only 1 lock is held, it is "&tid_tx->session_timer", and that
>> one is held in every case. I think that means we need to OR it with
>> the other lockdep_is_held() arguments in the rcu_dereference_xxxxx()
>> call, but I did not seem to get the syntax right.
>
> If any one of several locks (call them a, b, and c) is sufficient to
> protect the data, and if an rcu_read_lock() also suffices, then something
> like the following would work:
>
> 	return rcu_dereference_check(sta->ampdu_mlme.tid_tx[tid],
> 				     lockdep_is_held(&a) ||
> 				     lockdep_is_held(&b) ||
> 				     lockdep_is_held(&c));
>
> But I must defer to the developers and maintainers of that code
> as to exactly which combinations of locks and RCU are required.
> The rcu_read_lock_held() is supplied by rcu_dereference_check(), so I
> am surprised that you got a splat that included rcu_read_lock.

That was before I switched to rcu_dereference_check(). Sorry if that misled you.

Larry


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-02 20:09                   ` Paul E. McKenney
@ 2012-05-03 18:38                     ` Johannes Berg
  2012-05-04  6:17                       ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Johannes Berg @ 2012-05-03 18:38 UTC (permalink / raw)
  To: paulmck; +Cc: Catalin Marinas, Larry Finger, Mohammed Shafi, wireless

On Wed, 2012-05-02 at 13:09 -0700, Paul E. McKenney wrote:
> On Wed, May 02, 2012 at 07:07:51PM +0200, Johannes Berg wrote:
> > On Wed, 2012-05-02 at 11:00 +0100, Catalin Marinas wrote:
> > 
> > > > Your patch does not help. I still get the following dump in the log:
> > 
> > The patch is also wrong, we hold the mutex there and can't hold RCU read
> > lock.
> > 
> > > > #0:  (scan_mutex){+.+...}, at: [<ffffffff8113b0d6>] kmemleak_scan_thread+0x56/0xd0
> > > > #1:  (&tid_tx->session_timer){+.-...}, at: [<ffffffff8104853a>] 
> > > > run_timer_softirq+0xfa/0x6e0
> > > > #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0449ff0>] 
> > > > sta_tx_agg_session_timer_expired+0x0/0x2a0 [mac80211]
> > > > 
> > > > stack backtrace:
> > > > Pid: 622, comm: kmemleak Not tainted 3.4.0-rc5-wl+ #287
> > > > Call Trace:
> > > >   <IRQ>  [<ffffffff8109309d>] lockdep_rcu_suspicious+0xfd/0x130
> > > >   [<ffffffffa044a1cf>] sta_tx_agg_session_timer_expired+0x1df/0x2a0 [mac80211]
> > > >   [<ffffffffa0449ff0>] ? ieee80211_start_tx_ba_session+0x450/0x450 [mac80211]
> > > >   [<ffffffff810485c5>] run_timer_softirq+0x185/0x6e0
> > > > 
> > > > As kmemleak seems to be involved, I have added Catalin Marinas to the Cc list.
> > 
> > > Looking at the code and the logs, ieee80211_start_tx_ba_session() calls
> > 
> > I'm almost certain that ieee80211_start_tx_ba_session() is a bogus
> > calltrace entry, since we're in a timer and that's not called from a
> > timer.
> > 
> > > rcu_dereference_protected_tid_tx() which calls
> > > rcu_dereference_protected() with the (lockdep_is_held(&sta->lock) ||
> > > lockdep_is_held(&sta->ampdu_mlme.mtx)) condition which is false. As the
> > > kernel log says, none of these locks are held, hence the warning.
> > 
> > So does that just mean we need to add rcu_read_lock_held() to the
> > conditions? I thought that wasn't necessary? +Paul.
> 
> If you are using rcu_dereference_protected(), you really would need
> to add rcu_read_lock_held().  Except that it is not legal to use
> rcu_dereference_protected() within an RCU read-side critical section
> because rcu_dereference_protected() does nothing to protect against
> misordering mischief from the compiler and the CPU.  Actually, that
> sounds like a useful coccinelle check, now that you mention it.
> 
> So you should instead use rcu_dereference_check().  This may be used
> in an RCU read-side critical section, but may also be passed things like
> (lockdep_is_held(&sta->lock) || lockdep_is_held(&sta->ampdu_mlme.mtx)).

Oops, of course, I should have seen that, thanks for pointing it out!

Now that I look at it, the problem actually seems to be something else
entirely though: the rcu_dereference (whichever type) in
sta_tx_agg_session_timer_expired() itself isn't protected at all,
something like the patch below is needed. Larry, can you try that?

johannes

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 5b7053c..40d3ff4 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -421,16 +421,22 @@ static void sta_tx_agg_session_timer_expired(unsigned long data)
 	struct tid_ampdu_tx *tid_tx;
 	unsigned long timeout;
 
-	tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
-	if (!tid_tx)
+	rcu_read_lock();
+	tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);
+	if (!tid_tx) {
+		rcu_read_unlock();
 		return;
+	}
 
 	timeout = tid_tx->last_tx + TU_TO_JIFFIES(tid_tx->timeout);
 	if (time_is_after_jiffies(timeout)) {
 		mod_timer(&tid_tx->session_timer, timeout);
+		rcu_read_unlock();
 		return;
 	}
 
+	rcu_read_unlock();
+
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "tx session timer expired on tid %d\n", (u16)*ptid);
 #endif



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-03 18:38                     ` Johannes Berg
@ 2012-05-04  6:17                       ` Larry Finger
  2012-05-04  6:40                         ` Mohammed Shafi
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-05-04  6:17 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Catalin Marinas, Mohammed Shafi, wireless

On 05/03/2012 01:38 PM, Johannes Berg wrote:
> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
> index 5b7053c..40d3ff4 100644
> --- a/net/mac80211/agg-tx.c
> +++ b/net/mac80211/agg-tx.c
> @@ -421,16 +421,22 @@ static void sta_tx_agg_session_timer_expired(unsigned long data)
>   	struct tid_ampdu_tx *tid_tx;
>   	unsigned long timeout;
>
> -	tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
> -	if (!tid_tx)
> +	rcu_read_lock();
> +	tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);
> +	if (!tid_tx) {
> +		rcu_read_unlock();
>   		return;
> +	}
>
>   	timeout = tid_tx->last_tx + TU_TO_JIFFIES(tid_tx->timeout);
>   	if (time_is_after_jiffies(timeout)) {
>   		mod_timer(&tid_tx->session_timer, timeout);
> +		rcu_read_unlock();
>   		return;
>   	}
>
> +	rcu_read_unlock();
> +
>   #ifdef CONFIG_MAC80211_HT_DEBUG
>   	printk(KERN_DEBUG "tx session timer expired on tid %d\n", (u16)*ptid);
>   #endif

This patch is the same as was proposed by Mohammed on May 1. I must have messed 
up the testing as I thought it failed; however, it is now working.

Mohammed - I plan on submitting this to John with a "From:" from you along with 
your s-o-b. Is that OK. It will be submitted for inclusion in 3.5.

Larry


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-04  6:17                       ` Larry Finger
@ 2012-05-04  6:40                         ` Mohammed Shafi
  2012-05-04  6:48                           ` Mohammed Shafi
  0 siblings, 1 reply; 25+ messages in thread
From: Mohammed Shafi @ 2012-05-04  6:40 UTC (permalink / raw)
  To: Larry Finger; +Cc: Johannes Berg, Catalin Marinas, wireless

Hi Larry,

On Fri, May 4, 2012 at 11:47 AM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> On 05/03/2012 01:38 PM, Johannes Berg wrote:
>>
>> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
>> index 5b7053c..40d3ff4 100644
>> --- a/net/mac80211/agg-tx.c
>> +++ b/net/mac80211/agg-tx.c
>> @@ -421,16 +421,22 @@ static void
>> sta_tx_agg_session_timer_expired(unsigned long data)
>>        struct tid_ampdu_tx *tid_tx;
>>        unsigned long timeout;
>>
>> -       tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
>> -       if (!tid_tx)
>> +       rcu_read_lock();
>> +       tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);
>> +       if (!tid_tx) {
>> +               rcu_read_unlock();
>>                return;
>> +       }
>>
>>        timeout = tid_tx->last_tx + TU_TO_JIFFIES(tid_tx->timeout);
>>        if (time_is_after_jiffies(timeout)) {
>>                mod_timer(&tid_tx->session_timer, timeout);
>> +               rcu_read_unlock();
>>                return;
>>        }
>>
>> +       rcu_read_unlock();
>> +
>>  #ifdef CONFIG_MAC80211_HT_DEBUG
>>        printk(KERN_DEBUG "tx session timer expired on tid %d\n",
>> (u16)*ptid);
>>  #endif
>
>
> This patch is the same as was proposed by Mohammed on May 1. I must have
> messed up the testing as I thought it failed; however, it is now working.

honestly i am not sure about the technical details, but Johannes patch
seems to follow almost similar
what's is done in 'sta_addba_resp_timer_expired'. while thought that
is being also with mutex under hold
but the difference is  the lock is unhold before
ieee80211_stop_tx_ba_session is called.

>
> Mohammed - I plan on submitting this to John with a "From:" from you along
> with your s-o-b. Is that OK. It will be submitted for inclusion in 3.5.
>
> Larry
>


-- 
thanks,
shafi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-04  6:40                         ` Mohammed Shafi
@ 2012-05-04  6:48                           ` Mohammed Shafi
  2012-05-04 13:45                             ` Larry Finger
  0 siblings, 1 reply; 25+ messages in thread
From: Mohammed Shafi @ 2012-05-04  6:48 UTC (permalink / raw)
  To: Larry Finger; +Cc: Johannes Berg, Catalin Marinas, wireless

On Fri, May 4, 2012 at 12:10 PM, Mohammed Shafi
<shafi.wireless@gmail.com> wrote:
> Hi Larry,
>
> On Fri, May 4, 2012 at 11:47 AM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>> On 05/03/2012 01:38 PM, Johannes Berg wrote:
>>>
>>> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
>>> index 5b7053c..40d3ff4 100644
>>> --- a/net/mac80211/agg-tx.c
>>> +++ b/net/mac80211/agg-tx.c
>>> @@ -421,16 +421,22 @@ static void
>>> sta_tx_agg_session_timer_expired(unsigned long data)
>>>        struct tid_ampdu_tx *tid_tx;
>>>        unsigned long timeout;
>>>
>>> -       tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
>>> -       if (!tid_tx)
>>> +       rcu_read_lock();
>>> +       tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);

Larry, Johannes seems to use rcu_dereference straight way just like in
'sta_addba_resp_timer_expired'
and have it protected by rcu_read_locks

>>> +       if (!tid_tx) {
>>> +               rcu_read_unlock();
>>>                return;
>>> +       }
>>>
>>>        timeout = tid_tx->last_tx + TU_TO_JIFFIES(tid_tx->timeout);
>>>        if (time_is_after_jiffies(timeout)) {
>>>                mod_timer(&tid_tx->session_timer, timeout);
>>> +               rcu_read_unlock();
>>>                return;
>>>        }
>>>
>>> +       rcu_read_unlock();
>>> +
>>>  #ifdef CONFIG_MAC80211_HT_DEBUG
>>>        printk(KERN_DEBUG "tx session timer expired on tid %d\n",
>>> (u16)*ptid);
>>>  #endif
>>
>>
>> This patch is the same as was proposed by Mohammed on May 1. I must have
>> messed up the testing as I thought it failed; however, it is now working.
>
> honestly i am not sure about the technical details, but Johannes patch
> seems to follow almost similar
> what's is done in 'sta_addba_resp_timer_expired'. while thought that
> is being also with mutex under hold
> but the difference is  the lock is unhold before
> ieee80211_stop_tx_ba_session is called.
>
>>
>> Mohammed - I plan on submitting this to John with a "From:" from you along
>> with your s-o-b. Is that OK. It will be submitted for inclusion in 3.5.
>>
>> Larry
>>
>
>
> --
> thanks,
> shafi



-- 
thanks,
shafi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-04  6:48                           ` Mohammed Shafi
@ 2012-05-04 13:45                             ` Larry Finger
  2012-05-04 14:35                               ` Mohammed Shafi
  0 siblings, 1 reply; 25+ messages in thread
From: Larry Finger @ 2012-05-04 13:45 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: Johannes Berg, Catalin Marinas, wireless

On 05/04/2012 01:48 AM, Mohammed Shafi wrote:
> On Fri, May 4, 2012 at 12:10 PM, Mohammed Shafi
> <shafi.wireless@gmail.com>  wrote:
>> Hi Larry,
>>
>> On Fri, May 4, 2012 at 11:47 AM, Larry Finger<Larry.Finger@lwfinger.net>  wrote:
>>> On 05/03/2012 01:38 PM, Johannes Berg wrote:
>>>>
>>>> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
>>>> index 5b7053c..40d3ff4 100644
>>>> --- a/net/mac80211/agg-tx.c
>>>> +++ b/net/mac80211/agg-tx.c
>>>> @@ -421,16 +421,22 @@ static void
>>>> sta_tx_agg_session_timer_expired(unsigned long data)
>>>>         struct tid_ampdu_tx *tid_tx;
>>>>         unsigned long timeout;
>>>>
>>>> -       tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
>>>> -       if (!tid_tx)
>>>> +       rcu_read_lock();
>>>> +       tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);
>
> Larry, Johannes seems to use rcu_dereference straight way just like in
> 'sta_addba_resp_timer_expired'
> and have it protected by rcu_read_locks

Sorry. I missed that subtle difference. Submitted "from Johannes".

Larry




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Suspicious RCU usage in mac80211
  2012-05-04 13:45                             ` Larry Finger
@ 2012-05-04 14:35                               ` Mohammed Shafi
  0 siblings, 0 replies; 25+ messages in thread
From: Mohammed Shafi @ 2012-05-04 14:35 UTC (permalink / raw)
  To: Larry Finger; +Cc: Johannes Berg, Catalin Marinas, wireless

On Fri, May 4, 2012 at 7:15 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> On 05/04/2012 01:48 AM, Mohammed Shafi wrote:
>>
>> On Fri, May 4, 2012 at 12:10 PM, Mohammed Shafi
>> <shafi.wireless@gmail.com>  wrote:
>>>
>>> Hi Larry,
>>>
>>> On Fri, May 4, 2012 at 11:47 AM, Larry Finger<Larry.Finger@lwfinger.net>
>>>  wrote:
>>>>
>>>> On 05/03/2012 01:38 PM, Johannes Berg wrote:
>>>>>
>>>>>
>>>>> diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
>>>>> index 5b7053c..40d3ff4 100644
>>>>> --- a/net/mac80211/agg-tx.c
>>>>> +++ b/net/mac80211/agg-tx.c
>>>>> @@ -421,16 +421,22 @@ static void
>>>>> sta_tx_agg_session_timer_expired(unsigned long data)
>>>>>        struct tid_ampdu_tx *tid_tx;
>>>>>        unsigned long timeout;
>>>>>
>>>>> -       tid_tx = rcu_dereference_protected_tid_tx(sta, *ptid);
>>>>> -       if (!tid_tx)
>>>>> +       rcu_read_lock();
>>>>> +       tid_tx = rcu_dereference(sta->ampdu_mlme.tid_tx[*ptid]);
>>
>>
>> Larry, Johannes seems to use rcu_dereference straight way just like in
>> 'sta_addba_resp_timer_expired'
>> and have it protected by rcu_read_locks
>
>
> Sorry. I missed that subtle difference. Submitted "from Johannes".

no problem thanks! yeah but it seems my gmail seems to be identify those
changes!!

-- 
thanks,
shafi

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-05-04 14:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-10  3:19 Suspicious RCU usage in mac80211 Larry Finger
2012-04-12  3:31 ` Johannes Berg
2012-04-12  3:51   ` Larry Finger
2012-04-12  3:54     ` Johannes Berg
2012-04-12 15:51       ` Larry Finger
2012-04-12 15:55         ` Johannes Berg
2012-05-01 14:25           ` Mohammed Shafi
2012-05-01 19:18             ` Larry Finger
2012-05-02  5:02               ` Mohammed Shafi
2012-05-02 10:00               ` Catalin Marinas
2012-05-02 17:07                 ` Johannes Berg
2012-05-02 20:09                   ` Paul E. McKenney
2012-05-03 18:38                     ` Johannes Berg
2012-05-04  6:17                       ` Larry Finger
2012-05-04  6:40                         ` Mohammed Shafi
2012-05-04  6:48                           ` Mohammed Shafi
2012-05-04 13:45                             ` Larry Finger
2012-05-04 14:35                               ` Mohammed Shafi
2012-05-03  3:02                   ` Larry Finger
2012-05-03  8:47                     ` Catalin Marinas
2012-05-03 16:54                       ` Larry Finger
2012-05-03 17:12                         ` Paul E. McKenney
2012-05-03 17:46                           ` Larry Finger
2012-05-03 18:22                             ` Paul E. McKenney
2012-05-03 18:32                               ` Larry Finger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.