linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* regression: soft lockup with ath9k on master-2010-04-14
@ 2010-04-15 19:44 Kalle Valo
  2010-04-15 19:58 ` John W. Linville
  0 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2010-04-15 19:44 UTC (permalink / raw)
  To: linux-wireless

Hello,

I just updated my laptop to latest wireless-testing and it everytime
soft lockups few seconds after association.

I haven't updated wireless-testing for few days, so I can't say when
this bug was introduced.

Info about my setup:

samsung x120
ubuntu 10.04 x86-64
11n ap running openwrt and ath9k, wpa2 psk

more info about chipset:

[   18.617974] ath9k 0000:02:00.0: PCI INT A -> GSI 16 (level, low) ->
IRQ 16
[   18.617989] ath9k 0000:02:00.0: setting latency timer to 64
[   18.667465] ath: EEPROM regdomain: 0x65
[   18.667468] ath: EEPROM indicates we should expect a direct regpair
map
[   18.667473] ath: Country alpha2 being used: 00
[   18.667475] ath: Regpair used: 0x65
[   18.840350] phy0: Selected rate control algorithm
'ath9k_rate_control'
[   18.841346] Registered led device: ath9k-phy0::radio
[   18.841373] Registered led device: ath9k-phy0::assoc
[   18.841400] Registered led device: ath9k-phy0::tx
[   18.841425] Registered led device: ath9k-phy0::rx
[   18.841442] phy0: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0:
mem=0xffffc90021e20000, irq=16

I will debug more tomorrow.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: regression: soft lockup with ath9k on master-2010-04-14
  2010-04-15 19:44 regression: soft lockup with ath9k on master-2010-04-14 Kalle Valo
@ 2010-04-15 19:58 ` John W. Linville
  2010-04-19  6:29   ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: John W. Linville @ 2010-04-15 19:58 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless

On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote:
> Hello,
> 
> I just updated my laptop to latest wireless-testing and it everytime
> soft lockups few seconds after association.
> 
> I haven't updated wireless-testing for few days, so I can't say when
> this bug was introduced.

It might be useful to do a bisect.  If you choose to do that, you
might want to use wireless-next-2.6 instead, since that doesn't have
the occasional pulls from Linus that make bisecting wireless-testing
more painful.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: regression: soft lockup with ath9k on master-2010-04-14
  2010-04-15 19:58 ` John W. Linville
@ 2010-04-19  6:29   ` Kalle Valo
  2010-04-19  6:43     ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2010-04-19  6:29 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, johannes

"John W. Linville" <linville@tuxdriver.com> writes:

> On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote:
>> Hello,
>> 
>> I just updated my laptop to latest wireless-testing and it everytime
>> soft lockups few seconds after association.
>> 
>> I haven't updated wireless-testing for few days, so I can't say when
>> this bug was introduced.
>
> It might be useful to do a bisect.  If you choose to do that, you
> might want to use wireless-next-2.6 instead, since that doesn't have
> the occasional pulls from Linus that make bisecting wireless-testing
> more painful.

Thanks for the tip, it helped a lot. My new laptop is really slow to
compile kernels :/

I bisected it finally and found the culprit:

66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit
commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Apr 6 11:18:45 2010 +0200

    mac80211: remove ieee80211_sta_stop_rx_ba_session
    
    All callers of ieee80211_sta_stop_rx_ba_session can
    just call __ieee80211_stop_rx_ba_session instead
    because they already have the station struct, so do
    that and remove ieee80211_sta_stop_rx_ba_session.
    
    Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

I verified that reverting these three patches make my laptop stable
again:

54297e4d60b74e602138594c131097347d128b5a mac80211: fix some RX aggregation...
098a607091426e79178b9a6c318d993fea131791 mac80211: clean up/fix aggregation..
66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae mac80211: remove ieee80211_sta_...

(I had to revert all three because of conflicts.)

I took a quick peek of the patches but I wasn't able to immediately
say what was wrong. This just made me suspicious:

-       ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr,
-                                        (u16)*ptid, WLAN_BACK_TIMER,
-                                        WLAN_REASON_QSTA_TIMEOUT);
+       __ieee80211_stop_rx_ba_session(sta, *ptid,
-                                        WLAN_BACK_RECIPIENT,
+                                      WLAN_REASON_QSTA_TIMEOUT);

WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know
if it was in purpose or not. Johannes, any ideas?

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: regression: soft lockup with ath9k on master-2010-04-14
  2010-04-19  6:29   ` Kalle Valo
@ 2010-04-19  6:43     ` Johannes Berg
  2010-04-19  7:19       ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2010-04-19  6:43 UTC (permalink / raw)
  To: Kalle Valo; +Cc: John W. Linville, linux-wireless

On Mon, 2010-04-19 at 09:29 +0300, Kalle Valo wrote:

> > It might be useful to do a bisect.  If you choose to do that, you
> > might want to use wireless-next-2.6 instead, since that doesn't have
> > the occasional pulls from Linus that make bisecting wireless-testing
> > more painful.
> 
> Thanks for the tip, it helped a lot. My new laptop is really slow to
> compile kernels :/
> 
> I bisected it finally and found the culprit:
> 
> 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit
> commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae
> Author: Johannes Berg <johannes@sipsolutions.net>
> Date:   Tue Apr 6 11:18:45 2010 +0200
> 
>     mac80211: remove ieee80211_sta_stop_rx_ba_session

[...]

> I took a quick peek of the patches but I wasn't able to immediately
> say what was wrong. This just made me suspicious:
> 
> -       ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr,
> -                                        (u16)*ptid, WLAN_BACK_TIMER,
> -                                        WLAN_REASON_QSTA_TIMEOUT);
> +       __ieee80211_stop_rx_ba_session(sta, *ptid,
> -                                        WLAN_BACK_RECIPIENT,
> +                                      WLAN_REASON_QSTA_TIMEOUT);
> 
> WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know
> if it was in purpose or not. Johannes, any ideas?

That was on purpose but belongs into
098a607091426e79178b9a6c318d993fea131791 not this patch ... :(

However that shouldn't be the problem. Or rather, that could be the
reason you're seeing the problem on this patch, rather than the 098a
one.

Try the patch below?

johannes

--- wireless-testing.orig/net/mac80211/agg-rx.c	2010-04-19 08:40:17.000000000 +0200
+++ wireless-testing/net/mac80211/agg-rx.c	2010-04-19 08:40:27.000000000 +0200
@@ -47,11 +47,6 @@ void __ieee80211_stop_rx_ba_session(stru
 		printk(KERN_DEBUG "HW problem - can not stop rx "
 				"aggregation for tid %d\n", tid);
 
-	/* check if this is a self generated aggregation halt */
-	if (initiator == WLAN_BACK_RECIPIENT)
-		ieee80211_send_delba(sta->sdata, sta->sta.addr,
-				     tid, 0, reason);
-
 	/* free the reordering buffer */
 	for (i = 0; i < tid_rx->buf_size; i++) {
 		if (tid_rx->reorder_buf[i]) {
@@ -69,6 +64,11 @@ void __ieee80211_stop_rx_ba_session(stru
 
 	spin_unlock_bh(&sta->lock);
 
+	/* check if this is a self generated aggregation halt */
+	if (initiator == WLAN_BACK_RECIPIENT)
+		ieee80211_send_delba(sta->sdata, sta->sta.addr,
+				     tid, 0, reason);
+
 	del_timer_sync(&tid_rx->session_timer);
 	kfree(tid_rx);
 }



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: regression: soft lockup with ath9k on master-2010-04-14
  2010-04-19  6:43     ` Johannes Berg
@ 2010-04-19  7:19       ` Kalle Valo
  2010-04-19  9:00         ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2010-04-19  7:19 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John W. Linville, linux-wireless

Johannes Berg <johannes@sipsolutions.net> writes:

> However that shouldn't be the problem. Or rather, that could be the
> reason you're seeing the problem on this patch, rather than the 098a
> one.
>
> Try the patch below?

Sorry, no luck. Still my laptop freezes.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] mac80211: fix stopping RX BA session from timer
  2010-04-19  7:19       ` Kalle Valo
@ 2010-04-19  9:00         ` Johannes Berg
  2010-04-19 11:22           ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2010-04-19  9:00 UTC (permalink / raw)
  To: Kalle Valo; +Cc: John W. Linville, linux-wireless

Kalle reported that his system deadlocks since my
recent work in this area. The reason quickly became
apparent: we try to cancel_timer_sync() a timer
from within itself. Fix that by making the function
aware of the context it is called from.

Reported-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
Please apply after Kalle confirms it fixes the problem.

 net/mac80211/agg-rx.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

--- wireless-testing.orig/net/mac80211/agg-rx.c	2010-04-19 10:49:25.000000000 +0200
+++ wireless-testing/net/mac80211/agg-rx.c	2010-04-19 10:51:57.000000000 +0200
@@ -18,8 +18,9 @@
 #include "ieee80211_i.h"
 #include "driver-ops.h"
 
-void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
-				    u16 initiator, u16 reason)
+static void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
+					    u16 initiator, u16 reason,
+					    bool from_timer)
 {
 	struct ieee80211_local *local = sta->local;
 	struct tid_ampdu_rx *tid_rx;
@@ -69,10 +70,17 @@ void __ieee80211_stop_rx_ba_session(stru
 
 	spin_unlock_bh(&sta->lock);
 
-	del_timer_sync(&tid_rx->session_timer);
+	if (!from_timer)
+		del_timer_sync(&tid_rx->session_timer);
 	kfree(tid_rx);
 }
 
+void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
+				    u16 initiator, u16 reason)
+{
+	___ieee80211_stop_rx_ba_session(sta, tid, initiator, reason, false);
+}
+
 /*
  * After accepting the AddBA Request we activated a timer,
  * resetting it after each frame that arrives from the originator.
@@ -91,8 +99,8 @@ static void sta_rx_agg_session_timer_exp
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "rx session timer expired on tid %d\n", (u16)*ptid);
 #endif
-	__ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT,
-				       WLAN_REASON_QSTA_TIMEOUT);
+	___ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT,
+					WLAN_REASON_QSTA_TIMEOUT, true);
 }
 
 static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *da, u16 tid,



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mac80211: fix stopping RX BA session from timer
  2010-04-19  9:00         ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg
@ 2010-04-19 11:22           ` Kalle Valo
  0 siblings, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2010-04-19 11:22 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John W. Linville, linux-wireless

Johannes Berg <johannes@sipsolutions.net> writes:

> Kalle reported that his system deadlocks since my
> recent work in this area. The reason quickly became
> apparent: we try to cancel_timer_sync() a timer
> from within itself. Fix that by making the function
> aware of the context it is called from.

I have now tested this for an hour and I can't reproduce the problem
anymore, earlier I was able to reproduce at least within a minute or
so. So I'm confident that the problem I saw is this fixed by this
patch.

Thank you very much for fixing this so quickly.

> Reported-by: Kalle Valo <kvalo@adurom.com>
> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>

Tested-by: Kalle Valo <kvalo@adurom.com>

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-04-19 11:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-15 19:44 regression: soft lockup with ath9k on master-2010-04-14 Kalle Valo
2010-04-15 19:58 ` John W. Linville
2010-04-19  6:29   ` Kalle Valo
2010-04-19  6:43     ` Johannes Berg
2010-04-19  7:19       ` Kalle Valo
2010-04-19  9:00         ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg
2010-04-19 11:22           ` Kalle Valo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).