* regression: soft lockup with ath9k on master-2010-04-14
From: Kalle Valo @ 2010-04-15 19:44 UTC
To: linux-wireless
Hello,
I just updated my laptop to the latest wireless-testing and it soft-locks up
every time, a few seconds after association.
I haven't updated wireless-testing for a few days, so I can't say when
this bug was introduced.
Info about my setup:
samsung x120
ubuntu 10.04 x86-64
11n ap running openwrt and ath9k, wpa2 psk
more info about chipset:
[ 18.617974] ath9k 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 18.617989] ath9k 0000:02:00.0: setting latency timer to 64
[ 18.667465] ath: EEPROM regdomain: 0x65
[ 18.667468] ath: EEPROM indicates we should expect a direct regpair map
[ 18.667473] ath: Country alpha2 being used: 00
[ 18.667475] ath: Regpair used: 0x65
[ 18.840350] phy0: Selected rate control algorithm 'ath9k_rate_control'
[ 18.841346] Registered led device: ath9k-phy0::radio
[ 18.841373] Registered led device: ath9k-phy0::assoc
[ 18.841400] Registered led device: ath9k-phy0::tx
[ 18.841425] Registered led device: ath9k-phy0::rx
[ 18.841442] phy0: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0: mem=0xffffc90021e20000, irq=16
I will debug more tomorrow.
--
Kalle Valo
* Re: regression: soft lockup with ath9k on master-2010-04-14
From: John W. Linville @ 2010-04-15 19:58 UTC
To: Kalle Valo; +Cc: linux-wireless
On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote:
> Hello,
>
> I just updated my laptop to the latest wireless-testing and it soft-locks up
> every time, a few seconds after association.
>
> I haven't updated wireless-testing for a few days, so I can't say when
> this bug was introduced.
It might be useful to do a bisect. If you choose to do that, you
might want to use wireless-next-2.6 instead, since that doesn't have
the occasional pulls from Linus that make bisecting wireless-testing
more painful.
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
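As background for the bisect suggestion above: bisecting is a binary search over the commits between a known-good and a known-bad revision. The following is only a minimal userspace sketch of that idea, not how git implements it. The names bisect_first_bad, is_bad_fn and example_is_bad are hypothetical, and the sketch assumes a linear history in which every commit after the first bad one is also bad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: true if the commit at this index shows the bug. */
typedef bool (*is_bad_fn)(size_t index);

/*
 * Return the index of the first bad commit, given that commit `good` is
 * known good, commit `bad` is known bad, and good < bad. Each iteration
 * tests the midpoint and halves the remaining range.
 */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;   /* first bad commit is at mid or earlier */
        else
            good = mid;  /* first bad commit is after mid */
    }
    return bad;
}

/* Illustrative predicate only: pretend commits 37 and later are bad. */
bool example_is_bad(size_t index)
{
    return index >= 37;
}
```

With about a hundred candidate commits this needs roughly seven test builds, which is why a tree with fewer unrelated merges makes the process less painful.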
* Re: regression: soft lockup with ath9k on master-2010-04-14
From: Kalle Valo @ 2010-04-19 6:29 UTC
To: John W. Linville; +Cc: linux-wireless, johannes
"John W. Linville" <linville@tuxdriver.com> writes:
> On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote:
>> Hello,
>>
>> I just updated my laptop to the latest wireless-testing and it soft-locks up
>> every time, a few seconds after association.
>>
>> I haven't updated wireless-testing for a few days, so I can't say when
>> this bug was introduced.
>
> It might be useful to do a bisect. If you choose to do that, you
> might want to use wireless-next-2.6 instead, since that doesn't have
> the occasional pulls from Linus that make bisecting wireless-testing
> more painful.
Thanks for the tip, it helped a lot. My new laptop is really slow to
compile kernels :/
I bisected it finally and found the culprit:
66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit
commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Apr 6 11:18:45 2010 +0200

    mac80211: remove ieee80211_sta_stop_rx_ba_session

    All callers of ieee80211_sta_stop_rx_ba_session can
    just call __ieee80211_stop_rx_ba_session instead
    because they already have the station struct, so do
    that and remove ieee80211_sta_stop_rx_ba_session.

    Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
I verified that reverting these three patches makes my laptop stable
again:
54297e4d60b74e602138594c131097347d128b5a mac80211: fix some RX aggregation...
098a607091426e79178b9a6c318d993fea131791 mac80211: clean up/fix aggregation..
66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae mac80211: remove ieee80211_sta_...
(I had to revert all three because of conflicts.)
I took a quick peek at the patches but I wasn't able to immediately
say what was wrong. This just made me suspicious:
- ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr,
- (u16)*ptid, WLAN_BACK_TIMER,
- WLAN_REASON_QSTA_TIMEOUT);
+ __ieee80211_stop_rx_ba_session(sta, *ptid,
+ WLAN_BACK_RECIPIENT,
+ WLAN_REASON_QSTA_TIMEOUT);
WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know
if it was on purpose or not. Johannes, any ideas?
--
Kalle Valo
* Re: regression: soft lockup with ath9k on master-2010-04-14
From: Johannes Berg @ 2010-04-19 6:43 UTC
To: Kalle Valo; +Cc: John W. Linville, linux-wireless
On Mon, 2010-04-19 at 09:29 +0300, Kalle Valo wrote:
> > It might be useful to do a bisect. If you choose to do that, you
> > might want to use wireless-next-2.6 instead, since that doesn't have
> > the occasional pulls from Linus that make bisecting wireless-testing
> > more painful.
>
> Thanks for the tip, it helped a lot. My new laptop is really slow to
> compile kernels :/
>
> I bisected it finally and found the culprit:
>
> 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit
> commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae
> Author: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue Apr 6 11:18:45 2010 +0200
>
> mac80211: remove ieee80211_sta_stop_rx_ba_session
[...]
> I took a quick peek at the patches but I wasn't able to immediately
> say what was wrong. This just made me suspicious:
>
> - ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr,
> - (u16)*ptid, WLAN_BACK_TIMER,
> - WLAN_REASON_QSTA_TIMEOUT);
> + __ieee80211_stop_rx_ba_session(sta, *ptid,
> + WLAN_BACK_RECIPIENT,
> + WLAN_REASON_QSTA_TIMEOUT);
>
> WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know
> if it was on purpose or not. Johannes, any ideas?
That was on purpose, but it belongs in
098a607091426e79178b9a6c318d993fea131791, not this patch ... :(
However, that shouldn't be the problem. Or rather, that could be the
reason you're seeing the problem on this patch rather than on the 098a
one.
Try the patch below?
johannes
--- wireless-testing.orig/net/mac80211/agg-rx.c	2010-04-19 08:40:17.000000000 +0200
+++ wireless-testing/net/mac80211/agg-rx.c	2010-04-19 08:40:27.000000000 +0200
@@ -47,11 +47,6 @@ void __ieee80211_stop_rx_ba_session(stru
 		printk(KERN_DEBUG "HW problem - can not stop rx "
 				"aggregation for tid %d\n", tid);
 
-	/* check if this is a self generated aggregation halt */
-	if (initiator == WLAN_BACK_RECIPIENT)
-		ieee80211_send_delba(sta->sdata, sta->sta.addr,
-				     tid, 0, reason);
-
 	/* free the reordering buffer */
 	for (i = 0; i < tid_rx->buf_size; i++) {
 		if (tid_rx->reorder_buf[i]) {
@@ -69,6 +64,11 @@ void __ieee80211_stop_rx_ba_session(stru
 
 	spin_unlock_bh(&sta->lock);
 
+	/* check if this is a self generated aggregation halt */
+	if (initiator == WLAN_BACK_RECIPIENT)
+		ieee80211_send_delba(sta->sdata, sta->sta.addr,
+				     tid, 0, reason);
+
 	del_timer_sync(&tid_rx->session_timer);
 	kfree(tid_rx);
 }
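For readers following the WLAN_BACK_TIMER question: the check moved by this test patch sends a DELBA frame only when the initiator is WLAN_BACK_RECIPIENT, so a timer-expiry path that passes WLAN_BACK_RECIPIENT instead of WLAN_BACK_TIMER now reaches the DELBA send as well. The sketch below models just that gating in userspace C. The enum values, struct rx_session and stop_rx_session are hypothetical stand-ins, not the mac80211 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names only; the real constants live in the kernel headers. */
enum back_party {
    BACK_RECIPIENT,
    BACK_INITIATOR,
    BACK_TIMER,
};

/* Toy session state: tracks whether it is active and how many DELBAs went out. */
struct rx_session {
    int active;
    int delba_sent;
};

/* Stop the session; notify the peer only for a self-generated halt. */
void stop_rx_session(struct rx_session *s, enum back_party initiator)
{
    assert(s != NULL);
    if (!s->active)
        return;
    s->active = 0;
    /* Mirrors the "self generated aggregation halt" check in the patch. */
    if (initiator == BACK_RECIPIENT)
        s->delba_sent++;
}
```

Under this model, stopping with BACK_TIMER tears the session down silently, while stopping with BACK_RECIPIENT also counts one DELBA.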
* Re: regression: soft lockup with ath9k on master-2010-04-14
From: Kalle Valo @ 2010-04-19 7:19 UTC
To: Johannes Berg; +Cc: John W. Linville, linux-wireless
Johannes Berg <johannes@sipsolutions.net> writes:
> However that shouldn't be the problem. Or rather, that could be the
> reason you're seeing the problem on this patch, rather than the 098a
> one.
>
> Try the patch below?
Sorry, no luck. My laptop still freezes.
--
Kalle Valo
* [PATCH] mac80211: fix stopping RX BA session from timer
From: Johannes Berg @ 2010-04-19 9:00 UTC
To: Kalle Valo; +Cc: John W. Linville, linux-wireless
Kalle reported that his system deadlocks since my
recent work in this area. The reason quickly became
apparent: we try to del_timer_sync() a timer
from within itself. Fix that by making the function
aware of the context it is called from.
Reported-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
Please apply after Kalle confirms it fixes the problem.
net/mac80211/agg-rx.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
--- wireless-testing.orig/net/mac80211/agg-rx.c	2010-04-19 10:49:25.000000000 +0200
+++ wireless-testing/net/mac80211/agg-rx.c	2010-04-19 10:51:57.000000000 +0200
@@ -18,8 +18,9 @@
 #include "ieee80211_i.h"
 #include "driver-ops.h"
 
-void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
-				    u16 initiator, u16 reason)
+static void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
+					    u16 initiator, u16 reason,
+					    bool from_timer)
 {
 	struct ieee80211_local *local = sta->local;
 	struct tid_ampdu_rx *tid_rx;
@@ -69,10 +70,17 @@ void __ieee80211_stop_rx_ba_session(stru
 
 	spin_unlock_bh(&sta->lock);
 
-	del_timer_sync(&tid_rx->session_timer);
+	if (!from_timer)
+		del_timer_sync(&tid_rx->session_timer);
 	kfree(tid_rx);
 }
 
+void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
+				    u16 initiator, u16 reason)
+{
+	___ieee80211_stop_rx_ba_session(sta, tid, initiator, reason, false);
+}
+
 /*
  * After accepting the AddBA Request we activated a timer,
  * resetting it after each frame that arrives from the originator.
@@ -91,8 +99,8 @@ static void sta_rx_agg_session_timer_exp
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "rx session timer expired on tid %d\n", (u16)*ptid);
 #endif
-	__ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT,
-				       WLAN_REASON_QSTA_TIMEOUT);
+	___ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT,
+					WLAN_REASON_QSTA_TIMEOUT, true);
 }
 
 static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *da, u16 tid,
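The deadlock this patch addresses can be illustrated outside the kernel. del_timer_sync() waits until a running handler for that timer has finished, so calling it from inside the handler waits on itself. The following is a toy userspace model under stated assumptions, not kernel code: struct toy_timer and the toy_* functions are hypothetical, and the model reports the self-wait with a return value instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel timer; not the real struct timer_list. */
struct toy_timer {
    bool pending;          /* timer is armed */
    bool handler_running;  /* expiry handler is currently executing */
};

/*
 * Toy model of a synchronous timer delete. The real call waits for a
 * running handler to finish; from inside that handler the wait can never
 * end, so this model returns false there instead of spinning forever.
 */
bool toy_del_timer_sync(struct toy_timer *t)
{
    assert(t != NULL);
    if (t->handler_running)
        return false;  /* would wait on itself */
    t->pending = false;
    return true;
}

/* Context-aware stop: skip the synchronous delete when called from the timer. */
bool toy_stop_session(struct toy_timer *t, bool from_timer)
{
    if (!from_timer)
        return toy_del_timer_sync(t);
    return true;
}

/*
 * Simulate expiry: the handler runs and stops its own session. With
 * context_aware == false it behaves like the code before the patch.
 */
bool toy_timer_expire(struct toy_timer *t, bool context_aware)
{
    bool ok;

    t->pending = false;
    t->handler_running = true;
    ok = toy_stop_session(t, context_aware);
    t->handler_running = false;
    return ok;
}
```

In this model toy_timer_expire(&t, false) reports the self-wait, while toy_timer_expire(&t, true) completes; callers outside the timer still get the synchronous delete, matching the from_timer split in the patch.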
* Re: [PATCH] mac80211: fix stopping RX BA session from timer
From: Kalle Valo @ 2010-04-19 11:22 UTC
To: Johannes Berg; +Cc: John W. Linville, linux-wireless
Johannes Berg <johannes@sipsolutions.net> writes:
> Kalle reported that his system deadlocks since my
> recent work in this area. The reason quickly became
> apparent: we try to del_timer_sync() a timer
> from within itself. Fix that by making the function
> aware of the context it is called from.
I have now tested this for an hour and I can't reproduce the problem
anymore; earlier I was able to reproduce it within a minute or
so. So I'm confident that the problem I saw is fixed by this
patch.
Thank you very much for fixing this so quickly.
> Reported-by: Kalle Valo <kvalo@adurom.com>
> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Kalle Valo <kvalo@adurom.com>
--
Kalle Valo