linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
@ 2019-10-21 12:11 Krzysztof Hałasa
  2019-10-21 12:18 ` [PATCH] " Krzysztof Hałasa
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-21 12:11 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Johannes,

it seems I've encountered a bug in mac80211 RX aggregation handler.
The hw is a pair of stations using AR9580 (PCI ID 168c:0033) PCIe
adapters. Linux 5.4-rc4.
The driver shows the chip is Atheros AR9300 Rev:4.
I'm using (on both ends):
	iw wlan0 set type ibss
	ip link set wlan0 up
	iw dev wlan0 ibss join $ESSID $FREQ HT20

The problem manifests itself after one of the stations is restarted
(or the ath9k driver is reloaded, or a station is out of range for
some time etc).
It appears that the mac80211 RX aggregation code sets a new aggregation
"session" at the remote station's request, but the head_seq_num
(the sequence number the receiver expects to receive) isn't reset.
I've added some debugging code to ___ieee80211_start_rx_ba_session()
and ieee80211_sta_manage_reorder_buf() and it produced the following:

Both stations boot and join the IBSS, packets get through:
[   61.123131] AGG RX OK: ssn 1
[   61.125346] SEQ OK: 1 vs 1
[   61.125484] SEQ OK: 2 vs 2
[   62.100841] SEQ OK: 3 vs 3
...
[  180.124210] SEQ OK: 130 vs 130
[  181.123888] SEQ OK: 131 vs 131
[  182.126046] SEQ OK: 132 vs 132

Now I'm rebooting the remote station. It joins IBSS, packets can be seen
on mon0 monitoring interface (on the local station), but they aren't
arriving on wlan0:

[  192.131102] SEQ BAD: 0 vs 133
[  192.151243] AGG RX no change - OK: ssn 1
[  192.242760] SEQ BAD: 1 vs 133
[  193.133819] SEQ BAD: 2 vs 133
[  193.272802] SEQ BAD: 3 vs 133
...
[  421.272374] SEQ BAD: 130 vs 133
[  421.303630] SEQ BAD: 131 vs 133
[  422.327924] SEQ BAD: 132 vs 133

Then the sequence number catches up and the communication is
reestablished:
[  423.167023] SEQ OK: 133 vs 133
[  423.169061] SEQ OK: 134 vs 134
[  423.351618] SEQ OK: 135 vs 135

I'll attach a patch in a separate mail but I'm not sure if it's
the optimal fix - one packet (the "SEQ BAD: 0 vs 133) is still dropped,
and I guess it won't work if the sender decides to not request
aggregation anymore.

Comments?

The debugging code:
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -354,9 +354,10 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 			 */
 			rcu_read_lock();
 			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
-			if (tid_rx && tid_rx->timeout == timeout)
+			if (tid_rx && tid_rx->timeout == timeout) {
 				status = WLAN_STATUS_SUCCESS;
-			else
+				printk(KERN_DEBUG "AGG RX no change - OK: ssn %u\n", start_seq_num);
+			} else
 				status = WLAN_STATUS_REQUEST_DECLINED;
 			rcu_read_unlock();
 			goto end;
@@ -434,6 +437,7 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 	tid_agg_rx->tid = tid;
 	tid_agg_rx->sta = sta;
 	status = WLAN_STATUS_SUCCESS;
+	printk(KERN_DEBUG "AGG RX OK: ssn %u\n", start_seq_num);
 
 	/* activate it for RX */
 	rcu_assign_pointer(sta->ampdu_mlme.tid_rx[tid], tid_agg_rx);
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1298,9 +1298,11 @@ static bool ieee80211_sta_manage_reorder_buf(struct ieee80211_sub_if_data *sdata
 
 	/* frame with out of date sequence number */
 	if (ieee80211_sn_less(mpdu_seq_num, head_seq_num)) {
+		printk(KERN_DEBUG "SEQ BAD: %u vs %u\n", mpdu_seq_num, head_seq_num);
 		dev_kfree_skb(skb);
 		goto out;
-	}
+	} else
+		printk(KERN_DEBUG "SEQ OK: %u vs %u\n", mpdu_seq_num, head_seq_num);
 
 	/*
 	 * If frame the sequence number exceeds our buffering window

-- 
Krzysztof Hałasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-21 12:11 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot Krzysztof Hałasa
  2019-10-21 12:18 ` [PATCH] " Krzysztof Hałasa
@ 2019-10-21 12:18 ` Krzysztof Hałasa
  2019-10-25 10:21 ` [PATCH v2] " Krzysztof Hałasa
  2 siblings, 0 replies; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-21 12:18 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Fix a bug where the mac80211 RX aggregation code sets a new aggregation
"session" at the remote station's request, but the head_seq_num
(the sequence number the receiver expects to receive) isn't reset.

Spotted on a pair of AR9580 in IBSS mode.

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 4d1c335e06e5..775a51cc51c9 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -354,9 +354,11 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 			 */
 			rcu_read_lock();
 			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
-			if (tid_rx && tid_rx->timeout == timeout)
+			if (tid_rx && tid_rx->timeout == timeout) {
+				tid_rx->ssn = start_seq_num;
+				tid_rx->head_seq_num = start_seq_num;
 				status = WLAN_STATUS_SUCCESS;
-			else
+			} else
 				status = WLAN_STATUS_REQUEST_DECLINED;
 			rcu_read_unlock();
 			goto end;

-- 
Krzysztof Hałasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-21 12:11 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot Krzysztof Hałasa
@ 2019-10-21 12:18 ` Krzysztof Hałasa
  2019-10-22  9:42   ` Sergei Shtylyov
  2019-10-21 12:18 ` Krzysztof Hałasa
  2019-10-25 10:21 ` [PATCH v2] " Krzysztof Hałasa
  2 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-21 12:18 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Fix a bug where the mac80211 RX aggregation code sets a new aggregation
"session" at the remote station's request, but the head_seq_num
(the sequence number the receiver expects to receive) isn't reset.

Spotted on a pair of AR9580 in IBSS mode.

Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 4d1c335e06e5..775a51cc51c9 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -354,9 +354,11 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 			 */
 			rcu_read_lock();
 			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
-			if (tid_rx && tid_rx->timeout == timeout)
+			if (tid_rx && tid_rx->timeout == timeout) {
+				tid_rx->ssn = start_seq_num;
+				tid_rx->head_seq_num = start_seq_num;
 				status = WLAN_STATUS_SUCCESS;
-			else
+			} else
 				status = WLAN_STATUS_REQUEST_DECLINED;
 			rcu_read_unlock();
 			goto end;

-- 
Krzysztof Hałasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-21 12:18 ` [PATCH] " Krzysztof Hałasa
@ 2019-10-22  9:42   ` Sergei Shtylyov
  0 siblings, 0 replies; 15+ messages in thread
From: Sergei Shtylyov @ 2019-10-22  9:42 UTC (permalink / raw)
  To: Krzysztof Hałasa, Johannes Berg
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Hello!

On 21.10.2019 15:18, Krzysztof Hałasa wrote:

> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
> "session" at the remote station's request, but the head_seq_num
> (the sequence number the receiver expects to receive) isn't reset.
> 
> Spotted on a pair of AR9580 in IBSS mode.
> 
> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
> 
> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
> index 4d1c335e06e5..775a51cc51c9 100644
> --- a/net/mac80211/agg-rx.c
> +++ b/net/mac80211/agg-rx.c
> @@ -354,9 +354,11 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>   			 */
>   			rcu_read_lock();
>   			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
> -			if (tid_rx && tid_rx->timeout == timeout)
> +			if (tid_rx && tid_rx->timeout == timeout) {
> +				tid_rx->ssn = start_seq_num;
> +				tid_rx->head_seq_num = start_seq_num;
>   				status = WLAN_STATUS_SUCCESS;
> -			else
> +			} else

    If you add {} on one branch of *if*, you also need to add {} to all other 
branches, says CodingStyle...

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-21 12:11 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot Krzysztof Hałasa
  2019-10-21 12:18 ` [PATCH] " Krzysztof Hałasa
  2019-10-21 12:18 ` Krzysztof Hałasa
@ 2019-10-25 10:21 ` Krzysztof Hałasa
  2019-10-28 12:21   ` Johannes Berg
  2 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-25 10:21 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Fix a bug where the mac80211 RX aggregation code sets a new aggregation
"session" at the remote station's request, but the head_seq_num
(the sequence number the receiver expects to receive) isn't reset.

Spotted on a pair of AR9580 in IBSS mode.

Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 4d1c335e06e5..67733bd61297 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 			 */
 			rcu_read_lock();
 			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
-			if (tid_rx && tid_rx->timeout == timeout)
+			if (tid_rx && tid_rx->timeout == timeout) {
+				tid_rx->ssn = start_seq_num;
+				tid_rx->head_seq_num = start_seq_num;
 				status = WLAN_STATUS_SUCCESS;
-			else
+			} else {
 				status = WLAN_STATUS_REQUEST_DECLINED;
+			}
 			rcu_read_unlock();
 			goto end;
 		}

-- 
Krzysztof Hałasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-25 10:21 ` [PATCH v2] " Krzysztof Hałasa
@ 2019-10-28 12:21   ` Johannes Berg
  2019-10-29  8:41     ` Koen Vandeputte
  2019-10-29  8:54     ` Krzysztof Hałasa
  0 siblings, 2 replies; 15+ messages in thread
From: Johannes Berg @ 2019-10-28 12:21 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
> "session" at the remote station's request, but the head_seq_num
> (the sequence number the receiver expects to receive) isn't reset.
> 
> Spotted on a pair of AR9580 in IBSS mode.
> 
> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
> 
> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
> index 4d1c335e06e5..67733bd61297 100644
> --- a/net/mac80211/agg-rx.c
> +++ b/net/mac80211/agg-rx.c
> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>  			 */
>  			rcu_read_lock();
>  			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
> -			if (tid_rx && tid_rx->timeout == timeout)
> +			if (tid_rx && tid_rx->timeout == timeout) {
> +				tid_rx->ssn = start_seq_num;
> +				tid_rx->head_seq_num = start_seq_num;
>  				status = WLAN_STATUS_SUCCESS;

This is wrong, this is the case of *updating an existing session*, we
must not reset the head SN then.

I think you just got very lucky (or unlucky) to have the same dialog
token, because we start from 0 - maybe we should initialize it to a
random value to flush out such issues.

Really what I think probably happened is that one of your stations lost
the connection to the other, and didn't tell it about it in any way - so
the other kept all the status alive.

I suspect to make all this work well we need to not only have the fixes
I made recently to actually send and parse deauth frames, but also to
even send an auth and reset the state when we receive that, so if we
move out of range and even the deauth frame is lost, we can still reset
properly.

In any case, this is not the right approach - we need to handle the
"lost connection" case better I suspect, but since you don't say what
really happened I don't really know that that's what you're seeing.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-28 12:21   ` Johannes Berg
@ 2019-10-29  8:41     ` Koen Vandeputte
  2019-10-29  8:58       ` Sebastian Gottschall
  2019-10-29  9:03       ` Johannes Berg
  2019-10-29  8:54     ` Krzysztof Hałasa
  1 sibling, 2 replies; 15+ messages in thread
From: Koen Vandeputte @ 2019-10-29  8:41 UTC (permalink / raw)
  To: Johannes Berg, Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel


On 28.10.19 13:21, Johannes Berg wrote:
> On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
>> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
>> "session" at the remote station's request, but the head_seq_num
>> (the sequence number the receiver expects to receive) isn't reset.
>>
>> Spotted on a pair of AR9580 in IBSS mode.
>>
>> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
>>
>> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
>> index 4d1c335e06e5..67733bd61297 100644
>> --- a/net/mac80211/agg-rx.c
>> +++ b/net/mac80211/agg-rx.c
>> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>>   			 */
>>   			rcu_read_lock();
>>   			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
>> -			if (tid_rx && tid_rx->timeout == timeout)
>> +			if (tid_rx && tid_rx->timeout == timeout) {
>> +				tid_rx->ssn = start_seq_num;
>> +				tid_rx->head_seq_num = start_seq_num;
>>   				status = WLAN_STATUS_SUCCESS;
> This is wrong, this is the case of *updating an existing session*, we
> must not reset the head SN then.
>
> I think you just got very lucky (or unlucky) to have the same dialog
> token, because we start from 0 - maybe we should initialize it to a
> random value to flush out such issues.
>
> Really what I think probably happened is that one of your stations lost
> the connection to the other, and didn't tell it about it in any way - so
> the other kept all the status alive.
>
> I suspect to make all this work well we need to not only have the fixes
> I made recently to actually send and parse deauth frames, but also to
> even send an auth and reset the state when we receive that, so if we
> move out of range and even the deauth frame is lost, we can still reset
> properly.
>
> In any case, this is not the right approach - we need to handle the
> "lost connection" case better I suspect, but since you don't say what
> really happened I don't really know that that's what you're seeing.
>
> johannes

Hi all,

I can confirm the issue as I'm also seeing this sometimes in the field here.

Sometimes when a devices goes out of range and then re-enters,
the link refuses to "come up", as in rx looks to be "stuck" without any 
reports in system log or locking issues (lockdep enabled)

I have dozens of devices installed offshore (802.11n based), both on 
static and moving assets,
which cover from short (250m) up to very long distances (~35km)

So .. while there is some momentum for this issue,
I'm more than happy to provide extensive testing should fixes be posted 
regarding IBSS in general.

Regards,

Koen


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-28 12:21   ` Johannes Berg
  2019-10-29  8:41     ` Koen Vandeputte
@ 2019-10-29  8:54     ` Krzysztof Hałasa
  2019-10-29  9:07       ` Johannes Berg
  1 sibling, 1 reply; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-29  8:54 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Johannes Berg <johannes@sipsolutions.net> writes:

> I think you just got very lucky (or unlucky) to have the same dialog
> token, because we start from 0

Right, it seems to be the case.

> - maybe we should initialize it to a
> random value to flush out such issues.

The problem I can see is that the dialog_tokens are 8-bit, way too small
to eliminate conflicts.

> Really what I think probably happened is that one of your stations lost
> the connection to the other, and didn't tell it about it in any way - so
> the other kept all the status alive.

You must have missed my previous mail - I simply rebooted that station,
and alternatively rmmoded/modprobed ath9k. But the problem originated in
a station going out of and back in range, in fact.

> I suspect to make all this work well we need to not only have the fixes
> I made recently to actually send and parse deauth frames, but also to
> even send an auth and reset the state when we receive that, so if we
> move out of range and even the deauth frame is lost, we can still reset
> properly.

That's one thing. The other is a station trying ADDBA for the first time
after boot (while the local station has seen it before that reboot).

> In any case, this is not the right approach - we need to handle the
> "lost connection" case better I suspect, but since you don't say what
> really happened I don't really know that that's what you're seeing.

I guess we need to identify "new connection" reliably. Otherwise,
the new connections are treated as old ones and it doesn't work.

Now how can it be fixed?
-- 
Krzysztof Halasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  8:41     ` Koen Vandeputte
@ 2019-10-29  8:58       ` Sebastian Gottschall
  2019-10-29  9:40         ` Koen Vandeputte
  2019-10-29  9:03       ` Johannes Berg
  1 sibling, 1 reply; 15+ messages in thread
From: Sebastian Gottschall @ 2019-10-29  8:58 UTC (permalink / raw)
  To: Koen Vandeputte, Johannes Berg, Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

35 km? for 802.11n with ht40 this is out of the ack timing range the 
chipset supports. so this should be considered at any troubles with 
connections

Am 29.10.2019 um 09:41 schrieb Koen Vandeputte:
>
> On 28.10.19 13:21, Johannes Berg wrote:
>> On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
>>> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
>>> "session" at the remote station's request, but the head_seq_num
>>> (the sequence number the receiver expects to receive) isn't reset.
>>>
>>> Spotted on a pair of AR9580 in IBSS mode.
>>>
>>> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
>>>
>>> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
>>> index 4d1c335e06e5..67733bd61297 100644
>>> --- a/net/mac80211/agg-rx.c
>>> +++ b/net/mac80211/agg-rx.c
>>> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct 
>>> sta_info *sta,
>>>                */
>>>               rcu_read_lock();
>>>               tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
>>> -            if (tid_rx && tid_rx->timeout == timeout)
>>> +            if (tid_rx && tid_rx->timeout == timeout) {
>>> +                tid_rx->ssn = start_seq_num;
>>> +                tid_rx->head_seq_num = start_seq_num;
>>>                   status = WLAN_STATUS_SUCCESS;
>> This is wrong, this is the case of *updating an existing session*, we
>> must not reset the head SN then.
>>
>> I think you just got very lucky (or unlucky) to have the same dialog
>> token, because we start from 0 - maybe we should initialize it to a
>> random value to flush out such issues.
>>
>> Really what I think probably happened is that one of your stations lost
>> the connection to the other, and didn't tell it about it in any way - so
>> the other kept all the status alive.
>>
>> I suspect to make all this work well we need to not only have the fixes
>> I made recently to actually send and parse deauth frames, but also to
>> even send an auth and reset the state when we receive that, so if we
>> move out of range and even the deauth frame is lost, we can still reset
>> properly.
>>
>> In any case, this is not the right approach - we need to handle the
>> "lost connection" case better I suspect, but since you don't say what
>> really happened I don't really know that that's what you're seeing.
>>
>> johannes
>
> Hi all,
>
> I can confirm the issue as I'm also seeing this sometimes in the field 
> here.
>
> Sometimes when a devices goes out of range and then re-enters,
> the link refuses to "come up", as in rx looks to be "stuck" without 
> any reports in system log or locking issues (lockdep enabled)
>
> I have dozens of devices installed offshore (802.11n based), both on 
> static and moving assets,
> which cover from short (250m) up to very long distances (~35km)
>
> So .. while there is some momentum for this issue,
> I'm more than happy to provide extensive testing should fixes be 
> posted regarding IBSS in general.
>
> Regards,
>
> Koen
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  8:41     ` Koen Vandeputte
  2019-10-29  8:58       ` Sebastian Gottschall
@ 2019-10-29  9:03       ` Johannes Berg
  2019-10-29  9:47         ` Koen Vandeputte
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2019-10-29  9:03 UTC (permalink / raw)
  To: Koen Vandeputte, Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

On Tue, 2019-10-29 at 09:41 +0100, Koen Vandeputte wrote:

> I can confirm the issue as I'm also seeing this sometimes in the field here.
> 
> Sometimes when a devices goes out of range and then re-enters,
> the link refuses to "come up", as in rx looks to be "stuck" without any 
> reports in system log or locking issues (lockdep enabled)

Right. I've recently debugged this due to issues in distributed
beaconing (rather than moving in/out of range), but I guess it would be
relatively simple to reproduce this with wmediumd, if that can be
controlled dynamically?

What kernel are you running? You could check if you have

95697f9907bf ("mac80211: accept deauth frames in IBSS mode")
4b08d1b6a994 ("mac80211: IBSS: send deauth when expiring inactive STAs")

which might help somewhat, but don't fully cover the case of moving out
of range.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  8:54     ` Krzysztof Hałasa
@ 2019-10-29  9:07       ` Johannes Berg
  2019-10-29 10:51         ` Krzysztof Hałasa
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2019-10-29  9:07 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

On Tue, 2019-10-29 at 09:54 +0100, Krzysztof Hałasa wrote:

> The problem I can see is that the dialog_tokens are 8-bit, way too small
> to eliminate conflicts.

Well, they're also per station, we could just randomize the start and
then we'd delete the old session and start a new one, on the receiver.

So that would improve robustness somewhat (down to a 1/256 chance to hit
this problem).

> > Really what I think probably happened is that one of your stations lost
> > the connection to the other, and didn't tell it about it in any way - so
> > the other kept all the status alive.
> 
> You must have missed my previous mail - I simply rebooted that station,
> and alternatively rmmoded/modprobed ath9k. But the problem originated in
> a station going out of and back in range, in fact.

I was on vacation, so yeah, quite possible I missed it.

Sounds like we need not just
4b08d1b6a994 ("mac80211: IBSS: send deauth when expiring inactive STAs")

but also send a deauth when we disconnect. Surprising we don't do that,
actually.

> > I suspect to make all this work well we need to not only have the fixes
> > I made recently to actually send and parse deauth frames, but also to
> > even send an auth and reset the state when we receive that, so if we
> > move out of range and even the deauth frame is lost, we can still reset
> > properly.
> 
> That's one thing. The other is a station trying ADDBA for the first time
> after boot (while the local station has seen it before that reboot).

That's the situation though - the local station needs to know that it
has in fact *not* seen the same instance of the station, but that the
station has reset and needs to be removed & re-added.

> I guess we need to identify "new connection" reliably. Otherwise,
> the new connections are treated as old ones and it doesn't work.

Right. But we can implement the (optional) authentication (which you
actually already get when you implement [encrypted] IBSS with wpa_s),
and reset the station state when we get an authentication frame.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  8:58       ` Sebastian Gottschall
@ 2019-10-29  9:40         ` Koen Vandeputte
  0 siblings, 0 replies; 15+ messages in thread
From: Koen Vandeputte @ 2019-10-29  9:40 UTC (permalink / raw)
  To: Sebastian Gottschall, Johannes Berg, Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel


On 29.10.19 09:58, Sebastian Gottschall wrote:
> 35 km? for 802.11n with ht40 this is out of the ack timing range the 
> chipset supports. so this should be considered at any troubles with 
> connections
>
(Please don't top-post)

When we know a link can exceed ~ 21km, it's set to HT20 for this reason.

Koen

> Am 29.10.2019 um 09:41 schrieb Koen Vandeputte:
>>
>> On 28.10.19 13:21, Johannes Berg wrote:
>>> On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
>>>> Fix a bug where the mac80211 RX aggregation code sets a new 
>>>> aggregation
>>>> "session" at the remote station's request, but the head_seq_num
>>>> (the sequence number the receiver expects to receive) isn't reset.
>>>>
>>>> Spotted on a pair of AR9580 in IBSS mode.
>>>>
>>>> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
>>>>
>>>> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
>>>> index 4d1c335e06e5..67733bd61297 100644
>>>> --- a/net/mac80211/agg-rx.c
>>>> +++ b/net/mac80211/agg-rx.c
>>>> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct 
>>>> sta_info *sta,
>>>>                */
>>>>               rcu_read_lock();
>>>>               tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
>>>> -            if (tid_rx && tid_rx->timeout == timeout)
>>>> +            if (tid_rx && tid_rx->timeout == timeout) {
>>>> +                tid_rx->ssn = start_seq_num;
>>>> +                tid_rx->head_seq_num = start_seq_num;
>>>>                   status = WLAN_STATUS_SUCCESS;
>>> This is wrong, this is the case of *updating an existing session*, we
>>> must not reset the head SN then.
>>>
>>> I think you just got very lucky (or unlucky) to have the same dialog
>>> token, because we start from 0 - maybe we should initialize it to a
>>> random value to flush out such issues.
>>>
>>> Really what I think probably happened is that one of your stations lost
>>> the connection to the other, and didn't tell it about it in any way 
>>> - so
>>> the other kept all the status alive.
>>>
>>> I suspect to make all this work well we need to not only have the fixes
>>> I made recently to actually send and parse deauth frames, but also to
>>> even send an auth and reset the state when we receive that, so if we
>>> move out of range and even the deauth frame is lost, we can still reset
>>> properly.
>>>
>>> In any case, this is not the right approach - we need to handle the
>>> "lost connection" case better I suspect, but since you don't say what
>>> really happened I don't really know that that's what you're seeing.
>>>
>>> johannes
>>
>> Hi all,
>>
>> I can confirm the issue as I'm also seeing this sometimes in the 
>> field here.
>>
>> Sometimes when a devices goes out of range and then re-enters,
>> the link refuses to "come up", as in rx looks to be "stuck" without 
>> any reports in system log or locking issues (lockdep enabled)
>>
>> I have dozens of devices installed offshore (802.11n based), both on 
>> static and moving assets,
>> which cover from short (250m) up to very long distances (~35km)
>>
>> So .. while there is some momentum for this issue,
>> I'm more than happy to provide extensive testing should fixes be 
>> posted regarding IBSS in general.
>>
>> Regards,
>>
>> Koen
>>
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  9:03       ` Johannes Berg
@ 2019-10-29  9:47         ` Koen Vandeputte
  0 siblings, 0 replies; 15+ messages in thread
From: Koen Vandeputte @ 2019-10-29  9:47 UTC (permalink / raw)
  To: Johannes Berg, Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel


On 29.10.19 10:03, Johannes Berg wrote:
> On Tue, 2019-10-29 at 09:41 +0100, Koen Vandeputte wrote:
>
>> I can confirm the issue as I'm also seeing this sometimes in the field here.
>>
>> Sometimes when a devices goes out of range and then re-enters,
>> the link refuses to "come up", as in rx looks to be "stuck" without any
>> reports in system log or locking issues (lockdep enabled)
> Right. I've recently debugged this due to issues in distributed
> beaconing (rather than moving in/out of range), but I guess it would be
> relatively simple to reproduce this with wmediumd, if that can be
> controlled dynamically?
>
> What kernel are you running? You could check if you have
>
> 95697f9907bf ("mac80211: accept deauth frames in IBSS mode")
> 4b08d1b6a994 ("mac80211: IBSS: send deauth when expiring inactive STAs")
>
> which might help somewhat, but don't fully cover the case of moving out
> of range.
>
> johannes
>
I'm running OpenWrt (kernel 4.14.150 with 4.19.79 mac80211)
I noticed these fixes last week and made a build 2 days ago with them 
backported to it.
Running in the field on roughly 4 devices since a day.

Koen


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29  9:07       ` Johannes Berg
@ 2019-10-29 10:51         ` Krzysztof Hałasa
  2019-10-29 10:57           ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Hałasa @ 2019-10-29 10:51 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel

Johannes Berg <johannes@sipsolutions.net> writes:

>> The problem I can see is that the dialog_tokens are 8-bit, way too small
>> to eliminate conflicts.
>
> Well, they're also per station, we could just randomize the start and
> then we'd delete the old session and start a new one, on the receiver.
>
> So that would improve robustness somewhat (down to a 1/256 chance to hit
> this problem).

That was what I meant. Still, 1/256 seems hardly acceptable to me -
unless there is some work around (a short timeout or something similar).
Remember that when it doesn't work, it doesn't work - it won't recover
until the sequence catches up, which may mean basically forever.

Or, maybe the remote station can request de-aggregation first, so the
subsequent aggregation request is always treated as new?

Alternatively, perhaps the remote can signal that it's a new request and
not merely an existing session?

> That's the situation though - the local station needs to know that it
> has in fact *not* seen the same instance of the station, but that the
> station has reset and needs to be removed & re-added.

Precisely. And it seems to me that the first time the local station
learns of this is when a new, regular, non-aggregated packet arrives.
Or, when a new aggregation request arrives.
-- 
Krzysztof Halasa

ŁUKASIEWICZ Research Network
Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot
  2019-10-29 10:51         ` Krzysztof Hałasa
@ 2019-10-29 10:57           ` Johannes Berg
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Berg @ 2019-10-29 10:57 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: David S. Miller, linux-wireless, netdev, linux-kernel

On Tue, 2019-10-29 at 11:51 +0100, Krzysztof Hałasa wrote:
> Johannes Berg <johannes@sipsolutions.net> writes:
> 
> > > The problem I can see is that the dialog_tokens are 8-bit, way too small
> > > to eliminate conflicts.
> > 
> > Well, they're also per station, we could just randomize the start and
> > then we'd delete the old session and start a new one, on the receiver.
> > 
> > So that would improve robustness somewhat (down to a 1/256 chance to hit
> > this problem).
> 
> That was what I meant. Still, 1/256 seems hardly acceptable to me -
> unless there is some work around (a short timeout or something similar).
> Remember that when it doesn't work, it doesn't work - it won't recover
> until the sequence catches up, which may mean basically forever.

Agree, it just helps in "most" cases to do this. Perhaps we shouldn't do
this then so that we find the problem more easily...

> Or, maybe the remote station can request de-aggregation first, so the
> subsequent aggregation request is always treated as new?

> Alternatively, perhaps the remote can signal that it's a new request and
> not merely an existing session?

I think we should just implement authentication and reset of the station
properly, instead of fudging around with aggregation. This is just one
possible problematic scenario ... what if the station was reconfigured
with a different number of antennas in the meantime, for example, or
whatnot. There's a lot of state we keep for each station.

> > That's the situation though - the local station needs to know that it
> > has in fact *not* seen the same instance of the station, but that the
> > station has reset and needs to be removed & re-added.
> 
> Precisely. And it seems to me that the first time the local station
> learns of this is when a new, regular, non-aggregated packet arrives.
> Or, when a new aggregation request arrives.

Well, it should learn about the station when there's a beacon from it,
or if not ... we have a patch to force a probe request/response cycle so
we have all the capabilities properly. We should upstream that patch,
but need to do something to avoid being able to use this for traffic
amplification attacks.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-10-29 10:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-21 12:11 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot Krzysztof Hałasa
2019-10-21 12:18 ` [PATCH] " Krzysztof Hałasa
2019-10-22  9:42   ` Sergei Shtylyov
2019-10-21 12:18 ` Krzysztof Hałasa
2019-10-25 10:21 ` [PATCH v2] " Krzysztof Hałasa
2019-10-28 12:21   ` Johannes Berg
2019-10-29  8:41     ` Koen Vandeputte
2019-10-29  8:58       ` Sebastian Gottschall
2019-10-29  9:40         ` Koen Vandeputte
2019-10-29  9:03       ` Johannes Berg
2019-10-29  9:47         ` Koen Vandeputte
2019-10-29  8:54     ` Krzysztof Hałasa
2019-10-29  9:07       ` Johannes Berg
2019-10-29 10:51         ` Krzysztof Hałasa
2019-10-29 10:57           ` Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).