linux-wireless.vger.kernel.org archive mirror
* Re: [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
@ 2021-08-09  9:51 David Heidelberg
  2021-08-09 19:01 ` Brian Norris
  0 siblings, 1 reply; 6+ messages in thread
From: David Heidelberg @ 2021-08-09  9:51 UTC (permalink / raw)
  To: miaoqing, briannorris, kvalo; +Cc: linux-wireless

Hello all,

I have noticed that this issue is very common (at least for me and some
others) on 4.14 kernels [1] [2]. Do you think that backporting this
patch into stable would make sense? I assume that at some point it
could help some OpenWRT/LEDE and other devices (for Turris it will
most likely be backported anyway).

Thank you for working on this!
David

[1] 
https://forum.turris.cz/t/5-2-4-patch-wifi-fails-after-while-wmi-mgmt-tx-queue-is-full/15510
[2] https://forum.turris.cz/t/unstable-wifi-on-mox-b/11065/




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
  2021-08-09  9:51 [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition David Heidelberg
@ 2021-08-09 19:01 ` Brian Norris
  2021-08-09 19:40   ` David Heidelberg
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Norris @ 2021-08-09 19:01 UTC (permalink / raw)
  To: David Heidelberg; +Cc: Miaoqing Pan, Kalle Valo, linux-wireless

(NB: I think your Reply-To header was wrong, so I've chosen to modify
that in hopes of actually reaching you.)

On Mon, Aug 9, 2021 at 2:52 AM David Heidelberg <david@ixit.cz> wrote:
> I have noticed that this issue is very common (at least for me and some
> others) on 4.14 kernels [1] [2]. Do you think that backporting this
> patch into stable would make sense? I assume that at some point it
> could help some OpenWRT/LEDE and other devices (for Turris it will
> most likely be backported anyway).
>
> Thank you for working on this!
> David
>
> [1]
> https://forum.turris.cz/t/5-2-4-patch-wifi-fails-after-while-wmi-mgmt-tx-queue-is-full/15510
> [2] https://forum.turris.cz/t/unstable-wifi-on-mox-b/11065/

Seems reasonable to me. The right way to submit such a request is
documented here:

https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Because this wasn't identified as a -stable candidate when first
submitted, you'll need either Option 2 or Option 3.

Regards,
Brian


* Re: [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
  2021-08-09 19:01 ` Brian Norris
@ 2021-08-09 19:40   ` David Heidelberg
  0 siblings, 0 replies; 6+ messages in thread
From: David Heidelberg @ 2021-08-09 19:40 UTC (permalink / raw)
  To: Brian Norris; +Cc: Miaoqing Pan, Kalle Valo, linux-wireless

Thank you,

You're right, I messed up the Reply-To header. I'll follow Option 2.

Thank you
Best regards
David Heidelberg

On Mon, Aug 9 2021 at 12:01:45 -0700, Brian Norris 
<briannorris@chromium.org> wrote:
> (NB: I think your Reply-To header was wrong, so I've chosen to modify
> that in hopes of actually reaching you.)
> 
> On Mon, Aug 9, 2021 at 2:52 AM David Heidelberg <david@ixit.cz> wrote:
>> I have noticed that this issue is very common (at least for me and some
>> others) on 4.14 kernels [1] [2]. Do you think that backporting this
>> patch into stable would make sense? I assume that at some point it
>> could help some OpenWRT/LEDE and other devices (for Turris it will
>> most likely be backported anyway).
>>
>> Thank you for working on this!
>> David
>>
>> [1]
>> https://forum.turris.cz/t/5-2-4-patch-wifi-fails-after-while-wmi-mgmt-tx-queue-is-full/15510
>> [2] https://forum.turris.cz/t/unstable-wifi-on-mox-b/11065/
> 
> Seems reasonable to me. The right way to submit such a request is
> documented here:
> 
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> 
> Because this wasn't identified as a -stable candidate when first
> submitted, you'll need either Option 2 or Option 3.
> 
> Regards,
> Brian




* Re: [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
  2020-12-22  6:34 Miaoqing Pan
  2020-12-22 18:26 ` Brian Norris
@ 2021-01-28  7:19 ` Kalle Valo
  1 sibling, 0 replies; 6+ messages in thread
From: Kalle Valo @ 2021-01-28  7:19 UTC (permalink / raw)
  To: Miaoqing Pan; +Cc: ath10k, linux-wireless, briannorris, Miaoqing Pan

Miaoqing Pan <miaoqing@codeaurora.org> wrote:

> Failed to transmit wmi management frames:
> 
> [84977.840894] ath10k_snoc a000000.wifi: wmi mgmt tx queue is full
> [84977.840913] ath10k_snoc a000000.wifi: failed to transmit packet, dropping: -28
> [84977.840924] ath10k_snoc a000000.wifi: failed to submit frame: -28
> [84977.840932] ath10k_snoc a000000.wifi: failed to transmit frame: -28
> 
> This issue is caused by a race condition between skb_dequeue and
> __skb_queue_tail. The two paths protect ‘wmi_mgmt_tx_queue’ with
> different locks, ar->data_lock vs. list->lock, so in effect there is
> no protection. When ath10k_mgmt_over_wmi_tx_work() and
> ath10k_mac_tx_wmi_mgmt() run concurrently on different CPUs, a rare
> corner case can occur when the queue length is 1:
> 
>   CPUx (skb_dequeue)                        CPUy (__skb_queue_tail)
>                                             next=list
>                                             prev=list
>   struct sk_buff *skb = skb_peek(list);     WRITE_ONCE(newsk->next, next);
>   WRITE_ONCE(list->qlen, list->qlen - 1);   WRITE_ONCE(newsk->prev, prev);
>   next       = skb->next;                   WRITE_ONCE(next->prev, newsk);
>   prev       = skb->prev;                   WRITE_ONCE(prev->next, newsk);
>   skb->next  = skb->prev = NULL;            list->qlen++;
>   WRITE_ONCE(next->prev, prev);
>   WRITE_ONCE(prev->next, next);
> 
> If the instruction ‘next = skb->next’ is executed before
> ‘WRITE_ONCE(prev->next, newsk)’, newsk will be lost, because CPUx gets
> the old ‘next’ pointer, yet the length is still incremented by one. The
> final result is that the recorded queue length reaches the maximum
> value while the queue is actually empty.
> 
> So remove ar->data_lock, and use 'skb_queue_tail' instead of
> '__skb_queue_tail' to prevent the potential race condition. Also switch
> to use skb_queue_len_lockless, in case we queue a few SKBs simultaneously.
> 
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1.c2-00033-QCAHLSWMTPLZ-1
> 
> Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

b55379e343a3 ath10k: fix wmi mgmt tx queue full due to race condition

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/1608618887-8857-1-git-send-email-miaoqing@codeaurora.org/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



* Re: [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
  2020-12-22  6:34 Miaoqing Pan
@ 2020-12-22 18:26 ` Brian Norris
  2021-01-28  7:19 ` Kalle Valo
  1 sibling, 0 replies; 6+ messages in thread
From: Brian Norris @ 2020-12-22 18:26 UTC (permalink / raw)
  To: Miaoqing Pan; +Cc: ath10k, linux-wireless

On Mon, Dec 21, 2020 at 10:34 PM Miaoqing Pan <miaoqing@codeaurora.org> wrote:
>
> Failed to transmit wmi management frames:
>
> [84977.840894] ath10k_snoc a000000.wifi: wmi mgmt tx queue is full
> [84977.840913] ath10k_snoc a000000.wifi: failed to transmit packet, dropping: -28
> [84977.840924] ath10k_snoc a000000.wifi: failed to submit frame: -28
> [84977.840932] ath10k_snoc a000000.wifi: failed to transmit frame: -28
>
> This issue is caused by a race condition between skb_dequeue and
> __skb_queue_tail. The two paths protect ‘wmi_mgmt_tx_queue’ with
> different locks, ar->data_lock vs. list->lock, so in effect there is
> no protection.

Reviewed-by: Brian Norris <briannorris@chromium.org>


* [PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition
@ 2020-12-22  6:34 Miaoqing Pan
  2020-12-22 18:26 ` Brian Norris
  2021-01-28  7:19 ` Kalle Valo
  0 siblings, 2 replies; 6+ messages in thread
From: Miaoqing Pan @ 2020-12-22  6:34 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, briannorris, Miaoqing Pan

Failed to transmit wmi management frames:

[84977.840894] ath10k_snoc a000000.wifi: wmi mgmt tx queue is full
[84977.840913] ath10k_snoc a000000.wifi: failed to transmit packet, dropping: -28
[84977.840924] ath10k_snoc a000000.wifi: failed to submit frame: -28
[84977.840932] ath10k_snoc a000000.wifi: failed to transmit frame: -28

This issue is caused by a race condition between skb_dequeue and
__skb_queue_tail. The two paths protect ‘wmi_mgmt_tx_queue’ with
different locks, ar->data_lock vs. list->lock, so in effect there is
no protection. When ath10k_mgmt_over_wmi_tx_work() and
ath10k_mac_tx_wmi_mgmt() run concurrently on different CPUs, a rare
corner case can occur when the queue length is 1:

  CPUx (skb_dequeue)                        CPUy (__skb_queue_tail)
                                            next=list
                                            prev=list
  struct sk_buff *skb = skb_peek(list);     WRITE_ONCE(newsk->next, next);
  WRITE_ONCE(list->qlen, list->qlen - 1);   WRITE_ONCE(newsk->prev, prev);
  next       = skb->next;                   WRITE_ONCE(next->prev, newsk);
  prev       = skb->prev;                   WRITE_ONCE(prev->next, newsk);
  skb->next  = skb->prev = NULL;            list->qlen++;
  WRITE_ONCE(next->prev, prev);
  WRITE_ONCE(prev->next, next);

If the instruction ‘next = skb->next’ is executed before
‘WRITE_ONCE(prev->next, newsk)’, newsk will be lost, because CPUx gets
the old ‘next’ pointer, yet the length is still incremented by one. The
final result is that the recorded queue length reaches the maximum
value while the queue is actually empty.
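
A minimal single-threaded sketch of this interleaving, written as plain
userspace C rather than kernel code. The struct node and struct queue
types and the replay_race() helper are hypothetical stand-ins for
struct sk_buff, struct sk_buff_head and the two racing code paths. The
sketch assumes the tail insert takes its prev neighbour from list->prev,
which is the single queued entry when the length is 1. It replays the
steps in one fixed order so that the outcome is deterministic:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the list linkage. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Hypothetical stand-in for struct sk_buff_head: sentinel plus length. */
struct queue {
	struct node head;
	unsigned int qlen;
};

static void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
}

/* Plain tail insert, used here only to set up the starting state. */
static void queue_tail(struct queue *q, struct node *n)
{
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
	q->qlen++;
}

static int queue_is_empty(const struct queue *q)
{
	return q->head.next == &q->head;
}

/*
 * Replay the bad ordering: the dequeue side (CPUx) samples skb->next
 * before the enqueue side (CPUy) publishes newsk through prev->next.
 */
static void replay_race(struct queue *q, struct node *skb, struct node *newsk)
{
	struct node *list = &q->head;
	struct node *x_next, *x_prev, *y_next, *y_prev;

	/* The corner case needs exactly one queued entry. */
	assert(q->qlen == 1 && list->next == skb && list->prev == skb);

	/* CPUy: pick the neighbours for the tail insert. */
	y_next = list;
	y_prev = list->prev;		/* the single queued entry, skb */

	/* CPUx: begin the unlink and sample the (soon stale) neighbours. */
	q->qlen--;
	x_next = skb->next;		/* still the list head */
	x_prev = skb->prev;

	/* CPUy: link newsk behind skb and bump the length. */
	newsk->next = y_next;
	newsk->prev = y_prev;
	y_next->prev = newsk;
	y_prev->next = newsk;
	q->qlen++;

	/* CPUx: finish the unlink using the stale neighbours. */
	skb->next = skb->prev = NULL;
	x_next->prev = x_prev;		/* head.prev = head */
	x_prev->next = x_next;		/* head.next = head */
}
```

Starting from a queue that holds one entry, replay_race() should leave
the list head pointing at itself while qlen still reads 1. Each
repetition leaks one more count, which is how the length check can end
up reporting a full queue that holds nothing.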

So remove ar->data_lock, and use 'skb_queue_tail' instead of
'__skb_queue_tail' to prevent the potential race condition. Also switch
to skb_queue_len_lockless, in case we queue a few SKBs simultaneously.
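
The shape of this fix can be sketched in userspace C, assuming a pthread
mutex in place of the spinlock inside sk_buff_head and a C11 atomic
counter in place of the qlen accessors. All names here (queue_tail,
queue_dequeue, queue_len_lockless, enqueue_bounded, MAX_PENDING) are
illustrative and are not the kernel API. Both the enqueue and dequeue
helpers take the queue's own lock, and the length check runs without it:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical bound standing in for ATH10K_MAX_NUM_MGMT_PENDING. */
#define MAX_PENDING 4

struct node {
	struct node *next;
	struct node *prev;
};

struct queue {
	struct node head;	/* sentinel */
	atomic_uint qlen;	/* readable without the lock */
	pthread_mutex_t lock;	/* the queue's own lock */
};

static void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	atomic_init(&q->qlen, 0);
	pthread_mutex_init(&q->lock, NULL);
}

/* Length read that does not take the lock; may be momentarily stale. */
static unsigned int queue_len_lockless(struct queue *q)
{
	return atomic_load_explicit(&q->qlen, memory_order_relaxed);
}

/* Tail insert under the queue's own lock. */
static void queue_tail(struct queue *q, struct node *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&q->lock);
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
	atomic_fetch_add_explicit(&q->qlen, 1, memory_order_relaxed);
	pthread_mutex_unlock(&q->lock);
}

/* Head removal under the same lock; returns NULL when empty. */
static struct node *queue_dequeue(struct queue *q)
{
	struct node *n = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->head.next != &q->head) {
		n = q->head.next;
		q->head.next = n->next;
		n->next->prev = &q->head;
		n->next = n->prev = NULL;
		atomic_fetch_sub_explicit(&q->qlen, 1, memory_order_relaxed);
	}
	pthread_mutex_unlock(&q->lock);
	return n;
}

/* Bounded enqueue: unlocked length check, then a locked insert. */
static int enqueue_bounded(struct queue *q, struct node *n)
{
	if (queue_len_lockless(q) >= MAX_PENDING)
		return -ENOSPC;
	queue_tail(q, n);
	return 0;
}
```

Because the length check and the insert are not one atomic step, several
callers may pass the check at once and briefly exceed the bound, which
is presumably why the comparison is '>=' rather than '=='. On Linux,
ENOSPC is 28, matching the -28 in the log lines above.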

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1.c2-00033-QCAHLSWMTPLZ-1

Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
---
v2: use skb_queue_len_lockless instead of skb_queue_len
---
 drivers/net/wireless/ath/ath10k/mac.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index dc32c78..3cefa13 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3763,23 +3763,16 @@ bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar)
 static int ath10k_mac_tx_wmi_mgmt(struct ath10k *ar, struct sk_buff *skb)
 {
 	struct sk_buff_head *q = &ar->wmi_mgmt_tx_queue;
-	int ret = 0;
-
-	spin_lock_bh(&ar->data_lock);
 
-	if (skb_queue_len(q) == ATH10K_MAX_NUM_MGMT_PENDING) {
+	if (skb_queue_len_lockless(q) >= ATH10K_MAX_NUM_MGMT_PENDING) {
 		ath10k_warn(ar, "wmi mgmt tx queue is full\n");
-		ret = -ENOSPC;
-		goto unlock;
+		return -ENOSPC;
 	}
 
-	__skb_queue_tail(q, skb);
+	skb_queue_tail(q, skb);
 	ieee80211_queue_work(ar->hw, &ar->wmi_mgmt_tx_work);
 
-unlock:
-	spin_unlock_bh(&ar->data_lock);
-
-	return ret;
+	return 0;
 }
 
 static enum ath10k_mac_tx_path
-- 
2.7.4


