* [PATCH 1/4] wcn36xx: Fix DXE lock layering violation
2021-10-15 13:17 [PATCH 0/4] wcn36xx: Fix DMA buffer allocation and free logic Bryan O'Donoghue
@ 2021-10-15 13:17 ` Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 2/4] wcn36xx: Fix DXE/DMA channel enable/disable cycle Bryan O'Donoghue
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Bryan O'Donoghue @ 2021-10-15 13:17 UTC (permalink / raw)
To: kvalo, linux-wireless, wcn36xx
Cc: loic.poulain, benl, daniel.thompson, bryan.odonoghue
Looking at the code here we see that txrx.c is taking the dxe.c lock to set
and unset the current TX skbuff pointer.
There is no obvious logical bug; however, it is a layering violation to
share locks like this.
Let's tidy up the code a bit by adding accessor functions to set and unset
the TX skbuff. This makes it easier to read and reason about this code
without having to switch between multiple files.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 26 +++++++++++++++++++++++++
drivers/net/wireless/ath/wcn36xx/dxe.h | 2 ++
drivers/net/wireless/ath/wcn36xx/txrx.c | 15 ++------------
3 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 8e1dbfda65386..4e898bde1bb8c 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -831,6 +831,32 @@ int wcn36xx_dxe_tx_frame(struct wcn36xx *wcn,
return ret;
}
+int wcn36xx_dxe_set_tx_ack_skb(struct wcn36xx *wcn, struct sk_buff *skb)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&wcn->dxe_lock, flags);
+ if (wcn->tx_ack_skb) {
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+ wcn36xx_warn("tx_ack_skb already set\n");
+ return -EINVAL;
+ }
+
+ wcn->tx_ack_skb = skb;
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+
+ return 0;
+}
+
+void wcn36xx_dxe_unset_tx_ack_skb(struct wcn36xx *wcn)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&wcn->dxe_lock, flags);
+ wcn->tx_ack_skb = NULL;
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+}
+
int wcn36xx_dxe_init(struct wcn36xx *wcn)
{
int reg_data = 0, ret;
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.h b/drivers/net/wireless/ath/wcn36xx/dxe.h
index 31b81b7547a32..083a95e7de576 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.h
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.h
@@ -467,4 +467,6 @@ int wcn36xx_dxe_tx_frame(struct wcn36xx *wcn,
struct sk_buff *skb,
bool is_low);
void wcn36xx_dxe_tx_ack_ind(struct wcn36xx *wcn, u32 status);
+int wcn36xx_dxe_set_tx_ack_skb(struct wcn36xx *wcn, struct sk_buff *skb);
+void wcn36xx_dxe_unset_tx_ack_skb(struct wcn36xx *wcn);
#endif /* _DXE_H_ */
diff --git a/drivers/net/wireless/ath/wcn36xx/txrx.c b/drivers/net/wireless/ath/wcn36xx/txrx.c
index cab196bb38cd4..969210812cfbb 100644
--- a/drivers/net/wireless/ath/wcn36xx/txrx.c
+++ b/drivers/net/wireless/ath/wcn36xx/txrx.c
@@ -502,7 +502,6 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
struct wcn36xx_vif *vif_priv = NULL;
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
- unsigned long flags;
bool is_low = ieee80211_is_data(hdr->frame_control);
bool bcast = is_broadcast_ether_addr(hdr->addr1) ||
is_multicast_ether_addr(hdr->addr1);
@@ -524,15 +523,8 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
if (info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS) {
wcn36xx_dbg(WCN36XX_DBG_DXE, "TX_ACK status requested\n");
- spin_lock_irqsave(&wcn->dxe_lock, flags);
- if (wcn->tx_ack_skb) {
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
- wcn36xx_warn("tx_ack_skb already set\n");
+ if (wcn36xx_dxe_set_tx_ack_skb(wcn, skb))
return -EINVAL;
- }
-
- wcn->tx_ack_skb = skb;
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
/* Only one at a time is supported by fw. Stop the TX queues
* until the ack status gets back.
@@ -562,10 +554,7 @@ int wcn36xx_start_tx(struct wcn36xx *wcn,
/* If the skb has not been transmitted,
* don't keep a reference to it.
*/
- spin_lock_irqsave(&wcn->dxe_lock, flags);
- wcn->tx_ack_skb = NULL;
- spin_unlock_irqrestore(&wcn->dxe_lock, flags);
-
+ wcn36xx_dxe_unset_tx_ack_skb(wcn);
ieee80211_wake_queues(wcn->hw);
}
--
2.33.0
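The accessor pattern from this patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct dev_state`, `dev_set_tx_ack_skb()` and `dev_unset_tx_ack_skb()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for `spin_lock_irqsave()` on `dxe_lock`. The point is only that the lock and the pointer it protects are touched in one place, so callers never take the lock themselves.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state; not the driver's struct. */
struct dev_state {
	atomic_flag lock;   /* plays the role of dxe_lock */
	void *tx_ack_skb;   /* frame currently waiting for a TX ack, or NULL */
};

void dev_state_init(struct dev_state *d)
{
	atomic_flag_clear(&d->lock);
	d->tx_ack_skb = NULL;
}

static void dev_lock(struct dev_state *d)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;
}

static void dev_unlock(struct dev_state *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Record the frame awaiting an ack; fail if one is already pending. */
int dev_set_tx_ack_skb(struct dev_state *d, void *skb)
{
	int ret = 0;

	dev_lock(d);
	if (d->tx_ack_skb)
		ret = -EINVAL;
	else
		d->tx_ack_skb = skb;
	dev_unlock(d);

	return ret;
}

/* Drop the reference to the pending frame. */
void dev_unset_tx_ack_skb(struct dev_state *d)
{
	dev_lock(d);
	d->tx_ack_skb = NULL;
	dev_unlock(d);
}
```

A caller in another file would only see the two accessors and their return values, which mirrors how txrx.c no longer needs the `flags` variable after this change.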
* [PATCH 2/4] wcn36xx: Fix DXE/DMA channel enable/disable cycle
2021-10-15 13:17 [PATCH 0/4] wcn36xx: Fix DMA buffer allocation and free logic Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 1/4] wcn36xx: Fix DXE lock layering violation Bryan O'Donoghue
@ 2021-10-15 13:17 ` Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 3/4] wcn36xx: Release DMA channel descriptor allocations Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 4/4] wcn36xx: Put DXE block into reset before freeing memory Bryan O'Donoghue
3 siblings, 0 replies; 6+ messages in thread
From: Bryan O'Donoghue @ 2021-10-15 13:17 UTC (permalink / raw)
To: kvalo, linux-wireless, wcn36xx
Cc: loic.poulain, benl, daniel.thompson, bryan.odonoghue
Right now we have a broken sequence where we enable DMA channel interrupts
which can be left enabled and never disabled if we hit an error path.
Worse still, when we stop WCNSS, the DMA channel interrupt bits are left
intact. About the only saving grace here is that we do remember to disable
the WCNSS interrupt when doing the stop.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 38 ++++++++++++++++++--------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 4e898bde1bb8c..13e9a274fa26a 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -272,6 +272,21 @@ static int wcn36xx_dxe_enable_ch_int(struct wcn36xx *wcn, u16 wcn_ch)
return 0;
}
+static void wcn36xx_dxe_disable_ch_int(struct wcn36xx *wcn, u16 wcn_ch)
+{
+ int reg_data = 0;
+
+ wcn36xx_dxe_read_register(wcn,
+ WCN36XX_DXE_INT_MASK_REG,
+ &reg_data);
+
+ reg_data &= ~wcn_ch;
+
+ wcn36xx_dxe_write_register(wcn,
+ WCN36XX_DXE_INT_MASK_REG,
+ (int)reg_data);
+}
+
static int wcn36xx_dxe_fill_skb(struct device *dev,
struct wcn36xx_dxe_ctl *ctl,
gfp_t gfp)
@@ -892,7 +907,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_WQ_TX_L);
wcn36xx_dxe_read_register(wcn, WCN36XX_DXE_REG_CH_EN, &reg_data);
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
/***************************************/
/* Init descriptors for TX HIGH channel */
@@ -916,9 +930,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
wcn36xx_dxe_read_register(wcn, WCN36XX_DXE_REG_CH_EN, &reg_data);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
-
/***************************************/
/* Init descriptors for RX LOW channel */
/***************************************/
@@ -928,7 +939,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
goto out_err_rxl_ch;
}
-
/* For RX we need to preallocated buffers */
wcn36xx_dxe_ch_alloc_skb(wcn, &wcn->dxe_rx_l_ch);
@@ -951,9 +961,6 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_REG_CTL_RX_L,
WCN36XX_DXE_CH_DEFAULT_CTL_RX_L);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
-
/***************************************/
/* Init descriptors for RX HIGH channel */
/***************************************/
@@ -985,15 +992,18 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
WCN36XX_DXE_REG_CTL_RX_H,
WCN36XX_DXE_CH_DEFAULT_CTL_RX_H);
- /* Enable channel interrupts */
- wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
-
ret = wcn36xx_dxe_request_irqs(wcn);
if (ret < 0)
goto out_err_irq;
timer_setup(&wcn->tx_ack_timer, wcn36xx_dxe_tx_timer, 0);
+ /* Enable channel interrupts */
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
+ wcn36xx_dxe_enable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
+
return 0;
out_err_irq:
@@ -1010,6 +1020,12 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
{
+ /* Disable channel interrupts */
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_H);
+ wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_TX_L);
+
free_irq(wcn->tx_irq, wcn);
free_irq(wcn->rx_irq, wcn);
del_timer(&wcn->tx_ack_timer);
--
2.33.0
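The enable/disable pairing above is a read-modify-write on a shared interrupt mask register. As a rough userspace sketch (the variable `int_mask_reg` and the bit values are illustrative assumptions, not the hardware layout), setting a channel bit and clearing it again leaves the other channels untouched:

```c
#include <stdint.h>

/* Hypothetical stand-in for the interrupt mask register. */
uint32_t int_mask_reg;

/* Set the bits for one channel, leaving other channels as they were. */
void enable_ch_int(uint32_t ch_mask)
{
	uint32_t reg = int_mask_reg;   /* read   */

	reg |= ch_mask;                /* modify */
	int_mask_reg = reg;            /* write  */
}

/* Clear the bits for one channel, leaving other channels as they were. */
void disable_ch_int(uint32_t ch_mask)
{
	uint32_t reg = int_mask_reg;

	reg &= ~ch_mask;
	int_mask_reg = reg;
}
```

Enabling all channels only after the IRQ handlers are requested, and disabling them first on teardown, keeps the two sequences symmetric, which is the ordering this patch moves to.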
* [PATCH 3/4] wcn36xx: Release DMA channel descriptor allocations
2021-10-15 13:17 [PATCH 0/4] wcn36xx: Fix DMA buffer allocation and free logic Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 1/4] wcn36xx: Fix DXE lock layering violation Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 2/4] wcn36xx: Fix DXE/DMA channel enable/disable cycle Bryan O'Donoghue
@ 2021-10-15 13:17 ` Bryan O'Donoghue
2021-10-15 13:17 ` [PATCH 4/4] wcn36xx: Put DXE block into reset before freeing memory Bryan O'Donoghue
3 siblings, 0 replies; 6+ messages in thread
From: Bryan O'Donoghue @ 2021-10-15 13:17 UTC (permalink / raw)
To: kvalo, linux-wireless, wcn36xx
Cc: loic.poulain, benl, daniel.thompson, bryan.odonoghue
When doing a stop callback we are not releasing the DMA descriptors which we
previously allocated.
The start and stop callbacks can happen incrementally depending on usage.
Failure to release the DMA descriptors leads to a reallocation of the DMA
descriptors, leaking more and more memory over time.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 13e9a274fa26a..e89002502869a 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -1037,4 +1037,9 @@ void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_l_ch);
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_h_ch);
+
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_tx_l_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_tx_h_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_rx_l_ch);
+ wcn36xx_dxe_deinit_descs(wcn->dev, &wcn->dxe_rx_h_ch);
}
--
2.33.0
* [PATCH 4/4] wcn36xx: Put DXE block into reset before freeing memory
2021-10-15 13:17 [PATCH 0/4] wcn36xx: Fix DMA buffer allocation and free logic Bryan O'Donoghue
` (2 preceding siblings ...)
2021-10-15 13:17 ` [PATCH 3/4] wcn36xx: Release DMA channel descriptor allocations Bryan O'Donoghue
@ 2021-10-15 13:17 ` Bryan O'Donoghue
2021-10-17 1:28 ` Bryan O'Donoghue
3 siblings, 1 reply; 6+ messages in thread
From: Bryan O'Donoghue @ 2021-10-15 13:17 UTC (permalink / raw)
To: kvalo, linux-wireless, wcn36xx
Cc: loic.poulain, benl, daniel.thompson, bryan.odonoghue
When deiniting the DXE hardware we should reset the block to ensure there
is no spurious DMA write transaction from the downstream WCNSS to upstream
MSM at a skbuff address we will have released.
This is actually a pretty serious bug. Immediately after the reset we
release skbs, skbs which are from the perspective of the WCNSS DXE still
valid addresses for DMA.
Without first placing the DXE block into reset, it is possible for an
upstream DMA transaction to write to skbs we have freed.
We have seen some backtraces from usage in testing on 50k+ devices which
indicate an invalid RX of an AP's beacon to unmapped memory.
The logical conclusion is that an RX transaction happened to a region of
memory that was previously valid but was subsequently released.
The only time such a window of opportunity exists is when we have
deallocated the skbs attached to the DMA BDs, in other words after doing
wcn36xx_stop().
If we free the skbs on the DMA channel, we need to make sure we have
quiesced potential DMA on that channel prior to freeing.
This patch should eliminate that error.
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index e89002502869a..56f605c23f36c 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -1020,6 +1020,8 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
{
+ int reg_data = 0;
+
/* Disable channel interrupts */
wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
@@ -1035,6 +1037,10 @@ void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
wcn->tx_ack_skb = NULL;
}
+ /* Put the DXE block into reset before freeing memory */
+ reg_data = WCN36XX_DXE_REG_RESET;
+ wcn36xx_dxe_write_register(wcn, WCN36XX_DXE_REG_CSR_RESET, reg_data);
+
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_l_ch);
wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_h_ch);
--
2.33.0
* Re: [PATCH 4/4] wcn36xx: Put DXE block into reset before freeing memory
2021-10-15 13:17 ` [PATCH 4/4] wcn36xx: Put DXE block into reset before freeing memory Bryan O'Donoghue
@ 2021-10-17 1:28 ` Bryan O'Donoghue
0 siblings, 0 replies; 6+ messages in thread
From: Bryan O'Donoghue @ 2021-10-17 1:28 UTC (permalink / raw)
To: kvalo, linux-wireless, wcn36xx, Johannes Berg
Cc: loic.poulain, benl, daniel.thompson
On 15/10/2021 14:17, Bryan O'Donoghue wrote:
> When deiniting the DXE hardware we should reset the block to ensure there
> is no spurious DMA write transaction from the downstream WCNSS to upstream
> MSM at a skbuff address we will have released.
>
> This is actually a pretty serious bug. Immediately after the reset we
> release skbs, skbs which are from the perspective of the WCNSS DXE still
> valid addresses for DMA.
>
> Without first placing the DXE block into reset, it is possible for an
> upstream DMA transaction to write to skbs we have freed.
>
> We have seen some backtraces from usage in testing on 50k+ devices which
> indicate an invalid RX of an AP's beacon to unmapped memory.
>
> The logical conclusion is that an RX transaction happened to a region of
> memory that was previously valid but was subsequently released.
>
> The only time such a window of opportunity exists is when we have
> deallocated the skbs attached to the DMA BDs in other words after doing
> wcn36xx_stop().
>
> If we free the skbs on the DMA channel, we need to make sure we have
> quiesced potential DMA on that channel prior to freeing.
>
> This patch should eliminate that error.
>
> Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> ---
> drivers/net/wireless/ath/wcn36xx/dxe.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
> index e89002502869a..56f605c23f36c 100644
> --- a/drivers/net/wireless/ath/wcn36xx/dxe.c
> +++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
> @@ -1020,6 +1020,8 @@ int wcn36xx_dxe_init(struct wcn36xx *wcn)
>
> void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
> {
> + int reg_data = 0;
> +
> /* Disable channel interrupts */
> wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_H);
> wcn36xx_dxe_disable_ch_int(wcn, WCN36XX_INT_MASK_CHAN_RX_L);
> @@ -1035,6 +1037,10 @@ void wcn36xx_dxe_deinit(struct wcn36xx *wcn)
> wcn->tx_ack_skb = NULL;
> }
>
> + /* Put the DXE block into reset before freeing memory */
> + reg_data = WCN36XX_DXE_REG_RESET;
> + wcn36xx_dxe_write_register(wcn, WCN36XX_DXE_REG_CSR_RESET, reg_data);
> +
> wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_l_ch);
> wcn36xx_dxe_ch_free_skbs(wcn, &wcn->dxe_rx_h_ch);
>
>
Johannes asked me separately if we need to wait for the quiescence to
complete.
I don't see that downstream, but that doesn't mean we shouldn't do it.
So I'll investigate that.
Also - now that I look at this code, this being the second usage of the
CSR_RESET like this, also means the reset can be functionally decomposed
into a routine.
So - I'll look into the first and definitely do the second as a V2
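For illustration only, the two follow-ups mentioned here (a shared reset routine and a bounded wait for the block to go quiet) could take roughly the shape below. This is a userspace sketch against a simulated device: `struct fake_dxe`, `FAKE_RESET_VALUE`, `fake_dxe_idle()` and `fake_dxe_reset()` are hypothetical, and nothing in this thread confirms that the hardware exposes an idle indication that could be polled.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_RESET_VALUE 0x1u   /* illustrative value, not the real one */

/* Simulated block: reports busy for a fixed number of polls after reset. */
struct fake_dxe {
	uint32_t csr_reset;     /* last value written to the reset register */
	unsigned int busy_polls; /* polls remaining before the block is idle */
};

/* Hypothetical idle check; a real one would read a status register. */
static bool fake_dxe_idle(struct fake_dxe *hw)
{
	if (hw->busy_polls == 0)
		return true;
	hw->busy_polls--;
	return false;
}

/*
 * Put the block into reset, then poll a bounded number of times for it
 * to become idle. Returns 0 once idle, -1 if it never settled.
 */
int fake_dxe_reset(struct fake_dxe *hw, unsigned int max_polls)
{
	hw->csr_reset = FAKE_RESET_VALUE;

	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_dxe_idle(hw))
			return 0;
	}

	return -1;
}
```

With one routine like this, both the init and deinit paths could call the same helper, and the buffers would only be freed after it returns.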