All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] wifi: ath9k: deal with uninit memory
@ 2023-03-15 20:21 Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project

Syzkaller reports two cases ([1] and [2]) of uninitialized memory referencing in ath9k
wmi functions. The following patch series is intended to fix them and related issues.

[1] https://syzkaller.appspot.com/bug?id=51d401326d8ee41859d68997acdd6f3b1b39f186
[2] https://syzkaller.appspot.com/bug?id=fc54e8d79f5d5082c7867259d71b4e6618b69d25

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
@ 2023-03-15 20:21 ` Fedor Pchelkin
  2023-03-17  5:26   ` Kalle Valo
  2023-04-24 18:23   ` Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project, syzbot+f2cb6e0ffdb961921e4d

For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid
uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
validate pkt_len before accessing the SKB. For example, the obtained SKB
may have uninitialized memory in the case of
ioctl(USB_RAW_IOCTL_EP_WRITE).

Implement sanity checking inside the corresponding endpoint RX handlers:
ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can
be referenced.

Add comments briefly describing the issue.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
 drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 6 ++++++
 drivers/net/wireless/ath/ath9k/htc_hst.c      | 4 ++++
 drivers/net/wireless/ath/ath9k/wmi.c          | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
index 672789e3c55d..957efb26019d 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
@@ -1147,6 +1147,12 @@ void ath9k_htc_rxep(void *drv_priv, struct sk_buff *skb,
 	if (!data_race(priv->rx.initialized))
 		goto err;
 
+	/* Validate the obtained SKB so that it is handled without error
+	 * inside rx_tasklet handler.
+	 */
+	if (unlikely(skb->len < sizeof(struct ieee80211_hdr)))
+		goto err;
+
 	spin_lock_irqsave(&priv->rx.rxbuflock, flags);
 	list_for_each_entry(tmp_buf, &priv->rx.rxbuf, list) {
 		if (!tmp_buf->in_process) {
diff --git a/drivers/net/wireless/ath/ath9k/htc_hst.c b/drivers/net/wireless/ath/ath9k/htc_hst.c
index fe62ff668f75..9d0d9d0e1aa8 100644
--- a/drivers/net/wireless/ath/ath9k/htc_hst.c
+++ b/drivers/net/wireless/ath/ath9k/htc_hst.c
@@ -475,6 +475,10 @@ void ath9k_htc_rx_msg(struct htc_target *htc_handle,
 		skb_pull(skb, sizeof(struct htc_frame_hdr));
 
 		endpoint = &htc_handle->endpoint[epid];
+
+		/* The endpoint RX handlers should implement their own
+		 * additional SKB sanity checking
+		 */
 		if (endpoint->ep_callbacks.rx)
 			endpoint->ep_callbacks.rx(endpoint->ep_callbacks.priv,
 						  skb, epid);
diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index 19345b8f7bfd..2e7c361b62f5 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -204,6 +204,10 @@ static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb)
 {
 	skb_pull(skb, sizeof(struct wmi_cmd_hdr));
 
+	/* Once again validate the SKB. */
+	if (unlikely(skb->len < wmi->cmd_rsp_len))
+		return;
+
 	if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0)
 		memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len);
 
@@ -221,6 +225,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 	if (unlikely(wmi->stopped))
 		goto free_skb;
 
+	/* Validate the obtained SKB. */
+	if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr)))
+		goto free_skb;
+
 	hdr = (struct wmi_cmd_hdr *) skb->data;
 	cmd_id = be16_to_cpu(hdr->command_id);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
@ 2023-03-15 20:21 ` Fedor Pchelkin
  2023-04-24 19:11   ` Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin
  2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
  3 siblings, 1 reply; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project, syzbot+df61b36319e045c00a08

Currently, the synchronization between ath9k_wmi_cmd() and
ath9k_wmi_ctrl_rx() is exposed to races which can result, for example, not
only in wmi->cmd_rsp_buf being incorrectly initialized, but in overall
invalid behaviour of ath9k_wmi_cmd().

Consider the following scenario:

CPU0					CPU1

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  ath9k_wmi_cmd_issue(...)
  wait_for_completion_timeout(...)
  ---
  timeout
  ---
  mutex_unlock(&wmi->op_mutex)
  return -ETIMEDOUT

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  ath9k_wmi_cmd_issue(...)
  wait_for_completion_timeout(...)
					/* the one left after the first
					 * ath9k_wmi_cmd call
					 */
					ath9k_wmi_rsp_callback(...)
					  memcpy(...)
					  complete(&wmi->cmd_wait);
  /* Awakened by the bogus callback
   * => invalid return
   */
  mutex_unlock(&wmi->op_mutex)
  return 0

This may occur even on uniprocessor machines, and in SMP case the races
may be even more intricate.

To fix this, move the contents of ath9k_wmi_rsp_callback() under wmi_lock
inside ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only
for initially designated wmi_cmd call, otherwise the path would be
rejected with last_seq_id check.

Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue()
under the same wmi_lock with last_seq_id update to avoid their racy
changes.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Reported-and-tested-by: syzbot+df61b36319e045c00a08@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
This patch depends on [PATCH 1/3] wifi: ath9k: avoid referencing uninit
memory in ath9k_wmi_ctrl_rx

 drivers/net/wireless/ath/ath9k/wmi.c | 48 ++++++++++++++--------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index 2e7c361b62f5..99a91bbaace9 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -200,20 +200,6 @@ void ath9k_fatal_work(struct work_struct *work)
 	ath9k_htc_reset(priv);
 }
 
-static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb)
-{
-	skb_pull(skb, sizeof(struct wmi_cmd_hdr));
-
-	/* Once again validate the SKB. */
-	if (unlikely(skb->len < wmi->cmd_rsp_len))
-		return;
-
-	if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0)
-		memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len);
-
-	complete(&wmi->cmd_wait);
-}
-
 static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 			      enum htc_endpoint_id epid)
 {
@@ -242,14 +228,26 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 
 	/* Check if there has been a timeout. */
 	spin_lock_irqsave(&wmi->wmi_lock, flags);
-	if (be16_to_cpu(hdr->seq_no) != wmi->last_seq_id) {
+	if (be16_to_cpu(hdr->seq_no) != wmi->last_seq_id ||
+	    be16_to_cpu(hdr->seq_no) == 0) {
+		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
+		goto free_skb;
+	}
+
+	/* Next, process WMI command response */
+	skb_pull(skb, sizeof(struct wmi_cmd_hdr));
+
+	/* Once again validate the SKB. */
+	if (unlikely(skb->len < wmi->cmd_rsp_len)) {
 		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 		goto free_skb;
 	}
-	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
-	/* WMI command response */
-	ath9k_wmi_rsp_callback(wmi, skb);
+	if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0)
+		memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len);
+
+	complete(&wmi->cmd_wait);
+	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
 free_skb:
 	kfree_skb(skb);
@@ -287,7 +285,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi,
 
 static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 			       struct sk_buff *skb,
-			       enum wmi_cmd_id cmd, u16 len)
+			       enum wmi_cmd_id cmd, u16 len,
+			       u8 *rsp_buf, u32 rsp_len)
 {
 	struct wmi_cmd_hdr *hdr;
 	unsigned long flags;
@@ -297,6 +296,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 	hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id);
 
 	spin_lock_irqsave(&wmi->wmi_lock, flags);
+
+	/* record the rsp buffer and length */
+	wmi->cmd_rsp_buf = rsp_buf;
+	wmi->cmd_rsp_len = rsp_len;
+
 	wmi->last_seq_id = wmi->tx_seq_id;
 	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
@@ -337,11 +341,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 		goto out;
 	}
 
-	/* record the rsp buffer and length */
-	wmi->cmd_rsp_buf = rsp_buf;
-	wmi->cmd_rsp_len = rsp_len;
-
-	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len);
+	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len);
 	if (ret)
 		goto out;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged
  2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
  2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
@ 2023-03-15 20:21 ` Fedor Pchelkin
  2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
  3 siblings, 0 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project

Return with an error code in case the USB device has been already
unplugged. Otherwise the callers of ath9k_wmi_cmd() are unaware of the
fact that cmd_buf and rsp_buf are not initialized or handled properly
inside this function.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: a3be14b76da1 ("ath9k_htc: Handle device unplug properly")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
 drivers/net/wireless/ath/ath9k/wmi.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index 99a91bbaace9..3e0ad4f8f0a0 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -320,8 +320,11 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 	unsigned long time_left;
 	int ret = 0;
 
-	if (ah->ah_flags & AH_UNPLUGGED)
-		return 0;
+	if (ah->ah_flags & AH_UNPLUGGED) {
+		ath_dbg(common, WMI, "Device unplugged for WMI command: %s\n",
+			wmi_cmd_to_name(cmd_id));
+		return -ENODEV;
+	}
 
 	skb = alloc_skb(headroom + cmd_len, GFP_ATOMIC);
 	if (!skb)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] wifi: ath9k: deal with uninit memory
  2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
                   ` (2 preceding siblings ...)
  2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin
@ 2023-03-15 20:47 ` Fedor Pchelkin
  3 siblings, 0 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-15 20:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project

On Wed, Mar 15, 2023 at 11:21:09PM +0300, Fedor Pchelkin wrote:
> Syzkaller reports two cases ([1] and [2]) of uninitialized memory referencing in ath9k
> wmi functions. The following patch series is intended to fix them and related issues.
> 
> [1] https://syzkaller.appspot.com/bug?id=51d401326d8ee41859d68997acdd6f3b1b39f186
> [2] https://syzkaller.appspot.com/bug?id=fc54e8d79f5d5082c7867259d71b4e6618b69d25

During the patch development I observed that the return value of REG_READ
(ath9k_regread), REG_READ_MULTI (ath9k_multi_regread) and similar macros
is not checked in most places inside ath9k where they are called. That may
also potentially lead to incorrect behaviour. I wonder if it actually
poses a problem as the current implementation has been for a long time and
perhaps somebody has already addressed this.

In more details:
-- ath9k_regread returns -1 on error, and probably this is a predefined
   error state and doesn't need additional check. But, overall, it seems
   strange to me that the return value is not checked in places where it
   is used later (for example, in ath9k_reg_rmw or
   ath9k_hw_ani_read_counters).
-- ath9k_multi_regread fills 'val' buffer with undefined values on error
   case, that should definitely be fixed with initializing the local
   buffer to zero, I think.

Could you please say your opinion on this issue?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
@ 2023-03-17  5:26   ` Kalle Valo
  2023-03-18 20:25     ` Fedor Pchelkin
  2023-04-24 18:23   ` Fedor Pchelkin
  1 sibling, 1 reply; 24+ messages in thread
From: Kalle Valo @ 2023-03-17  5:26 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Toke Høiland-Jørgensen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project, syzbot+f2cb6e0ffdb961921e4d

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid
> uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
> validate pkt_len before accessing the SKB. For example, the obtained SKB
> may have uninitialized memory in the case of
> ioctl(USB_RAW_IOCTL_EP_WRITE).
>
> Implement sanity checking inside the corresponding endpoint RX handlers:
> ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can
> be referenced.
>
> Add comments briefly describing the issue.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

It would be also nice to know how you have tested these. Syzkaller is no
substitute for testing on a real hardware.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-03-17  5:26   ` Kalle Valo
@ 2023-03-18 20:25     ` Fedor Pchelkin
  0 siblings, 0 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-03-18 20:25 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Toke Høiland-Jørgensen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian,
	John W. Linville, Vasanthakumar Thiagarajan, Sujith,
	linux-wireless, netdev, linux-kernel, Alexey Khoroshilov,
	lvc-project, syzbot+f2cb6e0ffdb961921e4d

On Fri, Mar 17, 2023 at 07:26:11AM +0200, Kalle Valo wrote:
> 
> It would be also nice to know how you have tested these. Syzkaller is no
> substitute for testing on a real hardware.
> 

Unfortunately, currently I can't test this on real hardware so probably we
should postpone the patch discussion for some time. Roughly in a week or
two I'll be able to do some testing and try to reproduce the problem
there.

For sure this should be tested on real hardware as some issues may arise.
I sent the patch based on the commit b383e8abed41 ("wifi: ath9k: avoid
uninit memory read in ath9k_htc_rx_msg()") where it is explained
thoroughly what can lead to such behaviour. At the moment I don't see
anything in the code which can prevent that invalid scenario to happen for
endpoint callbacks path.

Actually, sanity checks for SKB length have been added everywhere inside
ath9k_htc_rx_msg() except where the endpoint callbacks are called. As for
the repro, the SKB inside ath9k_hif_usb_rx_stream() is allocated with
pkt_len=8 so it passes the 'htc_frame_hdr' check and processing in
ath9k_htc_rx_msg() but it obviously cannot be handled correctly in the
endpoint callbacks then.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
  2023-03-17  5:26   ` Kalle Valo
@ 2023-04-24 18:23   ` Fedor Pchelkin
  2023-04-24 18:33     ` [PATCH v2] " Fedor Pchelkin
  1 sibling, 1 reply; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-24 18:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Vallo
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d

On Wed, Mar 15, 2023 at 11:21:10PM +0300, Fedor Pchelkin wrote:
> For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid
> uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
> validate pkt_len before accessing the SKB. For example, the obtained SKB
> may have uninitialized memory in the case of
> ioctl(USB_RAW_IOCTL_EP_WRITE).
> 
> Implement sanity checking inside the corresponding endpoint RX handlers:
> ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can
> be referenced.
> 
> Add comments briefly describing the issue.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> ---

Apologies for the delay.

I've been able to test the patches in some way on 0cf3:9271 Qualcomm
Atheros Communications AR9271 802.11n .

>  drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 6 ++++++
>  drivers/net/wireless/ath/ath9k/htc_hst.c      | 4 ++++
>  drivers/net/wireless/ath/ath9k/wmi.c          | 8 ++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
> index 672789e3c55d..957efb26019d 100644
> --- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
> +++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
> @@ -1147,6 +1147,12 @@ void ath9k_htc_rxep(void *drv_priv, struct sk_buff *skb,
>  	if (!data_race(priv->rx.initialized))
>  		goto err;
>  
> +	/* Validate the obtained SKB so that it is handled without error
> +	 * inside rx_tasklet handler.
> +	 */
> +	if (unlikely(skb->len < sizeof(struct ieee80211_hdr)))
> +		goto err;
> +
>  	spin_lock_irqsave(&priv->rx.rxbuflock, flags);
>  	list_for_each_entry(tmp_buf, &priv->rx.rxbuf, list) {
>  		if (!tmp_buf->in_process) {

This check above seems to be redundant: SKB is properly checked inside
ath9k_rx_prepare() in rx_tasklet handler.

> diff --git a/drivers/net/wireless/ath/ath9k/htc_hst.c b/drivers/net/wireless/ath/ath9k/htc_hst.c
> index fe62ff668f75..9d0d9d0e1aa8 100644
> --- a/drivers/net/wireless/ath/ath9k/htc_hst.c
> +++ b/drivers/net/wireless/ath/ath9k/htc_hst.c
> @@ -475,6 +475,10 @@ void ath9k_htc_rx_msg(struct htc_target *htc_handle,
>  		skb_pull(skb, sizeof(struct htc_frame_hdr));
>  
>  		endpoint = &htc_handle->endpoint[epid];
> +
> +		/* The endpoint RX handlers should implement their own
> +		 * additional SKB sanity checking
> +		 */
>  		if (endpoint->ep_callbacks.rx)
>  			endpoint->ep_callbacks.rx(endpoint->ep_callbacks.priv,
>  						  skb, epid);
> diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
> index 19345b8f7bfd..2e7c361b62f5 100644
> --- a/drivers/net/wireless/ath/ath9k/wmi.c
> +++ b/drivers/net/wireless/ath/ath9k/wmi.c
> @@ -204,6 +204,10 @@ static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb)
>  {
>  	skb_pull(skb, sizeof(struct wmi_cmd_hdr));
>  
> +	/* Once again validate the SKB. */
> +	if (unlikely(skb->len < wmi->cmd_rsp_len))
> +		return;
> +
>  	if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0)
>  		memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len);
>

The change above is very very wrong. Looking at the firmware code and
debugging driver with real device, I realized the command response is
located in the tailroom of an SKB: skb->len (SKB data length) is zero in
such packets while skb->data and skb->tail pointers are equal. So a new
skb->len and cmd_rsp_len check resulted in driver denying all WMI command
responses. My bad for proposing such a mistake.

> @@ -221,6 +225,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
>  	if (unlikely(wmi->stopped))
>  		goto free_skb;
>  
> +	/* Validate the obtained SKB. */
> +	if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr)))
> +		goto free_skb;
> +
>  	hdr = (struct wmi_cmd_hdr *) skb->data;
>  	cmd_id = be16_to_cpu(hdr->command_id);
>  

This check above is actually good. A packet can be obtained constructed
something like this (taken from Syzkaller reproducer):

  *(uint16_t*)0x20000500 = 8; // pkt_len
  *(uint16_t*)0x20000502 = 0x4e00; // ATH_USB_RX_STREAM_MODE_TAG
  memcpy((void*)0x20000504, "\x15\xa7\xd5\x61\xb9\xb3\xb0\x7c", 8);
  syz_usb_ep_write(r[0], 0x82, 0xc, 0x20000500);

pkt_len is 8, so that it can contain an htc_frame_hdr, but when the SKB is
processed in ath9k_htc_rx_msg() and passed to the endpoint RX handler,
ath9k_wmi_ctrl_rx() specifically, the problem arises as there are no other
corresponding headers in the buffer.

wmi_cmd_hdr is pulled later so there are no problems with checking
skb->len. There are no issues arised with the patch on a real device, too.

Not sure if this can happen on an ath9k device with common firmware
(probably can't happen), but that is real with some bad USB device which
Syzkaller emulates.

I'll send the v2 versions for further discussions.

> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-04-24 18:23   ` Fedor Pchelkin
@ 2023-04-24 18:33     ` Fedor Pchelkin
  2023-04-25 11:14       ` Toke Høiland-Jørgensen
  2023-04-28 16:52       ` Kalle Valo
  0 siblings, 2 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-24 18:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d

For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid
uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
validate pkt_len before accessing the SKB.

For example, the obtained SKB may have been badly constructed with
pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr
but after being processed in ath9k_htc_rx_msg() and passed to
ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI
command header which should be located inside its data payload. 

Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit
memory can be referenced.

Tested on Qualcomm Atheros Communications AR9271 802.11n .

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v2: remove incorrect checks and rephrase commit info

 drivers/net/wireless/ath/ath9k/wmi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index 19345b8f7bfd..d652c647d56b 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -221,6 +221,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 	if (unlikely(wmi->stopped))
 		goto free_skb;
 
+	/* Validate the obtained SKB. */
+	if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr)))
+		goto free_skb;
+
 	hdr = (struct wmi_cmd_hdr *) skb->data;
 	cmd_id = be16_to_cpu(hdr->command_id);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
@ 2023-04-24 19:11   ` Fedor Pchelkin
  2023-04-24 19:18     ` [PATCH v2] " Fedor Pchelkin
  0 siblings, 1 reply; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-24 19:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Valo
  Cc: Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+df61b36319e045c00a08

This problem is realy subtle, I suppose. In the v2 commit info, which I'll
send in the next mail, the race condition is described which can lead to
invalid behaviour.

Couldn't reproduce that particular problem on real hardware, but if
force timeouts to wmi cmd completions, local KMSan catches some uninit
values.

The synchronization between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx on
timeouts is good, especially after 8a2f35b98306 ("wifi: ath9k: Fix
potential stack-out-of-bounds write in ath9k_wmi_rsp_callback()").

And I think the only place where the fuzzer can provoke failure is when
wmi->last_seq_id in callback is checked before it is assigned zero inside
ath9k_wmi_cmd() during timeout exit. This scenario is more thoroughly
described in patch v2.

Well, the issue seems to be rare and I don't know how to properly test it
on real hardware.

I've made some checks on a basic driver workflow, and there weren't any
stalls or explicit failures, and the patch seems to close that tiny race
condition window. But, anyway, it requires more discussion.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-04-24 19:11   ` Fedor Pchelkin
@ 2023-04-24 19:18     ` Fedor Pchelkin
       [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-24 19:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Valo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08

Currently, the synchronization between ath9k_wmi_cmd() and
ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().

Consider the following scenario:

CPU0					CPU1

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  ath9k_wmi_cmd_issue(...)
  wait_for_completion_timeout(...)
  ---
  timeout
  ---
					/* the callback is being processed
					 * before last_seq_id became zero
					 */
					ath9k_wmi_ctrl_rx(...)
					  spin_lock_irqsave(...)
					  /* wmi->last_seq_id check here
					   * doesn't detect timeout yet
					   */
					  spin_unlock_irqrestore(...)
  /* last_seq_id is zeroed to
   * indicate there was a timeout
   */
  wmi->last_seq_id = 0
  mutex_unlock(&wmi->op_mutex)
  return -ETIMEDOUT

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  /* the buffer is replaced with
   * another one
   */
  wmi->cmd_rsp_buf = rsp_buf
  wmi->cmd_rsp_len = rsp_len
  ath9k_wmi_cmd_issue(...)
    spin_lock_irqsave(...)
    spin_unlock_irqrestore(...)
  wait_for_completion_timeout(...)
					/* the continuation of the
					 * callback left after the first
					 * ath9k_wmi_cmd call
					 */
					  ath9k_wmi_rsp_callback(...)
					    /* copying data designated
					     * to already timeouted
					     * WMI command into an
					     * inappropriate wmi_cmd_buf
					     */
					    memcpy(...)
					    complete(&wmi->cmd_wait)
  /* awakened by the bogus callback
   * => invalid return result
   */
  mutex_unlock(&wmi->op_mutex)
  return 0

To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside
ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
initially designated wmi_cmd call, otherwise the path would be rejected
with last_seq_id check.

Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue()
under the same wmi_lock with last_seq_id update to avoid their racy
changes.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Reported-and-tested-by: syzbot+df61b36319e045c00a08@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase
description

 drivers/net/wireless/ath/ath9k/wmi.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index d652c647d56b..688453a2e53a 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -242,10 +242,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 		goto free_skb;
 	}
-	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
 	/* WMI command response */
 	ath9k_wmi_rsp_callback(wmi, skb);
+	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
 free_skb:
 	kfree_skb(skb);
@@ -283,7 +283,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi,
 
 static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 			       struct sk_buff *skb,
-			       enum wmi_cmd_id cmd, u16 len)
+			       enum wmi_cmd_id cmd, u16 len,
+			       u8 *rsp_buf, u32 rsp_len)
 {
 	struct wmi_cmd_hdr *hdr;
 	unsigned long flags;
@@ -293,6 +294,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 	hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id);
 
 	spin_lock_irqsave(&wmi->wmi_lock, flags);
+
+	/* record the rsp buffer and length */
+	wmi->cmd_rsp_buf = rsp_buf;
+	wmi->cmd_rsp_len = rsp_len;
+
 	wmi->last_seq_id = wmi->tx_seq_id;
 	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
@@ -333,11 +339,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 		goto out;
 	}
 
-	/* record the rsp buffer and length */
-	wmi->cmd_rsp_buf = rsp_buf;
-	wmi->cmd_rsp_len = rsp_len;
-
-	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len);
+	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len);
 	if (ret)
 		goto out;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
       [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
@ 2023-04-25  5:45         ` Kalle Valo
  2023-04-25  7:54         ` Fedor Pchelkin
  2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
  2 siblings, 0 replies; 24+ messages in thread
From: Kalle Valo @ 2023-04-25  5:45 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Fedor Pchelkin, Toke Høiland-Jørgensen, linux-kernel,
	syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08,
	linux-wireless

Hillf Danton <hdanton@sina.com> writes:

> On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@ispras.ru>
>> Currently, the synchronization between ath9k_wmi_cmd() and
>> ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
>> rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
>> 
>> Consider the following scenario:
>> 
>> CPU0					CPU1
>> 
>> ath9k_wmi_cmd(...)
>>   mutex_lock(&wmi->op_mutex)
>>   ath9k_wmi_cmd_issue(...)
>>   wait_for_completion_timeout(...)
>>   ---
>>   timeout
>>   ---
>> 					/* the callback is being processed
>> 					 * before last_seq_id became zero
>> 					 */
>> 					ath9k_wmi_ctrl_rx(...)
>> 					  spin_lock_irqsave(...)
>> 					  /* wmi->last_seq_id check here
>> 					   * doesn't detect timeout yet
>> 					   */
>> 					  spin_unlock_irqrestore(...)
>>   /* last_seq_id is zeroed to
>>    * indicate there was a timeout
>>    */
>>   wmi->last_seq_id = 0
>
> Without	wmi->wmi_lock held, updating last_seq_id on the waiter side
> means it is random on the waker side, so the fix below is incorrect.
>
>>   mutex_unlock(&wmi->op_mutex)
>>   return -ETIMEDOUT
>> 
>> ath9k_wmi_cmd(...)
>>   mutex_lock(&wmi->op_mutex)
>>   /* the buffer is replaced with
>>    * another one
>>    */
>>   wmi->cmd_rsp_buf = rsp_buf
>>   wmi->cmd_rsp_len = rsp_len
>>   ath9k_wmi_cmd_issue(...)
>>     spin_lock_irqsave(...)
>>     spin_unlock_irqrestore(...)
>>   wait_for_completion_timeout(...)
>> 					/* the continuation of the
>> 					 * callback left after the first
>> 					 * ath9k_wmi_cmd call
>> 					 */
>> 					  ath9k_wmi_rsp_callback(...)
>> 					    /* copying data designated
>> 					     * to already timeouted
>> 					     * WMI command into an
>> 					     * inappropriate wmi_cmd_buf
>> 					     */
>> 					    memcpy(...)
>> 					    complete(&wmi->cmd_wait)
>>   /* awakened by the bogus callback
>>    * => invalid return result
>>    */
>>   mutex_unlock(&wmi->op_mutex)
>>   return 0
>> 
>> To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside
>> ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
>> initially designated wmi_cmd call, otherwise the path would be rejected
>> with last_seq_id check.
>> 
>> Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue()
>> under the same wmi_lock with last_seq_id update to avoid their racy
>> changes.
>
> Better in a seperate one.

Adding linux-wireless, please always CC the list with wireless patches.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
       [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
  2023-04-25  5:45         ` Kalle Valo
@ 2023-04-25  7:54         ` Fedor Pchelkin
  2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
  2 siblings, 0 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-25  7:54 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Toke Høiland-Jørgensen, Kalle Valo, linux-kernel,
	syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08,
	linux-wireless, netdev, Alexey Khoroshilov, lvc-project

On Tue, Apr 25, 2023 at 11:38:32AM +0800, Hillf Danton wrote:
> On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@ispras.ru>
> > Currently, the synchronization between ath9k_wmi_cmd() and
> > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
> > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
> > 
> > Consider the following scenario:
> > 
> > CPU0					CPU1
> > 
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   ath9k_wmi_cmd_issue(...)
> >   wait_for_completion_timeout(...)
> >   ---
> >   timeout
> >   ---
> > 					/* the callback is being processed
> > 					 * before last_seq_id became zero
> > 					 */
> > 					ath9k_wmi_ctrl_rx(...)
> > 					  spin_lock_irqsave(...)
> > 					  /* wmi->last_seq_id check here
> > 					   * doesn't detect timeout yet
> > 					   */
> > 					  spin_unlock_irqrestore(...)
> >   /* last_seq_id is zeroed to
> >    * indicate there was a timeout
> >    */
> >   wmi->last_seq_id = 0
> 
> Without	wmi->wmi_lock held, updating last_seq_id on the waiter side
> means it is random on the waker side, so the fix below is incorrect.
> 

Thank you for noticing! Of course that should be done.

> >   mutex_unlock(&wmi->op_mutex)
> >   return -ETIMEDOUT
> > 
> > ath9k_wmi_cmd(...)
> >   mutex_lock(&wmi->op_mutex)
> >   /* the buffer is replaced with
> >    * another one
> >    */
> >   wmi->cmd_rsp_buf = rsp_buf
> >   wmi->cmd_rsp_len = rsp_len
> >   ath9k_wmi_cmd_issue(...)
> >     spin_lock_irqsave(...)
> >     spin_unlock_irqrestore(...)
> >   wait_for_completion_timeout(...)
> > 					/* the continuation of the
> > 					 * callback left after the first
> > 					 * ath9k_wmi_cmd call
> > 					 */
> > 					  ath9k_wmi_rsp_callback(...)
> > 					    /* copying data designated
> > 					     * to already timeouted
> > 					     * WMI command into an
> > 					     * inappropriate wmi_cmd_buf
> > 					     */
> > 					    memcpy(...)
> > 					    complete(&wmi->cmd_wait)
> >   /* awakened by the bogus callback
> >    * => invalid return result
> >    */
> >   mutex_unlock(&wmi->op_mutex)
> >   return 0
> > 
> > To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside
> > ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
> > initially designated wmi_cmd call, otherwise the path would be rejected
> > with last_seq_id check.
> > 
> > Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue()
> > under the same wmi_lock with last_seq_id update to avoid their racy
> > changes.
> 
> Better in a seperate one.

Well, they are parts of the same problem but now it seems more relevant
to divide the patch in two: the first one for incorrect last_seq_id
synchronization and the second one for recording rsp buffer under the
lock. Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-04-24 18:33     ` [PATCH v2] " Fedor Pchelkin
@ 2023-04-25 11:14       ` Toke Høiland-Jørgensen
  2023-04-28 16:52       ` Kalle Valo
  1 sibling, 0 replies; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-04-25 11:14 UTC (permalink / raw)
  To: Fedor Pchelkin, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid
> uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
> validate pkt_len before accessing the SKB.
>
> For example, the obtained SKB may have been badly constructed with
> pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr
> but after being processed in ath9k_htc_rx_msg() and passed to
> ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI
> command header which should be located inside its data payload. 
>
> Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit
> memory can be referenced.
>
> Tested on Qualcomm Atheros Communications AR9271 802.11n .
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
       [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
  2023-04-25  5:45         ` Kalle Valo
  2023-04-25  7:54         ` Fedor Pchelkin
@ 2023-04-25 19:26         ` Fedor Pchelkin
  2023-04-25 19:26           ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin
                             ` (3 more replies)
  2 siblings, 4 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-25 19:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, Hillf Danton

Currently, the synchronization between ath9k_wmi_cmd() and
ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().

Consider the following scenario:

CPU0					CPU1

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  ath9k_wmi_cmd_issue(...)
  wait_for_completion_timeout(...)
  ---
  timeout
  ---
					/* the callback is being processed
					 * before last_seq_id became zero
					 */
					ath9k_wmi_ctrl_rx(...)
					  spin_lock_irqsave(...)
					  /* wmi->last_seq_id check here
					   * doesn't detect timeout yet
					   */
					  spin_unlock_irqrestore(...)
  /* last_seq_id is zeroed to
   * indicate there was a timeout
   */
  wmi->last_seq_id = 0
  mutex_unlock(&wmi->op_mutex)
  return -ETIMEDOUT

ath9k_wmi_cmd(...)
  mutex_lock(&wmi->op_mutex)
  /* the buffer is replaced with
   * another one
   */
  wmi->cmd_rsp_buf = rsp_buf
  wmi->cmd_rsp_len = rsp_len
  ath9k_wmi_cmd_issue(...)
    spin_lock_irqsave(...)
    spin_unlock_irqrestore(...)
  wait_for_completion_timeout(...)
					/* the continuation of the
					 * callback left after the first
					 * ath9k_wmi_cmd call
					 */
					  ath9k_wmi_rsp_callback(...)
					    /* copying data designated
					     * to already timeouted
					     * WMI command into an
					     * inappropriate wmi_cmd_buf
					     */
					    memcpy(...)
					    complete(&wmi->cmd_wait)
  /* awakened by the bogus callback
   * => invalid return result
   */
  mutex_unlock(&wmi->op_mutex)
  return 0

To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd()
under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside
ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
initially designated wmi_cmd call, otherwise the path would be rejected
with last_seq_id check.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase
description
v3: per Hillf Danton's comment, protect last_seq_id with wmi_lock on
timeout path, divide the v2 version into two separate patches; the first
one is concerned with last_seq_id and completion under wmi_lock, the
second one is for moving rsp buffer and length recording under wmi lock.
Note that it's been only tested with Syzkaller and on basic driver
scenarios with real hardware.

 drivers/net/wireless/ath/ath9k/wmi.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index d652c647d56b..04f363cb90fe 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -242,10 +242,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb,
 		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 		goto free_skb;
 	}
-	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
 	/* WMI command response */
 	ath9k_wmi_rsp_callback(wmi, skb);
+	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
 free_skb:
 	kfree_skb(skb);
@@ -308,8 +308,8 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 	struct ath_common *common = ath9k_hw_common(ah);
 	u16 headroom = sizeof(struct htc_frame_hdr) +
 		       sizeof(struct wmi_cmd_hdr);
+	unsigned long time_left, flags;
 	struct sk_buff *skb;
-	unsigned long time_left;
 	int ret = 0;
 
 	if (ah->ah_flags & AH_UNPLUGGED)
@@ -345,7 +345,9 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 	if (!time_left) {
 		ath_dbg(common, WMI, "Timeout waiting for WMI command: %s\n",
 			wmi_cmd_to_name(cmd_id));
+		spin_lock_irqsave(&wmi->wmi_lock, flags);
 		wmi->last_seq_id = 0;
+		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 		mutex_unlock(&wmi->op_mutex);
 		return -ETIMEDOUT;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock
  2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
@ 2023-04-25 19:26           ` Fedor Pchelkin
  2023-08-08 14:07             ` Toke Høiland-Jørgensen
       [not found]           ` <20230425230708.2132-1-hdanton@sina.com>
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-25 19:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, Hillf Danton

If ath9k_wmi_cmd() has exited with a timeout, it is possible that during
next ath9k_wmi_cmd() call the wmi_rsp callback for previous wmi command
writes to new wmi->cmd_rsp_buf and makes a completion. This results in an
invalid ath9k_wmi_cmd() return value.

Move the replacement of WMI command response buffer and length under
wmi_lock. Note that last_seq_id value is updated there, too.

Thus, the buffer cannot be written to by a belated wmi_rsp callback
because that path is properly rejected by the last_seq_id check.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase
description
v3: per Hillf Danton's comment, protect last_seq_id with wmi_lock on
timeout path, divide the v2 version into two separate patches; the first
one is concerned with last_seq_id and completion under wmi_lock, the
second one is for moving rsp buffer and length recording under wmi lock.
Note that it's been only tested with Syzkaller and on basic driver
scenarios with real hardware.

 drivers/net/wireless/ath/ath9k/wmi.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c
index 04f363cb90fe..1476b42b52a9 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -283,7 +283,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi,
 
 static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 			       struct sk_buff *skb,
-			       enum wmi_cmd_id cmd, u16 len)
+			       enum wmi_cmd_id cmd, u16 len,
+			       u8 *rsp_buf, u32 rsp_len)
 {
 	struct wmi_cmd_hdr *hdr;
 	unsigned long flags;
@@ -293,6 +294,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi,
 	hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id);
 
 	spin_lock_irqsave(&wmi->wmi_lock, flags);
+
+	/* record the rsp buffer and length */
+	wmi->cmd_rsp_buf = rsp_buf;
+	wmi->cmd_rsp_len = rsp_len;
+
 	wmi->last_seq_id = wmi->tx_seq_id;
 	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
@@ -333,11 +339,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id,
 		goto out;
 	}
 
-	/* record the rsp buffer and length */
-	wmi->cmd_rsp_buf = rsp_buf;
-	wmi->cmd_rsp_len = rsp_len;
-
-	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len);
+	ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len);
 	if (ret)
 		goto out;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
       [not found]           ` <20230425230708.2132-1-hdanton@sina.com>
@ 2023-04-26 19:02             ` Fedor Pchelkin
  2023-05-15 12:06               ` Toke Høiland-Jørgensen
  2023-05-18 10:24               ` Hillf Danton
  0 siblings, 2 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-04-26 19:02 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Toke Høiland-Jørgensen, Kalle Vallo,
	syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project

On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: 
> Given similar wait timeout[1], just taking lock on the waiter side is not
> enough wrt fixing the race, because in case job done on the waker side,
> waiter needs to wait again after timeout.
> 

As I understand you correctly, you mean the case when a timeout occurs
during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
occurred on a waiter's side, it should return immediately and doesn't have
to care in which state the callback has been at that moment.

AFAICS, this is controlled properly with taking a wmi_lock on waiter and
waker sides, and there is no data corruption.

If a callback has not managed to do its work entirely (performing a
completion and subsequently waking waiting thread is included here), then,
well, it is considered a timeout, in my opinion.

Your suggestion makes a wmi_cmd call to give a little more chance for the
belated callback to complete (although timeout has actually expired). That
is probably good, but increasing a timeout value makes that job, too. I
don't think it makes any sense on real hardware.

Or do you mean there is data corruption that is properly fixed with your
patch?

That is, I agree there can be a situation when a callback makes all the
logical work it should and it just hasn't got enough time to perform a
completion before a timeout on waiter's side occurs. And this behaviour
can be named "racy". But, technically, this seems to be a rather valid
timeout.

> [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp/
> 

I don't think it's a similar case because wait_for_completion_state() is
interruptible while wait_for_completion_timeout() is not.

> A correct fix looks like after putting pieces together
> 
> +++ b/drivers/net/wireless/ath/ath9k/wmi.c
> @@ -238,6 +238,7 @@ static void ath9k_wmi_ctrl_rx(void *priv
>  		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
>  		goto free_skb;
>  	}
> +	wmi->last_seq_id = 0;
>  	spin_unlock_irqrestore(&wmi->wmi_lock, flags);
>  
>  	/* WMI command response */
> @@ -339,9 +340,20 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum
>  
>  	time_left = wait_for_completion_timeout(&wmi->cmd_wait, timeout);
>  	if (!time_left) {
> +		unsigned long flags;
> +		int wait = 0;
> +
>  		ath_dbg(common, WMI, "Timeout waiting for WMI command: %s\n",
>  			wmi_cmd_to_name(cmd_id));
> -		wmi->last_seq_id = 0;
> +
> +		spin_lock_irqsave(&wmi->wmi_lock, flags);
> +		if (wmi->last_seq_id == 0) /* job done on the waker side? */
> +			wait = 1;
> +		else
> +			wmi->last_seq_id = 0;
> +		spin_unlock_irqrestore(&wmi->wmi_lock, flags);
> +		if (wait)
> +			wait_for_completion(&wmi->cmd_wait);
>  		mutex_unlock(&wmi->op_mutex);
>  		return -ETIMEDOUT;
>  	}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
  2023-04-24 18:33     ` [PATCH v2] " Fedor Pchelkin
  2023-04-25 11:14       ` Toke Høiland-Jørgensen
@ 2023-04-28 16:52       ` Kalle Valo
  1 sibling, 0 replies; 24+ messages in thread
From: Kalle Valo @ 2023-04-28 16:52 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Toke Høiland-Jørgensen, Fedor Pchelkin,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d

Fedor Pchelkin <pchelkin@ispras.ru> wrote:

> For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid
> uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should
> validate pkt_len before accessing the SKB.
> 
> For example, the obtained SKB may have been badly constructed with
> pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr
> but after being processed in ath9k_htc_rx_msg() and passed to
> ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI
> command header which should be located inside its data payload.
> 
> Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit
> memory can be referenced.
> 
> Tested on Qualcomm Atheros Communications AR9271 802.11n .
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>

Patch applied to ath-next branch of ath.git, thanks.

f24292e82708 wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20230424183348.111355-1-pchelkin@ispras.ru/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-04-26 19:02             ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
@ 2023-05-15 12:06               ` Toke Høiland-Jørgensen
  2023-05-18 10:24               ` Hillf Danton
  1 sibling, 0 replies; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-05-15 12:06 UTC (permalink / raw)
  To: Fedor Pchelkin, Hillf Danton
  Cc: Kalle Vallo, syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: 
>> Given similar wait timeout[1], just taking lock on the waiter side is not
>> enough wrt fixing the race, because in case job done on the waker side,
>> waiter needs to wait again after timeout.
>> 
>
> As I understand you correctly, you mean the case when a timeout occurs
> during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> occurred on a waiter's side, it should return immediately and doesn't have
> to care in which state the callback has been at that moment.
>
> AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> waker sides, and there is no data corruption.
>
> If a callback has not managed to do its work entirely (performing a
> completion and subsequently waking waiting thread is included here), then,
> well, it is considered a timeout, in my opinion.
>
> Your suggestion makes a wmi_cmd call to give a little more chance for the
> belated callback to complete (although timeout has actually expired). That
> is probably good, but increasing a timeout value makes that job, too. I
> don't think it makes any sense on real hardware.
>
> Or do you mean there is data corruption that is properly fixed with your
> patch?
>
> That is, I agree there can be a situation when a callback makes all the
> logical work it should and it just hasn't got enough time to perform a
> completion before a timeout on waiter's side occurs. And this behaviour
> can be named "racy". But, technically, this seems to be a rather valid
> timeout.
>
>> [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp/
>> 
>
> I don't think it's a similar case because wait_for_completion_state() is
> interruptible while wait_for_completion_timeout() is not.

Ping, Hillf?

-Toke

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-04-26 19:02             ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
  2023-05-15 12:06               ` Toke Høiland-Jørgensen
@ 2023-05-18 10:24               ` Hillf Danton
  2023-05-18 15:44                 ` Fedor Pchelkin
  1 sibling, 1 reply; 24+ messages in thread
From: Hillf Danton @ 2023-05-18 10:24 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Toke Høiland-Jørgensen, Kalle Vallo,
	syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: 
>> Given similar wait timeout[1], just taking lock on the waiter side is not
>> enough wrt fixing the race, because in case job done on the waker side,
>> waiter needs to wait again after timeout.
>> 
>
> As I understand you correctly, you mean the case when a timeout occurs
> during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> occurred on a waiter's side, it should return immediately and doesn't have
> to care in which state the callback has been at that moment.
>
> AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> waker sides, and there is no data corruption.
>
> If a callback has not managed to do its work entirely (performing a
> completion and subsequently waking waiting thread is included here), then,
> well, it is considered a timeout, in my opinion.
>
> Your suggestion makes a wmi_cmd call to give a little more chance for the
> belated callback to complete (although timeout has actually expired). That
> is probably good, but increasing a timeout value makes that job, too. I
> don't think it makes any sense on real hardware.
>
> Or do you mean there is data corruption that is properly fixed with your patch?

Given complete() not paired with wait_for_completion(), what is the
difference after this patch?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-05-18 10:24               ` Hillf Danton
@ 2023-05-18 15:44                 ` Fedor Pchelkin
  0 siblings, 0 replies; 24+ messages in thread
From: Fedor Pchelkin @ 2023-05-18 15:44 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Toke Høiland-Jørgensen, Kalle Vallo,
	syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project

On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> writes:
> 
> > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: 
> >> Given similar wait timeout[1], just taking lock on the waiter side is not
> >> enough wrt fixing the race, because in case job done on the waker side,
> >> waiter needs to wait again after timeout.
> >> 
> >
> > As I understand you correctly, you mean the case when a timeout occurs
> > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> > occurred on a waiter's side, it should return immediately and doesn't have
> > to care in which state the callback has been at that moment.
> >
> > AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> > waker sides, and there is no data corruption.
> >
> > If a callback has not managed to do its work entirely (performing a
> > completion and subsequently waking waiting thread is included here), then,
> > well, it is considered a timeout, in my opinion.
> >
> > Your suggestion makes a wmi_cmd call to give a little more chance for the
> > belated callback to complete (although timeout has actually expired). That
> > is probably good, but increasing a timeout value makes that job, too. I
> > don't think it makes any sense on real hardware.
> >
> > Or do you mean there is data corruption that is properly fixed with your patch?
> 
> Given complete() not paired with wait_for_completion(), what is the
> difference after this patch?

The main thing in the patch is making ath9k_wmi_ctrl_rx() release wmi_lock
after calling ath9k_wmi_rsp_callback() which does copying data into the
shared wmi->cmd_rsp_buf buffer. Otherwise there can occur a data
corrupting scenario outlined in the patch description (added it here,
too).

On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote:
> CPU0					CPU1
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   ath9k_wmi_cmd_issue(...)
>   wait_for_completion_timeout(...)
>   ---
>   timeout
>   ---
> 					/* the callback is being processed
> 					 * before last_seq_id became zero
> 					 */
> 					ath9k_wmi_ctrl_rx(...)
> 					  spin_lock_irqsave(...)
> 					  /* wmi->last_seq_id check here
> 					   * doesn't detect timeout yet
> 					   */
> 					  spin_unlock_irqrestore(...)
>   /* last_seq_id is zeroed to
>    * indicate there was a timeout
>    */
>   wmi->last_seq_id = 0
>   mutex_unlock(&wmi->op_mutex)
>   return -ETIMEDOUT
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   /* the buffer is replaced with
>    * another one
>    */
>   wmi->cmd_rsp_buf = rsp_buf
>   wmi->cmd_rsp_len = rsp_len
>   ath9k_wmi_cmd_issue(...)
>     spin_lock_irqsave(...)
>     spin_unlock_irqrestore(...)
>   wait_for_completion_timeout(...)
> 					/* the continuation of the
> 					 * callback left after the first
> 					 * ath9k_wmi_cmd call
> 					 */
> 					  ath9k_wmi_rsp_callback(...)
> 					    /* copying data designated
> 					     * to already timeouted
> 					     * WMI command into an
> 					     * inappropriate wmi_cmd_buf
> 					     */
> 					    memcpy(...)
> 					    complete(&wmi->cmd_wait)
>   /* awakened by the bogus callback
>    * => invalid return result
>    */
>   mutex_unlock(&wmi->op_mutex)
>   return 0

So before the patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
wasn't helpful in case wmi->last_seq_id value was changed during
ath9k_wmi_rsp_callback() execution because of the next ath9k_wmi_cmd()
call.

With the proposed patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
accomplishes its job as:
 - the next ath9k_wmi_cmd call changes last_seq_id value under lock so
   it either waits for a belated ath9k_wmi_ctrl_rx() to finish or updates
   last_seq_id value so that the timeout check in ath9k_wmi_ctrl_rx()
   indicates that the waiter side has timeouted and we shouldn't further
   process the callback.
 - memcopying in ath9k_wmi_rsp_callback() is made to a valid place if
   the last_seq_id check was successful under the lock.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
  2023-04-25 19:26           ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin
       [not found]           ` <20230425230708.2132-1-hdanton@sina.com>
@ 2023-08-08 14:06           ` Toke Høiland-Jørgensen
  2023-08-22 13:35           ` Kalle Valo
  3 siblings, 0 replies; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-08-08 14:06 UTC (permalink / raw)
  To: Fedor Pchelkin, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, Hillf Danton

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> Currently, the synchronization between ath9k_wmi_cmd() and
> ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
> rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
>
> Consider the following scenario:
>
> CPU0					CPU1
>
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   ath9k_wmi_cmd_issue(...)
>   wait_for_completion_timeout(...)
>   ---
>   timeout
>   ---
> 					/* the callback is being processed
> 					 * before last_seq_id became zero
> 					 */
> 					ath9k_wmi_ctrl_rx(...)
> 					  spin_lock_irqsave(...)
> 					  /* wmi->last_seq_id check here
> 					   * doesn't detect timeout yet
> 					   */
> 					  spin_unlock_irqrestore(...)
>   /* last_seq_id is zeroed to
>    * indicate there was a timeout
>    */
>   wmi->last_seq_id = 0
>   mutex_unlock(&wmi->op_mutex)
>   return -ETIMEDOUT
>
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   /* the buffer is replaced with
>    * another one
>    */
>   wmi->cmd_rsp_buf = rsp_buf
>   wmi->cmd_rsp_len = rsp_len
>   ath9k_wmi_cmd_issue(...)
>     spin_lock_irqsave(...)
>     spin_unlock_irqrestore(...)
>   wait_for_completion_timeout(...)
> 					/* the continuation of the
> 					 * callback left after the first
> 					 * ath9k_wmi_cmd call
> 					 */
> 					  ath9k_wmi_rsp_callback(...)
> 					    /* copying data designated
> 					     * to already timeouted
> 					     * WMI command into an
> 					     * inappropriate wmi_cmd_buf
> 					     */
> 					    memcpy(...)
> 					    complete(&wmi->cmd_wait)
>   /* awakened by the bogus callback
>    * => invalid return result
>    */
>   mutex_unlock(&wmi->op_mutex)
>   return 0
>
> To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd()
> under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside
> ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
> initially designated wmi_cmd call, otherwise the path would be rejected
> with last_seq_id check.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Alright, finally took the time to dig into this and convince myself that
the fix if correct. Sorry for taking so long!

Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock
  2023-04-25 19:26           ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin
@ 2023-08-08 14:07             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-08-08 14:07 UTC (permalink / raw)
  To: Fedor Pchelkin, Kalle Vallo
  Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, Hillf Danton

Fedor Pchelkin <pchelkin@ispras.ru> writes:

> If ath9k_wmi_cmd() has exited with a timeout, it is possible that during
> next ath9k_wmi_cmd() call the wmi_rsp callback for previous wmi command
> writes to new wmi->cmd_rsp_buf and makes a completion. This results in an
> invalid ath9k_wmi_cmd() return value.
>
> Move the replacement of WMI command response buffer and length under
> wmi_lock. Note that last_seq_id value is updated there, too.
>
> Thus, the buffer cannot be written to by a belated wmi_rsp callback
> because that path is properly rejected by the last_seq_id check.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Given that the previous patch resets the last_seq_id to 0 on timeout
under the lock, I don't think this patch is strictly necessary anymore.
However, it doesn't hurt either, and I actually think moving the update
of the rsp buf into ath9k_wmi_cmd_issue() aids readability, so:

Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
  2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
                             ` (2 preceding siblings ...)
  2023-08-08 14:06           ` Toke Høiland-Jørgensen
@ 2023-08-22 13:35           ` Kalle Valo
  3 siblings, 0 replies; 24+ messages in thread
From: Kalle Valo @ 2023-08-22 13:35 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Toke Høiland-Jørgensen, Fedor Pchelkin,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Senthil Balasubramanian, John W. Linville,
	Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev,
	linux-kernel, Alexey Khoroshilov, lvc-project,
	syzbot+f2cb6e0ffdb961921e4d, Hillf Danton

Fedor Pchelkin <pchelkin@ispras.ru> wrote:

> Currently, the synchronization between ath9k_wmi_cmd() and
> ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being
> rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd().
> 
> Consider the following scenario:
> 
> CPU0                                    CPU1
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   ath9k_wmi_cmd_issue(...)
>   wait_for_completion_timeout(...)
>   ---
>   timeout
>   ---
>                                         /* the callback is being processed
>                                          * before last_seq_id became zero
>                                          */
>                                         ath9k_wmi_ctrl_rx(...)
>                                           spin_lock_irqsave(...)
>                                           /* wmi->last_seq_id check here
>                                            * doesn't detect timeout yet
>                                            */
>                                           spin_unlock_irqrestore(...)
>   /* last_seq_id is zeroed to
>    * indicate there was a timeout
>    */
>   wmi->last_seq_id = 0
>   mutex_unlock(&wmi->op_mutex)
>   return -ETIMEDOUT
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   /* the buffer is replaced with
>    * another one
>    */
>   wmi->cmd_rsp_buf = rsp_buf
>   wmi->cmd_rsp_len = rsp_len
>   ath9k_wmi_cmd_issue(...)
>     spin_lock_irqsave(...)
>     spin_unlock_irqrestore(...)
>   wait_for_completion_timeout(...)
>                                         /* the continuation of the
>                                          * callback left after the first
>                                          * ath9k_wmi_cmd call
>                                          */
>                                           ath9k_wmi_rsp_callback(...)
>                                             /* copying data designated
>                                              * to already timeouted
>                                              * WMI command into an
>                                              * inappropriate wmi_cmd_buf
>                                              */
>                                             memcpy(...)
>                                             complete(&wmi->cmd_wait)
>   /* awakened by the bogus callback
>    * => invalid return result
>    */
>   mutex_unlock(&wmi->op_mutex)
>   return 0
> 
> To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd()
> under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside
> ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for
> initially designated wmi_cmd call, otherwise the path would be rejected
> with last_seq_id check.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>

2 patches applied to ath-next branch of ath.git, thanks.

b674fb513e2e wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
454994cfa9e4 wifi: ath9k: protect WMI command response buffer replacement with a lock

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20230425192607.18015-1-pchelkin@ispras.ru/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-08-22 13:35 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-03-17  5:26   ` Kalle Valo
2023-03-18 20:25     ` Fedor Pchelkin
2023-04-24 18:23   ` Fedor Pchelkin
2023-04-24 18:33     ` [PATCH v2] " Fedor Pchelkin
2023-04-25 11:14       ` Toke Høiland-Jørgensen
2023-04-28 16:52       ` Kalle Valo
2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-04-24 19:11   ` Fedor Pchelkin
2023-04-24 19:18     ` [PATCH v2] " Fedor Pchelkin
     [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
2023-04-25  5:45         ` Kalle Valo
2023-04-25  7:54         ` Fedor Pchelkin
2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
2023-04-25 19:26           ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin
2023-08-08 14:07             ` Toke Høiland-Jørgensen
     [not found]           ` <20230425230708.2132-1-hdanton@sina.com>
2023-04-26 19:02             ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-05-15 12:06               ` Toke Høiland-Jørgensen
2023-05-18 10:24               ` Hillf Danton
2023-05-18 15:44                 ` Fedor Pchelkin
2023-08-08 14:06           ` Toke Høiland-Jørgensen
2023-08-22 13:35           ` Kalle Valo
2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin
2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.