* [PATCH 0/3] wifi: ath9k: deal with uninit memory @ 2023-03-15 20:21 Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin ` (3 more replies) 0 siblings, 4 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project Syzkaller reports two cases ([1] and [2]) of uninitialized memory referencing in ath9k wmi functions. The following patch series is intended to fix them and related issues. [1] https://syzkaller.appspot.com/bug?id=51d401326d8ee41859d68997acdd6f3b1b39f186 [2] https://syzkaller.appspot.com/bug?id=fc54e8d79f5d5082c7867259d71b4e6618b69d25 ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin @ 2023-03-15 20:21 ` Fedor Pchelkin 2023-03-17 5:26 ` Kalle Valo 2023-04-24 18:23 ` Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin ` (2 subsequent siblings) 3 siblings, 2 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should validate pkt_len before accessing the SKB. For example, the obtained SKB may have uninitialized memory in the case of ioctl(USB_RAW_IOCTL_EP_WRITE). Implement sanity checking inside the corresponding endpoint RX handlers: ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can be referenced. Add comments briefly describing the issue. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 6 ++++++ drivers/net/wireless/ath/ath9k/htc_hst.c | 4 ++++ drivers/net/wireless/ath/ath9k/wmi.c | 8 ++++++++ 3 files changed, 18 insertions(+) diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c index 672789e3c55d..957efb26019d 100644 --- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c +++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c @@ -1147,6 +1147,12 @@ void ath9k_htc_rxep(void *drv_priv, struct sk_buff *skb, if (!data_race(priv->rx.initialized)) goto err; + /* Validate the obtained SKB so that it is handled without error + * inside rx_tasklet handler. + */ + if (unlikely(skb->len < sizeof(struct ieee80211_hdr))) + goto err; + spin_lock_irqsave(&priv->rx.rxbuflock, flags); list_for_each_entry(tmp_buf, &priv->rx.rxbuf, list) { if (!tmp_buf->in_process) { diff --git a/drivers/net/wireless/ath/ath9k/htc_hst.c b/drivers/net/wireless/ath/ath9k/htc_hst.c index fe62ff668f75..9d0d9d0e1aa8 100644 --- a/drivers/net/wireless/ath/ath9k/htc_hst.c +++ b/drivers/net/wireless/ath/ath9k/htc_hst.c @@ -475,6 +475,10 @@ void ath9k_htc_rx_msg(struct htc_target *htc_handle, skb_pull(skb, sizeof(struct htc_frame_hdr)); endpoint = &htc_handle->endpoint[epid]; + + /* The endpoint RX handlers should implement their own + * additional SKB sanity checking + */ if (endpoint->ep_callbacks.rx) endpoint->ep_callbacks.rx(endpoint->ep_callbacks.priv, skb, epid); diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index 19345b8f7bfd..2e7c361b62f5 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -204,6 +204,10 @@ static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb) { skb_pull(skb, sizeof(struct wmi_cmd_hdr)); + /* Once again validate the SKB. */ + if (unlikely(skb->len < wmi->cmd_rsp_len)) + return; + if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0) memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len); @@ -221,6 +225,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, if (unlikely(wmi->stopped)) goto free_skb; + /* Validate the obtained SKB. */ + if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr))) + goto free_skb; + hdr = (struct wmi_cmd_hdr *) skb->data; cmd_id = be16_to_cpu(hdr->command_id); -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin @ 2023-03-17 5:26 ` Kalle Valo 2023-03-18 20:25 ` Fedor Pchelkin 2023-04-24 18:23 ` Fedor Pchelkin 1 sibling, 1 reply; 24+ messages in thread From: Kalle Valo @ 2023-03-17 5:26 UTC (permalink / raw) To: Fedor Pchelkin Cc: Toke Høiland-Jørgensen, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d Fedor Pchelkin <pchelkin@ispras.ru> writes: > For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid > uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should > validate pkt_len before accessing the SKB. For example, the obtained SKB > may have uninitialized memory in the case of > ioctl(USB_RAW_IOCTL_EP_WRITE). > > Implement sanity checking inside the corresponding endpoint RX handlers: > ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can > be referenced. > > Add comments briefly describing the issue. > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. It would be also nice to know how you have tested these. Syzkaller is no substitute for testing on a real hardware. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-03-17 5:26 ` Kalle Valo @ 2023-03-18 20:25 ` Fedor Pchelkin 0 siblings, 0 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-18 20:25 UTC (permalink / raw) To: Kalle Valo Cc: Toke Høiland-Jørgensen, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d On Fri, Mar 17, 2023 at 07:26:11AM +0200, Kalle Valo wrote: > > It would be also nice to know how you have tested these. Syzkaller is no > substitute for testing on a real hardware. > Unfortunately, currently I can't test this on real hardware so probably we should postpone the patch discussion for some time. Roughly in a week or two I'll be able to do some testing and try to reproduce the problem there. For sure this should be tested on real hardware as some issues may arise. I sent the patch based on the commit b383e8abed41 ("wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()") where it is explained thoroughly what can lead to such behaviour. At the moment I don't see anything in the code which can prevent that invalid scenario to happen for endpoint callbacks path. Actually, sanity checks for SKB length have been added everywhere inside ath9k_htc_rx_msg() except where the endpoint callbacks are called. As for the repro, the SKB inside ath9k_hif_usb_rx_stream() is allocated with pkt_len=8 so it passes the 'htc_frame_hdr' check and processing in ath9k_htc_rx_msg() but it obviously cannot be handled correctly in the endpoint callbacks then. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-03-17 5:26 ` Kalle Valo @ 2023-04-24 18:23 ` Fedor Pchelkin 2023-04-24 18:33 ` [PATCH v2] " Fedor Pchelkin 1 sibling, 1 reply; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-24 18:23 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Vallo Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d On Wed, Mar 15, 2023 at 11:21:10PM +0300, Fedor Pchelkin wrote: > For the reasons described in commit b383e8abed41 ("wifi: ath9k: avoid > uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should > validate pkt_len before accessing the SKB. For example, the obtained SKB > may have uninitialized memory in the case of > ioctl(USB_RAW_IOCTL_EP_WRITE). > > Implement sanity checking inside the corresponding endpoint RX handlers: > ath9k_wmi_ctrl_rx() and ath9k_htc_rxep(). Otherwise, uninit memory can > be referenced. > > Add comments briefly describing the issue. > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> > --- Apologies for the delay. I've been able to test the patches in some way on 0cf3:9271 Qualcomm Atheros Communications AR9271 802.11n . > drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 6 ++++++ > drivers/net/wireless/ath/ath9k/htc_hst.c | 4 ++++ > drivers/net/wireless/ath/ath9k/wmi.c | 8 ++++++++ > 3 files changed, 18 insertions(+) > > diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c > index 672789e3c55d..957efb26019d 100644 > --- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c > +++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c > @@ -1147,6 +1147,12 @@ void ath9k_htc_rxep(void *drv_priv, struct sk_buff *skb, > if (!data_race(priv->rx.initialized)) > goto err; > > + /* Validate the obtained SKB so that it is handled without error > + * inside rx_tasklet handler. > + */ > + if (unlikely(skb->len < sizeof(struct ieee80211_hdr))) > + goto err; > + > spin_lock_irqsave(&priv->rx.rxbuflock, flags); > list_for_each_entry(tmp_buf, &priv->rx.rxbuf, list) { > if (!tmp_buf->in_process) { This check above seems to be redundant: SKB is properly checked inside ath9k_rx_prepare() in rx_tasklet handler. > diff --git a/drivers/net/wireless/ath/ath9k/htc_hst.c b/drivers/net/wireless/ath/ath9k/htc_hst.c > index fe62ff668f75..9d0d9d0e1aa8 100644 > --- a/drivers/net/wireless/ath/ath9k/htc_hst.c > +++ b/drivers/net/wireless/ath/ath9k/htc_hst.c > @@ -475,6 +475,10 @@ void ath9k_htc_rx_msg(struct htc_target *htc_handle, > skb_pull(skb, sizeof(struct htc_frame_hdr)); > > endpoint = &htc_handle->endpoint[epid]; > + > + /* The endpoint RX handlers should implement their own > + * additional SKB sanity checking > + */ > if (endpoint->ep_callbacks.rx) > endpoint->ep_callbacks.rx(endpoint->ep_callbacks.priv, > skb, epid); > diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c > index 19345b8f7bfd..2e7c361b62f5 100644 > --- a/drivers/net/wireless/ath/ath9k/wmi.c > +++ b/drivers/net/wireless/ath/ath9k/wmi.c > @@ -204,6 +204,10 @@ static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb) > { > skb_pull(skb, sizeof(struct wmi_cmd_hdr)); > > + /* Once again validate the SKB. */ > + if (unlikely(skb->len < wmi->cmd_rsp_len)) > + return; > + > if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0) > memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len); > The change above is very very wrong. Looking at the firmware code and debugging driver with real device, I realized the command response is located in the tailroom of an SKB: skb->len (SKB data length) is zero in such packets while skb->data and skb->tail pointers are equal. So a new skb->len and cmd_rsp_len check resulted in driver denying all WMI command responses. My bad for proposing such a mistake. > @@ -221,6 +225,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, > if (unlikely(wmi->stopped)) > goto free_skb; > > + /* Validate the obtained SKB. */ > + if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr))) > + goto free_skb; > + > hdr = (struct wmi_cmd_hdr *) skb->data; > cmd_id = be16_to_cpu(hdr->command_id); > This check above is actually good. A packet can be obtained constructed something like this (taken from Syzkaller reproducer): *(uint16_t*)0x20000500 = 8; // pkt_len *(uint16_t*)0x20000502 = 0x4e00; // ATH_USB_RX_STREAM_MODE_TAG memcpy((void*)0x20000504, "\x15\xa7\xd5\x61\xb9\xb3\xb0\x7c", 8); syz_usb_ep_write(r[0], 0x82, 0xc, 0x20000500); pkt_len is 8, so that it can contain an htc_frame_hdr, but when the SKB is processed in ath9k_htc_rx_msg() and passed to the endpoint RX handler, ath9k_wmi_ctrl_rx() specifically, the problem arises as there are no other corresponding headers in the buffer. wmi_cmd_hdr is pulled later so there are no problems with checking skb->len. There are no issues arised with the patch on a real device, too. Not sure if this can happen on an ath9k device with common firmware (probably can't happen), but that is real with some bad USB device which Syzkaller emulates. I'll send the v2 versions for further discussions. > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-04-24 18:23 ` Fedor Pchelkin @ 2023-04-24 18:33 ` Fedor Pchelkin 2023-04-25 11:14 ` Toke Høiland-Jørgensen 2023-04-28 16:52 ` Kalle Valo 0 siblings, 2 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-24 18:33 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should validate pkt_len before accessing the SKB. For example, the obtained SKB may have been badly constructed with pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr but after being processed in ath9k_htc_rx_msg() and passed to ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI command header which should be located inside its data payload. Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit memory can be referenced. Tested on Qualcomm Atheros Communications AR9271 802.11n . Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- v2: remove incorrect checks and rephrase commit info drivers/net/wireless/ath/ath9k/wmi.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index 19345b8f7bfd..d652c647d56b 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -221,6 +221,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, if (unlikely(wmi->stopped)) goto free_skb; + /* Validate the obtained SKB. */ + if (unlikely(skb->len < sizeof(struct wmi_cmd_hdr))) + goto free_skb; + hdr = (struct wmi_cmd_hdr *) skb->data; cmd_id = be16_to_cpu(hdr->command_id); -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-04-24 18:33 ` [PATCH v2] " Fedor Pchelkin @ 2023-04-25 11:14 ` Toke Høiland-Jørgensen 2023-04-28 16:52 ` Kalle Valo 1 sibling, 0 replies; 24+ messages in thread From: Toke Høiland-Jørgensen @ 2023-04-25 11:14 UTC (permalink / raw) To: Fedor Pchelkin, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d Fedor Pchelkin <pchelkin@ispras.ru> writes: > For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid > uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should > validate pkt_len before accessing the SKB. > > For example, the obtained SKB may have been badly constructed with > pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr > but after being processed in ath9k_htc_rx_msg() and passed to > ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI > command header which should be located inside its data payload. > > Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit > memory can be referenced. > > Tested on Qualcomm Atheros Communications AR9271 802.11n . > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx 2023-04-24 18:33 ` [PATCH v2] " Fedor Pchelkin 2023-04-25 11:14 ` Toke Høiland-Jørgensen @ 2023-04-28 16:52 ` Kalle Valo 1 sibling, 0 replies; 24+ messages in thread From: Kalle Valo @ 2023-04-28 16:52 UTC (permalink / raw) To: Fedor Pchelkin Cc: Toke Høiland-Jørgensen, Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d Fedor Pchelkin <pchelkin@ispras.ru> wrote: > For the reasons also described in commit b383e8abed41 ("wifi: ath9k: avoid > uninit memory read in ath9k_htc_rx_msg()"), ath9k_htc_rx_msg() should > validate pkt_len before accessing the SKB. > > For example, the obtained SKB may have been badly constructed with > pkt_len = 8. In this case, the SKB can only contain a valid htc_frame_hdr > but after being processed in ath9k_htc_rx_msg() and passed to > ath9k_wmi_ctrl_rx() endpoint RX handler, it is expected to have a WMI > command header which should be located inside its data payload. > > Implement sanity checking inside ath9k_wmi_ctrl_rx(). Otherwise, uninit > memory can be referenced. > > Tested on Qualcomm Atheros Communications AR9271 802.11n . > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Reported-and-tested-by: syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> > Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Patch applied to ath-next branch of ath.git, thanks. f24292e82708 wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx -- https://patchwork.kernel.org/project/linux-wireless/patch/20230424183348.111355-1-pchelkin@ispras.ru/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin @ 2023-03-15 20:21 ` Fedor Pchelkin 2023-04-24 19:11 ` Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin 2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin 3 siblings, 1 reply; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+df61b36319e045c00a08 Currently, the synchronization between ath9k_wmi_cmd() and ath9k_wmi_ctrl_rx() is exposed to races which can result, for example, not only in wmi->cmd_rsp_buf being incorrectly initialized, but in overall invalid behaviour of ath9k_wmi_cmd(). Consider the following scenario: CPU0 CPU1 ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) ath9k_wmi_cmd_issue(...) wait_for_completion_timeout(...) --- timeout --- mutex_unlock(&wmi->op_mutex) return -ETIMEDOUT ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) ath9k_wmi_cmd_issue(...) wait_for_completion_timeout(...) /* the one left after the first * ath9k_wmi_cmd call */ ath9k_wmi_rsp_callback(...) memcpy(...) complete(&wmi->cmd_wait); /* Awakened by the bogus callback * => invalid return */ mutex_unlock(&wmi->op_mutex) return 0 This may occur even on uniprocessor machines, and in SMP case the races may be even more intricate. To fix this, move the contents of ath9k_wmi_rsp_callback() under wmi_lock inside ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for initially designated wmi_cmd call, otherwise the path would be rejected with last_seq_id check. Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue() under the same wmi_lock with last_seq_id update to avoid their racy changes. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Reported-and-tested-by: syzbot+df61b36319e045c00a08@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- This patch depends on [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx drivers/net/wireless/ath/ath9k/wmi.c | 48 ++++++++++++++-------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index 2e7c361b62f5..99a91bbaace9 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -200,20 +200,6 @@ void ath9k_fatal_work(struct work_struct *work) ath9k_htc_reset(priv); } -static void ath9k_wmi_rsp_callback(struct wmi *wmi, struct sk_buff *skb) -{ - skb_pull(skb, sizeof(struct wmi_cmd_hdr)); - - /* Once again validate the SKB. */ - if (unlikely(skb->len < wmi->cmd_rsp_len)) - return; - - if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0) - memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len); - - complete(&wmi->cmd_wait); -} - static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, enum htc_endpoint_id epid) { @@ -242,14 +228,26 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, /* Check if there has been a timeout. */ spin_lock_irqsave(&wmi->wmi_lock, flags); - if (be16_to_cpu(hdr->seq_no) != wmi->last_seq_id) { + if (be16_to_cpu(hdr->seq_no) != wmi->last_seq_id || + be16_to_cpu(hdr->seq_no) == 0) { + spin_unlock_irqrestore(&wmi->wmi_lock, flags); + goto free_skb; + } + + /* Next, process WMI command response */ + skb_pull(skb, sizeof(struct wmi_cmd_hdr)); + + /* Once again validate the SKB. */ + if (unlikely(skb->len < wmi->cmd_rsp_len)) { spin_unlock_irqrestore(&wmi->wmi_lock, flags); goto free_skb; } - spin_unlock_irqrestore(&wmi->wmi_lock, flags); - /* WMI command response */ - ath9k_wmi_rsp_callback(wmi, skb); + if (wmi->cmd_rsp_buf != NULL && wmi->cmd_rsp_len != 0) + memcpy(wmi->cmd_rsp_buf, skb->data, wmi->cmd_rsp_len); + + complete(&wmi->cmd_wait); + spin_unlock_irqrestore(&wmi->wmi_lock, flags); free_skb: kfree_skb(skb); @@ -287,7 +285,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi, static int ath9k_wmi_cmd_issue(struct wmi *wmi, struct sk_buff *skb, - enum wmi_cmd_id cmd, u16 len) + enum wmi_cmd_id cmd, u16 len, + u8 *rsp_buf, u32 rsp_len) { struct wmi_cmd_hdr *hdr; unsigned long flags; @@ -297,6 +296,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi, hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id); spin_lock_irqsave(&wmi->wmi_lock, flags); + + /* record the rsp buffer and length */ + wmi->cmd_rsp_buf = rsp_buf; + wmi->cmd_rsp_len = rsp_len; + wmi->last_seq_id = wmi->tx_seq_id; spin_unlock_irqrestore(&wmi->wmi_lock, flags); @@ -337,11 +341,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, goto out; } - /* record the rsp buffer and length */ - wmi->cmd_rsp_buf = rsp_buf; - wmi->cmd_rsp_len = rsp_len; - - ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len); + ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len); if (ret) goto out; -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin @ 2023-04-24 19:11 ` Fedor Pchelkin 2023-04-24 19:18 ` [PATCH v2] " Fedor Pchelkin 0 siblings, 1 reply; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-24 19:11 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Valo Cc: Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+df61b36319e045c00a08 This problem is realy subtle, I suppose. In the v2 commit info, which I'll send in the next mail, the race condition is described which can lead to invalid behaviour. Couldn't reproduce that particular problem on real hardware, but if force timeouts to wmi cmd completions, local KMSan catches some uninit values. The synchronization between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx on timeouts is good, especially after 8a2f35b98306 ("wifi: ath9k: Fix potential stack-out-of-bounds write in ath9k_wmi_rsp_callback()"). And I think the only place where the fuzzer can provoke failure is when wmi->last_seq_id in callback is checked before it is assigned zero inside ath9k_wmi_cmd() during timeout exit. This scenario is more thoroughly described in patch v2. Well, the issue seems to be rare and I don't know how to properly test it on real hardware. I've made some checks on a basic driver workflow, and there weren't any stalls or explicit failures, and the patch seems to close that tiny race condition window. But, anyway, it requires more discussion. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-04-24 19:11 ` Fedor Pchelkin @ 2023-04-24 19:18 ` Fedor Pchelkin [not found] ` <20230425033832.2041-1-hdanton@sina.com> 0 siblings, 1 reply; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-24 19:18 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Valo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08 Currently, the synchronization between ath9k_wmi_cmd() and ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). Consider the following scenario: CPU0 CPU1 ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) ath9k_wmi_cmd_issue(...) wait_for_completion_timeout(...) --- timeout --- /* the callback is being processed * before last_seq_id became zero */ ath9k_wmi_ctrl_rx(...) spin_lock_irqsave(...) /* wmi->last_seq_id check here * doesn't detect timeout yet */ spin_unlock_irqrestore(...) /* last_seq_id is zeroed to * indicate there was a timeout */ wmi->last_seq_id = 0 mutex_unlock(&wmi->op_mutex) return -ETIMEDOUT ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) /* the buffer is replaced with * another one */ wmi->cmd_rsp_buf = rsp_buf wmi->cmd_rsp_len = rsp_len ath9k_wmi_cmd_issue(...) spin_lock_irqsave(...) spin_unlock_irqrestore(...) wait_for_completion_timeout(...) /* the continuation of the * callback left after the first * ath9k_wmi_cmd call */ ath9k_wmi_rsp_callback(...) /* copying data designated * to already timeouted * WMI command into an * inappropriate wmi_cmd_buf */ memcpy(...) complete(&wmi->cmd_wait) /* awakened by the bogus callback * => invalid return result */ mutex_unlock(&wmi->op_mutex) return 0 To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for initially designated wmi_cmd call, otherwise the path would be rejected with last_seq_id check. Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue() under the same wmi_lock with last_seq_id update to avoid their racy changes. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Reported-and-tested-by: syzbot+df61b36319e045c00a08@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase description drivers/net/wireless/ath/ath9k/wmi.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index d652c647d56b..688453a2e53a 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -242,10 +242,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, spin_unlock_irqrestore(&wmi->wmi_lock, flags); goto free_skb; } - spin_unlock_irqrestore(&wmi->wmi_lock, flags); /* WMI command response */ ath9k_wmi_rsp_callback(wmi, skb); + spin_unlock_irqrestore(&wmi->wmi_lock, flags); free_skb: kfree_skb(skb); @@ -283,7 +283,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi, static int ath9k_wmi_cmd_issue(struct wmi *wmi, struct sk_buff *skb, - enum wmi_cmd_id cmd, u16 len) + enum wmi_cmd_id cmd, u16 len, + u8 *rsp_buf, u32 rsp_len) { struct wmi_cmd_hdr *hdr; unsigned long flags; @@ -293,6 +294,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi, hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id); spin_lock_irqsave(&wmi->wmi_lock, flags); + + /* record the rsp buffer and length */ + wmi->cmd_rsp_buf = rsp_buf; + wmi->cmd_rsp_len = rsp_len; + wmi->last_seq_id = wmi->tx_seq_id; spin_unlock_irqrestore(&wmi->wmi_lock, flags); @@ -333,11 +339,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, goto out; } - /* record the rsp buffer and length */ - wmi->cmd_rsp_buf = rsp_buf; - wmi->cmd_rsp_len = rsp_len; - - ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len); + ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len); if (ret) goto out; -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
[parent not found: <20230425033832.2041-1-hdanton@sina.com>]
* Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx [not found] ` <20230425033832.2041-1-hdanton@sina.com> @ 2023-04-25 5:45 ` Kalle Valo 2023-04-25 7:54 ` Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin 2 siblings, 0 replies; 24+ messages in thread From: Kalle Valo @ 2023-04-25 5:45 UTC (permalink / raw) To: Hillf Danton Cc: Fedor Pchelkin, Toke Høiland-Jørgensen, linux-kernel, syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08, linux-wireless Hillf Danton <hdanton@sina.com> writes: > On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@ispras.ru> >> Currently, the synchronization between ath9k_wmi_cmd() and >> ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being >> rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). >> >> Consider the following scenario: >> >> CPU0 CPU1 >> >> ath9k_wmi_cmd(...) >> mutex_lock(&wmi->op_mutex) >> ath9k_wmi_cmd_issue(...) >> wait_for_completion_timeout(...) >> --- >> timeout >> --- >> /* the callback is being processed >> * before last_seq_id became zero >> */ >> ath9k_wmi_ctrl_rx(...) >> spin_lock_irqsave(...) >> /* wmi->last_seq_id check here >> * doesn't detect timeout yet >> */ >> spin_unlock_irqrestore(...) >> /* last_seq_id is zeroed to >> * indicate there was a timeout >> */ >> wmi->last_seq_id = 0 > > Without wmi->wmi_lock held, updating last_seq_id on the waiter side > means it is random on the waker side, so the fix below is incorrect. > >> mutex_unlock(&wmi->op_mutex) >> return -ETIMEDOUT >> >> ath9k_wmi_cmd(...) >> mutex_lock(&wmi->op_mutex) >> /* the buffer is replaced with >> * another one >> */ >> wmi->cmd_rsp_buf = rsp_buf >> wmi->cmd_rsp_len = rsp_len >> ath9k_wmi_cmd_issue(...) >> spin_lock_irqsave(...) >> spin_unlock_irqrestore(...) >> wait_for_completion_timeout(...) >> /* the continuation of the >> * callback left after the first >> * ath9k_wmi_cmd call >> */ >> ath9k_wmi_rsp_callback(...) >> /* copying data designated >> * to already timeouted >> * WMI command into an >> * inappropriate wmi_cmd_buf >> */ >> memcpy(...) >> complete(&wmi->cmd_wait) >> /* awakened by the bogus callback >> * => invalid return result >> */ >> mutex_unlock(&wmi->op_mutex) >> return 0 >> >> To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside >> ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for >> initially designated wmi_cmd call, otherwise the path would be rejected >> with last_seq_id check. >> >> Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue() >> under the same wmi_lock with last_seq_id update to avoid their racy >> changes. > > Better in a seperate one. Adding linux-wireless, please always CC the list with wireless patches. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx [not found] ` <20230425033832.2041-1-hdanton@sina.com> 2023-04-25 5:45 ` Kalle Valo @ 2023-04-25 7:54 ` Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin 2 siblings, 0 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-25 7:54 UTC (permalink / raw) To: Hillf Danton Cc: Toke Høiland-Jørgensen, Kalle Valo, linux-kernel, syzbot+f2cb6e0ffdb961921e4d, syzbot+df61b36319e045c00a08, linux-wireless, netdev, Alexey Khoroshilov, lvc-project On Tue, Apr 25, 2023 at 11:38:32AM +0800, Hillf Danton wrote: > On 24 Apr 2023 22:18:26 +0300 Fedor Pchelkin <pchelkin@ispras.ru> > > Currently, the synchronization between ath9k_wmi_cmd() and > > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being > > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). > > > > Consider the following scenario: > > > > CPU0 CPU1 > > > > ath9k_wmi_cmd(...) > > mutex_lock(&wmi->op_mutex) > > ath9k_wmi_cmd_issue(...) > > wait_for_completion_timeout(...) > > --- > > timeout > > --- > > /* the callback is being processed > > * before last_seq_id became zero > > */ > > ath9k_wmi_ctrl_rx(...) > > spin_lock_irqsave(...) > > /* wmi->last_seq_id check here > > * doesn't detect timeout yet > > */ > > spin_unlock_irqrestore(...) > > /* last_seq_id is zeroed to > > * indicate there was a timeout > > */ > > wmi->last_seq_id = 0 > > Without wmi->wmi_lock held, updating last_seq_id on the waiter side > means it is random on the waker side, so the fix below is incorrect. > Thank you for noticing! Of course that should be done. > > mutex_unlock(&wmi->op_mutex) > > return -ETIMEDOUT > > > > ath9k_wmi_cmd(...) > > mutex_lock(&wmi->op_mutex) > > /* the buffer is replaced with > > * another one > > */ > > wmi->cmd_rsp_buf = rsp_buf > > wmi->cmd_rsp_len = rsp_len > > ath9k_wmi_cmd_issue(...) > > spin_lock_irqsave(...) > > spin_unlock_irqrestore(...) > > wait_for_completion_timeout(...) > > /* the continuation of the > > * callback left after the first > > * ath9k_wmi_cmd call > > */ > > ath9k_wmi_rsp_callback(...) > > /* copying data designated > > * to already timeouted > > * WMI command into an > > * inappropriate wmi_cmd_buf > > */ > > memcpy(...) > > complete(&wmi->cmd_wait) > > /* awakened by the bogus callback > > * => invalid return result > > */ > > mutex_unlock(&wmi->op_mutex) > > return 0 > > > > To fix this, move ath9k_wmi_rsp_callback() under wmi_lock inside > > ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for > > initially designated wmi_cmd call, otherwise the path would be rejected > > with last_seq_id check. > > > > Also move recording the rsp buffer and length into ath9k_wmi_cmd_issue() > > under the same wmi_lock with last_seq_id update to avoid their racy > > changes. > > Better in a seperate one. Well, they are parts of the same problem but now it seems more relevant to divide the patch in two: the first one for incorrect last_seq_id synchronization and the second one for recording rsp buffer under the lock. Thanks! ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx [not found] ` <20230425033832.2041-1-hdanton@sina.com> 2023-04-25 5:45 ` Kalle Valo 2023-04-25 7:54 ` Fedor Pchelkin @ 2023-04-25 19:26 ` Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin ` (3 more replies) 2 siblings, 4 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-25 19:26 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, Hillf Danton Currently, the synchronization between ath9k_wmi_cmd() and ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). Consider the following scenario: CPU0 CPU1 ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) ath9k_wmi_cmd_issue(...) wait_for_completion_timeout(...) --- timeout --- /* the callback is being processed * before last_seq_id became zero */ ath9k_wmi_ctrl_rx(...) spin_lock_irqsave(...) /* wmi->last_seq_id check here * doesn't detect timeout yet */ spin_unlock_irqrestore(...) /* last_seq_id is zeroed to * indicate there was a timeout */ wmi->last_seq_id = 0 mutex_unlock(&wmi->op_mutex) return -ETIMEDOUT ath9k_wmi_cmd(...) mutex_lock(&wmi->op_mutex) /* the buffer is replaced with * another one */ wmi->cmd_rsp_buf = rsp_buf wmi->cmd_rsp_len = rsp_len ath9k_wmi_cmd_issue(...) spin_lock_irqsave(...) spin_unlock_irqrestore(...) wait_for_completion_timeout(...) /* the continuation of the * callback left after the first * ath9k_wmi_cmd call */ ath9k_wmi_rsp_callback(...) /* copying data designated * to already timeouted * WMI command into an * inappropriate wmi_cmd_buf */ memcpy(...) complete(&wmi->cmd_wait) /* awakened by the bogus callback * => invalid return result */ mutex_unlock(&wmi->op_mutex) return 0 To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd() under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for initially designated wmi_cmd call, otherwise the path would be rejected with last_seq_id check. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase description v3: per Hillf Danton's comment, protect last_seq_id with wmi_lock on timeout path, divide the v2 version into two separate patches; the first one is concerned with last_seq_id and completion under wmi_lock, the second one is for moving rsp buffer and length recording under wmi lock. Note that it's been only tested with Syzkaller and on basic driver scenarios with real hardware. drivers/net/wireless/ath/ath9k/wmi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index d652c647d56b..04f363cb90fe 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -242,10 +242,10 @@ static void ath9k_wmi_ctrl_rx(void *priv, struct sk_buff *skb, spin_unlock_irqrestore(&wmi->wmi_lock, flags); goto free_skb; } - spin_unlock_irqrestore(&wmi->wmi_lock, flags); /* WMI command response */ ath9k_wmi_rsp_callback(wmi, skb); + spin_unlock_irqrestore(&wmi->wmi_lock, flags); free_skb: kfree_skb(skb); @@ -308,8 +308,8 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, struct ath_common *common = ath9k_hw_common(ah); u16 headroom = sizeof(struct htc_frame_hdr) + sizeof(struct wmi_cmd_hdr); + unsigned long time_left, flags; struct sk_buff *skb; - unsigned long time_left; int ret = 0; if (ah->ah_flags & AH_UNPLUGGED) @@ -345,7 +345,9 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, if (!time_left) { ath_dbg(common, WMI, "Timeout waiting for WMI command: %s\n", wmi_cmd_to_name(cmd_id)); + spin_lock_irqsave(&wmi->wmi_lock, flags); wmi->last_seq_id = 0; + spin_unlock_irqrestore(&wmi->wmi_lock, flags); mutex_unlock(&wmi->op_mutex); return -ETIMEDOUT; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin @ 2023-04-25 19:26 ` Fedor Pchelkin 2023-08-08 14:07 ` Toke Høiland-Jørgensen [not found] ` <20230425230708.2132-1-hdanton@sina.com> ` (2 subsequent siblings) 3 siblings, 1 reply; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-25 19:26 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, Hillf Danton If ath9k_wmi_cmd() has exited with a timeout, it is possible that during next ath9k_wmi_cmd() call the wmi_rsp callback for previous wmi command writes to new wmi->cmd_rsp_buf and makes a completion. This results in an invalid ath9k_wmi_cmd() return value. Move the replacement of WMI command response buffer and length under wmi_lock. Note that last_seq_id value is updated there, too. Thus, the buffer cannot be written to by a belated wmi_rsp callback because that path is properly rejected by the last_seq_id check. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- v2: do not extract ath9k_wmi_rsp_callback() internals, rephrase description v3: per Hillf Danton's comment, protect last_seq_id with wmi_lock on timeout path, divide the v2 version into two separate patches; the first one is concerned with last_seq_id and completion under wmi_lock, the second one is for moving rsp buffer and length recording under wmi lock. Note that it's been only tested with Syzkaller and on basic driver scenarios with real hardware. drivers/net/wireless/ath/ath9k/wmi.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index 04f363cb90fe..1476b42b52a9 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -283,7 +283,8 @@ int ath9k_wmi_connect(struct htc_target *htc, struct wmi *wmi, static int ath9k_wmi_cmd_issue(struct wmi *wmi, struct sk_buff *skb, - enum wmi_cmd_id cmd, u16 len) + enum wmi_cmd_id cmd, u16 len, + u8 *rsp_buf, u32 rsp_len) { struct wmi_cmd_hdr *hdr; unsigned long flags; @@ -293,6 +294,11 @@ static int ath9k_wmi_cmd_issue(struct wmi *wmi, hdr->seq_no = cpu_to_be16(++wmi->tx_seq_id); spin_lock_irqsave(&wmi->wmi_lock, flags); + + /* record the rsp buffer and length */ + wmi->cmd_rsp_buf = rsp_buf; + wmi->cmd_rsp_len = rsp_len; + wmi->last_seq_id = wmi->tx_seq_id; spin_unlock_irqrestore(&wmi->wmi_lock, flags); @@ -333,11 +339,7 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, goto out; } - /* record the rsp buffer and length */ - wmi->cmd_rsp_buf = rsp_buf; - wmi->cmd_rsp_len = rsp_len; - - ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len); + ret = ath9k_wmi_cmd_issue(wmi, skb, cmd_id, cmd_len, rsp_buf, rsp_len); if (ret) goto out; -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock 2023-04-25 19:26 ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin @ 2023-08-08 14:07 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 24+ messages in thread From: Toke Høiland-Jørgensen @ 2023-08-08 14:07 UTC (permalink / raw) To: Fedor Pchelkin, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, Hillf Danton Fedor Pchelkin <pchelkin@ispras.ru> writes: > If ath9k_wmi_cmd() has exited with a timeout, it is possible that during > next ath9k_wmi_cmd() call the wmi_rsp callback for previous wmi command > writes to new wmi->cmd_rsp_buf and makes a completion. This results in an > invalid ath9k_wmi_cmd() return value. > > Move the replacement of WMI command response buffer and length under > wmi_lock. Note that last_seq_id value is updated there, too. > > Thus, the buffer cannot be written to by a belated wmi_rsp callback > because that path is properly rejected by the last_seq_id check. > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Given that the previous patch resets the last_seq_id to 0 on timeout under the lock, I don't think this patch is strictly necessary anymore. However, it doesn't hurt either, and I actually think moving the update of the rsp buf into ath9k_wmi_cmd_issue() aids readability, so: Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20230425230708.2132-1-hdanton@sina.com>]
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx [not found] ` <20230425230708.2132-1-hdanton@sina.com> @ 2023-04-26 19:02 ` Fedor Pchelkin 2023-05-15 12:06 ` Toke Høiland-Jørgensen 2023-05-18 10:24 ` Hillf Danton 0 siblings, 2 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-04-26 19:02 UTC (permalink / raw) To: Hillf Danton Cc: Toke Høiland-Jørgensen, Kalle Vallo, syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: > Given similar wait timeout[1], just taking lock on the waiter side is not > enough wrt fixing the race, because in case job done on the waker side, > waiter needs to wait again after timeout. > As I understand you correctly, you mean the case when a timeout occurs during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has occurred on a waiter's side, it should return immediately and doesn't have to care in which state the callback has been at that moment. AFAICS, this is controlled properly with taking a wmi_lock on waiter and waker sides, and there is no data corruption. If a callback has not managed to do its work entirely (performing a completion and subsequently waking waiting thread is included here), then, well, it is considered a timeout, in my opinion. Your suggestion makes a wmi_cmd call to give a little more chance for the belated callback to complete (although timeout has actually expired). That is probably good, but increasing a timeout value makes that job, too. I don't think it makes any sense on real hardware. Or do you mean there is data corruption that is properly fixed with your patch? That is, I agree there can be a situation when a callback makes all the logical work it should and it just hasn't got enough time to perform a completion before a timeout on waiter's side occurs. And this behaviour can be named "racy". But, technically, this seems to be a rather valid timeout. > [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp/ > I don't think it's a similar case because wait_for_completion_state() is interruptible while wait_for_completion_timeout() is not. > A correct fix looks like after putting pieces together > > +++ b/drivers/net/wireless/ath/ath9k/wmi.c > @@ -238,6 +238,7 @@ static void ath9k_wmi_ctrl_rx(void *priv > spin_unlock_irqrestore(&wmi->wmi_lock, flags); > goto free_skb; > } > + wmi->last_seq_id = 0; > spin_unlock_irqrestore(&wmi->wmi_lock, flags); > > /* WMI command response */ > @@ -339,9 +340,20 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum > > time_left = wait_for_completion_timeout(&wmi->cmd_wait, timeout); > if (!time_left) { > + unsigned long flags; > + int wait = 0; > + > ath_dbg(common, WMI, "Timeout waiting for WMI command: %s\n", > wmi_cmd_to_name(cmd_id)); > - wmi->last_seq_id = 0; > + > + spin_lock_irqsave(&wmi->wmi_lock, flags); > + if (wmi->last_seq_id == 0) /* job done on the waker side? */ > + wait = 1; > + else > + wmi->last_seq_id = 0; > + spin_unlock_irqrestore(&wmi->wmi_lock, flags); > + if (wait) > + wait_for_completion(&wmi->cmd_wait); > mutex_unlock(&wmi->op_mutex); > return -ETIMEDOUT; > } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-04-26 19:02 ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin @ 2023-05-15 12:06 ` Toke Høiland-Jørgensen 2023-05-18 10:24 ` Hillf Danton 1 sibling, 0 replies; 24+ messages in thread From: Toke Høiland-Jørgensen @ 2023-05-15 12:06 UTC (permalink / raw) To: Fedor Pchelkin, Hillf Danton Cc: Kalle Vallo, syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project Fedor Pchelkin <pchelkin@ispras.ru> writes: > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: >> Given similar wait timeout[1], just taking lock on the waiter side is not >> enough wrt fixing the race, because in case job done on the waker side, >> waiter needs to wait again after timeout. >> > > As I understand you correctly, you mean the case when a timeout occurs > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has > occurred on a waiter's side, it should return immediately and doesn't have > to care in which state the callback has been at that moment. > > AFAICS, this is controlled properly with taking a wmi_lock on waiter and > waker sides, and there is no data corruption. > > If a callback has not managed to do its work entirely (performing a > completion and subsequently waking waiting thread is included here), then, > well, it is considered a timeout, in my opinion. > > Your suggestion makes a wmi_cmd call to give a little more chance for the > belated callback to complete (although timeout has actually expired). That > is probably good, but increasing a timeout value makes that job, too. I > don't think it makes any sense on real hardware. > > Or do you mean there is data corruption that is properly fixed with your > patch? > > That is, I agree there can be a situation when a callback makes all the > logical work it should and it just hasn't got enough time to perform a > completion before a timeout on waiter's side occurs. And this behaviour > can be named "racy". But, technically, this seems to be a rather valid > timeout. > >> [1] https://lore.kernel.org/lkml/9d9b9652-c1ac-58e9-2eab-9256c17b1da2@I-love.SAKURA.ne.jp/ >> > > I don't think it's a similar case because wait_for_completion_state() is > interruptible while wait_for_completion_timeout() is not. Ping, Hillf? -Toke ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-04-26 19:02 ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-05-15 12:06 ` Toke Høiland-Jørgensen @ 2023-05-18 10:24 ` Hillf Danton 2023-05-18 15:44 ` Fedor Pchelkin 1 sibling, 1 reply; 24+ messages in thread From: Hillf Danton @ 2023-05-18 10:24 UTC (permalink / raw) To: Fedor Pchelkin Cc: Toke Høiland-Jørgensen, Kalle Vallo, syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project Fedor Pchelkin <pchelkin@ispras.ru> writes: > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: >> Given similar wait timeout[1], just taking lock on the waiter side is not >> enough wrt fixing the race, because in case job done on the waker side, >> waiter needs to wait again after timeout. >> > > As I understand you correctly, you mean the case when a timeout occurs > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has > occurred on a waiter's side, it should return immediately and doesn't have > to care in which state the callback has been at that moment. > > AFAICS, this is controlled properly with taking a wmi_lock on waiter and > waker sides, and there is no data corruption. > > If a callback has not managed to do its work entirely (performing a > completion and subsequently waking waiting thread is included here), then, > well, it is considered a timeout, in my opinion. > > Your suggestion makes a wmi_cmd call to give a little more chance for the > belated callback to complete (although timeout has actually expired). That > is probably good, but increasing a timeout value makes that job, too. I > don't think it makes any sense on real hardware. > > Or do you mean there is data corruption that is properly fixed with your patch? Given complete() not paired with wait_for_completion(), what is the difference after this patch? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-05-18 10:24 ` Hillf Danton @ 2023-05-18 15:44 ` Fedor Pchelkin 0 siblings, 0 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-05-18 15:44 UTC (permalink / raw) To: Hillf Danton Cc: Toke Høiland-Jørgensen, Kalle Vallo, syzbot+f2cb6e0ffdb961921e4d, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote: > Fedor Pchelkin <pchelkin@ispras.ru> writes: > > > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: > >> Given similar wait timeout[1], just taking lock on the waiter side is not > >> enough wrt fixing the race, because in case job done on the waker side, > >> waiter needs to wait again after timeout. > >> > > > > As I understand you correctly, you mean the case when a timeout occurs > > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has > > occurred on a waiter's side, it should return immediately and doesn't have > > to care in which state the callback has been at that moment. > > > > AFAICS, this is controlled properly with taking a wmi_lock on waiter and > > waker sides, and there is no data corruption. > > > > If a callback has not managed to do its work entirely (performing a > > completion and subsequently waking waiting thread is included here), then, > > well, it is considered a timeout, in my opinion. > > > > Your suggestion makes a wmi_cmd call to give a little more chance for the > > belated callback to complete (although timeout has actually expired). That > > is probably good, but increasing a timeout value makes that job, too. I > > don't think it makes any sense on real hardware. > > > > Or do you mean there is data corruption that is properly fixed with your patch? > > Given complete() not paired with wait_for_completion(), what is the > difference after this patch? The main thing in the patch is making ath9k_wmi_ctrl_rx() release wmi_lock after calling ath9k_wmi_rsp_callback() which does copying data into the shared wmi->cmd_rsp_buf buffer. Otherwise there can occur a data corrupting scenario outlined in the patch description (added it here, too). On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote: > CPU0 CPU1 > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > ath9k_wmi_cmd_issue(...) > wait_for_completion_timeout(...) > --- > timeout > --- > /* the callback is being processed > * before last_seq_id became zero > */ > ath9k_wmi_ctrl_rx(...) > spin_lock_irqsave(...) > /* wmi->last_seq_id check here > * doesn't detect timeout yet > */ > spin_unlock_irqrestore(...) > /* last_seq_id is zeroed to > * indicate there was a timeout > */ > wmi->last_seq_id = 0 > mutex_unlock(&wmi->op_mutex) > return -ETIMEDOUT > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > /* the buffer is replaced with > * another one > */ > wmi->cmd_rsp_buf = rsp_buf > wmi->cmd_rsp_len = rsp_len > ath9k_wmi_cmd_issue(...) > spin_lock_irqsave(...) > spin_unlock_irqrestore(...) > wait_for_completion_timeout(...) > /* the continuation of the > * callback left after the first > * ath9k_wmi_cmd call > */ > ath9k_wmi_rsp_callback(...) > /* copying data designated > * to already timeouted > * WMI command into an > * inappropriate wmi_cmd_buf > */ > memcpy(...) > complete(&wmi->cmd_wait) > /* awakened by the bogus callback > * => invalid return result > */ > mutex_unlock(&wmi->op_mutex) > return 0 So before the patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() wasn't helpful in case wmi->last_seq_id value was changed during ath9k_wmi_rsp_callback() execution because of the next ath9k_wmi_cmd() call. With the proposed patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() accomplishes its job as: - the next ath9k_wmi_cmd call changes last_seq_id value under lock so it either waits for a belated ath9k_wmi_ctrl_rx() to finish or updates last_seq_id value so that the timeout check in ath9k_wmi_ctrl_rx() indicates that the waiter side has timeouted and we shouldn't further process the callback. - memcopying in ath9k_wmi_rsp_callback() is made to a valid place if the last_seq_id check was successful under the lock. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin [not found] ` <20230425230708.2132-1-hdanton@sina.com> @ 2023-08-08 14:06 ` Toke Høiland-Jørgensen 2023-08-22 13:35 ` Kalle Valo 3 siblings, 0 replies; 24+ messages in thread From: Toke Høiland-Jørgensen @ 2023-08-08 14:06 UTC (permalink / raw) To: Fedor Pchelkin, Kalle Vallo Cc: Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, Hillf Danton Fedor Pchelkin <pchelkin@ispras.ru> writes: > Currently, the synchronization between ath9k_wmi_cmd() and > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). > > Consider the following scenario: > > CPU0 CPU1 > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > ath9k_wmi_cmd_issue(...) > wait_for_completion_timeout(...) > --- > timeout > --- > /* the callback is being processed > * before last_seq_id became zero > */ > ath9k_wmi_ctrl_rx(...) > spin_lock_irqsave(...) > /* wmi->last_seq_id check here > * doesn't detect timeout yet > */ > spin_unlock_irqrestore(...) > /* last_seq_id is zeroed to > * indicate there was a timeout > */ > wmi->last_seq_id = 0 > mutex_unlock(&wmi->op_mutex) > return -ETIMEDOUT > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > /* the buffer is replaced with > * another one > */ > wmi->cmd_rsp_buf = rsp_buf > wmi->cmd_rsp_len = rsp_len > ath9k_wmi_cmd_issue(...) > spin_lock_irqsave(...) > spin_unlock_irqrestore(...) > wait_for_completion_timeout(...) > /* the continuation of the > * callback left after the first > * ath9k_wmi_cmd call > */ > ath9k_wmi_rsp_callback(...) > /* copying data designated > * to already timeouted > * WMI command into an > * inappropriate wmi_cmd_buf > */ > memcpy(...) > complete(&wmi->cmd_wait) > /* awakened by the bogus callback > * => invalid return result > */ > mutex_unlock(&wmi->op_mutex) > return 0 > > To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd() > under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside > ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for > initially designated wmi_cmd call, otherwise the path would be rejected > with last_seq_id check. > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Alright, finally took the time to dig into this and convince myself that the fix if correct. Sorry for taking so long! Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin ` (2 preceding siblings ...) 2023-08-08 14:06 ` Toke Høiland-Jørgensen @ 2023-08-22 13:35 ` Kalle Valo 3 siblings, 0 replies; 24+ messages in thread From: Kalle Valo @ 2023-08-22 13:35 UTC (permalink / raw) To: Fedor Pchelkin Cc: Toke Høiland-Jørgensen, Fedor Pchelkin, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project, syzbot+f2cb6e0ffdb961921e4d, Hillf Danton Fedor Pchelkin <pchelkin@ispras.ru> wrote: > Currently, the synchronization between ath9k_wmi_cmd() and > ath9k_wmi_ctrl_rx() is exposed to a race condition which, although being > rather unlikely, can lead to invalid behaviour of ath9k_wmi_cmd(). > > Consider the following scenario: > > CPU0 CPU1 > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > ath9k_wmi_cmd_issue(...) > wait_for_completion_timeout(...) > --- > timeout > --- > /* the callback is being processed > * before last_seq_id became zero > */ > ath9k_wmi_ctrl_rx(...) > spin_lock_irqsave(...) > /* wmi->last_seq_id check here > * doesn't detect timeout yet > */ > spin_unlock_irqrestore(...) > /* last_seq_id is zeroed to > * indicate there was a timeout > */ > wmi->last_seq_id = 0 > mutex_unlock(&wmi->op_mutex) > return -ETIMEDOUT > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > /* the buffer is replaced with > * another one > */ > wmi->cmd_rsp_buf = rsp_buf > wmi->cmd_rsp_len = rsp_len > ath9k_wmi_cmd_issue(...) > spin_lock_irqsave(...) > spin_unlock_irqrestore(...) > wait_for_completion_timeout(...) > /* the continuation of the > * callback left after the first > * ath9k_wmi_cmd call > */ > ath9k_wmi_rsp_callback(...) > /* copying data designated > * to already timeouted > * WMI command into an > * inappropriate wmi_cmd_buf > */ > memcpy(...) > complete(&wmi->cmd_wait) > /* awakened by the bogus callback > * => invalid return result > */ > mutex_unlock(&wmi->op_mutex) > return 0 > > To fix this, update last_seq_id on timeout path inside ath9k_wmi_cmd() > under the wmi_lock. Move ath9k_wmi_rsp_callback() under wmi_lock inside > ath9k_wmi_ctrl_rx() so that the wmi->cmd_wait can be completed only for > initially designated wmi_cmd call, otherwise the path would be rejected > with last_seq_id check. > > Found by Linux Verification Center (linuxtesting.org) with Syzkaller. > > Fixes: fb9987d0f748 ("ath9k_htc: Support for AR9271 chipset.") > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> > Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> 2 patches applied to ath-next branch of ath.git, thanks. b674fb513e2e wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx 454994cfa9e4 wifi: ath9k: protect WMI command response buffer replacement with a lock -- https://patchwork.kernel.org/project/linux-wireless/patch/20230425192607.18015-1-pchelkin@ispras.ru/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged 2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin @ 2023-03-15 20:21 ` Fedor Pchelkin 2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin 3 siblings, 0 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-15 20:21 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Fedor Pchelkin, Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project Return with an error code in case the USB device has been already unplugged. Otherwise the callers of ath9k_wmi_cmd() are unaware of the fact that cmd_buf and rsp_buf are not initialized or handled properly inside this function. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: a3be14b76da1 ("ath9k_htc: Handle device unplug properly") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> --- drivers/net/wireless/ath/ath9k/wmi.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/wmi.c b/drivers/net/wireless/ath/ath9k/wmi.c index 99a91bbaace9..3e0ad4f8f0a0 100644 --- a/drivers/net/wireless/ath/ath9k/wmi.c +++ b/drivers/net/wireless/ath/ath9k/wmi.c @@ -320,8 +320,11 @@ int ath9k_wmi_cmd(struct wmi *wmi, enum wmi_cmd_id cmd_id, unsigned long time_left; int ret = 0; - if (ah->ah_flags & AH_UNPLUGGED) - return 0; + if (ah->ah_flags & AH_UNPLUGGED) { + ath_dbg(common, WMI, "Device unplugged for WMI command: %s\n", + wmi_cmd_to_name(cmd_id)); + return -ENODEV; + } skb = alloc_skb(headroom + cmd_len, GFP_ATOMIC); if (!skb) -- 2.34.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 0/3] wifi: ath9k: deal with uninit memory 2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin ` (2 preceding siblings ...) 2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin @ 2023-03-15 20:47 ` Fedor Pchelkin 3 siblings, 0 replies; 24+ messages in thread From: Fedor Pchelkin @ 2023-03-15 20:47 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: Kalle Valo, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Senthil Balasubramanian, John W. Linville, Vasanthakumar Thiagarajan, Sujith, linux-wireless, netdev, linux-kernel, Alexey Khoroshilov, lvc-project On Wed, Mar 15, 2023 at 11:21:09PM +0300, Fedor Pchelkin wrote: > Syzkaller reports two cases ([1] and [2]) of uninitialized memory referencing in ath9k > wmi functions. The following patch series is intended to fix them and related issues. > > [1] https://syzkaller.appspot.com/bug?id=51d401326d8ee41859d68997acdd6f3b1b39f186 > [2] https://syzkaller.appspot.com/bug?id=fc54e8d79f5d5082c7867259d71b4e6618b69d25 During the patch development I observed that the return value of REG_READ (ath9k_regread), REG_READ_MULTI (ath9k_multi_regread) and similar macros is not checked in most places inside ath9k where they are called. That may also potentially lead to incorrect behaviour. I wonder if it actually poses a problem as the current implementation has been for a long time and perhaps somebody has already addressed this. In more details: -- ath9k_regread returns -1 on error, and probably this is a predefined error state and doesn't need additional check. But, overall, it seems strange to me that the return value is not checked in places where it is used later (for example, in ath9k_reg_rmw or ath9k_hw_ani_read_counters). -- ath9k_multi_regread fills 'val' buffer with undefined values on error case, that should definitely be fixed with initializing the local buffer to zero, I think. Could you please say your opinion on this issue? ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2023-08-22 13:35 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin 2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-03-17 5:26 ` Kalle Valo 2023-03-18 20:25 ` Fedor Pchelkin 2023-04-24 18:23 ` Fedor Pchelkin 2023-04-24 18:33 ` [PATCH v2] " Fedor Pchelkin 2023-04-25 11:14 ` Toke Høiland-Jørgensen 2023-04-28 16:52 ` Kalle Valo 2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-04-24 19:11 ` Fedor Pchelkin 2023-04-24 19:18 ` [PATCH v2] " Fedor Pchelkin [not found] ` <20230425033832.2041-1-hdanton@sina.com> 2023-04-25 5:45 ` Kalle Valo 2023-04-25 7:54 ` Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 1/2] " Fedor Pchelkin 2023-04-25 19:26 ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin 2023-08-08 14:07 ` Toke Høiland-Jørgensen [not found] ` <20230425230708.2132-1-hdanton@sina.com> 2023-04-26 19:02 ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin 2023-05-15 12:06 ` Toke Høiland-Jørgensen 2023-05-18 10:24 ` Hillf Danton 2023-05-18 15:44 ` Fedor Pchelkin 2023-08-08 14:06 ` Toke Høiland-Jørgensen 2023-08-22 13:35 ` Kalle Valo 2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin 2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.