All of lore.kernel.org
 help / color / mirror / Atom feed
From: Larry Finger <Larry.Finger@lwfinger.net>
To: Ping-Ke Shih <pkshih@realtek.com>, Jonas Gorski <jonas.gorski@gmail.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
Date: Wed, 22 Mar 2023 20:41:18 -0500	[thread overview]
Message-ID: <4c841575-1e02-32f2-b63d-52bc0c063c82@lwfinger.net> (raw)
In-Reply-To: <e4f8e55f843041978098f57ecb7e558b@realtek.com>

On 3/22/23 19:59, Ping-Ke Shih wrote:
> 
> 
>> -----Original Message-----
>> From: Jonas Gorski <jonas.gorski@gmail.com>
>> Sent: Thursday, March 23, 2023 4:52 AM
>> To: Larry Finger <Larry.Finger@lwfinger.net>
>> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>; netdev@vger.kernel.org; linux-wireless@vger.kernel.org; Ping-Ke
>> Shih <pkshih@realtek.com>
>> Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
>>
>> On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>
>>> On 3/22/23 10:54, Hyeonggon Yoo wrote:
>>>>
>>>> Hello folks,
>>>> I've just encountered weird bug when booting Linux v6.2.7
>>>>
>>>> config: attached
>>>> dmesg: attached
>>>>
>>>> I'm not sure exactly how to trigger this issue yet because it's not
>>>> stably reproducible. (just have encountered randomly when logging in)
>>>>
>>>> At quick look it seems to be related to rtw89 wireless driver or network subsystem.
>>>
>>> Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not
>>> yet released kernel 6.2.7, but I have not seen this problem with mainline
>>> kernels throughout the 6.2 or 6.3 development series.
>>
>> Looking at the rtw89 driver's probe function, the bug is probably a
>> simple race condition:
>>
>> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> {
>>      ...
>>      ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw();
>>      ...
>>      rtw89_core_napi_init(rtwdev);
>>      ...
>> }
>>
>> so it registers the wifi device first, making it visible to userspace,
>> and then initializes napi.
>>
>> So there is a window where a fast userspace may already try to
>> interact with the device before the driver got around to initializing
>> the napi parts, and then it explodes. At least that is my theory for
>> the issue.
>>
>> Switching the order of these two functions should avoid it in theory,
>> as long as rtw89_core_napi_init() doesn't depend on anything
>> rtw89_core_register() does.
>>
>> FWIW, registering the irq handler only after registering the device
>> also seems suspect, and should probably also happen before that.
> 
> Adding a 10 seconds sleep between rtw89_core_register() and
> rtw89_core_napi_init(), and I can reproduce this issue:
> 
> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
>      ...
>      ret = rtw89_core_register(rtwdev);
>      ...
> 	msleep(10 * 100);
> 	...
>      rtw89_core_napi_init(rtwdev);
>      ...
> }
> 
> And, as your suggestion, I move the rtw89_core_register() to the last step
> of PCI probe. Then, it looks more reasonable that we prepare NAPI and
> interrupt handlers before registering netdev. Could you give it a try with
> below fix?
> 
> diff --git a/pci.c b/pci.c
> index fe6c0efc..087de2e0 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>          rtw89_pci_link_cfg(rtwdev);
>          rtw89_pci_l1ss_cfg(rtwdev);
> 
> -       ret = rtw89_core_register(rtwdev);
> -       if (ret) {
> -               rtw89_err(rtwdev, "failed to register core\n");
> -               goto err_clear_resource;
> -       }
> -
>          rtw89_core_napi_init(rtwdev);
> 
>          ret = rtw89_pci_request_irq(rtwdev, pdev);
>          if (ret) {
>                  rtw89_err(rtwdev, "failed to request pci irq\n");
> -               goto err_unregister;
> +               goto err_deinit_napi;
> +       }
> +
> +       ret = rtw89_core_register(rtwdev);
> +       if (ret) {
> +               rtw89_err(rtwdev, "failed to register core\n");
> +               goto err_free_irq;
>          }
> 
>          return 0;
> 
> -err_unregister:
> +err_free_irq:
> +       rtw89_pci_free_irq(rtwdev, pdev);
> +err_deinit_napi:
>          rtw89_core_napi_deinit(rtwdev);
> -       rtw89_core_unregister(rtwdev);
>   err_clear_resource:
>          rtw89_pci_clear_resource(rtwdev, pdev);
>   err_declaim_pci:

Ping-Ke,

The patch works fine here, but I did not have the problem.

When you submit it, add a Tested-by: Larry Finger<Larry.Finger@lwfinger.net> and 
a Reviewed-by for the same address.

@Jonas: Good catch.

Larry



  parent reply	other threads:[~2023-03-23  1:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-22 15:54 [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup Hyeonggon Yoo
2023-03-22 16:57 ` Larry Finger
2023-03-22 20:52   ` Jonas Gorski
2023-03-23  0:59     ` Ping-Ke Shih
2023-03-23  1:36       ` Larry Finger
2023-03-23  1:41       ` Larry Finger [this message]
2023-03-23  8:36         ` Ping-Ke Shih

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4c841575-1e02-32f2-b63d-52bc0c063c82@lwfinger.net \
    --to=larry.finger@lwfinger.net \
    --cc=42.hyeyoo@gmail.com \
    --cc=jonas.gorski@gmail.com \
    --cc=linux-wireless@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pkshih@realtek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.