Re: wilc1000 kernel crash

From: <Ajay.Kathat@microchip.com>
To: <mwalle@kernel.org>
Cc: <Claudiu.Beznea@microchip.com>, <linux-wireless@vger.kernel.org>,
	<kvalo@kernel.org>
Subject: Re: wilc1000 kernel crash
Date: Fri, 9 Dec 2022 14:14:26 +0000	[thread overview]
Message-ID: <4f279aa2-b5df-0b76-2cdf-ddb339a19cf7@microchip.com> (raw)
In-Reply-To: <20221209120343.wvagbfprsgdj74af@0002.3ffe.de>

On 09/12/22 17:33, Michael Walle wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 22/10/26 10:54, Michael Walle wrote:
>> Hi Ajay,
>>
>> On 22/10/25 08:26, Ajay.Kathat@microchip.com wrote:
>>>> In handle_rcvd_ntwrk_info() scan_req->scan_result isn't valid anymore,
>>>> although it doesn't contain NULL. Thus the driver is calling into a
>>>> bogus function pointer. There seems to be no locking between the
>>>> asynchronous calls within the workqueue (wilc_enqueue_work()) and when
>>>> the interface is disabled (wilc_deinit()). wilc_deinit() will free the
>>>> host_if_drv object which might still be used within the workqueue
>>>> context.
>>>
>>> Please try the below code changes with your test setup environment.
>> The workqueue handling and wilc_deinit() run in parallel, correct? ..
>>
>>> --- a/drivers/net/wireless/microchip/wilc1000/hif.c
>>> +++ b/drivers/net/wireless/microchip/wilc1000/hif.c
>>> @@ -495,12 +495,18 @@ static void handle_rcvd_ntwrk_info(struct
>>> work_struct *work)
>>>    {
>>>           struct host_if_msg *msg = container_of(work, struct
>>> host_if_msg, work);
>>>           struct wilc_rcvd_net_info *rcvd_info = &msg->body.net_info;
>>> -       struct wilc_user_scan_req *scan_req =
>>> &msg->vif->hif_drv->usr_scan_req;
>>> +       struct host_if_drv *hif_drv = msg->vif->hif_drv;
>>> +       struct wilc_user_scan_req *scan_req;
>>>           const u8 *ch_elm;
>>>           u8 *ies;
>>>           int ies_len;
>>>           size_t offset;
>>>
>>> +       if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
>>> +               goto done;
>> .. So what if hif_drv will be set to NULL right after this check?
>>
>> I don't think you'll get around using proper locking here.
>>
>>> +
>>> +       scan_req = &hif_drv->usr_scan_req;
>>> +
>>>           if (ieee80211_is_probe_resp(rcvd_info->mgmt->frame_control))
>>>                   offset = offsetof(struct ieee80211_mgmt,
>>> u.probe_resp.variable);
>>>           else if (ieee80211_is_beacon(rcvd_info->mgmt->frame_control))
>>> @@ -1574,6 +1580,9 @@ void wilc_network_info_received(struct wilc *wilc,
>>> u8 *buffer, u32 length)
>>>                   return;
>>>           }
>>>
>>> +       if (!hif_drv->usr_scan_req.scan_result)
>>> +               return;
>>> +
>> This is also racy. What if scan_result is cleared right after this
>> check? Then the work item will still get added to the work queue.
>>
>> Here is the call tree:
>>
>> isr_bh_routine() [interrupt thread ctx]
>>    wilc_handle_isr()
>>      mutex_lock(hif_cs)
>>      wilc_wlan_handle_isr_ext()
>>          wilc_wlan_handle_rxq()
>>            wilc_wlan_handle_rx_buff()
>>            wilc_wlan_cfg_indicate_rx()
>>              wilc_network_info_received()
>>                          wilc_enqueue_work(handle_rcvd_ntwrk_info)
>>      mutex_unlock(hif_cs)
>>
>> handle_rcvd_ntwrk_info [hif_workqueue ctx]
>>    if (scan_result)
>>      scan_result()
>>
>> wilc_mac_close() [ioctl ctx?]
>>    wilc_deinit_host_int()
>>      wilc_deinit()
>>        mutex_lock(deinit_lock)
>>        if (scan_result)
>>          scan_result()
>>          scan_result = NULL
>>        kfree(hif_drv)
>>        hif_drv = NULL
>>        mutex_unlock(deinit_lock)
>>
>> I don't see any synchronization mechanisms, between these threads.
>>
>>>           msg = wilc_alloc_work(vif, handle_rcvd_ntwrk_info, false);
>>>           if (IS_ERR(msg))
>>>                   return;
>>>
>>> The above changes should avoid the kernel crash exception.
>> As mentioned above, I think this will just decrease the chance that
>> it is happening. Nonetheless, I've tried your changes but it doesn't
>> fix the crash.
> Any news here?

Hi Michael,

No progress yet. I tried to simulate the condition a few times but was 
unable to see the exact failure in my setup so I need to try more. For 
the other "FW not responding" continuous logs, I got some clue. 
Probably, will try to send that patch first.

Regards,
Ajay