From: Wen Gong <wgong@codeaurora.org>
To: Ben Greear <greearb@candelatech.com>
Cc: Krishna Chaitanya <chaitanya.mgit@gmail.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	ath10k <ath10k@lists.infradead.org>
Subject: Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
Date: Fri, 21 Aug 2020 10:45:20 +0800
Message-ID: <f58ad98479e54a5bbe8b6561563d8cc7@codeaurora.org>
In-Reply-To: <c69abe52-ccd1-ac73-8691-d87f5ed8be76@candelatech.com>

On 2020-08-21 04:59, Ben Greear wrote:
> On 8/20/20 1:15 PM, Krishna Chaitanya wrote:
>> On Thu, Aug 20, 2020 at 11:23 PM Ben Greear <greearb@candelatech.com> 
>> wrote:
>>> 
>>> On 8/20/20 10:42 AM, Krishna Chaitanya wrote:
>>>> On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya
>>>> <chaitanya.mgit@gmail.com> wrote:
>>>>> 
>>>>> On Thu, Aug 20, 2020 at 10:38 PM Ben Greear 
>>>>> <greearb@candelatech.com> wrote:
>>>>>> 
>>>>>> On 8/20/20 10:00 AM, Krishna Chaitanya wrote:
>>>>>>> On Thu, Aug 20, 2020 at 10:02 PM Ben Greear 
>>>>>>> <greearb@candelatech.com> wrote:
>>>>>>>> 
>>>>>>>> On 8/20/20 9:08 AM, Krishna Chaitanya wrote:
>>>>>>>>> On Thu, Aug 20, 2020 at 8:07 PM Wen Gong <wgong@codeaurora.org> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 2020-08-20 18:52, Krishna Chaitanya wrote:
>>>>>>>>>>> On Thu, Aug 20, 2020 at 3:45 PM Wen Gong 
>>>>>>>>>>> <wgong@codeaurora.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2020-08-20 17:19, Krishna Chaitanya wrote:
>>>>>>>>>> ...
>>>>>>>>>>>>>> I'm not really convinced that this is the right fix, but 
>>>>>>>>>>>>>> I'm no NAPI
>>>>>>>>>>>>>> expert. Can anyone else help?
>>>>>>>>>>>>> Calling napi_disable() twice can lead to hangs, but moving 
>>>>>>>>>>>>> NAPI from
>>>>>>>>>>>>> start/stop to
>>>>>>>>>>>>> the probe isn't the right approach as the datapath is tied 
>>>>>>>>>>>>> to
>>>>>>>>>>>>> start/stop.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe check the state of NAPI before disable?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>      if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state))
>>>>>>>>>>>>>       napi_disable(&ar->napi)
>>>>>>>>>>>>> or maintain napi_state like this
>>>>>>>>>>>>> https://patchwork.kernel.org/patch/10249365/
>>>>>>>>>>>> it is better to use above link's patch.
>>>>>>>>>>>> napi.state is controlled by napi API, it is better ath10k 
>>>>>>>>>>>> not know it.
>>>>>>>>>>> Sure, but IMHO just canceling the async rx work should solve 
>>>>>>>>>>> the issue.
>>>>>>>>>> Oh no, canceling the async rx work will not solve this issue, 
>>>>>>>>>> rx worker
>>>>>>>>>> ath10k_rx_indication_async_work call napi_schedule, after 
>>>>>>>>>> napi_complete,
>>>>>>>>>> the NAPI_STATE_SCHED will clear.
>>>>>>>>>> The issue of this patch is because 2 thread called to hif_stop 
>>>>>>>>>> and
>>>>>>>>>> NAPI_STATE_SCHED not clear.
>>>>>>>>> That fix is still valid and good to have.
>>>>>>>>> 
>>>>>>>>> ndev_stop being called twice is typical scenarios (stop vs 
>>>>>>>>> rmmod), so
>>>>>>>>>      just checking the netdev_flags for IFF_UP and returning 
>>>>>>>>> from hif_Stop
>>>>>>>>> should suffice, no?
>>>>>>>> 
>>>>>>>> My approach to fix this problem was to add a boolean in ath10k 
>>>>>>>> as to whether
>>>>>>>> it had napi enabled or not, and then check that before trying to 
>>>>>>>> enable/disable
>>>>>>>> it again.  Seems to work fine, and cleaner in my mind than 
>>>>>>>> checking internal
>>>>>>>> napi flags.
>>>>>>> A much simpler approach is just to check for IFF_UP and skip NAPI 
>>>>>>> (and others)
>>>>>>> in the hif_stop no? (provided proper RTNL locking is done if 
>>>>>>> hif_stop
>>>>>>> is being called
>>>>>>> internally as well).
>>>>>>> 
>>>>>> 
>>>>>> I'm not sure, but I think the driver should be internally 
>>>>>> consistent and not
>>>>>> spend a lot of time trying to guess about interactions with 
>>>>>> objects higher
>>>>>> in the stack.
>>>>> Fair enough, the network interface state is a basic thing 
>>>>> controlled
>>>>> by the driver,
>>>>> so, should be okay to use. Anyways, the in-driver approach has more 
>>>>> control.
>>>>>> 
>>>>>> Here is my original patch to fix this, it is not complex.
>>>>>> 
>>>>>> https://patchwork.kernel.org/patch/10249363/
>>>>> Sure, I have shared your patch above :).
>>>> Sent a bit early, any idea why this wasn't upstreamed earlier?
>>> 
>>> No, one comment from Michal indicated maybe there were more problems 
>>> lurking
>>> in this area, but he seemed to be OK with the patch over all.  After 
>>> that,
>>> it was just ignored.
>>> 
>> Now might be a good time to push for it :)
>> 
> 
> It is generally a waste of time in my experience.  Kalle is the
> maintainer and should
> be seeing any of this he cares to see.  If he likes the patch, he can
> apply it or
> something similar.  If you have a reproducible test case, see if the 
> patch fixes
> things, that might help it be accepted.
I have two commands; each one reproduces the hang:

echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash; sleep 0.05; ifconfig wlan0 down
echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash; rmmod ath10k_sdio

With my patch applied, the hang no longer occurs. My change is similar to
your patch (https://patchwork.kernel.org/patch/10249365/), so your patch
should fix the hang as well.
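
To illustrate the idea discussed above (a driver-owned boolean that records
whether NAPI is enabled, checked before every enable/disable), here is a
minimal userspace sketch. It is not the code from either patch: every name
in it (fake_napi, fake_ar, fake_core_napi_enable, fake_core_napi_disable,
demo_double_stop) is hypothetical, and the NAPI calls are replaced by
counters so the example can run outside the kernel. It only models the
guard logic, assuming a single thread; a real driver would also have to
serialize access to the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a NAPI context; it only counts invocations. */
struct fake_napi {
	int enable_calls;
	int disable_calls;
};

static void fake_napi_enable(struct fake_napi *n)  { n->enable_calls++; }
static void fake_napi_disable(struct fake_napi *n) { n->disable_calls++; }

/* Hypothetical driver context: napi_enabled is owned by the driver. */
struct fake_ar {
	struct fake_napi napi;
	bool napi_enabled;
};

/* Enable only if the driver believes NAPI is currently disabled. */
static void fake_core_napi_enable(struct fake_ar *ar)
{
	if (ar->napi_enabled)
		return;
	fake_napi_enable(&ar->napi);
	ar->napi_enabled = true;
}

/* Disable only if the driver believes NAPI is currently enabled. */
static void fake_core_napi_disable(struct fake_ar *ar)
{
	if (!ar->napi_enabled)
		return;
	fake_napi_disable(&ar->napi);
	ar->napi_enabled = false;
}

/*
 * Model two stop paths (for example firmware-crash recovery followed by
 * "ifconfig wlan0 down") both trying to disable NAPI. Returns how many
 * times the underlying disable was actually invoked.
 */
static int demo_double_stop(void)
{
	struct fake_ar ar = {0};

	fake_core_napi_enable(&ar);
	fake_core_napi_disable(&ar);	/* first stop path */
	fake_core_napi_disable(&ar);	/* second stop path: skipped */
	assert(!ar.napi_enabled);
	return ar.napi.disable_calls;
}
```

In this model the second stop path returns early, so the underlying
disable runs only once no matter how many stop paths are taken, and the
driver never has to look at napi.state itself.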
> 
> Thanks,
> Ben

