linux-wireless.vger.kernel.org archive mirror
* ax200, fw crashes, and sdata-in-driver
@ 2020-07-13 23:57 Ben Greear
  2020-07-30 12:30 ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2020-07-13 23:57 UTC
  To: linux-wireless

Hello,

I larded up my 5.4 kernel with KASAN and lockdep, and ran some tests.  This is with my
patch that keeps from busy-spinning forever (see previous ignored patch).

After a few restarts and FW crashes, the ax200 could not recover firmware.  There
were lots of sdata-in-driver errors, and then KASAN hit a use-after-free issue
related to ax200 accessing sta object that was previously deleted.

Now, I think I know why:

In the ieee80211_handle_reconfig_failure(struct ieee80211_local *local)
method, it will clear the SDATA_IN_DRIVER flag, and according to comments,
this is run when firmware cannot be recovered.  But, just because FW is
dead does not mean that the driver itself has cleaned up its state.

So question is, should ax200 (and all drivers) be responsible for cleaning
up all state when FW cannot be recovered, or should instead mac80211 do cleanup
in this case by, among other things, not clearing that flag (and probably
not doing the ctx->driver_present = false; config as well)?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: ax200, fw crashes, and sdata-in-driver
  2020-07-13 23:57 ax200, fw crashes, and sdata-in-driver Ben Greear
@ 2020-07-30 12:30 ` Johannes Berg
  2020-07-30 12:58   ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2020-07-30 12:30 UTC
  To: Ben Greear, linux-wireless

Hi,

> I larded up my 5.4 kernel with KASAN and lockdep, and ran some tests.  This is with my
> patch that keeps from busy-spinning forever (see previous ignored patch).

Right, sorry, hadn't gotten to patches in a while.

> After a few restarts and FW crashes, the ax200 could not recover firmware.  There
> were lots of sdata-in-driver errors, and then KASAN hit a use-after-free issue
> related to ax200 accessing sta object that was previously deleted.
> 
> Now, I think I know why:
> 
> In the ieee80211_handle_reconfig_failure(struct ieee80211_local *local)
> method, it will clear the SDATA_IN_DRIVER flag, and according to comments,
> this is run when firmware cannot be recovered.  But, just because FW is
> dead does not mean that the driver itself has cleaned up its state.
> 
> So question is, should ax200 (and all drivers) be responsible for cleaning
> up all state when FW cannot be recovered, or should instead mac80211 do cleanup
> in this case by, among other things, not clearing that flag (and probably
> not doing the ctx->driver_present = false; config as well)?

I think it should be the driver. It's not clear _why_ the driver failed,
after all. If the firmware is still alive and just rejected something
then perhaps rolling things back will work. But if the firmware just
died again, that will just cause even more trouble.

johannes



* Re: ax200, fw crashes, and sdata-in-driver
  2020-07-30 12:30 ` Johannes Berg
@ 2020-07-30 12:58   ` Ben Greear
  2020-09-22 20:57     ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2020-07-30 12:58 UTC
  To: Johannes Berg, linux-wireless

On 7/30/20 5:30 AM, Johannes Berg wrote:
> Hi,
> 
>> I larded up my 5.4 kernel with KASAN and lockdep, and ran some tests.  This is with my
>> patch that keeps from busy-spinning forever (see previous ignored patch).
> 
> Right, sorry, hadn't gotten to patches in a while.
> 
>> After a few restarts and FW crashes, the ax200 could not recover firmware.  There
>> were lots of sdata-in-driver errors, and then KASAN hit a use-after-free issue
>> related to ax200 accessing sta object that was previously deleted.
>>
>> Now, I think I know why:
>>
>> In the ieee80211_handle_reconfig_failure(struct ieee80211_local *local)
>> method, it will clear the SDATA_IN_DRIVER flag, and according to comments,
>> this is run when firmware cannot be recovered.  But, just because FW is
>> dead does not mean that the driver itself has cleaned up its state.
>>
>> So question is, should ax200 (and all drivers) be responsible for cleaning
>> up all state when FW cannot be recovered, or should instead mac80211 do cleanup
>> in this case by, among other things, not clearing that flag (and probably
>> not doing the ctx->driver_present = false; config as well)?
> 
> I think it should be the driver. It's not clear _why_ the driver failed,
> after all. If the firmware is still alive and just rejected something
> then perhaps rolling things back will work. But if the firmware just
> died again, that will just cause even more trouble.

The current code clears state without actually notifying the driver, so it
is causing mac80211 to be out of sync with the driver.  I can't see how that
is a good idea.  This is the root cause of the issue that causes the busy-spin
related to sdata-in-driver / EIO as far as I can tell.

When we get to the state where the driver cannot properly recover from
a FW crash, which is what this code is about, then we just need to keep
from killing the rest of the system with busy-spins and memory corruption errors
and hope it can limp on.

Some day, we should notify user-space that a radio is unrecoverable
(short of rmmod/modprobe perhaps), and that it should take appropriate
action.  If this is an AP on a pole, for instance, the WDT should be triggered
and system rebooted most likely.

Thanks,
Ben

> 
> johannes
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: ax200, fw crashes, and sdata-in-driver
  2020-07-30 12:58   ` Ben Greear
@ 2020-09-22 20:57     ` Ben Greear
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Greear @ 2020-09-22 20:57 UTC
  To: Johannes Berg, linux-wireless

On 7/30/20 5:58 AM, Ben Greear wrote:
> On 7/30/20 5:30 AM, Johannes Berg wrote:
>> Hi,
>>
>>> I larded up my 5.4 kernel with KASAN and lockdep, and ran some tests.  This is with my
>>> patch that keeps from busy-spinning forever (see previous ignored patch).
>>
>> Right, sorry, hadn't gotten to patches in a while.
>>
>>> After a few restarts and FW crashes, the ax200 could not recover firmware.  There
>>> were lots of sdata-in-driver errors, and then KASAN hit a use-after-free issue
>>> related to ax200 accessing sta object that was previously deleted.
>>>
>>> Now, I think I know why:
>>>
>>> In the ieee80211_handle_reconfig_failure(struct ieee80211_local *local)
>>> method, it will clear the SDATA_IN_DRIVER flag, and according to comments,
>>> this is run when firmware cannot be recovered.  But, just because FW is
>>> dead does not mean that the driver itself has cleaned up its state.
>>>
>>> So question is, should ax200 (and all drivers) be responsible for cleaning
>>> up all state when FW cannot be recovered, or should instead mac80211 do cleanup
>>> in this case by, among other things, not clearing that flag (and probably
>>> not doing the ctx->driver_present = false; config as well)?
>>
>> I think it should be the driver. It's not clear _why_ the driver failed,
>> after all. If the firmware is still alive and just rejected something
>> then perhaps rolling things back will work. But if the firmware just
>> died again, that will just cause even more trouble.
> 
> The current code clears state without actually notifying the driver, so it
> is causing mac80211 to be out of sync with the driver.  I can't see how that
> is a good idea.  This is root cause of the issue that causes the busy-spin
> related to sdata-in-driver / EIO as far as I can tell.

Hello,

As far as I can tell, no work has gone into the driver(s) to resolve the use-after-free
issue.

So, maybe worth considering my earlier patch to clean everything up in mac80211
instead of depending on the drivers to get this correctly cleaned up in all cases?

I'll repost it, freshly rebased against the latest Linus tree...

Thanks,
Ben


