From: Johannes Berg <johannes@sipsolutions.net>
To: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211: Fix kernel hang on ax200 firmware crash.
Date: Thu, 30 Jul 2020 15:11:10 +0200
Message-ID: <a91c9337da6458e5f1d61ff36ec07e66132d0c1e.camel@sipsolutions.net>
In-Reply-To: <fffa6cc5-99b6-f598-e20f-b30270ecd04c@candelatech.com>

On Thu, 2020-07-30 at 05:52 -0700, Ben Greear wrote:
>
> > > +		if (++count > 1000) {
> > > +			/* WTF, bail out so that at least we don't hang the system. */
> > > +			sdata_err(sdata, "Could not move state after 1000 tries, ret: %d state: %d\n",
> > > +				  ret, sta->sta_state);
> > > +			WARN_ON_ONCE(1);
> > > +			break;
> > > +		}
> >
> > I guess that should be
> >
> > if (WARN_ON_ONCE()) ...
>
> If we spin 1000 times, it is worth a second warning. Or do you mean
> the WARN_ON_ONCE(ret) should have if in front of it?

Ah. I missed the WARN_ON_ONCE(ret) entirely. I just meant that the
warning could/should be around the condition.

In fact though, even the message should probably be part of it:

	if (WARN_ONCE(++count > 1000, "...", ...))
		break;

That way the message would be captured inside the warning, which is
better for tooling that parses warnings.
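
To illustrate the pattern, here is a minimal userspace sketch of a bounded retry loop whose bail-out condition and message live in a single warn-once call. It is not the patch itself: warn_once() is a simplified stand-in for the kernel's WARN_ONCE() (no backtrace, and the "warned" flag is passed explicitly instead of being per call site), and move_state_bounded(), step_fn, always_fail() and fail_twice() are hypothetical names made up for this example.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified stand-in for WARN_ONCE(): returns the condition and prints
 * the message only the first time the condition is true for this flag.
 */
static bool warn_once(bool cond, bool *warned, const char *fmt, ...)
{
	va_list ap;

	if (cond && !*warned) {
		*warned = true;
		fputs("WARNING: ", stderr);
		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
	}
	return cond;
}

/* Hypothetical transition step: returns 0 on success, nonzero on failure. */
typedef int (*step_fn)(void *ctx);

/*
 * Retry step() until it succeeds. After more than max_tries failures,
 * warn (once) and bail out instead of spinning forever. Returns the
 * result of the last step() call.
 */
int move_state_bounded(step_fn step, void *ctx, int max_tries)
{
	static bool warned;
	int count = 0;
	int ret;

	while ((ret = step(ctx)) != 0) {
		if (warn_once(++count > max_tries, &warned,
			      "could not move state after %d tries, ret: %d\n",
			      max_tries, ret))
			break;
	}
	return ret;
}

/* Example step that never succeeds; ctx counts the calls. */
int always_fail(void *ctx)
{
	++*(int *)ctx;
	return -5;
}

/* Example step that fails twice and then succeeds; ctx counts the calls. */
int fail_twice(void *ctx)
{
	return ++*(int *)ctx <= 2 ? -5 : 0;
}
```

Because the condition and the format string are handed to the same call, the message ends up inside the warning rather than in a separate log line next to it.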
> >
> > etc.
> >
> > >  		int err = drv_sta_state(sta->local, sta->sdata, sta,
> > >  					sta->sta_state, new_state);
> > > -		if (err)
> > > -			return err;
> > > +		if (err == -EIO) {
> > > +			/* Sdata-not-in-driver, we are out of sync, but probably
> > > +			 * best to carry on instead of bailing here, at least maybe
> > > +			 * we can clean this up.
> > > +			 */
> >
> > It _could_ be the driver itself returning -EIO, so why not check the
> > sdata-in-driver flag?
>
> Right, but if driver is complaining here, we need to bail out regardless of
> sdata-in-driver or not,

Yes. But I'm not sure we should WARN on that?

> unless you think a driver could return EIO and then
> a small bit later start working for the same request?

Hah, no. If that's a possibility due to some stupid firmware reasons,
let the driver deal with it.
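
The distinction discussed above can be sketched in plain C. This is a hedged userspace model, not mac80211 code: struct mock_sdata, its in_driver field (standing in for the sdata-in-driver flag), mock_drv_sta_state() and classify_sta_state_error() are all hypothetical names for illustration. The point is that -EIO alone is ambiguous, since the driver may return it too, so the decision is keyed on the flag instead of the error value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the interface data. */
struct mock_sdata {
	bool in_driver;	/* models the sdata-in-driver flag */
};

/* Hypothetical driver callback: returns 0 or a negative errno. */
typedef int (*drv_op_fn)(struct mock_sdata *sdata);

enum sta_state_action {
	STA_STATE_OK,		/* driver accepted the transition */
	STA_STATE_CARRY_ON,	/* out of sync with driver: keep cleaning up */
	STA_STATE_BAIL,		/* the driver itself failed: return the error */
};

/*
 * Models a wrapper that refuses to call into the driver when the
 * interface is not known to it, reporting -EIO instead.
 */
int mock_drv_sta_state(struct mock_sdata *sdata, drv_op_fn op)
{
	if (!sdata->in_driver)
		return -EIO;
	return op(sdata);
}

/*
 * Decide what to do with an error by checking the flag rather than
 * comparing against -EIO, which a driver may also return on its own.
 */
enum sta_state_action classify_sta_state_error(const struct mock_sdata *sdata,
					       int err)
{
	if (!err)
		return STA_STATE_OK;
	if (!sdata->in_driver)
		return STA_STATE_CARRY_ON;
	return STA_STATE_BAIL;
}

/* Example driver callbacks for exercising the sketch. */
int op_ok(struct mock_sdata *sdata)
{
	(void)sdata;
	return 0;
}

int op_eio(struct mock_sdata *sdata)
{
	(void)sdata;
	return -EIO;
}
```

With this split, a driver that returns -EIO while the interface is in the driver still causes a bail-out, and only the genuinely out-of-sync case is allowed to carry on.
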
> > Really here that mostly applies to the commit log, which should probably
> > say something like
> >
> > mac80211: deadlock due to driver misbehaviour
> >
> > or so, and then go on to explain what it does in *mac80211*, and show
> > the iwlwifi parts only as an *example*.
>
> Its not really driver mis-behaviour per se. The root cause is that the
> firmware crashes too badly for the driver to recover (ok, so driver might
> could be better, but I've also seen cases where ath10k NIC falls off the PCI
> bus, so nothing the driver can do in that case I think).

Fair enough. We actually do have some code in there that tries to
unbind/rebind the driver from the device eventually, but that's
obviously a very last resort.

FWIW, we do have multiple NICs in a single machine, but then we run them
from VMs so each VM only has a single NIC. But I don't see why that
should be different from the device/firmware point of view. Perhaps your
PCIe configuration is different.

> Per my other patches, I've seen this sdata-in-driver crap in the past, so
> I think I probably hit a similar bug in both ax200 and ath10k, but since
> ax200 is so easy to crash, it is much more likely to hit this bug than any
> other driver I'm aware of.
>
> I'll try to re-word the commit message though, I don't really care what it
> says so long as the code gets in.

:)

Thanks,
johannes