linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Berg <johannes@sipsolutions.net>
To: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211: Fix kernel hang on ax200 firmware crash.
Date: Thu, 30 Jul 2020 15:11:10 +0200	[thread overview]
Message-ID: <a91c9337da6458e5f1d61ff36ec07e66132d0c1e.camel@sipsolutions.net> (raw)
In-Reply-To: <fffa6cc5-99b6-f598-e20f-b30270ecd04c@candelatech.com>

On Thu, 2020-07-30 at 05:52 -0700, Ben Greear wrote:
> 
> > > +		if (++count > 1000) {
> > > +			/* WTF, bail out so that at least we don't hang the system. */
> > > +			sdata_err(sdata, "Could not move state after 1000 tries, ret: %d  state: %d\n",
> > > +				  ret, sta->sta_state);
> > > +			WARN_ON_ONCE(1);
> > > +			break;
> > > +		}
> > 
> > I guess that should be
> > 
> > if (WARN_ON_ONCE()) ...
> 
> If we spin 1000 times, it is worth a second warning.  Or do you mean
> the WARN_ON_ONCE(ret) should have if in front of it?

Ah. I missed the WARN_ON_ONCE(ret) entirely. I just meant that the
warning could/should be around the condition.

In fact though, even the message probably should:

	if (WARN_ONCE(++count > 1000, "...", ...))
		break;

That way the message would be captured inside the warning, which is
better for tooling that parses warnings.

> > 
> > etc.
> > 
> > >   		int err = drv_sta_state(sta->local, sta->sdata, sta,
> > >   					sta->sta_state, new_state);
> > > -		if (err)
> > > -			return err;
> > > +		if (err == -EIO) {
> > > +			/* Sdata-not-in-driver, we are out of sync, but probably
> > > +			 * best to carry on instead of bailing here, at least maybe
> > > +			 * we can clean this up.
> > > +			 */
> > 
> > It _could_ be the driver itself returning -EIO, so why not check the
> > sdata-in-driver flag?
> 
> Right, but if driver is complaining here, we need to bail out regardless of
> sdata-in-driver or not,

Yes. But I'm not sure we should WARN on that?

>  unless you think a driver could return EIO and then
> a small bit later start working for the same request?

Hah, no. If that's a possibility due to some stupid firmware reasons,
let the driver deal with it.

> > Really here that mostly applies to the commit log, which should probably
> > say something like
> > 
> > 	mac80211: deadlock due to driver misbehaviour
> > 
> > or so, and then go on to explain what it does in *mac80211*, and show
> > the iwlwifi parts only as an *example*.
> 
> Its not really driver mis-behaviour per se.  The root cause is that the
> firmware crashes too badly for the driver to recover (ok, so driver might
> could be better, but I've also seen cases where ath10k NIC falls off the PCI
> bus, so nothing the driver can do in that case I think).

Fair enough. We actually do have some code in there that tries to
unbind/rebind the driver from the device eventually, but that's
obviously a very last resort.

FWIW, we do have multiple NICs in a single machine, but then we run them
from VMs so each VM only has a single NIC. But I don't see why that
should be different from the device/firmware point of view. Perhaps your
PCIe configuration is different.

> Per my other patches, I've seen this sdata-in-driver crap in the past, so
> I think I probably hit a similar bug in both ax200 and ath10k, but since
> ax200 is so easy to crash, it is much more likely to hit this bug than any
> other driver I'm aware of.
> 
> I'll try to re-word the commit message though, I don't really care what it
> says so long as the code gets in.

:)

Thanks,
johannes


      reply	other threads:[~2020-07-30 13:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-10 20:40 [PATCH] mac80211: Fix kernel hang on ax200 firmware crash greearb
2020-06-15 13:36 ` Ben Greear
2020-07-30 12:33 ` Johannes Berg
2020-07-30 12:52   ` Ben Greear
2020-07-30 13:11     ` Johannes Berg [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a91c9337da6458e5f1d61ff36ec07e66132d0c1e.camel@sipsolutions.net \
    --to=johannes@sipsolutions.net \
    --cc=greearb@candelatech.com \
    --cc=linux-wireless@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).