All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Berg <johannes@sipsolutions.net>
To: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] mac80211: Fix kernel hang on ax200 firmware crash.
Date: Thu, 30 Jul 2020 15:11:10 +0200	[thread overview]
Message-ID: <a91c9337da6458e5f1d61ff36ec07e66132d0c1e.camel@sipsolutions.net> (raw)
In-Reply-To: <fffa6cc5-99b6-f598-e20f-b30270ecd04c@candelatech.com>

On Thu, 2020-07-30 at 05:52 -0700, Ben Greear wrote:
> 
> > > +		if (++count > 1000) {
> > > +			/* WTF, bail out so that at least we don't hang the system. */
> > > +			sdata_err(sdata, "Could not move state after 1000 tries, ret: %d  state: %d\n",
> > > +				  ret, sta->sta_state);
> > > +			WARN_ON_ONCE(1);
> > > +			break;
> > > +		}
> > 
> > I guess that should be
> > 
> > if (WARN_ON_ONCE()) ...
> 
> If we spin 1000 times, it is worth a second warning.  Or do you mean
> the WARN_ON_ONCE(ret) should have if in front of it?

Ah. I missed the WARN_ON_ONCE(ret) entirely. I just meant that the
warning could/should be around the condition.

In fact though, even the message probably should:

	if (WARN_ONCE(++count > 1000, "...", ...))
		break;

That way the message would be captured inside the warning, which is
better for tooling that parses warnings.

> > 
> > etc.
> > 
> > >   		int err = drv_sta_state(sta->local, sta->sdata, sta,
> > >   					sta->sta_state, new_state);
> > > -		if (err)
> > > -			return err;
> > > +		if (err == -EIO) {
> > > +			/* Sdata-not-in-driver, we are out of sync, but probably
> > > +			 * best to carry on instead of bailing here, at least maybe
> > > +			 * we can clean this up.
> > > +			 */
> > 
> > It _could_ be the driver itself returning -EIO, so why not check the
> > sdata-in-driver flag?
> 
> Right, but if driver is complaining here, we need to bail out regardless of
> sdata-in-driver or not,

Yes. But I'm not sure we should WARN on that?

>  unless you think a driver could return EIO and then
> a small bit later start working for the same request?

Hah, no. If that's a possibility due to some stupid firmware reasons,
let the driver deal with it.

> > Really here that mostly applies to the commit log, which should probably
> > say something like
> > 
> > 	mac80211: deadlock due to driver misbehaviour
> > 
> > or so, and then go on to explain what it does in *mac80211*, and show
> > the iwlwifi parts only as an *example*.
> 
> Its not really driver mis-behaviour per se.  The root cause is that the
> firmware crashes too badly for the driver to recover (ok, so driver might
> could be better, but I've also seen cases where ath10k NIC falls off the PCI
> bus, so nothing the driver can do in that case I think).

Fair enough. We actually do have some code in there that tries to
unbind/rebind the driver from the device eventually, but that's
obviously a very last resort.

FWIW, we do have multiple NICs in a single machine, but then we run them
from VMs so each VM only has a single NIC. But I don't see why that
should be different from the device/firmware point of view. Perhaps your
PCIe configuration is different.

> Per my other patches, I've seen this sdata-in-driver crap in the past, so
> I think I probably hit a similar bug in both ax200 and ath10k, but since
> ax200 is so easy to crash, it is much more likely to hit this bug than any
> other driver I'm aware of.
> 
> I'll try to re-word the commit message though, I don't really care what it
> says so long as the code gets in.

:)

Thanks,
johannes


      reply	other threads:[~2020-07-30 13:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-10 20:40 [PATCH] mac80211: Fix kernel hang on ax200 firmware crash greearb
2020-06-15 13:36 ` Ben Greear
2020-07-30 12:33 ` Johannes Berg
2020-07-30 12:52   ` Ben Greear
2020-07-30 13:11     ` Johannes Berg [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a91c9337da6458e5f1d61ff36ec07e66132d0c1e.camel@sipsolutions.net \
    --to=johannes@sipsolutions.net \
    --cc=greearb@candelatech.com \
    --cc=linux-wireless@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.