linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: Steve deRosier <derosier@gmail.com>
Cc: Ben Greear <greearb@candelatech.com>,
	Johannes Berg <johannes@sipsolutions.net>,
	jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
	rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
	cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
	peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
	pmladek@suse.com, Takashi Iwai <tiwai@suse.de>,
	schlad@suse.de, andriy.shevchenko@linux.intel.com,
	keescook@chromium.org, daniel.vetter@ffwll.ch, will@kernel.org,
	mchehab+samsung@kernel.org, Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	ath10k@lists.infradead.org
Subject: Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()
Date: Mon, 18 May 2020 19:09:30 +0000	[thread overview]
Message-ID: <20200518190930.GO11244@42.do-not-panic.com> (raw)
In-Reply-To: <CALLGbR+ht2V3m5f-aUbdwEMOvbsX8ebmzdWgX4jyWTbpHrXZ0Q@mail.gmail.com>

On Mon, May 18, 2020 at 11:06:27AM -0700, Steve deRosier wrote:
> On Mon, May 18, 2020 at 10:19 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > From a support perspective it is a *crystal* clear sign that the device
> > and / or device driver may be in a very bad state, in a generic way.
> >
> 
> Unfortunately a "taint" is interpreted by many users as: "your kernel
> is really F#*D up, you better do something about it right now."
> Assuming they're paying attention at all in the first place of course.

Taint historically has been used and still is today to help rule out
whether or not you get support, or how you get support.

For instance, a staging driver is not supported by some upstream
developers, but it will be by those who help staging and Greg. TAINT_CRAP
cannot be even more clear.

So, no, it is not just about "hey your kernel is messed up", there are
clear support boundaries being drawn.

> The fact is, WiFi chip firmware crashes, and in most cases the driver
> is able to recover seamlessly. At least that is the case with most QCA
> chipsets I work with. 

That has not been my exerience with the same driver, and so how do we
know? And this patch set is not about ath10k alone, I want you to
think about *all* device drivers with firmware. In my journey to scrape
the kernel for these cases I was very surprised by the amount of code
which clearly annotates these situations.

> And the users or our ability to do anything
> about it is minimal to none as we don't have access to firmware
> source.

This is not true, we have open firmware in WiFi. Some vendors choose
to not open source their firmware, that is their decision.

These days though, I think we all admit, that firmware crashes can use
a better generic infrastructure for ensuring that clearly affecting-user
experience issues. This patch is about that *when and if these happen*,
we annotate it in the kernel for support pursposes.

> It's too bad and I wish it weren't the case, but we have
> embraced reality and most drivers have a recovery mechanism built in
> for this case.

The mentality about firmware crashes being the end of the world is
certainly what will lead developers to often hide these. Where this
is openly clear, and not obfucscated I'd argue that firmware issues
get fixed likely more common.

So what you describe is not bad, its just accepting evolution.

> In short, it's a non-event. I fear that elevating this
> to a kernel taint will significantly increase "support" requests that
> really are nothing but noise;

That will depend on where you put this on the driver, and that is
why it is important to place it in the right place, if any.

> similar to how the firmware load failure
> messages (fail to load fw-2.bin, fail to load fw-1.bin, yay loaded
> fw-0.bin) cause a lot of noise.

That can be fixed, the developers behind this series gave up on it.
It has to do with a range version of supported firmwares, and all
being optional, but at least one is required.

> Not specifically opposed, but I wonder what it really accomplishes in
> a world where the firmware crashing is pretty much a normal
> occurrence.

Recovery without affecting user experience would be great, the taint is
*not* for those cases. The taint definition has:

+ 18) ``Q`` used by device drivers to annotate that the device driver's firmware
+     has crashed and the device's operation has been severely affected. The    
+     device may be left in a crippled state, requiring full driver removal /   
+     addition, system reboot, or it is unclear how long recovery will take.

Let me know if this is not clear.

> If it goes in, I think that the drivers shouldn't trigger the taint if
> they're able to recover normally. Only trigger on failure to come back
> up.  In other words, the ideal place in the ath10k driver isn't where
> you have proposed as at that point operation is normal and we're doing
> a routine recovery.

Sure, happy to remove it if indeed it is the case that the firwmare
crash is not happening to cripple the device, but I can vouch for the
fact that the exact place where I placed it left my device driver in a
state where I had to remove / add again.

  Luis

  reply	other threads:[~2020-05-18 19:09 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15 21:28 [PATCH v2 00/15] net: taint when the device driver firmware crashes Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 01/15] taint: add module firmware crash taint support Luis Chamberlain
2020-05-16  4:03   ` Rafael Aquini
2020-05-19 16:42   ` Jessica Yu
2020-05-22  5:17     ` Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 02/15] ethernet/839: use new module_firmware_crashed() Luis Chamberlain
2020-05-16  4:04   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 03/15] bnx2x: " Luis Chamberlain
2020-05-16  4:05   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 04/15] bnxt: " Luis Chamberlain
2020-05-16  4:06   ` Rafael Aquini
2020-05-16  5:14   ` Vasundhara Volam
2020-05-15 21:28 ` [PATCH v2 05/15] bna: " Luis Chamberlain
2020-05-16  4:07   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 06/15] liquidio: " Luis Chamberlain
2020-05-16  4:07   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 07/15] cxgb4: " Luis Chamberlain
2020-05-16  4:09   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 08/15] ehea: " Luis Chamberlain
2020-05-16  4:09   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 09/15] qed: " Luis Chamberlain
2020-05-16  4:10   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 10/15] soc: qcom: ipa: " Luis Chamberlain
2020-05-16  4:10   ` Rafael Aquini
2020-05-19 22:34   ` Alex Elder
2020-05-22  5:28     ` Luis Chamberlain
2020-05-22 20:52       ` Alex Elder
2020-05-22 21:53         ` Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 11/15] wimax/i2400m: " Luis Chamberlain
2020-05-16  4:11   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 12/15] ath10k: " Luis Chamberlain
2020-05-16  4:11   ` Rafael Aquini
2020-05-16 13:24   ` Johannes Berg
2020-05-16 13:50     ` Johannes Berg
2020-05-18 16:56       ` Luis Chamberlain
2020-05-19  1:23       ` Brian Norris
2020-05-19 14:02         ` Luis Chamberlain
2020-05-20  0:47           ` Brian Norris
2020-05-20  5:37             ` Emmanuel Grumbach
2020-05-20  8:32               ` Andy Shevchenko
2020-05-21 19:01               ` Brian Norris
2020-05-22  5:12                 ` Emmanuel Grumbach
2020-05-22  5:23                   ` Luis Chamberlain
2020-05-18 16:51     ` Luis Chamberlain
2020-05-18 16:58       ` Ben Greear
2020-05-18 17:09         ` Luis Chamberlain
2020-05-18 17:15           ` Ben Greear
2020-05-18 17:18             ` Luis Chamberlain
2020-05-18 18:06               ` Steve deRosier
2020-05-18 19:09                 ` Luis Chamberlain [this message]
2020-05-18 19:25                   ` Johannes Berg
2020-05-18 19:59                     ` Luis Chamberlain
2020-05-18 20:07                       ` Johannes Berg
2020-05-18 21:18                         ` Luis Chamberlain
2020-05-18 20:28                     ` Jakub Kicinski
2020-05-18 20:29                       ` Johannes Berg
2020-05-18 20:35                         ` Jakub Kicinski
2020-05-18 20:41                           ` Johannes Berg
2020-05-18 20:46                             ` Jakub Kicinski
2020-05-18 21:22                               ` Luis Chamberlain
2020-05-18 22:16                                 ` Jakub Kicinski
2020-05-19  1:05                                   ` Luis Chamberlain
2020-05-19 21:15                                     ` [RFC 1/2] devlink: add simple fw crash helpers Jakub Kicinski
2020-05-22  5:20                                       ` Luis Chamberlain
2020-05-22 17:17                                         ` Jakub Kicinski
2020-05-22 20:46                                           ` Johannes Berg
2020-05-22 21:51                                             ` Luis Chamberlain
2020-05-22 23:23                                               ` Steve deRosier
2020-05-22 23:44                                                 ` Luis Chamberlain
2020-05-25  9:07                                                 ` Andy Shevchenko
2020-05-25 17:08                                                   ` Ben Greear
2020-05-25 20:57                                             ` Jakub Kicinski
2020-07-30 13:56                                               ` Johannes Berg
2020-05-22 21:49                                           ` Luis Chamberlain
2020-05-19 21:15                                     ` [RFC 2/2] i2400m: use devlink health reporter Jakub Kicinski
2020-05-15 21:28 ` [PATCH v2 13/15] ath6kl: use new module_firmware_crashed() Luis Chamberlain
2020-05-16  4:12   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 14/15] brcm80211: " Luis Chamberlain
2020-05-16  4:13   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 15/15] mwl8k: " Luis Chamberlain
2020-05-16  4:13   ` Rafael Aquini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200518190930.GO11244@42.do-not-panic.com \
    --to=mcgrof@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=aquini@redhat.com \
    --cc=arnd@arndb.de \
    --cc=ath10k@lists.infradead.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=daniel.vetter@ffwll.ch \
    --cc=davem@davemloft.net \
    --cc=derosier@gmail.com \
    --cc=dyoung@redhat.com \
    --cc=gpiccoli@canonical.com \
    --cc=greearb@candelatech.com \
    --cc=jeyu@kernel.org \
    --cc=johannes@sipsolutions.net \
    --cc=keescook@chromium.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=schlad@suse.de \
    --cc=tglx@linutronix.de \
    --cc=tiwai@suse.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).