From: Luis Chamberlain <mcgrof@kernel.org>
To: Steve deRosier <derosier@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>,
Jakub Kicinski <kuba@kernel.org>,
Ben Greear <greearb@candelatech.com>,
jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
pmladek@suse.com, Takashi Iwai <tiwai@suse.de>,
schlad@suse.de, andriy.shevchenko@linux.intel.com,
Kees Cook <keescook@chromium.org>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
will@kernel.org, mchehab+samsung@kernel.org,
Kalle Valo <kvalo@codeaurora.org>,
"David S. Miller" <davem@davemloft.net>,
Network Development <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-wireless <linux-wireless@vger.kernel.org>,
ath10k@lists.infradead.org, jiri@resnulli.us,
briannorris@chromium.org
Subject: Re: [RFC 1/2] devlink: add simple fw crash helpers
Date: Fri, 22 May 2020 23:44:09 +0000 [thread overview]
Message-ID: <20200522234409.GH11244@42.do-not-panic.com> (raw)
In-Reply-To: <CALLGbR+QPcECtJbYmzztV_Qysc5qtwujT_qc785zvhZMCH50fg@mail.gmail.com>
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
> Specifically, I don't think we should set a taint flag when a driver
> easily handles a routine firmware crash and is confident that things
> have come up just fine again. In other words, triggering the taint in
> every driver module where it spits out a log comment that it had a
> firmware crash and had to recover seems too much. Sure, firmware
> shouldn't crash, sure it should be open source so we can fix it,
> whatever... those sort of wishful comments simply ignore reality and
> our ability to affect effective change. A lot of WiFi firmware crashes
> and for well-known cases the drivers handle them well. And in some
> cases, not so well and that should be a place the driver should detect
> and thus raise a red flag. If a WiFi firmware crash can bring down
> the kernel, there's either a major driver bug or some very funky
> hardware crap going on. That sort of thing we should be able to
> detect, mark with a taint (or something), and fix if within our sphere
> of influence. I guess what it comes down to me is how aggressive we
> are about setting the flag.
Exactly the crux of the issue.
I hope that by now we should all be in agreement that at least a
firmware crash requiring a reboot is something we should record and
inform the user of. A taint seems like a reasonable standard practice
for these sorts of things.
> I would like there to be a single solution, or a minimized set
> depending on what makes sense for the requirements. I haven't had time
> to look into the alternatives mentioned here so I don't have an
> informed opinion about the solution. I do think Luis is trying to
> solve a real problem though. Can we look at this from the point of
> view of what are the requirements? What is it we're trying to solve?
>
> I _think_ that the goal of Luis's original proposal is to report up to
> the user, at some future point when the user is interested (because
> something super drastic just occured, but long after the fw crash),
> that there was a firmware crash without the user having to grep
> through all logs on the machine. And then if the user sees that flag
> and suspects it, then they can bother to find it in the logs or do
> more drastic debugging steps like finding the fw crash in the log and
> pulling firmware crash dumps, etc.
Yes, that's exactly it. Not all users are clueful to inspect logs. I now
have a generic uevent mechanism drafted which sends a uevent for *any*
taint. So that is, it does not even depend on this series. But it
accomplishes the goal of informing the user of taints.
> I think the various alternate solutions are great but perhaps solving
> a superset of features (like adding in user-space notifications etc)?
> Perhaps different people on these related threads are trying to solve
> different problems?
The uevent mechanism I implemented (but not yet posted for review) at
least sends out a smoke signal. I think that if each subsystem wants
to expand on this with dumping facilities that is great too!
Luis
next prev parent reply other threads:[~2020-05-22 23:44 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20200515212846.1347-1-mcgrof@kernel.org>
2020-05-15 21:28 ` [PATCH v2 12/15] ath10k: use new module_firmware_crashed() Luis Chamberlain
2020-05-16 4:11 ` Rafael Aquini
2020-05-16 13:24 ` Johannes Berg
2020-05-16 13:50 ` Johannes Berg
2020-05-18 16:56 ` Luis Chamberlain
2020-05-19 1:23 ` Brian Norris
2020-05-19 14:02 ` Luis Chamberlain
2020-05-20 0:47 ` Brian Norris
2020-05-20 5:37 ` Emmanuel Grumbach
2020-05-20 8:32 ` Andy Shevchenko
2020-05-21 19:01 ` Brian Norris
2020-05-22 5:12 ` Emmanuel Grumbach
2020-05-22 5:23 ` Luis Chamberlain
2020-05-18 16:51 ` Luis Chamberlain
2020-05-18 16:58 ` Ben Greear
2020-05-18 17:09 ` Luis Chamberlain
2020-05-18 17:15 ` Ben Greear
2020-05-18 17:18 ` Luis Chamberlain
2020-05-18 18:06 ` Steve deRosier
2020-05-18 19:09 ` Luis Chamberlain
2020-05-18 19:25 ` Johannes Berg
2020-05-18 19:59 ` Luis Chamberlain
2020-05-18 20:07 ` Johannes Berg
2020-05-18 21:18 ` Luis Chamberlain
2020-05-18 20:28 ` Jakub Kicinski
2020-05-18 20:29 ` Johannes Berg
2020-05-18 20:35 ` Jakub Kicinski
2020-05-18 20:41 ` Johannes Berg
2020-05-18 20:46 ` Jakub Kicinski
2020-05-18 21:22 ` Luis Chamberlain
2020-05-18 22:16 ` Jakub Kicinski
2020-05-19 1:05 ` Luis Chamberlain
2020-05-19 21:15 ` [RFC 1/2] devlink: add simple fw crash helpers Jakub Kicinski
2020-05-22 5:20 ` Luis Chamberlain
2020-05-22 17:17 ` Jakub Kicinski
2020-05-22 20:46 ` Johannes Berg
2020-05-22 21:51 ` Luis Chamberlain
2020-05-22 23:23 ` Steve deRosier
2020-05-22 23:44 ` Luis Chamberlain [this message]
2020-05-25 9:07 ` Andy Shevchenko
2020-05-25 17:08 ` Ben Greear
2020-05-25 20:57 ` Jakub Kicinski
2020-07-30 13:56 ` Johannes Berg
2020-05-22 21:49 ` Luis Chamberlain
2020-05-19 21:15 ` [RFC 2/2] i2400m: use devlink health reporter Jakub Kicinski
2020-05-15 21:28 ` [PATCH v2 13/15] ath6kl: use new module_firmware_crashed() Luis Chamberlain
2020-05-16 4:12 ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 14/15] brcm80211: " Luis Chamberlain
2020-05-16 4:13 ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 15/15] mwl8k: " Luis Chamberlain
2020-05-16 4:13 ` Rafael Aquini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200522234409.GH11244@42.do-not-panic.com \
--to=mcgrof@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andriy.shevchenko@linux.intel.com \
--cc=aquini@redhat.com \
--cc=arnd@arndb.de \
--cc=ath10k@lists.infradead.org \
--cc=bhe@redhat.com \
--cc=briannorris@chromium.org \
--cc=cai@lca.pw \
--cc=daniel.vetter@ffwll.ch \
--cc=davem@davemloft.net \
--cc=derosier@gmail.com \
--cc=dyoung@redhat.com \
--cc=gpiccoli@canonical.com \
--cc=greearb@candelatech.com \
--cc=jeyu@kernel.org \
--cc=jiri@resnulli.us \
--cc=johannes@sipsolutions.net \
--cc=keescook@chromium.org \
--cc=kuba@kernel.org \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=mchehab+samsung@kernel.org \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=schlad@suse.de \
--cc=tglx@linutronix.de \
--cc=tiwai@suse.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).