All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steve deRosier <derosier@gmail.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Ben Greear <greearb@candelatech.com>,
	jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
	rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
	cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
	peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
	pmladek@suse.com, Takashi Iwai <tiwai@suse.de>,
	schlad@suse.de, andriy.shevchenko@linux.intel.com,
	Kees Cook <keescook@chromium.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	will@kernel.org, mchehab+samsung@kernel.org,
	Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	ath10k@lists.infradead.org, jiri@resnulli.us,
	briannorris@chromium.org
Subject: Re: [RFC 1/2] devlink: add simple fw crash helpers
Date: Fri, 22 May 2020 16:23:55 -0700	[thread overview]
Message-ID: <CALLGbR+QPcECtJbYmzztV_Qysc5qtwujT_qc785zvhZMCH50fg@mail.gmail.com> (raw)
In-Reply-To: <20200522215145.GC11244@42.do-not-panic.com>

On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote:
> > FWIW, I still completely disagree on that taint. You (Luis) obviously
> > have been running into a bug in that driver, I doubt the firmware
> > actually managed to wedge the hardware.
>
> This hasn't happened just once, its happed many times sporadically now,
> once a week or two weeks I'd say. And the system isn't being moved
> around.
>
> > But even if it did, that's still not really a kernel taint. The kernel
> > itself isn't in any way affected by this.
>
> Of course it is, a full reboot is required.
>
> > Yes, the system is in a weird state now. But that's *not* equivalent to
> > "kernel tainted".
>
> Requiring a full reboot is a dire situation to be in, and loosing
> connectivity to the point this is not recoverable likewise.
>
> You guys are making out a taint to be the end of the world. We have a
> taint even for a kernel warning, and as others have mentioned mac80211
> already produces these.
>

I had to go RTFM re: kernel taints because it has been a very long
time since I looked at them. It had always seemed to me that most were
caused by "kernel-unfriendly" user actions.  The most famous of course
is loading proprietary modules, out-of-tree modules, forced module
loads, etc...  Honestly, I had forgotten the large variety of uses of
the taint flags. For anyone who hasn't looked at taints recently, I
recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

In light of this I don't object to setting a taint on this anymore.
I'm a little uneasy, but I've softened on it now, and now I feel it
depends on implementation.

Specifically, I don't think we should set a taint flag when a driver
easily handles a routine firmware crash and is confident that things
have come up just fine again. In other words, triggering the taint in
every driver module where it spits out a log comment that it had a
firmware crash and had to recover seems too much. Sure, firmware
shouldn't crash, sure it should be open source so we can fix it,
whatever... those sort of wishful comments simply ignore reality and
our ability to affect effective change. A lot of WiFi firmware crashes
and for well-known cases the drivers handle them well. And in some
cases, not so well and that should be a place the driver should detect
and thus raise a red flag.  If a WiFi firmware crash can bring down
the kernel, there's either a major driver bug or some very funky
hardware crap going on. That sort of thing we should be able to
detect, mark with a taint (or something), and fix if within our sphere
of influence. I guess what it comes down to me is how aggressive we
are about setting the flag.

I would like there to be a single solution, or a minimized set
depending on what makes sense for the requirements. I haven't had time
to look into the alternatives mentioned here so I don't have an
informed opinion about the solution. I do think Luis is trying to
solve a real problem though. Can we look at this from the point of
view of what are the requirements?  What is it we're trying to solve?

I _think_ that the goal of Luis's original proposal is to report up to
the user, at some future point when the user is interested (because
something super drastic just occured, but long after the fw crash),
that there was a firmware crash without the user having to grep
through all logs on the machine. And then if the user sees that flag
and suspects it, then they can bother to find it in the logs or do
more drastic debugging steps like finding the fw crash in the log and
pulling firmware crash dumps, etc.

I think the various alternate solutions are great but perhaps solving
a superset of features (like adding in user-space notifications etc)?
Perhaps different people on these related threads are trying to solve
different problems?


- Steve

WARNING: multiple messages have this Message-ID (diff)
From: Steve deRosier <derosier@gmail.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-wireless <linux-wireless@vger.kernel.org>,
	aquini@redhat.com, peterz@infradead.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	mchehab+samsung@kernel.org, will@kernel.org,
	Ben Greear <greearb@candelatech.com>,
	bhe@redhat.com, briannorris@chromium.org,
	ath10k@lists.infradead.org, Takashi Iwai <tiwai@suse.de>,
	mingo@redhat.com, Jakub Kicinski <kuba@kernel.org>,
	dyoung@redhat.com, pmladek@suse.com, jiri@resnulli.us,
	Kees Cook <keescook@chromium.org>,
	arnd@arndb.de, gpiccoli@canonical.com, rostedt@goodmis.org,
	cai@lca.pw, tglx@linutronix.de,
	andriy.shevchenko@linux.intel.com,
	Johannes Berg <johannes@sipsolutions.net>,
	Kalle Valo <kvalo@codeaurora.org>,
	Network Development <netdev@vger.kernel.org>,
	schlad@suse.de, LKML <linux-kernel@vger.kernel.org>,
	jeyu@kernel.org, akpm@linux-foundation.org,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [RFC 1/2] devlink: add simple fw crash helpers
Date: Fri, 22 May 2020 16:23:55 -0700	[thread overview]
Message-ID: <CALLGbR+QPcECtJbYmzztV_Qysc5qtwujT_qc785zvhZMCH50fg@mail.gmail.com> (raw)
In-Reply-To: <20200522215145.GC11244@42.do-not-panic.com>

On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote:
> > FWIW, I still completely disagree on that taint. You (Luis) obviously
> > have been running into a bug in that driver, I doubt the firmware
> > actually managed to wedge the hardware.
>
> This hasn't happened just once, its happed many times sporadically now,
> once a week or two weeks I'd say. And the system isn't being moved
> around.
>
> > But even if it did, that's still not really a kernel taint. The kernel
> > itself isn't in any way affected by this.
>
> Of course it is, a full reboot is required.
>
> > Yes, the system is in a weird state now. But that's *not* equivalent to
> > "kernel tainted".
>
> Requiring a full reboot is a dire situation to be in, and loosing
> connectivity to the point this is not recoverable likewise.
>
> You guys are making out a taint to be the end of the world. We have a
> taint even for a kernel warning, and as others have mentioned mac80211
> already produces these.
>

I had to go RTFM re: kernel taints because it has been a very long
time since I looked at them. It had always seemed to me that most were
caused by "kernel-unfriendly" user actions.  The most famous of course
is loading proprietary modules, out-of-tree modules, forced module
loads, etc...  Honestly, I had forgotten the large variety of uses of
the taint flags. For anyone who hasn't looked at taints recently, I
recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

In light of this I don't object to setting a taint on this anymore.
I'm a little uneasy, but I've softened on it now, and now I feel it
depends on implementation.

Specifically, I don't think we should set a taint flag when a driver
easily handles a routine firmware crash and is confident that things
have come up just fine again. In other words, triggering the taint in
every driver module where it spits out a log comment that it had a
firmware crash and had to recover seems too much. Sure, firmware
shouldn't crash, sure it should be open source so we can fix it,
whatever... those sort of wishful comments simply ignore reality and
our ability to affect effective change. A lot of WiFi firmware crashes
and for well-known cases the drivers handle them well. And in some
cases, not so well and that should be a place the driver should detect
and thus raise a red flag.  If a WiFi firmware crash can bring down
the kernel, there's either a major driver bug or some very funky
hardware crap going on. That sort of thing we should be able to
detect, mark with a taint (or something), and fix if within our sphere
of influence. I guess what it comes down to me is how aggressive we
are about setting the flag.

I would like there to be a single solution, or a minimized set
depending on what makes sense for the requirements. I haven't had time
to look into the alternatives mentioned here so I don't have an
informed opinion about the solution. I do think Luis is trying to
solve a real problem though. Can we look at this from the point of
view of what are the requirements?  What is it we're trying to solve?

I _think_ that the goal of Luis's original proposal is to report up to
the user, at some future point when the user is interested (because
something super drastic just occured, but long after the fw crash),
that there was a firmware crash without the user having to grep
through all logs on the machine. And then if the user sees that flag
and suspects it, then they can bother to find it in the logs or do
more drastic debugging steps like finding the fw crash in the log and
pulling firmware crash dumps, etc.

I think the various alternate solutions are great but perhaps solving
a superset of features (like adding in user-space notifications etc)?
Perhaps different people on these related threads are trying to solve
different problems?


- Steve

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

  reply	other threads:[~2020-05-22 23:24 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15 21:28 [PATCH v2 00/15] net: taint when the device driver firmware crashes Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 01/15] taint: add module firmware crash taint support Luis Chamberlain
2020-05-16  4:03   ` Rafael Aquini
2020-05-19 16:42   ` Jessica Yu
2020-05-22  5:17     ` Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 02/15] ethernet/839: use new module_firmware_crashed() Luis Chamberlain
2020-05-16  4:04   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 03/15] bnx2x: " Luis Chamberlain
2020-05-16  4:05   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 04/15] bnxt: " Luis Chamberlain
2020-05-16  4:06   ` Rafael Aquini
2020-05-16  5:14   ` Vasundhara Volam
2020-05-15 21:28 ` [PATCH v2 05/15] bna: " Luis Chamberlain
2020-05-16  4:07   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 06/15] liquidio: " Luis Chamberlain
2020-05-16  4:07   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 07/15] cxgb4: " Luis Chamberlain
2020-05-16  4:09   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 08/15] ehea: " Luis Chamberlain
2020-05-16  4:09   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 09/15] qed: " Luis Chamberlain
2020-05-16  4:10   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 10/15] soc: qcom: ipa: " Luis Chamberlain
2020-05-16  4:10   ` Rafael Aquini
2020-05-19 22:34   ` Alex Elder
2020-05-22  5:28     ` Luis Chamberlain
2020-05-22 20:52       ` Alex Elder
2020-05-22 21:53         ` Luis Chamberlain
2020-05-15 21:28 ` [PATCH v2 11/15] wimax/i2400m: " Luis Chamberlain
2020-05-16  4:11   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 12/15] ath10k: " Luis Chamberlain
2020-05-15 21:28   ` Luis Chamberlain
2020-05-16  4:11   ` Rafael Aquini
2020-05-16  4:11     ` Rafael Aquini
2020-05-16 13:24   ` Johannes Berg
2020-05-16 13:24     ` Johannes Berg
2020-05-16 13:50     ` Johannes Berg
2020-05-16 13:50       ` Johannes Berg
2020-05-18 16:56       ` Luis Chamberlain
2020-05-18 16:56         ` Luis Chamberlain
2020-05-19  1:23       ` Brian Norris
2020-05-19  1:23         ` Brian Norris
2020-05-19 14:02         ` Luis Chamberlain
2020-05-19 14:02           ` Luis Chamberlain
2020-05-20  0:47           ` Brian Norris
2020-05-20  0:47             ` Brian Norris
2020-05-20  5:37             ` Emmanuel Grumbach
2020-05-20  5:37               ` Emmanuel Grumbach
2020-05-20  8:32               ` Andy Shevchenko
2020-05-20  8:32                 ` Andy Shevchenko
2020-05-21 19:01               ` Brian Norris
2020-05-21 19:01                 ` Brian Norris
2020-05-22  5:12                 ` Emmanuel Grumbach
2020-05-22  5:12                   ` Emmanuel Grumbach
2020-05-22  5:23                   ` Luis Chamberlain
2020-05-22  5:23                     ` Luis Chamberlain
2020-05-18 16:51     ` Luis Chamberlain
2020-05-18 16:51       ` Luis Chamberlain
2020-05-18 16:58       ` Ben Greear
2020-05-18 16:58         ` Ben Greear
2020-05-18 17:09         ` Luis Chamberlain
2020-05-18 17:09           ` Luis Chamberlain
2020-05-18 17:15           ` Ben Greear
2020-05-18 17:15             ` Ben Greear
2020-05-18 17:18             ` Luis Chamberlain
2020-05-18 17:18               ` Luis Chamberlain
2020-05-18 18:06               ` Steve deRosier
2020-05-18 18:06                 ` Steve deRosier
2020-05-18 19:09                 ` Luis Chamberlain
2020-05-18 19:09                   ` Luis Chamberlain
2020-05-18 19:25                   ` Johannes Berg
2020-05-18 19:25                     ` Johannes Berg
2020-05-18 19:59                     ` Luis Chamberlain
2020-05-18 19:59                       ` Luis Chamberlain
2020-05-18 20:07                       ` Johannes Berg
2020-05-18 20:07                         ` Johannes Berg
2020-05-18 21:18                         ` Luis Chamberlain
2020-05-18 21:18                           ` Luis Chamberlain
2020-05-18 20:28                     ` Jakub Kicinski
2020-05-18 20:28                       ` Jakub Kicinski
2020-05-18 20:29                       ` Johannes Berg
2020-05-18 20:29                         ` Johannes Berg
2020-05-18 20:35                         ` Jakub Kicinski
2020-05-18 20:35                           ` Jakub Kicinski
2020-05-18 20:41                           ` Johannes Berg
2020-05-18 20:41                             ` Johannes Berg
2020-05-18 20:46                             ` Jakub Kicinski
2020-05-18 20:46                               ` Jakub Kicinski
2020-05-18 21:22                               ` Luis Chamberlain
2020-05-18 21:22                                 ` Luis Chamberlain
2020-05-18 22:16                                 ` Jakub Kicinski
2020-05-18 22:16                                   ` Jakub Kicinski
2020-05-19  1:05                                   ` Luis Chamberlain
2020-05-19  1:05                                     ` Luis Chamberlain
2020-05-19 21:15                                     ` [RFC 1/2] devlink: add simple fw crash helpers Jakub Kicinski
2020-05-19 21:15                                       ` Jakub Kicinski
2020-05-22  5:20                                       ` Luis Chamberlain
2020-05-22  5:20                                         ` Luis Chamberlain
2020-05-22 17:17                                         ` Jakub Kicinski
2020-05-22 17:17                                           ` Jakub Kicinski
2020-05-22 20:46                                           ` Johannes Berg
2020-05-22 20:46                                             ` Johannes Berg
2020-05-22 21:51                                             ` Luis Chamberlain
2020-05-22 21:51                                               ` Luis Chamberlain
2020-05-22 23:23                                               ` Steve deRosier [this message]
2020-05-22 23:23                                                 ` Steve deRosier
2020-05-22 23:44                                                 ` Luis Chamberlain
2020-05-22 23:44                                                   ` Luis Chamberlain
2020-05-25  9:07                                                 ` Andy Shevchenko
2020-05-25  9:07                                                   ` Andy Shevchenko
2020-05-25 17:08                                                   ` Ben Greear
2020-05-25 17:08                                                     ` Ben Greear
2020-05-25 20:57                                             ` Jakub Kicinski
2020-05-25 20:57                                               ` Jakub Kicinski
2020-07-30 13:56                                               ` Johannes Berg
2020-07-30 13:56                                                 ` Johannes Berg
2020-05-22 21:49                                           ` Luis Chamberlain
2020-05-22 21:49                                             ` Luis Chamberlain
2020-05-19 21:15                                     ` [RFC 2/2] i2400m: use devlink health reporter Jakub Kicinski
2020-05-19 21:15                                       ` Jakub Kicinski
2020-05-15 21:28 ` [PATCH v2 13/15] ath6kl: use new module_firmware_crashed() Luis Chamberlain
2020-05-15 21:28   ` Luis Chamberlain
2020-05-16  4:12   ` Rafael Aquini
2020-05-16  4:12     ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 14/15] brcm80211: " Luis Chamberlain
2020-05-16  4:13   ` Rafael Aquini
2020-05-15 21:28 ` [PATCH v2 15/15] mwl8k: " Luis Chamberlain
2020-05-16  4:13   ` Rafael Aquini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALLGbR+QPcECtJbYmzztV_Qysc5qtwujT_qc785zvhZMCH50fg@mail.gmail.com \
    --to=derosier@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=aquini@redhat.com \
    --cc=arnd@arndb.de \
    --cc=ath10k@lists.infradead.org \
    --cc=bhe@redhat.com \
    --cc=briannorris@chromium.org \
    --cc=cai@lca.pw \
    --cc=daniel.vetter@ffwll.ch \
    --cc=davem@davemloft.net \
    --cc=dyoung@redhat.com \
    --cc=gpiccoli@canonical.com \
    --cc=greearb@candelatech.com \
    --cc=jeyu@kernel.org \
    --cc=jiri@resnulli.us \
    --cc=johannes@sipsolutions.net \
    --cc=keescook@chromium.org \
    --cc=kuba@kernel.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=schlad@suse.de \
    --cc=tglx@linutronix.de \
    --cc=tiwai@suse.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.