From: Luis Chamberlain <mcgrof@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@resnulli.us>,
	jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
	rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
	cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
	peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
	pmladek@suse.com, tiwai@suse.de, schlad@suse.de,
	andriy.shevchenko@linux.intel.com, keescook@chromium.org,
	daniel.vetter@ffwll.ch, will@kernel.org,
	mchehab+samsung@kernel.org, kvalo@codeaurora.org,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/15] net: taint when the device driver firmware crashes
Date: Mon, 11 May 2020 14:11:13 +0000
Message-ID: <20200511141113.GP11244@42.do-not-panic.com>
In-Reply-To: <20200509113546.7dcd1599@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Sat, May 09, 2020 at 11:35:46AM -0700, Jakub Kicinski wrote:
> On Sat,  9 May 2020 04:35:37 +0000 Luis Chamberlain wrote:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead this series provides a helper which
> > lets drivers annotate this and shows how to use this on networking
> > drivers.
> > 
> > My methodology for finding firmware crashes is to git grep for
> > "crash" and then study the code to see whether this is indeed
> > a place where the firmware crashes. In some places this is quite
> > obvious.
> > 
> > I'm starting off with networking first. If this gets merged, later on I
> > can focus on the other drivers, but I already have some work done on
> > other subsystems.
> > 
> > Review, flames, etc are greatly appreciated.
> 
> Tainting itself may be useful, but that's just the first step. I'd much
> rather see folks start using the devlink health infrastructure. Devlink
> is netlink based, but it's _not_ networking specific (many of its
> optional features obviously are, but don't let that mislead you).
> 
> With devlink health we get (a) a standard notification on the failure; 
> (b) information/state dump in a (somewhat) structured form, which can be
> collected & shared with vendors; (c) automatic remediation (usually
> device reset of some scope).

It indeed sounds very useful!

> Now regarding the tainting - as I said it may be useful, but don't we
> have to define what constitutes a "firmware crash"?

Yes indeed, I missed clarifying this in the documentation. I'll do so
in my next respin.
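
For reference, the userspace side of the check described in the cover
letter could look roughly like the sketch below. /proc/sys/kernel/tainted
reports the taint flags as a single decimal bitmask, so detection comes
down to parsing that number and testing one bit. The bit position used
here is an assumption made for illustration only, and the helper names
are hypothetical; the real value has to come from the patch or kernel
headers actually in use.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * ASSUMPTION: bit position of the proposed firmware-crash taint flag.
 * This is a placeholder for illustration, not a value confirmed by the
 * thread. Take the real one from the kernel tree you run.
 */
#define HYPOTHETICAL_TAINT_FIRMWARE_CRASH_BIT 18

/*
 * Parse the decimal bitmask in the form printed by
 * /proc/sys/kernel/tainted. Returns 0 on success, -1 on bad input.
 */
static int parse_taint_mask(const char *text, unsigned long long *mask)
{
	char *end;

	errno = 0;
	*mask = strtoull(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	return 0;
}

/* Return 1 if the given taint bit is set in the mask, 0 otherwise. */
static int taint_bit_set(unsigned long long mask, unsigned int bit)
{
	return (int)((mask >> bit) & 1ULL);
}

/*
 * Read and parse the taint mask from a file, normally
 * "/proc/sys/kernel/tainted". Returns 0 on success, -1 on failure.
 */
static int read_taint_mask(const char *path, unsigned long long *mask)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_taint_mask(buf, mask);
	fclose(f);
	return ret;
}
```

A monitoring tool would call read_taint_mask("/proc/sys/kernel/tainted", &mask)
and then taint_bit_set(mask, HYPOTHETICAL_TAINT_FIRMWARE_CRASH_BIT),
with no driver-specific log scraping involved.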

> There are many
> failure modes, some perfectly recoverable (e.g. processing queue hang), 
> some mere bugs (e.g. device fails to initialize some functions). All of
> them may impact the functioning of the system. How do we choose those
> that taint? 

It's up to the maintainers of the device driver. What I was aiming for
were those firmware crashes which indeed *can* have an impact on user
experience, and can *even* potentially require a driver removal / addition
to get things back in order again.
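
To make that criterion a bit more concrete, here is a minimal sketch of
how a driver maintainer might encode such a policy. Everything in it is
hypothetical: the failure classes are only loosely modelled on the
examples given in this thread (queue hang, incomplete initialization, a
crash that needs a driver reload), and treating an unrecovered hang as
taint-worthy is an assumption, not something the series defines.

```c
#include <stdbool.h>

/* Hypothetical failure classes, loosely following the thread's examples. */
enum fw_failure {
	FW_QUEUE_HANG,         /* e.g. processing queue hang, often recoverable */
	FW_INIT_INCOMPLETE,    /* device fails to initialize some functions */
	FW_CRASH_NEEDS_RELOAD, /* unusable until driver removal / addition */
};

/* Hypothetical report a driver could fill in when it detects a failure. */
struct fw_failure_report {
	enum fw_failure kind;
	bool recovered; /* did the driver's own recovery succeed? */
};

/*
 * One possible reading of the criterion: taint when the failure can
 * leave the device useless to the user, and do not taint for failures
 * the driver recovered from or for plain initialization bugs.
 */
static bool should_taint(const struct fw_failure_report *r)
{
	switch (r->kind) {
	case FW_CRASH_NEEDS_RELOAD:
		return true;
	case FW_QUEUE_HANG:
		/* ASSUMPTION: an unrecovered hang counts as user-visible. */
		return !r->recovered;
	case FW_INIT_INCOMPLETE:
	default:
		return false;
	}
}
```

The decision itself stays with each driver; the sketch only shows that
it can be kept in one small, reviewable function.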

  Luis
