From: Johannes Berg <johannes@sipsolutions.net>
To: Luis Chamberlain <mcgrof@kernel.org>, Steve deRosier <derosier@gmail.com>
Cc: Ben Greear <greearb@candelatech.com>, jeyu@kernel.org,
	akpm@linux-foundation.org, arnd@arndb.de, rostedt@goodmis.org,
	mingo@redhat.com, aquini@redhat.com, cai@lca.pw, dyoung@redhat.com,
	bhe@redhat.com, peterz@infradead.org, tglx@linutronix.de,
	gpiccoli@canonical.com, pmladek@suse.com, Takashi Iwai <tiwai@suse.de>,
	schlad@suse.de, andriy.shevchenko@linux.intel.com,
	keescook@chromium.org, daniel.vetter@ffwll.ch, will@kernel.org,
	mchehab+samsung@kernel.org, Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	ath10k@lists.infradead.org
Subject: Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()
Date: Mon, 18 May 2020 21:25:09 +0200
Message-ID: <e3d978c8fa6a4075f12e843548d41e2c8ab537d1.camel@sipsolutions.net>
In-Reply-To: <20200518190930.GO11244@42.do-not-panic.com> (sfid-20200518_210935_354047_0199DB8F)

On Mon, 2020-05-18 at 19:09 +0000, Luis Chamberlain wrote:

> > Unfortunately a "taint" is interpreted by many users as: "your kernel
> > is really F#*D up, you better do something about it right now."
> > Assuming they're paying attention at all in the first place of course.
>
> Taint historically has been used and still is today to help rule out
> whether or not you get support, or how you get support.
>
> For instance, a staging driver is not supported by some upstream
> developers, but it will be by those who help staging and Greg. TAINT_CRAP
> cannot be even more clear.
>
> So, no, it is not just about "hey your kernel is messed up", there are
> clear support boundaries being drawn.

Err, no. Those two are most definitely related. Have you looked at (most
or some or whatever) staging drivers recently? Those contain all kinds
of garbage that might do whatever with your kernel.

Of course that's not a completely clear boundary, maybe you can find a
driver in staging that's perfect code just not written to kernel style?
But I find that hard to believe, in most cases.

So no, it's really not about "[a] staging driver is not supported" vs.
"your kernel is messed up". The very fact that you loaded one of those
things might very well have messed up your kernel entirely.

> These days though, I think we all admit, that firmware crashes can use
> a better generic infrastructure for ensuring that clearly affecting-user
> experience issues. This patch is about that *when and if these happen*,
> we annotate it in the kernel for support purposes.

That's all fine, I just don't think it's appropriate to pretend that
your kernel is now 'tainted' (think about the meaning of that word) when
the firmware of some random device crashed. Heck, that could have been a
USB device that was since unplugged. Unless the driver is complete
garbage (hello staging again?) that really should have no lasting effect
on the system itself.

> Recovery without affecting user experience would be great, the taint is
> *not* for those cases. The taint definition has:
>
> + 18) ``Q`` used by device drivers to annotate that the device driver's firmware
> +     has crashed and the device's operation has been severely affected. The
> +     device may be left in a crippled state, requiring full driver removal /
> +     addition, system reboot, or it is unclear how long recovery will take.
>
> Let me know if this is not clear.

It's pretty clear, but even then, first of all I doubt this is the case
for many of the places that you've sprinkled the annotation on, and
secondly it actually hides useful information.

Regardless of the support issue, I think this hiding of information is
also problematic.

I really think we'd all be better off if you just made a sysfs file (I
mistyped debugfs in some other email, sorry, apparently you didn't see
the correction in time) that listed which device(s) crashed and how many
times. That would actually be useful. Because honestly, if a random
device crashed for some random reason, that's pretty much a non-event.
If it keeps happening, then we might even want to know about it.

You can obviously save the contents of this file into your bug reports
automatically and act accordingly, but I think you'll find that this is
far more useful than saying "TAINT_FIRMWARE_CRASHED" so I'll ignore this
report. Yeah, that might be a reasonable thing if the bug report is about
slow wifi *and* you see that ath10k firmware crashed every 10 seconds,
but if it just crashed once a few days earlier it's of no importance to
the system anymore ... And certainly a reasonable driver (which I believe
ath10k to be) would _not_ randomly start corrupting memory because its
firmware crashed. Which really is what tainting the kernel is about.

So no, even with all that, I still really believe you're solving the
wrong problem. Having information about firmware crashes, preferably
with some kind of frequency information attached, and *clearly* with
information about which device attached would be _great_. Munging it all
into one bit is actively harmful, IMO.

johannes
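
The difference between one taint bit and a per-device crash list can be sketched in plain userspace C. This is only an illustration, not kernel code and not anything from the patch series: the names `fw_crash_table`, `fw_crash_record` and `fw_crash_report`, the fixed table size and the "name count" text layout are all assumptions made for the example. It models a single flag next to a small table of counters, and renders the table the way a sysfs-style text file might.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 8	/* arbitrary limit for the sketch */

/* Hypothetical per-device record; not an existing kernel structure. */
struct fw_crash_stat {
	char name[32];		/* device name as reported by the driver */
	unsigned int crashes;	/* firmware crashes seen for this device */
};

struct fw_crash_table {
	struct fw_crash_stat dev[MAX_DEVICES];
	size_t used;
	int tainted;		/* models the single global taint bit */
};

/*
 * Record one firmware crash for the named device.
 * Returns 0 on success, -1 if the table has no room for a new device.
 */
static int fw_crash_record(struct fw_crash_table *t, const char *name)
{
	size_t i;

	/* The bit only remembers that something, somewhere, crashed once. */
	t->tainted = 1;

	for (i = 0; i < t->used; i++) {
		if (strcmp(t->dev[i].name, name) == 0) {
			t->dev[i].crashes++;
			return 0;
		}
	}
	if (t->used == MAX_DEVICES)
		return -1;

	snprintf(t->dev[t->used].name, sizeof(t->dev[t->used].name), "%s", name);
	t->dev[t->used].crashes = 1;
	t->used++;
	return 0;
}

/*
 * Render one "name count" line per device into buf.
 * Returns the number of bytes written, not counting the trailing NUL.
 * A line that does not fit is dropped rather than truncated.
 */
static size_t fw_crash_report(const struct fw_crash_table *t, char *buf, size_t len)
{
	size_t off = 0, i;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < t->used; i++) {
		int n = snprintf(buf + off, len - off, "%s %u\n",
				 t->dev[i].name, t->dev[i].crashes);

		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0';	/* discard the partial line */
			break;
		}
		off += (size_t)n;
	}
	return off;
}
```

After two crashes on one device and one on another, `tainted` is simply 1, exactly as it would be after a single crash days earlier. The report still says which device crashed and how often, which is the information a bug report would need.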