linux-doc.vger.kernel.org archive mirror
From: Thorsten Leemhuis <linux@leemhuis.info>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v2 08/26] docs: reporting-bugs: make readers check the taint flag
Date: Thu, 12 Nov 2020 18:58:45 +0100
Message-ID: <d21b7ead04852d3de7dd6892fe5e27aca1f345ff.1605203187.git.linux@leemhuis.info>
In-Reply-To: <cover.1605203187.git.linux@leemhuis.info>

Tell users early in the process to check the taint flag, as that will
prevent them from investing time into a report that might be worthless.
That way users will notice, for example, that the issue they face is in
fact caused by an add-on kernel module or an Oops that happened
earlier.

This approach has a downside: users will later have to check the flag
again with the mainline kernel the guide tells them to install. But that
is an acceptable trade-off here, as checking only takes a few seconds
and can easily prevent wasting time in useless testing and debugging.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
---

= RFC =

Should "disable DKMS" come before this step? But then the backup step
right before that one would need to be moved as well, as disabling DKMS
can easily mix things up.
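
For anyone who wants to try the check the new section describes, here is
a minimal Python sketch; it is an illustration only and not part of the
patch. It reads /proc/sys/kernel/tainted, names the few taint bits that
correspond to the three causes listed in the new text (bit numbers and
letters taken from tainted-kernels.rst; the table is deliberately
incomplete), classifies a 'CPU:' line and extracts the counter from an
'Oops:' line. All function names and the sample log lines are made up
for this example.

```python
import re
from pathlib import Path
from typing import List, Optional

# Subset of the taint bits from tainted-kernels.rst: only those matching
# the three common causes named in the new section. Not a complete table.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (Oops or BUG)"),
    10: ("C", "staging driver was loaded"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
}


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Return the taint value of the running kernel; 0 means not tainted."""
    return int(Path(path).read_text().strip())


def decode_taint(value: int) -> List[str]:
    """List the known reasons whose bit is set in the taint value."""
    return [
        f"{letter}: {reason}"
        for bit, (letter, reason) in sorted(TAINT_BITS.items())
        if value & (1 << bit)
    ]


def cpu_line_tainted(line: str) -> Optional[bool]:
    """Taint status from a 'CPU:' line; None if the line does not tell."""
    if "CPU:" not in line:
        return None
    if "Not tainted" in line:
        return False
    if "Tainted:" in line:
        return True
    return None


def oops_counter(line: str) -> Optional[int]:
    """Return N from a line like 'Oops: 0000 [#N] SMP', else None."""
    match = re.search(r"Oops: \S+ \[#(\d+)\]", line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    value = read_taint()
    print("not tainted" if value == 0 else f"tainted ({value})")
    for reason in decode_taint(value):
        print(" ", reason)
```

The log helpers can be fed single lines from a saved kernel log, for
example cpu_line_tainted("CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.9.0 #1") or oops_counter("Oops: 0000 [#1] SMP"); a counter of 1 marks
the first Oops since boot-up, the one worth looking at first.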
---
 Documentation/admin-guide/reporting-bugs.rst  | 69 +++++++++++++++++++
 Documentation/admin-guide/tainted-kernels.rst |  2 +
 2 files changed, 71 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index fdd79d99c18f..8ac491419bde 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -319,6 +319,75 @@ fatal error where the kernel stop itself) with a 'Oops' (a recoverable error),
 as the kernel remains running after the latter.
 
 
+Check 'taint' flag
+------------------
+
+    *Check if your kernel was 'tainted' when the issue occurred, as the event
+    that made the kernel set this flag might be causing the issue you face.*
+
+The kernel marks itself with a 'taint' flag when something happens that might
+lead to follow-up errors that look totally unrelated. The issue you face might
+be such an error if your kernel is tainted. That's why it's in your interest to
+rule this out early before investing more time into this process. This is the
+only reason why this step is here, as this process later will tell you to
+install the latest mainline kernel; you will need to check the taint flag again
+then, as that's when it matters because it's the kernel the report will focus
+on.
+
+On a running system it is easy to check if the kernel tainted itself: if ``cat
+/proc/sys/kernel/tainted`` returns '0' then the kernel is not tainted and
+everything is fine. Checking that file is impossible in some situations; that's
+why the kernel also mentions the taint status when it reports an internal
+problem (a 'kernel bug'), a recoverable error (a 'kernel Oops') or a
+non-recoverable error before halting operation (a 'kernel panic'). Look near
+the top of the error messages printed when one of these occurs and search for a
+line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was
+not tainted when it noticed the problem; it was tainted if you see 'Tainted:'
+followed by a few spaces and some letters.
+
+If your kernel is tainted, study
+:ref:`Documentation/admin-guide/tainted-kernels.rst <taintedkernels>` to find
+out why. Try to eliminate the reason. Often it's caused by one of these three
+things:
+
+ 1. A recoverable error (a 'kernel Oops') occurred and the kernel tainted
+    itself, as the kernel knows it might misbehave in strange ways after that
+    point. In that case check your kernel or system log and look for a section
+    that starts with this::
+
+       Oops: 0000 [#1] SMP
+
+    That's the first Oops since boot-up, as the '#1' between the brackets shows.
+    Every Oops and any other problem that happens after that point might be a
+    follow-up problem to that first Oops, even if both look totally unrelated.
+    Rule this out by getting rid of the cause for the first Oops and reproducing
+    the issue afterwards. Sometimes simply restarting will be enough, sometimes a
+    change to the configuration followed by a reboot can eliminate the Oops. But
+    don't invest too much time into this at this point of the process, as the
+    cause for the Oops might already be fixed in the newer Linux kernel version
+    you are going to install later in this process.
+
+ 2. Your system uses software that installs its own kernel modules, for
+    example Nvidia's proprietary graphics driver or VirtualBox. The kernel
+    taints itself when it loads such modules from external sources (even if
+    they are Open Source): they sometimes cause errors in unrelated kernel
+    areas and thus might be causing the issue you face. You therefore have to
+    prevent those modules from loading when you want to report an issue to the
+    Linux kernel developers. Most of the time the easiest way to do that is to
+    temporarily uninstall such software, including any modules it might have
+    installed. Afterwards reboot.
+
+ 3. The kernel also taints itself when it's loading a module that resides in
+    the staging tree of the Linux kernel source. That's a special area for
+    code (mostly drivers) that does not yet fulfill the normal Linux kernel
+    quality standards. When you report an issue with such a module it's
+    obviously okay if the kernel is tainted; just make sure the module in
+    question is the only reason for the taint. If the issue happens in an
+    unrelated area, reboot and temporarily block the module from being loaded
+    by specifying ``foo.blacklist=1`` as a kernel parameter (replace 'foo'
+    with the name of the module in question).
+
+
 .. ############################################################################
 .. Temporary marker added while this document is rewritten. Sections above
 .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f718a2eaf1f6..04d8da1fc080 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -1,3 +1,5 @@
+.. _taintedkernels:
+
 Tainted kernels
 ---------------
 
-- 
2.28.0


