linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules
@ 2022-09-20 12:22 David Hildenbrand
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-20 12:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lukas Bulwahn, Jonathan Corbet, Baoquan He, David Hildenbrand,
	Linus Torvalds, Dave Young, linux-doc, Nicholas Piggin,
	Jani Nikula, linux-mm, David Laight, Dwaipayan Ray,
	Andy Whitcroft, Joe Perches, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Vivek Goyal

As it seems to be rather unclear if/when to use BUG(), BUG_ON(),
VM_BUG_ON(), WARN_ON_ONCE(), ... let's try to document the result of a
recent discussion.

Details can be found in patch #1.

RFC -> v1:
* "coding-style.rst: document BUG() and WARN() rules ("do not crash the
   kernel")"
 -> Rephrase/extend according to John
 -> Add some details regarding the use of panic()
* powerpc/prom_init: drop PROM_BUG()
 -> Added
* "checkpatch: warn on usage of VM_BUG_ON() and other BUG variants"
 -> Warn on more variants


Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>

David Hildenbrand (3):
  coding-style.rst: document BUG() and WARN() rules ("do not crash the
    kernel")
  powerpc/prom_init: drop PROM_BUG()
  checkpatch: warn on usage of VM_BUG_ON() and other BUG variants

 Documentation/process/coding-style.rst | 61 ++++++++++++++++++++++++++
 arch/powerpc/kernel/prom_init.c        |  6 ---
 scripts/checkpatch.pl                  |  6 +--
 3 files changed, 64 insertions(+), 9 deletions(-)

-- 
2.37.3



* [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
@ 2022-09-20 12:23 ` David Hildenbrand
  2022-09-21  4:40   ` Kalle Valo
                     ` (2 more replies)
  2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-20 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lukas Bulwahn, Jonathan Corbet, Baoquan He, David Hildenbrand,
	Linus Torvalds, Dave Young, linux-doc, Nicholas Piggin,
	Jani Nikula, linux-mm, David Laight, Dwaipayan Ray,
	Andy Whitcroft, Joe Perches, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Vivek Goyal

Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
is just as bad as BUG_ON(), because it will crash the kernel on
distributions that enable CONFIG_DEBUG_VM (like Fedora):

    VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
    no different, the only difference is "we can make the code smaller
    because these are less important". [2]

This resulted in a more generic discussion about usage of BUG() and
friends. While there might be corner cases that still deserve a BUG_ON(),
most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
recovery path if reasonable:

    The only possible case where BUG_ON can validly be used is "I have
    some fundamental data corruption and cannot possibly return an
    error". [2]

A very good approximation is the general rule:

    "absolutely no new BUG_ON() calls _ever_" [2]

... not even if something really shouldn't ever happen and is merely for
documenting that an invariant always has to hold. However, there are still
exceptions where BUG_ON() may be used:

    If you have a "this is major internal corruption, there's no way we can
    continue", then BUG_ON() is appropriate. [3]

There is only one good BUG_ON():

    Now, that said, there is one very valid sub-form of BUG_ON():
    BUILD_BUG_ON() is absolutely 100% fine. [2]
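
As an illustration only (not part of the patch), a compile-time check can be
sketched in plain C11. DEMO_BUILD_BUG_ON() and struct demo_hdr are
hypothetical stand-ins; the kernel's BUILD_BUG_ON() is implemented
differently, but the idea is the same: a violated condition breaks the
build, and nothing is checked at runtime.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's BUILD_BUG_ON(): the build fails
 * if 'cond' is true. This sketch uses C11 _Static_assert; it is not the
 * kernel's actual implementation.
 */
#define DEMO_BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical on-wire header whose layout must stay at 8 bytes. */
struct demo_hdr {
	uint32_t magic;
	uint32_t len;
};

size_t demo_hdr_size(void)
{
	/* Checked by the compiler; generates no code at runtime. */
	DEMO_BUILD_BUG_ON(sizeof(struct demo_hdr) != 8);
	return sizeof(struct demo_hdr);
}
```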

While WARN will also crash the machine with panic_on_warn set, that's
exactly to be expected:

    So we have two very different cases: the "virtual machine with good
    logging where a dead machine is fine" - use 'panic_on_warn'. And
    the actual real hardware with real drivers, running real loads by
    users. [4]

The basic idea is that warnings will similarly get reported by users
and be found during testing. However, in contrast to a BUG(), there is a
way to actually influence the expected behavior (e.g., panic_on_warn)
and to eventually keep the machine alive to extract some debug info.

Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
expect this code to trigger in any case, recovery code is not really
helpful.

    I'd prefer to keep all these warnings 'simple' - i.e. no attempted
    recovery & control flow, unless we ever expect these to trigger.
    [5]

There have been different rules floating around that were never properly
documented. Let's try to clarify.
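
To illustrate the "warn once and recover" pattern discussed above, here is a
minimal userspace sketch. It is not part of the patch: demo_warn_on_once(),
struct demo_obj and demo_obj_get() are hypothetical names, and the helper
only approximates WARN_ON_ONCE(), which in the kernel is a macro that also
prints a backtrace.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON_ONCE(): report the first time 'cond' is
 * true and return 'cond' so the caller can branch on it. The caller
 * provides the per-call-site "already warned" flag.
 */
static bool demo_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical object with an invariant: refs must never be negative. */
struct demo_obj {
	int refs;
};

/*
 * Instead of BUG_ON(obj->refs < 0), warn once and return an error so the
 * caller can handle the broken invariant and the system keeps running.
 */
int demo_obj_get(struct demo_obj *obj)
{
	static bool warned;

	if (demo_warn_on_once(obj->refs < 0, &warned, "obj->refs < 0"))
		return -EINVAL;	/* recovery path */

	obj->refs++;
	return 0;
}
```

Calling demo_obj_get() on an object with refs == -1 returns -EINVAL and
leaves the object untouched; in line with Ingo's remark, the recovery branch
is only worth having when there is a reasonable way to continue.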

[1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
[2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
[3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
[4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
[5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/process/coding-style.rst | 61 ++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index 03eb53fd029a..e05899cbfd49 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1186,6 +1186,67 @@ expression used.  For instance:
 	#endif /* CONFIG_SOMETHING */
 
 
+22) Do not crash the kernel
+---------------------------
+
+In general, it is not the kernel developer's decision to crash the kernel.
+
+Avoid panic()
+=============
+
+panic() should be used with care and primarily only during system boot.
+panic() is, for example, acceptable when running out of memory during boot and
+not being able to continue.
+
+Use WARN() rather than BUG()
+============================
+
+Do not add new code that uses any of the BUG() variants, such as BUG(),
+BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
+WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
+required if there is no reasonable way to at least partially recover.
+
+"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
+internal corruptions with no way of continuing may still use BUG(), but need
+good justification.
+
+Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
+**************************************************
+
+WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
+is common for a given warning condition, if it occurs at all, to occur
+multiple times. This can fill up and wrap the kernel log, and can even slow
+the system enough that the excessive logging turns into its own, additional
+problem.
+
+Do not WARN lightly
+*******************
+
+WARN*() is intended for unexpected, this-should-never-happen situations.
+WARN*() macros are not to be used for anything that is expected to happen
+during normal operation. These are not pre- or post-condition asserts, for
+example. Again: WARN*() must not be used for a condition that is expected
+to trigger easily, for example, by user space actions. pr_warn_once() is a
+possible alternative, if you need to notify the user of a problem.
+
+Do not worry about panic_on_warn users
+**************************************
+
+A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
+available kernel option, and that many users set this option. This is why
+there is a "Do not WARN lightly" writeup, above. However, the existence of
+panic_on_warn users is not a valid reason to avoid the judicious use of
+WARN*(). That is because whoever enables panic_on_warn has explicitly
+asked the kernel to crash if a WARN*() fires, and such users must be
+prepared to deal with the consequences of a system that is somewhat more
+likely to crash.
+
+Use BUILD_BUG_ON() for compile-time assertions
+**********************************************
+
+The use of BUILD_BUG_ON() is acceptable and encouraged, because it is a
+compile-time assertion that has no effect at runtime.
+
 Appendix I) References
 ----------------------
 
-- 
2.37.3



* [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG()
  2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
@ 2022-09-20 12:23 ` David Hildenbrand
  2022-09-21 13:02   ` Michael Ellerman
  2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
  2022-10-04 13:24 ` (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules Michael Ellerman
  3 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2022-09-20 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lukas Bulwahn, Jonathan Corbet, Baoquan He, David Hildenbrand,
	Linus Torvalds, Dave Young, linux-doc, Nicholas Piggin,
	Jani Nikula, linux-mm, David Laight, Dwaipayan Ray,
	Andy Whitcroft, Joe Perches, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Vivek Goyal

Unused, let's drop it.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/kernel/prom_init.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index a6669c40c1db..d464ba412084 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -96,12 +96,6 @@ static int of_workarounds __prombss;
 #define OF_WA_CLAIM	1	/* do phys/virt claim separately, then map */
 #define OF_WA_LONGTRAIL	2	/* work around longtrail bugs */
 
-#define PROM_BUG() do {						\
-        prom_printf("kernel BUG at %s line 0x%x!\n",		\
-		    __FILE__, __LINE__);			\
-	__builtin_trap();					\
-} while (0)
-
 #ifdef DEBUG_PROM
 #define prom_debug(x...)	prom_printf(x)
 #else
-- 
2.37.3



* [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
  2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
@ 2022-09-20 12:23 ` David Hildenbrand
  2022-09-23  2:05   ` John Hubbard
  2022-10-04 13:24 ` (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules Michael Ellerman
  3 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2022-09-20 12:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lukas Bulwahn, Jonathan Corbet, Baoquan He, David Hildenbrand,
	Linus Torvalds, Dave Young, linux-doc, Nicholas Piggin,
	Jani Nikula, linux-mm, David Laight, Dwaipayan Ray,
	Andy Whitcroft, Joe Perches, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Vivek Goyal

checkpatch does not point out that VM_BUG_ON() and friends should be
avoided. However, Linus notes:

    VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
    no different, the only difference is "we can make the code smaller
    because these are less important". [1]

So let's warn on VM_BUG_ON() and other BUG variants as well. While at it,
make it clearer that the kernel really shouldn't be crashed.

As there are some subsystem BUG macros that actually don't end up crashing
the kernel -- for example, KVM_BUG_ON() -- exclude these manually.

[1] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 scripts/checkpatch.pl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 79e759aac543..21f3a79aa46f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -4695,12 +4695,12 @@ sub process {
 			}
 		}
 
-# avoid BUG() or BUG_ON()
-		if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
+# do not use BUG() or variants
+		if ($line =~ /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/) {
 			my $msg_level = \&WARN;
 			$msg_level = \&CHK if ($file);
 			&{$msg_level}("AVOID_BUG",
-				      "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
+				      "Do not crash the kernel unless it is unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than BUG() or variants.\n" . $herecurr);
 		}
 
 # avoid LINUX_VERSION_CODE
-- 
2.37.3



* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
@ 2022-09-21  4:40   ` Kalle Valo
  2022-09-22 14:12     ` David Hildenbrand
  2022-09-22 13:43   ` Akira Yokosawa
  2022-09-23  2:26   ` John Hubbard
  2 siblings, 1 reply; 20+ messages in thread
From: Kalle Valo @ 2022-09-21  4:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, linux-kernel, Jani Nikula,
	linux-mm, David Laight, Dwaipayan Ray, Andy Whitcroft,
	Joe Perches, Andrew Morton, Linus Torvalds, Ingo Molnar,
	Vivek Goyal

David Hildenbrand <david@redhat.com> writes:

> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
> is just as bad as BUG_ON(), because it will crash the kernel on
> distributions that enable CONFIG_DEBUG_VM (like Fedora):
>
>     VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
>     no different, the only difference is "we can make the code smaller
>     because these are less important". [2]
>
> This resulted in a more generic discussion about usage of BUG() and
> friends. While there might be corner cases that still deserve a BUG_ON(),
> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
> recovery path if reasonable:
>
>     The only possible case where BUG_ON can validly be used is "I have
>     some fundamental data corruption and cannot possibly return an
>     error". [2]
>
> A very good approximation is the general rule:
>
>     "absolutely no new BUG_ON() calls _ever_" [2]
>
> ... not even if something really shouldn't ever happen and is merely for
> documenting that an invariant always has to hold. However, there are still
> exceptions where BUG_ON() may be used:
>
>     If you have a "this is major internal corruption, there's no way we can
>     continue", then BUG_ON() is appropriate. [3]
>
> There is only one good BUG_ON():
>
>     Now, that said, there is one very valid sub-form of BUG_ON():
>     BUILD_BUG_ON() is absolutely 100% fine. [2]
>
> While WARN will also crash the machine with panic_on_warn set, that's
> exactly to be expected:
>
>     So we have two very different cases: the "virtual machine with good
>     logging where a dead machine is fine" - use 'panic_on_warn'. And
>     the actual real hardware with real drivers, running real loads by
>     users. [4]
>
> The basic idea is that warnings will similarly get reported by users
> and be found during testing. However, in contrast to a BUG(), there is a
> way to actually influence the expected behavior (e.g., panic_on_warn)
> and to eventually keep the machine alive to extract some debug info.
>
> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
> expect this code to trigger in any case, recovery code is not really
> helpful.
>
>     I'd prefer to keep all these warnings 'simple' - i.e. no attempted
>     recovery & control flow, unless we ever expect these to trigger.
>     [5]
>
> There have been different rules floating around that were never properly
> documented. Let's try to clarify.
>
> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> [3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

[...]

> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**************************************************
> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.

FWIW I have had cases where WARN() messages caused a reboot, maybe
mention that here? In my case the logging was so excessive that the
watchdog wasn't updated and in the end the device was forcefully
rebooted.
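
A rough userspace sketch of the difference, for illustration only: the
names below are hypothetical, and the helpers merely count "log lines"
where the kernel would print a full backtrace each time.

```c
#include <stdbool.h>

/* Number of "log lines" emitted; a real WARN*() prints a backtrace. */
static unsigned int demo_log_lines;

/* Stand-in for WARN_ON(): logs every time the condition is true. */
static bool demo_warn_on(bool cond)
{
	if (cond)
		demo_log_lines++;
	return cond;
}

/* Stand-in for WARN_ON_ONCE(): logs only the first time it is true. */
static bool demo_warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		demo_log_lines++;
	}
	return cond;
}

/*
 * Hit a persistently broken condition 'iterations' times and return how
 * many log lines that produced.
 */
unsigned int demo_flood(unsigned int iterations, bool once)
{
	bool warned = false;	/* the kernel keeps this flag per call site */
	bool broken = true;	/* hypothetical condition that never recovers */

	demo_log_lines = 0;
	for (unsigned int i = 0; i < iterations; i++) {
		if (once)
			demo_warn_on_once(broken, &warned);
		else
			demo_warn_on(broken);
	}
	return demo_log_lines;
}
```

With a condition that stays broken, the every-time variant emits one report
per iteration, while the once variant emits a single report no matter how
often the code path runs.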

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG()
  2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
@ 2022-09-21 13:02   ` Michael Ellerman
  2022-09-21 13:03     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Ellerman @ 2022-09-21 13:02 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Lukas Bulwahn, Jonathan Corbet, Baoquan He, David Hildenbrand,
	Linus Torvalds, Dave Young, linux-doc, Jani Nikula, linux-mm,
	David Laight, Nicholas Piggin, Dwaipayan Ray, Andy Whitcroft,
	Joe Perches, Andrew Morton, linuxppc-dev, Ingo Molnar,
	Vivek Goyal

David Hildenbrand <david@redhat.com> writes:
> Unused, let's drop it.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/kernel/prom_init.c | 6 ------
>  1 file changed, 6 deletions(-)

Thanks. I'll take this one via the powerpc tree, and the others can go
via wherever?

cheers

> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index a6669c40c1db..d464ba412084 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -96,12 +96,6 @@ static int of_workarounds __prombss;
>  #define OF_WA_CLAIM	1	/* do phys/virt claim separately, then map */
>  #define OF_WA_LONGTRAIL	2	/* work around longtrail bugs */
>  
> -#define PROM_BUG() do {						\
> -        prom_printf("kernel BUG at %s line 0x%x!\n",		\
> -		    __FILE__, __LINE__);			\
> -	__builtin_trap();					\
> -} while (0)
> -
>  #ifdef DEBUG_PROM
>  #define prom_debug(x...)	prom_printf(x)
>  #else
> -- 
> 2.37.3


* Re: [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG()
  2022-09-21 13:02   ` Michael Ellerman
@ 2022-09-21 13:03     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-21 13:03 UTC (permalink / raw)
  To: Michael Ellerman, linux-kernel
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Jani Nikula, linux-mm, David Laight,
	Nicholas Piggin, Dwaipayan Ray, Andy Whitcroft, Joe Perches,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Vivek Goyal

On 21.09.22 15:02, Michael Ellerman wrote:
> David Hildenbrand <david@redhat.com> writes:
>> Unused, let's drop it.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   arch/powerpc/kernel/prom_init.c | 6 ------
>>   1 file changed, 6 deletions(-)
> 
> Thanks. I'll take this one via the powerpc tree, and the others can go
> via wherever?

Makes sense; I'll drop this patch in case I have to resend, assuming 
it's in your tree.

Thanks!

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
  2022-09-21  4:40   ` Kalle Valo
@ 2022-09-22 13:43   ` Akira Yokosawa
  2022-09-22 14:41     ` David Hildenbrand
  2022-09-23  2:26   ` John Hubbard
  2 siblings, 1 reply; 20+ messages in thread
From: Akira Yokosawa @ 2022-09-22 13:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linuxppc-dev, bhe, corbet, Akira Yokosawa, dwaipayanray1,
	linux-doc, npiggin, linux-kernel, linux-mm, joe, torvalds,
	David.Laight, jani.nikula, apw, lukas.bulwahn, akpm, dyoung,
	mingo, vgoyal

Hi,

Minor nits on section title adornments.
See inline comments below.

On Tue, 20 Sep 2022 14:23:00 +0200, David Hildenbrand wrote:
> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
> is just as bad as BUG_ON(), because it will crash the kernel on
> distributions that enable CONFIG_DEBUG_VM (like Fedora):
> 
>     VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
>     no different, the only difference is "we can make the code smaller
>     because these are less important". [2]
> 
> This resulted in a more generic discussion about usage of BUG() and
> friends. While there might be corner cases that still deserve a BUG_ON(),
> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
> recovery path if reasonable:
> 
>     The only possible case where BUG_ON can validly be used is "I have
>     some fundamental data corruption and cannot possibly return an
>     error". [2]
> 
> A very good approximation is the general rule:
> 
>     "absolutely no new BUG_ON() calls _ever_" [2]
> 
> ... not even if something really shouldn't ever happen and is merely for
> documenting that an invariant always has to hold. However, there are still
> exceptions where BUG_ON() may be used:
> 
>     If you have a "this is major internal corruption, there's no way we can
>     continue", then BUG_ON() is appropriate. [3]
> 
> There is only one good BUG_ON():
> 
>     Now, that said, there is one very valid sub-form of BUG_ON():
>     BUILD_BUG_ON() is absolutely 100% fine. [2]
> 
> While WARN will also crash the machine with panic_on_warn set, that's
> exactly to be expected:
> 
>     So we have two very different cases: the "virtual machine with good
>     logging where a dead machine is fine" - use 'panic_on_warn'. And
>     the actual real hardware with real drivers, running real loads by
>     users. [4]
> 
> The basic idea is that warnings will similarly get reported by users
> and be found during testing. However, in contrast to a BUG(), there is a
> way to actually influence the expected behavior (e.g., panic_on_warn)
> and to eventually keep the machine alive to extract some debug info.
> 
> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
> expect this code to trigger in any case, recovery code is not really
> helpful.
> 
>     I'd prefer to keep all these warnings 'simple' - i.e. no attempted
>     recovery & control flow, unless we ever expect these to trigger.
>     [5]
> 
> There have been different rules floating around that were never properly
> documented. Let's try to clarify.
> 
> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> [3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  Documentation/process/coding-style.rst | 61 ++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> index 03eb53fd029a..e05899cbfd49 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -1186,6 +1186,67 @@ expression used.  For instance:
>  	#endif /* CONFIG_SOMETHING */
>  
>  
> +22) Do not crash the kernel
> +---------------------------
> +
> +In general, it is not the kernel developer's decision to crash the kernel.
> +
> +Avoid panic()
> +=============
This looks to me like a subsection-level title.  The adornment symbol
needs to be:

   *************

> +
> +panic() should be used with care and primarily only during system boot.
> +panic() is, for example, acceptable when running out of memory during boot and
> +not being able to continue.
> +
> +Use WARN() rather than BUG()
> +============================
Ditto.

> +
> +Do not add new code that uses any of the BUG() variants, such as BUG(),
> +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
> +required if there is no reasonable way to at least partially recover.
> +
> +"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
> +internal corruptions with no way of continuing may still use BUG(), but need
> +good justification.
> +
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**************************************************
These wrong adornment symbols confuse the ReST parser of Sphinx and result in
the build error from "make htmldocs" at this title (long message folded):

    Sphinx parallel build error:

    docutils.utils.SystemMessage: /xxx/Documentation/process/coding-style.rst:1213:
     (SEVERE/4) Title level inconsistent:



Please fix in v2.

        Thanks, Akira

> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.
> +
[...]



* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-21  4:40   ` Kalle Valo
@ 2022-09-22 14:12     ` David Hildenbrand
  2022-09-26  7:44       ` Kalle Valo
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2022-09-22 14:12 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, linux-kernel, Jani Nikula,
	linux-mm, David Laight, Dwaipayan Ray, Andy Whitcroft,
	Joe Perches, Andrew Morton, Linus Torvalds, Ingo Molnar,
	Vivek Goyal

On 21.09.22 06:40, Kalle Valo wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
>> is just as bad as BUG_ON(), because it will crash the kernel on
>> distributions that enable CONFIG_DEBUG_VM (like Fedora):
>>
>>      VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
>>      no different, the only difference is "we can make the code smaller
>>      because these are less important". [2]
>>
>> This resulted in a more generic discussion about usage of BUG() and
>> friends. While there might be corner cases that still deserve a BUG_ON(),
>> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
>> recovery path if reasonable:
>>
>>      The only possible case where BUG_ON can validly be used is "I have
>>      some fundamental data corruption and cannot possibly return an
>>      error". [2]
>>
>> A very good approximation is the general rule:
>>
>>      "absolutely no new BUG_ON() calls _ever_" [2]
>>
>> ... not even if something really shouldn't ever happen and is merely for
>> documenting that an invariant always has to hold. However, there are still
>> exceptions where BUG_ON() may be used:
>>
>>      If you have a "this is major internal corruption, there's no way we can
>>      continue", then BUG_ON() is appropriate. [3]
>>
>> There is only one good BUG_ON():
>>
>>      Now, that said, there is one very valid sub-form of BUG_ON():
>>      BUILD_BUG_ON() is absolutely 100% fine. [2]
>>
>> While WARN will also crash the machine with panic_on_warn set, that's
>> exactly to be expected:
>>
>>      So we have two very different cases: the "virtual machine with good
>>      logging where a dead machine is fine" - use 'panic_on_warn'. And
>>      the actual real hardware with real drivers, running real loads by
>>      users. [4]
>>
>> The basic idea is that warnings will similarly get reported by users
>> and be found during testing. However, in contrast to a BUG(), there is a
>> way to actually influence the expected behavior (e.g., panic_on_warn)
>> and to eventually keep the machine alive to extract some debug info.
>>
>> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
>> expect this code to trigger in any case, recovery code is not really
>> helpful.
>>
>>      I'd prefer to keep all these warnings 'simple' - i.e. no attempted
>>      recovery & control flow, unless we ever expect these to trigger.
>>      [5]
>>
>> There have been different rules floating around that were never properly
>> documented. Let's try to clarify.
>>
>> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
>> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
>> [3] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
>> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
>> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> [...]
> 
>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>> +**************************************************
>> +
>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
>> +is common for a given warning condition, if it occurs at all, to occur
>> +multiple times. This can fill up and wrap the kernel log, and can even slow
>> +the system enough that the excessive logging turns into its own, additional
>> +problem.
> 
> FWIW I have had cases where WARN() messages caused a reboot, maybe
> mention that here? In my case the logging was so excessive that the
> watchdog wasn't updated and in the end the device was forcefully
> rebooted.
> 

That should be covered by the last part, no? What would be your suggestion?

-- 
Thanks,

David / dhildenb



* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-22 13:43   ` Akira Yokosawa
@ 2022-09-22 14:41     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-22 14:41 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: linuxppc-dev, bhe, corbet, dwaipayanray1, linux-doc, npiggin,
	linux-kernel, linux-mm, joe, torvalds, David.Laight, jani.nikula,
	apw, lukas.bulwahn, akpm, dyoung, mingo, vgoyal

>>   
>> +22) Do not crash the kernel
>> +---------------------------
>> +
>> +In general, it is not the kernel developer's decision to crash the kernel.
>> +
>> +Avoid panic()
>> +=============
> This looks to me like a subsection-level title.  The adornment symbol
> needs to be:
> 
>     *************
> 
>> +
>> +panic() should be used with care and primarily only during system boot.
>> +panic() is, for example, acceptable when running out of memory during boot and
>> +not being able to continue.
>> +
>> +Use WARN() rather than BUG()
>> +============================
> Ditto.
> 
>> +
>> +Do not add new code that uses any of the BUG() variants, such as BUG(),
>> +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
>> +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
>> +required if there is no reasonable way to at least partially recover.
>> +
>> +"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
>> +internal corruptions with no way of continuing may still use BUG(), but need
>> +good justification.
>> +
>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>> +**************************************************
> These wrong adornment symbols confuse the ReST parser of Sphinx and result in
> a build error from "make htmldocs" at this title (long message folded):


Thanks,

the following on top should do the trick:


diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index e05899cbfd49..9efde65ac2f3 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1192,14 +1192,14 @@ expression used.  For instance:
  In general, it is not the kernel developer's decision to crash the kernel.
  
  Avoid panic()
-=============
+*************
  
  panic() should be used with care and primarily only during system boot.
  panic() is, for example, acceptable when running out of memory during boot and
  not being able to continue.
  
  Use WARN() rather than BUG()
-============================
+****************************
  
  Do not add new code that uses any of the BUG() variants, such as BUG(),
  BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably


-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
@ 2022-09-23  2:05   ` John Hubbard
  2022-09-23  2:11     ` Joe Perches
  0 siblings, 1 reply; 20+ messages in thread
From: John Hubbard @ 2022-09-23  2:05 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, Jani Nikula, linux-mm,
	David Laight, Dwaipayan Ray, Andy Whitcroft, Joe Perches,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Vivek Goyal

On 9/20/22 05:23, David Hildenbrand wrote:
> checkpatch does not point out that VM_BUG_ON() and friends should be
> avoided, however, Linus notes:
> 
>     VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
>     no different, the only difference is "we can make the code smaller
>     because these are less important". [1]
> 
> So let's warn on VM_BUG_ON() and other BUG variants as well. While at it,
> make it clearer that the kernel really shouldn't be crashed.
> 
> As there are some subsystem BUG macros that actually don't end up crashing
> the kernel -- for example, KVM_BUG_ON() -- exclude these manually.
> 
> [1] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  scripts/checkpatch.pl | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 79e759aac543..21f3a79aa46f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -4695,12 +4695,12 @@ sub process {
>  			}
>  		}
>  
> -# avoid BUG() or BUG_ON()
> -		if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
> +# do not use BUG() or variants
> +		if ($line =~ /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/) {

Should this be a separate patch? Adding a bunch of exceptions to the BUG() rules is 
a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() variants to
the mix.
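
As background on why BUILD_ sits on the exception list: the documentation
patch in this series describes BUILD_BUG_ON() as a compile-time assertion
with no effect at runtime. Below is a minimal user-space sketch of that
idea, not kernel code. DEMO_BUILD_BUG_ON() is a stand-in built on C11
_Static_assert (an assumption made for this example; the kernel's own
definition is not shown here), and struct demo_hdr and demo_hdr_size() are
made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for BUILD_BUG_ON(): compilation fails if cond is true.
 * Built on C11 _Static_assert purely for illustration.
 */
#define DEMO_BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

struct demo_hdr {		/* hypothetical on-wire header */
	uint32_t magic;
	uint16_t version;
	uint16_t flags;
};

size_t demo_hdr_size(void)
{
	/* Checked by the compiler; nothing here can fire at runtime. */
	DEMO_BUILD_BUG_ON(sizeof(struct demo_hdr) != 8);
	DEMO_BUILD_BUG_ON(offsetof(struct demo_hdr, flags) != 6);

	return sizeof(struct demo_hdr);
}
```

If someone later adds a field to struct demo_hdr, the build breaks at the
assertion instead of a machine crashing in the field, which is why such
macros do not belong in the same bucket as BUG_ON().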

>  			my $msg_level = \&WARN;
>  			$msg_level = \&CHK if ($file);
>  			&{$msg_level}("AVOID_BUG",
> -				      "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
> +				      "Do not crash the kernel unless it is unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than BUG() or variants.\n" . $herecurr);

Here's a requested tweak, to clean up the output and fix punctuation:

"Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants.\n" . $herecurr);


thanks,

-- 
John Hubbard
NVIDIA


* Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  2022-09-23  2:05   ` John Hubbard
@ 2022-09-23  2:11     ` Joe Perches
  2022-09-23  2:20       ` John Hubbard
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Perches @ 2022-09-23  2:11 UTC (permalink / raw)
  To: John Hubbard, David Hildenbrand, linux-kernel
  Cc: Baoquan He, linux-doc, linuxppc-dev, Dave Young, Jonathan Corbet,
	Nicholas Piggin, Jani Nikula, linux-mm, David Laight,
	Dwaipayan Ray, Andy Whitcroft, Lukas Bulwahn, Andrew Morton,
	Linus Torvalds, Ingo Molnar, Vivek Goyal

On Thu, 2022-09-22 at 19:05 -0700, John Hubbard wrote:
> On 9/20/22 05:23, David Hildenbrand wrote:
> > checkpatch does not point out that VM_BUG_ON() and friends should be
> > avoided, however, Linus notes:
> > 
> >     VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> >     no different, the only difference is "we can make the code smaller
> >     because these are less important". [1]
> > 
> > So let's warn on VM_BUG_ON() and other BUG variants as well. While at it,
> > make it clearer that the kernel really shouldn't be crashed.
> > 
> > As there are some subsystem BUG macros that actually don't end up crashing
> > the kernel -- for example, KVM_BUG_ON() -- exclude these manually.
> > 
> > [1] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
[]
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> > @@ -4695,12 +4695,12 @@ sub process {
> >  			}
> >  		}
> >  
> > -# avoid BUG() or BUG_ON()
> > -		if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
> > +# do not use BUG() or variants
> > +		if ($line =~ /\b(?!AA_|BUILD_|DCCP_|IDA_|KVM_|RWLOCK_|snd_|SPIN_)(?:[a-zA-Z_]*_)?BUG(?:_ON)?(?:_[A-Z_]+)?\s*\(/) {
> 
> Should this be a separate patch? Adding a bunch of exceptions to the BUG() rules is 
> a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() variants to
> the mix.

Not in my opinion.

> >  			my $msg_level = \&WARN;
> >  			$msg_level = \&CHK if ($file);
> >  			&{$msg_level}("AVOID_BUG",
> > -				      "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
> > +				      "Do not crash the kernel unless it is unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than BUG() or variants.\n" . $herecurr);
> 
> Here's a requested tweak, to clean up the output and fix punctuation:
> 
> "Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants.\n" . $herecurr);

Fixing punctuation here would be removing the trailing period as checkpatch
only has periods for multi-sentence output messages.

And I think that "Do not crash" is a stronger statement than "Avoid crashing"
so I prefer the original suggestion but it's not a big deal either way.

* Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  2022-09-23  2:11     ` Joe Perches
@ 2022-09-23  2:20       ` John Hubbard
  2022-09-23 10:58         ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: John Hubbard @ 2022-09-23  2:20 UTC (permalink / raw)
  To: Joe Perches, David Hildenbrand, linux-kernel
  Cc: Baoquan He, linux-doc, linuxppc-dev, Dave Young, Jonathan Corbet,
	Nicholas Piggin, Jani Nikula, linux-mm, David Laight,
	Dwaipayan Ray, Andy Whitcroft, Lukas Bulwahn, Andrew Morton,
	Linus Torvalds, Ingo Molnar, Vivek Goyal

On 9/22/22 19:11, Joe Perches wrote:
>> Should this be a separate patch? Adding a bunch of exceptions to the BUG() rules is 
>> a separate and distinct thing from adding VM_BUG_ON() and other *BUG*() variants to
>> the mix.
> 
> Not in my opinion.

OK, then. :)

> 
>>>  			my $msg_level = \&WARN;
>>>  			$msg_level = \&CHK if ($file);
>>>  			&{$msg_level}("AVOID_BUG",
>>> -				      "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr);
>>> +				      "Do not crash the kernel unless it is unavoidable - use WARN_ON_ONCE & recovery code (if reasonable) rather than BUG() or variants.\n" . $herecurr);
>>
>> Here's a requested tweak, to clean up the output and fix punctuation:
>>
>> "Avoid crashing the kernel--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants.\n" . $herecurr);
> 
> Fixing punctuation here would be removing the trailing period as checkpatch
> only has periods for multi-sentence output messages.

OK, let's do that too. 

> 
> And I think that "Do not crash" is a stronger statement than "Avoid crashing"
> so I prefer the original suggestion but it's not a big deal either way.

Yes, stronger wording is better. So how about this:

"Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants\n" . $herecurr);


thanks,

-- 
John Hubbard
NVIDIA


* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
  2022-09-21  4:40   ` Kalle Valo
  2022-09-22 13:43   ` Akira Yokosawa
@ 2022-09-23  2:26   ` John Hubbard
  2022-09-23  2:37     ` John Hubbard
  2022-09-23 10:55     ` David Hildenbrand
  2 siblings, 2 replies; 20+ messages in thread
From: John Hubbard @ 2022-09-23  2:26 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, Jani Nikula, linux-mm,
	David Laight, Dwaipayan Ray, Andy Whitcroft, Joe Perches,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Vivek Goyal

On 9/20/22 05:23, David Hildenbrand wrote:
> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com

s/2/3/

...
> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> index 03eb53fd029a..e05899cbfd49 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -1186,6 +1186,67 @@ expression used.  For instance:
>  	#endif /* CONFIG_SOMETHING */
>  
>  
> +22) Do not crash the kernel
> +---------------------------
> +
> +In general, it is not the kernel developer's decision to crash the kernel.

What do you think of this alternate wording:

In general, the decision to crash the kernel belongs to the user, rather
than to the kernel developer.


> +
> +Avoid panic()
> +=============
> +
> +panic() should be used with care and primarily only during system boot.
> +panic() is, for example, acceptable when running out of memory during boot and
> +not being able to continue.
> +
> +Use WARN() rather than BUG()
> +============================
> +
> +Do not add new code that uses any of the BUG() variants, such as BUG(),
> +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not
> +required if there is no reasonable way to at least partially recover.
> +
> +"I'm too lazy to do error handling" is not an excuse for using BUG(). Major
> +internal corruptions with no way of continuing may still use BUG(), but need
> +good justification.
> +
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**************************************************
> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.
> +
> +Do not WARN lightly
> +*******************
> +
> +WARN*() is intended for unexpected, this-should-never-happen situations.
> +WARN*() macros are not to be used for anything that is expected to happen
> +during normal operation. These are not pre- or post-condition asserts, for
> +example. Again: WARN*() must not be used for a condition that is expected
> +to trigger easily, for example, by user space actions. pr_warn_once() is a
> +possible alternative, if you need to notify the user of a problem.
> +
> +Do not worry about panic_on_warn users
> +**************************************
> +
> +A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
> +available kernel option, and that many users set this option. This is why
> +there is a "Do not WARN lightly" writeup, above. However, the existence of
> +panic_on_warn users is not a valid reason to avoid the judicious use of
> +WARN*(). That is because, whoever enables panic_on_warn has explicitly
> +asked the kernel to crash if a WARN*() fires, and such users must be
> +prepared to deal with the consequences of a system that is somewhat more
> +likely to crash.
> +
> +Use BUILD_BUG_ON() for compile-time assertions
> +**********************************************
> +
> +The use of BUILD_BUG_ON() is acceptable and encouraged, because it is a
> +compile-time assertion that has no effect at runtime.
> +
>  Appendix I) References
>  ----------------------
>  

I like the wording, it feels familiar somehow! :)
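
For readers skimming the thread, here is a rough illustration of the
"Use WARN() rather than BUG()" rule quoted above. It is a user-space
sketch, not kernel code: WARN_ON_ONCE() below is a stand-in written with a
GNU C statement expression (so it needs gcc or clang), and struct
demo_queue and demo_queue_push() are made-up names.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * User-space stand-in for WARN_ON_ONCE(): reports the first time the
 * condition is true at this call site and always evaluates to the
 * condition, so it can sit inside an if (). Not the kernel implementation.
 */
#define WARN_ON_ONCE(cond) ({					\
	static bool warned_;					\
	bool ret_ = !!(cond);					\
	if (ret_ && !warned_) {					\
		warned_ = true;					\
		fprintf(stderr, "WARNING: %s:%d: %s\n",		\
			__FILE__, __LINE__, #cond);		\
	}							\
	ret_;							\
})

struct demo_queue {		/* hypothetical structure */
	int len;
	int cap;
};

/* Returns 0 on success, -1 if the push could not be done. */
int demo_queue_push(struct demo_queue *q)
{
	/* Instead of BUG_ON(q->len > q->cap): warn once and back out. */
	if (WARN_ON_ONCE(q->len > q->cap))
		return -1;

	if (q->len == q->cap)
		return -1;	/* ordinary "queue full" case, no warning */

	q->len++;
	return 0;
}
```

With a sane queue, demo_queue_push() returns 0. With len > cap it prints
one warning and returns -1, and later calls through the same call site
return -1 without logging again. The caller gets an error to handle
instead of a dead machine.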

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,

-- 
John Hubbard
NVIDIA


* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-23  2:26   ` John Hubbard
@ 2022-09-23  2:37     ` John Hubbard
  2022-09-23 10:55     ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: John Hubbard @ 2022-09-23  2:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, Jani Nikula, linux-mm,
	David Laight, Dwaipayan Ray, Andy Whitcroft, Joe Perches,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Vivek Goyal

On 9/22/22 19:26, John Hubbard wrote:
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> 

I forgot to mention that I had applied your fix to Akira's
issue, before reviewing. So that fix works and builds and
looks nice too.

thanks,

-- 
John Hubbard
NVIDIA


* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-23  2:26   ` John Hubbard
  2022-09-23  2:37     ` John Hubbard
@ 2022-09-23 10:55     ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-23 10:55 UTC (permalink / raw)
  To: John Hubbard, linux-kernel
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, Jani Nikula, linux-mm,
	David Laight, Dwaipayan Ray, Andy Whitcroft, Joe Perches,
	Andrew Morton, Linus Torvalds, Ingo Molnar, Vivek Goyal

On 23.09.22 04:26, John Hubbard wrote:
> On 9/20/22 05:23, David Hildenbrand wrote:
>> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
>> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
>> [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
> 
> s/2/3/

Thanks!

> 
> ...
>> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
>> index 03eb53fd029a..e05899cbfd49 100644
>> --- a/Documentation/process/coding-style.rst
>> +++ b/Documentation/process/coding-style.rst
>> @@ -1186,6 +1186,67 @@ expression used.  For instance:
>>   	#endif /* CONFIG_SOMETHING */
>>   
>>   
>> +22) Do not crash the kernel
>> +---------------------------
>> +
>> +In general, it is not the kernel developer's decision to crash the kernel.
> 
> What do you think of this alternate wording:
> 
> In general, the decision to crash the kernel belongs to the user, rather
> than to the kernel developer.

Ack

[...]

> I like the wording, it feels familiar somehow! :)

:)

> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thanks!

-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants
  2022-09-23  2:20       ` John Hubbard
@ 2022-09-23 10:58         ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-09-23 10:58 UTC (permalink / raw)
  To: John Hubbard, Joe Perches, linux-kernel
  Cc: Baoquan He, linux-doc, linuxppc-dev, Dave Young, Jonathan Corbet,
	Nicholas Piggin, Jani Nikula, linux-mm, David Laight,
	Dwaipayan Ray, Andy Whitcroft, Lukas Bulwahn, Andrew Morton,
	Linus Torvalds, Ingo Molnar, Vivek Goyal

>> And I think that "Do not crash" is a stronger statement than "Avoid crashing"
>> so I prefer the original suggestion but it's not a big deal either way.
> 
> Yes, stronger wording is better. So how about this:
> 
> "Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants\n" . $herecurr);

Okay, let's use that.

Thanks!

-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-22 14:12     ` David Hildenbrand
@ 2022-09-26  7:44       ` Kalle Valo
  2022-10-04 12:32         ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Kalle Valo @ 2022-09-26  7:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, linux-kernel, Jani Nikula,
	linux-mm, David Laight, Dwaipayan Ray, Andy Whitcroft,
	Joe Perches, Andrew Morton, Linus Torvalds, Ingo Molnar,
	Vivek Goyal

David Hildenbrand <david@redhat.com> writes:

>>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>>> +**************************************************
>>> +
>>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
>>> +is common for a given warning condition, if it occurs at all, to occur
>>> +multiple times. This can fill up and wrap the kernel log, and can even slow
>>> +the system enough that the excessive logging turns into its own, additional
>>> +problem.
>>
>> FWIW I have had cases where WARN() messages caused a reboot, maybe
>> mention that here? In my case the logging was so excessive that the
>> watchdog wasn't updated and in the end the device was forcefully
>> rebooted.
>>
>
> That should be covered by the last part, no? What would be your suggestion?

I was just thinking that maybe make it more obvious that even WARN_ON()
can crash the system, something along these lines:

"..., additional problem like stalling the system so much that it causes
a reboot."

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
  2022-09-26  7:44       ` Kalle Valo
@ 2022-10-04 12:32         ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-10-04 12:32 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Lukas Bulwahn, Baoquan He, linux-doc, linuxppc-dev, Dave Young,
	Jonathan Corbet, Nicholas Piggin, linux-kernel, Jani Nikula,
	linux-mm, David Laight, Dwaipayan Ray, Andy Whitcroft,
	Joe Perches, Andrew Morton, Linus Torvalds, Ingo Molnar,
	Vivek Goyal

On 26.09.22 09:44, Kalle Valo wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>>>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>>>> +**************************************************
>>>> +
>>>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
>>>> +is common for a given warning condition, if it occurs at all, to occur
>>>> +multiple times. This can fill up and wrap the kernel log, and can even slow
>>>> +the system enough that the excessive logging turns into its own, additional
>>>> +problem.
>>>
>>> FWIW I have had cases where WARN() messages caused a reboot, maybe
>>> mention that here? In my case the logging was so excessive that the
>>> watchdog wasn't updated and in the end the device was forcefully
>>> rebooted.
>>>
>>
>> That should be covered by the last part, no? What would be your suggestion?
> 
> I was just thinking that maybe make it more obvious that even WARN_ON()
> can crash the system, something along these lines:
> 
> "..., additional problem like stalling the system so much that it causes
> a reboot."

Hi Kalle,

sorry for the late reply. Jonathan already queued v2 and sent it upstream.

I think that it is already covered by the statement and that the 
additional example isn't required -- most of us learned the hard way 
that "excessive logging turns into its own problem" includes all weird 
kinds of kernel crashes. A panic/reboot due to a watchdog not being 
updated in time is one such possible outcome.
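
To put a number on "excessive logging", here is a small user-space sketch
(not kernel code) that counts how many reports each flavor would emit when
the same broken condition is hit in a loop. DEMO_WARN_ON(),
DEMO_WARN_ON_ONCE() and demo_count_reports() are invented for this
illustration, the macros rely on GNU C statement expressions, and a real
WARN splat is a full backtrace rather than one counter increment.

```c
#include <stdbool.h>

static unsigned long reports_emitted;	/* number of "splats" logged */

/* Stand-in for WARN_ON(): reports on every hit. */
#define DEMO_WARN_ON(cond) ({				\
	bool ret_ = !!(cond);				\
	if (ret_)					\
		reports_emitted++;			\
	ret_;						\
})

/* Stand-in for WARN_ON_ONCE(): at most one report per call site. */
#define DEMO_WARN_ON_ONCE(cond) ({			\
	static bool warned_;				\
	bool ret_ = !!(cond);				\
	if (ret_ && !warned_) {				\
		warned_ = true;				\
		reports_emitted++;			\
	}						\
	ret_;						\
})

/* Simulate a hot path hitting the same broken condition nr_hits times. */
unsigned long demo_count_reports(unsigned long nr_hits, bool once)
{
	unsigned long i;

	reports_emitted = 0;
	for (i = 0; i < nr_hits; i++) {
		if (once)
			(void)DEMO_WARN_ON_ONCE(nr_hits != 0);
		else
			(void)DEMO_WARN_ON(nr_hits != 0);
	}
	return reports_emitted;
}
```

demo_count_reports(100000, false) yields 100000 reports, while
demo_count_reports(100000, true) yields 1. The once-flag is per call site
and never reset, so a later run of the once variant reports nothing at
all. On a slow console, the first case is the kind of flood that can
starve whatever is supposed to update the watchdog.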

Thanks!

-- 
Thanks,

David / dhildenb


* Re: (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules
  2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
                   ` (2 preceding siblings ...)
  2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
@ 2022-10-04 13:24 ` Michael Ellerman
  3 siblings, 0 replies; 20+ messages in thread
From: Michael Ellerman @ 2022-10-04 13:24 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Joe Perches, Baoquan He, Jonathan Corbet, Dwaipayan Ray,
	Dave Young, linux-doc, Jani Nikula, linux-mm, Linus Torvalds,
	David Laight, Nicholas Piggin, Andy Whitcroft, Lukas Bulwahn,
	Andrew Morton, linuxppc-dev, Ingo Molnar, Vivek Goyal

On Tue, 20 Sep 2022 14:22:59 +0200, David Hildenbrand wrote:
> As it seems to be rather unclear if/when to use BUG(), BUG_ON(),
> VM_BUG_ON(), WARN_ON_ONCE(), ... let's try to document the result of a
> recent discussion.
> 
> Details can be found in patch #1.
> 
> RFC -> v1:
> * "coding-style.rst: document BUG() and WARN() rules ("do not crash the
>    kernel")"
>  -> Rephrase/extend according to John
>  -> Add some details regarding the use of panic()
> * powerpc/prom_init: drop PROM_BUG()
>  -> Added
> * "checkpatch: warn on usage of VM_BUG_ON() and other BUG variants"
>  -> Warn on more variants
> 
> [...]

Patch 2 applied to powerpc/next.

[2/3] powerpc/prom_init: drop PROM_BUG()
      https://git.kernel.org/powerpc/c/c4167aec98524fa4511b3222303a758b532b6009

cheers

Thread overview: 20+ messages
2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
2022-09-21  4:40   ` Kalle Valo
2022-09-22 14:12     ` David Hildenbrand
2022-09-26  7:44       ` Kalle Valo
2022-10-04 12:32         ` David Hildenbrand
2022-09-22 13:43   ` Akira Yokosawa
2022-09-22 14:41     ` David Hildenbrand
2022-09-23  2:26   ` John Hubbard
2022-09-23  2:37     ` John Hubbard
2022-09-23 10:55     ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
2022-09-21 13:02   ` Michael Ellerman
2022-09-21 13:03     ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
2022-09-23  2:05   ` John Hubbard
2022-09-23  2:11     ` Joe Perches
2022-09-23  2:20       ` John Hubbard
2022-09-23 10:58         ` David Hildenbrand
2022-10-04 13:24 ` (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules Michael Ellerman
