From: David Hildenbrand <david@redhat.com>
To: Kalle Valo <kvalo@kernel.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>,
Baoquan He <bhe@redhat.com>,
linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Dave Young <dyoung@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
Nicholas Piggin <npiggin@gmail.com>,
linux-kernel@vger.kernel.org,
Jani Nikula <jani.nikula@linux.intel.com>,
linux-mm@kvack.org, David Laight <David.Laight@ACULAB.COM>,
Dwaipayan Ray <dwaipayanray1@gmail.com>,
Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Date: Thu, 22 Sep 2022 16:12:05 +0200 [thread overview]
Message-ID: <d1ceedca-b28e-c47e-aa0d-aa1cb36d12b9@redhat.com> (raw)
In-Reply-To: <87pmfp8hnj.fsf@kernel.org>
On 21.09.22 06:40, Kalle Valo wrote:
> David Hildenbrand <david@redhat.com> writes:
>
>> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
>> is just as bad as BUG_ON(), because it will crash the kernel on
>> distributions that enable CONFIG_DEBUG_VM (like Fedora):
>>
>> VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
>> no different, the only difference is "we can make the code smaller
>> because these are less important". [2]
>>
>> This resulted in a more generic discussion about usage of BUG() and
>> friends. While there might be corner cases that still deserve a BUG_ON(),
>> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
>> recovery path if reasonable:
>>
>> The only possible case where BUG_ON can validly be used is "I have
>> some fundamental data corruption and cannot possibly return an
>> error". [2]
>>
>> As a very good approximation is the general rule:
>>
>> "absolutely no new BUG_ON() calls _ever_" [2]
>>
>> ... not even if something really shouldn't ever happen and is merely for
>> documenting that an invariant always has to hold. However, there are sill
>> exceptions where BUG_ON() may be used:
>>
>> If you have a "this is major internal corruption, there's no way we can
>> continue", then BUG_ON() is appropriate. [3]
>>
>> There is only one good BUG_ON():
>>
>> Now, that said, there is one very valid sub-form of BUG_ON():
>> BUILD_BUG_ON() is absolutely 100% fine. [2]
>>
>> While WARN will also crash the machine with panic_on_warn set, that's
>> exactly to be expected:
>>
>> So we have two very different cases: the "virtual machine with good
>> logging where a dead machine is fine" - use 'panic_on_warn'. And
>> the actual real hardware with real drivers, running real loads by
>> users. [4]
>>
>> The basic idea is that warnings will similarly get reported by users
>> and be found during testing. However, in contrast to a BUG(), there is a
>> way to actually influence the expected behavior (e.g., panic_on_warn)
>> and to eventually keep the machine alive to extract some debug info.
>>
>> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
>> expect this code to trigger in any case, recovery code is not really
>> helpful.
>>
>> I'd prefer to keep all these warnings 'simple' - i.e. no attempted
>> recovery & control flow, unless we ever expect these to trigger.
>> [5]
>>
>> There have been different rules floating around that were never properly
>> documented. Let's try to clarify.
>>
>> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
>> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
>> [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
>> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
>> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> [...]
>
>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>> +**************************************************
>> +
>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
>> +is common for a given warning condition, if it occurs at all, to occur
>> +multiple times. This can fill up and wrap the kernel log, and can even slow
>> +the system enough that the excessive logging turns into its own, additional
>> +problem.
>
> FWIW I have had cases where WARN() messages caused a reboot, maybe
> mention that here? In my case the logging was so excessive that the
> watchdog wasn't updated and in the end the device was forcefully
> rebooted.
>
That should be covered by the last part, no? What would be your suggestion?
--
Thanks,
David / dhildenb
next prev parent reply other threads:[~2022-09-22 14:13 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
2022-09-21 4:40 ` Kalle Valo
2022-09-22 14:12 ` David Hildenbrand [this message]
2022-09-26 7:44 ` Kalle Valo
2022-10-04 12:32 ` David Hildenbrand
2022-09-22 13:43 ` Akira Yokosawa
2022-09-22 14:41 ` David Hildenbrand
2022-09-23 2:26 ` John Hubbard
2022-09-23 2:37 ` John Hubbard
2022-09-23 10:55 ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
2022-09-21 13:02 ` Michael Ellerman
2022-09-21 13:03 ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
2022-09-23 2:05 ` John Hubbard
2022-09-23 2:11 ` Joe Perches
2022-09-23 2:20 ` John Hubbard
2022-09-23 10:58 ` David Hildenbrand
2022-10-04 13:24 ` (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d1ceedca-b28e-c47e-aa0d-aa1cb36d12b9@redhat.com \
--to=david@redhat.com \
--cc=David.Laight@ACULAB.COM \
--cc=akpm@linux-foundation.org \
--cc=apw@canonical.com \
--cc=bhe@redhat.com \
--cc=corbet@lwn.net \
--cc=dwaipayanray1@gmail.com \
--cc=dyoung@redhat.com \
--cc=jani.nikula@linux.intel.com \
--cc=joe@perches.com \
--cc=kvalo@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lukas.bulwahn@gmail.com \
--cc=mingo@kernel.org \
--cc=npiggin@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).