Documentation/submitting-patches: Add blurb about backtraces in commit messages
diff mbox series

Message ID 20201222130555.GA13463@zn.tnic
State New, archived
Headers show
Series
  • Documentation/submitting-patches: Add blurb about backtraces in commit messages
Related show

Commit Message

Borislav Petkov Dec. 22, 2020, 1:05 p.m. UTC
On Mon, Dec 21, 2020 at 09:54:25AM -0700, Jonathan Corbet wrote:
> Given that there was indeed a lack of bikeshedding, I just went ahead
> and applied this to docs-next; will ship it Linusward in the near
> future.

Cool.

Although I betcha that is because of the impending holiday season. I
think I should just use that moment to sneak in more stuff stealthily
and when everyone awakes from their drunken and food stupor, it'll all
be there! Tadaa!

:-)))

Ok, here's the next one which I think, is also, not really controversial.

Thx.

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 22 Dec 2020 13:58:22 +0100

Document that backtraces in commit messages should be trimmed down to
the useful information only.

This has been carved out from a tip subsystem handbook patchset by
Thomas Gleixner:

  https://lkml.kernel.org/r/20181107171010.421878737@linutronix.de

and incorporates follow-on comments.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 Documentation/process/submitting-patches.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Sean Christopherson Dec. 22, 2020, 6:59 p.m. UTC | #1
On Tue, Dec 22, 2020, Borislav Petkov wrote:
> Ok, here's the next one which I think, is also, not really controversial.

Heh, are you trying to jinx yourself?

> diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
> index 5ba54120bef7..0ffb21366381 100644
> --- a/Documentation/process/submitting-patches.rst
> +++ b/Documentation/process/submitting-patches.rst
> @@ -679,6 +679,26 @@ generates appropriate diffstats by default.)
>  See more details on the proper patch format in the following
>  references.
>  
> +Backtraces in commit mesages
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Backtraces help document the call chain leading to a problem. However,
> +not all backtraces are helpful. For example, early boot call chains are
> +unique and obvious.

I'd argue that there is still value in the backtrace though, e.g. I find them
very helpful when doing git archaeology.  A backtrace is an easily recognizable
signature (don't have to read a bunch of text to understand there was a splat of
some kind), and the call stack is often helpful even if it is unique, e.g. for
unfamiliar code (including early boot chains) and/or code that is substantially
different from the current upstream.

> Copying the full dmesg output verbatim, however,
> +adds distracting information like timestamps, module lists, register and
> +stack dumps.
> +
> +Therefore, the most useful backtraces should distill the relevant
> +information from the dump, which makes it easier to focus on the real
> +issue. Here is an example of a well-trimmed backtrace::
> +
> +  unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000000000000064)
> +  at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
> +  Call Trace:
> +  mba_wrmsr
> +  update_domains
> +  rdtgroup_mkdir
> +
>  .. _explicit_in_reply_to:

I'd prefer not to encourage people to strip the info after the function name,
though I do agree it's somewhat distracting (especially the offset/size).  The
module, call site in the function, exact file/line if available, etc... provides
context that I find helpful for building a mental model of what went wrong.
E.g. which modules are in play, which short wrapper functions can likely be
glossed over, etc...
Borislav Petkov Dec. 22, 2020, 7:25 p.m. UTC | #2
On Tue, Dec 22, 2020 at 10:59:22AM -0800, Sean Christopherson wrote:
> On Tue, Dec 22, 2020, Borislav Petkov wrote:
> > Ok, here's the next one which I think, is also, not really controversial.
> 
> Heh, are you trying to jinx yourself?

I was trying to conjure up some bikeshedding... and there it is! :-)

> > +Backtraces help document the call chain leading to a problem. However,
> > +not all backtraces are helpful. For example, early boot call chains are
> > +unique and obvious.
> 
> I'd argue that there is still value in the backtrace though, e.g. I find them
> very helpful when doing git archaeology.  A backtrace is an easily recognizable
> signature (don't have to read a bunch of text to understand there was a splat of
> some kind), and the call stack is often helpful even if it is unique, e.g. for
> unfamiliar code (including early boot chains) and/or code that is substantially
> different from the current upstream.

I think the intent of the text is to say not to include callchains which
are *really* obvious. As in, there's no ambiguity as to how one has
landed here.

Also, sometimes people paste backtraces from a WARN* which are almost
always superfluous - only the warn's address is important. This is at
least how I go about debugging those.

Maybe the text should be made more precise.

> I'd prefer not to encourage people to strip the info after the function name,
> though I do agree it's somewhat distracting (especially the offset/size).

Yes. Especially since they don't make any sense on another system or
even on the same system but with a different .config.

> The module, call site in the function, exact file/line if available,
> etc... provides context that I find helpful for building a mental
> model of what went wrong.

File/line is more useful, yes, but only for the current code snapshot.
When time passes and stuff gets changed, those file/line things are not
correct anymore so one would have to checkout the tree on which the
splat happened.

I guess I need to make that aspect more precise too.

> E.g. which modules are in play, which short wrapper functions can
> likely be glossed over, etc...

That example doesn't have modules. I guess I'll generate a new one.

Thx.
Sean Christopherson Dec. 28, 2020, 5:59 p.m. UTC | #3
On Tue, Dec 22, 2020, Borislav Petkov wrote:
> On Tue, Dec 22, 2020 at 10:59:22AM -0800, Sean Christopherson wrote:
> > On Tue, Dec 22, 2020, Borislav Petkov wrote:
> > > +Backtraces help document the call chain leading to a problem. However,
> > > +not all backtraces are helpful. For example, early boot call chains are
> > > +unique and obvious.
> > 
> > I'd argue that there is still value in the backtrace though, e.g. I find them
> > very helpful when doing git archaeology.  A backtrace is an easily recognizable
> > signature (don't have to read a bunch of text to understand there was a splat of
> > some kind), and the call stack is often helpful even if it is unique, e.g. for
> > unfamiliar code (including early boot chains) and/or code that is substantially
> > different from the current upstream.
> 
> I think the intent of the text is to say not to include callchains which
> are *really* obvious. As in, there's no ambiguity as to how one has
> landed here.
> 
> Also, sometimes people paste backtraces from a WARN* which are almost
> always superfluous - only the warn's address is important. This is at
> least how I go about debugging those.

Obvious and superfluous for people that are intimately familiar with the code,
but explicit call stacks are extremely helpful when (re)learning code.  Using
the boot code as an example, until one fully understands all the function
pointer shenanigans for the real mode trampoline, initial_code, CPU HP, etc...,
even 100% unambiguous call chains may not be immediately obvious.  IMO, a few
"unnecessary" lines in the changelog is a worthwhile tradeoff if it's helpful to
even one person, now or in the future.

> Maybe the text should be made more precise.
> 
> > I'd prefer not to encourage people to strip the info after the function name,
> > though I do agree it's somewhat distracting (especially the offset/size).
> 
> Yes. Especially since they don't make any sense on another system or
> even on the same system but with a different .config.
>
> > The module, call site in the function, exact file/line if available,
> > etc... provides context that I find helpful for building a mental
> > model of what went wrong.
> 
> File/line is more useful, yes, but only for the current code snapshot.
> When time passes and stuff gets changed, those file/line things are not
> correct anymore so one would have to checkout the tree on which the
> splat happened.

I don't think that's a reason to encourage stripping that info though, e.g. I'll
often checkout the relevant tree when I'm deep down the blame rabbithole.
Similar to my above argument regarding call stacks, the cost of including the
may-be-stale file/line info is outweighed by the potential benefits for readers
and/or future archaeologists.

> I guess I need to make that aspect more precise too.
Borislav Petkov Dec. 28, 2020, 6:15 p.m. UTC | #4
On Mon, Dec 28, 2020 at 09:59:48AM -0800, Sean Christopherson wrote:
> Obvious and superfluous for people that are intimately familiar with the code,
> but explicit call stacks are extremely helpful when (re)learning code.

Here's an example:

[    2.649874] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    2.651774] ------------[ cut here ]------------
[    2.652726] WARNING: CPU: 2 PID: 1 at init/main.c:1436 kernel_init+0x58/0x107
[    2.653643] Modules linked in:
[    2.654260] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1+ #2
[    2.654396] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[    2.655125] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[    2.657687] RIP: 0010:kernel_init+0x58/0x107
[    2.658391] Code: 5e fc ff eb 0c 48 c7 c7 18 bc 09 82 e8 8a a4 fc ff c7 05 a2 5f b0 07 02 00 00 00 e8 a9 1c 9b ff e8 b4 dc 89 ff e8 2f d3 a6 ff <0f> 0b 48 8b 3d 76 87 9d 00 48 85 ff 74 22 e8 ef 19 fc ff 85 c0 89
[    2.660707] RSP: 0018:ffffc90000013f50 EFLAGS: 00010292
[    2.661499] RAX: ffff88800ebcdc01 RBX: ffffffff81837b59 RCX: 0000000000000802
[    2.662469] RDX: 00000000000007f2 RSI: 05b5d427c1a6d18d RDI: 000000000002d100
[    2.663411] RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000c
[    2.664318] R10: 000000000000000d R11: 0000000000000020 R12: 0000000000000000
[    2.666458] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    2.667462] FS:  0000000000000000(0000) GS:ffff88807da80000(0000) knlGS:0000000000000000
[    2.668472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.669257] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000003506e0
[    2.670183] Call Trace:
[    2.670723]  ret_from_fork+0x22/0x30
[    2.671340] ---[ end trace 385e10f347736cf9 ]---

Yes, useless. The callstack is not even there.

This is the only line that matters:

[    2.652726] WARNING: CPU: 2 PID: 1 at init/main.c:1436 kernel_init+0x58/0x107

because what caused it is:

diff --git a/init/main.c b/init/main.c
index 6feee7f11eaf..aa52355d98ae 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1433,6 +1433,8 @@ static int __ref kernel_init(void *unused)
 
        do_sysctl_args();
 
+       WARN_ON_ONCE(1);
+
        if (ramdisk_execute_command) {
                ret = run_init_process(ramdisk_execute_command);
                if (!ret)


even if you had a stack trace here, it would be totally useless because
you *know* where it came from.

But ok, I can change the formulation that it is up to the committer's
judgement to decide whether to remove that blurb. Because I definitely
will be removing a useless noise like that before committing.

Thx.
Sean Christopherson Dec. 28, 2020, 7:02 p.m. UTC | #5
On Mon, Dec 28, 2020, Borislav Petkov wrote:
> On Mon, Dec 28, 2020 at 09:59:48AM -0800, Sean Christopherson wrote:
> > Obvious and superfluous for people that are intimately familiar with the code,
> > but explicit call stacks are extremely helpful when (re)learning code.
> 
> Here's an example:
> 
> [    2.649874] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> [    2.651774] ------------[ cut here ]------------
> [    2.652726] WARNING: CPU: 2 PID: 1 at init/main.c:1436 kernel_init+0x58/0x107
> [    2.653643] Modules linked in:
> [    2.654260] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1+ #2
> [    2.654396] usb 1-1: new full-speed USB device number 2 using uhci_hcd
> [    2.655125] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> [    2.657687] RIP: 0010:kernel_init+0x58/0x107
> [    2.658391] Code: 5e fc ff eb 0c 48 c7 c7 18 bc 09 82 e8 8a a4 fc ff c7 05 a2 5f b0 07 02 00 00 00 e8 a9 1c 9b ff e8 b4 dc 89 ff e8 2f d3 a6 ff <0f> 0b 48 8b 3d 76 87 9d 00 48 85 ff 74 22 e8 ef 19 fc ff 85 c0 89
> [    2.660707] RSP: 0018:ffffc90000013f50 EFLAGS: 00010292
> [    2.661499] RAX: ffff88800ebcdc01 RBX: ffffffff81837b59 RCX: 0000000000000802
> [    2.662469] RDX: 00000000000007f2 RSI: 05b5d427c1a6d18d RDI: 000000000002d100
> [    2.663411] RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000c
> [    2.664318] R10: 000000000000000d R11: 0000000000000020 R12: 0000000000000000
> [    2.666458] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    2.667462] FS:  0000000000000000(0000) GS:ffff88807da80000(0000) knlGS:0000000000000000
> [    2.668472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.669257] CR2: 0000000000000000 CR3: 000000000220a000 CR4: 00000000003506e0
> [    2.670183] Call Trace:
> [    2.670723]  ret_from_fork+0x22/0x30
> [    2.671340] ---[ end trace 385e10f347736cf9 ]---
> 
> Yes, useless. The callstack is not even there.
> 
> This is the only line that matters:
> 
> [    2.652726] WARNING: CPU: 2 PID: 1 at init/main.c:1436 kernel_init+0x58/0x107
> 
> because what caused it is:
> 
> diff --git a/init/main.c b/init/main.c
> index 6feee7f11eaf..aa52355d98ae 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1433,6 +1433,8 @@ static int __ref kernel_init(void *unused)
>  
>         do_sysctl_args();
>  
> +       WARN_ON_ONCE(1);
> +
>         if (ramdisk_execute_command) {
>                 ret = run_init_process(ramdisk_execute_command);
>                 if (!ret)
> 
> 
> even if you had a stack trace here, it would be totally useless because
> you *know* where it came from.
> 
> But ok, I can change the formulation that it is up to the committer's
> judgement to decide whether to remove that blurb. Because I definitely
> will be removing a useless noise like that before committing.

I 100% agree in this specific case.  What I'm arguing is that cases like this
where the call stack is completely uninteresting are few and far between.  The
documentation should focus on the common case where, IMO, the call stack should
be included, even if it might be obvious for some/many people.  E.g. similar to
how many kconfig options have something like "If unsure, say Y", I think the
blurb should encourage including the call stack unless the author is 100% sure
that it's useless.
Jonathan Corbet Jan. 4, 2021, 11:19 p.m. UTC | #6
[Digging out from under the pile of mail...]

> From: Borislav Petkov <bp@suse.de>
> Date: Tue, 22 Dec 2020 13:58:22 +0100
> 
> Document that backtraces in commit messages should be trimmed down to
> the useful information only.
> 
> This has been carved out from a tip subsystem handbook patchset by
> Thomas Gleixner:
> 
>   https://lkml.kernel.org/r/20181107171010.421878737@linutronix.de
> 
> and incorporates follow-on comments.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  Documentation/process/submitting-patches.rst | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
> index 5ba54120bef7..0ffb21366381 100644
> --- a/Documentation/process/submitting-patches.rst
> +++ b/Documentation/process/submitting-patches.rst
> @@ -679,6 +679,26 @@ generates appropriate diffstats by default.)
>  See more details on the proper patch format in the following
>  references.
>  
> +Backtraces in commit mesages
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Backtraces help document the call chain leading to a problem. However,
> +not all backtraces are helpful. For example, early boot call chains are
> +unique and obvious. Copying the full dmesg output verbatim, however,
> +adds distracting information like timestamps, module lists, register and
> +stack dumps.
> +
> +Therefore, the most useful backtraces should distill the relevant
> +information from the dump, which makes it easier to focus on the real
> +issue. Here is an example of a well-trimmed backtrace::
> +
> +  unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000000000000064)
> +  at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
> +  Call Trace:
> +  mba_wrmsr
> +  update_domains
> +  rdtgroup_mkdir
> +

So I have some questions, I guess...  How often is a backtrace *in a commit
message* really helpful at all?  The value in problem reports is clear, but
I'm not sure how often having a backtrace in a commit message will really
help the reader understand why the patch was written.  But perhaps I'm
wrong?

If we do want this advice in our already-too-long submitting-patches
document, we should perhaps give some advice as to what is "relevant
information" and what is not?

Thanks,

jon
Borislav Petkov Jan. 5, 2021, 10:48 a.m. UTC | #7
On Mon, Jan 04, 2021 at 04:19:11PM -0700, Jonathan Corbet wrote:
> So I have some questions, I guess...  How often is a backtrace *in a commit
> message* really helpful at all?  The value in problem reports is clear, but
> I'm not sure how often having a backtrace in a commit message will really
> help the reader understand why the patch was written.  But perhaps I'm
> wrong?

Does the subthread here with Sean shed some light on the matter or... ?

> If we do want this advice in our already-too-long submitting-patches
> document,

Thought the same thing when looking at that doc - it is a *lot* and I
guess we should put only very globally relevant info in there...

> we should perhaps give some advice as to what is "relevant
> information" and what is not?

Right, in that subthread, the gist of what we wanna say is to almost
always put the splat in the commit message - except for the example I
gave there and other early boot cases - but leave it to the committer to
do the final decision whether to keep or ditch the splat.

Something like that. Yah, I know, it is fuzzy :-\
Borislav Petkov Feb. 2, 2021, 3:43 p.m. UTC | #8
On Tue, Jan 05, 2021 at 11:48:05AM +0100, Borislav Petkov wrote:
> On Mon, Jan 04, 2021 at 04:19:11PM -0700, Jonathan Corbet wrote:
> > So I have some questions, I guess...  How often is a backtrace *in a commit
> > message* really helpful at all?  The value in problem reports is clear, but
> > I'm not sure how often having a backtrace in a commit message will really
> > help the reader understand why the patch was written.  But perhaps I'm
> > wrong?
> 
> Does the subthread here with Sean shed some light on the matter or... ?
> 
> > If we do want this advice in our already-too-long submitting-patches
> > document,
> 
> Thought the same thing when looking at that doc - it is a *lot* and I
> guess we should put only very globally relevant info in there...
> 
> > we should perhaps give some advice as to what is "relevant
> > information" and what is not?
> 
> Right, in that subthread, the gist of what we wanna say is to almost
> always put the splat in the commit message - except for the example I
> gave there and other early boot cases - but leave it to the committer to
> do the final decision whether to keep or ditch the splat.
> 
> Something like that. Yah, I know, it is fuzzy :-\

Lemme ping here quick - my TODO list still has it. :-)
Jonathan Corbet Feb. 4, 2021, 9:20 p.m. UTC | #9
Borislav Petkov <bp@alien8.de> writes:

> On Tue, Jan 05, 2021 at 11:48:05AM +0100, Borislav Petkov wrote:
> Lemme ping here quick - my TODO list still has it. :-)

Yeah, it's been languishing on mine as well.  Nobody seems to have any
objections, so I applied it, sorry for sitting on it for so long.

Thanks,

jon

Patch
diff mbox series

diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 5ba54120bef7..0ffb21366381 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -679,6 +679,26 @@  generates appropriate diffstats by default.)
 See more details on the proper patch format in the following
 references.
 
+Backtraces in commit mesages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Backtraces help document the call chain leading to a problem. However,
+not all backtraces are helpful. For example, early boot call chains are
+unique and obvious. Copying the full dmesg output verbatim, however,
+adds distracting information like timestamps, module lists, register and
+stack dumps.
+
+Therefore, the most useful backtraces should distill the relevant
+information from the dump, which makes it easier to focus on the real
+issue. Here is an example of a well-trimmed backtrace::
+
+  unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000000000000064)
+  at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
+  Call Trace:
+  mba_wrmsr
+  update_domains
+  rdtgroup_mkdir
+
 .. _explicit_in_reply_to:
 
 Explicit In-Reply-To headers