linux-kernel.vger.kernel.org archive mirror
* [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
@ 2021-02-10  5:48 Thorsten Leemhuis
  2021-02-11 17:07 ` Randy Dunlap
  2021-02-14 16:00 ` Qais Yousef
  0 siblings, 2 replies; 9+ messages in thread
From: Thorsten Leemhuis @ 2021-02-10  5:48 UTC (permalink / raw)
  To: Jonathan Corbet, Randy Dunlap
  Cc: linux-doc, linux-kernel, Sasha Levin, Vlastimil Babka,
	Joerg Roedel, Qais Yousef, Damian Tometzki

Replace placeholder text about decoding stack traces with a section that
properly describes what a typical user should do these days. To make
it work for them, add a paragraph in an earlier section to ensure
people build their kernels with everything that's needed to decode stack
traces later.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
---
Reminder: This is not my area of expertise. Hopefully I didn't write anything
stupid or omit something people find important. If I did, please let me know,
ideally suggesting what to write; bonus points for people sending text I can
simply include in the next revision.

I CCed Sasha, because he wrote decode_stacktrace.sh; I also CCed a bunch of
people that showed interest in this topic when I asked for help on Twitter.

I'm still unsure if linking to admin-guide/bug-hunting.rst is a good idea, as it
seems quite outdated; reporting-bugs did that, but for now I settled on 'don't
do that'.

Ciao, Thorsten
---
 .../admin-guide/reporting-issues.rst          | 77 +++++++++++++------
 1 file changed, 55 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 07879d01fe68..b9c07d8e3141 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -154,8 +154,8 @@ After these preparations you'll now enter the main part:
    that hear about it for the first time. And if you learned something in this
    process, consider searching again for existing reports about the issue.
 
- * If the failure includes a stack dump, like an Oops does, consider decoding
-   it to find the offending line of code.
+ * If your failure involves a 'panic', 'oops', or 'warning', consider decoding
+   the kernel log to find the line of code that trigger the error.
 
  * If your problem is a regression, try to narrow down when the issue was
    introduced as much as possible.
@@ -869,6 +869,15 @@ pick up the configuration of your current kernel and then tries to adjust it
 somewhat for your system. That does not make the resulting kernel any better,
 but quicker to compile.
 
+Note: If you are dealing with a kernel panic, oops, or warning, please make
+sure to enable CONFIG_KALLSYMS when configuring your kernel. Additionally,
+enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the latter is the
+relevant one of those two, but can only be reached if you enable the former. Be
+aware CONFIG_DEBUG_INFO increases the storage space required to build a kernel
+by quite a bit. But that's worth it, as these options will allow you later to
+pinpoint the exact line of code that triggers your issue. The section 'Decode
+failure messages' below explains this in more detail.
+
 
 Check 'taint' flag
 ------------------
@@ -923,31 +932,55 @@ instead you can join.
 Decode failure messages
 -----------------------
 
-.. note::
+    *If your failure involves a 'panic', 'oops', or 'warning', consider
+    decoding the kernel log to find the line of code that trigger the error.*
 
-   FIXME: The text in this section is a placeholder for now and quite similar to
-   the old text found in 'Documentation/admin-guide/reporting-bugs.rst'
-   currently. It and the document it references are known to be outdated and
-   thus need to be revisited. Thus consider this note a request for help: if you
-   are familiar with this topic, please write a few lines that would fit here.
-   Alternatively, simply outline the current situation roughly to the main
-   authors of this document (see intro), as they might be able to write
-   something then.
+When the kernel detects an internal problem, it will log some information about
+the executed code. This makes it possible to pinpoint the exact line in the
+source code that triggered the issue and shows how it was called. But that only
+works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
+your kernel. If you did so, consider to decode the information from the
+kernel's log. That will make it a lot easier to understand what lead to the
+'panic', 'oops', or 'warning', which increases the chances enormously that
+someone can provide a fix.
 
-   This section in the end should answer questions like "when is this actually
-   needed", "what .config options to ideally set earlier to make this step easy
-   or unnecessary?" (likely CONFIG_UNWINDER_ORC when it's available, otherwise
-   CONFIG_UNWINDER_FRAME_POINTER; but is there anything else needed?).
+Decoding can be done with a script you find in the Linux source tree. If you
+are running a kernel you compiled yourself earlier, call it like this::
 
-..
+       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
+
+If you are running a packaged vanilla kernel, you will likely have to install
+the corresponding packages with debug symbols. Then call the script (which you
+might need to get from the Linux sources if your distro does not package it)
+like this::
+
+       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
+        /usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
+
+The script will work on log lines like the following, which show the address of
+the code the kernel was executing when the error occurred::
+
+       [   68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
+
+Once decoded, these lines will look like this::
+
+       [   68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
+
+In this case the executed code was built from the file
+'~/linux-5.10.5/test-module/test-module.c' and the error occurred by the
+instructions found in line '16'.
 
-    *If the failure includes a stack dump, like an Oops does, consider decoding
-    it to find the offending line of code.*
+The script will similarly decode the addresses mentioned in the section
+starting with 'Call trace', which show the path to the function where the
+problem occurred. Additionally, the script will show the assembler output for
+the code section the kernel was executing.
 
-When the kernel detects an error, it will print a stack dump that allows to
-identify the exact line of code where the issue happens. But that information
-sometimes needs to get decoded to be readable, which is explained in
-admin-guide/bug-hunting.rst.
+Note, if you can't get this to work, simply skip this step and mention the
+reason for it in the report. If you're lucky, it might not be needed. And if it
+is, someone might help you to get things going. Also be aware this is just one
+of several ways to decode kernel stack traces. Sometimes different steps will
+be required to retrieve the relevant details. Don't worry about that, if that's
+needed in your case, developers will tell you what to do.
 
 
 Special care for regressions
-- 
2.29.2



* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-10  5:48 [PATCH] docs: reporting-issues.rst: explain how to decode stack traces Thorsten Leemhuis
@ 2021-02-11 17:07 ` Randy Dunlap
  2021-02-15  5:25   ` Thorsten Leemhuis
  2021-02-14 16:00 ` Qais Yousef
  1 sibling, 1 reply; 9+ messages in thread
From: Randy Dunlap @ 2021-02-11 17:07 UTC (permalink / raw)
  To: Thorsten Leemhuis, Jonathan Corbet
  Cc: linux-doc, linux-kernel, Sasha Levin, Vlastimil Babka,
	Joerg Roedel, Qais Yousef, Damian Tometzki

Hi Thorsten,

Just a couple of small nits (or one that is repeated):

On 2/9/21 9:48 PM, Thorsten Leemhuis wrote:
> Replace placeholder text about decoding stack traces with a section that
> properly describes what a typical user should do these days. To make
> it work for them, add a paragraph in an earlier section to ensure
> people build their kernels with everything that's needed to decode stack
> traces later.
> 
> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
> ---
>  .../admin-guide/reporting-issues.rst          | 77 +++++++++++++------
>  1 file changed, 55 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
> index 07879d01fe68..b9c07d8e3141 100644
> --- a/Documentation/admin-guide/reporting-issues.rst
> +++ b/Documentation/admin-guide/reporting-issues.rst
> @@ -154,8 +154,8 @@ After these preparations you'll now enter the main part:
>     that hear about it for the first time. And if you learned something in this
>     process, consider searching again for existing reports about the issue.
>  
> - * If the failure includes a stack dump, like an Oops does, consider decoding
> -   it to find the offending line of code.
> + * If your failure involves a 'panic', 'oops', or 'warning', consider decoding
> +   the kernel log to find the line of code that trigger the error.

                                                   triggered

>  
>   * If your problem is a regression, try to narrow down when the issue was
>     introduced as much as possible.
> @@ -869,6 +869,15 @@ pick up the configuration of your current kernel and then tries to adjust it
>  somewhat for your system. That does not make the resulting kernel any better,
>  but quicker to compile.
>  
>  
>  Check 'taint' flag
>  ------------------
> @@ -923,31 +932,55 @@ instead you can join.
>  Decode failure messages
>  -----------------------
>  
> -.. note::
> +    *If your failure involves a 'panic', 'oops', or 'warning', consider
> +    decoding the kernel log to find the line of code that trigger the error.*

                                                             triggered


or it could be "code that triggers"... (just not "trigger").


-- 
~Randy



* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-10  5:48 [PATCH] docs: reporting-issues.rst: explain how to decode stack traces Thorsten Leemhuis
  2021-02-11 17:07 ` Randy Dunlap
@ 2021-02-14 16:00 ` Qais Yousef
  2021-02-15  5:55   ` Thorsten Leemhuis
  1 sibling, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2021-02-14 16:00 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jonathan Corbet, Randy Dunlap, linux-doc, linux-kernel,
	Sasha Levin, Vlastimil Babka, Joerg Roedel, Damian Tometzki

On 02/10/21 06:48, Thorsten Leemhuis wrote:
> Replace placeholder text about decoding stack traces with a section that
> properly describes what a typical user should do these days. To make
> it work for them, add a paragraph in an earlier section to ensure
> people build their kernels with everything that's needed to decode stack
> traces later.
> 
> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
> ---
> Reminder: This is not my area of expertise. Hopefully I didn't write anything
> stupid or omit something people find important. If I did, please let me know,
> ideally suggesting what to write; bonus points for people sending text I can
> simply include in the next revision.

Thanks for your effort :-)

> 
> I CCed Sasha, because he wrote decode_stacktrace.sh; I also CCed a bunch of
> people that showed interest in this topic when I asked for help on Twitter.
> 
> I'm still unsure if linking to admin-guide/bug-hunting.rst is a good idea, as it
> seems quite outdated; reporting-bugs did that, but for now I settled on 'don't
> do that'.
> 
> Ciao, Thorsten
> ---
>  .../admin-guide/reporting-issues.rst          | 77 +++++++++++++------
>  1 file changed, 55 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
> index 07879d01fe68..b9c07d8e3141 100644
> --- a/Documentation/admin-guide/reporting-issues.rst
> +++ b/Documentation/admin-guide/reporting-issues.rst
> @@ -154,8 +154,8 @@ After these preparations you'll now enter the main part:
>     that hear about it for the first time. And if you learned something in this
>     process, consider searching again for existing reports about the issue.
>  
> - * If the failure includes a stack dump, like an Oops does, consider decoding
> -   it to find the offending line of code.
> + * If your failure involves a 'panic', 'oops', or 'warning', consider decoding

or 'BUG'? There are similar other places below that could benefit from this
addition too.

> +   the kernel log to find the line of code that trigger the error.
>  
>   * If your problem is a regression, try to narrow down when the issue was
>     introduced as much as possible.
> @@ -869,6 +869,15 @@ pick up the configuration of your current kernel and then tries to adjust it
>  somewhat for your system. That does not make the resulting kernel any better,
>  but quicker to compile.
>  
> +Note: If you are dealing with a kernel panic, oops, or warning, please make
> +sure to enable CONFIG_KALLSYMS when configuring your kernel. Additionally,

s/make sure/try/

s/kernel./kernel if you can./

Less demanding wording in case the user doesn't have the capability to rebuild
or deploy such a kernel where the problem happens. Maybe you can tweak it more
if you like too :-)

> +enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the latter is the
> +relevant one of those two, but can only be reached if you enable the former. Be
> +aware CONFIG_DEBUG_INFO increases the storage space required to build a kernel
> +by quite a bit. But that's worth it, as these options will allow you later to
> +pinpoint the exact line of code that triggers your issue. The section 'Decode
> +failure messages' below explains this in more detail.
> +

I think it's worth mentioning too that the user should keep a log of the problem
when first encountered and then attempt the above. Just in case the problem is
not reproducible easily so the info is not lost.

Maybe something like below:

'''
Always keep a record of the issue encountered in case it is hard to reproduce.
Sending an undecoded report is better than not sending a report at all.
'''
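
For what it's worth, keeping such a record could be as simple as the
following. This is only a sketch: save_kernel_log is a made-up helper name,
not something that exists in the kernel tree, and it assumes the log is piped
in on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): copy a kernel log from
# stdin into a timestamped file, so the undecoded trace is preserved even if
# the issue turns out to be hard to reproduce.
save_kernel_log() {
    dest_dir=${1:-.}                      # target directory, default: cwd
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/kernel-log-$(date +%Y%m%d-%H%M%S).txt"
    cat > "$dest" || return 1             # store stdin unmodified
    echo "$dest"                          # print where the log was saved
}
```

It would be called right after the problem shows up, for example as
'sudo dmesg | save_kernel_log ~/kernel-logs'; the saved file can then be
attached to the report as-is or fed to the decoding script later.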

>  
>  Check 'taint' flag
>  ------------------
> @@ -923,31 +932,55 @@ instead you can join.
>  Decode failure messages
>  -----------------------
>  
> -.. note::
> +    *If your failure involves a 'panic', 'oops', or 'warning', consider

Thanks for choosing the word consider, it shouldn't be compulsory IMO.

> +    decoding the kernel log to find the line of code that trigger the error.*
>  
> -   FIXME: The text in this section is a placeholder for now and quite similar to
> -   the old text found in 'Documentation/admin-guide/reporting-bugs.rst'
> -   currently. It and the document it references are known to be outdated and
> -   thus need to be revisited. Thus consider this note a request for help: if you
> -   are familiar with this topic, please write a few lines that would fit here.
> -   Alternatively, simply outline the current situation roughly to the main
> -   authors of this document (see intro), as they might be able to write
> -   something then.
> +When the kernel detects an internal problem, it will log some information about
> +the executed code. This makes it possible to pinpoint the exact line in the
> +source code that triggered the issue and shows how it was called. But that only
> +works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
> +your kernel. If you did so, consider to decode the information from the
> +kernel's log. That will make it a lot easier to understand what lead to the
> +'panic', 'oops', or 'warning', which increases the chances enormously that
> +someone can provide a fix.

I suggest removing the word enormously. It helps, but it all depends on the
particular circumstances. Sometimes it does, others it doesn't.

>  
> -   This section in the end should answer questions like "when is this actually
> -   needed", "what .config options to ideally set earlier to make this step easy
> -   or unnecessary?" (likely CONFIG_UNWINDER_ORC when it's available, otherwise
> -   CONFIG_UNWINDER_FRAME_POINTER; but is there anything else needed?).
> +Decoding can be done with a script you find in the Linux source tree. If you
> +are running a kernel you compiled yourself earlier, call it like this::
>  
> -..
> +       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
> +
> +If you are running a packaged vanilla kernel, you will likely have to install
> +the corresponding packages with debug symbols. Then call the script (which you
> +might need to get from the Linux sources if your distro does not package it)
> +like this::
> +
> +       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
> +        /usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
> +
> +The script will work on log lines like the following, which show the address of
> +the code the kernel was executing when the error occurred::
> +
> +       [   68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
> +
> +Once decoded, these lines will look like this::
> +
> +       [   68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
> +
> +In this case the executed code was built from the file
> +'~/linux-5.10.5/test-module/test-module.c' and the error occurred by the
> +instructions found in line '16'.
>  
> -    *If the failure includes a stack dump, like an Oops does, consider decoding
> -    it to find the offending line of code.*
> +The script will similarly decode the addresses mentioned in the section
> +starting with 'Call trace', which show the path to the function where the
> +problem occurred. Additionally, the script will show the assembler output for
> +the code section the kernel was executing.
>  
> -When the kernel detects an error, it will print a stack dump that allows to
> -identify the exact line of code where the issue happens. But that information
> -sometimes needs to get decoded to be readable, which is explained in
> -admin-guide/bug-hunting.rst.
> +Note, if you can't get this to work, simply skip this step and mention the
> +reason for it in the report. If you're lucky, it might not be needed. And if it
> +is, someone might help you to get things going. Also be aware this is just one
> +of several ways to decode kernel stack traces. Sometimes different steps will
> +be required to retrieve the relevant details. Don't worry about that, if that's
> +needed in your case, developers will tell you what to do.

Ah you already clarify nicely here this is a good-to-have rather than
a must-have as I was trying to allude to above :-)

This looks good to me in general. With the above minor nits fixed, feel free to
add my

Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Thanks!

--
Qais Yousef

>  
>  
>  Special care for regressions
> -- 
> 2.29.2
> 


* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-11 17:07 ` Randy Dunlap
@ 2021-02-15  5:25   ` Thorsten Leemhuis
  0 siblings, 0 replies; 9+ messages in thread
From: Thorsten Leemhuis @ 2021-02-15  5:25 UTC (permalink / raw)
  To: Randy Dunlap, Jonathan Corbet
  Cc: linux-doc, linux-kernel, Sasha Levin, Vlastimil Babka,
	Joerg Roedel, Qais Yousef, Damian Tometzki

On 11.02.21 at 18:07, Randy Dunlap wrote:
> Just a couple of small nits (or one that is repeated):

:-D

> On 2/9/21 9:48 PM, Thorsten Leemhuis wrote:
>>  
>> - * If the failure includes a stack dump, like an Oops does, consider decoding
>> -   it to find the offending line of code.
>> + * If your failure involves a 'panic', 'oops', or 'warning', consider decoding
>> +   the kernel log to find the line of code that trigger the error.
>                                                    triggered
> […] 
> or it could be "code that triggers"... (just not "trigger").

Ahh, yes, you're right of course. Went with the former, many thx for
taking a look and pointing it out!

Ciao, Thorsten


* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-14 16:00 ` Qais Yousef
@ 2021-02-15  5:55   ` Thorsten Leemhuis
  2021-02-15 14:28     ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: Thorsten Leemhuis @ 2021-02-15  5:55 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Jonathan Corbet, Randy Dunlap, linux-doc, linux-kernel,
	Sasha Levin, Vlastimil Babka, Joerg Roedel, Damian Tometzki

Hi! Many thx for looking into this, much appreciated!

On 14.02.21 at 17:00, Qais Yousef wrote:
> On 02/10/21 06:48, Thorsten Leemhuis wrote:
>
>> - * If the failure includes a stack dump, like an Oops does, consider decoding
>> -   it to find the offending line of code.
>> + * If your failure involves a 'panic', 'oops', or 'warning', consider decoding
> or 'BUG'? There are similar other places below that could benefit from this
> addition too.

Good point. In fact there are other places in the document where this is
needed as well. Will address those in another patch.

>> +   the kernel log to find the line of code that trigger the error.
>>  
>>   * If your problem is a regression, try to narrow down when the issue was
>>     introduced as much as possible.
>> @@ -869,6 +869,15 @@ pick up the configuration of your current kernel and then tries to adjust it
>>  somewhat for your system. That does not make the resulting kernel any better,
>>  but quicker to compile.
>>  
>> +Note: If you are dealing with a kernel panic, oops, or warning, please make
>> +sure to enable CONFIG_KALLSYMS when configuring your kernel. Additionally,
> 
> s/make sure/try/

I did that, but ignored...

> s/kernel./kernel if you can./

...this. Yes, you have a point with...

> Less demanding wording in case the user doesn't have the capability to rebuild
> or deploy such a kernel where the problem happens. Maybe you can tweak it more
> if you like too :-)

...that, but that section in the document is about building your own
kernel, so I'd say we don't have to be that careful here.
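
As a side note for people who do build their own kernel: a rough way to check
a .config for the three options the patch mentions might look like this. It's
just a sketch; check_decode_options is a hypothetical helper name, and it
assumes the options show up as plain 'CONFIG_FOO=y' lines in the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): report whether the
# options named in the patch are set to "y" in a given kernel .config file.
# Returns 0 if all three are enabled, 1 otherwise.
check_decode_options() {
    config_file=$1
    missing=0
    for opt in CONFIG_KALLSYMS CONFIG_DEBUG_KERNEL CONFIG_DEBUG_INFO; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

Something like 'check_decode_options ./linux-5.10.5/.config' before starting
the build would then show right away whether decoding has a chance to work
later.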

>> +enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the latter is the
>> +relevant one of those two, but can only be reached if you enable the former. Be
>> +aware CONFIG_DEBUG_INFO increases the storage space required to build a kernel
>> +by quite a bit. But that's worth it, as these options will allow you later to
>> +pinpoint the exact line of code that triggers your issue. The section 'Decode
>> +failure messages' below explains this in more detail.
>
> I think worth mentioning too that the user should keep a log of the problem
> when first encountered and then attempt the above. Just in case the problem is
> not reproducible easily so the info is not lost.
> 
> Maybe something like below:
> 
> '''
> Always keep a record of the issue encountered in case it is hard to reproduce.
> Sending an undecoded report is better than not sending a report at all.
> '''

Very good point, added.

>> +your kernel. If you did so, consider to decode the information from the
>> +kernel's log. That will make it a lot easier to understand what lead to the
>> +'panic', 'oops', or 'warning', which increases the chances enormously that
>> +someone can provide a fix.
> I suggest removing the word enormously. It helps, but it all depends on the
> particular circumstances. Sometimes it does, others it doesn't.

Done.

> This looks good to me in general. With the above minor nits fixed, feel free to
> add my
> Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Great, thx, will do!

Ciao, Thorsten


* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-15  5:55   ` Thorsten Leemhuis
@ 2021-02-15 14:28     ` Qais Yousef
  0 siblings, 0 replies; 9+ messages in thread
From: Qais Yousef @ 2021-02-15 14:28 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jonathan Corbet, Randy Dunlap, linux-doc, linux-kernel,
	Sasha Levin, Vlastimil Babka, Joerg Roedel, Damian Tometzki

Hi Thorsten

On 02/15/21 06:55, Thorsten Leemhuis wrote:
> Hi! Many thx for looking into this, much appreciated!
> 
> On 14.02.21 at 17:00, Qais Yousef wrote:
> > On 02/10/21 06:48, Thorsten Leemhuis wrote:
> >
> >> - * If the failure includes a stack dump, like an Oops does, consider decoding
> >> -   it to find the offending line of code.
> >> + * If your failure involves a 'panic', 'oops', or 'warning', consider decoding
> > or 'BUG'? There are similar other places below that could benefit from this
> > addition too.
> 
> Good point. In fact there are other places in the document where this is
> needed as well. Will address those in another patch.
> 
> >> +   the kernel log to find the line of code that trigger the error.
> >>  
> >>   * If your problem is a regression, try to narrow down when the issue was
> >>     introduced as much as possible.
> >> @@ -869,6 +869,15 @@ pick up the configuration of your current kernel and then tries to adjust it
> >>  somewhat for your system. That does not make the resulting kernel any better,
> >>  but quicker to compile.
> >>  
> >> +Note: If you are dealing with a kernel panic, oops, or warning, please make
> >> +sure to enable CONFIG_KALLSYMS when configuring your kernel. Additionally,
> > 
> > s/make sure/try/
> 
> I did that, but ignored...
> 
> > s/kernel./kernel if you can./
> 
> ...this. Yes, you have a point with...
> 
> > Less demanding wording in case the user doesn't have the capability to rebuild
> > or deploy such a kernel where the problem happens. Maybe you can tweak it more
> > if you like too :-)
> 
> ...that, but that section in the document is about building your own
> kernel, so I'd say we don't have to be that careful here.

Cool. Works for me.

Thanks

--
Qais Yousef


* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-15 17:28 Thorsten Leemhuis
  2021-02-23 11:57 ` Vlastimil Babka
@ 2021-03-01 22:05 ` Jonathan Corbet
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Corbet @ 2021-03-01 22:05 UTC (permalink / raw)
  To: Thorsten Leemhuis, Randy Dunlap
  Cc: linux-doc, linux-kernel, Sasha Levin, Vlastimil Babka,
	Joerg Roedel, Qais Yousef, Damian Tometzki

Thorsten Leemhuis <linux@leemhuis.info> writes:

> Replace placeholder text about decoding stack traces with a section that
> properly describes what a typical user should do these days. To make
> it work for them, add a paragraph in an earlier section to ensure
> people build their kernels with everything that's needed to decode stack
> traces later.
>
> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
> Reviewed-by: Qais Yousef <qais.yousef@arm.com>
> ---
> v1->v2
> * Fix typo pointed out by Randy
> * include review feedback from Qais and his Reviewed-by:
>
> v1:
> https://lore.kernel.org/lkml/20210210054823.242262-1-linux@leemhuis.info/
> ---
>  .../admin-guide/reporting-issues.rst          | 81 ++++++++++++++-----
>  1 file changed, 59 insertions(+), 22 deletions(-)

Applied, thanks.

jon


* Re: [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
  2021-02-15 17:28 Thorsten Leemhuis
@ 2021-02-23 11:57 ` Vlastimil Babka
  2021-03-01 22:05 ` Jonathan Corbet
  1 sibling, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2021-02-23 11:57 UTC (permalink / raw)
  To: Thorsten Leemhuis, Jonathan Corbet, Randy Dunlap
  Cc: linux-doc, linux-kernel, Sasha Levin, Joerg Roedel, Qais Yousef,
	Damian Tometzki

On 2/15/21 6:28 PM, Thorsten Leemhuis wrote:
> Replace placeholder text about decoding stack traces with a section that
> properly describes what a typical user should do these days. To make
> it work for them, add a paragraph in an earlier section to ensure
> people build their kernels with everything that's needed to decode stack
> traces later.

Looks good!

> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
> Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!


* [PATCH] docs: reporting-issues.rst: explain how to decode stack traces
@ 2021-02-15 17:28 Thorsten Leemhuis
  2021-02-23 11:57 ` Vlastimil Babka
  2021-03-01 22:05 ` Jonathan Corbet
  0 siblings, 2 replies; 9+ messages in thread
From: Thorsten Leemhuis @ 2021-02-15 17:28 UTC (permalink / raw)
  To: Jonathan Corbet, Randy Dunlap
  Cc: linux-doc, linux-kernel, Sasha Levin, Vlastimil Babka,
	Joerg Roedel, Qais Yousef, Damian Tometzki

Replace placeholder text about decoding stack traces with a section that
properly describes what a typical user should do these days. To make
it work for them, add a paragraph in an earlier section to ensure
people build their kernels with everything that's needed to decode stack
traces later.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
---
v1->v2
* Fix typo pointed out by Randy
* include review feedback from Qais and his Reviewed-by:

v1:
https://lore.kernel.org/lkml/20210210054823.242262-1-linux@leemhuis.info/
---
 .../admin-guide/reporting-issues.rst          | 81 ++++++++++++++-----
 1 file changed, 59 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 07879d01fe68..18b1280f7abf 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -154,8 +154,8 @@ After these preparations you'll now enter the main part:
    that hear about it for the first time. And if you learned something in this
    process, consider searching again for existing reports about the issue.
 
- * If the failure includes a stack dump, like an Oops does, consider decoding
-   it to find the offending line of code.
+ * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+   decoding the kernel log to find the line of code that triggered the error.
 
  * If your problem is a regression, try to narrow down when the issue was
    introduced as much as possible.
@@ -869,6 +869,19 @@ pick up the configuration of your current kernel and then tries to adjust it
 somewhat for your system. That does not make the resulting kernel any better,
 but quicker to compile.
 
+Note: If you are dealing with a panic, Oops, warning, or BUG from the kernel,
+please try to enable CONFIG_KALLSYMS when configuring your kernel.
+Additionally, enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the
+latter is the relevant one of those two, but can only be reached if you enable
+the former. Be aware CONFIG_DEBUG_INFO increases the storage space required to
+build a kernel by quite a bit. But that's worth it, as these options will allow
+you later to pinpoint the exact line of code that triggers your issue. The
+section 'Decode failure messages' below explains this in more detail.
+
+But keep in mind: Always keep a record of the issue encountered in case it is
+hard to reproduce. Sending an undecoded report is better than not reporting
+the issue at all.
+
 
 Check 'taint' flag
 ------------------
@@ -923,31 +936,55 @@ instead you can join.
 Decode failure messages
 -----------------------
 
-.. note::
+    *If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+    decoding the kernel log to find the line of code that triggered the error.*
 
-   FIXME: The text in this section is a placeholder for now and quite similar to
-   the old text found in 'Documentation/admin-guide/reporting-bugs.rst'
-   currently. It and the document it references are known to be outdated and
-   thus need to be revisited. Thus consider this note a request for help: if you
-   are familiar with this topic, please write a few lines that would fit here.
-   Alternatively, simply outline the current situation roughly to the main
-   authors of this document (see intro), as they might be able to write
-   something then.
+When the kernel detects an internal problem, it will log some information about
+the executed code. This makes it possible to pinpoint the exact line in the
+source code that triggered the issue and shows how it was called. But that only
+works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
+your kernel. If you did so, consider decoding the information from the
+kernel's log. That will make it a lot easier to understand what led to the
+'panic', 'Oops', 'warning', or 'BUG', which increases the chances that someone
+can provide a fix.
 
-   This section in the end should answer questions like "when is this actually
-   needed", "what .config options to ideally set earlier to make this step easy
-   or unnecessary?" (likely CONFIG_UNWINDER_ORC when it's available, otherwise
-   CONFIG_UNWINDER_FRAME_POINTER; but is there anything else needed?).
+Decoding can be done with a script you find in the Linux source tree. If you
+are running a kernel you compiled yourself earlier, call it like this::
 
-..
+       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
+
+If you are running a packaged vanilla kernel, you will likely have to install
+the corresponding packages with debug symbols. Then call the script (which you
+might need to get from the Linux sources if your distro does not package it)
+like this::
+
+       [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
+        /usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
+
+The script will work on log lines like the following, which show the address of
+the code the kernel was executing when the error occurred::
+
+       [   68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
+
+Once decoded, these lines will look like this::
+
+       [   68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
+
+In this case the executed code was built from the file
+'~/linux-5.10.5/test-module/test-module.c' and the error was caused by the
+instructions found in line '16'.
 
-    *If the failure includes a stack dump, like an Oops does, consider decoding
-    it to find the offending line of code.*
+The script will similarly decode the addresses mentioned in the section
+starting with 'Call trace', which show the path to the function where the
+problem occurred. Additionally, the script will show the assembler output for
+the code section the kernel was executing.
 
-When the kernel detects an error, it will print a stack dump that allows to
-identify the exact line of code where the issue happens. But that information
-sometimes needs to get decoded to be readable, which is explained in
-admin-guide/bug-hunting.rst.
+Note, if you can't get this to work, simply skip this step and mention the
+reason for it in the report. If you're lucky, it might not be needed. And if it
+is, someone might help you to get things going. Also be aware this is just one
+of several ways to decode kernel stack traces. Sometimes different steps will
+be required to retrieve the relevant details. Don't worry about that: if that's
+needed in your case, developers will tell you what to do.
 
 
 Special care for regressions
-- 
2.29.2
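
Side note on the CONFIG_ options the patch asks people to enable: below is a
minimal shell sketch for checking whether a given kernel configuration file
has them set. The function name check_decode_config and its output strings are
made up for illustration and are not part of the kernel tree. It also assumes
the options show up as plain 'CONFIG_FOO=y' lines, which is how enabled
boolean options appear in a .config.

```shell
#!/bin/sh
# Hypothetical helper, not part of the kernel sources: report which of the
# options mentioned in the patch are not enabled in a given .config file.
# Usage: check_decode_config /path/to/.config
check_decode_config() {
    config_file="$1"
    missing=""
    for opt in CONFIG_KALLSYMS CONFIG_DEBUG_KERNEL CONFIG_DEBUG_INFO; do
        # Assumption: an enabled boolean option is a line of the form
        # "CONFIG_FOO=y"; disabled ones are absent or commented out.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            missing="${missing}${missing:+ }${opt}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing: $missing"
        return 1
    fi
    echo "all set"
}
```

For a self-compiled kernel one would point it at the .config in the build
tree, for example 'check_decode_config ./linux-5.10.5/.config'. Where a
distro keeps the configuration of its packaged kernels varies, so that part is
left out here.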

