* Tracing: rb_head_page_deactivate() caught in an infinite loop
From: rananta @ 2020-07-01 17:07 UTC
  To: rostedt, mingo; +Cc: psodagud, linux-kernel

Hi Steven and Mingo,

While trying to adjust the buffer size (echo <size> >
/sys/kernel/debug/tracing/buffer_size_kb), we see that the kernel gets
caught in an infinite loop while traversing the "cpu_buffer->pages"
list in rb_head_page_deactivate().

It looks like the last node of the list could be uninitialized, leading
to the infinite traversal. From the data that we captured:
000|rb_head_page_deactivate(inline)
     |  cpu_buffer = 0xFFFFFF8000671600 = kernel_size_le_lo32+0xFFFFFF652F6EE600 -> (
...
     |    pages = 0xFFFFFF80A909D980 = kernel_size_le_lo32+0xFFFFFF65D811A980 -> (
     |      next = 0xFFFFFF80A909D200 = kernel_size_le_lo32+0xFFFFFF65D811A200 -> (
     |        next = 0xFFFFFF80A909D580 = kernel_size_le_lo32+0xFFFFFF65D811A580 -> (
     |          next = 0xFFFFFF8138D1CD00 = kernel_size_le_lo32+0xFFFFFF6667D99D00 -> (
     |            next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
     |              next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
     |                next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
     |                  next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0,

I wanted to check with you whether there's any scenario that could lead
us into this state.
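
To make the failure mode concrete, here is a minimal userspace model of
the list walk that rb_head_page_deactivate() performs over
"cpu_buffer->pages" (a sketch for illustration, not the kernel code
itself): the walk follows ->next until it gets back to the list head,
so a tail whose ->next points at itself can never terminate it.

#include <stdio.h>

struct list_head {
	struct list_head *next;
};

int main(void)
{
	struct list_head head, a, b;
	struct list_head *hd;
	int steps = 0;

	/* A well-formed circular list: head -> a -> b -> head. */
	head.next = &a;
	a.next = &b;
	b.next = &head;

	/* Corrupt the tail the way the captured data shows: the last
	 * node's next points at itself, never back to the head. */
	b.next = &b;

	for (hd = head.next; hd != &head; hd = hd->next) {
		/* Guard so this demo terminates; the kernel walk has
		 * no such bound and spins here forever. */
		if (++steps > 10) {
			printf("stuck: node %p points to itself\n",
			       (void *)hd);
			return 1;
		}
	}
	printf("walked the list in %d steps\n", steps);
	return 0;
}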

Test details:
-- Arch: arm64
-- Kernel version 5.4.30; running on Android
-- Test case: running the following set of commands across reboots
leads to the scenario

   atrace --async_start -z -c -b 120000 sched audio irq idle freq
   < Run any workload here >
   atrace --async_dump -z -c -b 1200000 sched audio irq idle freq > mytrace.trace
   atrace --async_stop > /dev/null
   echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb
   echo 200000 > /sys/kernel/debug/tracing/buffer_size_kb
   reboot

Repeating the above lines across reboots reproduces the issue: the
"atrace" or "echo" simply gets stuck while resizing the buffer.
I'll try to reproduce the issue without atrace as well, but I'm
wondering what could lead us into this state.

Thank you.
Raghavendra


* Re: Tracing: rb_head_page_deactivate() caught in an infinite loop
From: Steven Rostedt @ 2020-07-02  2:03 UTC
  To: rananta; +Cc: mingo, psodagud, linux-kernel

On Wed, 01 Jul 2020 10:07:06 -0700
rananta@codeaurora.org wrote:

> Hi Steven and Mingo,
> 

Hi Raghavendra,


> While trying to adjust the buffer size (echo <size> >
> /sys/kernel/debug/tracing/buffer_size_kb), we see that the kernel gets
> caught in an infinite loop while traversing the "cpu_buffer->pages"
> list in rb_head_page_deactivate().
> 
> It looks like the last node of the list could be uninitialized, leading
> to the infinite traversal. From the data that we captured:
> 000|rb_head_page_deactivate(inline)
>      |  cpu_buffer = 0xFFFFFF8000671600 = kernel_size_le_lo32+0xFFFFFF652F6EE600 -> (
> ...
>      |    pages = 0xFFFFFF80A909D980 = kernel_size_le_lo32+0xFFFFFF65D811A980 -> (
>      |      next = 0xFFFFFF80A909D200 = kernel_size_le_lo32+0xFFFFFF65D811A200 -> (
>      |        next = 0xFFFFFF80A909D580 = kernel_size_le_lo32+0xFFFFFF65D811A580 -> (
>      |          next = 0xFFFFFF8138D1CD00 = kernel_size_le_lo32+0xFFFFFF6667D99D00 -> (
>      |            next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |              next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |                next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |                  next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0,
> 
> I wanted to check with you whether there's any scenario that could lead
> us into this state.
> 
> Test details:
> -- Arch: arm64
> -- Kernel version 5.4.30; running on Android
> -- Test case: running the following set of commands across reboots
> leads to the scenario
> 
>    atrace --async_start -z -c -b 120000 sched audio irq idle freq
>    < Run any workload here >
>    atrace --async_dump -z -c -b 1200000 sched audio irq idle freq > mytrace.trace
>    atrace --async_stop > /dev/null
>    echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb
>    echo 200000 > /sys/kernel/debug/tracing/buffer_size_kb
>    reboot
> 
> Repeating the above lines across reboots reproduces the issue: the
> "atrace" or "echo" simply gets stuck while resizing the buffer.

What do you mean repeat across reboots? If it doesn't happen it won't
ever happen, but if you reboot it may happen again?

> I'll try to reproduce the issue without atrace as well, but I'm
> wondering what could lead us into this state.

I haven't used arm lately, and I'm unfamiliar with atrace. So I don't
really know what is going on. If you can reproduce this with just a
shell script accessing the ftrace files, that would be much more useful.

Thanks,

-- Steve


* Re: Tracing: rb_head_page_deactivate() caught in an infinite loop
From: rananta @ 2020-07-02 19:50 UTC
  To: Steven Rostedt; +Cc: mingo, psodagud, linux-kernel

Hi Steven,

Thank you for responding.

On 2020-07-01 19:03, Steven Rostedt wrote:
> On Wed, 01 Jul 2020 10:07:06 -0700
> rananta@codeaurora.org wrote:
> 
>> Hi Steven and Mingo,
>> 
> 
> Hi Raghavendra,
> 
> 
>> While trying to adjust the buffer size (echo <size> >
>> /sys/kernel/debug/tracing/buffer_size_kb), we see that the kernel gets
>> caught in an infinite loop while traversing the "cpu_buffer->pages"
>> list in rb_head_page_deactivate().
>> 
>> It looks like the last node of the list could be uninitialized, leading
>> to the infinite traversal. From the data that we captured:
>> 000|rb_head_page_deactivate(inline)
>>      |  cpu_buffer = 0xFFFFFF8000671600 = kernel_size_le_lo32+0xFFFFFF652F6EE600 -> (
>> ...
>>      |    pages = 0xFFFFFF80A909D980 = kernel_size_le_lo32+0xFFFFFF65D811A980 -> (
>>      |      next = 0xFFFFFF80A909D200 = kernel_size_le_lo32+0xFFFFFF65D811A200 -> (
>>      |        next = 0xFFFFFF80A909D580 = kernel_size_le_lo32+0xFFFFFF65D811A580 -> (
>>      |          next = 0xFFFFFF8138D1CD00 = kernel_size_le_lo32+0xFFFFFF6667D99D00 -> (
>>      |            next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>>      |              next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>>      |                next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>>      |                  next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0,
>> 
>> I wanted to check with you whether there's any scenario that could lead
>> us into this state.
>> 
>> Test details:
>> -- Arch: arm64
>> -- Kernel version 5.4.30; running on Android
>> -- Test case: running the following set of commands across reboots
>> leads to the scenario
>> 
>>    atrace --async_start -z -c -b 120000 sched audio irq idle freq
>>    < Run any workload here >
>>    atrace --async_dump -z -c -b 1200000 sched audio irq idle freq > mytrace.trace
>>    atrace --async_stop > /dev/null
>>    echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb
>>    echo 200000 > /sys/kernel/debug/tracing/buffer_size_kb
>>    reboot
>> 
>> Repeating the above lines across reboots reproduces the issue: the
>> "atrace" or "echo" simply gets stuck while resizing the buffer.
> 
> What do you mean repeat across reboots? If it doesn't happen it won't
> ever happen, but if you reboot it may happen again?
> 
Actually, ignore the reboot part. I was able to reproduce it without
the reboot as well.
>> I'll try to reproduce the issue without atrace as well, but I'm
>> wondering what could lead us into this state.
> 
> I haven't used arm lately, and I'm unfamiliar with atrace. So I don't
> really know what is going on. If you can reproduce this with just a
> shell script accessing the ftrace files, that would be much more 
> useful.
> 
I tried to reproduce it without atrace, but couldn't. atrace is a simple
wrapper utility for collecting ftrace output. You can find the source here:
https://android.googlesource.com/platform/system/extras/+/jb-mr1-dev-plus-aosp/atrace/atrace.c
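
In case it helps, the step that ends up hanging is, at bottom, just a
write to the tracefs file, whether it comes from atrace or from the
"echo" above. A minimal sketch (the debugfs path below is the one used
in our test, not something atrace-specific):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/tracing/buffer_size_kb";
	const char *size = "150000";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* When the bug hits, this write blocks forever inside the
	 * kernel's ring-buffer resize path. */
	if (write(fd, size, strlen(size)) < 0)
		perror("write");
	close(fd);
	return 0;
}
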
> Thanks,
> 
> -- Steve

