All of lore.kernel.org
 help / color / mirror / Atom feed
* INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak
@ 2022-11-04 22:01 Mirsad Goran Todorovac
  2022-11-04 23:00 ` srinivas pandruvada
  2022-11-09 10:48 ` INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot Thorsten Leemhuis
  0 siblings, 2 replies; 6+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-04 22:01 UTC (permalink / raw)
  To: LKML; +Cc: Catalin Marinas, Thorsten Leemhuis, srinivas pandruvada

[-- Attachment #1: Type: text/plain, Size: 5019 bytes --]

Dear all,

When investigating thermald kmemleak, it occurred that the "cat 
/sys/kernel/debug/kmemleak"
and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount 
of time.

Dmesg output showed expedited stalls that the commands caused NMIs and 
NMI backtraces:

[ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on 
CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
[ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
[ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
[ 8123.263528] NMI backtrace for cpu 0
[ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
[ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW 
03/08/2022
[ 8123.263557] RIP: 0010:kmemleak_seq_start+0x41/0x80
[ 8123.263579] Code: 55 04 a6 00 4c 63 e0 85 c0 78 40 e8 b9 80 db ff 48 
8b 05 92 fb 88 01 4c 8d 60 f8 48 3d 30
                                     62 63 92 75 17 eb 32 49 8b 44 24 08 
<48> 83 eb 01 4c 8d 60 f8 48 3d 30 62 63 92 74 1d
                                     48 85 db 7f e6 4c
[ 8123.263588] RSP: 0018:ffff9968e400fc30 EFLAGS: 00000206
[ 8123.263598] RAX: ffff8963b7005b58 RBX: 0000000000011cb8 RCX: 
0000000000000001
[ 8123.263604] RDX: ffff8963b09c4000 RSI: ffff8963856de028 RDI: 
ffffffff926361c0
[ 8123.263608] RBP: ffff9968e400fc40 R08: 0000000000020000 R09: 
ffff896380592a80
[ 8123.263613] R10: 0000000000020000 R11: 0000000000000000 R12: 
ffff8964114c2390
[ 8123.263617] R13: ffff89639fa25b00 R14: ffff8963856de000 R15: 
ffff9968e400fe30
[ 8123.263622] FS:  00007ff217c15740(0000) GS:ffff896528800000(0000) 
knlGS:0000000000000000
[ 8123.263630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8123.263636] CR2: 00007f19bd6a8000 CR3: 000000028e392001 CR4: 
0000000000770ef0
[ 8123.263642] PKRU: 55555554
[ 8123.263646] Call Trace:
[ 8123.263649]  <TASK>
[ 8123.263656]  seq_read_iter+0x169/0x420
[ 8123.263671]  seq_read+0xad/0xe0
[ 8123.263685]  full_proxy_read+0x59/0x90
[ 8123.263701]  vfs_read+0xb2/0x2e0
[ 8123.263718]  ksys_read+0x61/0xe0
[ 8123.263730]  __x64_sys_read+0x1a/0x20
[ 8123.263741]  do_syscall_64+0x58/0x80
[ 8123.263754]  ? do_syscall_64+0x67/0x80
[ 8123.263767]  ? exit_to_user_mode_prepare+0x15d/0x190
[ 8123.263781]  ? syscall_exit_to_user_mode+0x1b/0x30
[ 8123.263791]  ? do_syscall_64+0x67/0x80
[ 8123.263804]  ? syscall_exit_to_user_mode+0x1b/0x30
[ 8123.263813]  ? do_syscall_64+0x67/0x80
[ 8123.263823]  ? do_syscall_64+0x67/0x80
[ 8123.263833]  ? do_syscall_64+0x67/0x80
[ 8123.263844]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 8123.263857] RIP: 0033:0x7ff217914992
[ 8123.263867] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 
02 00 0f 1f 44 00 00 f3 0f
                                     1e fa 64 8b 04 25 18 00 00 00 85 c0 
75 10 0f 05 <48> 3d 00 f0 ff ff 77 56
                                     c3 0f 1f 44 00 00 48 83 ec 28 48 89 
54 24
[ 8123.263875] RSP: 002b:00007ffdfcbc3e28 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
[ 8123.263894] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 
00007ff217914992
[ 8123.263899] RDX: 0000000000020000 RSI: 00007ff217b9d000 RDI: 
0000000000000003
[ 8123.263904] RBP: 00007ff217b9d000 R08: 00007ff217b9c010 R09: 
00007ff217b9c010
[ 8123.263909] R10: 0000000000000022 R11: 0000000000000246 R12: 
0000000000022000
[ 8123.263914] R13: 0000000000000003 R14: 0000000000020000 R15: 
0000000000020000
[ 8123.263927]  </TASK>

To reproduce:

Enable CONFIG_DEBUG_KMEMLEAK=y in linux_stable 6.1.0-rc3 build.

Then, a stress test on a service known from the previous report is required:

for a in {1..1000}; doRIP: 0010:kmemleak_seq_start+0x41/0x80
          echo $a
          systemctl stop thermald
          sleep 0.5
          systemctl start thermald
          sleep 0.5
done

After that, /sys/kernel/debug/kmemleak indicated 1413 unreferenced objects.

However, dmesg had shown the stalls on CPUs while executing "cat" or 
"tail -40" of /sys/kernel/debug/kmemleak.

I've read the 
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt, but I was 
unable to understand
what is going on except to locate the stall to "RIP: 
0010:kmemleak_seq_start+0x41/0x80" and probable
maintainer/developer.

Please find attached the kernel build config, the output of "cat 
/sys/kernel/debug/kmemleak" and
dmesg ouput about the expedited stalls. The number of jiffies is not so 
large, but IMHO they possibly indicate
greater problems or deadlocks in RCU code in kmemleak.

NOTE:
Please add:

Reported-By: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>

if (or when) the bug is fixed.

Thank you.

-mt

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

[-- Attachment #2: config-6.1.0-rc3.gz --]
[-- Type: application/gzip, Size: 65705 bytes --]

[-- Attachment #3: expedited_stalls_cat_tail_kmemleak.dmesg.gz --]
[-- Type: application/gzip, Size: 24792 bytes --]

[-- Attachment #4: kmemleak-thermald.txt.gz --]
[-- Type: application/gzip, Size: 31677 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak
  2022-11-04 22:01 INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak Mirsad Goran Todorovac
@ 2022-11-04 23:00 ` srinivas pandruvada
       [not found]   ` <32da7e3f-8608-29a9-3787-1d4c3315d235@alu.unizg.hr>
  2022-11-09 10:48 ` INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 6+ messages in thread
From: srinivas pandruvada @ 2022-11-04 23:00 UTC (permalink / raw)
  To: Mirsad Goran Todorovac, LKML; +Cc: Catalin Marinas, Thorsten Leemhuis


On Fri, 2022-11-04 at 23:01 +0100, Mirsad Goran Todorovac wrote:
> Dear all,
> 
> When investigating thermald kmemleak, it occurred that the "cat 
> /sys/kernel/debug/kmemleak"
> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual
> amount 
> of time.
> 
This memleak occurred because of bad element in the BIOS configuration
of _OSC method.
Unfortunately fixing this in ACPI interpreter can have other side
effects as this is not expected. So the ACPICA team
(https://acpica.org/community) is looking further.

Thanks,
Srinivas

> Dmesg output showed expedited stalls that the commands caused NMIs
> and 
> NMI backtraces:
> 
> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on 
> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU
> debug):
> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
> [ 8123.263528] NMI backtrace for cpu 0
> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW 
> 03/08/2022
> [ 8123.263557] RIP: 0010:kmemleak_seq_start+0x41/0x80
> [ 8123.263579] Code: 55 04 a6 00 4c 63 e0 85 c0 78 40 e8 b9 80 db ff
> 48 
> 8b 05 92 fb 88 01 4c 8d 60 f8 48 3d 30
>                                      62 63 92 75 17 eb 32 49 8b 44 24
> 08 
> <48> 83 eb 01 4c 8d 60 f8 48 3d 30 62 63 92 74 1d
>                                      48 85 db 7f e6 4c
> [ 8123.263588] RSP: 0018:ffff9968e400fc30 EFLAGS: 00000206
> [ 8123.263598] RAX: ffff8963b7005b58 RBX: 0000000000011cb8 RCX: 
> 0000000000000001
> [ 8123.263604] RDX: ffff8963b09c4000 RSI: ffff8963856de028 RDI: 
> ffffffff926361c0
> [ 8123.263608] RBP: ffff9968e400fc40 R08: 0000000000020000 R09: 
> ffff896380592a80
> [ 8123.263613] R10: 0000000000020000 R11: 0000000000000000 R12: 
> ffff8964114c2390
> [ 8123.263617] R13: ffff89639fa25b00 R14: ffff8963856de000 R15: 
> ffff9968e400fe30
> [ 8123.263622] FS:  00007ff217c15740(0000) GS:ffff896528800000(0000) 
> knlGS:0000000000000000
> [ 8123.263630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8123.263636] CR2: 00007f19bd6a8000 CR3: 000000028e392001 CR4: 
> 0000000000770ef0
> [ 8123.263642] PKRU: 55555554
> [ 8123.263646] Call Trace:
> [ 8123.263649]  <TASK>
> [ 8123.263656]  seq_read_iter+0x169/0x420
> [ 8123.263671]  seq_read+0xad/0xe0
> [ 8123.263685]  full_proxy_read+0x59/0x90
> [ 8123.263701]  vfs_read+0xb2/0x2e0
> [ 8123.263718]  ksys_read+0x61/0xe0
> [ 8123.263730]  __x64_sys_read+0x1a/0x20
> [ 8123.263741]  do_syscall_64+0x58/0x80
> [ 8123.263754]  ? do_syscall_64+0x67/0x80
> [ 8123.263767]  ? exit_to_user_mode_prepare+0x15d/0x190
> [ 8123.263781]  ? syscall_exit_to_user_mode+0x1b/0x30
> [ 8123.263791]  ? do_syscall_64+0x67/0x80
> [ 8123.263804]  ? syscall_exit_to_user_mode+0x1b/0x30
> [ 8123.263813]  ? do_syscall_64+0x67/0x80
> [ 8123.263823]  ? do_syscall_64+0x67/0x80
> [ 8123.263833]  ? do_syscall_64+0x67/0x80
> [ 8123.263844]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 8123.263857] RIP: 0033:0x7ff217914992
> [ 8123.263867] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5
> 1d 
> 02 00 0f 1f 44 00 00 f3 0f
>                                      1e fa 64 8b 04 25 18 00 00 00 85
> c0 
> 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56
>                                      c3 0f 1f 44 00 00 48 83 ec 28 48
> 89 
> 54 24
> [ 8123.263875] RSP: 002b:00007ffdfcbc3e28 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000000
> [ 8123.263894] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 
> 00007ff217914992
> [ 8123.263899] RDX: 0000000000020000 RSI: 00007ff217b9d000 RDI: 
> 0000000000000003
> [ 8123.263904] RBP: 00007ff217b9d000 R08: 00007ff217b9c010 R09: 
> 00007ff217b9c010
> [ 8123.263909] R10: 0000000000000022 R11: 0000000000000246 R12: 
> 0000000000022000
> [ 8123.263914] R13: 0000000000000003 R14: 0000000000020000 R15: 
> 0000000000020000
> [ 8123.263927]  </TASK>
> 
> To reproduce:
> 
> Enable CONFIG_DEBUG_KMEMLEAK=y in linux_stable 6.1.0-rc3 build.
> 
> Then, a stress test on a service known from the previous report is
> required:
> 
> for a in {1..1000}; doRIP: 0010:kmemleak_seq_start+0x41/0x80
>           echo $a
>           systemctl stop thermald
>           sleep 0.5
>           systemctl start thermald
>           sleep 0.5
> done
> 
> After that, /sys/kernel/debug/kmemleak indicated 1413 unreferenced
> objects.
> 
> However, dmesg had shown the stalls on CPUs while executing "cat" or 
> "tail -40" of /sys/kernel/debug/kmemleak.
> 
> I've read the 
> https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt, but I was
> unable to understand
> what is going on except to locate the stall to "RIP: 
> 0010:kmemleak_seq_start+0x41/0x80" and probable
> maintainer/developer.
> 
> Please find attached the kernel build config, the output of "cat 
> /sys/kernel/debug/kmemleak" and
> dmesg ouput about the expedited stalls. The number of jiffies is not
> so 
> large, but IMHO they possibly indicate
> greater problems or deadlocks in RCU code in kmemleak.
> 
> NOTE:
> Please add:
> 
> Reported-By: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> 
> if (or when) the bug is fixed.
> 
> Thank you.
> 
> -mt
> 
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak
       [not found]   ` <32da7e3f-8608-29a9-3787-1d4c3315d235@alu.unizg.hr>
@ 2022-11-07  9:53     ` Mirsad Goran Todorovac
  2022-11-08 14:21     ` INFO: BISECTED: " Mirsad Goran Todorovac
  1 sibling, 0 replies; 6+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-07  9:53 UTC (permalink / raw)
  To: srinivas pandruvada, LKML; +Cc: Catalin Marinas, Thorsten Leemhuis

On 05. 11. 2022. 00:39, Mirsad Goran Todorovac wrote:

> On 05. 11. 2022. 00:00, srinivas pandruvada wrote:
>
>> On Fri, 2022-11-04 at 23:01 +0100, Mirsad Goran Todorovac wrote:
>>> Dear all,
>>>
>>> When investigating thermald kmemleak, it occurred that the "cat
>>> /sys/kernel/debug/kmemleak"
>>> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual
>>> amount
>>> of time.
>>>
>> This memleak occurred because of bad element in the BIOS configuration
>> of _OSC method.
>> Unfortunately fixing this in ACPI interpreter can have other side
>> effects as this is not expected. So the ACPICA team
>> (https://acpica.org/community) is looking further.
>>
>> Thanks,
>> Srinivas
>
> Dear Mr. Srinivas,
>
> Thank you for the update.
>
> However, the investigation on thermald memleak had lead to discovery 
> of the RCU stalls
> in the kmemleak itself. I felt it correct to include you in Cc: 
> because thermald caused the
> leaks in question. I suspected myself it is the problem specific to my 
> Lenovo Ideapad 3 Intel core
> i5 laptop's BIOS, because I couldn't reproduce it on the other 
> (desktop Lenovo) computer.
>
> Thank you very much,
> Mirsad
>
I felt it prudent to report that the issue is still present in 6.1.0-rc4 
kernel:

Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004752] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 5-.... } 6 
jiffies s: 1021 root: 0x20/.
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004777] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004782] Sending 
NMI from CPU 3 to CPUs 5:
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004796] NMI 
backtrace for cpu 5
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004802] CPU: 5 
PID: 11912 Comm: tail Not tainted 6.1.0-rc4 #1
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004808] Hardware 
name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW 03/08/2022
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004810] RIP: 
0010:kmemleak_seq_start+0x41/0x80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004821] Code: e5 
f5 a5 00 4c 63 e0 85 c0 78 40 e8 09 81 db ff 48 8b 05 d2 fe 88 01 4c 8d 
60 f8 48 3d b0 63 03 ad 75 17 eb 32 49 8b 44 24 08 <48> 83 eb 01 4c 8d 
60 f8 48 3d b0 63 03 ad 74 1d 48 85 db 7f e6 4c
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004825] RSP: 
0018:ffffa7e621bc7cd8 EFLAGS: 00000202
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004829] RAX: 
ffff8efb493ae008 RBX: 00000000000418cd RCX: 0000000000000001
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004833] RDX: 
ffff8efc709da000 RSI: ffff8efb415937a8 RDI: ffffffffad036340
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004835] RBP: 
ffffa7e621bc7ce8 R08: 0000000000002000 R09: ffff8efb44502f80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004837] R10: 
0000000000020000 R11: 0000000000000000 R12: ffff8efb493af0a0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004839] R13: 
ffff8efb5c6d6200 R14: ffff8efb41593780 R15: ffffa7e621bc7ed8
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004841] FS:  
00007f55020df740(0000) GS:ffff8efce8940000(0000) knlGS:0000000000000000
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004844] CS:  0010 
DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004847] CR2: 
00001d8300d0c000 CR3: 0000000266aaa003 CR4: 0000000000770ee0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004850] PKRU: 
55555554
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004851] Call Trace:
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004853] <TASK>
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004856] 
seq_read_iter+0x169/0x420
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004862]  ? 
seq_read+0xad/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004867] 
seq_read+0xad/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004871] 
full_proxy_read+0x59/0x90
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004880] 
vfs_read+0xb2/0x2e0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004887] 
ksys_read+0x61/0xe0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004893] 
__x64_sys_read+0x1a/0x20
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004898] 
do_syscall_64+0x58/0x80
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004905]  ? 
common_interrupt+0x54/0xa0
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004911] 
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004917] RIP: 
0033:0x7f5501f14992
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004922] Code: c0 
e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 
77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004925] RSP: 
002b:00007ffef4b4ce18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004929] RAX: 
ffffffffffffffda RBX: 0000000000002000 RCX: 00007f5501f14992
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004931] RDX: 
0000000000002000 RSI: 000055f43da40160 RDI: 0000000000000003
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004933] RBP: 
000055f43da40160 R08: 000055f43da40fba R09: 000055f43da42180
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004935] R10: 
0000000000000077 R11: 0000000000000246 R12: 0000000000000e5a
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004937] R13: 
0000000000000003 R14: 000055f43da42180 R15: 0000000000000003
Nov  7 10:50:08 marvin-IdeaPad-3-15ITL6 kernel: [ 2369.004943] </TASK>

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: BISECTED: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak
       [not found]   ` <32da7e3f-8608-29a9-3787-1d4c3315d235@alu.unizg.hr>
  2022-11-07  9:53     ` Mirsad Goran Todorovac
@ 2022-11-08 14:21     ` Mirsad Goran Todorovac
  1 sibling, 0 replies; 6+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-08 14:21 UTC (permalink / raw)
  To: LKML
  Cc: Thorsten Leemhuis, Uladzislau Rezki, srinivas pandruvada,
	Catalin Marinas

[-- Attachment #1: Type: text/plain, Size: 17314 bytes --]

On 05. 11. 2022. 00:39, Mirsad Goran Todorovac wrote:

> On 05. 11. 2022. 00:00, srinivas pandruvada wrote:
>
>> On Fri, 2022-11-04 at 23:01 +0100, Mirsad Goran Todorovac wrote:
>>> Dear all,
>>>
>>> When investigating thermald kmemleak, it occurred that the "cat
>>> /sys/kernel/debug/kmemleak"
>>> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual
>>> amount
>>> of time.
>>>
>> This memleak occurred because of bad element in the BIOS configuration
>> of _OSC method.
>> Unfortunately fixing this in ACPI interpreter can have other side
>> effects as this is not expected. So the ACPICA team
>> (https://acpica.org/community) is looking further.
>>
>> Thanks,
>> Srinivas
>
> Dear Mr. Srinivas,
>
> Thank you for the update.
>
> However, the investigation on thermald memleak had lead to discovery 
> of the RCU stalls
> in the kmemleak itself. I felt it correct to include you in Cc: 
> because thermald caused the
> leaks in question. I suspected myself it is the problem specific to my 
> Lenovo Ideapad 3 Intel core
> i5 laptop's BIOS, because I couldn't reproduce it on the other 
> (desktop Lenovo) computer.
>
> Thank you very much,
> Mirsad
>
As the system exhibited expedited stalls after which NMI and Call trace 
followed, under test
load the bug also caused multimedia "hickups" (i.e. Youtube video sound 
repeating like on
a broken record), so I thought it would be interesting to search for the 
cause of this behaviour.

To cut the long stroy short, here is the bisect log and log of the 
symptoms from syslog:

Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.907434] audit: 
type=1400 audit(1667914644.748:80): apparmor="DENIED" 
operation="capable" 
profile="/snap/snapd/17336/usr/lib/snapd/snap-confine" pid=3481 
comm="snap-confine" capability=12  capname="net_admin"
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.907468] audit: 
type=1400 audit(1667914644.748:81): apparmor="DENIED" 
operation="capable" 
profile="/snap/snapd/17336/usr/lib/snapd/snap-confine" pid=3481 
comm="snap-confine" capability=38  capname="perfmon"
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 systemd[1]: 
tmp-snap.rootfs_riJKwP.mount: Deactivated successfully.
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977628] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 
jiffies s: 669 root: 0x1/.
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977647] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977652] Task dump 
for CPU 0:
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977655] 
task:cat             state:R  running task     stack:    0 pid: 3495 
ppid:  3032 flags:0x00000008
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977666] Call Trace:
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977669] <TASK>
--
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977717] R10: 
0000000000000022 R11: 0000000000000246 R12: 0000000000022000
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977720] R13: 
0000000000000003 R14: 0000000000020000 R15: 0000000000020000
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  122.977727] </TASK>
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053628] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 25 
jiffies s: 669 root: 0x1/.
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053644] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053648] Task dump 
for CPU 0:
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053650] 
task:cat             state:R  running task     stack:    0 pid: 3495 
ppid:  3032 flags:0x00000008
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053659] Call Trace:
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053661] <TASK>
--
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053705] R13: 
0000000000000003 R14: 0000000000020000 R15: 0000000000020000
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.053710] </TASK>
Nov  8 14:37:24 marvin-IdeaPad-3-15ITL6 kernel: [  123.087971] audit: 
type=1400 audit(1667914644.928:82): apparmor="DENIED" 
operation="capable" 
profile="/snap/snapd/17336/usr/lib/snapd/snap-confine" pid=3481 
comm="snap-confine" capability=4  capname="fsetid"
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177707] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-... } 6 
jiffies s: 685 root: 0x80/.
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177725] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177728] Task dump 
for CPU 7:
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177731] 
task:tail            state:R  running task     stack:    0 pid: 3511 
ppid:  3032 flags:0x00000008
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177739] Call Trace:
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177742] <TASK>
--
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177786] R10: 
0000000000000077 R11: 0000000000000246 R12: 0000000000000000
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177788] R13: 
0000000000000003 R14: 0000559256e46160 R15: 0000000000000003
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.177793] </TASK>
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253637] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-... } 25 
jiffies s: 685 root: 0x80/.
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253658] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253661] Task dump 
for CPU 7:
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253663] 
task:tail            state:R  running task     stack:    0 pid: 3511 
ppid:  3032 flags:0x00000008
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253671] Call Trace:
Nov  8 14:37:25 marvin-IdeaPad-3-15ITL6 kernel: [  123.253674] <TASK>
--
Nov  8 14:38:59 marvin-IdeaPad-3-15ITL6 thermald[6171]: optimized 
configuration on Linux with open source
Nov  8 14:38:59 marvin-IdeaPad-3-15ITL6 thermald[6171]: You may want to 
disable thermald on this system if you see issue
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 thermald[6171]: terminating on 
user request ..
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806259] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 6 
jiffies s: 705 root: 0x10/.
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806284] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806289] Task dump 
for CPU 4:
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806293] 
task:cat             state:R  running task     stack:    0 pid: 6184 
ppid:  3032 flags:0x00000008
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806304] Call Trace:
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806307] <TASK>
--
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806531]  ? 
asm_exc_page_fault+0x8/0x30
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806538]  ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov  8 14:39:00 marvin-IdeaPad-3-15ITL6 kernel: [  218.806547] </TASK>

marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/memleak$ cat 
kmemleak-bisect-20221108-1441.log
git bisect start
git bisect good f814957b018c867a2397bdd3c1816de7502d2c7f
# bad: [e1a8fde7203fa8a3e3f35d4f9df47477d23529c1] Merge tag 
'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block
git bisect bad e1a8fde7203fa8a3e3f35d4f9df47477d23529c1
# good: [0184f08e65348f39aa4e8a71927e4538515f4ac0] io_uring: add fully 
sparse buffer registration
git bisect good 0184f08e65348f39aa4e8a71927e4538515f4ac0
# good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches 
'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', 
'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b', 
'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
git bisect good be05ee54378d451e1d60196921566d6087f3079f
# good: [a4c76853609107f0e5e7b51571d966785fe89cb3] Merge branch 
'tcp-pass-back-data-left-in-socket-after-receive' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into 
for-5.19/io_uring-net
git bisect good a4c76853609107f0e5e7b51571d966785fe89cb3
# bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag 
'rcu.2022.05.19a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
# bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 
'exp.2022.05.11a' into HEAD
git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
# bad: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited 
grace period (GP) work to RT kthread_worker
git bisect bad 9621fbee44df940e2e1b94b0676460a538dffefa
# bad: [28b3ae426598e722cf5d5ab9cc7038791b955a56] rcu: Introduce 
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
git bisect bad 28b3ae426598e722cf5d5ab9cc7038791b955a56
# first bad commit: [28b3ae426598e722cf5d5ab9cc7038791b955a56] rcu: 
Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/memleak$

According to the Code of Conduct, I have Cc:-ed Mr. Uladzislau Rezki for 
he is the principal author of
the troublesome commit and Mr. Catalin Marinas as the listed author in 
kmemleak source file.

commit 28b3ae426598e722cf5d5ab9cc7038791b955a56 (HEAD, refs/bisect/bad)
Author: Uladzislau Rezki <uladzislau.rezki@sony.com>
Date:   Wed Feb 16 14:52:09 2022 +0100

     rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT

     Currently both expedited and regular grace period stall warnings use
     a single timeout value that with units of seconds.  However, recent
     Android use cases problem require a sub-100-millisecond expedited 
RCU CPU
     stall warning.  Given that expedited RCU grace periods normally 
complete
     in far less than a single millisecond, especially for small systems,
     this is not unreasonable.

     Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
     configuration that defaults to 20 msec on Android and remains the same
     as that of the non-expedited stall warnings otherwise.  It also can be
     changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout.

     [ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]

     Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
     Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

The problem persisted in the latest 6.1.0-rc4 release candidate in 
mainline Linux stable tree.
The only difference to mainline config is CONFIG_DEBUG_KMEMLEAK=y.

Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405239] rcu: 
INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 5-.... } 6 
jiffies s: 1037 root: 0x20/.
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405264] rcu: 
blocking rcu_node structures (internal RCU debug):
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405272] Sending 
NMI from CPU 1 to CPUs 5:
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405286] NMI 
backtrace for cpu 5
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405291] CPU: 5 
PID: 15254 Comm: tail Not tainted 6.1.0-rc4 #1
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405299] Hardware 
name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW 03/08/2022
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405302] RIP: 
0010:kmemleak_seq_start+0x41/0x80
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405315] Code: e5 
f5 a5 00 4c 63 e0 85 c0 78 40 e8 09 81 db ff 48 8b 05 d2 fe 88 01 4c 8d 
60 f8 48 3d b0 63 03 ad 75 17 eb 32 49 8b 44 24 08 <48>
  83 eb 01 4c 8d 60 f8 48 3d b0 63 03 ad 74 1d 48 85 db 7f e6 4c
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405321] RSP: 
0018:ffffa7e6242afc28 EFLAGS: 00000206
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405327] RAX: 
ffff8efb59033b58 RBX: 0000000000035af9 RCX: 0000000000000001
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405331] RDX: 
ffff8efb6143c000 RSI: ffff8efb41593be0 RDI: ffffffffad036340
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405334] RBP: 
ffffa7e6242afc38 R08: 0000000000002000 R09: ffff8efb44502f80
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405338] R10: 
0000000000020000 R11: 0000000000000000 R12: ffff8efb4a6b4ab0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405342] R13: 
ffff8efb6a637900 R14: ffff8efb41593bb8 R15: ffffa7e6242afe28
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405345] FS:  
00007f50bf8d4740(0000) GS:ffff8efce8940000(0000) knlGS:0000000000000000
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405350] CS:  0010 
DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405354] CR2: 
00005647c9879000 CR3: 000000023b99c004 CR4: 0000000000770ee0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405358] PKRU: 
55555554
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405360] Call Trace:
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405362] <TASK>
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405366] 
seq_read_iter+0x169/0x420
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405376] 
seq_read+0xad/0xe0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405383] 
full_proxy_read+0x59/0x90
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405394] 
vfs_read+0xb2/0x2e0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405403]  ? 
vfs_read+0xb2/0x2e0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405409]  ? 
__schedule+0xeb2/0x1300
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405420] 
ksys_read+0x61/0xe0
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405428] 
__x64_sys_read+0x1a/0x20
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405435] 
do_syscall_64+0x58/0x80
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405442]  ? 
exit_to_user_mode_prepare+0x32/0x190
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405451]  ? 
syscall_exit_to_user_mode+0x1b/0x30
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405456]  ? 
do_syscall_64+0x67/0x80
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405462]  ? 
do_syscall_64+0x67/0x80
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405468]  ? 
do_user_addr_fault+0x1eb/0x690
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405475]  ? 
exit_to_user_mode_prepare+0x32/0x190
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405481]  ? 
irqentry_exit_to_user_mode+0x9/0x20
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405486]  ? 
irqentry_exit+0x3b/0x50
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405490]  ? 
exc_page_fault+0x8e/0x190
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405495] 
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405502] RIP: 
0033:0x7f50bf714992
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405508] Code: c0 
e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48>
  3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405513] RSP: 
002b:00007ffc6487e708 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405518] RAX: 
ffffffffffffffda RBX: 0000000000002000 RCX: 00007f50bf714992
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405522] RDX: 
0000000000002000 RSI: 00005647c9878180 RDI: 0000000000000003
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405525] RBP: 
00005647c9878180 R08: 00005647c9876fd1 R09: 00005647c9878180
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405527] R10: 
0000000000000077 R11: 0000000000000246 R12: 0000000000000e71
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405530] R13: 
0000000000000003 R14: 00005647c9876160 R15: 0000000000000003
Nov  7 11:16:14 marvin-IdeaPad-3-15ITL6 kernel: [ 3935.405538] </TASK>

The reproduction requires stopping and restarting thermald quite a 
number of times (500+) until
it produces /sys/kernel/debug/kmemleak unreferenced object list.

Torture reading of this file through for a in {1..5000}; do cat 
kmemleak; tail -40 memleak; grep thermald memleak | wc -l; done
eventually causes spurious non-determinisic expedite stalls everyonce in 
a while.

Please find the build config attached:

I hope this helps someone.

Have a nice day, and have peace.

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

[-- Attachment #2: config.lrz --]
[-- Type: application/x-lrzip, Size: 56135 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot
  2022-11-04 22:01 INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak Mirsad Goran Todorovac
  2022-11-04 23:00 ` srinivas pandruvada
@ 2022-11-09 10:48 ` Thorsten Leemhuis
  2022-11-29  8:53   ` Thorsten Leemhuis
  1 sibling, 1 reply; 6+ messages in thread
From: Thorsten Leemhuis @ 2022-11-09 10:48 UTC (permalink / raw)
  To: regressions; +Cc: LKML

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 04.11.22 23:01, Mirsad Goran Todorovac wrote:
> 
> When investigating thermald kmemleak, it occurred that the "cat
> /sys/kernel/debug/kmemleak"
> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount
> of time.
> 
> Dmesg output showed expedited stalls that the commands caused NMIs and
> NMI backtraces:
> 
> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
> [ 8123.263528] NMI backtrace for cpu 0
> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
> 03/08/2022

#regzbot ^introduced 28b3ae426598e
#regzbot title rcu: thermald causes a kmemleak
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot
  2022-11-09 10:48 ` INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot Thorsten Leemhuis
@ 2022-11-29  8:53   ` Thorsten Leemhuis
  0 siblings, 0 replies; 6+ messages in thread
From: Thorsten Leemhuis @ 2022-11-29  8:53 UTC (permalink / raw)
  To: regressions; +Cc: LKML

On 09.11.22 11:48, Thorsten Leemhuis wrote:
> [Note: this mail is primarily send for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like a mailing lists. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]
>
> On 04.11.22 23:01, Mirsad Goran Todorovac wrote:
>>
>> When investigating thermald kmemleak, it occurred that the "cat
>> /sys/kernel/debug/kmemleak"
>> and "tail -20 /sys/kernel/debug/kmemleak" commands take unusual amount
>> of time.
>>
>> Dmesg output showed expedited stalls that the commands caused NMIs and
>> NMI backtraces:
>>
>> [ 8123.263464] rcu: INFO: rcu_preempt detected expedited stalls on
>> CPUs/tasks: { 0-.... } 26 jiffies s: 3829 root: 0x1/.
>> [ 8123.263500] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 8123.263508] Sending NMI from CPU 7 to CPUs 0:
>> [ 8123.263528] NMI backtrace for cpu 0
>> [ 8123.263539] CPU: 0 PID: 27898 Comm: cat Not tainted 6.1.0-rc3 #1
>> [ 8123.263552] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN34WW
>> 03/08/2022
> 
> #regzbot ^introduced 28b3ae426598e
> #regzbot title rcu: thermald causes a kmemleak
> #regzbot ignore-activity

#regzbot invalid: likely not a regression and just a odd config

For details see the discussion ending here:
https://lore.kernel.org/all/20221124201552.GV4001@paulmck-ThinkPad-P17-Gen-1/

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-29  8:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-04 22:01 INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak Mirsad Goran Todorovac
2022-11-04 23:00 ` srinivas pandruvada
     [not found]   ` <32da7e3f-8608-29a9-3787-1d4c3315d235@alu.unizg.hr>
2022-11-07  9:53     ` Mirsad Goran Todorovac
2022-11-08 14:21     ` INFO: BISECTED: " Mirsad Goran Todorovac
2022-11-09 10:48 ` INFO: rcu_preempt detected expedited stalls on CPUs/tasks (6.1.0-rc3): in cat /sys/kernel/debug/kmemleak #forregzbot Thorsten Leemhuis
2022-11-29  8:53   ` Thorsten Leemhuis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.