linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RCU stalls in squashfs_readahead()
@ 2022-11-17 19:21 Mirsad Goran Todorovac
  2022-11-17 23:05 ` BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead()) Mirsad Goran Todorovac
  0 siblings, 1 reply; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-17 19:21 UTC (permalink / raw)
  To: LKML; +Cc: Phillip Lougher, phillip.lougher, Thorsten Leemhuis

[-- Attachment #1: Type: text/plain, Size: 9192 bytes --]

Dear all,

I've noticed two similar RCU stalls with the similar call trace, both involving squashfs_readeahead().

Configuration is (excerpt):

CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=y
# CONFIG_KASAN_VMALLOC is not set
# CONFIG_KASAN_MODULE_TEST is not set

Do you think this is worthy of a bisect? I am not yet certain about being able to reproduce the stall, however it occurs at
great memory use and heavy swapping (I thought of experimenting with efficiency of multi-gen LRU in 6.1 with multimedia under heavy 
loads and low memory ...).

Here is the excerpt from dmesg:

[ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering kernel.perf_event_max_sample_rate to 62750
[ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
[ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
[ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
[ 1347.269329] NMI backtrace for cpu 3
[ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted 6.1.0-rc5 #1
[ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
[ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 48 
83 fa 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84 c0 75 30
[ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
[ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX: ffffffff8fca8434
[ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff888109280040
[ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09: ffffed1021250de8
[ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12: 0000000000000155
[ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15: ffff888109280000
[ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
[ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4: 0000000000770ee0
[ 1347.269479] PKRU: 55555554
[ 1347.269484] Call Trace:
[ 1347.269488]  <TASK>
[ 1347.269494]  lzma_main+0x3b4/0x12b0
[ 1347.269517]  lzma2_lzma+0x2dd/0x460
[ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
[ 1347.269549]  ? __asan_load4+0x8/0xb0
[ 1347.269564]  ? crc32_le_base+0x179/0x250
[ 1347.269581]  xz_dec_run+0x57b/0x11a0
[ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
[ 1347.269620]  squashfs_decompress+0x8e/0xd0
[ 1347.269633]  ? lzo_uncompress+0x400/0x400
[ 1347.269648]  squashfs_read_data+0x1e6/0x900
[ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
[ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
[ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
[ 1347.269710]  squashfs_readahead+0xa86/0xe70
[ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
[ 1347.269747]  ? squashfs_fill_page+0x190/0x190
[ 1347.269763]  ? psi_task_change+0x106/0x130
[ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
[ 1347.269796]  read_pages+0x12d/0x530
[ 1347.269814]  ? file_ra_state_init+0x60/0x60
[ 1347.269829]  ? xas_free_nodes+0x170/0x170
[ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
[ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
[ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
[ 1347.269898]  do_page_cache_ra+0x7c/0x90
[ 1347.269915]  page_cache_ra_order+0x3b4/0x420
[ 1347.269936]  filemap_fault+0x849/0xe40
[ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
[ 1347.269970]  __do_fault+0x76/0x1e0
[ 1347.269984]  do_fault+0x1e5/0x6c0
[ 1347.269998]  __handle_mm_fault+0x905/0x1740
[ 1347.270015]  ? copy_page_range+0x2130/0x2130
[ 1347.270028]  ? mas_find+0x100/0x100
[ 1347.270050]  handle_mm_fault+0x11c/0x3b0
[ 1347.270065]  do_user_addr_fault+0x261/0x8b0
[ 1347.270084]  exc_page_fault+0x61/0xf0
[ 1347.270101]  asm_exc_page_fault+0x27/0x30
[ 1347.270115] RIP: 0033:0x7efe544ce8b4
[ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48 c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 8d 
0d ec 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1 c5 f3 59
[ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
[ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX: 00007efe54542ea0
[ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI: 000000000000fd7b
[ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09: 00000000006f71c0
[ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12: 00007ffc019a7060
[ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15: 00007ffc019a6f40
[ 1347.270199]  </TASK>


[ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
[ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
[ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
[ 1534.865752] NMI backtrace for cpu 3
[ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted 6.1.0-rc5 #1
[ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
[ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 
e5 4c 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00 00 ff eb
[ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
[ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX: dffffc0000000000
[ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI: ffff888109280080
[ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09: 000000000001d380
[ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12: ffff888109280006
[ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15: 0000000002bbab0b
[ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
[ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4: 0000000000770ee0
[ 1534.865885] PKRU: 55555554
[ 1534.865889] Call Trace:
[ 1534.865893]  <TASK>
[ 1534.865898]  lzma_main+0x5a2/0x12b0
[ 1534.865920]  lzma2_lzma+0x2dd/0x460
[ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
[ 1534.865948]  ? crc32_le_base+0x58/0x250
[ 1534.865961]  ? crc32_le_base+0x1e3/0x250
[ 1534.865977]  xz_dec_run+0x57b/0x11a0
[ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
[ 1534.866011]  squashfs_decompress+0x8e/0xd0
[ 1534.866022]  ? lzo_uncompress+0x400/0x400
[ 1534.866034]  squashfs_read_data+0x1e6/0x900
[ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
[ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
[ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
[ 1534.866087]  squashfs_readahead+0xa86/0xe70
[ 1534.866099]  ? xas_create+0x141/0x600
[ 1534.866119]  ? squashfs_fill_page+0x190/0x190
[ 1534.866136]  ? psi_task_change+0x106/0x130
[ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
[ 1534.866171]  read_pages+0x12d/0x530
[ 1534.866186]  ? __kasan_check_read+0x11/0x20
[ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
[ 1534.866217]  ? file_ra_state_init+0x60/0x60
[ 1534.866231]  ? folio_add_lru+0x47/0x70
[ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
[ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
[ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
[ 1534.866297]  do_page_cache_ra+0x7c/0x90
[ 1534.866312]  page_cache_ra_order+0x3b4/0x420
[ 1534.866330]  filemap_fault+0x849/0xe40
[ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
[ 1534.866359]  __do_fault+0x76/0x1e0
[ 1534.866372]  do_fault+0x1e5/0x6c0
[ 1534.866384]  __handle_mm_fault+0x905/0x1740
[ 1534.866399]  ? copy_page_range+0x2130/0x2130
[ 1534.866412]  ? mas_find+0x100/0x100
[ 1534.866428]  handle_mm_fault+0x11c/0x3b0
[ 1534.866443]  do_user_addr_fault+0x261/0x8b0
[ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
[ 1534.866472]  exc_page_fault+0x61/0xf0
[ 1534.866486]  asm_exc_page_fault+0x27/0x30
[ 1534.866498] RIP: 0033:0x7fe93673dd00
[ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
[ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
[ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fe837100098
[ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12: 00007fff95860b2e
[ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15: 00007fe923efef80
[ 1534.866576]  </TASK>

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

[-- Attachment #2: config-6.1.0-rc5.xz --]
[-- Type: application/x-xz, Size: 56696 bytes --]

[-- Attachment #3: dmesg.log.xz --]
[-- Type: application/x-xz, Size: 26132 bytes --]

[-- Attachment #4: lshw.txt.xz --]
[-- Type: application/x-xz, Size: 5256 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-17 19:21 RCU stalls in squashfs_readahead() Mirsad Goran Todorovac
@ 2022-11-17 23:05 ` Mirsad Goran Todorovac
  2022-11-17 23:40   ` Mirsad Goran Todorovac
  2022-11-18  6:11   ` Phillip Lougher
  0 siblings, 2 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-17 23:05 UTC (permalink / raw)
  To: LKML; +Cc: Phillip Lougher, phillip.lougher, Thorsten Leemhuis

[-- Attachment #1: Type: text/plain, Size: 13087 bytes --]

Hi,

While trying to bisect, I've found another bug that predated the introduction of squashfs_readahead(), but it has
a common denominator in squashfs_decompress() and squashfs_xz_uncompress().

Frankly, I need an advice on how to handle a situation like this.

Obviously, the best place to test the RCU problem with squashfs_readahead() is where it is introduced + Phillip's patches [012]/3.

Thanks,
Mirsad

P.S.

This is excerpt from dmesg log attached:

[   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
[   25.338202] rcu: blocking rcu_node structures (internal RCU debug):
[   25.338210] Task dump for CPU 4:
[   25.338216] task:canonical-livep state:R  running task     stack:    0 pid:  913 ppid:     1 flags:0x00004008
[   25.338237] Call Trace:
[   25.338243]  <TASK>
[   25.338249]  ? __kasan_check_write+0x14/0x20
[   25.338272]  ? _raw_spin_lock_irqsave+0x93/0xf0
[   25.338293]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   25.338311]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
[   25.338327]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   25.338342]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   25.338359]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
[   25.338377]  ? lzma_len+0x1fb/0x3b0
[   25.338395]  ? dict_repeat+0xec/0x120
[   25.338413]  ? lzma_main+0x10d/0x1260
[   25.338437]  ? lzma2_lzma+0x2a4/0x3b0
[   25.338457]  ? xz_dec_lzma2_run+0x11f/0xb90
[   25.338483]  ? xz_dec_run+0x346/0x11e0
[   25.338506]  ? squashfs_xz_uncompress+0x135/0x330
[   25.338530]  ? lzo_init+0xd0/0xd0
[   25.338546]  ? squashfs_decompress+0x88/0xc0
[   25.338565]  ? squashfs_read_data+0x1e5/0x8e0
[   25.338582]  ? __filemap_get_folio+0x37e/0x540
[   25.338602]  ? squashfs_bio_read.isra.2+0x220/0x220
[   25.338628]  ? squashfs_readpage_block+0x493/0x950
[   25.338650]  ? squashfs_read_folio+0xa4c/0x1140
[   25.338671]  ? xas_nomem+0x2f/0x100
[   25.338689]  ? squashfs_copy_cache+0x360/0x360
[   25.338707]  ? folio_wait_bit+0x3c0/0x3c0
[   25.338725]  ? __kasan_check_read+0x11/0x20
[   25.338739]  ? pagevec_add_and_need_flush+0x8c/0xa0
[   25.338757]  ? folio_add_lru+0x42/0x70
[   25.338772]  ? filemap_add_folio+0xd4/0x130
[   25.338788]  ? add_to_page_cache_locked+0xa0/0xa0
[   25.338806]  ? filemap_read_folio.isra.62+0xbe/0x4d0
[   25.338822]  ? __kasan_check_write+0x14/0x20
[   25.338838]  ? filemap_page_mkwrite+0x6a0/0x6a0
[   25.338856]  ? filemap_add_folio+0x130/0x130
[   25.338879]  ? filemap_fault+0x1041/0x1310
[   25.338895]  ? __kasan_check_read+0x11/0x20
[   25.338918]  ? read_cache_page_gfp+0xd0/0xd0
[   25.338932]  ? __rcu_read_unlock+0x55/0x260
[   25.338953]  ? try_to_wake_up+0x3c2/0xa10
[   25.338972]  ? __filemap_get_folio+0x540/0x540
[   25.338985]  ? __pmd+0x10/0x10
[   25.339002]  ? __do_fault+0x7c/0x1b0
[   25.339019]  ? do_fault+0x1ce/0x6a0
[   25.339036]  ? __handle_mm_fault+0x9cb/0x1490
[   25.339057]  ? copy_page_range+0x1b90/0x1b90
[   25.339073]  ? kernel_fpu_begin_mask+0x160/0x160
[   25.339092]  ? __ia32_sys_futex_time32+0x300/0x300
[   25.339119]  ? handle_mm_fault+0x11b/0x380
[   25.339138]  ? do_user_addr_fault+0x258/0x810
[   25.339157]  ? exc_page_fault+0x60/0xe0
[   25.339173]  ? asm_exc_page_fault+0x2c/0x40
[   25.339194]  </TASK>



On 17. 11. 2022. 20:21, Mirsad Goran Todorovac wrote:
> Dear all,
> 
> I've noticed two similar RCU stalls with the similar call trace, both involving squashfs_readeahead().
> 
> Configuration is (excerpt):
> 
> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_OUTLINE=y
> # CONFIG_KASAN_INLINE is not set
> CONFIG_KASAN_STACK=y
> # CONFIG_KASAN_VMALLOC is not set
> # CONFIG_KASAN_MODULE_TEST is not set
> 
> Do you think this is worthy of a bisect? I am not yet certain about being able to reproduce the stall, however it occurs at
> great memory use and heavy swapping (I thought of experimenting with efficiency of multi-gen LRU in 6.1 with multimedia under heavy 
> loads and low memory ...).
> 
> Here is the excerpt from dmesg:
> 
> [ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering kernel.perf_event_max_sample_rate to 62750
> [ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
> [ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
> [ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
> [ 1347.269329] NMI backtrace for cpu 3
> [ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted 6.1.0-rc5 #1
> [ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
> [ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 48 
> 83 fa 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84 c0 75 30
> [ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
> [ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX: ffffffff8fca8434
> [ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff888109280040
> [ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09: ffffed1021250de8
> [ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12: 0000000000000155
> [ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15: ffff888109280000
> [ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
> [ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4: 0000000000770ee0
> [ 1347.269479] PKRU: 55555554
> [ 1347.269484] Call Trace:
> [ 1347.269488]  <TASK>
> [ 1347.269494]  lzma_main+0x3b4/0x12b0
> [ 1347.269517]  lzma2_lzma+0x2dd/0x460
> [ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
> [ 1347.269549]  ? __asan_load4+0x8/0xb0
> [ 1347.269564]  ? crc32_le_base+0x179/0x250
> [ 1347.269581]  xz_dec_run+0x57b/0x11a0
> [ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
> [ 1347.269620]  squashfs_decompress+0x8e/0xd0
> [ 1347.269633]  ? lzo_uncompress+0x400/0x400
> [ 1347.269648]  squashfs_read_data+0x1e6/0x900
> [ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
> [ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
> [ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
> [ 1347.269710]  squashfs_readahead+0xa86/0xe70
> [ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
> [ 1347.269747]  ? squashfs_fill_page+0x190/0x190
> [ 1347.269763]  ? psi_task_change+0x106/0x130
> [ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
> [ 1347.269796]  read_pages+0x12d/0x530
> [ 1347.269814]  ? file_ra_state_init+0x60/0x60
> [ 1347.269829]  ? xas_free_nodes+0x170/0x170
> [ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
> [ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
> [ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
> [ 1347.269898]  do_page_cache_ra+0x7c/0x90
> [ 1347.269915]  page_cache_ra_order+0x3b4/0x420
> [ 1347.269936]  filemap_fault+0x849/0xe40
> [ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
> [ 1347.269970]  __do_fault+0x76/0x1e0
> [ 1347.269984]  do_fault+0x1e5/0x6c0
> [ 1347.269998]  __handle_mm_fault+0x905/0x1740
> [ 1347.270015]  ? copy_page_range+0x2130/0x2130
> [ 1347.270028]  ? mas_find+0x100/0x100
> [ 1347.270050]  handle_mm_fault+0x11c/0x3b0
> [ 1347.270065]  do_user_addr_fault+0x261/0x8b0
> [ 1347.270084]  exc_page_fault+0x61/0xf0
> [ 1347.270101]  asm_exc_page_fault+0x27/0x30
> [ 1347.270115] RIP: 0033:0x7efe544ce8b4
> [ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48 c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 8d 
> 0d ec 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1 c5 f3 59
> [ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
> [ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX: 00007efe54542ea0
> [ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI: 000000000000fd7b
> [ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09: 00000000006f71c0
> [ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12: 00007ffc019a7060
> [ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15: 00007ffc019a6f40
> [ 1347.270199]  </TASK>
> 
> 
> [ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
> [ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
> [ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
> [ 1534.865752] NMI backtrace for cpu 3
> [ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted 6.1.0-rc5 #1
> [ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
> [ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 
> e5 4c 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00 00 ff eb
> [ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
> [ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX: dffffc0000000000
> [ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI: ffff888109280080
> [ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09: 000000000001d380
> [ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12: ffff888109280006
> [ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15: 0000000002bbab0b
> [ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
> [ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4: 0000000000770ee0
> [ 1534.865885] PKRU: 55555554
> [ 1534.865889] Call Trace:
> [ 1534.865893]  <TASK>
> [ 1534.865898]  lzma_main+0x5a2/0x12b0
> [ 1534.865920]  lzma2_lzma+0x2dd/0x460
> [ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
> [ 1534.865948]  ? crc32_le_base+0x58/0x250
> [ 1534.865961]  ? crc32_le_base+0x1e3/0x250
> [ 1534.865977]  xz_dec_run+0x57b/0x11a0
> [ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
> [ 1534.866011]  squashfs_decompress+0x8e/0xd0
> [ 1534.866022]  ? lzo_uncompress+0x400/0x400
> [ 1534.866034]  squashfs_read_data+0x1e6/0x900
> [ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
> [ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
> [ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
> [ 1534.866087]  squashfs_readahead+0xa86/0xe70
> [ 1534.866099]  ? xas_create+0x141/0x600
> [ 1534.866119]  ? squashfs_fill_page+0x190/0x190
> [ 1534.866136]  ? psi_task_change+0x106/0x130
> [ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
> [ 1534.866171]  read_pages+0x12d/0x530
> [ 1534.866186]  ? __kasan_check_read+0x11/0x20
> [ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
> [ 1534.866217]  ? file_ra_state_init+0x60/0x60
> [ 1534.866231]  ? folio_add_lru+0x47/0x70
> [ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
> [ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
> [ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
> [ 1534.866297]  do_page_cache_ra+0x7c/0x90
> [ 1534.866312]  page_cache_ra_order+0x3b4/0x420
> [ 1534.866330]  filemap_fault+0x849/0xe40
> [ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
> [ 1534.866359]  __do_fault+0x76/0x1e0
> [ 1534.866372]  do_fault+0x1e5/0x6c0
> [ 1534.866384]  __handle_mm_fault+0x905/0x1740
> [ 1534.866399]  ? copy_page_range+0x2130/0x2130
> [ 1534.866412]  ? mas_find+0x100/0x100
> [ 1534.866428]  handle_mm_fault+0x11c/0x3b0
> [ 1534.866443]  do_user_addr_fault+0x261/0x8b0
> [ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
> [ 1534.866472]  exc_page_fault+0x61/0xf0
> [ 1534.866486]  asm_exc_page_fault+0x27/0x30
> [ 1534.866498] RIP: 0033:0x7fe93673dd00
> [ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
> [ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
> [ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> [ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
> [ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fe837100098
> [ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12: 00007fff95860b2e
> [ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15: 00007fe923efef80
> [ 1534.866576]  </TASK>
> 
> Thank you,
> Mirsad
> 
> -- 
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

[-- Attachment #2: dmesg-5.19-rc1.log.xz --]
[-- Type: application/x-xz, Size: 27456 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-17 23:05 ` BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead()) Mirsad Goran Todorovac
@ 2022-11-17 23:40   ` Mirsad Goran Todorovac
  2022-11-18  6:11   ` Phillip Lougher
  1 sibling, 0 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-17 23:40 UTC (permalink / raw)
  To: LKML; +Cc: Phillip Lougher, phillip.lougher, Thorsten Leemhuis

Dear Mr. Lougher,

On 18. 11. 2022. 00:05, Mirsad Goran Todorovac wrote:
> Hi,
> 
> While trying to bisect, I've found another bug that predated the introduction of squashfs_readahead(), but it has
> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> 
> Frankly, I need an advice on how to handle a situation like this.
> 
> Obviously, the best place to test the RCU problem with squashfs_readahead() is where it is introduced + Phillip's patches [012]/3.
> 
> Thanks,
> Mirsad
> 
> P.S.
> 
> This is excerpt from dmesg log attached:
> 
> [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> [   25.338202] rcu: blocking rcu_node structures (internal RCU debug):
> [   25.338210] Task dump for CPU 4:
> [   25.338216] task:canonical-livep state:R  running task     stack:    0 pid:  913 ppid:     1 flags:0x00004008
> [   25.338237] Call Trace:
> [   25.338243]  <TASK>
> [   25.338249]  ? __kasan_check_write+0x14/0x20
> [   25.338272]  ? _raw_spin_lock_irqsave+0x93/0xf0
> [   25.338293]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338311]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> [   25.338327]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338342]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338359]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> [   25.338377]  ? lzma_len+0x1fb/0x3b0
> [   25.338395]  ? dict_repeat+0xec/0x120
> [   25.338413]  ? lzma_main+0x10d/0x1260
> [   25.338437]  ? lzma2_lzma+0x2a4/0x3b0
> [   25.338457]  ? xz_dec_lzma2_run+0x11f/0xb90
> [   25.338483]  ? xz_dec_run+0x346/0x11e0
> [   25.338506]  ? squashfs_xz_uncompress+0x135/0x330
> [   25.338530]  ? lzo_init+0xd0/0xd0
> [   25.338546]  ? squashfs_decompress+0x88/0xc0
> [   25.338565]  ? squashfs_read_data+0x1e5/0x8e0
> [   25.338582]  ? __filemap_get_folio+0x37e/0x540
> [   25.338602]  ? squashfs_bio_read.isra.2+0x220/0x220
> [   25.338628]  ? squashfs_readpage_block+0x493/0x950
> [   25.338650]  ? squashfs_read_folio+0xa4c/0x1140
> [   25.338671]  ? xas_nomem+0x2f/0x100
> [   25.338689]  ? squashfs_copy_cache+0x360/0x360
> [   25.338707]  ? folio_wait_bit+0x3c0/0x3c0
> [   25.338725]  ? __kasan_check_read+0x11/0x20
> [   25.338739]  ? pagevec_add_and_need_flush+0x8c/0xa0
> [   25.338757]  ? folio_add_lru+0x42/0x70
> [   25.338772]  ? filemap_add_folio+0xd4/0x130
> [   25.338788]  ? add_to_page_cache_locked+0xa0/0xa0
> [   25.338806]  ? filemap_read_folio.isra.62+0xbe/0x4d0
> [   25.338822]  ? __kasan_check_write+0x14/0x20
> [   25.338838]  ? filemap_page_mkwrite+0x6a0/0x6a0
> [   25.338856]  ? filemap_add_folio+0x130/0x130
> [   25.338879]  ? filemap_fault+0x1041/0x1310
> [   25.338895]  ? __kasan_check_read+0x11/0x20
> [   25.338918]  ? read_cache_page_gfp+0xd0/0xd0
> [   25.338932]  ? __rcu_read_unlock+0x55/0x260
> [   25.338953]  ? try_to_wake_up+0x3c2/0xa10
> [   25.338972]  ? __filemap_get_folio+0x540/0x540
> [   25.338985]  ? __pmd+0x10/0x10
> [   25.339002]  ? __do_fault+0x7c/0x1b0
> [   25.339019]  ? do_fault+0x1ce/0x6a0
> [   25.339036]  ? __handle_mm_fault+0x9cb/0x1490
> [   25.339057]  ? copy_page_range+0x1b90/0x1b90
> [   25.339073]  ? kernel_fpu_begin_mask+0x160/0x160
> [   25.339092]  ? __ia32_sys_futex_time32+0x300/0x300
> [   25.339119]  ? handle_mm_fault+0x11b/0x380
> [   25.339138]  ? do_user_addr_fault+0x258/0x810
> [   25.339157]  ? exc_page_fault+0x60/0xe0
> [   25.339173]  ? asm_exc_page_fault+0x2c/0x40
> [   25.339194]  </TASK>
> 
> 
> 
> On 17. 11. 2022. 20:21, Mirsad Goran Todorovac wrote:
>> Dear all,
>>
>> I've noticed two similar RCU stalls with the similar call trace, both involving squashfs_readeahead().
>>
>> Configuration is (excerpt):
>>
>> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
>> CONFIG_HAVE_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
>> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
>> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
>> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
>> CONFIG_HAVE_ARCH_KASAN=y
>> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
>> CONFIG_CC_HAS_KASAN_GENERIC=y
>> CONFIG_KASAN=y
>> CONFIG_KASAN_GENERIC=y
>> CONFIG_KASAN_OUTLINE=y
>> # CONFIG_KASAN_INLINE is not set
>> CONFIG_KASAN_STACK=y
>> # CONFIG_KASAN_VMALLOC is not set
>> # CONFIG_KASAN_MODULE_TEST is not set
>>
>> Do you think this is worthy of a bisect? I am not yet certain about being able to reproduce the stall, however it occurs at
>> great memory use and heavy swapping (I thought of experimenting with efficiency of multi-gen LRU in 6.1 with multimedia under 
>> heavy loads and low memory ...).
>>
>> Here is the excerpt from dmesg:
>>
>> [ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering kernel.perf_event_max_sample_rate to 62750
>> [ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
>> [ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
>> [ 1347.269329] NMI backtrace for cpu 3
>> [ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted 6.1.0-rc5 #1
>> [ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
>> [ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
>> [ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 
>> 48 83 fa 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84 c0 75 30
>> [ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
>> [ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX: ffffffff8fca8434
>> [ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff888109280040
>> [ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09: ffffed1021250de8
>> [ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12: 0000000000000155
>> [ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15: ffff888109280000
>> [ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
>> [ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4: 0000000000770ee0
>> [ 1347.269479] PKRU: 55555554
>> [ 1347.269484] Call Trace:
>> [ 1347.269488]  <TASK>
>> [ 1347.269494]  lzma_main+0x3b4/0x12b0
>> [ 1347.269517]  lzma2_lzma+0x2dd/0x460
>> [ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
>> [ 1347.269549]  ? __asan_load4+0x8/0xb0
>> [ 1347.269564]  ? crc32_le_base+0x179/0x250
>> [ 1347.269581]  xz_dec_run+0x57b/0x11a0
>> [ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
>> [ 1347.269620]  squashfs_decompress+0x8e/0xd0
>> [ 1347.269633]  ? lzo_uncompress+0x400/0x400
>> [ 1347.269648]  squashfs_read_data+0x1e6/0x900
>> [ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
>> [ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
>> [ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
>> [ 1347.269710]  squashfs_readahead+0xa86/0xe70
>> [ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
>> [ 1347.269747]  ? squashfs_fill_page+0x190/0x190
>> [ 1347.269763]  ? psi_task_change+0x106/0x130
>> [ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
>> [ 1347.269796]  read_pages+0x12d/0x530
>> [ 1347.269814]  ? file_ra_state_init+0x60/0x60
>> [ 1347.269829]  ? xas_free_nodes+0x170/0x170
>> [ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
>> [ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
>> [ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
>> [ 1347.269898]  do_page_cache_ra+0x7c/0x90
>> [ 1347.269915]  page_cache_ra_order+0x3b4/0x420
>> [ 1347.269936]  filemap_fault+0x849/0xe40
>> [ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
>> [ 1347.269970]  __do_fault+0x76/0x1e0
>> [ 1347.269984]  do_fault+0x1e5/0x6c0
>> [ 1347.269998]  __handle_mm_fault+0x905/0x1740
>> [ 1347.270015]  ? copy_page_range+0x2130/0x2130
>> [ 1347.270028]  ? mas_find+0x100/0x100
>> [ 1347.270050]  handle_mm_fault+0x11c/0x3b0
>> [ 1347.270065]  do_user_addr_fault+0x261/0x8b0
>> [ 1347.270084]  exc_page_fault+0x61/0xf0
>> [ 1347.270101]  asm_exc_page_fault+0x27/0x30
>> [ 1347.270115] RIP: 0033:0x7efe544ce8b4
>> [ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48 c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 
>> 8d 0d ec 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1 c5 f3 59
>> [ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
>> [ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX: 00007efe54542ea0
>> [ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI: 000000000000fd7b
>> [ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09: 00000000006f71c0
>> [ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12: 00007ffc019a7060
>> [ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15: 00007ffc019a6f40
>> [ 1347.270199]  </TASK>
>>
>>
>> [ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
>> [ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
>> [ 1534.865752] NMI backtrace for cpu 3
>> [ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted 6.1.0-rc5 #1
>> [ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
>> [ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
>> [ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 
>> 89 e5 4c 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00 00 ff eb
>> [ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
>> [ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX: dffffc0000000000
>> [ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI: ffff888109280080
>> [ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09: 000000000001d380
>> [ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12: ffff888109280006
>> [ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15: 0000000002bbab0b
>> [ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
>> [ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4: 0000000000770ee0
>> [ 1534.865885] PKRU: 55555554
>> [ 1534.865889] Call Trace:
>> [ 1534.865893]  <TASK>
>> [ 1534.865898]  lzma_main+0x5a2/0x12b0
>> [ 1534.865920]  lzma2_lzma+0x2dd/0x460
>> [ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
>> [ 1534.865948]  ? crc32_le_base+0x58/0x250
>> [ 1534.865961]  ? crc32_le_base+0x1e3/0x250
>> [ 1534.865977]  xz_dec_run+0x57b/0x11a0
>> [ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
>> [ 1534.866011]  squashfs_decompress+0x8e/0xd0
>> [ 1534.866022]  ? lzo_uncompress+0x400/0x400
>> [ 1534.866034]  squashfs_read_data+0x1e6/0x900
>> [ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
>> [ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
>> [ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
>> [ 1534.866087]  squashfs_readahead+0xa86/0xe70
>> [ 1534.866099]  ? xas_create+0x141/0x600
>> [ 1534.866119]  ? squashfs_fill_page+0x190/0x190
>> [ 1534.866136]  ? psi_task_change+0x106/0x130
>> [ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
>> [ 1534.866171]  read_pages+0x12d/0x530
>> [ 1534.866186]  ? __kasan_check_read+0x11/0x20
>> [ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
>> [ 1534.866217]  ? file_ra_state_init+0x60/0x60
>> [ 1534.866231]  ? folio_add_lru+0x47/0x70
>> [ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
>> [ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
>> [ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
>> [ 1534.866297]  do_page_cache_ra+0x7c/0x90
>> [ 1534.866312]  page_cache_ra_order+0x3b4/0x420
>> [ 1534.866330]  filemap_fault+0x849/0xe40
>> [ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
>> [ 1534.866359]  __do_fault+0x76/0x1e0
>> [ 1534.866372]  do_fault+0x1e5/0x6c0
>> [ 1534.866384]  __handle_mm_fault+0x905/0x1740
>> [ 1534.866399]  ? copy_page_range+0x2130/0x2130
>> [ 1534.866412]  ? mas_find+0x100/0x100
>> [ 1534.866428]  handle_mm_fault+0x11c/0x3b0
>> [ 1534.866443]  do_user_addr_fault+0x261/0x8b0
>> [ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
>> [ 1534.866472]  exc_page_fault+0x61/0xf0
>> [ 1534.866486]  asm_exc_page_fault+0x27/0x30
>> [ 1534.866498] RIP: 0033:0x7fe93673dd00
>> [ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
>> [ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
>> [ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> [ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
>> [ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fe837100098
>> [ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12: 00007fff95860b2e
>> [ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15: 00007fe923efef80
>> [ 1534.866576]  </TASK>

To what exact state of code should your patches phillip_[012]_3.patch be applied?
In https://lore.kernel.org/lkml/20221020223616.7571-3-phillip@squashfs.org.uk/#Z31fs:squashfs:file.c
you say that it is a patch that remedied the commit 8fc78b6fe24c:

Fixes: 8fc78b6fe24c ("squashfs: implement readahead")

However, the patch does not apply to that commit:

marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/linux_stable_build$ git apply --ignore-whitespace ../phillip_0_3.patch
error: patch failed: fs/squashfs/file.c:593
error: fs/squashfs/file.c: patch does not apply
error: patch failed: fs/squashfs/page_actor.c:71
error: fs/squashfs/page_actor.c: patch does not apply
error: patch failed: fs/squashfs/page_actor.h:29
error: fs/squashfs/page_actor.h: patch does not apply

marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/linux_stable_build$ git show | head
commit 8fc78b6fe24c36b151ac98d7546591ed92083d4f
Author: Hsin-Yi Wang <hsinyi@chromium.org>
Date:   Fri Jun 17 16:38:13 2022 +0800

     squashfs: implement readahead

     Implement readahead callback for squashfs.  It will read datablocks which
     cover pages in readahead request.  For a few cases it will not mark page
     as uptodate, including:

marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/linux_stable_build$

I need this information for bisecting the RCU stall bug in squashfs_readahead(),
to see whether it happens immediately. I need a lower barrier for "git bisect".

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-17 23:05 ` BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead()) Mirsad Goran Todorovac
  2022-11-17 23:40   ` Mirsad Goran Todorovac
@ 2022-11-18  6:11   ` Phillip Lougher
  2022-11-18 15:17     ` Mirsad Goran Todorovac
                       ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Phillip Lougher @ 2022-11-18  6:11 UTC (permalink / raw)
  To: Mirsad Goran Todorovac, LKML, Paul E. McKenney
  Cc: phillip.lougher, Thorsten Leemhuis

On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> Hi,
> 
> While trying to bisect, I've found another bug that predated the 
> introduction of squashfs_readahead(), but it has
> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().

Wrong, the stall is happening in the XZ decompressor library, which
is *not* in Squashfs.

This reported stall in the decompressor code is likely a symptom of you
deliberately thrashing your system.  When the system thrashes everything
starts to happen very slowly, and the system will spend a lot of
its time doing page I/O, and the CPU will spend a lot of time in
any CPU intensive code like the XZ decompressor library.

So the fact the stall is being hit here is a symptom and not
a cause.  The decompressor code is likely running slowly due to
thrashing and waiting on paged-out buffers.  This is not indicative
of any bug, merely a system running slowly due to overload.

As I said, this is not a Squashfs issue, because the code when the
stall takes place isn't in Squashfs.

The people responsible for the rcu code should have a lot more insight
about what happens when the system is thrashing, and how this will
throw up false positives.  In this I believe this is an instance of
perfectly correct code running slowly due to thrashing incorrectly
being flagged as looping.

CC'ing Paul E. McKenney <paulmck@kernel.org>

Phillip



> 
> Frankly, I need an advice on how to handle a situation like this.
> 
> Obviously, the best place to test the RCU problem with 
> squashfs_readahead() is where it is introduced + Phillip's patches [012]/3.
> 
> Thanks,
> Mirsad
> 
> P.S.
> 
> This is excerpt from dmesg log attached:
> 
> [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on 
> CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> [   25.338202] rcu: blocking rcu_node structures (internal RCU debug):
> [   25.338210] Task dump for CPU 4:
> [   25.338216] task:canonical-livep state:R  running task     stack:    
> 0 pid:  913 ppid:     1 flags:0x00004008
> [   25.338237] Call Trace:
> [   25.338243]  <TASK>
> [   25.338249]  ? __kasan_check_write+0x14/0x20
> [   25.338272]  ? _raw_spin_lock_irqsave+0x93/0xf0
> [   25.338293]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338311]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> [   25.338327]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338342]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   25.338359]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> [   25.338377]  ? lzma_len+0x1fb/0x3b0
> [   25.338395]  ? dict_repeat+0xec/0x120
> [   25.338413]  ? lzma_main+0x10d/0x1260
> [   25.338437]  ? lzma2_lzma+0x2a4/0x3b0
> [   25.338457]  ? xz_dec_lzma2_run+0x11f/0xb90
> [   25.338483]  ? xz_dec_run+0x346/0x11e0
> [   25.338506]  ? squashfs_xz_uncompress+0x135/0x330
> [   25.338530]  ? lzo_init+0xd0/0xd0
> [   25.338546]  ? squashfs_decompress+0x88/0xc0
> [   25.338565]  ? squashfs_read_data+0x1e5/0x8e0
> [   25.338582]  ? __filemap_get_folio+0x37e/0x540
> [   25.338602]  ? squashfs_bio_read.isra.2+0x220/0x220
> [   25.338628]  ? squashfs_readpage_block+0x493/0x950
> [   25.338650]  ? squashfs_read_folio+0xa4c/0x1140
> [   25.338671]  ? xas_nomem+0x2f/0x100
> [   25.338689]  ? squashfs_copy_cache+0x360/0x360
> [   25.338707]  ? folio_wait_bit+0x3c0/0x3c0
> [   25.338725]  ? __kasan_check_read+0x11/0x20
> [   25.338739]  ? pagevec_add_and_need_flush+0x8c/0xa0
> [   25.338757]  ? folio_add_lru+0x42/0x70
> [   25.338772]  ? filemap_add_folio+0xd4/0x130
> [   25.338788]  ? add_to_page_cache_locked+0xa0/0xa0
> [   25.338806]  ? filemap_read_folio.isra.62+0xbe/0x4d0
> [   25.338822]  ? __kasan_check_write+0x14/0x20
> [   25.338838]  ? filemap_page_mkwrite+0x6a0/0x6a0
> [   25.338856]  ? filemap_add_folio+0x130/0x130
> [   25.338879]  ? filemap_fault+0x1041/0x1310
> [   25.338895]  ? __kasan_check_read+0x11/0x20
> [   25.338918]  ? read_cache_page_gfp+0xd0/0xd0
> [   25.338932]  ? __rcu_read_unlock+0x55/0x260
> [   25.338953]  ? try_to_wake_up+0x3c2/0xa10
> [   25.338972]  ? __filemap_get_folio+0x540/0x540
> [   25.338985]  ? __pmd+0x10/0x10
> [   25.339002]  ? __do_fault+0x7c/0x1b0
> [   25.339019]  ? do_fault+0x1ce/0x6a0
> [   25.339036]  ? __handle_mm_fault+0x9cb/0x1490
> [   25.339057]  ? copy_page_range+0x1b90/0x1b90
> [   25.339073]  ? kernel_fpu_begin_mask+0x160/0x160
> [   25.339092]  ? __ia32_sys_futex_time32+0x300/0x300
> [   25.339119]  ? handle_mm_fault+0x11b/0x380
> [   25.339138]  ? do_user_addr_fault+0x258/0x810
> [   25.339157]  ? exc_page_fault+0x60/0xe0
> [   25.339173]  ? asm_exc_page_fault+0x2c/0x40
> [   25.339194]  </TASK>
> 
> 
> 
> On 17. 11. 2022. 20:21, Mirsad Goran Todorovac wrote:
>> Dear all,
>>
>> I've noticed two similar RCU stalls with the similar call trace, both 
>> involving squashfs_readeahead().
>>
>> Configuration is (excerpt):
>>
>> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
>> CONFIG_HAVE_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
>> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
>> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
>> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
>> CONFIG_HAVE_ARCH_KASAN=y
>> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
>> CONFIG_CC_HAS_KASAN_GENERIC=y
>> CONFIG_KASAN=y
>> CONFIG_KASAN_GENERIC=y
>> CONFIG_KASAN_OUTLINE=y
>> # CONFIG_KASAN_INLINE is not set
>> CONFIG_KASAN_STACK=y
>> # CONFIG_KASAN_VMALLOC is not set
>> # CONFIG_KASAN_MODULE_TEST is not set
>>
>> Do you think this is worthy of a bisect? I am not yet certain about 
>> being able to reproduce the stall, however it occurs at
>> great memory use and heavy swapping (I thought of experimenting with 
>> efficiency of multi-gen LRU in 6.1 with multimedia under heavy loads 
>> and low memory ...).
>>
>> Here is the excerpt from dmesg:
>>
>> [ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering 
>> kernel.perf_event_max_sample_rate to 62750
>> [ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on 
>> CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
>> [ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
>> [ 1347.269329] NMI backtrace for cpu 3
>> [ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted 
>> 6.1.0-rc5 #1
>> [ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 
>> 07/21/2022
>> [ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
>> [ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 
>> 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 48 83 fa 
>> 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84 c0 
>> 75 30
>> [ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
>> [ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX: 
>> ffffffff8fca8434
>> [ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 
>> ffff888109280040
>> [ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09: 
>> ffffed1021250de8
>> [ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12: 
>> 0000000000000155
>> [ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15: 
>> ffff888109280000
>> [ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000) 
>> knlGS:0000000000000000
>> [ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4: 
>> 0000000000770ee0
>> [ 1347.269479] PKRU: 55555554
>> [ 1347.269484] Call Trace:
>> [ 1347.269488]  <TASK>
>> [ 1347.269494]  lzma_main+0x3b4/0x12b0
>> [ 1347.269517]  lzma2_lzma+0x2dd/0x460
>> [ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
>> [ 1347.269549]  ? __asan_load4+0x8/0xb0
>> [ 1347.269564]  ? crc32_le_base+0x179/0x250
>> [ 1347.269581]  xz_dec_run+0x57b/0x11a0
>> [ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
>> [ 1347.269620]  squashfs_decompress+0x8e/0xd0
>> [ 1347.269633]  ? lzo_uncompress+0x400/0x400
>> [ 1347.269648]  squashfs_read_data+0x1e6/0x900
>> [ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
>> [ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
>> [ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
>> [ 1347.269710]  squashfs_readahead+0xa86/0xe70
>> [ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
>> [ 1347.269747]  ? squashfs_fill_page+0x190/0x190
>> [ 1347.269763]  ? psi_task_change+0x106/0x130
>> [ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
>> [ 1347.269796]  read_pages+0x12d/0x530
>> [ 1347.269814]  ? file_ra_state_init+0x60/0x60
>> [ 1347.269829]  ? xas_free_nodes+0x170/0x170
>> [ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
>> [ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
>> [ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
>> [ 1347.269898]  do_page_cache_ra+0x7c/0x90
>> [ 1347.269915]  page_cache_ra_order+0x3b4/0x420
>> [ 1347.269936]  filemap_fault+0x849/0xe40
>> [ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
>> [ 1347.269970]  __do_fault+0x76/0x1e0
>> [ 1347.269984]  do_fault+0x1e5/0x6c0
>> [ 1347.269998]  __handle_mm_fault+0x905/0x1740
>> [ 1347.270015]  ? copy_page_range+0x2130/0x2130
>> [ 1347.270028]  ? mas_find+0x100/0x100
>> [ 1347.270050]  handle_mm_fault+0x11c/0x3b0
>> [ 1347.270065]  do_user_addr_fault+0x261/0x8b0
>> [ 1347.270084]  exc_page_fault+0x61/0xf0
>> [ 1347.270101]  asm_exc_page_fault+0x27/0x30
>> [ 1347.270115] RIP: 0033:0x7efe544ce8b4
>> [ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48 
>> c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 8d 0d ec 
>> 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1 c5 
>> f3 59
>> [ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
>> [ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX: 
>> 00007efe54542ea0
>> [ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI: 
>> 000000000000fd7b
>> [ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09: 
>> 00000000006f71c0
>> [ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12: 
>> 00007ffc019a7060
>> [ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15: 
>> 00007ffc019a6f40
>> [ 1347.270199]  </TASK>
>>
>>
>> [ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on 
>> CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
>> [ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
>> [ 1534.865752] NMI backtrace for cpu 3
>> [ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted 
>> 6.1.0-rc5 #1
>> [ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 
>> 07/21/2022
>> [ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
>> [ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 
>> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 4c 
>> 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00 00 
>> ff eb
>> [ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
>> [ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX: 
>> dffffc0000000000
>> [ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI: 
>> ffff888109280080
>> [ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09: 
>> 000000000001d380
>> [ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12: 
>> ffff888109280006
>> [ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15: 
>> 0000000002bbab0b
>> [ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000) 
>> knlGS:0000000000000000
>> [ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4: 
>> 0000000000770ee0
>> [ 1534.865885] PKRU: 55555554
>> [ 1534.865889] Call Trace:
>> [ 1534.865893]  <TASK>
>> [ 1534.865898]  lzma_main+0x5a2/0x12b0
>> [ 1534.865920]  lzma2_lzma+0x2dd/0x460
>> [ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
>> [ 1534.865948]  ? crc32_le_base+0x58/0x250
>> [ 1534.865961]  ? crc32_le_base+0x1e3/0x250
>> [ 1534.865977]  xz_dec_run+0x57b/0x11a0
>> [ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
>> [ 1534.866011]  squashfs_decompress+0x8e/0xd0
>> [ 1534.866022]  ? lzo_uncompress+0x400/0x400
>> [ 1534.866034]  squashfs_read_data+0x1e6/0x900
>> [ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
>> [ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
>> [ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
>> [ 1534.866087]  squashfs_readahead+0xa86/0xe70
>> [ 1534.866099]  ? xas_create+0x141/0x600
>> [ 1534.866119]  ? squashfs_fill_page+0x190/0x190
>> [ 1534.866136]  ? psi_task_change+0x106/0x130
>> [ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
>> [ 1534.866171]  read_pages+0x12d/0x530
>> [ 1534.866186]  ? __kasan_check_read+0x11/0x20
>> [ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
>> [ 1534.866217]  ? file_ra_state_init+0x60/0x60
>> [ 1534.866231]  ? folio_add_lru+0x47/0x70
>> [ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
>> [ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
>> [ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
>> [ 1534.866297]  do_page_cache_ra+0x7c/0x90
>> [ 1534.866312]  page_cache_ra_order+0x3b4/0x420
>> [ 1534.866330]  filemap_fault+0x849/0xe40
>> [ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
>> [ 1534.866359]  __do_fault+0x76/0x1e0
>> [ 1534.866372]  do_fault+0x1e5/0x6c0
>> [ 1534.866384]  __handle_mm_fault+0x905/0x1740
>> [ 1534.866399]  ? copy_page_range+0x2130/0x2130
>> [ 1534.866412]  ? mas_find+0x100/0x100
>> [ 1534.866428]  handle_mm_fault+0x11c/0x3b0
>> [ 1534.866443]  do_user_addr_fault+0x261/0x8b0
>> [ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
>> [ 1534.866472]  exc_page_fault+0x61/0xf0
>> [ 1534.866486]  asm_exc_page_fault+0x27/0x30
>> [ 1534.866498] RIP: 0033:0x7fe93673dd00
>> [ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
>> [ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
>> [ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
>> 0000000000000001
>> [ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09: 
>> 00007fe837100098
>> [ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12: 
>> 00007fff95860b2e
>> [ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15: 
>> 00007fe923efef80
>> [ 1534.866576]  </TASK>
>>
>> Thank you,
>> Mirsad
>>
>> -- 
>> Mirsad Goran Todorovac
>> Sistem inženjer
>> Grafički fakultet | Akademija likovnih umjetnosti
>> Sveučilište u Zagrebu
> 
> -- 
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-18  6:11   ` Phillip Lougher
@ 2022-11-18 15:17     ` Mirsad Goran Todorovac
  2022-11-18 15:55     ` Paul E. McKenney
  2022-11-18 16:51     ` BUG: " Elliott, Robert (Servers)
  2 siblings, 0 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-18 15:17 UTC (permalink / raw)
  To: Phillip Lougher, LKML, Paul E. McKenney
  Cc: phillip.lougher, Thorsten Leemhuis

On 18. 11. 2022. 07:11, Phillip Lougher wrote:
> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>> Hi,
>>
>> While trying to bisect, I've found another bug that predated the introduction of squashfs_readahead(), but it has
>> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> 
> Wrong, the stall is happening in the XZ decompressor library, which
> is *not* in Squashfs.
> 
> This reported stall in the decompressor code is likely a symptom of you
> deliberately thrashing your system.  When the system thrashes everything
> starts to happen very slowly, and the system will spend a lot of
> its time doing page I/O, and the CPU will spend a lot of time in
> any CPU intensive code like the XZ decompressor library.
> 
> So the fact the stall is being hit here is a symptom and not
> a cause.  The decompressor code is likely running slowly due to
> thrashing and waiting on paged-out buffers.  This is not indicative
> of any bug, merely a system running slowly due to overload.
> 
> As I said, this is not a Squashfs issue, because the code when the
> stall takes place isn't in Squashfs.
> 
> The people responsible for the rcu code should have a lot more insight
> about what happens when the system is thrashing, and how this will
> throw up false positives.  In this I believe this is an instance of
> perfectly correct code running slowly due to thrashing incorrectly
> being flagged as looping.
> 
> CC'ing Paul E. McKenney <paulmck@kernel.org>
> 
> Phillip
> 
> 
> 
>>
>> Frankly, I need an advice on how to handle a situation like this.
>>
>> Obviously, the best place to test the RCU problem with squashfs_readahead() is where it is introduced + Phillip's patches [012]/3.
>>
>> Thanks,
>> Mirsad
>>
>> P.S.
>>
>> This is excerpt from dmesg log attached:
>>
>> [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
>> [   25.338202] rcu: blocking rcu_node structures (internal RCU debug):
>> [   25.338210] Task dump for CPU 4:
>> [   25.338216] task:canonical-livep state:R  running task     stack: 0 pid:  913 ppid:     1 flags:0x00004008
>> [   25.338237] Call Trace:
>> [   25.338243]  <TASK>
>> [   25.338249]  ? __kasan_check_write+0x14/0x20
>> [   25.338272]  ? _raw_spin_lock_irqsave+0x93/0xf0
>> [   25.338293]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   25.338311]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
>> [   25.338327]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   25.338342]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   25.338359]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
>> [   25.338377]  ? lzma_len+0x1fb/0x3b0
>> [   25.338395]  ? dict_repeat+0xec/0x120
>> [   25.338413]  ? lzma_main+0x10d/0x1260
>> [   25.338437]  ? lzma2_lzma+0x2a4/0x3b0
>> [   25.338457]  ? xz_dec_lzma2_run+0x11f/0xb90
>> [   25.338483]  ? xz_dec_run+0x346/0x11e0
>> [   25.338506]  ? squashfs_xz_uncompress+0x135/0x330
>> [   25.338530]  ? lzo_init+0xd0/0xd0
>> [   25.338546]  ? squashfs_decompress+0x88/0xc0
>> [   25.338565]  ? squashfs_read_data+0x1e5/0x8e0
>> [   25.338582]  ? __filemap_get_folio+0x37e/0x540
>> [   25.338602]  ? squashfs_bio_read.isra.2+0x220/0x220
>> [   25.338628]  ? squashfs_readpage_block+0x493/0x950
>> [   25.338650]  ? squashfs_read_folio+0xa4c/0x1140
>> [   25.338671]  ? xas_nomem+0x2f/0x100
>> [   25.338689]  ? squashfs_copy_cache+0x360/0x360
>> [   25.338707]  ? folio_wait_bit+0x3c0/0x3c0
>> [   25.338725]  ? __kasan_check_read+0x11/0x20
>> [   25.338739]  ? pagevec_add_and_need_flush+0x8c/0xa0
>> [   25.338757]  ? folio_add_lru+0x42/0x70
>> [   25.338772]  ? filemap_add_folio+0xd4/0x130
>> [   25.338788]  ? add_to_page_cache_locked+0xa0/0xa0
>> [   25.338806]  ? filemap_read_folio.isra.62+0xbe/0x4d0
>> [   25.338822]  ? __kasan_check_write+0x14/0x20
>> [   25.338838]  ? filemap_page_mkwrite+0x6a0/0x6a0
>> [   25.338856]  ? filemap_add_folio+0x130/0x130
>> [   25.338879]  ? filemap_fault+0x1041/0x1310
>> [   25.338895]  ? __kasan_check_read+0x11/0x20
>> [   25.338918]  ? read_cache_page_gfp+0xd0/0xd0
>> [   25.338932]  ? __rcu_read_unlock+0x55/0x260
>> [   25.338953]  ? try_to_wake_up+0x3c2/0xa10
>> [   25.338972]  ? __filemap_get_folio+0x540/0x540
>> [   25.338985]  ? __pmd+0x10/0x10
>> [   25.339002]  ? __do_fault+0x7c/0x1b0
>> [   25.339019]  ? do_fault+0x1ce/0x6a0
>> [   25.339036]  ? __handle_mm_fault+0x9cb/0x1490
>> [   25.339057]  ? copy_page_range+0x1b90/0x1b90
>> [   25.339073]  ? kernel_fpu_begin_mask+0x160/0x160
>> [   25.339092]  ? __ia32_sys_futex_time32+0x300/0x300
>> [   25.339119]  ? handle_mm_fault+0x11b/0x380
>> [   25.339138]  ? do_user_addr_fault+0x258/0x810
>> [   25.339157]  ? exc_page_fault+0x60/0xe0
>> [   25.339173]  ? asm_exc_page_fault+0x2c/0x40
>> [   25.339194]  </TASK>
>>
>>
>>
>> On 17. 11. 2022. 20:21, Mirsad Goran Todorovac wrote:
>>> Dear all,
>>>
>>> I've noticed two similar RCU stalls with the similar call trace, both involving squashfs_readeahead().
>>>
>>> Configuration is (excerpt):
>>>
>>> CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
>>> CONFIG_HAVE_DEBUG_KMEMLEAK=y
>>> CONFIG_DEBUG_KMEMLEAK=y
>>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
>>> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
>>> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
>>> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
>>> CONFIG_HAVE_ARCH_KASAN=y
>>> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
>>> CONFIG_CC_HAS_KASAN_GENERIC=y
>>> CONFIG_KASAN=y
>>> CONFIG_KASAN_GENERIC=y
>>> CONFIG_KASAN_OUTLINE=y
>>> # CONFIG_KASAN_INLINE is not set
>>> CONFIG_KASAN_STACK=y
>>> # CONFIG_KASAN_VMALLOC is not set
>>> # CONFIG_KASAN_MODULE_TEST is not set
>>>
>>> Do you think this is worthy of a bisect? I am not yet certain about being able to reproduce the stall, however it occurs at
>>> great memory use and heavy swapping (I thought of experimenting with efficiency of multi-gen LRU in 6.1 with multimedia under 
>>> heavy loads and low memory ...).
>>>
>>> Here is the excerpt from dmesg:
>>>
>>> [ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering kernel.perf_event_max_sample_rate to 62750
>>> [ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
>>> [ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
>>> [ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
>>> [ 1347.269329] NMI backtrace for cpu 3
>>> [ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted 6.1.0-rc5 #1
>>> [ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
>>> [ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
>>> [ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 
>>> 48 83 fa 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84 c0 75 30
>>> [ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
>>> [ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX: ffffffff8fca8434
>>> [ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff888109280040
>>> [ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09: ffffed1021250de8
>>> [ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12: 0000000000000155
>>> [ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15: ffff888109280000
>>> [ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
>>> [ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4: 0000000000770ee0
>>> [ 1347.269479] PKRU: 55555554
>>> [ 1347.269484] Call Trace:
>>> [ 1347.269488]  <TASK>
>>> [ 1347.269494]  lzma_main+0x3b4/0x12b0
>>> [ 1347.269517]  lzma2_lzma+0x2dd/0x460
>>> [ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
>>> [ 1347.269549]  ? __asan_load4+0x8/0xb0
>>> [ 1347.269564]  ? crc32_le_base+0x179/0x250
>>> [ 1347.269581]  xz_dec_run+0x57b/0x11a0
>>> [ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
>>> [ 1347.269620]  squashfs_decompress+0x8e/0xd0
>>> [ 1347.269633]  ? lzo_uncompress+0x400/0x400
>>> [ 1347.269648]  squashfs_read_data+0x1e6/0x900
>>> [ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
>>> [ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
>>> [ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
>>> [ 1347.269710]  squashfs_readahead+0xa86/0xe70
>>> [ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
>>> [ 1347.269747]  ? squashfs_fill_page+0x190/0x190
>>> [ 1347.269763]  ? psi_task_change+0x106/0x130
>>> [ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
>>> [ 1347.269796]  read_pages+0x12d/0x530
>>> [ 1347.269814]  ? file_ra_state_init+0x60/0x60
>>> [ 1347.269829]  ? xas_free_nodes+0x170/0x170
>>> [ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
>>> [ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
>>> [ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
>>> [ 1347.269898]  do_page_cache_ra+0x7c/0x90
>>> [ 1347.269915]  page_cache_ra_order+0x3b4/0x420
>>> [ 1347.269936]  filemap_fault+0x849/0xe40
>>> [ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
>>> [ 1347.269970]  __do_fault+0x76/0x1e0
>>> [ 1347.269984]  do_fault+0x1e5/0x6c0
>>> [ 1347.269998]  __handle_mm_fault+0x905/0x1740
>>> [ 1347.270015]  ? copy_page_range+0x2130/0x2130
>>> [ 1347.270028]  ? mas_find+0x100/0x100
>>> [ 1347.270050]  handle_mm_fault+0x11c/0x3b0
>>> [ 1347.270065]  do_user_addr_fault+0x261/0x8b0
>>> [ 1347.270084]  exc_page_fault+0x61/0xf0
>>> [ 1347.270101]  asm_exc_page_fault+0x27/0x30
>>> [ 1347.270115] RIP: 0033:0x7efe544ce8b4
>>> [ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48 c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 
>>> 8d 0d ec 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1 c5 f3 59
>>> [ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
>>> [ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX: 00007efe54542ea0
>>> [ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI: 000000000000fd7b
>>> [ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09: 00000000006f71c0
>>> [ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12: 00007ffc019a7060
>>> [ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15: 00007ffc019a6f40
>>> [ 1347.270199]  </TASK>
>>>
>>>
>>> [ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
>>> [ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
>>> [ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
>>> [ 1534.865752] NMI backtrace for cpu 3
>>> [ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted 6.1.0-rc5 #1
>>> [ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
>>> [ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
>>> [ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 
>>> 89 e5 4c 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00 00 ff eb
>>> [ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
>>> [ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX: dffffc0000000000
>>> [ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI: ffff888109280080
>>> [ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09: 000000000001d380
>>> [ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12: ffff888109280006
>>> [ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15: 0000000002bbab0b
>>> [ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000) knlGS:0000000000000000
>>> [ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4: 0000000000770ee0
>>> [ 1534.865885] PKRU: 55555554
>>> [ 1534.865889] Call Trace:
>>> [ 1534.865893]  <TASK>
>>> [ 1534.865898]  lzma_main+0x5a2/0x12b0
>>> [ 1534.865920]  lzma2_lzma+0x2dd/0x460
>>> [ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
>>> [ 1534.865948]  ? crc32_le_base+0x58/0x250
>>> [ 1534.865961]  ? crc32_le_base+0x1e3/0x250
>>> [ 1534.865977]  xz_dec_run+0x57b/0x11a0
>>> [ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
>>> [ 1534.866011]  squashfs_decompress+0x8e/0xd0
>>> [ 1534.866022]  ? lzo_uncompress+0x400/0x400
>>> [ 1534.866034]  squashfs_read_data+0x1e6/0x900
>>> [ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
>>> [ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
>>> [ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
>>> [ 1534.866087]  squashfs_readahead+0xa86/0xe70
>>> [ 1534.866099]  ? xas_create+0x141/0x600
>>> [ 1534.866119]  ? squashfs_fill_page+0x190/0x190
>>> [ 1534.866136]  ? psi_task_change+0x106/0x130
>>> [ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
>>> [ 1534.866171]  read_pages+0x12d/0x530
>>> [ 1534.866186]  ? __kasan_check_read+0x11/0x20
>>> [ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
>>> [ 1534.866217]  ? file_ra_state_init+0x60/0x60
>>> [ 1534.866231]  ? folio_add_lru+0x47/0x70
>>> [ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
>>> [ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
>>> [ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
>>> [ 1534.866297]  do_page_cache_ra+0x7c/0x90
>>> [ 1534.866312]  page_cache_ra_order+0x3b4/0x420
>>> [ 1534.866330]  filemap_fault+0x849/0xe40
>>> [ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
>>> [ 1534.866359]  __do_fault+0x76/0x1e0
>>> [ 1534.866372]  do_fault+0x1e5/0x6c0
>>> [ 1534.866384]  __handle_mm_fault+0x905/0x1740
>>> [ 1534.866399]  ? copy_page_range+0x2130/0x2130
>>> [ 1534.866412]  ? mas_find+0x100/0x100
>>> [ 1534.866428]  handle_mm_fault+0x11c/0x3b0
>>> [ 1534.866443]  do_user_addr_fault+0x261/0x8b0
>>> [ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
>>> [ 1534.866472]  exc_page_fault+0x61/0xf0
>>> [ 1534.866486]  asm_exc_page_fault+0x27/0x30
>>> [ 1534.866498] RIP: 0033:0x7fe93673dd00
>>> [ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
>>> [ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
>>> [ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>>> [ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
>>> [ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fe837100098
>>> [ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12: 00007fff95860b2e >>> [ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15: 00007fe923efef80
>>> [ 1534.866576]  </TASK>

Hello Phillip,

I have considered your theory, and I usually admit when I'm wrong, but this is the very
same combination of programs, windows and tabs used when bisecting the squashfs_readahead()
but, only this time ran under a kernel with CONFIG_DEBUG_KMEMLEAK and CONFIG_KASAN turned
on. So I don't think it's thrashing the system.

MGLRU was supposed to make smarter decisions of evicting the unused memory pages from cache
than the 5.10 code, so my intent was to see how that is working, too.

However, I've browsed through the source and it does actually happen when you call xz_dec_run()
from squashfs_xz_uncompress().

I think that 30 windows and 5-30 tabs each are an example of then normal use nowadays.

Plus, with v5.10 and the same config it doesn't throw stalls (only the unrelated systemd-udevd).

[    7.333220] Missing case (clock == 269720)
[    7.333341] WARNING: CPU: 4 PID: 214 at drivers/gpu/drm/i915/display/intel_dpll_mgr.c:2979 icl_get_dplls+0x948/0x11a0 [i915]
[    7.333343] Modules linked in: i915(+) nvme nvme_core i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
cec input_leds rc_core drm serio_raw xhci_pci vm
d xhci_pci_renesas mac_hid video pinctrl_tigerlake
[    7.333368] CPU: 4 PID: 214 Comm: systemd-udevd Not tainted 5.10.0-05-2c85ebc57b3e-squashfs-stall #1
[    7.333370] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022

So, my conclusion is that KASAN is operable in 5.10, and the problem with xz decompression is in between
5.19-rc1 and 5.10.0.

What can lock up a core for a dozen of jiffies on my laptop could knock it down for a couple of
minutes on a heavily loaded cloud server.

I lack the knowledge how to see a history of commits and modifications on a single file,
thereof the confusion.

However, I did not even blame your code for this stall I thought you might be interested since
in the RCU stall error it is called from your squash_readahead().

Thorsten has somewhat discouraged me when saying that there are 950 issues on the list to bisect.
I thought how the most of them requires a Google cloud to reproduce, and here is one we could bisect
ourselves, and I already know the developer.

So, at no point was I blaming you or demeaning you. I thought how would I react to a bug in my code,
and concluded that I would be glad that the code became more elegant and closer to the "golden cut"
of programming, if there is such a thing.

Have a nice day!

Thanks
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-18  6:11   ` Phillip Lougher
  2022-11-18 15:17     ` Mirsad Goran Todorovac
@ 2022-11-18 15:55     ` Paul E. McKenney
  2022-11-20 15:49       ` BUG: BISECTED: " Mirsad Goran Todorovac
  2022-11-18 16:51     ` BUG: " Elliott, Robert (Servers)
  2 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-18 15:55 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Mirsad Goran Todorovac, LKML, phillip.lougher, Thorsten Leemhuis

On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > Hi,
> > 
> > While trying to bisect, I've found another bug that predated the
> > introduction of squashfs_readahead(), but it has
> > a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> 
> Wrong, the stall is happening in the XZ decompressor library, which
> is *not* in Squashfs.
> 
> This reported stall in the decompressor code is likely a symptom of you
> deliberately thrashing your system.  When the system thrashes everything
> starts to happen very slowly, and the system will spend a lot of
> its time doing page I/O, and the CPU will spend a lot of time in
> any CPU intensive code like the XZ decompressor library.
> 
> So the fact the stall is being hit here is a symptom and not
> a cause.  The decompressor code is likely running slowly due to
> thrashing and waiting on paged-out buffers.  This is not indicative
> of any bug, merely a system running slowly due to overload.
> 
> As I said, this is not a Squashfs issue, because the code when the
> stall takes place isn't in Squashfs.
> 
> The people responsible for the rcu code should have a lot more insight
> about what happens when the system is thrashing, and how this will
> throw up false positives.  In this I believe this is an instance of
> perfectly correct code running slowly due to thrashing incorrectly
> being flagged as looping.
> 
> CC'ing Paul E. McKenney <paulmck@kernel.org>
> 
> Phillip
> 
> 
> 
> > 
> > Frankly, I need an advice on how to handle a situation like this.
> > 
> > Obviously, the best place to test the RCU problem with
> > squashfs_readahead() is where it is introduced + Phillip's patches
> > [012]/3.
> > 
> > Thanks,
> > Mirsad
> > 
> > P.S.
> > 
> > This is excerpt from dmesg log attached:
> > 
> > [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
> > CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.

The "6 jiffies" leads me to believe that this kernel was built with
CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
and with HZ set to something like 250.  This means that the kernel
will complain if an RCU read-side critical section exceeeds about 20
milliseconds.  This is a fine thing if you are actually running on an
Android, but a bit extreme for general usage.

If my guesses are correct, please apply this commit from mainline:

1045a06724f3 ("remove CONFIG_ANDROID")

With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
to 21000 (that is, 21 seconds) and things will be much better.

							Thanx, Paul

> > [   25.338202] rcu: blocking rcu_node structures (internal RCU debug):
> > [   25.338210] Task dump for CPU 4:
> > [   25.338216] task:canonical-livep state:R  running task     stack:
> > 0 pid:  913 ppid:     1 flags:0x00004008
> > [   25.338237] Call Trace:
> > [   25.338243]  <TASK>
> > [   25.338249]  ? __kasan_check_write+0x14/0x20
> > [   25.338272]  ? _raw_spin_lock_irqsave+0x93/0xf0
> > [   25.338293]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > [   25.338311]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> > [   25.338327]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > [   25.338342]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > [   25.338359]  ? asm_sysvec_apic_timer_interrupt+0x20/0x30
> > [   25.338377]  ? lzma_len+0x1fb/0x3b0
> > [   25.338395]  ? dict_repeat+0xec/0x120
> > [   25.338413]  ? lzma_main+0x10d/0x1260
> > [   25.338437]  ? lzma2_lzma+0x2a4/0x3b0
> > [   25.338457]  ? xz_dec_lzma2_run+0x11f/0xb90
> > [   25.338483]  ? xz_dec_run+0x346/0x11e0
> > [   25.338506]  ? squashfs_xz_uncompress+0x135/0x330
> > [   25.338530]  ? lzo_init+0xd0/0xd0
> > [   25.338546]  ? squashfs_decompress+0x88/0xc0
> > [   25.338565]  ? squashfs_read_data+0x1e5/0x8e0
> > [   25.338582]  ? __filemap_get_folio+0x37e/0x540
> > [   25.338602]  ? squashfs_bio_read.isra.2+0x220/0x220
> > [   25.338628]  ? squashfs_readpage_block+0x493/0x950
> > [   25.338650]  ? squashfs_read_folio+0xa4c/0x1140
> > [   25.338671]  ? xas_nomem+0x2f/0x100
> > [   25.338689]  ? squashfs_copy_cache+0x360/0x360
> > [   25.338707]  ? folio_wait_bit+0x3c0/0x3c0
> > [   25.338725]  ? __kasan_check_read+0x11/0x20
> > [   25.338739]  ? pagevec_add_and_need_flush+0x8c/0xa0
> > [   25.338757]  ? folio_add_lru+0x42/0x70
> > [   25.338772]  ? filemap_add_folio+0xd4/0x130
> > [   25.338788]  ? add_to_page_cache_locked+0xa0/0xa0
> > [   25.338806]  ? filemap_read_folio.isra.62+0xbe/0x4d0
> > [   25.338822]  ? __kasan_check_write+0x14/0x20
> > [   25.338838]  ? filemap_page_mkwrite+0x6a0/0x6a0
> > [   25.338856]  ? filemap_add_folio+0x130/0x130
> > [   25.338879]  ? filemap_fault+0x1041/0x1310
> > [   25.338895]  ? __kasan_check_read+0x11/0x20
> > [   25.338918]  ? read_cache_page_gfp+0xd0/0xd0
> > [   25.338932]  ? __rcu_read_unlock+0x55/0x260
> > [   25.338953]  ? try_to_wake_up+0x3c2/0xa10
> > [   25.338972]  ? __filemap_get_folio+0x540/0x540
> > [   25.338985]  ? __pmd+0x10/0x10
> > [   25.339002]  ? __do_fault+0x7c/0x1b0
> > [   25.339019]  ? do_fault+0x1ce/0x6a0
> > [   25.339036]  ? __handle_mm_fault+0x9cb/0x1490
> > [   25.339057]  ? copy_page_range+0x1b90/0x1b90
> > [   25.339073]  ? kernel_fpu_begin_mask+0x160/0x160
> > [   25.339092]  ? __ia32_sys_futex_time32+0x300/0x300
> > [   25.339119]  ? handle_mm_fault+0x11b/0x380
> > [   25.339138]  ? do_user_addr_fault+0x258/0x810
> > [   25.339157]  ? exc_page_fault+0x60/0xe0
> > [   25.339173]  ? asm_exc_page_fault+0x2c/0x40
> > [   25.339194]  </TASK>
> > 
> > 
> > 
> > On 17. 11. 2022. 20:21, Mirsad Goran Todorovac wrote:
> > > Dear all,
> > > 
> > > I've noticed two similar RCU stalls with the similar call trace,
> > > both involving squashfs_readeahead().
> > > 
> > > Configuration is (excerpt):
> > > 
> > > CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
> > > CONFIG_HAVE_DEBUG_KMEMLEAK=y
> > > CONFIG_DEBUG_KMEMLEAK=y
> > > CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
> > > # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> > > # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> > > CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> > > CONFIG_HAVE_ARCH_KASAN=y
> > > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> > > CONFIG_CC_HAS_KASAN_GENERIC=y
> > > CONFIG_KASAN=y
> > > CONFIG_KASAN_GENERIC=y
> > > CONFIG_KASAN_OUTLINE=y
> > > # CONFIG_KASAN_INLINE is not set
> > > CONFIG_KASAN_STACK=y
> > > # CONFIG_KASAN_VMALLOC is not set
> > > # CONFIG_KASAN_MODULE_TEST is not set
> > > 
> > > Do you think this is worthy of a bisect? I am not yet certain about
> > > being able to reproduce the stall, however it occurs at
> > > great memory use and heavy swapping (I thought of experimenting with
> > > efficiency of multi-gen LRU in 6.1 with multimedia under heavy loads
> > > and low memory ...).
> > > 
> > > Here is the excerpt from dmesg:
> > > 
> > > [ 1326.301544] perf: interrupt took too long (3181 > 3178), lowering
> > > kernel.perf_event_max_sample_rate to 62750
> > > [ 1347.269247] rcu: INFO: rcu_preempt detected expedited stalls on
> > > CPUs/tasks: { 3-.... } 6 jiffies s: 1317 root: 0x8/.
> > > [ 1347.269294] rcu: blocking rcu_node structures (internal RCU debug):
> > > [ 1347.269310] Sending NMI from CPU 2 to CPUs 3:
> > > [ 1347.269329] NMI backtrace for cpu 3
> > > [ 1347.269343] CPU: 3 PID: 7907 Comm: Isolated Web Co Not tainted
> > > 6.1.0-rc5 #1
> > > [ 1347.269357] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW
> > > 07/21/2022
> > > [ 1347.269363] RIP: 0010:__asan_load8+0x3e/0xb0
> > > [ 1347.269386] Code: b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00
> > > 00 00 80 ff ff 48 39 f8 77 44 48 8d 47 07 48 89 c2 83 e2 07 48 83 fa
> > > 07 75 1c <48> ba 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 10 84
> > > c0 75 30
> > > [ 1347.269398] RSP: 0018:ffff888102eef288 EFLAGS: 00000246
> > > [ 1347.269410] RAX: ffff888109280047 RBX: ffff8881157da0ca RCX:
> > > ffffffff8fca8434
> > > [ 1347.269419] RDX: 0000000000000007 RSI: 0000000000000001 RDI:
> > > ffff888109280040
> > > [ 1347.269427] RBP: ffff888102eef288 R08: 00000000000003ae R09:
> > > ffffed1021250de8
> > > [ 1347.269435] R10: ffff888109286f39 R11: ffffed1021250de7 R12:
> > > 0000000000000155
> > > [ 1347.269443] R13: ffffc9000122ff18 R14: ffff888109280000 R15:
> > > ffff888109280000
> > > [ 1347.269451] FS:  00007efe541e9380(0000) GS:ffff888252f80000(0000)
> > > knlGS:0000000000000000
> > > [ 1347.269462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1347.269470] CR2: 00007fa7a15e2000 CR3: 000000013100a003 CR4:
> > > 0000000000770ee0
> > > [ 1347.269479] PKRU: 55555554
> > > [ 1347.269484] Call Trace:
> > > [ 1347.269488]  <TASK>
> > > [ 1347.269494]  lzma_main+0x3b4/0x12b0
> > > [ 1347.269517]  lzma2_lzma+0x2dd/0x460
> > > [ 1347.269535]  xz_dec_lzma2_run+0x11e/0xba0
> > > [ 1347.269549]  ? __asan_load4+0x8/0xb0
> > > [ 1347.269564]  ? crc32_le_base+0x179/0x250
> > > [ 1347.269581]  xz_dec_run+0x57b/0x11a0
> > > [ 1347.269600]  squashfs_xz_uncompress+0x196/0x370
> > > [ 1347.269620]  squashfs_decompress+0x8e/0xd0
> > > [ 1347.269633]  ? lzo_uncompress+0x400/0x400
> > > [ 1347.269648]  squashfs_read_data+0x1e6/0x900
> > > [ 1347.269662]  ? kmemleak_alloc+0x4b/0x80
> > > [ 1347.269678]  ? squashfs_bio_read.isra.0+0x230/0x230
> > > [ 1347.269694]  ? squashfs_page_actor_init_special+0x1bb/0x230
> > > [ 1347.269710]  squashfs_readahead+0xa86/0xe70
> > > [ 1347.269722]  ? put_pages_list+0x1e0/0x1e0
> > > [ 1347.269747]  ? squashfs_fill_page+0x190/0x190
> > > [ 1347.269763]  ? psi_task_change+0x106/0x130
> > > [ 1347.269781]  ? raw_spin_rq_unlock+0x17/0x60
> > > [ 1347.269796]  read_pages+0x12d/0x530
> > > [ 1347.269814]  ? file_ra_state_init+0x60/0x60
> > > [ 1347.269829]  ? xas_free_nodes+0x170/0x170
> > > [ 1347.269844]  ? filemap_alloc_folio+0xcf/0x120
> > > [ 1347.269863]  page_cache_ra_unbounded+0x1aa/0x280
> > > [ 1347.269878]  ? __rcu_read_unlock+0x57/0x270
> > > [ 1347.269898]  do_page_cache_ra+0x7c/0x90
> > > [ 1347.269915]  page_cache_ra_order+0x3b4/0x420
> > > [ 1347.269936]  filemap_fault+0x849/0xe40
> > > [ 1347.269951]  ? read_cache_page_gfp+0x90/0x90
> > > [ 1347.269970]  __do_fault+0x76/0x1e0
> > > [ 1347.269984]  do_fault+0x1e5/0x6c0
> > > [ 1347.269998]  __handle_mm_fault+0x905/0x1740
> > > [ 1347.270015]  ? copy_page_range+0x2130/0x2130
> > > [ 1347.270028]  ? mas_find+0x100/0x100
> > > [ 1347.270050]  handle_mm_fault+0x11c/0x3b0
> > > [ 1347.270065]  do_user_addr_fault+0x261/0x8b0
> > > [ 1347.270084]  exc_page_fault+0x61/0xf0
> > > [ 1347.270101]  asm_exc_page_fault+0x27/0x30
> > > [ 1347.270115] RIP: 0033:0x7efe544ce8b4
> > > [ 1347.270126] Code: d5 c4 e1 f9 7e d0 c4 e1 f9 7e d7 48 c1 f8 35 48
> > > c1 ff 2e 48 89 c1 48 89 fe b8 ff 01 00 00 29 c8 83 e6 7f 48 8d 0d ec
> > > 45 07 00 <c5> fb 10 04 f1 48 98 48 8d 0d fe 44 07 00 c5 fb 59 0c c1
> > > c5 f3 59
> > > [ 1347.270137] RSP: 002b:00007ffc019a6e30 EFLAGS: 00010206
> > > [ 1347.270147] RAX: 0000000000000005 RBX: 00007ffc019a6f50 RCX:
> > > 00007efe54542ea0
> > > [ 1347.270156] RDX: ffffffffbfefe12e RSI: 000000000000007b RDI:
> > > 000000000000fd7b
> > > [ 1347.270164] RBP: 00007ffc019a6f58 R08: 00007ffc019a6f40 R09:
> > > 00000000006f71c0
> > > [ 1347.270173] R10: 00007efe547fc500 R11: 00007efe54459b60 R12:
> > > 00007ffc019a7060
> > > [ 1347.270181] R13: 00007ffc019a6f48 R14: 00007ffc019a7040 R15:
> > > 00007ffc019a6f40
> > > [ 1347.270199]  </TASK>
> > > 
> > > 
> > > [ 1534.865687] rcu: INFO: rcu_preempt detected expedited stalls on
> > > CPUs/tasks: { 3-.... } 7 jiffies s: 1393 root: 0x8/.
> > > [ 1534.865725] rcu: blocking rcu_node structures (internal RCU debug):
> > > [ 1534.865735] Sending NMI from CPU 2 to CPUs 3:
> > > [ 1534.865752] NMI backtrace for cpu 3
> > > [ 1534.865762] CPU: 3 PID: 9629 Comm: FS Broker 9627 Not tainted
> > > 6.1.0-rc5 #1
> > > [ 1534.865776] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW
> > > 07/21/2022
> > > [ 1534.865782] RIP: 0010:__asan_load4+0x8/0xb0
> > > [ 1534.865802] Code: cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3
> > > cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 4c
> > > 8b 45 08 <48> 83 ff fb 77 63 eb 0f 0f 1f 00 48 b8 00 00 00 00 00 00
> > > 00 ff eb
> > > [ 1534.865811] RSP: 0018:ffff88811fd3f288 EFLAGS: 00000246
> > > [ 1534.865823] RAX: 0000000000000000 RBX: 000000000001d378 RCX:
> > > dffffc0000000000
> > > [ 1534.865831] RDX: 0000000000000003 RSI: 000000000000077f RDI:
> > > ffff888109280080
> > > [ 1534.865838] RBP: ffff88811fd3f288 R08: ffffffff8fca8622 R09:
> > > 000000000001d380
> > > [ 1534.865847] R10: ffff888114694020 R11: ffffed102843b9ff R12:
> > > ffff888109280006
> > > [ 1534.865854] R13: ffff888109280000 R14: ffff888109280000 R15:
> > > 0000000002bbab0b
> > > [ 1534.865861] FS:  00007fe923eff700(0000) GS:ffff888252f80000(0000)
> > > knlGS:0000000000000000
> > > [ 1534.865871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1534.865878] CR2: 00007fe90fe90ac0 CR3: 0000000138d5c003 CR4:
> > > 0000000000770ee0
> > > [ 1534.865885] PKRU: 55555554
> > > [ 1534.865889] Call Trace:
> > > [ 1534.865893]  <TASK>
> > > [ 1534.865898]  lzma_main+0x5a2/0x12b0
> > > [ 1534.865920]  lzma2_lzma+0x2dd/0x460
> > > [ 1534.865935]  xz_dec_lzma2_run+0x11e/0xba0
> > > [ 1534.865948]  ? crc32_le_base+0x58/0x250
> > > [ 1534.865961]  ? crc32_le_base+0x1e3/0x250
> > > [ 1534.865977]  xz_dec_run+0x57b/0x11a0
> > > [ 1534.865995]  squashfs_xz_uncompress+0x196/0x370
> > > [ 1534.866011]  squashfs_decompress+0x8e/0xd0
> > > [ 1534.866022]  ? lzo_uncompress+0x400/0x400
> > > [ 1534.866034]  squashfs_read_data+0x1e6/0x900
> > > [ 1534.866047]  ? kmemleak_alloc+0x4b/0x80
> > > [ 1534.866061]  ? squashfs_bio_read.isra.0+0x230/0x230
> > > [ 1534.866074]  ? squashfs_page_actor_init_special+0x1bb/0x230
> > > [ 1534.866087]  squashfs_readahead+0xa86/0xe70
> > > [ 1534.866099]  ? xas_create+0x141/0x600
> > > [ 1534.866119]  ? squashfs_fill_page+0x190/0x190
> > > [ 1534.866136]  ? psi_task_change+0x106/0x130
> > > [ 1534.866155]  ? raw_spin_rq_unlock+0x17/0x60
> > > [ 1534.866171]  read_pages+0x12d/0x530
> > > [ 1534.866186]  ? __kasan_check_read+0x11/0x20
> > > [ 1534.866200]  ? folio_batch_add_and_move+0x8d/0xa0
> > > [ 1534.866217]  ? file_ra_state_init+0x60/0x60
> > > [ 1534.866231]  ? folio_add_lru+0x47/0x70
> > > [ 1534.866249]  ? filemap_alloc_folio+0xcf/0x120
> > > [ 1534.866267]  page_cache_ra_unbounded+0x1eb/0x280
> > > [ 1534.866280]  ? __rcu_read_unlock+0x57/0x270
> > > [ 1534.866297]  do_page_cache_ra+0x7c/0x90
> > > [ 1534.866312]  page_cache_ra_order+0x3b4/0x420
> > > [ 1534.866330]  filemap_fault+0x849/0xe40
> > > [ 1534.866343]  ? read_cache_page_gfp+0x90/0x90
> > > [ 1534.866359]  __do_fault+0x76/0x1e0
> > > [ 1534.866372]  do_fault+0x1e5/0x6c0
> > > [ 1534.866384]  __handle_mm_fault+0x905/0x1740
> > > [ 1534.866399]  ? copy_page_range+0x2130/0x2130
> > > [ 1534.866412]  ? mas_find+0x100/0x100
> > > [ 1534.866428]  handle_mm_fault+0x11c/0x3b0
> > > [ 1534.866443]  do_user_addr_fault+0x261/0x8b0
> > > [ 1534.866456]  ? exit_to_user_mode_prepare+0x9b/0x190
> > > [ 1534.866472]  exc_page_fault+0x61/0xf0
> > > [ 1534.866486]  asm_exc_page_fault+0x27/0x30
> > > [ 1534.866498] RIP: 0033:0x7fe93673dd00
> > > [ 1534.866515] Code: Unable to access opcode bytes at 0x7fe93673dcd6.
> > > [ 1534.866521] RSP: 002b:00007fe923efeea8 EFLAGS: 00010246
> > > [ 1534.866532] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > 0000000000000001
> > > [ 1534.866539] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> > > 0000000000000000
> > > [ 1534.866545] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > > 00007fe837100098
> > > [ 1534.866553] R10: 00007fe868300de8 R11: 0000000000000217 R12:
> > > 00007fff95860b2e
> > > [ 1534.866561] R13: 00007fff95860b2f R14: 00007fff95860bf8 R15:
> > > 00007fe923efef80
> > > [ 1534.866576]  </TASK>
> > > 
> > > Thank you,
> > > Mirsad
> > > 
> > > -- 
> > > Mirsad Goran Todorovac
> > > Sistem inženjer
> > > Grafički fakultet | Akademija likovnih umjetnosti
> > > Sveučilište u Zagrebu
> > 
> > -- 
> > Mirsad Goran Todorovac
> > Sistem inženjer
> > Grafički fakultet | Akademija likovnih umjetnosti
> > Sveučilište u Zagrebu
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-18  6:11   ` Phillip Lougher
  2022-11-18 15:17     ` Mirsad Goran Todorovac
  2022-11-18 15:55     ` Paul E. McKenney
@ 2022-11-18 16:51     ` Elliott, Robert (Servers)
  2022-12-20 10:43       ` Mirsad Todorovac
  2 siblings, 1 reply; 25+ messages in thread
From: Elliott, Robert (Servers) @ 2022-11-18 16:51 UTC (permalink / raw)
  To: Phillip Lougher, Mirsad Goran Todorovac, LKML, Paul E. McKenney
  Cc: phillip.lougher, Thorsten Leemhuis



> -----Original Message-----
> From: Phillip Lougher <phillip@squashfs.org.uk>
> Sent: Friday, November 18, 2022 12:11 AM
> To: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>; LKML <linux-
> kernel@vger.kernel.org>; Paul E. McKenney <paulmck@kernel.org>
> Cc: phillip.lougher@gmail.com; Thorsten Leemhuis
> <regressions@leemhuis.info>
> Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
> squashfs_readahead())
> 
> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > Hi,
> >
> > While trying to bisect, I've found another bug that predated the
> > introduction of squashfs_readahead(), but it has
> > a common denominator in squashfs_decompress() and
> squashfs_xz_uncompress().
> 
> Wrong, the stall is happening in the XZ decompressor library, which
> is *not* in Squashfs.
> 
> This reported stall in the decompressor code is likely a symptom of you
> deliberately thrashing your system.  When the system thrashes everything
> starts to happen very slowly, and the system will spend a lot of
> its time doing page I/O, and the CPU will spend a lot of time in
> any CPU intensive code like the XZ decompressor library.
> 
> So the fact the stall is being hit here is a symptom and not
> a cause.  The decompressor code is likely running slowly due to
> thrashing and waiting on paged-out buffers.  This is not indicative
> of any bug, merely a system running slowly due to overload.
> 
> As I said, this is not a Squashfs issue, because the code when the
> stall takes place isn't in Squashfs.
> 
> The people responsible for the rcu code should have a lot more insight
> about what happens when the system is thrashing, and how this will
> throw up false positives.  In this I believe this is an instance of
> perfectly correct code running slowly due to thrashing incorrectly
> being flagged as looping.
> 
> CC'ing Paul E. McKenney <paulmck@kernel.org>
> 
> Phillip

How big can these readahead sizes be? Should one of the loops include
cond_resched() calls?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-18 15:55     ` Paul E. McKenney
@ 2022-11-20 15:49       ` Mirsad Goran Todorovac
  2022-11-20 17:50         ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-20 15:49 UTC (permalink / raw)
  To: paulmck, Phillip Lougher; +Cc: LKML, phillip.lougher, Thorsten Leemhuis

[-- Attachment #1: Type: text/plain, Size: 8125 bytes --]

On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
> On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> While trying to bisect, I've found another bug that predated the
>>> introduction of squashfs_readahead(), but it has
>>> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
>>
>> Wrong, the stall is happening in the XZ decompressor library, which
>> is *not* in Squashfs.
>>
>> This reported stall in the decompressor code is likely a symptom of you
>> deliberately thrashing your system.  When the system thrashes everything
>> starts to happen very slowly, and the system will spend a lot of
>> its time doing page I/O, and the CPU will spend a lot of time in
>> any CPU intensive code like the XZ decompressor library.
>>
>> So the fact the stall is being hit here is a symptom and not
>> a cause.  The decompressor code is likely running slowly due to
>> thrashing and waiting on paged-out buffers.  This is not indicative
>> of any bug, merely a system running slowly due to overload.
>>
>> As I said, this is not a Squashfs issue, because the code when the
>> stall takes place isn't in Squashfs.
>>
>> The people responsible for the rcu code should have a lot more insight
>> about what happens when the system is thrashing, and how this will
>> throw up false positives.  In this I believe this is an instance of
>> perfectly correct code running slowly due to thrashing incorrectly
>> being flagged as looping.
>>
>> CC'ing Paul E. McKenney <paulmck@kernel.org>
>>
>>> Frankly, I need an advice on how to handle a situation like this.
>>>
>>> Obviously, the best place to test the RCU problem with
>>> squashfs_readahead() is where it is introduced + Phillip's patches
>>> [012]/3.
>>>
>>> Thanks,
>>> Mirsad
>>>
>>> P.S.
>>>
>>> This is excerpt from dmesg log attached:
>>>
>>> [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
>>> CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> 
> The "6 jiffies" leads me to believe that this kernel was built with
> CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
> that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
> and with HZ set to something like 250.  This means that the kernel
> will complain if an RCU read-side critical section exceeeds about 20
> milliseconds.  This is a fine thing if you are actually running on an
> Android, but a bit extreme for general usage.
> 
> If my guesses are correct, please apply this commit from mainline:
> 
> 1045a06724f3 ("remove CONFIG_ANDROID")
> 
> With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
> to 21000 (that is, 21 seconds) and things will be much better.
> 
> 							Thanx, Paul

Dear Paul,

Actually, I was able to reproduce and bisect the commit that started causing
described stalls (I have already explained how 5.10 didn't show any under the
exactly same saved widows and tabs set.)

# bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag 'rcu.2022.05.19a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
# good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
# good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag 'efi-next-for-v5.19' of 
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
# good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', 
'rcu-tasks.2022.04.11b', 'srcu.2022.
05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
git bisect good be05ee54378d451e1d60196921566d6087f3079f
# good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
# good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
# good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
# good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag 'lkmm.2022.05.20a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
# good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
# bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
# first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD

The bug looks like this:

[   57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.
[   57.876854] rcu: blocking rcu_node structures (internal RCU debug):
[   57.876862] Task dump for CPU 0:
[   57.876867] task:snap-store      state:R  running task     stack:    0 pid: 2678 ppid:  2230 flags:0x00004008
[   57.876883] Call Trace:
[   57.876888]  <TASK>
[   57.876893]  ? _raw_spin_lock_irqsave+0x93/0xf0
[   57.876909]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   57.876920]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   57.876930]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   57.876939]  ? sysvec_apic_timer_interrupt+0x89/0x90
[   57.876948]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   57.876959]  ? lzma_len+0x1fb/0x3b0
[   57.876971]  ? dict_repeat+0xb3/0x120
[   57.876981]  ? lzma_main+0x943/0x1260
[   57.876995]  ? lzma2_lzma+0x2a4/0x3b0
[   57.877007]  ? xz_dec_lzma2_run+0x11f/0xb90
[   57.877018]  ? crc32_le_base+0x186/0x250
[   57.877032]  ? xz_dec_run+0x346/0x11e0
[   57.877046]  ? squashfs_xz_uncompress+0x135/0x330
[   57.877060]  ? lzo_init+0xd0/0xd0
[   57.877069]  ? squashfs_decompress+0x80/0xb0
[   57.877079]  ? squashfs_read_data+0x1eb/0x920
[   57.877090]  ? __filemap_get_folio+0x37e/0x540
[   57.877104]  ? squashfs_bio_read.isra.2+0x2a0/0x2a0
[   57.877118]  ? squashfs_readpage_block+0x493/0x950
[   57.877131]  ? squashfs_readpage+0xa52/0x1140
[   57.877143]  ? xas_nomem+0x2f/0x100
[   57.877154]  ? squashfs_copy_cache+0x360/0x360
[   57.877164]  ? folio_wait_bit+0x3c0/0x3c0
[   57.877175]  ? __kasan_check_read+0x11/0x20
[   57.877186]  ? pagevec_add_and_need_flush+0x8c/0xa0
[   57.877197]  ? folio_add_lru+0x42/0x70
[   57.877206]  ? filemap_add_folio+0xd4/0x130
[   57.877216]  ? add_to_page_cache_locked+0xa0/0xa0
[   57.877228]  ? filemap_read_folio.isra.62+0xbe/0x4d0
[   57.877238]  ? __kasan_check_write+0x14/0x20
[   57.877248]  ? filemap_page_mkwrite+0x6a0/0x6a0
[   57.877260]  ? filemap_add_folio+0x130/0x130
[   57.877269]  ? try_charge_memcg+0x4cc/0xa90
[   57.877281]  ? filemap_fault+0xc9a/0xe50
[   57.877293]  ? generic_file_read_iter+0x230/0x230
[   57.877305]  ? __rcu_read_unlock+0x55/0x260
[   57.877317]  ? __do_fault+0x7c/0x1b0
[   57.877329]  ? __handle_mm_fault+0x173b/0x1970
[   57.877340]  ? copy_page_range+0x1890/0x1890
[   57.877356]  ? handle_mm_fault+0x11b/0x380
[   57.877366]  ? do_user_addr_fault+0x258/0x810
[   57.877380]  ? exc_page_fault+0x60/0xe0
[   57.877389]  ? asm_exc_page_fault+0x8/0x30
[   57.877398]  ? asm_exc_page_fault+0x1e/0x30
[   57.877410]  </TASK>

Please find complete dmesg, my lshw and bisect attached.

I am not competent to estimate how this commit causes this kind of problems,
and whether stalls up to 20 jiffies are worth eradicating.

Hope this helps.

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

[-- Attachment #2: config-6.1.0-rc5.xz --]
[-- Type: application/x-xz, Size: 56696 bytes --]

[-- Attachment #3: dmesg-5.18.0-rc1-31a-ce13389053a3.log.xz --]
[-- Type: application/x-xz, Size: 1064 bytes --]

[-- Attachment #4: dmesg-5.18.0-rc1-31a-ce13389053a3-full.log.xz --]
[-- Type: application/x-xz, Size: 22476 bytes --]

[-- Attachment #5: lshw.txt.xz --]
[-- Type: application/x-xz, Size: 5256 bytes --]

[-- Attachment #6: squashfs-stall-bisect-full.log.xz --]
[-- Type: application/x-xz, Size: 1552 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-20 15:49       ` BUG: BISECTED: " Mirsad Goran Todorovac
@ 2022-11-20 17:50         ` Paul E. McKenney
  2022-11-20 18:36           ` Mirsad Goran Todorovac
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-20 17:50 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis

On Sun, Nov 20, 2022 at 04:49:37PM +0100, Mirsad Goran Todorovac wrote:
> On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
> > On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
> > > On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > > > Hi,
> > > > 
> > > > While trying to bisect, I've found another bug that predated the
> > > > introduction of squashfs_readahead(), but it has
> > > > a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> > > 
> > > Wrong, the stall is happening in the XZ decompressor library, which
> > > is *not* in Squashfs.
> > > 
> > > This reported stall in the decompressor code is likely a symptom of you
> > > deliberately thrashing your system.  When the system thrashes everything
> > > starts to happen very slowly, and the system will spend a lot of
> > > its time doing page I/O, and the CPU will spend a lot of time in
> > > any CPU intensive code like the XZ decompressor library.
> > > 
> > > So the fact the stall is being hit here is a symptom and not
> > > a cause.  The decompressor code is likely running slowly due to
> > > thrashing and waiting on paged-out buffers.  This is not indicative
> > > of any bug, merely a system running slowly due to overload.
> > > 
> > > As I said, this is not a Squashfs issue, because the code when the
> > > stall takes place isn't in Squashfs.
> > > 
> > > The people responsible for the rcu code should have a lot more insight
> > > about what happens when the system is thrashing, and how this will
> > > throw up false positives.  In this I believe this is an instance of
> > > perfectly correct code running slowly due to thrashing incorrectly
> > > being flagged as looping.
> > > 
> > > CC'ing Paul E. McKenney <paulmck@kernel.org>
> > > 
> > > > Frankly, I need an advice on how to handle a situation like this.
> > > > 
> > > > Obviously, the best place to test the RCU problem with
> > > > squashfs_readahead() is where it is introduced + Phillip's patches
> > > > [012]/3.
> > > > 
> > > > Thanks,
> > > > Mirsad
> > > > 
> > > > P.S.
> > > > 
> > > > This is excerpt from dmesg log attached:
> > > > 
> > > > [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> > 
> > The "6 jiffies" leads me to believe that this kernel was built with
> > CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
> > that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
> > and with HZ set to something like 250.  This means that the kernel
> > will complain if an RCU read-side critical section exceeeds about 20
> > milliseconds.  This is a fine thing if you are actually running on an
> > Android, but a bit extreme for general usage.
> > 
> > If my guesses are correct, please apply this commit from mainline:
> > 
> > 1045a06724f3 ("remove CONFIG_ANDROID")
> > 
> > With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
> > to 21000 (that is, 21 seconds) and things will be much better.

Please apply the commit called out above and rerun your test.

But if you still cannot bring yourself to do that, more interspersed
below...

> Dear Paul,
> 
> Actually, I was able to reproduce and bisect the commit that started causing
> described stalls (I have already explained how 5.10 didn't show any under the
> exactly same saved widows and tabs set.)
> 
> # bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag
> 'rcu.2022.05.19a' of
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
> # good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
> git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
> # good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag
> 'efi-next-for-v5.19' of
> git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
> # good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches
> 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
> 'rcu-tasks.2022.04.11b', 'srcu.2022.
> 05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
> git bisect good be05ee54378d451e1d60196921566d6087f3079f
> # good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
> git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
> # good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
> git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
> # good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
> git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
> # good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag
> 'lkmm.2022.05.20a' of
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
> # good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
> git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
> # bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
> # first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD

More precisely, this commit:

28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")

The idea was to cause Android-based systems to have 20-millisecond
expedited RCU CPU stall warnings, but to leave other kernels at the
traditional 21-second value.  Again, the "6 jiffies" below indicates
that you are running a kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
and CONFIG_ANDROID=y.

Except that there were a lot of non-Android systems (including Debian
and Fedora) that have nothing to do with Android, but that nevertheless
build their kernels with CONFIG_ANDROID=y, which no doubt seemed like
a good idea at the time.

For more information, please see https://lwn.net/Articles/899743/

The upshot of the discussion was this same commit that I pointed out
earlier that removes CONFIG_ANDROID completely:

1045a06724f3 ("remove CONFIG_ANDROID")

Again, please apply this commit and rerun your test.

> The bug looks like this:
> 
> [   57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.

Yes, again "6 jiffies" which is roughly the 20 milliseconds set for
kernels built with CONFIG_ANDROID=y and HZ=250 that have this commit:

28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")

But not this commit:

1045a06724f3 ("remove CONFIG_ANDROID")

Yet again, please apply this commit and rerun your test.

> [   57.876854] rcu: blocking rcu_node structures (internal RCU debug):
> [   57.876862] Task dump for CPU 0:
> [   57.876867] task:snap-store      state:R  running task     stack:    0 pid: 2678 ppid:  2230 flags:0x00004008
> [   57.876883] Call Trace:
> [   57.876888]  <TASK>
> [   57.876893]  ? _raw_spin_lock_irqsave+0x93/0xf0
> [   57.876909]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   57.876920]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> [   57.876930]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   57.876939]  ? sysvec_apic_timer_interrupt+0x89/0x90
> [   57.876948]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> [   57.876959]  ? lzma_len+0x1fb/0x3b0
> [   57.876971]  ? dict_repeat+0xb3/0x120
> [   57.876981]  ? lzma_main+0x943/0x1260
> [   57.876995]  ? lzma2_lzma+0x2a4/0x3b0
> [   57.877007]  ? xz_dec_lzma2_run+0x11f/0xb90
> [   57.877018]  ? crc32_le_base+0x186/0x250
> [   57.877032]  ? xz_dec_run+0x346/0x11e0
> [   57.877046]  ? squashfs_xz_uncompress+0x135/0x330
> [   57.877060]  ? lzo_init+0xd0/0xd0
> [   57.877069]  ? squashfs_decompress+0x80/0xb0
> [   57.877079]  ? squashfs_read_data+0x1eb/0x920
> [   57.877090]  ? __filemap_get_folio+0x37e/0x540
> [   57.877104]  ? squashfs_bio_read.isra.2+0x2a0/0x2a0
> [   57.877118]  ? squashfs_readpage_block+0x493/0x950
> [   57.877131]  ? squashfs_readpage+0xa52/0x1140
> [   57.877143]  ? xas_nomem+0x2f/0x100
> [   57.877154]  ? squashfs_copy_cache+0x360/0x360
> [   57.877164]  ? folio_wait_bit+0x3c0/0x3c0
> [   57.877175]  ? __kasan_check_read+0x11/0x20
> [   57.877186]  ? pagevec_add_and_need_flush+0x8c/0xa0
> [   57.877197]  ? folio_add_lru+0x42/0x70
> [   57.877206]  ? filemap_add_folio+0xd4/0x130
> [   57.877216]  ? add_to_page_cache_locked+0xa0/0xa0
> [   57.877228]  ? filemap_read_folio.isra.62+0xbe/0x4d0
> [   57.877238]  ? __kasan_check_write+0x14/0x20
> [   57.877248]  ? filemap_page_mkwrite+0x6a0/0x6a0
> [   57.877260]  ? filemap_add_folio+0x130/0x130
> [   57.877269]  ? try_charge_memcg+0x4cc/0xa90
> [   57.877281]  ? filemap_fault+0xc9a/0xe50
> [   57.877293]  ? generic_file_read_iter+0x230/0x230
> [   57.877305]  ? __rcu_read_unlock+0x55/0x260
> [   57.877317]  ? __do_fault+0x7c/0x1b0
> [   57.877329]  ? __handle_mm_fault+0x173b/0x1970
> [   57.877340]  ? copy_page_range+0x1890/0x1890
> [   57.877356]  ? handle_mm_fault+0x11b/0x380
> [   57.877366]  ? do_user_addr_fault+0x258/0x810
> [   57.877380]  ? exc_page_fault+0x60/0xe0
> [   57.877389]  ? asm_exc_page_fault+0x8/0x30
> [   57.877398]  ? asm_exc_page_fault+0x1e/0x30
> [   57.877410]  </TASK>
> 
> Please find complete dmesg, my lshw and bisect attached.
> 
> I am not competent to estimate how this commit causes this kind of problems,
> and whether stalls up to 20 jiffies are worth eradicating.

The Android folks believe that they are on Android platforms, as you
can see from the commit log on:

28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")

But this would be excessive on many other platforms, especially those
with large quantities of CPUs, devices, or memory.

> Hope this helps.

Likewise!

							Thanx, Paul

> Thank you,
> Mirsad
> 
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
> -- 
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
> The European Union







^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-20 17:50         ` Paul E. McKenney
@ 2022-11-20 18:36           ` Mirsad Goran Todorovac
  2022-11-20 19:21             ` Paul E. McKenney
  2022-11-20 19:33             ` Mirsad Goran Todorovac
  0 siblings, 2 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-20 18:36 UTC (permalink / raw)
  To: paulmck
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On 20. 11. 2022. 18:50, Paul E. McKenney wrote:
> On Sun, Nov 20, 2022 at 04:49:37PM +0100, Mirsad Goran Todorovac wrote:
>> On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
>>> On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
>>>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> While trying to bisect, I've found another bug that predated the
>>>>> introduction of squashfs_readahead(), but it has
>>>>> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
>>>>
>>>> Wrong, the stall is happening in the XZ decompressor library, which
>>>> is *not* in Squashfs.
>>>>
>>>> This reported stall in the decompressor code is likely a symptom of you
>>>> deliberately thrashing your system.  When the system thrashes everything
>>>> starts to happen very slowly, and the system will spend a lot of
>>>> its time doing page I/O, and the CPU will spend a lot of time in
>>>> any CPU intensive code like the XZ decompressor library.
>>>>
>>>> So the fact the stall is being hit here is a symptom and not
>>>> a cause.  The decompressor code is likely running slowly due to
>>>> thrashing and waiting on paged-out buffers.  This is not indicative
>>>> of any bug, merely a system running slowly due to overload.
>>>>
>>>> As I said, this is not a Squashfs issue, because the code when the
>>>> stall takes place isn't in Squashfs.
>>>>
>>>> The people responsible for the rcu code should have a lot more insight
>>>> about what happens when the system is thrashing, and how this will
>>>> throw up false positives.  In this I believe this is an instance of
>>>> perfectly correct code running slowly due to thrashing incorrectly
>>>> being flagged as looping.
>>>>
>>>> CC'ing Paul E. McKenney <paulmck@kernel.org>
>>>>
>>>>> Frankly, I need an advice on how to handle a situation like this.
>>>>>
>>>>> Obviously, the best place to test the RCU problem with
>>>>> squashfs_readahead() is where it is introduced + Phillip's patches
>>>>> [012]/3.
>>>>>
>>>>> Thanks,
>>>>> Mirsad
>>>>>
>>>>> P.S.
>>>>>
>>>>> This is excerpt from dmesg log attached:
>>>>>
>>>>> [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
>>>>> CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
>>>
>>> The "6 jiffies" leads me to believe that this kernel was built with
>>> CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
>>> that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
>>> and with HZ set to something like 250.  This means that the kernel
>>> will complain if an RCU read-side critical section exceeeds about 20
>>> milliseconds.  This is a fine thing if you are actually running on an
>>> Android, but a bit extreme for general usage.
>>>
>>> If my guesses are correct, please apply this commit from mainline:
>>>
>>> 1045a06724f3 ("remove CONFIG_ANDROID")
>>>
>>> With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
>>> to 21000 (that is, 21 seconds) and things will be much better.
> 
> Please apply the commit called out above and rerun your test.
> 
> But if you still cannot bring yourself to do that, more interspersed
> below...
> 
>> Dear Paul,
>>
>> Actually, I was able to reproduce and bisect the commit that started causing
>> described stalls (I have already explained how 5.10 didn't show any under the
>> exactly same saved widows and tabs set.)
>>
>> # bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag
>> 'rcu.2022.05.19a' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
>> git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
>> # good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
>> git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
>> # good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag
>> 'efi-next-for-v5.19' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
>> git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
>> # good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches
>> 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
>> 'rcu-tasks.2022.04.11b', 'srcu.2022.
>> 05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
>> git bisect good be05ee54378d451e1d60196921566d6087f3079f
>> # good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
>> git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
>> # good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
>> git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
>> # good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
>> git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
>> # good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag
>> 'lkmm.2022.05.20a' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
>> git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
>> # good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
>> git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
>> # bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
>> git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
>> # first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> 
> More precisely, this commit:
> 
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> 
> The idea was to cause Android-based systems to have 20-millisecond
> expedited RCU CPU stall warnings, but to leave other kernels at the
> traditional 21-second value.  Again, the "6 jiffies" below indicates
> that you are running a kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> and CONFIG_ANDROID=y.
> 
> Except that there were a lot of non-Android systems (including Debian
> and Fedora) that have nothing to do with Android, but that nevertheless
> build their kernels with CONFIG_ANDROID=y, which no doubt seemed like
> a good idea at the time.
> 
> For more information, please see https://lwn.net/Articles/899743/
> 
> The upshot of the discussion was this same commit that I pointed out
> earlier that removes CONFIG_ANDROID completely:
> 
> 1045a06724f3 ("remove CONFIG_ANDROID")
> 
> Again, please apply this commit and rerun your test.
> 
>> The bug looks like this:
>>
>> [   57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.
> 
> Yes, again "6 jiffies" which is roughly the 20 milliseconds set for
> kernels built with CONFIG_ANDROID=y and HZ=250 that have this commit:
> 
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> 
> But not this commit:
> 
> 1045a06724f3 ("remove CONFIG_ANDROID")
> 
> Yet again, please apply this commit and rerun your test.
> 
>> [   57.876854] rcu: blocking rcu_node structures (internal RCU debug):
>> [   57.876862] Task dump for CPU 0:
>> [   57.876867] task:snap-store      state:R  running task     stack:    0 pid: 2678 ppid:  2230 flags:0x00004008
>> [   57.876883] Call Trace:
>> [   57.876888]  <TASK>
>> [   57.876893]  ? _raw_spin_lock_irqsave+0x93/0xf0
>> [   57.876909]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   57.876920]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [   57.876930]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   57.876939]  ? sysvec_apic_timer_interrupt+0x89/0x90
>> [   57.876948]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [   57.876959]  ? lzma_len+0x1fb/0x3b0
>> [   57.876971]  ? dict_repeat+0xb3/0x120
>> [   57.876981]  ? lzma_main+0x943/0x1260
>> [   57.876995]  ? lzma2_lzma+0x2a4/0x3b0
>> [   57.877007]  ? xz_dec_lzma2_run+0x11f/0xb90
>> [   57.877018]  ? crc32_le_base+0x186/0x250
>> [   57.877032]  ? xz_dec_run+0x346/0x11e0
>> [   57.877046]  ? squashfs_xz_uncompress+0x135/0x330
>> [   57.877060]  ? lzo_init+0xd0/0xd0
>> [   57.877069]  ? squashfs_decompress+0x80/0xb0
>> [   57.877079]  ? squashfs_read_data+0x1eb/0x920
>> [   57.877090]  ? __filemap_get_folio+0x37e/0x540
>> [   57.877104]  ? squashfs_bio_read.isra.2+0x2a0/0x2a0
>> [   57.877118]  ? squashfs_readpage_block+0x493/0x950
>> [   57.877131]  ? squashfs_readpage+0xa52/0x1140
>> [   57.877143]  ? xas_nomem+0x2f/0x100
>> [   57.877154]  ? squashfs_copy_cache+0x360/0x360
>> [   57.877164]  ? folio_wait_bit+0x3c0/0x3c0
>> [   57.877175]  ? __kasan_check_read+0x11/0x20
>> [   57.877186]  ? pagevec_add_and_need_flush+0x8c/0xa0
>> [   57.877197]  ? folio_add_lru+0x42/0x70
>> [   57.877206]  ? filemap_add_folio+0xd4/0x130
>> [   57.877216]  ? add_to_page_cache_locked+0xa0/0xa0
>> [   57.877228]  ? filemap_read_folio.isra.62+0xbe/0x4d0
>> [   57.877238]  ? __kasan_check_write+0x14/0x20
>> [   57.877248]  ? filemap_page_mkwrite+0x6a0/0x6a0
>> [   57.877260]  ? filemap_add_folio+0x130/0x130
>> [   57.877269]  ? try_charge_memcg+0x4cc/0xa90
>> [   57.877281]  ? filemap_fault+0xc9a/0xe50
>> [   57.877293]  ? generic_file_read_iter+0x230/0x230
>> [   57.877305]  ? __rcu_read_unlock+0x55/0x260
>> [   57.877317]  ? __do_fault+0x7c/0x1b0
>> [   57.877329]  ? __handle_mm_fault+0x173b/0x1970
>> [   57.877340]  ? copy_page_range+0x1890/0x1890
>> [   57.877356]  ? handle_mm_fault+0x11b/0x380
>> [   57.877366]  ? do_user_addr_fault+0x258/0x810
>> [   57.877380]  ? exc_page_fault+0x60/0xe0
>> [   57.877389]  ? asm_exc_page_fault+0x8/0x30
>> [   57.877398]  ? asm_exc_page_fault+0x1e/0x30
>> [   57.877410]  </TASK>
>>
>> Please find complete dmesg, my lshw and bisect attached.
>>
>> I am not competent to estimate how this commit causes this kind of problems,
>> and whether stalls up to 20 jiffies are worth eradicating.
> 
> The Android folks believe that they are on Android platforms, as you
> can see from the commit log on:
> 
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> 
> But this would be excessive on many other platforms, especially those
> with large quantities of CPUs, devices, or memory.
> 
>> Hope this helps.
> 
> Likewise!
> 
> 							Thanx, Paul

Dear Mr. McKenney,

And what about the Mr. Robert Elliott's observation about calling conf_recshed()?

 > How big can these readahead sizes be? Should one of the loops include
 > cond_resched() calls?

That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).

I don't think it is correct to stay in kernel mode for more than an timer unit
without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
CD-ROMs). This is especially noticeable with a KASAN build.

Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
to perform poorer since snaps are introduced than Chrome.

IMHO, if we want something like realtime and multimedia processing (which is the specific
area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
in one allotted timeslot.

I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
on Ultrix which compiled lab at a time in memory that would today sound funny. :)

Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
memory pages in the system with heavy load and low on memory.

I will probably test your commits, but now I have to do my own evening ritual, unwinding,
and knowledge and memory consolidation (called "sleep").

I appreciate your lots of commits on the kernel.org and I hope I do not sound like
I am thinking you are a village idiot :(

I am trying to adhere to the Code of Conduct with mutual respect and politeness.

I know that the Linux kernel is about 30 million lines by now, and by the security experts
we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.

Knowing all of this is difficult, but I still believe in open source and open systems
interconnected.

Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"

Hope this helps. My $0.02.

Thank you,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-20 18:36           ` Mirsad Goran Todorovac
@ 2022-11-20 19:21             ` Paul E. McKenney
  2022-11-21  4:04               ` Mirsad Goran Todorovac
  2022-11-20 19:33             ` Mirsad Goran Todorovac
  1 sibling, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-20 19:21 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On Sun, Nov 20, 2022 at 07:36:21PM +0100, Mirsad Goran Todorovac wrote:
> On 20. 11. 2022. 18:50, Paul E. McKenney wrote:
> > On Sun, Nov 20, 2022 at 04:49:37PM +0100, Mirsad Goran Todorovac wrote:
> > > On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
> > > > On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
> > > > > On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > While trying to bisect, I've found another bug that predated the
> > > > > > introduction of squashfs_readahead(), but it has
> > > > > > a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> > > > > 
> > > > > Wrong, the stall is happening in the XZ decompressor library, which
> > > > > is *not* in Squashfs.
> > > > > 
> > > > > This reported stall in the decompressor code is likely a symptom of you
> > > > > deliberately thrashing your system.  When the system thrashes everything
> > > > > starts to happen very slowly, and the system will spend a lot of
> > > > > its time doing page I/O, and the CPU will spend a lot of time in
> > > > > any CPU intensive code like the XZ decompressor library.
> > > > > 
> > > > > So the fact the stall is being hit here is a symptom and not
> > > > > a cause.  The decompressor code is likely running slowly due to
> > > > > thrashing and waiting on paged-out buffers.  This is not indicative
> > > > > of any bug, merely a system running slowly due to overload.
> > > > > 
> > > > > As I said, this is not a Squashfs issue, because the code when the
> > > > > stall takes place isn't in Squashfs.
> > > > > 
> > > > > The people responsible for the rcu code should have a lot more insight
> > > > > about what happens when the system is thrashing, and how this will
> > > > > throw up false positives.  In this I believe this is an instance of
> > > > > perfectly correct code running slowly due to thrashing incorrectly
> > > > > being flagged as looping.
> > > > > 
> > > > > CC'ing Paul E. McKenney <paulmck@kernel.org>
> > > > > 
> > > > > > Frankly, I need an advice on how to handle a situation like this.
> > > > > > 
> > > > > > Obviously, the best place to test the RCU problem with
> > > > > > squashfs_readahead() is where it is introduced + Phillip's patches
> > > > > > [012]/3.
> > > > > > 
> > > > > > Thanks,
> > > > > > Mirsad
> > > > > > 
> > > > > > P.S.
> > > > > > 
> > > > > > This is excerpt from dmesg log attached:
> > > > > > 
> > > > > > [   25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> > > > 
> > > > The "6 jiffies" leads me to believe that this kernel was built with
> > > > CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
> > > > that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
> > > > and with HZ set to something like 250.  This means that the kernel
> > > > will complain if an RCU read-side critical section exceeeds about 20
> > > > milliseconds.  This is a fine thing if you are actually running on an
> > > > Android, but a bit extreme for general usage.
> > > > 
> > > > If my guesses are correct, please apply this commit from mainline:
> > > > 
> > > > 1045a06724f3 ("remove CONFIG_ANDROID")
> > > > 
> > > > With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
> > > > to 21000 (that is, 21 seconds) and things will be much better.
> > 
> > Please apply the commit called out above and rerun your test.
> > 
> > But if you still cannot bring yourself to do that, more interspersed
> > below...
> > 
> > > Dear Paul,
> > > 
> > > Actually, I was able to reproduce and bisect the commit that started causing
> > > described stalls (I have already explained how 5.10 didn't show any under the
> > > exactly same saved widows and tabs set.)
> > > 
> > > # bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag
> > > 'rcu.2022.05.19a' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> > > git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
> > > # good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
> > > git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
> > > # good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag
> > > 'efi-next-for-v5.19' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> > > git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
> > > # good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches
> > > 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
> > > 'rcu-tasks.2022.04.11b', 'srcu.2022.
> > > 05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
> > > git bisect good be05ee54378d451e1d60196921566d6087f3079f
> > > # good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
> > > git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
> > > # good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
> > > git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
> > > # good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
> > > git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
> > > # good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag
> > > 'lkmm.2022.05.20a' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> > > git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
> > > # good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
> > > git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
> > > # bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> > > git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
> > > # first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> > 
> > More precisely, this commit:
> > 
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> > 
> > The idea was to cause Android-based systems to have 20-millisecond
> > expedited RCU CPU stall warnings, but to leave other kernels at the
> > traditional 21-second value.  Again, the "6 jiffies" below indicates
> > that you are running a kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> > and CONFIG_ANDROID=y.
> > 
> > Except that there were a lot of non-Android systems (including Debian
> > and Fedora) that have nothing to do with Android, but that nevertheless
> > build their kernels with CONFIG_ANDROID=y, which no doubt seemed like
> > a good idea at the time.
> > 
> > For more information, please see https://lwn.net/Articles/899743/
> > 
> > The upshot of the discussion was this same commit that I pointed out
> > earlier that removes CONFIG_ANDROID completely:
> > 
> > 1045a06724f3 ("remove CONFIG_ANDROID")
> > 
> > Again, please apply this commit and rerun your test.
> > 
> > > The bug looks like this:
> > > 
> > > [   57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.
> > 
> > Yes, again "6 jiffies" which is roughly the 20 milliseconds set for
> > kernels built with CONFIG_ANDROID=y and HZ=250 that have this commit:
> > 
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> > 
> > But not this commit:
> > 
> > 1045a06724f3 ("remove CONFIG_ANDROID")
> > 
> > Yet again, please apply this commit and rerun your test.
> > 
> > > [   57.876854] rcu: blocking rcu_node structures (internal RCU debug):
> > > [   57.876862] Task dump for CPU 0:
> > > [   57.876867] task:snap-store      state:R  running task     stack:    0 pid: 2678 ppid:  2230 flags:0x00004008
> > > [   57.876883] Call Trace:
> > > [   57.876888]  <TASK>
> > > [   57.876893]  ? _raw_spin_lock_irqsave+0x93/0xf0
> > > [   57.876909]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [   57.876920]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> > > [   57.876930]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [   57.876939]  ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [   57.876948]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> > > [   57.876959]  ? lzma_len+0x1fb/0x3b0
> > > [   57.876971]  ? dict_repeat+0xb3/0x120
> > > [   57.876981]  ? lzma_main+0x943/0x1260
> > > [   57.876995]  ? lzma2_lzma+0x2a4/0x3b0
> > > [   57.877007]  ? xz_dec_lzma2_run+0x11f/0xb90
> > > [   57.877018]  ? crc32_le_base+0x186/0x250
> > > [   57.877032]  ? xz_dec_run+0x346/0x11e0
> > > [   57.877046]  ? squashfs_xz_uncompress+0x135/0x330
> > > [   57.877060]  ? lzo_init+0xd0/0xd0
> > > [   57.877069]  ? squashfs_decompress+0x80/0xb0
> > > [   57.877079]  ? squashfs_read_data+0x1eb/0x920
> > > [   57.877090]  ? __filemap_get_folio+0x37e/0x540
> > > [   57.877104]  ? squashfs_bio_read.isra.2+0x2a0/0x2a0
> > > [   57.877118]  ? squashfs_readpage_block+0x493/0x950
> > > [   57.877131]  ? squashfs_readpage+0xa52/0x1140
> > > [   57.877143]  ? xas_nomem+0x2f/0x100
> > > [   57.877154]  ? squashfs_copy_cache+0x360/0x360
> > > [   57.877164]  ? folio_wait_bit+0x3c0/0x3c0
> > > [   57.877175]  ? __kasan_check_read+0x11/0x20
> > > [   57.877186]  ? pagevec_add_and_need_flush+0x8c/0xa0
> > > [   57.877197]  ? folio_add_lru+0x42/0x70
> > > [   57.877206]  ? filemap_add_folio+0xd4/0x130
> > > [   57.877216]  ? add_to_page_cache_locked+0xa0/0xa0
> > > [   57.877228]  ? filemap_read_folio.isra.62+0xbe/0x4d0
> > > [   57.877238]  ? __kasan_check_write+0x14/0x20
> > > [   57.877248]  ? filemap_page_mkwrite+0x6a0/0x6a0
> > > [   57.877260]  ? filemap_add_folio+0x130/0x130
> > > [   57.877269]  ? try_charge_memcg+0x4cc/0xa90
> > > [   57.877281]  ? filemap_fault+0xc9a/0xe50
> > > [   57.877293]  ? generic_file_read_iter+0x230/0x230
> > > [   57.877305]  ? __rcu_read_unlock+0x55/0x260
> > > [   57.877317]  ? __do_fault+0x7c/0x1b0
> > > [   57.877329]  ? __handle_mm_fault+0x173b/0x1970
> > > [   57.877340]  ? copy_page_range+0x1890/0x1890
> > > [   57.877356]  ? handle_mm_fault+0x11b/0x380
> > > [   57.877366]  ? do_user_addr_fault+0x258/0x810
> > > [   57.877380]  ? exc_page_fault+0x60/0xe0
> > > [   57.877389]  ? asm_exc_page_fault+0x8/0x30
> > > [   57.877398]  ? asm_exc_page_fault+0x1e/0x30
> > > [   57.877410]  </TASK>
> > > 
> > > Please find complete dmesg, my lshw and bisect attached.
> > > 
> > > I am not competent to estimate how this commit causes this kind of problems,
> > > and whether stalls up to 20 jiffies are worth eradicating.
> > 
> > The Android folks believe that they are on Android platforms, as you
> > can see from the commit log on:
> > 
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> > 
> > But this would be excessive on many other platforms, especially those
> > with large quantities of CPUs, devices, or memory.
> > 
> > > Hope this helps.
> > 
> > Likewise!
> > 
> > 							Thanx, Paul
> 
> Dear Mr. McKenney,
> 
> And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
> 
> > How big can these readahead sizes be? Should one of the loops include
> > cond_resched() calls?
> 
> That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
> 
> I don't think it is correct to stay in kernel mode for more than an timer unit
> without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
> CD-ROMs). This is especially noticeable with a KASAN build.
> 
> Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
> to perform poorer since snaps are introduced than Chrome.
> 
> IMHO, if we want something like realtime and multimedia processing (which is the specific
> area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
> is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
> in one allotted timeslot.
> 
> I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
> a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
> on Ultrix which compiled lab at a time in memory that would today sound funny. :)

I personally think that it would be great if you were to work to decrease
the Linux kernel's latency.  Doing so would not be fixing a regression,
but I personally would welcome it.  Others might have different opinions,
but please do CC me on any resulting patches.

And I will see your MicroVAX and raise you a videogame written on a
PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
nanoseconds!).  ;-)

You can can see a couple of people playing the game on a PDP-12 in
a computer museum: https://www.rcsri.org/collection/pdp-12/

> Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
> memory pages in the system with heavy load and low on memory.
> 
> I will probably test your commits, but now I have to do my own evening ritual, unwinding,
> and knowledge and memory consolidation (called "sleep").

And yes, sleep is often one of the best debugging tools available.

> I appreciate your lots of commits on the kernel.org and I hope I do not sound like
> I am thinking you are a village idiot :(
> 
> I am trying to adhere to the Code of Conduct with mutual respect and politeness.

Skepticism is not necessarily a bad thing, especially given that I
am not immune from errors and confusion.  Me, I just thought you were
forcefully reporting the regression, so I forcefully pointed you at the
fix for that regression.

Again, I have absolutely no objection to your improving the kernel's
response time.

> I know that the Linux kernel is about 30 million lines by now, and by the security experts
> we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
> lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.

At least 30,000 bugs, of which we know of maybe 950.  ;-)

> Knowing all of this is difficult, but I still believe in open source and open systems
> interconnected.

If it was easy, where would be the challenge?

> Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
> 
> Hope this helps. My $0.02.

I think we are good.   ;-)

							Thanx, Paul

> Thank you,
> Mirsad
> 
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
> -- 
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
> The European Union
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-20 18:36           ` Mirsad Goran Todorovac
  2022-11-20 19:21             ` Paul E. McKenney
@ 2022-11-20 19:33             ` Mirsad Goran Todorovac
  1 sibling, 0 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-20 19:33 UTC (permalink / raw)
  To: paulmck
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

P.S.

Sorry, Paul, the patches did not apply on the mainline "master" or "v6.1-rc5" nor "v6.0.9".

Here is the session log:

mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git checkout v6.1-rc5
Note: checking out 'v6.1-rc5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

   git checkout -b <new-branch-name>

HEAD is now at 094226ad94f4 Linux 6.1-rc5
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../remove-android.patch
error: patch failed: drivers/Makefile:176
error: drivers/Makefile: patch does not apply
error: patch failed: drivers/android/Kconfig:1
error: drivers/android/Kconfig: patch does not apply
error: patch failed: kernel/configs/android-base.config:7
error: kernel/configs/android-base.config: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:86
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: tools/testing/selftests/filesystems/binderfs/config:1
error: tools/testing/selftests/filesystems/binderfs/config: patch does not apply
error: patch failed: tools/testing/selftests/sync/config:1
error: tools/testing/selftests/sync/config: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../introduce-rcu-stall-timeout.patch
error: patch failed: Documentation/admin-guide/kernel-parameters.txt:4893
error: Documentation/admin-guide/kernel-parameters.txt: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:91
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: kernel/rcu/rcu.h:210
error: kernel/rcu/rcu.h: patch does not apply
error: patch failed: kernel/rcu/tree_exp.h:496
error: kernel/rcu/tree_exp.h: patch does not apply
error: patch failed: kernel/rcu/tree_stall.h:25
error: kernel/rcu/tree_stall.h: patch does not apply
error: patch failed: kernel/rcu/update.c:506
error: kernel/rcu/update.c: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git checkout v6.0.9
Checking out files: 100% (12048/12048), done.
Previous HEAD position was 094226ad94f4 Linux 6.1-rc5
HEAD is now at be8b93b5cc7d Linux 6.0.9
You have mail in /var/mail/mtodorov
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../remove-android.patch
error: patch failed: drivers/Makefile:176
error: drivers/Makefile: patch does not apply
error: patch failed: drivers/android/Kconfig:1
error: drivers/android/Kconfig: patch does not apply
error: patch failed: kernel/configs/android-base.config:7
error: kernel/configs/android-base.config: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:86
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: tools/testing/selftests/filesystems/binderfs/config:1
error: tools/testing/selftests/filesystems/binderfs/config: patch does not apply
error: patch failed: tools/testing/selftests/sync/config:1
error: tools/testing/selftests/sync/config: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../introduce-rcu-stall-timeout.patch
error: patch failed: Documentation/admin-guide/kernel-parameters.txt:4893
error: Documentation/admin-guide/kernel-parameters.txt: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:91
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: kernel/rcu/rcu.h:210
error: kernel/rcu/rcu.h: patch does not apply
error: patch failed: kernel/rcu/tree_exp.h:496
error: kernel/rcu/tree_exp.h: patch does not apply
error: patch failed: kernel/rcu/tree_stall.h:25
error: kernel/rcu/tree_stall.h: patch does not apply
error: patch failed: kernel/rcu/update.c:506
error: kernel/rcu/update.c: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$

Now I really got to try to catch some sleep because I work my day job in the morning.

I must admit that I am not enthusiastic about these patches for I think they only mask
the problem of stalls: removing reports from the logs, but cores will still lock up.

Besides, I also hoped to develop multimedia for Android devices, too, as a part of our
research, to academically justify my work on the Linux kernel bug bisection and learning
about the kernel proper.

I still need to figure out a way to present my results to the academic community, to
have it justified and to have our University more visible and ranked on the world list. ;-)

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-20 19:21             ` Paul E. McKenney
@ 2022-11-21  4:04               ` Mirsad Goran Todorovac
  2022-11-22  2:07                 ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-21  4:04 UTC (permalink / raw)
  To: paulmck
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On 20. 11. 2022. 20:21, Paul E. McKenney wrote:

>> And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
>>
>>> How big can these readahead sizes be? Should one of the loops include
>>> cond_resched() calls?
>>
>> That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
>>
>> I don't think it is correct to stay in kernel mode for more than an timer unit
>> without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
>> CD-ROMs). This is especially noticeable with a KASAN build.
>>
>> Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
>> to perform poorer since snaps are introduced than Chrome.
>>
>> IMHO, if we want something like realtime and multimedia processing (which is the specific
>> area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
>> is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
>> in one allotted timeslot.
>>
>> I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
>> a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
>> on Ultrix which compiled lab at a time in memory that would today sound funny. :)
> 
> I personally think that it would be great if you were to work to decrease
> the Linux kernel's latency.  Doing so would not be fixing a regression,
> but I personally would welcome it.  Others might have different opinions,
> but please do CC me on any resulting patches.
> 
> And I will see your MicroVAX and raise you a videogame written on a
> PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
> nanoseconds!).  ;-)

I'm afraid that I would lose in Far Cry miserably if my cores
decided to all lock up for 21 secs. :-(

> You can can see a couple of people playing the game on a PDP-12 in
> a computer museum: https://www.rcsri.org/collection/pdp-12/
> 
>> Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
>> memory pages in the system with heavy load and low on memory.
>>
>> I will probably test your commits, but now I have to do my own evening ritual, unwinding,
>> and knowledge and memory consolidation (called "sleep").
> 
> And yes, sleep is often one of the best debugging tools available.
> 
>> I appreciate your lots of commits on the kernel.org and I hope I do not sound like
>> I am thinking you are a village idiot :(
>>
>> I am trying to adhere to the Code of Conduct with mutual respect and politeness.
> 
> Skepticism is not necessarily a bad thing, especially given that I
> am not immune from errors and confusion.  Me, I just thought you were
> forcefully reporting the regression, so I forcefully pointed you at the
> fix for that regression.
> 
> Again, I have absolutely no objection to your improving the kernel's
> response time.

This is at present just the wishful thinking, as I lack your 30 years of
experience with the kernel and RCU update system. I am only beginning to realise
why it is more efficient than the traditional locking, and IMHO it should
avoid locking up cores instead of increasing the number of complaints.

But even if the Linux kernel source is magically "memory mapped" into my
mind, I still do not see how it could be done. My Linux kernel learning curve
had not yet got that up, but I have no doubts that it is designed by
Intelligent Designers who are very witty people, and not village idiots ;-)

>> I know that the Linux kernel is about 30 million lines by now, and by the security experts
>> we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
>> lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
> 
> At least 30,000 bugs, of which we know of maybe 950.  ;-)

So I need no point in banning the kernel from screaming to logs that it had
core stalls that needed a physical NMI to recover from, or they would potentially
last much longer.

>> Knowing all of this is difficult, but I still believe in open source and open systems
>> interconnected.
> 
> If it was easy, where would be the challenge?

AFAIK, the point I was taught in life was obedience, not overcoming challenges.

>> Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
>>
>> Hope this helps. My $0.02.
> 
> I think we are good.   ;-)

Yes, you guys do an amasing job of keeping 30 million lines of code organised
and making some sense. I will cut the smalltalk as I know you are a busy man.
If I make a progress to actually produce any patches fixing these lockups and
stalls, I will be sure to include you into CC: as you requested.

Have a nice day!

Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-21  4:04               ` Mirsad Goran Todorovac
@ 2022-11-22  2:07                 ` Paul E. McKenney
  2022-11-23  9:14                   ` Mirsad Goran Todorovac
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-22  2:07 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On Mon, Nov 21, 2022 at 05:04:36AM +0100, Mirsad Goran Todorovac wrote:
> On 20. 11. 2022. 20:21, Paul E. McKenney wrote:
> > > And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
> > > 
> > > > How big can these readahead sizes be? Should one of the loops include
> > > > cond_resched() calls?
> > > 
> > > That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
> > > 
> > > I don't think it is correct to stay in kernel mode for more than an timer unit
> > > without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
> > > CD-ROMs). This is especially noticeable with a KASAN build.
> > > 
> > > Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
> > > to perform poorer since snaps are introduced than Chrome.
> > > 
> > > IMHO, if we want something like realtime and multimedia processing (which is the specific
> > > area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
> > > is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
> > > in one allotted timeslot.
> > > 
> > > I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
> > > a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
> > > on Ultrix which compiled lab at a time in memory that would today sound funny. :)
> > 
> > I personally think that it would be great if you were to work to decrease
> > the Linux kernel's latency.  Doing so would not be fixing a regression,
> > but I personally would welcome it.  Others might have different opinions,
> > but please do CC me on any resulting patches.
> > 
> > And I will see your MicroVAX and raise you a videogame written on a
> > PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
> > nanoseconds!).  ;-)
> 
> I'm afraid that I would lose in Far Cry miserably if my cores
> decided to all lock up for 21 secs. :-(

Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
still a very long time.  Me, I come from DYNIX/ptx, where the equivalent
to the RCU CPU stall warning was 1.5 seconds.  On the other hand, it
is also the case that DYNIX/ptx had nowhere near the variety of drivers
and subsystems, nor did it scale anywhere near as far as Linux does today.

But you only need one CPU to lock up for 21 seconds to get an RCU CPU
stall warning, not all of them.  ;-)

> > You can can see a couple of people playing the game on a PDP-12 in
> > a computer museum: https://www.rcsri.org/collection/pdp-12/
> > 
> > > Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
> > > memory pages in the system with heavy load and low on memory.
> > > 
> > > I will probably test your commits, but now I have to do my own evening ritual, unwinding,
> > > and knowledge and memory consolidation (called "sleep").
> > 
> > And yes, sleep is often one of the best debugging tools available.
> > 
> > > I appreciate your lots of commits on the kernel.org and I hope I do not sound like
> > > I am thinking you are a village idiot :(
> > > 
> > > I am trying to adhere to the Code of Conduct with mutual respect and politeness.
> > 
> > Skepticism is not necessarily a bad thing, especially given that I
> > am not immune from errors and confusion.  Me, I just thought you were
> > forcefully reporting the regression, so I forcefully pointed you at the
> > fix for that regression.
> > 
> > Again, I have absolutely no objection to your improving the kernel's
> > response time.
> 
> This is at present just the wishful thinking, as I lack your 30 years of
> experience with the kernel and RCU update system. I am only beginning to realise
> why it is more efficient than the traditional locking, and IMHO it should
> avoid locking up cores instead of increasing the number of complaints.

Just to set the record straight, RCU does not normally lock up any of
the cores.  Instead, RCU detects that cores have been locked up.

Give or take the occasional bug in RCU, of course!

> But even if the Linux kernel source is magically "memory mapped" into my
> mind, I still do not see how it could be done. My Linux kernel learning curve
> had not yet got that up, but I have no doubts that it is designed by
> Intelligent Designers who are very witty people, and not village idiots ;-)

There is the school of thought that claims that the Linux kernel is
driven by evolutionary forces rather than intelligent design.  And as
we all know, evolutionary forces are driven by random changes, which
absolutely anyone could make.

And one approach is to take a less aggressive RCU CPU stall timeout,
say reducing from 21 seconds to (say) 15 seconds instead of all the
way down to 20 milliseconds.  This could allow you to ease into the
latency-reduction work.

Alternatively, consider that response time is a property of the
entire system plus the environment that it runs in.  So I suspect that
the Android folks are accompanying that 20-millisecond timeout with
some restrictions on what the on-phone workloads are permitted to do.
Maybe ask the Android guys what those restrictions are and loosen them
slightly, again allowing you to ease into the latency-reduction work.

> > > I know that the Linux kernel is about 30 million lines by now, and by the security experts
> > > we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
> > > lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
> > 
> > At least 30,000 bugs, of which we know of maybe 950.  ;-)
> 
> So I need no point in banning the kernel from screaming to logs that it had
> core stalls that needed a physical NMI to recover from, or they would potentially
> last much longer.

Sometimes an NMI does get the CPUs back on track.  Sometimes the RCU CPU
stall warning is a symptom of the CPU having gotten too old and failing.
Most often, though, it is a sign of some sort of lockup, a too-long
RCU read-side critical section, or as Robert Elliot noted, the lack of
a cond_resched().

But please keep in mind that cond_resched() helps only in kernels built
with CONFIG_PREEMPTION=n.

> > > Knowing all of this is difficult, but I still believe in open source and open systems
> > > interconnected.
> > 
> > If it was easy, where would be the challenge?
> 
> AFAIK, the point I was taught in life was obedience, not overcoming challenges.

Perhaps early in life I was ordered to overcome challenges?  If so, then
my overcoming them would be a matter of obedience.  ;-)

> > > Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
> > > 
> > > Hope this helps. My $0.02.
> > 
> > I think we are good.   ;-)
> 
> Yes, you guys do an amasing job of keeping 30 million lines of code organised
> and making some sense. I will cut the smalltalk as I know you are a busy man.
> If I make a progress to actually produce any patches fixing these lockups and
> stalls, I will be sure to include you into CC: as you requested.

Looking forward to seeing what you come up with!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-22  2:07                 ` Paul E. McKenney
@ 2022-11-23  9:14                   ` Mirsad Goran Todorovac
  2022-11-23 14:44                     ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-23  9:14 UTC (permalink / raw)
  To: paulmck
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On 22. 11. 2022. 03:07, Paul E. McKenney wrote:

>> I'm afraid that I would lose in Far Cry miserably if my cores
>> decided to all lock up for 21 secs. :-(
> 
> Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
> still a very long time.  Me, I come from DYNIX/ptx, where the equivalent
> to the RCU CPU stall warning was 1.5 seconds.  On the other hand, it
> is also the case that DYNIX/ptx had nowhere near the variety of drivers
> and subsystems, nor did it scale anywhere near as far as Linux does today.
> 
> But you only need one CPU to lock up for 21 seconds to get an RCU CPU
> stall warning, not all of them.  ;-)

I can recall an occasion or a couple of them where the entire X Window system
had been unresponsive for quite a number of seconds that sometimes made me reset
the Ubuntu box.

I have the good news: the patches did not apply because they were already applied
in the mainline tree:

mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 28b3ae426598
     Signed-off-by: Borislav Petkov <bp@suse.de>
     Reviewed-by: Kees Cook <keescook@chromium.org>
     Acked-by: Florian Weimer <fweimer@redhat.com>
     Link: https://lore.kernel.org/r/898932fe61db6a9d61bc2458fa2f6049f1ca9f5c.1652290558.git.luto@kernel.org

commit 28b3ae426598e722cf5d5ab9cc7038791b955a56
Author: Uladzislau Rezki <uladzislau.rezki@sony.com>
Date:   Wed Feb 16 14:52:09 2022 +0100

     rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT

mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 1045a06724f3
     Somehow kernel-doc complains here about strong markup, but
     we really don't need the [] so just remove that.

     Signed-off-by: Johannes Berg <johannes.berg@intel.com>

commit 1045a06724f322ed61f1ffb994427c7bdbe64647
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jun 29 17:01:02 2022 +0200

     remove CONFIG_ANDROID

mtodorov@domac:~/linux/kernel/linux_stable_build_b$

>> This is at present just the wishful thinking, as I lack your 30 years of
>> experience with the kernel and RCU update system. I am only beginning to realise
>> why it is more efficient than the traditional locking, and IMHO it should
>> avoid locking up cores instead of increasing the number of complaints.
> 
> Just to set the record straight, RCU does not normally lock up any of
> the cores.  Instead, RCU detects that cores have been locked up.
> 
> Give or take the occasional bug in RCU, of course!

Currently, I cannot be the judge of that, for I can't seem to understand how the
magic of RCU works., how it is implemented. There's more homework to be done ;-)

>> But even if the Linux kernel source is magically "memory mapped" into my
>> mind, I still do not see how it could be done. My Linux kernel learning curve
>> had not yet got that up, but I have no doubts that it is designed by
>> Intelligent Designers who are very witty people, and not village idiots ;-)
> 
> There is the school of thought that claims that the Linux kernel is
> driven by evolutionary forces rather than intelligent design.  And as
> we all know, evolutionary forces are driven by random changes, which
> absolutely anyone could make.

Give or take the rate of improbability where a bunch of monkeys randomly typing
would produce a working Linux kernel source would be about a couple of working
sources in a space of 96^30,000,00 (something like 10^300,000,000), it is comparable
to the probability of random coming of the first simplest DNA into the existence
from the amino acid primordial soup.

(Not that many atoms in the Universe - 10^82, you'd need an awful lot of wasted
multiverses with no even single cell life and certainly no working Linux kernels.)

> And one approach is to take a less aggressive RCU CPU stall timeout,
> say reducing from 21 seconds to (say) 15 seconds instead of all the
> way down to 20 milliseconds.  This could allow you to ease into the
> latency-reduction work.
> 
> Alternatively, consider that response time is a property of the
> entire system plus the environment that it runs in.  So I suspect that
> the Android folks are accompanying that 20-millisecond timeout with
> some restrictions on what the on-phone workloads are permitted to do.
> Maybe ask the Android guys what those restrictions are and loosen them
> slightly, again allowing you to ease into the latency-reduction work.

Good point.
> Sometimes an NMI does get the CPUs back on track.  Sometimes the RCU CPU
> stall warning is a symptom of the CPU having gotten too old and failing.
> Most often, though, it is a sign of some sort of lockup, a too-long
> RCU read-side critical section, or as Robert Elliot noted, the lack of
> a cond_resched().
> 
> But please keep in mind that cond_resched() helps only in kernels built
> with CONFIG_PREEMPTION=n.

I have bad news that 6.1-r6 is still affected with squashfs_xz_uncompress bug, despite having both of your fixes
(as visible in above command's output -- double checked):

[   91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
[   91.065694] rcu: blocking rcu_node structures (internal RCU debug):
[   91.065704] Sending NMI from CPU 5 to CPUs 3:
[   91.065721] NMI backtrace for cpu 3
[   91.065730] CPU: 3 PID: 2829 Comm: snap-store Not tainted 6.1.0-rc6 #1
[   91.065741] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[   91.065746] RIP: 0010:__asan_load4+0x0/0xa0
[   91.065764] Code: 9e c0 84 c0 75 e1 5d c3 cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 
00 00 0f 1f 00 <55> 48 89 e5 48 8b 4d 08 48 83 ff fb 77 64 eb 0f 0f 1f 00 48 b8 00
[   91.065771] RSP: 0000:ffff8881388ef140 EFLAGS: 00000246
[   91.065779] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff9be992fd
[   91.065785] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff888125500004
[   91.065789] RBP: ffff8881388ef1e0 R08: 0000000000000001 R09: ffffed1024aa0de8
[   91.065794] R10: ffff888125506f39 R11: ffffed1024aa0de7 R12: 0000000001067db0
[   91.065799] R13: ffff888125500000 R14: 00000000014fe803 R15: ffff888125502112
[   91.065804] FS:  00007fdec50ab180(0000) GS:ffff888257180000(0000) knlGS:0000000000000000
[   91.065810] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   91.065815] CR2: 00007fdeb7cb6260 CR3: 000000011a436005 CR4: 0000000000770ee0
[   91.065820] PKRU: 55555554
[   91.065823] Call Trace:
[   91.065826]  <TASK>
[   91.065829]  ? lzma_main+0x37a/0x1260
[   91.065845]  lzma2_lzma+0x2b9/0x430
[   91.065857]  xz_dec_lzma2_run+0x11f/0xb90
[   91.065867]  ? __asan_load4+0x55/0xa0
[   91.065880]  xz_dec_run+0x346/0x11f0
[   91.065892]  squashfs_xz_uncompress+0x196/0x370
[   91.065905]  ? lzo_uncompress+0x400/0x400
[   91.065913]  squashfs_decompress+0x88/0xd0
[   91.065923]  squashfs_read_data+0x1e5/0x900
[   91.065930]  ? __create_object+0x4ae/0x560
[   91.065942]  ? squashfs_bio_read.isra.3+0x230/0x230
[   91.065951]  ? __kasan_kmalloc+0xb6/0xc0
[   91.065961]  ? squashfs_page_actor_init_special+0x1a6/0x210
[   91.065972]  squashfs_readahead+0xaa3/0xe80
[   91.065985]  ? squashfs_fill_page+0x190/0x190
[   91.065993]  ? __filemap_add_folio+0x3a1/0x680
[   91.066003]  ? dio_warn_stale_pagecache.part.67+0x90/0x90
[   91.066012]  read_pages+0x122/0x540
[   91.066023]  ? file_ra_state_init+0x60/0x60
[   91.066032]  ? filemap_add_folio+0xd4/0x140
[   91.066040]  ? folio_alloc+0x1b/0x50
[   91.066051]  page_cache_ra_unbounded+0x1e6/0x280
[   91.066064]  do_page_cache_ra+0x7c/0x90
[   91.066074]  page_cache_ra_order+0x393/0x400
[   91.066087]  ondemand_readahead+0x2f1/0x4e0
[   91.066098]  page_cache_async_ra+0x8b/0xa0
[   91.066106]  filemap_fault+0x742/0x1490
[   91.066113]  ? __folio_memcg_unlock+0x35/0x80
[   91.066124]  ? read_cache_page_gfp+0x90/0x90
[   91.066132]  ? filemap_map_pages+0x28e/0xc60
[   91.066145]  __do_fault+0x76/0x1b0
[   91.066154]  do_fault+0x1c6/0x680
[   91.066163]  __handle_mm_fault+0x89a/0x1310
[   91.066173]  ? copy_page_range+0x1b20/0x1b20
[   91.066181]  ? mt_find+0x189/0x330
[   91.066190]  ? mas_next_entry+0xa80/0xa80
[   91.066204]  handle_mm_fault+0x11b/0x390
[   91.066213]  do_user_addr_fault+0x258/0x860
[   91.066225]  exc_page_fault+0x64/0xf0
[   91.066235]  asm_exc_page_fault+0x27/0x30
[   91.066245] RIP: 0033:0x7fdeb7a1e541
[   91.066252] Code: 11 44 b8 e0 0f 11 44 b8 f0 0f 11 04 b8 48 83 c7 40 48 83 c6 f8 75 92 48 85 d2 74 2d 4c 01 d7 49 8d 04 b9 48 83 
c0 10 48 f7 da <0f> 28 05 18 7d 29 00 0f 1f 84 00 00 00 00 00 0f 11 40 f0 0f 11 00
[   91.066259] RSP: 002b:00007fff46f77b60 EFLAGS: 00010293
[   91.066265] RAX: 000055fbe1a57c30 RBX: 00007fff46f77d18 RCX: 000055fbe1a57c20
[   91.066270] RDX: fffffffffffffffb RSI: 0000000000000005 RDI: 0000000000000000
[   91.066274] RBP: 000055fbe16563b0 R08: 0000000000000028 R09: 000055fbe1a57c20
[   91.066279] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000002b
[   91.066283] R13: 000055fbe1656408 R14: 00007fff46f77b80 R15: 0000000000000000
[   91.066292]  </TASK>

(This is apparently only visible in CONFIG_KASAN=y build.)

>> Yes, you guys do an amasing job of keeping 30 million lines of code organised
>> and making some sense. I will cut the smalltalk as I know you are a busy man.
>> If I make a progress to actually produce any patches fixing these lockups and
>> stalls, I will be sure to include you into CC: as you requested.
> 
> Looking forward to seeing what you come up with!

There will have to be a lot of homework to catch up with to before I'd be able to do anything
sensible. :)

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-23  9:14                   ` Mirsad Goran Todorovac
@ 2022-11-23 14:44                     ` Paul E. McKenney
  2022-11-23 16:08                       ` Elliott, Robert (Servers)
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-23 14:44 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis, elliott

On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
> On 22. 11. 2022. 03:07, Paul E. McKenney wrote:
> 
> > > I'm afraid that I would lose in Far Cry miserably if my cores
> > > decided to all lock up for 21 secs. :-(
> > 
> > Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
> > still a very long time.  Me, I come from DYNIX/ptx, where the equivalent
> > to the RCU CPU stall warning was 1.5 seconds.  On the other hand, it
> > is also the case that DYNIX/ptx had nowhere near the variety of drivers
> > and subsystems, nor did it scale anywhere near as far as Linux does today.
> > 
> > But you only need one CPU to lock up for 21 seconds to get an RCU CPU
> > stall warning, not all of them.  ;-)
> 
> I can recall an occasion or a couple of them where the entire X Window system
> had been unresponsive for quite a number of seconds that sometimes made me reset
> the Ubuntu box.
> 
> I have the good news: the patches did not apply because they were already applied
> in the mainline tree:
> 
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 28b3ae426598
>     Signed-off-by: Borislav Petkov <bp@suse.de>
>     Reviewed-by: Kees Cook <keescook@chromium.org>
>     Acked-by: Florian Weimer <fweimer@redhat.com>
>     Link: https://lore.kernel.org/r/898932fe61db6a9d61bc2458fa2f6049f1ca9f5c.1652290558.git.luto@kernel.org
> 
> commit 28b3ae426598e722cf5d5ab9cc7038791b955a56
> Author: Uladzislau Rezki <uladzislau.rezki@sony.com>
> Date:   Wed Feb 16 14:52:09 2022 +0100
> 
>     rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> 
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 1045a06724f3
>     Somehow kernel-doc complains here about strong markup, but
>     we really don't need the [] so just remove that.
> 
>     Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> 
> commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Wed Jun 29 17:01:02 2022 +0200
> 
>     remove CONFIG_ANDROID
> 
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$

OK, very good.

But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
As in 20 milliseconds.

Is that intentional?

> > > This is at present just the wishful thinking, as I lack your 30 years of
> > > experience with the kernel and RCU update system. I am only beginning to realise
> > > why it is more efficient than the traditional locking, and IMHO it should
> > > avoid locking up cores instead of increasing the number of complaints.
> > 
> > Just to set the record straight, RCU does not normally lock up any of
> > the cores.  Instead, RCU detects that cores have been locked up.
> > 
> > Give or take the occasional bug in RCU, of course!
> 
> Currently, I cannot be the judge of that, for I can't seem to understand how the
> magic of RCU works., how it is implemented. There's more homework to be done ;-)

The high-level specification is dead simple.  Calls synchronize_rcu()
must wait for any CPU or task that has already executed rcu_read_lock()
to reach the corresponding rcu_read_unlock().  For more requirements:

Documentation/RCU/Design/Requirements/Requirements.rst

> > > But even if the Linux kernel source is magically "memory mapped" into my
> > > mind, I still do not see how it could be done. My Linux kernel learning curve
> > > had not yet got that up, but I have no doubts that it is designed by
> > > Intelligent Designers who are very witty people, and not village idiots ;-)
> > 
> > There is the school of thought that claims that the Linux kernel is
> > driven by evolutionary forces rather than intelligent design.  And as
> > we all know, evolutionary forces are driven by random changes, which
> > absolutely anyone could make.
> 
> Give or take the rate of improbability where a bunch of monkeys randomly typing
> would produce a working Linux kernel source would be about a couple of working
> sources in a space of 96^30,000,00 (something like 10^300,000,000), it is comparable
> to the probability of random coming of the first simplest DNA into the existence
> from the amino acid primordial soup.
> 
> (Not that many atoms in the Universe - 10^82, you'd need an awful lot of wasted
> multiverses with no even single cell life and certainly no working Linux kernels.)

Just apply continuous integration to each monkey's output.  Just like
was done to the strands of amino acids.

(Sorry, couldn't resist!)

> > And one approach is to take a less aggressive RCU CPU stall timeout,
> > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > way down to 20 milliseconds.  This could allow you to ease into the
> > latency-reduction work.
> > 
> > Alternatively, consider that response time is a property of the
> > entire system plus the environment that it runs in.  So I suspect that
> > the Android folks are accompanying that 20-millisecond timeout with
> > some restrictions on what the on-phone workloads are permitted to do.
> > Maybe ask the Android guys what those restrictions are and loosen them
> > slightly, again allowing you to ease into the latency-reduction work.
> 
> Good point.
> > Sometimes an NMI does get the CPUs back on track.  Sometimes the RCU CPU
> > stall warning is a symptom of the CPU having gotten too old and failing.
> > Most often, though, it is a sign of some sort of lockup, a too-long
> > RCU read-side critical section, or as Robert Elliot noted, the lack of
> > a cond_resched().
> > 
> > But please keep in mind that cond_resched() helps only in kernels built
> > with CONFIG_PREEMPTION=n.
> 
> I have bad news that 6.1-r6 is still affected with squashfs_xz_uncompress bug, despite having both of your fixes
> (as visible in above command's output -- double checked):
> 
> [   91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.

If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
this still happen?

For that 

> [   91.065694] rcu: blocking rcu_node structures (internal RCU debug):
> [   91.065704] Sending NMI from CPU 5 to CPUs 3:
> [   91.065721] NMI backtrace for cpu 3
> [   91.065730] CPU: 3 PID: 2829 Comm: snap-store Not tainted 6.1.0-rc6 #1
> [   91.065741] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [   91.065746] RIP: 0010:__asan_load4+0x0/0xa0
> [   91.065764] Code: 9e c0 84 c0 75 e1 5d c3 cc cc cc cc 48 c1 e8 03 80 3c
> 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 <55>
> 48 89 e5 48 8b 4d 08 48 83 ff fb 77 64 eb 0f 0f 1f 00 48 b8 00
> [   91.065771] RSP: 0000:ffff8881388ef140 EFLAGS: 00000246
> [   91.065779] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff9be992fd
> [   91.065785] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff888125500004
> [   91.065789] RBP: ffff8881388ef1e0 R08: 0000000000000001 R09: ffffed1024aa0de8
> [   91.065794] R10: ffff888125506f39 R11: ffffed1024aa0de7 R12: 0000000001067db0
> [   91.065799] R13: ffff888125500000 R14: 00000000014fe803 R15: ffff888125502112
> [   91.065804] FS:  00007fdec50ab180(0000) GS:ffff888257180000(0000) knlGS:0000000000000000
> [   91.065810] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   91.065815] CR2: 00007fdeb7cb6260 CR3: 000000011a436005 CR4: 0000000000770ee0
> [   91.065820] PKRU: 55555554
> [   91.065823] Call Trace:
> [   91.065826]  <TASK>
> [   91.065829]  ? lzma_main+0x37a/0x1260
> [   91.065845]  lzma2_lzma+0x2b9/0x430
> [   91.065857]  xz_dec_lzma2_run+0x11f/0xb90
> [   91.065867]  ? __asan_load4+0x55/0xa0
> [   91.065880]  xz_dec_run+0x346/0x11f0
> [   91.065892]  squashfs_xz_uncompress+0x196/0x370
> [   91.065905]  ? lzo_uncompress+0x400/0x400
> [   91.065913]  squashfs_decompress+0x88/0xd0
> [   91.065923]  squashfs_read_data+0x1e5/0x900
> [   91.065930]  ? __create_object+0x4ae/0x560
> [   91.065942]  ? squashfs_bio_read.isra.3+0x230/0x230
> [   91.065951]  ? __kasan_kmalloc+0xb6/0xc0
> [   91.065961]  ? squashfs_page_actor_init_special+0x1a6/0x210
> [   91.065972]  squashfs_readahead+0xaa3/0xe80
> [   91.065985]  ? squashfs_fill_page+0x190/0x190
> [   91.065993]  ? __filemap_add_folio+0x3a1/0x680
> [   91.066003]  ? dio_warn_stale_pagecache.part.67+0x90/0x90
> [   91.066012]  read_pages+0x122/0x540
> [   91.066023]  ? file_ra_state_init+0x60/0x60
> [   91.066032]  ? filemap_add_folio+0xd4/0x140
> [   91.066040]  ? folio_alloc+0x1b/0x50
> [   91.066051]  page_cache_ra_unbounded+0x1e6/0x280
> [   91.066064]  do_page_cache_ra+0x7c/0x90
> [   91.066074]  page_cache_ra_order+0x393/0x400
> [   91.066087]  ondemand_readahead+0x2f1/0x4e0
> [   91.066098]  page_cache_async_ra+0x8b/0xa0
> [   91.066106]  filemap_fault+0x742/0x1490
> [   91.066113]  ? __folio_memcg_unlock+0x35/0x80
> [   91.066124]  ? read_cache_page_gfp+0x90/0x90
> [   91.066132]  ? filemap_map_pages+0x28e/0xc60
> [   91.066145]  __do_fault+0x76/0x1b0
> [   91.066154]  do_fault+0x1c6/0x680
> [   91.066163]  __handle_mm_fault+0x89a/0x1310
> [   91.066173]  ? copy_page_range+0x1b20/0x1b20
> [   91.066181]  ? mt_find+0x189/0x330
> [   91.066190]  ? mas_next_entry+0xa80/0xa80
> [   91.066204]  handle_mm_fault+0x11b/0x390
> [   91.066213]  do_user_addr_fault+0x258/0x860
> [   91.066225]  exc_page_fault+0x64/0xf0
> [   91.066235]  asm_exc_page_fault+0x27/0x30
> [   91.066245] RIP: 0033:0x7fdeb7a1e541
> [   91.066252] Code: 11 44 b8 e0 0f 11 44 b8 f0 0f 11 04 b8 48 83 c7 40 48
> 83 c6 f8 75 92 48 85 d2 74 2d 4c 01 d7 49 8d 04 b9 48 83 c0 10 48 f7 da <0f>
> 28 05 18 7d 29 00 0f 1f 84 00 00 00 00 00 0f 11 40 f0 0f 11 00
> [   91.066259] RSP: 002b:00007fff46f77b60 EFLAGS: 00010293
> [   91.066265] RAX: 000055fbe1a57c30 RBX: 00007fff46f77d18 RCX: 000055fbe1a57c20
> [   91.066270] RDX: fffffffffffffffb RSI: 0000000000000005 RDI: 0000000000000000
> [   91.066274] RBP: 000055fbe16563b0 R08: 0000000000000028 R09: 000055fbe1a57c20
> [   91.066279] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000002b
> [   91.066283] R13: 000055fbe1656408 R14: 00007fff46f77b80 R15: 0000000000000000
> [   91.066292]  </TASK>
> 
> (This is apparently only visible in CONFIG_KASAN=y build.)

And KASAN does add some overhead.

So maybe the value of CONFIG_RCU_EXP_CPU_STALL_TIMEOUT needs to be a
function of CONFIG_KASAN in your build.

> > > Yes, you guys do an amasing job of keeping 30 million lines of code organised
> > > and making some sense. I will cut the smalltalk as I know you are a busy man.
> > > If I make a progress to actually produce any patches fixing these lockups and
> > > stalls, I will be sure to include you into CC: as you requested.
> > 
> > Looking forward to seeing what you come up with!
> 
> There will have to be a lot of homework to catch up with to before I'd be able to do anything
> sensible. :)

Doing the homework up front can be a good thing!  ;-)

							Thanx, Paul

> Thanks,
> Mirsad
> 
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
> -- 
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
> The European Union
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-23 14:44                     ` Paul E. McKenney
@ 2022-11-23 16:08                       ` Elliott, Robert (Servers)
  2022-11-23 19:09                         ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Elliott, Robert (Servers) @ 2022-11-23 16:08 UTC (permalink / raw)
  To: paulmck, Mirsad Goran Todorovac
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis



> -----Original Message-----
> From: Paul E. McKenney <paulmck@kernel.org>
> Sent: Wednesday, November 23, 2022 8:44 AM
> Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> in squashfs_readahead())
> 
> On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
...

> > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > Author: Christoph Hellwig <hch@lst.de>
> > Date:   Wed Jun 29 17:01:02 2022 +0200
> >
> >     remove CONFIG_ANDROID
> 
> But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> As in 20 milliseconds.
> 
> Is that intentional?
...
> > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > way down to 20 milliseconds.  This could allow you to ease into the
> > > latency-reduction work.
> > >
> > > Alternatively, consider that response time is a property of the
> > > entire system plus the environment that it runs in.  So I suspect that
> > > the Android folks are accompanying that 20-millisecond timeout with
> > > some restrictions on what the on-phone workloads are permitted to do.
> > > Maybe ask the Android guys what those restrictions are and loosen them
> > > slightly, again allowing you to ease into the latency-reduction work.

Since there is no longer a kernel default of 20 ms for Android since
commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
to drop that from stallwarn.rst. Something like:

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index e38c587067fc..d86a8b47504f 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
        Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
        the expedited grace period. This parameter defines the period
        of time that RCU will wait from the beginning of an expedited
-       grace period until it issues an RCU CPU stall warning. This time
-       period is normally 20 milliseconds on Android devices.  A zero
-       value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
-       after conversion to milliseconds.
+       grace period until it issues an RCU CPU stall warning.
+
+       A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
+        used, after conversion to milliseconds, limited to a maximum of
+        21 seconds.

        This configuration parameter may be changed at runtime via the
        /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0..4477eeb8a54f 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
          If the RCU grace period persists, additional CPU stall warnings
          are printed at more widely spaced intervals.  A value of zero
          says to use the RCU_CPU_STALL_TIMEOUT value converted from
-         seconds to milliseconds.
+         seconds to milliseconds, limited to a maximum of 21 seconds.

> > > But please keep in mind that cond_resched() helps only in kernels
> > > built with CONFIG_PREEMPTION=n.

I think they also help with
	CONFIG_PREEMPT_VOLUNTARY=y
which is used by Fedora and RHEL. For example, adding more of those in
the crypto tcrypt test module eliminated some RCU stalls when running
speed tests (that send lots of data through the algorithms).

> > I have bad news that 6.1-r6 is still affected with
> > squashfs_xz_uncompress bug, despite having both of your fixes
> > (as visible in above command's output -- double checked):
> >
> > [   91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
> 
> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> this still happen?

BTW, you don't need to rebuild the kernel to change those parameters; they're
module parameters, so can be modified on the kernel command line (if needed
during boot) and sysfs (if only needed after boot).

For sysfs the syntax is:
#!/bin/bash
# set rcu timeouts to specified values
echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-23 16:08                       ` Elliott, Robert (Servers)
@ 2022-11-23 19:09                         ` Paul E. McKenney
  2022-11-24 17:06                           ` Mirsad Goran Todorovac
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-23 19:09 UTC (permalink / raw)
  To: Elliott, Robert (Servers)
  Cc: Mirsad Goran Todorovac, Phillip Lougher, LKML, phillip.lougher,
	Thorsten Leemhuis

On Wed, Nov 23, 2022 at 04:08:03PM +0000, Elliott, Robert (Servers) wrote:
> > -----Original Message-----
> > From: Paul E. McKenney <paulmck@kernel.org>
> > Sent: Wednesday, November 23, 2022 8:44 AM
> > Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> > in squashfs_readahead())
> > 
> > On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
> ...
> 
> > > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > > Author: Christoph Hellwig <hch@lst.de>
> > > Date:   Wed Jun 29 17:01:02 2022 +0200
> > >
> > >     remove CONFIG_ANDROID
> > 
> > But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> > As in 20 milliseconds.
> > 
> > Is that intentional?
> ...
> > > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > > way down to 20 milliseconds.  This could allow you to ease into the
> > > > latency-reduction work.
> > > >
> > > > Alternatively, consider that response time is a property of the
> > > > entire system plus the environment that it runs in.  So I suspect that
> > > > the Android folks are accompanying that 20-millisecond timeout with
> > > > some restrictions on what the on-phone workloads are permitted to do.
> > > > Maybe ask the Android guys what those restrictions are and loosen them
> > > > slightly, again allowing you to ease into the latency-reduction work.
> 
> Since there is no longer a kernel default of 20 ms for Android since
> commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
> to drop that from stallwarn.rst. Something like:

Fair point, except that the Android guys asked for this.  In fact, the
Android guys supplied the patches.  I like the traceability.

> diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
> index e38c587067fc..d86a8b47504f 100644
> --- a/Documentation/RCU/stallwarn.rst
> +++ b/Documentation/RCU/stallwarn.rst
> @@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
>         Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
>         the expedited grace period. This parameter defines the period
>         of time that RCU will wait from the beginning of an expedited
> -       grace period until it issues an RCU CPU stall warning. This time
> -       period is normally 20 milliseconds on Android devices.  A zero
> -       value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
> -       after conversion to milliseconds.
> +       grace period until it issues an RCU CPU stall warning.
> +
> +       A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
> +        used, after conversion to milliseconds, limited to a maximum of
> +        21 seconds.
> 
>         This configuration parameter may be changed at runtime via the
>         /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 1b0c41d490f0..4477eeb8a54f 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>           If the RCU grace period persists, additional CPU stall warnings
>           are printed at more widely spaced intervals.  A value of zero
>           says to use the RCU_CPU_STALL_TIMEOUT value converted from
> -         seconds to milliseconds.
> +         seconds to milliseconds, limited to a maximum of 21 seconds.
> 
> > > > But please keep in mind that cond_resched() helps only in kernels
> > > > built with CONFIG_PREEMPTION=n.
> 
> I think they also help with
> 	CONFIG_PREEMPT_VOLUNTARY=y
> which is used by Fedora and RHEL. For example, adding more of those in
> the crypto tcrypt test module eliminated some RCU stalls when running
> speed tests (that send lots of data through the algorithms).
> 
> > > I have bad news that 6.1-r6 is still affected with
> > > squashfs_xz_uncompress bug, despite having both of your fixes
> > > (as visible in above command's output -- double checked):
> > >
> > > [   91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
> > 
> > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > this still happen?
> 
> BTW, you don't need to rebuild the kernel to change those parameters; they're
> module parameters, so can be modified on the kernel command line (if needed
> during boot) and sysfs (if only needed after boot).
> 
> For sysfs the syntax is:
> #!/bin/bash
> # set rcu timeouts to specified values
> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout

Excellent point, thank you!

I hope that this makes Mirsad's life easier, perhaps featuring less time
waiting for kernel builds and reboots.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-23 19:09                         ` Paul E. McKenney
@ 2022-11-24 17:06                           ` Mirsad Goran Todorovac
  2022-11-24 17:19                             ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-24 17:06 UTC (permalink / raw)
  To: paulmck, Elliott, Robert (Servers)
  Cc: Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis

On 23. 11. 2022. 20:09, Paul E. McKenney wrote:

>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>> this still happen?
>>
>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>> module parameters, so can be modified on the kernel command line (if needed
>> during boot) and sysfs (if only needed after boot).
>>
>> For sysfs the syntax is:
>> #!/bin/bash
>> # set rcu timeouts to specified values
>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
> 
> Excellent point, thank you!
> 
> I hope that this makes Mirsad's life easier, perhaps featuring less time
> waiting for kernel builds and reboots.  ;-)

Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
could be executed, as second 14 of the boot process:

[   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
[   14.320064] rcu: blocking rcu_node structures (internal RCU debug):
[   14.320070] Sending NMI from CPU 1 to CPUs 7:
[   14.320081] NMI backtrace for cpu 7
[   14.320085] CPU: 7 PID: 532 Comm: modprobe Not tainted 6.1.0-rc6 #1
[   14.320090] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[   14.320093] RIP: 0010:find_kallsyms_symbol+0x1ba/0x2e0
[   14.320103] Code: 24 02 84 c0 74 24 3c 2e 48 8b 55 a0 74 1c 48 39 55 b8 0f 82 f8 00 00 00 48 3b 55 c0 0f 86 ee 00 00 00 48 89 55 
c0 44 89 6d ac <41> 83 c5 01 48 83
c3 18 45 39 fd 0f 85 2d ff ff ff 8b 45 ac 4c 8b
[   14.320107] RSP: 0018:ffff88811ab5f4f0 EFLAGS: 00000202
[   14.320113] RAX: 0000000000000000 RBX: ffffffffc1d1429e RCX: ffffffffb1c378e5
[   14.320116] RDX: ffffc900066e11e0 RSI: dffffc0000000000 RDI: ffffffffc1d142a0
[   14.320120] RBP: ffff88811ab5f560 R08: 0000000000000000 R09: ffff88811ab5f670
[   14.320123] R10: ffff88811ab5f90f R11: ffffed102356bf21 R12: ffffffffc1d33bd9
[   14.320126] R13: 0000000000000171 R14: 000000000000005f R15: 00000000000014ed
[   14.320128] FS:  00007f878ce39000(0000) GS:ffff888257380000(0000) knlGS:0000000000000000
[   14.320132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.320135] CR2: 00007f13ac19cae0 CR3: 000000010c73e001 CR4: 0000000000770ee0
[   14.320138] PKRU: 55555554
[   14.320140] Call Trace:
[   14.320142]  <TASK>
[   14.320145]  ? __SCT__tp_func_raid56_scrub_read_recover+0x8/0x8 [btrfs]
[   14.320318]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[   14.320485]  ? btrfs_show_options+0x860/0x860 [btrfs]
[   14.320647]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[   14.320816]  module_address_lookup+0x7a/0xc0
[   14.320822]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[   14.320987]  kallsyms_lookup_buildid+0x172/0x200
[   14.320993]  ? do_kernel_range_flush+0x60/0x60
[   14.321002]  kallsyms_lookup+0x14/0x20
[   14.321007]  test_for_valid_rec.isra.38+0x95/0x110
[   14.321015]  ? remove_direct_functions_hash.isra.37+0x1c0/0x1c0
[   14.321021]  ? _find_next_bit+0x42/0xe0
[   14.321037]  ? __kasan_check_write+0x14/0x20
[   14.321043]  ? mutex_lock+0x8f/0xe0
[   14.321049]  ? __mutex_lock_slowpath+0x20/0x20
[   14.321056]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[   14.321220]  ftrace_module_enable+0x15f/0x460
[   14.321228]  load_module+0x323e/0x3800
[   14.321233]  ? __kernel_read+0x304/0x450
[   14.321246]  ? module_frob_arch_sections+0x40/0x40
[   14.321252]  ? security_kernel_post_read_file+0x6d/0x80
[   14.321259]  ? __kasan_check_write+0x14/0x20
[   14.321264]  ? kernel_read_file+0x20a/0x490
[   14.321270]  ? __kasan_check_read+0x11/0x20
[   14.321275]  ? __fget_light+0xb6/0x220
[   14.321283]  __do_sys_finit_module+0x131/0x1d0
[   14.321288]  ? __do_sys_finit_module+0x131/0x1d0
[   14.321293]  ? __ia32_sys_init_module+0x50/0x50
[   14.321297]  ? vm_mmap_pgoff+0x1a8/0x240
[   14.321307]  ? __kasan_check_write+0x14/0x20
[   14.321311]  ? fput+0x1d/0xd0
[   14.321317]  __x64_sys_finit_module+0x43/0x50
[   14.321322]  do_syscall_64+0x58/0x80
[   14.321330]  ? exit_to_user_mode_prepare+0x32/0x190
[   14.321337]  ? syscall_exit_to_user_mode+0x1b/0x30
[   14.321343]  ? do_syscall_64+0x67/0x80
[   14.321348]  ? fpregs_assert_state_consistent+0x5e/0x70
[   14.321356]  ? exit_to_user_mode_prepare+0x32/0x190
[   14.321362]  ? syscall_exit_to_user_mode+0x1b/0x30
[   14.321367]  ? do_syscall_64+0x67/0x80
[   14.321372]  ? syscall_exit_to_user_mode+0x1b/0x30
[   14.321377]  ? do_syscall_64+0x67/0x80
[   14.321382]  ? do_syscall_64+0x67/0x80
[   14.321388]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   14.321397] RIP: 0033:0x7f878c51ea3d
[   14.321401] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[   14.321405] RSP: 002b:00007ffd078d4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   14.321410] RAX: ffffffffffffffda RBX: 0000558c0d8830a0 RCX: 00007f878c51ea3d
[   14.321414] RDX: 0000000000000000 RSI: 0000558c0d6fdcd2 RDI: 0000000000000005
[   14.321417] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
[   14.321419] R10: 0000000000000005 R11: 0000000000000246 R12: 0000558c0d6fdcd2
[   14.321422] R13: 0000558c0d883060 R14: 0000558c0d8824f0 R15: 0000558c0d88f5d0
[   14.321428]  </TASK>

Probably something sensible should be set in the case of KASAN build. This example of stall
apparently has nothing to do with squashfs_readahead().

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-24 17:06                           ` Mirsad Goran Todorovac
@ 2022-11-24 17:19                             ` Paul E. McKenney
  2022-11-24 18:04                               ` Mirsad Goran Todorovac
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-24 17:19 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Elliott, Robert (Servers),
	Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis

On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
> 
> > > > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > > > this still happen?
> > > 
> > > BTW, you don't need to rebuild the kernel to change those parameters; they're
> > > module parameters, so can be modified on the kernel command line (if needed
> > > during boot) and sysfs (if only needed after boot).
> > > 
> > > For sysfs the syntax is:
> > > #!/bin/bash
> > > # set rcu timeouts to specified values
> > > echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> > > echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> > > echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> > > grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
> > 
> > Excellent point, thank you!
> > 
> > I hope that this makes Mirsad's life easier, perhaps featuring less time
> > waiting for kernel builds and reboots.  ;-)
> 
> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
> could be executed, as second 14 of the boot process:
> 
> [   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
> [   14.320064] rcu: blocking rcu_node structures (internal RCU debug):
> [   14.320070] Sending NMI from CPU 1 to CPUs 7:
> [   14.320081] NMI backtrace for cpu 7
> [   14.320085] CPU: 7 PID: 532 Comm: modprobe Not tainted 6.1.0-rc6 #1
> [   14.320090] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [   14.320093] RIP: 0010:find_kallsyms_symbol+0x1ba/0x2e0
> [   14.320103] Code: 24 02 84 c0 74 24 3c 2e 48 8b 55 a0 74 1c 48 39 55 b8
> 0f 82 f8 00 00 00 48 3b 55 c0 0f 86 ee 00 00 00 48 89 55 c0 44 89 6d ac <41>
> 83 c5 01 48 83
> c3 18 45 39 fd 0f 85 2d ff ff ff 8b 45 ac 4c 8b
> [   14.320107] RSP: 0018:ffff88811ab5f4f0 EFLAGS: 00000202
> [   14.320113] RAX: 0000000000000000 RBX: ffffffffc1d1429e RCX: ffffffffb1c378e5
> [   14.320116] RDX: ffffc900066e11e0 RSI: dffffc0000000000 RDI: ffffffffc1d142a0
> [   14.320120] RBP: ffff88811ab5f560 R08: 0000000000000000 R09: ffff88811ab5f670
> [   14.320123] R10: ffff88811ab5f90f R11: ffffed102356bf21 R12: ffffffffc1d33bd9
> [   14.320126] R13: 0000000000000171 R14: 000000000000005f R15: 00000000000014ed
> [   14.320128] FS:  00007f878ce39000(0000) GS:ffff888257380000(0000) knlGS:0000000000000000
> [   14.320132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.320135] CR2: 00007f13ac19cae0 CR3: 000000010c73e001 CR4: 0000000000770ee0
> [   14.320138] PKRU: 55555554
> [   14.320140] Call Trace:
> [   14.320142]  <TASK>
> [   14.320145]  ? __SCT__tp_func_raid56_scrub_read_recover+0x8/0x8 [btrfs]
> [   14.320318]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [   14.320485]  ? btrfs_show_options+0x860/0x860 [btrfs]
> [   14.320647]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [   14.320816]  module_address_lookup+0x7a/0xc0
> [   14.320822]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [   14.320987]  kallsyms_lookup_buildid+0x172/0x200
> [   14.320993]  ? do_kernel_range_flush+0x60/0x60
> [   14.321002]  kallsyms_lookup+0x14/0x20
> [   14.321007]  test_for_valid_rec.isra.38+0x95/0x110
> [   14.321015]  ? remove_direct_functions_hash.isra.37+0x1c0/0x1c0
> [   14.321021]  ? _find_next_bit+0x42/0xe0
> [   14.321037]  ? __kasan_check_write+0x14/0x20
> [   14.321043]  ? mutex_lock+0x8f/0xe0
> [   14.321049]  ? __mutex_lock_slowpath+0x20/0x20
> [   14.321056]  ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [   14.321220]  ftrace_module_enable+0x15f/0x460
> [   14.321228]  load_module+0x323e/0x3800
> [   14.321233]  ? __kernel_read+0x304/0x450
> [   14.321246]  ? module_frob_arch_sections+0x40/0x40
> [   14.321252]  ? security_kernel_post_read_file+0x6d/0x80
> [   14.321259]  ? __kasan_check_write+0x14/0x20
> [   14.321264]  ? kernel_read_file+0x20a/0x490
> [   14.321270]  ? __kasan_check_read+0x11/0x20
> [   14.321275]  ? __fget_light+0xb6/0x220
> [   14.321283]  __do_sys_finit_module+0x131/0x1d0
> [   14.321288]  ? __do_sys_finit_module+0x131/0x1d0
> [   14.321293]  ? __ia32_sys_init_module+0x50/0x50
> [   14.321297]  ? vm_mmap_pgoff+0x1a8/0x240
> [   14.321307]  ? __kasan_check_write+0x14/0x20
> [   14.321311]  ? fput+0x1d/0xd0
> [   14.321317]  __x64_sys_finit_module+0x43/0x50
> [   14.321322]  do_syscall_64+0x58/0x80
> [   14.321330]  ? exit_to_user_mode_prepare+0x32/0x190
> [   14.321337]  ? syscall_exit_to_user_mode+0x1b/0x30
> [   14.321343]  ? do_syscall_64+0x67/0x80
> [   14.321348]  ? fpregs_assert_state_consistent+0x5e/0x70
> [   14.321356]  ? exit_to_user_mode_prepare+0x32/0x190
> [   14.321362]  ? syscall_exit_to_user_mode+0x1b/0x30
> [   14.321367]  ? do_syscall_64+0x67/0x80
> [   14.321372]  ? syscall_exit_to_user_mode+0x1b/0x30
> [   14.321377]  ? do_syscall_64+0x67/0x80
> [   14.321382]  ? do_syscall_64+0x67/0x80
> [   14.321388]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [   14.321397] RIP: 0033:0x7f878c51ea3d
> [   14.321401] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89
> f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
> [   14.321405] RSP: 002b:00007ffd078d4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [   14.321410] RAX: ffffffffffffffda RBX: 0000558c0d8830a0 RCX: 00007f878c51ea3d
> [   14.321414] RDX: 0000000000000000 RSI: 0000558c0d6fdcd2 RDI: 0000000000000005
> [   14.321417] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
> [   14.321419] R10: 0000000000000005 R11: 0000000000000246 R12: 0000558c0d6fdcd2
> [   14.321422] R13: 0000558c0d883060 R14: 0000558c0d8824f0 R15: 0000558c0d88f5d0
> [   14.321428]  </TASK>
> 
> Probably something sensible should be set in the case of KASAN build. This example of stall
> apparently has nothing to do with squashfs_readahead().

Can't have everything, I guess!

How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
Again, mainline defaults to 21000.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-24 17:19                             ` Paul E. McKenney
@ 2022-11-24 18:04                               ` Mirsad Goran Todorovac
  2022-11-24 19:32                                 ` Phillip Lougher
  2022-11-24 20:15                                 ` Paul E. McKenney
  0 siblings, 2 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-11-24 18:04 UTC (permalink / raw)
  To: paulmck
  Cc: Elliott, Robert (Servers),
	Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis

On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>
>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>> this still happen?
>>>>
>>>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>>>> module parameters, so can be modified on the kernel command line (if needed
>>>> during boot) and sysfs (if only needed after boot).
>>>>
>>>> For sysfs the syntax is:
>>>> #!/bin/bash
>>>> # set rcu timeouts to specified values
>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>
>>> Excellent point, thank you!
>>>
>>> I hope that this makes Mirsad's life easier, perhaps featuring less time
>>> waiting for kernel builds and reboots.  ;-)
>>
>> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
>> could be executed, as second 14 of the boot process:
>>
>> [   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>> [   14.320064] rcu: blocking rcu_node structures (internal RCU debug):

...

>> Probably something sensible should be set in the case of KASAN build. This example of stall
>> apparently has nothing to do with squashfs_readahead().
> 
> Can't have everything, I guess!
> 
> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
> Again, mainline defaults to 21000.

Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
Probably it is too early to say anything in general before more uptime and stress load.

BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):

# grep STALL /boot/config-5.19.5-051905-generic
CONFIG_RCU_STALL_COMMON=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
#

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-24 18:04                               ` Mirsad Goran Todorovac
@ 2022-11-24 19:32                                 ` Phillip Lougher
  2022-12-06 20:35                                   ` Mirsad Goran Todorovac
  2022-11-24 20:15                                 ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Phillip Lougher @ 2022-11-24 19:32 UTC (permalink / raw)
  To: Mirsad Goran Todorovac, paulmck
  Cc: Elliott, Robert (Servers), LKML, phillip.lougher, Thorsten Leemhuis

On 24/11/2022 18:04, Mirsad Goran Todorovac wrote:
> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
>> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>>
>>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>>> this still happen?
>>>>>
>>>>> BTW, you don't need to rebuild the kernel to change those 
>>>>> parameters; they're
>>>>> module parameters, so can be modified on the kernel command line 
>>>>> (if needed
>>>>> during boot) and sysfs (if only needed after boot).
>>>>>
>>>>> For sysfs the syntax is:
>>>>> #!/bin/bash
>>>>> # set rcu timeouts to specified values
>>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>>
>>>> Excellent point, thank you!
>>>>
>>>> I hope that this makes Mirsad's life easier, perhaps featuring less 
>>>> time
>>>> waiting for kernel builds and reboots.  ;-)
>>>
>>> Unfortunately, the first stall and NMI occurs before any system 
>>> script or setting a /sys/module/rcupdate/parameters/*
>>> could be executed, as second 14 of the boot process:
>>>
>>> [   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on 
>>> CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>>> [   14.320064] rcu: blocking rcu_node structures (internal RCU debug):
> 
> ...
> 
>>> Probably something sensible should be set in the case of KASAN build. 
>>> This example of stall
>>> apparently has nothing to do with squashfs_readahead().
>>
>> Can't have everything, I guess!
>>
>> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
>> Again, mainline defaults to 21000.
> 
> Did just that, and so far there is no modprobe stall in second 14 of 
> boot at least. Looks good.
> Probably it is too early to say anything in general before more uptime 
> and stress load.
> 
> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it 
> comes from generic
> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
> 
> # grep STALL /boot/config-5.19.5-051905-generic
> CONFIG_RCU_STALL_COMMON=y
> # CONFIG_HEADERS_INSTALL is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 > #

That has been raised as a bug, and a fix has been committed.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1991951


> 
> Thanks,
> Mirsad
> 
> -- 
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-24 18:04                               ` Mirsad Goran Todorovac
  2022-11-24 19:32                                 ` Phillip Lougher
@ 2022-11-24 20:15                                 ` Paul E. McKenney
  1 sibling, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2022-11-24 20:15 UTC (permalink / raw)
  To: Mirsad Goran Todorovac
  Cc: Elliott, Robert (Servers),
	Phillip Lougher, LKML, phillip.lougher, Thorsten Leemhuis

On Thu, Nov 24, 2022 at 07:04:13PM +0100, Mirsad Goran Todorovac wrote:
> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
> > On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
> > > On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
> > > 
> > > > > > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > > > > > this still happen?
> > > > > 
> > > > > BTW, you don't need to rebuild the kernel to change those parameters; they're
> > > > > module parameters, so can be modified on the kernel command line (if needed
> > > > > during boot) and sysfs (if only needed after boot).
> > > > > 
> > > > > For sysfs the syntax is:
> > > > > #!/bin/bash
> > > > > # set rcu timeouts to specified values
> > > > > echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> > > > > echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> > > > > echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> > > > > grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
> > > > 
> > > > Excellent point, thank you!
> > > > 
> > > > I hope that this makes Mirsad's life easier, perhaps featuring less time
> > > > waiting for kernel builds and reboots.  ;-)
> > > 
> > > Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
> > > could be executed, as second 14 of the boot process:
> > > 
> > > [   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
> > > [   14.320064] rcu: blocking rcu_node structures (internal RCU debug):
> 
> ...
> 
> > > Probably something sensible should be set in the case of KASAN build. This example of stall
> > > apparently has nothing to do with squashfs_readahead().
> > 
> > Can't have everything, I guess!
> > 
> > How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
> > Again, mainline defaults to 21000.
> 
> Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
> Probably it is too early to say anything in general before more uptime and stress load.
> 
> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
> 
> # grep STALL /boot/config-5.19.5-051905-generic
> CONFIG_RCU_STALL_COMMON=y
> # CONFIG_HEADERS_INSTALL is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
> #

Gotta admit, those Ubuntu folks do not lack for bravery!!!  Even if they
don't also enable KASAN or KMEMLEAK by default.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-24 19:32                                 ` Phillip Lougher
@ 2022-12-06 20:35                                   ` Mirsad Goran Todorovac
  0 siblings, 0 replies; 25+ messages in thread
From: Mirsad Goran Todorovac @ 2022-12-06 20:35 UTC (permalink / raw)
  To: Phillip Lougher, paulmck
  Cc: Elliott, Robert (Servers), LKML, phillip.lougher, Thorsten Leemhuis

On 24. 11. 2022. 20:32, Phillip Lougher wrote:
> On 24/11/2022 18:04, Mirsad Goran Todorovac wrote:
>> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
>>> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>>>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>>>
>>>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>>>> this still happen?
>>>>>>
>>>>>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>>>>>> module parameters, so can be modified on the kernel command line (if needed
>>>>>> during boot) and sysfs (if only needed after boot).
>>>>>>
>>>>>> For sysfs the syntax is:
>>>>>> #!/bin/bash
>>>>>> # set rcu timeouts to specified values
>>>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>>>
>>>>> Excellent point, thank you!
>>>>>
>>>>> I hope that this makes Mirsad's life easier, perhaps featuring less time
>>>>> waiting for kernel builds and reboots.  ;-)
>>>>
>>>> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
>>>> could be executed, as second 14 of the boot process:
>>>>
>>>> [   14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>>>> [   14.320064] rcu: blocking rcu_node structures (internal RCU debug):
>>
>> ...
>>
>>>> Probably something sensible should be set in the case of KASAN build. This example of stall
>>>> apparently has nothing to do with squashfs_readahead().
>>>
>>> Can't have everything, I guess!
>>>
>>> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
>>> Again, mainline defaults to 21000.
>>
>> Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
>> Probably it is too early to say anything in general before more uptime and stress load.
>>
>> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
>> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
>>
>> # grep STALL /boot/config-5.19.5-051905-generic
>> CONFIG_RCU_STALL_COMMON=y
>> # CONFIG_HEADERS_INSTALL is not set
>> CONFIG_RCU_CPU_STALL_TIMEOUT=60
>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 > #
> 
> That has been raised as a bug, and a fix has been committed.
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1991951

P.S.

As for the comment that I am thrashing my systems, I now test activated MG-LRU kernel option
with 6.1-rc8 build and it functions much better, with no multimedia lags or chirps, even
with only 130/8192 MiB free and 5/10 GiB in swap area.

I am running basically the same load of simultaneously opened Firefox, Chrome and Thunderbird
windows.

However, I have set CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0

The conclusion is that the squashfs isn't blocking, but 6 to 8 jiffies were not enough to
complete the operation, so other CPUs issued NMIs. With longer timeout, it is evident that
it was a longer operation due to KASAN build and not a lockup.

So I think I have to apologise to have wasted so much of your time with a false alarm.

To summarise, the culprit was obviously the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 setting
from the Ubuntu mainline kernel stock, which I unsuspectedly copied into my build and
made a recommended "make olddefconfig".

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
  2022-11-18 16:51     ` BUG: " Elliott, Robert (Servers)
@ 2022-12-20 10:43       ` Mirsad Todorovac
  0 siblings, 0 replies; 25+ messages in thread
From: Mirsad Todorovac @ 2022-12-20 10:43 UTC (permalink / raw)
  To: Elliott, Robert (Servers), Phillip Lougher, LKML, Paul E. McKenney
  Cc: phillip.lougher, Thorsten Leemhuis

On 11/18/22 17:51, Elliott, Robert (Servers) wrote:
> 
> 
>> -----Original Message-----
>> From: Phillip Lougher <phillip@squashfs.org.uk>
>> Sent: Friday, November 18, 2022 12:11 AM
>> To: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>; LKML <linux-
>> kernel@vger.kernel.org>; Paul E. McKenney <paulmck@kernel.org>
>> Cc: phillip.lougher@gmail.com; Thorsten Leemhuis
>> <regressions@leemhuis.info>
>> Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
>> squashfs_readahead())
>>
>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> While trying to bisect, I've found another bug that predated the
>>> introduction of squashfs_readahead(), but it has
>>> a common denominator in squashfs_decompress() and
>> squashfs_xz_uncompress().
>>
>> Wrong, the stall is happening in the XZ decompressor library, which
>> is *not* in Squashfs.
>>
>> This reported stall in the decompressor code is likely a symptom of you
>> deliberately thrashing your system.  When the system thrashes everything
>> starts to happen very slowly, and the system will spend a lot of
>> its time doing page I/O, and the CPU will spend a lot of time in
>> any CPU intensive code like the XZ decompressor library.
>>
>> So the fact the stall is being hit here is a symptom and not
>> a cause.  The decompressor code is likely running slowly due to
>> thrashing and waiting on paged-out buffers.  This is not indicative
>> of any bug, merely a system running slowly due to overload.
>>
>> As I said, this is not a Squashfs issue, because the code when the
>> stall takes place isn't in Squashfs.
>>
>> The people responsible for the rcu code should have a lot more insight
>> about what happens when the system is thrashing, and how this will
>> throw up false positives.  In this I believe this is an instance of
>> perfectly correct code running slowly due to thrashing incorrectly
>> being flagged as looping.
>>
>> CC'ing Paul E. McKenney <paulmck@kernel.org>
>>
>> Phillip
> 
> How big can these readahead sizes be? Should one of the loops include
> cond_resched() calls?

Please allow me to assert that 6.1.0+ kernel (this Berlin time 6 AM 
morning's build on on Torvalds' tree) built with CONFIG_KMEMLEAK=y, 
CONFIG_KASAN=y, CONFIG_LRU_GEN=y (multi-gen LRU) and
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 doesn't exhibit before seen RCU 
stalls even with such a low timeout as 20 ms.

So I guess kudos go to the MG-LRU developers, or has Mr. Lougher done 
something efficient in the meantime.

My $0.02!

Thank you,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
-- 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-12-20 10:43 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-17 19:21 RCU stalls in squashfs_readahead() Mirsad Goran Todorovac
2022-11-17 23:05 ` BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead()) Mirsad Goran Todorovac
2022-11-17 23:40   ` Mirsad Goran Todorovac
2022-11-18  6:11   ` Phillip Lougher
2022-11-18 15:17     ` Mirsad Goran Todorovac
2022-11-18 15:55     ` Paul E. McKenney
2022-11-20 15:49       ` BUG: BISECTED: " Mirsad Goran Todorovac
2022-11-20 17:50         ` Paul E. McKenney
2022-11-20 18:36           ` Mirsad Goran Todorovac
2022-11-20 19:21             ` Paul E. McKenney
2022-11-21  4:04               ` Mirsad Goran Todorovac
2022-11-22  2:07                 ` Paul E. McKenney
2022-11-23  9:14                   ` Mirsad Goran Todorovac
2022-11-23 14:44                     ` Paul E. McKenney
2022-11-23 16:08                       ` Elliott, Robert (Servers)
2022-11-23 19:09                         ` Paul E. McKenney
2022-11-24 17:06                           ` Mirsad Goran Todorovac
2022-11-24 17:19                             ` Paul E. McKenney
2022-11-24 18:04                               ` Mirsad Goran Todorovac
2022-11-24 19:32                                 ` Phillip Lougher
2022-12-06 20:35                                   ` Mirsad Goran Todorovac
2022-11-24 20:15                                 ` Paul E. McKenney
2022-11-20 19:33             ` Mirsad Goran Todorovac
2022-11-18 16:51     ` BUG: " Elliott, Robert (Servers)
2022-12-20 10:43       ` Mirsad Todorovac

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).