* 5.19.0: dnf install hangs when system is under load
From: Chris Murphy @ 2022-08-10 16:19 UTC
  To: Btrfs BTRFS

Downstream bug - 5.19.0: dnf install hangs when system is under load
https://bugzilla.redhat.com/show_bug.cgi?id=2117326

5.19.0-65.fc37.x86_64

Setup
btrfs raid10 on 8x plain partitions

Command
sudo dnf install pciutils

Reproducible:
Roughly 1 in 3 attempts; it correlates with the system being under heavy load, otherwise it doesn't happen

Gets stuck at:
Running scriptlet: sg3_utils-1.46-3.fc36.x86_64   2/2 

The ps aux status for dnf is D+, kill -9 does nothing, and strace shows nothing. The hang lasted at least 10 minutes; I didn't test beyond that.

Full dmesg with sysrq+w is attached to the bug report.
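
For reference, the relevant evidence can be gathered with something like the following (a sketch; assumes root and a kernel with /proc/<pid>/stack and sysrq enabled):

  # confirm the uninterruptible state and see where dnf sleeps in the kernel
  ps -o pid,stat,wchan:32,cmd -C dnf
  sudo cat /proc/$(pgrep -xo dnf)/stack

  # dump all blocked (D-state) tasks to the kernel log, then save it
  echo w | sudo tee /proc/sysrq-trigger
  sudo dmesg > dmesg-sysrq-w.txt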

snippet

[ 2268.057017] sysrq: Show Blocked State
[ 2268.057866] task:kworker/u97:11  state:D stack:    0 pid:  340 ppid:     2 flags:0x00004000
[ 2268.058361] Workqueue: writeback wb_workfn (flush-btrfs-1)
[ 2268.058825] Call Trace:
[ 2268.059261]  <TASK>
[ 2268.059692]  __schedule+0x335/0x1240
[ 2268.060145]  ? __blk_mq_sched_dispatch_requests+0xe0/0x130
[ 2268.060611]  schedule+0x4e/0xb0
[ 2268.061059]  io_schedule+0x42/0x70
[ 2268.061473]  blk_mq_get_tag+0x10c/0x290
[ 2268.061910]  ? dequeue_task_stop+0x70/0x70
[ 2268.062359]  __blk_mq_alloc_requests+0x16e/0x2a0
[ 2268.062797]  blk_mq_submit_bio+0x2a2/0x590
[ 2268.063226]  __submit_bio+0xf5/0x180
[ 2268.063660]  submit_bio_noacct_nocheck+0x1f9/0x2b0
[ 2268.064055]  btrfs_map_bio+0x170/0x410
[ 2268.064451]  btrfs_submit_data_bio+0x134/0x220
[ 2268.064859]  ? __mod_memcg_lruvec_state+0x93/0x110
[ 2268.065246]  submit_extent_page+0x17a/0x4b0
[ 2268.065637]  ? page_vma_mkclean_one.constprop.0+0x1b0/0x1b0
[ 2268.066018]  __extent_writepage_io.constprop.0+0x271/0x550
[ 2268.066363]  ? end_extent_writepage+0x100/0x100
[ 2268.066720]  ? writepage_delalloc+0x8a/0x180
[ 2268.067094]  __extent_writepage+0x115/0x490
[ 2268.067472]  extent_write_cache_pages+0x178/0x500
[ 2268.067889]  extent_writepages+0x60/0x140
[ 2268.068274]  do_writepages+0xac/0x1a0
[ 2268.068643]  __writeback_single_inode+0x3d/0x350
[ 2268.069022]  ? _raw_spin_lock+0x13/0x40
[ 2268.069419]  writeback_sb_inodes+0x1c5/0x460
[ 2268.069824]  __writeback_inodes_wb+0x4c/0xe0
[ 2268.070230]  wb_writeback+0x1c9/0x2a0
[ 2268.070622]  wb_workfn+0x298/0x490
[ 2268.070988]  process_one_work+0x1c7/0x380
[ 2268.071366]  worker_thread+0x4d/0x380
[ 2268.071775]  ? process_one_work+0x380/0x380
[ 2268.072179]  kthread+0xe9/0x110
[ 2268.072588]  ? kthread_complete_and_exit+0x20/0x20
[ 2268.073002]  ret_from_fork+0x22/0x30
[ 2268.073408]  </TASK>
[ 2268.073912] task:systemd-journal state:D stack:    0 pid: 1208 ppid:     1 flags:0x00000002
[ 2268.074334] Call Trace:
[ 2268.074756]  <TASK>
[ 2268.075165]  __schedule+0x335/0x1240
[ 2268.075586]  ? __blk_flush_plug+0xf2/0x130
[ 2268.075950]  schedule+0x4e/0xb0
[ 2268.076305]  io_schedule+0x42/0x70
[ 2268.076692]  folio_wait_bit_common+0x116/0x390
[ 2268.077075]  ? filemap_alloc_folio+0xc0/0xc0
[ 2268.077447]  filemap_fault+0xcf/0x980
[ 2268.077821]  __do_fault+0x36/0x130
[ 2268.078153]  do_fault+0x1da/0x440
[ 2268.078483]  __handle_mm_fault+0x9cf/0xed0
[ 2268.078837]  handle_mm_fault+0xae/0x280
[ 2268.079202]  do_user_addr_fault+0x1c5/0x670
[ 2268.079567]  exc_page_fault+0x70/0x170
[ 2268.079929]  asm_exc_page_fault+0x22/0x30
[ 2268.080282] RIP: 0033:0x7f3d8efd6519
[ 2268.080657] RSP: 002b:00007ffd4a073ac0 EFLAGS: 00010202
[ 2268.081017] RAX: 00007f3d8c6586d8 RBX: 00000000016bb6d8 RCX: 00007f3d8efe1309
[ 2268.081374] RDX: 00007ffd4a073ac0 RSI: 000055d6da7a52e8 RDI: 000055d6da7a52a0
[ 2268.081756] RBP: 00007ffd4a073b48 R08: 000055d6da7b7a00 R09: 000055d6da7aaea8
[ 2268.082147] R10: 0000000000000004 R11: 00007ffd4a073e00 R12: 000055d6da7aae70
[ 2268.082511] R13: 0000000000000001 R14: 00007ffd4a073ac0 R15: 00000000016bb6d8
[ 2268.082883]  </TASK>
[ 2268.083757] task:kworker/u97:0   state:D stack:    0 pid: 7664 ppid:     2 flags:0x00004000
[ 2268.084149] Workqueue: writeback wb_workfn (flush-btrfs-1)
[ 2268.084537] Call Trace:
[ 2268.084918]  <TASK>
[ 2268.085289]  __schedule+0x335/0x1240
[ 2268.085675]  ? __blk_flush_plug+0xf2/0x130
[ 2268.086056]  schedule+0x4e/0xb0
[ 2268.086406]  io_schedule+0x42/0x70
[ 2268.086786]  blk_mq_get_tag+0x10c/0x290
[ 2268.087160]  ? dequeue_task_stop+0x70/0x70
[ 2268.087539]  __blk_mq_alloc_requests+0x16e/0x2a0
[ 2268.087912]  blk_mq_submit_bio+0x2a2/0x590
[ 2268.088287]  __submit_bio+0xf5/0x180
[ 2268.088676]  submit_bio_noacct_nocheck+0x1f9/0x2b0
[ 2268.089052]  btrfs_map_bio+0x170/0x410
[ 2268.089426]  btrfs_submit_data_bio+0x134/0x220
[ 2268.089813]  ? __mod_memcg_lruvec_state+0x93/0x110
[ 2268.090192]  submit_extent_page+0x17a/0x4b0
[ 2268.090587]  ? page_vma_mkclean_one.constprop.0+0x1b0/0x1b0
[ 2268.090975]  __extent_writepage_io.constprop.0+0x271/0x550
[ 2268.091365]  ? end_extent_writepage+0x100/0x100
[ 2268.091757]  ? writepage_delalloc+0x8a/0x180
[ 2268.092141]  __extent_writepage+0x115/0x490
[ 2268.092534]  extent_write_cache_pages+0x178/0x500
[ 2268.092890]  extent_writepages+0x60/0x140
[ 2268.093247]  do_writepages+0xac/0x1a0
[ 2268.093624]  ? __blk_mq_sched_dispatch_requests+0xe0/0x130
[ 2268.094004]  __writeback_single_inode+0x3d/0x350
[ 2268.094376]  ? _raw_spin_lock+0x13/0x40
[ 2268.094765]  writeback_sb_inodes+0x1c5/0x460
[ 2268.095151]  __writeback_inodes_wb+0x4c/0xe0
[ 2268.095531]  wb_writeback+0x1c9/0x2a0
[ 2268.095913]  wb_workfn+0x306/0x490
[ 2268.096291]  process_one_work+0x1c7/0x380
[ 2268.096686]  worker_thread+0x4d/0x380
[ 2268.097070]  ? process_one_work+0x380/0x380
[ 2268.097448]  kthread+0xe9/0x110
[ 2268.097842]  ? kthread_complete_and_exit+0x20/0x20
[ 2268.098224]  ret_from_fork+0x22/0x30
[ 2268.098623]  </TASK>
[ 2268.099008] task:kworker/u97:7   state:D stack:    0 pid: 7665 ppid:     2 flags:0x00004000
[ 2268.099410] Workqueue: blkcg_punt_bio blkg_async_bio_workfn
[ 2268.099815] Call Trace:
[ 2268.100198]  <TASK>
[ 2268.100589]  __schedule+0x335/0x1240
[ 2268.100982]  ? __blk_mq_sched_dispatch_requests+0xe0/0x130
[ 2268.101377]  schedule+0x4e/0xb0
[ 2268.101772]  io_schedule+0x42/0x70
[ 2268.102153]  blk_mq_get_tag+0x10c/0x290
[ 2268.102537]  ? dequeue_task_stop+0x70/0x70
[ 2268.102902]  __blk_mq_alloc_requests+0x16e/0x2a0
[ 2268.103275]  blk_mq_submit_bio+0x2a2/0x590
[ 2268.103659]  __submit_bio+0xf5/0x180
[ 2268.104032]  submit_bio_noacct_nocheck+0x1f9/0x2b0
[ 2268.104408]  blkg_async_bio_workfn+0x66/0x90
[ 2268.104797]  process_one_work+0x1c7/0x380
[ 2268.105181]  worker_thread+0x1d6/0x380
[ 2268.105572]  ? process_one_work+0x380/0x380
[ 2268.105949]  kthread+0xe9/0x110
[ 2268.106319]  ? kthread_complete_and_exit+0x20/0x20
[ 2268.106667]  ret_from_fork+0x22/0x30
[ 2268.106882]  </TASK>



--
Chris Murphy


* Re: 5.19.0: dnf install hangs when system is under load
From: Qu Wenruo @ 2022-08-11  1:32 UTC
  To: Chris Murphy, Btrfs BTRFS, linux-block



On 2022/8/11 00:19, Chris Murphy wrote:
> Downstream bug - 5.19.0: dnf install hangs when system is under load
> https://bugzilla.redhat.com/show_bug.cgi?id=2117326
>
> 5.19.0-65.fc37.x86_64
>
> Setup
> btrfs raid10 on 8x plain partitions
>
> Command
> sudo dnf install pciutils
>
> Reproducible:
> Roughly 1 in 3 attempts; it correlates with the system being under heavy load, otherwise it doesn't happen
>
> Gets stuck at:
> Running scriptlet: sg3_utils-1.46-3.fc36.x86_64   2/2
>
> The ps aux status for dnf is D+, kill -9 does nothing, and strace shows nothing. The hang lasted at least 10 minutes; I didn't test beyond that.
>
> Full dmesg with sysrq+w is attached to the bug report.
>
> snippet
>
> [ 2268.057017] sysrq: Show Blocked State
> [ 2268.057866] task:kworker/u97:11  state:D stack:    0 pid:  340 ppid:     2 flags:0x00004000
> [ 2268.058361] Workqueue: writeback wb_workfn (flush-btrfs-1)
> [ 2268.058825] Call Trace:
> [ 2268.059261]  <TASK>
> [ 2268.059692]  __schedule+0x335/0x1240
> [ 2268.060145]  ? __blk_mq_sched_dispatch_requests+0xe0/0x130
> [ 2268.060611]  schedule+0x4e/0xb0
> [ 2268.061059]  io_schedule+0x42/0x70
> [ 2268.061473]  blk_mq_get_tag+0x10c/0x290

All the hanging processes are waiting at blk_mq_get_tag(), so I'm not
sure whether it's really btrfs or something in the block layer.

Adding the block layer folks to the thread.
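
If it reproduces, one way to check whether the driver tag space on the underlying devices is exhausted is the blk-mq debugfs interface, e.g. (a sketch; assumes debugfs is mounted and CONFIG_BLK_DEBUG_FS=y, with sda standing in for one of the 8 member devices):

  # request queue size and scheduler for the device
  cat /sys/block/sda/queue/nr_requests /sys/block/sda/queue/scheduler

  # per-hw-queue tag state (nr_tags, active_queues, busy counters)
  grep -H . /sys/kernel/debug/block/sda/hctx*/tags

  # requests currently in flight on each hw queue
  cat /sys/kernel/debug/block/sda/hctx*/busy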

Thanks,
Qu


* Re: 5.19.0: dnf install hangs when system is under load
From: Chris Murphy @ 2022-08-11  2:21 UTC
  To: Qu Wenruo, Btrfs BTRFS, linux-block



On Wed, Aug 10, 2022, at 9:32 PM, Qu Wenruo wrote:
> All the hanging processes are waiting at blk_mq_get_tag(), so I'm not
> sure whether it's really btrfs or something in the block layer.
>
> Adding the block layer folks to the thread.

OK, so it might just be the same problem I reported in the thread below, which first appeared in the 5.12 merge window. The weird thing is that the 5.19 kernel stays up for *days*, unlike 5.12 through 5.18; but under heavy load, when I run dnf, only dnf hangs.


https://lore.kernel.org/linux-btrfs/ad78a32c-7790-4e21-be9f-81c5848a4953@www.fastmail.com/T/#t

-- 
Chris Murphy


