linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/3] btrfs: zoned: fix zoned extent allocator
@ 2021-12-07 15:35 Naohiro Aota
  2021-12-07 15:35 ` [PATCH 1/3] btrfs: zoned: unset dedicated block group on allocation failure Naohiro Aota
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-12-07 15:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Naohiro Aota

There are several reports of hung_task on btrfs recently.

- https://github.com/naota/linux/issues/59
- https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/

The stack trace of the hang looks like this:

[  739.991925][   T25] INFO: task kworker/u4:0:7 blocked for more than 122 seconds.
[  739.994821][   T25]       Not tainted 5.16.0-rc3+ #56
[  739.996676][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  739.999600][   T25] task:kworker/u4:0    state:D stack:    0 pid:    7 ppid:     2 flags:0x00004000
[  740.002656][   T25] Workqueue: writeback wb_workfn (flush-btrfs-5)
[  740.004894][   T25] Call Trace:
[  740.006113][   T25]  <TASK>
[  740.007143][   T25]  __schedule+0x9e3/0x23b0
[  740.008657][   T25]  ? io_schedule_timeout+0x190/0x190
[  740.010529][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
[  740.012385][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
[  740.014163][   T25]  schedule+0xed/0x280
[  740.015567][   T25]  io_schedule+0xfb/0x180
[  740.017026][   T25]  folio_wait_bit_common+0x386/0x840
[  740.018839][   T25]  ? delete_from_page_cache+0x220/0x220
[  740.020744][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.022645][   T25]  ? filemap_range_has_page+0x210/0x210
[  740.024588][   T25]  __folio_lock+0x17/0x20
[  740.026197][   T25]  extent_write_cache_pages+0x78c/0xba0 [btrfs]
[  740.028776][   T25]  ? __extent_writepage+0x980/0x980 [btrfs]
[  740.031026][   T25]  ? __kasan_check_read+0x11/0x20
[  740.032916][   T25]  ? __lock_acquire+0x1772/0x5a10
[  740.034611][   T25]  extent_writepages+0x1e8/0x3b0 [btrfs]
[  740.036636][   T25]  ? extent_write_locked_range+0x580/0x580 [btrfs]
[  740.038828][   T25]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[  740.040929][   T25]  btrfs_writepages+0xe/0x10 [btrfs]
[  740.042879][   T25]  do_writepages+0x187/0x610
[  740.044239][   T25]  ? page_writeback_cpu_online+0x20/0x20
[  740.045810][   T25]  ? sched_clock+0x9/0x10
[  740.047015][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.048341][   T25]  ? find_held_lock+0x3c/0x130
[  740.049649][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.051155][   T25]  ? lock_release+0x3fd/0xed0
[  740.052358][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.053775][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.055059][   T25]  __writeback_single_inode+0xd7/0xa90
[  740.056399][   T25]  writeback_sb_inodes+0x4e8/0xff0
[  740.057693][   T25]  ? sync_inode_metadata+0xe0/0xe0
[  740.058918][   T25]  ? down_read_trylock+0x45/0x50
[  740.060003][   T25]  ? trylock_super+0x1b/0xc0
[  740.061050][   T25]  __writeback_inodes_wb+0xba/0x210
[  740.062123][   T25]  wb_writeback+0x5b3/0x8b0
[  740.063075][   T25]  ? __writeback_inodes_wb+0x210/0x210
[  740.064158][   T25]  ? __local_bh_enable_ip+0xaa/0x120
[  740.065196][   T25]  ? __local_bh_enable_ip+0xaa/0x120
[  740.066180][   T25]  ? trace_hardirqs_on+0x2b/0x130
[  740.067117][   T25]  ? wb_workfn+0x2cc/0xe80
[  740.067943][   T25]  ? get_nr_dirty_inodes+0x130/0x1c0
[  740.068909][   T25]  wb_workfn+0x6f5/0xe80
[  740.069674][   T25]  ? inode_wait_for_writeback+0x40/0x40
[  740.070841][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.071834][   T25]  ? lock_acquire+0x1c1/0x4f0
[  740.072630][   T25]  ? lock_release+0xed0/0xed0
[  740.073483][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.074374][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.075342][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.076236][   T25]  process_one_work+0x826/0x14e0
[  740.077045][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.077921][   T25]  ? pwq_dec_nr_in_flight+0x250/0x250
[  740.078842][   T25]  ? lockdep_hardirqs_off+0x99/0xe0
[  740.079699][   T25]  worker_thread+0x59b/0x1050
[  740.080478][   T25]  ? process_one_work+0x14e0/0x14e0
[  740.081306][   T25]  kthread+0x38f/0x460
[  740.081955][   T25]  ? set_kthread_struct+0x110/0x110
[  740.082769][   T25]  ret_from_fork+0x22/0x30
[  740.083476][   T25]  </TASK>
[  740.083972][   T25] INFO: task aio-dio-write-v:1459 blocked for more than 122 seconds.
[  740.085202][   T25]       Not tainted 5.16.0-rc3+ #56
[  740.085970][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  740.087238][   T25] task:aio-dio-write-v state:D stack:    0 pid: 1459 ppid:   542 flags:0x00004000
[  740.088676][   T25] Call Trace:
[  740.089269][   T25]  <TASK>
[  740.089780][   T25]  __schedule+0x9e3/0x23b0
[  740.090644][   T25]  ? io_schedule_timeout+0x190/0x190
[  740.091614][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
[  740.092649][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
[  740.093531][   T25]  schedule+0xed/0x280
[  740.094323][   T25]  io_schedule+0xfb/0x180
[  740.095095][   T25]  folio_wait_bit_common+0x386/0x840
[  740.095868][   T25]  ? delete_from_page_cache+0x220/0x220
[  740.096730][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.097432][   T25]  ? filemap_range_has_page+0x210/0x210
[  740.098252][   T25]  __filemap_get_folio+0x4d3/0x8f0
[  740.099016][   T25]  ? filemap_range_needs_writeback+0xb0/0xb0
[  740.099953][   T25]  ? lock_contended+0xdf0/0xdf0
[  740.100663][   T25]  pagecache_get_page+0x19/0xc0
[  740.101400][   T25]  prepare_pages+0x205/0x4c0 [btrfs]
[  740.102274][   T25]  btrfs_buffered_write+0x5e0/0x1060 [btrfs]
[  740.103402][   T25]  ? btrfs_dirty_pages+0x2c0/0x2c0 [btrfs]
[  740.104480][   T25]  ? __up_read+0x1a9/0x7b0
[  740.105260][   T25]  ? up_write+0x480/0x480
[  740.106024][   T25]  ? btrfs_file_llseek+0x600/0x600 [btrfs]
[  740.107157][   T25]  btrfs_file_write_iter+0x84e/0xfa0 [btrfs]
[  740.108239][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.109091][   T25]  aio_write+0x314/0x820
[  740.109888][   T25]  ? cpumask_weight.constprop.0+0x40/0x40
[  740.110846][   T25]  ? kvm_sched_clock_read+0x18/0x40
[  740.111677][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.112765][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.114113][   T25]  ? lock_release+0x3fd/0xed0
[  740.115264][   T25]  ? lock_downgrade+0x7c0/0x7c0
[  740.116505][   T25]  io_submit_one.constprop.0+0xba3/0x1a50
[  740.117894][   T25]  ? io_submit_one.constprop.0+0xba3/0x1a50
[  740.119180][   T25]  ? kvm_sched_clock_read+0x18/0x40
[  740.120036][   T25]  ? sched_clock+0x9/0x10
[  740.120764][   T25]  ? sched_clock_cpu+0x18/0x1b0
[  740.121619][   T25]  ? __x64_sys_io_getevents_time32+0x2a0/0x2a0
[  740.122628][   T25]  ? lock_release+0x3fd/0xed0
[  740.123522][   T25]  __x64_sys_io_submit+0x15d/0x2b0
[  740.124417][   T25]  ? __x64_sys_io_submit+0x15d/0x2b0
[  740.125363][   T25]  ? io_submit_one.constprop.0+0x1a50/0x1a50
[  740.126322][   T25]  ? __this_cpu_preempt_check+0x13/0x20
[  740.127243][   T25]  ? lock_is_held_type+0xe4/0x140
[  740.128071][   T25]  ? lockdep_hardirqs_on+0x7e/0x100
[  740.128915][   T25]  ? syscall_enter_from_user_mode+0x25/0x80
[  740.129872][   T25]  ? trace_hardirqs_on+0x2b/0x130
[  740.130694][   T25]  do_syscall_64+0x3b/0x90
[  740.131380][   T25]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  740.132351][   T25] RIP: 0033:0x7f5c03578679
[  740.133117][   T25] RSP: 002b:00007fffa4f030e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  740.134468][   T25] RAX: ffffffffffffffda RBX: 00007f5c034746c0 RCX: 00007f5c03578679
[  740.135749][   T25] RDX: 0000561f67b701d0 RSI: 0000000000000037 RDI: 00007f5c03676000
[  740.137151][   T25] RBP: 00007f5c03676000 R08: 0000000000000000 R09: 0000000000000000
[  740.138505][   T25] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000037
[  740.139831][   T25] R13: 0000000000000000 R14: 0000561f67b701d0 R15: 0000561f67b701d0
[  740.141263][   T25]  </TASK>
[  740.141793][   T25]
[  740.141793][   T25] Showing all locks held in the system:
[  740.143126][   T25] 3 locks held by kworker/u4:0/7:
[  740.143982][   T25]  #0: ffff888100d4d948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x740/0x14e0
[  740.145599][   T25]  #1: ffffc9000007fda8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x770/0x14e0
[  740.147521][   T25]  #2: ffff8881771760e8 (&type->s_umount_key#30){++++}-{3:3}, at: trylock_super+0x1b/0xc0
[  740.149084][   T25] 1 lock held by khungtaskd/25:
[  740.149916][   T25]  #0: ffffffff833cee00 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x5f/0x27f
[  740.151491][   T25] 1 lock held by aio-dio-write-v/1459:
[  740.152458][   T25]  #0: ffff888172999b10 (&sb->s_type->i_mutex_key#13){++++}-{3:3}, at: btrfs_inode_lock+0x3f/0x70 [btrfs]
[  740.154487][   T25]
[  740.154920][   T25] =============================================
[  740.154920][   T25]
[  740.156347][   T25] Kernel panic - not syncing: hung_task: blocked tasks
[  740.157548][   T25] CPU: 0 PID: 25 Comm: khungtaskd Not tainted 5.16.0-rc3+ #56
[  740.158826][   T25] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
[  740.160265][   T25] Call Trace:
[  740.160788][   T25]  <TASK>
[  740.161240][   T25]  dump_stack_lvl+0x49/0x5e
[  740.162045][   T25]  dump_stack+0x10/0x12
[  740.162642][   T25]  panic+0x1e2/0x478
[  740.163330][   T25]  ? __warn_printk+0xf3/0xf3
[  740.164041][   T25]  watchdog.cold+0x118/0x137
[  740.164741][   T25]  ? reset_hung_task_detector+0x30/0x30
[  740.165537][   T25]  kthread+0x38f/0x460
[  740.166202][   T25]  ? set_kthread_struct+0x110/0x110
[  740.167022][   T25]  ret_from_fork+0x22/0x30
[  740.167782][   T25]  </TASK>
[  740.168520][   T25] Kernel Offset: disabled
[  740.169285][   T25] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---

While debugging this issue, we found some faulty behaviors in the
zoned extent allocator. It is not the root cause of the hang, as we
see a similar report also on regular btrfs. But it looks like the
early -ENOSPC is, at least, making the hang happen more often.

So, this series fixes the faulty behaviors of the zoned extent
allocator.

Patch 1 fixes a case when allocation fails in a dedicated block group.

Patches 2 and 3 fix the chunk allocation condition for the zoned
allocator, so that it won't block a possible chunk allocation.

Naohiro Aota (3):
  btrfs: zoned: unset dedicated block group on allocation failure
  btrfs: add extent allocator hook to decide to allocate chunk or not
  btrfs: zoned: fix chunk allocation condition for zoned allocator

 fs/btrfs/extent-tree.c | 58 ++++++++++++++++++++++++++++++------------
 fs/btrfs/zoned.c       |  5 ++--
 fs/btrfs/zoned.h       |  5 ++--
 3 files changed, 46 insertions(+), 22 deletions(-)

-- 
2.34.1



* [PATCH 1/3] btrfs: zoned: unset dedicated block group on allocation failure
  2021-12-07 15:35 [PATCH 0/3] btrfs: zoned: fix zoned extent allocator Naohiro Aota
@ 2021-12-07 15:35 ` Naohiro Aota
  2021-12-07 15:35 ` [PATCH 2/3] btrfs: add extent allocator hook to decide to allocate chunk or not Naohiro Aota
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-12-07 15:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Naohiro Aota

Allocating an extent from a block group can fail for various
reasons. When an allocation from a dedicated block group (for tree-log
or relocation data) fails, we need to unregister it as the dedicated
one so that we can allocate a new block group for that purpose.

However, we currently return early when the block group is read-only,
fully used, or when we fail to activate its zone. As a result, the
unusable block group stays registered as the dedicated one, leading to
further allocation failures. With many block groups, the allocator
iterates in a hopeless loop looking for a free extent, resulting in a
hung task.

Fix the issue by delaying the return and doing the proper cleanups.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3fd736a02c1e..34200c1a7da0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3790,23 +3790,35 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	spin_unlock(&fs_info->relocation_bg_lock);
 	if (skip)
 		return 1;
+
 	/* Check RO and no space case before trying to activate it */
 	spin_lock(&block_group->lock);
 	if (block_group->ro ||
 	    block_group->alloc_offset == block_group->zone_capacity) {
-		spin_unlock(&block_group->lock);
-		return 1;
+		ret = 1;
+		/*
+		 * May need to clear fs_info->{treelog,data_reloc}_bg.
+		 * Return the error after taking the locks.
+		 */
 	}
 	spin_unlock(&block_group->lock);
 
-	if (!btrfs_zone_activate(block_group))
-		return 1;
+	if (!ret && !btrfs_zone_activate(block_group)) {
+		ret = 1;
+		/*
+		 * May need to clear fs_info->{treelog,data_reloc}_bg.
+		 * Return the error after taking the locks.
+		 */
+	}
 
 	spin_lock(&space_info->lock);
 	spin_lock(&block_group->lock);
 	spin_lock(&fs_info->treelog_bg_lock);
 	spin_lock(&fs_info->relocation_bg_lock);
 
+	if (ret)
+		goto out;
+
 	ASSERT(!ffe_ctl->for_treelog ||
 	       block_group->start == fs_info->treelog_bg ||
 	       fs_info->treelog_bg == 0);
-- 
2.34.1



* [PATCH 2/3] btrfs: add extent allocator hook to decide to allocate chunk or not
  2021-12-07 15:35 [PATCH 0/3] btrfs: zoned: fix zoned extent allocator Naohiro Aota
  2021-12-07 15:35 ` [PATCH 1/3] btrfs: zoned: unset dedicated block group on allocation failure Naohiro Aota
@ 2021-12-07 15:35 ` Naohiro Aota
  2021-12-07 15:35 ` [PATCH 3/3] btrfs: zoned: fix chunk allocation condition for zoned allocator Naohiro Aota
  2021-12-08 16:18 ` [PATCH 0/3] btrfs: zoned: fix zoned extent allocator David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-12-07 15:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Naohiro Aota

Introduce a new hook for the extent allocator policies. With the new
hook, a policy can decide whether to allocate a new block group or
not. If not, it returns -ENOSPC, so btrfs_reserve_extent() will cut
the allocation size in half and retry the allocation if
min_alloc_size is large enough.

The hook is a placeholder for now and will be replaced with the real
implementation in the next patch.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 34200c1a7da0..5ec512673dc5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3959,6 +3959,19 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	}
 }
 
+static bool can_allocate_chunk(struct btrfs_fs_info *fs_info,
+			       struct find_free_extent_ctl *ffe_ctl)
+{
+	switch (ffe_ctl->policy) {
+	case BTRFS_EXTENT_ALLOC_CLUSTERED:
+		return true;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		return true;
+	default:
+		BUG();
+	}
+}
+
 static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl)
 {
 	switch (ffe_ctl->policy) {
@@ -4046,6 +4059,10 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 			struct btrfs_trans_handle *trans;
 			int exist = 0;
 
+			/* Check if allocation policy allows to create a new chunk */
+			if (!can_allocate_chunk(fs_info, ffe_ctl))
+				return -ENOSPC;
+
 			trans = current->journal_info;
 			if (trans)
 				exist = 1;
-- 
2.34.1



* [PATCH 3/3] btrfs: zoned: fix chunk allocation condition for zoned allocator
  2021-12-07 15:35 [PATCH 0/3] btrfs: zoned: fix zoned extent allocator Naohiro Aota
  2021-12-07 15:35 ` [PATCH 1/3] btrfs: zoned: unset dedicated block group on allocation failure Naohiro Aota
  2021-12-07 15:35 ` [PATCH 2/3] btrfs: add extent allocator hook to decide to allocate chunk or not Naohiro Aota
@ 2021-12-07 15:35 ` Naohiro Aota
  2021-12-08 16:18 ` [PATCH 0/3] btrfs: zoned: fix zoned extent allocator David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-12-07 15:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Naohiro Aota

The ZNS specification defines a limit on the number of "active"
zones. That limit forces us to limit the number of block groups which
can be used for an allocation at the same time. To stay within the
limit, commit a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if
active block group has enough space") made us reuse the existing
active block groups as much as possible when we can't activate any
other zone without sacrificing an already activated block group.

However, the check is wrong in two ways. First, it is evaluated for
every raid index (ffe_ctl->index). Even when the condition is reached
and "ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size" is met,
there can be other block groups with enough space to hold
ffe_ctl->num_bytes. (Actually, this cannot happen in the current
zoned code, as it only supports the SINGLE profile, but it can once
other RAID types are enabled.)

Second, it checks the active zone availability depending on the raid
index. The raid index is just an index into space_info->block_groups,
so it has nothing to do with chunk allocation.

These mistakes cause a faulty allocation in a certain
situation. Consider running zoned btrfs on a device whose
max_active_zones == 0 (no limit), and suppose no block group has room
to fit ffe_ctl->num_bytes but some have room to meet
ffe_ctl->min_alloc_size (i.e. num_bytes > max_extent_size >=
min_alloc_size).

In this situation, the following occurs:

- With SINGLE raid_index, it reaches the chunk allocation checking
  code
- The check returns true because we can activate a new zone (no limit)
- But, before allocating the chunk, it iterates to the next raid index
  (RAID5)
- Since there are no RAID5 block groups in zoned mode, it again
  reaches the check code
- The check returns false because of btrfs_can_activate_zone()'s "if
  (raid_index != BTRFS_RAID_SINGLE)" part
- That results in returning -ENOSPC without allocating a new chunk

As a result, we end up hitting -ENOSPC too early.

Move the check to the right place in the can_allocate_chunk() hook,
and make the active zone check depend on the allocation flags, not on
the raid index.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/extent-tree.c | 21 +++++++++------------
 fs/btrfs/zoned.c       |  5 ++---
 fs/btrfs/zoned.h       |  5 ++---
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5ec512673dc5..802add9857ed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3966,6 +3966,15 @@ static bool can_allocate_chunk(struct btrfs_fs_info *fs_info,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return true;
 	case BTRFS_EXTENT_ALLOC_ZONED:
+		/*
+		 * If we have enough free space left in an already
+		 * active block group and we can't activate any other
+		 * zone now, do not allow allocating a new chunk and
+		 * let find_free_extent() retry with a smaller size.
+		 */
+		if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
+		    !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->flags))
+			return false;
 		return true;
 	default:
 		BUG();
@@ -4012,18 +4021,6 @@ static int find_free_extent_update_loop(struct btrfs_fs_info *fs_info,
 		return 0;
 	}
 
-	if (ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size &&
-	    !btrfs_can_activate_zone(fs_info->fs_devices, ffe_ctl->index)) {
-		/*
-		 * If we have enough free space left in an already active block
-		 * group and we can't activate any other zone now, retry the
-		 * active ones with a smaller allocation size.  Returning early
-		 * from here will tell btrfs_reserve_extent() to haven the
-		 * size.
-		 */
-		return -ENOSPC;
-	}
-
 	if (ffe_ctl->loop >= LOOP_CACHING_WAIT && ffe_ctl->have_caching_bg)
 		return 1;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 67d932d70798..06681fae450a 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1883,7 +1883,7 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
 	return ret;
 }
 
-bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, int raid_index)
+bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 {
 	struct btrfs_device *device;
 	bool ret = false;
@@ -1892,8 +1892,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, int raid_index
 		return true;
 
 	/* Non-single profiles are not supported yet */
-	if (raid_index != BTRFS_RAID_SINGLE)
-		return false;
+	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
 
 	/* Check if there is a device with active zones left */
 	mutex_lock(&fs_devices->device_list_mutex);
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index e53ab7b96437..002ff86c8608 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -71,8 +71,7 @@ struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
 					    u64 logical, u64 length);
 bool btrfs_zone_activate(struct btrfs_block_group *block_group);
 int btrfs_zone_finish(struct btrfs_block_group *block_group);
-bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
-			     int raid_index);
+bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags);
 void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
 			     u64 length);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
@@ -222,7 +221,7 @@ static inline int btrfs_zone_finish(struct btrfs_block_group *block_group)
 }
 
 static inline bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices,
-					   int raid_index)
+					   u64 flags)
 {
 	return true;
 }
-- 
2.34.1



* Re: [PATCH 0/3] btrfs: zoned: fix zoned extent allocator
  2021-12-07 15:35 [PATCH 0/3] btrfs: zoned: fix zoned extent allocator Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-12-07 15:35 ` [PATCH 3/3] btrfs: zoned: fix chunk allocation condition for zoned allocator Naohiro Aota
@ 2021-12-08 16:18 ` David Sterba
  2021-12-29  0:22   ` Shinichiro Kawasaki
  3 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2021-12-08 16:18 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, David Sterba

On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> There are several reports of hung_task on btrfs recently.
> 
> - https://github.com/naota/linux/issues/59
> - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> 
> The stack trace of hung is like this:
> 
> [  739.991925][   T25] INFO: task kworker/u4:0:7 blocked for more than 122 seconds.
> [  739.994821][   T25]       Not tainted 5.16.0-rc3+ #56
> [  739.996676][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  739.999600][   T25] task:kworker/u4:0    state:D stack:    0 pid:    7 ppid:     2 flags:0x00004000
> [  740.002656][   T25] Workqueue: writeback wb_workfn (flush-btrfs-5)
> [  740.004894][   T25] Call Trace:
> [  740.006113][   T25]  <TASK>
> [  740.007143][   T25]  __schedule+0x9e3/0x23b0
> [  740.008657][   T25]  ? io_schedule_timeout+0x190/0x190
> [  740.010529][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
> [  740.012385][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
> [  740.014163][   T25]  schedule+0xed/0x280
> [  740.015567][   T25]  io_schedule+0xfb/0x180
> [  740.017026][   T25]  folio_wait_bit_common+0x386/0x840
> [  740.018839][   T25]  ? delete_from_page_cache+0x220/0x220
> [  740.020744][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.022645][   T25]  ? filemap_range_has_page+0x210/0x210
> [  740.024588][   T25]  __folio_lock+0x17/0x20
> [  740.026197][   T25]  extent_write_cache_pages+0x78c/0xba0 [btrfs]
> [  740.028776][   T25]  ? __extent_writepage+0x980/0x980 [btrfs]
> [  740.031026][   T25]  ? __kasan_check_read+0x11/0x20
> [  740.032916][   T25]  ? __lock_acquire+0x1772/0x5a10
> [  740.034611][   T25]  extent_writepages+0x1e8/0x3b0 [btrfs]
> [  740.036636][   T25]  ? extent_write_locked_range+0x580/0x580 [btrfs]
> [  740.038828][   T25]  ? lockdep_hardirqs_on_prepare+0x410/0x410
> [  740.040929][   T25]  btrfs_writepages+0xe/0x10 [btrfs]
> [  740.042879][   T25]  do_writepages+0x187/0x610
> [  740.044239][   T25]  ? page_writeback_cpu_online+0x20/0x20
> [  740.045810][   T25]  ? sched_clock+0x9/0x10
> [  740.047015][   T25]  ? sched_clock_cpu+0x18/0x1b0
> [  740.048341][   T25]  ? find_held_lock+0x3c/0x130
> [  740.049649][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.051155][   T25]  ? lock_release+0x3fd/0xed0
> [  740.052358][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.053775][   T25]  ? lock_is_held_type+0xe4/0x140
> [  740.055059][   T25]  __writeback_single_inode+0xd7/0xa90
> [  740.056399][   T25]  writeback_sb_inodes+0x4e8/0xff0
> [  740.057693][   T25]  ? sync_inode_metadata+0xe0/0xe0
> [  740.058918][   T25]  ? down_read_trylock+0x45/0x50
> [  740.060003][   T25]  ? trylock_super+0x1b/0xc0
> [  740.061050][   T25]  __writeback_inodes_wb+0xba/0x210
> [  740.062123][   T25]  wb_writeback+0x5b3/0x8b0
> [  740.063075][   T25]  ? __writeback_inodes_wb+0x210/0x210
> [  740.064158][   T25]  ? __local_bh_enable_ip+0xaa/0x120
> [  740.065196][   T25]  ? __local_bh_enable_ip+0xaa/0x120
> [  740.066180][   T25]  ? trace_hardirqs_on+0x2b/0x130
> [  740.067117][   T25]  ? wb_workfn+0x2cc/0xe80
> [  740.067943][   T25]  ? get_nr_dirty_inodes+0x130/0x1c0
> [  740.068909][   T25]  wb_workfn+0x6f5/0xe80
> [  740.069674][   T25]  ? inode_wait_for_writeback+0x40/0x40
> [  740.070841][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.071834][   T25]  ? lock_acquire+0x1c1/0x4f0
> [  740.072630][   T25]  ? lock_release+0xed0/0xed0
> [  740.073483][   T25]  ? lock_downgrade+0x7c0/0x7c0
> [  740.074374][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.075342][   T25]  ? lock_is_held_type+0xe4/0x140
> [  740.076236][   T25]  process_one_work+0x826/0x14e0
> [  740.077045][   T25]  ? lock_is_held_type+0xe4/0x140
> [  740.077921][   T25]  ? pwq_dec_nr_in_flight+0x250/0x250
> [  740.078842][   T25]  ? lockdep_hardirqs_off+0x99/0xe0
> [  740.079699][   T25]  worker_thread+0x59b/0x1050
> [  740.080478][   T25]  ? process_one_work+0x14e0/0x14e0
> [  740.081306][   T25]  kthread+0x38f/0x460
> [  740.081955][   T25]  ? set_kthread_struct+0x110/0x110
> [  740.082769][   T25]  ret_from_fork+0x22/0x30
> [  740.083476][   T25]  </TASK>
> [  740.083972][   T25] INFO: task aio-dio-write-v:1459 blocked for more than 122 seconds.
> [  740.085202][   T25]       Not tainted 5.16.0-rc3+ #56
> [  740.085970][   T25] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  740.087238][   T25] task:aio-dio-write-v state:D stack:    0 pid: 1459 ppid:   542 flags:0x00004000
> [  740.088676][   T25] Call Trace:
> [  740.089269][   T25]  <TASK>
> [  740.089780][   T25]  __schedule+0x9e3/0x23b0
> [  740.090644][   T25]  ? io_schedule_timeout+0x190/0x190
> [  740.091614][   T25]  ? blk_start_plug_nr_ios+0x270/0x270
> [  740.092649][   T25]  ? _raw_spin_unlock_irq+0x28/0x50
> [  740.093531][   T25]  schedule+0xed/0x280
> [  740.094323][   T25]  io_schedule+0xfb/0x180
> [  740.095095][   T25]  folio_wait_bit_common+0x386/0x840
> [  740.095868][   T25]  ? delete_from_page_cache+0x220/0x220
> [  740.096730][   T25]  ? lock_is_held_type+0xe4/0x140
> [  740.097432][   T25]  ? filemap_range_has_page+0x210/0x210
> [  740.098252][   T25]  __filemap_get_folio+0x4d3/0x8f0
> [  740.099016][   T25]  ? filemap_range_needs_writeback+0xb0/0xb0
> [  740.099953][   T25]  ? lock_contended+0xdf0/0xdf0
> [  740.100663][   T25]  pagecache_get_page+0x19/0xc0
> [  740.101400][   T25]  prepare_pages+0x205/0x4c0 [btrfs]
> [  740.102274][   T25]  btrfs_buffered_write+0x5e0/0x1060 [btrfs]
> [  740.103402][   T25]  ? btrfs_dirty_pages+0x2c0/0x2c0 [btrfs]
> [  740.104480][   T25]  ? __up_read+0x1a9/0x7b0
> [  740.105260][   T25]  ? up_write+0x480/0x480
> [  740.106024][   T25]  ? btrfs_file_llseek+0x600/0x600 [btrfs]
> [  740.107157][   T25]  btrfs_file_write_iter+0x84e/0xfa0 [btrfs]
> [  740.108239][   T25]  ? lock_downgrade+0x7c0/0x7c0
> [  740.109091][   T25]  aio_write+0x314/0x820
> [  740.109888][   T25]  ? cpumask_weight.constprop.0+0x40/0x40
> [  740.110846][   T25]  ? kvm_sched_clock_read+0x18/0x40
> [  740.111677][   T25]  ? sched_clock_cpu+0x18/0x1b0
> [  740.112765][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.114113][   T25]  ? lock_release+0x3fd/0xed0
> [  740.115264][   T25]  ? lock_downgrade+0x7c0/0x7c0
> [  740.116505][   T25]  io_submit_one.constprop.0+0xba3/0x1a50
> [  740.117894][   T25]  ? io_submit_one.constprop.0+0xba3/0x1a50
> [  740.119180][   T25]  ? kvm_sched_clock_read+0x18/0x40
> [  740.120036][   T25]  ? sched_clock+0x9/0x10
> [  740.120764][   T25]  ? sched_clock_cpu+0x18/0x1b0
> [  740.121619][   T25]  ? __x64_sys_io_getevents_time32+0x2a0/0x2a0
> [  740.122628][   T25]  ? lock_release+0x3fd/0xed0
> [  740.123522][   T25]  __x64_sys_io_submit+0x15d/0x2b0
> [  740.124417][   T25]  ? __x64_sys_io_submit+0x15d/0x2b0
> [  740.125363][   T25]  ? io_submit_one.constprop.0+0x1a50/0x1a50
> [  740.126322][   T25]  ? __this_cpu_preempt_check+0x13/0x20
> [  740.127243][   T25]  ? lock_is_held_type+0xe4/0x140
> [  740.128071][   T25]  ? lockdep_hardirqs_on+0x7e/0x100
> [  740.128915][   T25]  ? syscall_enter_from_user_mode+0x25/0x80
> [  740.129872][   T25]  ? trace_hardirqs_on+0x2b/0x130
> [  740.130694][   T25]  do_syscall_64+0x3b/0x90
> [  740.131380][   T25]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  740.132351][   T25] RIP: 0033:0x7f5c03578679
> [  740.133117][   T25] RSP: 002b:00007fffa4f030e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> [  740.134468][   T25] RAX: ffffffffffffffda RBX: 00007f5c034746c0 RCX: 00007f5c03578679
> [  740.135749][   T25] RDX: 0000561f67b701d0 RSI: 0000000000000037 RDI: 00007f5c03676000
> [  740.137151][   T25] RBP: 00007f5c03676000 R08: 0000000000000000 R09: 0000000000000000
> [  740.138505][   T25] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000037
> [  740.139831][   T25] R13: 0000000000000000 R14: 0000561f67b701d0 R15: 0000561f67b701d0
> [  740.141263][   T25]  </TASK>
> [  740.141793][   T25]
> [  740.141793][   T25] Showing all locks held in the system:
> [  740.143126][   T25] 3 locks held by kworker/u4:0/7:
> [  740.143982][   T25]  #0: ffff888100d4d948 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x740/0x14e0
> [  740.145599][   T25]  #1: ffffc9000007fda8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x770/0x14e0
> [  740.147521][   T25]  #2: ffff8881771760e8 (&type->s_umount_key#30){++++}-{3:3}, at: trylock_super+0x1b/0xc0
> [  740.149084][   T25] 1 lock held by khungtaskd/25:
> [  740.149916][   T25]  #0: ffffffff833cee00 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x5f/0x27f
> [  740.151491][   T25] 1 lock held by aio-dio-write-v/1459:
> [  740.152458][   T25]  #0: ffff888172999b10 (&sb->s_type->i_mutex_key#13){++++}-{3:3}, at: btrfs_inode_lock+0x3f/0x70 [btrfs]
> [  740.154487][   T25]
> [  740.154920][   T25] =============================================
> [  740.154920][   T25]
> [  740.156347][   T25] Kernel panic - not syncing: hung_task: blocked tasks
> [  740.157548][   T25] CPU: 0 PID: 25 Comm: khungtaskd Not tainted 5.16.0-rc3+ #56
> [  740.158826][   T25] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
> [  740.160265][   T25] Call Trace:
> [  740.160788][   T25]  <TASK>
> [  740.161240][   T25]  dump_stack_lvl+0x49/0x5e
> [  740.162045][   T25]  dump_stack+0x10/0x12
> [  740.162642][   T25]  panic+0x1e2/0x478
> [  740.163330][   T25]  ? __warn_printk+0xf3/0xf3
> [  740.164041][   T25]  watchdog.cold+0x118/0x137
> [  740.164741][   T25]  ? reset_hung_task_detector+0x30/0x30
> [  740.165537][   T25]  kthread+0x38f/0x460
> [  740.166202][   T25]  ? set_kthread_struct+0x110/0x110
> [  740.167022][   T25]  ret_from_fork+0x22/0x30
> [  740.167782][   T25]  </TASK>
> [  740.168520][   T25] Kernel Offset: disabled
> [  740.169285][   T25] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
> 
> While debugging this issue, we found some faulty behaviors in the
> zoned extent allocator. They are not the root cause of the hang, as
> we see a similar report on regular btrfs as well. But it looks like
> the early -ENOSPC is, at least, making the hang happen more often.
> 
> So, this series fixes the faulty behaviors of the zoned extent
> allocator.
> 
> Patch 1 fixes the case where allocation fails in a dedicated block group.
> 
> Patches 2 and 3 fix the chunk allocation condition for the zoned
> allocator, so that it won't block a possible chunk allocation.
> 
> Naohiro Aota (3):
>   btrfs: zoned: unset dedicated block group on allocation failure
>   btrfs: add extent allocator hook to decide to allocate chunk or not
>   btrfs: zoned: fix chunk allocation condition for zoned allocator

All seem to be relevant for 5.16-rc, so I'll add them to misc-next now to
give them some testing; pull request next week. Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] btrfs: zoned: fix zoned extent allocator
  2021-12-08 16:18 ` [PATCH 0/3] btrfs: zoned: fix zoned extent allocator David Sterba
@ 2021-12-29  0:22   ` Shinichiro Kawasaki
  2022-01-03 19:13     ` David Sterba
  0 siblings, 1 reply; 8+ messages in thread
From: Shinichiro Kawasaki @ 2021-12-29  0:22 UTC (permalink / raw)
  To: dsterba, Naohiro Aota, linux-btrfs, David Sterba

On Dec 08, 2021 / 17:18, David Sterba wrote:
> On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > There are several reports of hung_task on btrfs recently.
> > 
> > - https://github.com/naota/linux/issues/59
> > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/

(snip)

> > While debugging this issue, we found some faulty behaviors in the
> > zoned extent allocator. They are not the root cause of the hang, as
> > we see a similar report on regular btrfs as well. But it looks like
> > the early -ENOSPC is, at least, making the hang happen more often.
> > 
> > So, this series fixes the faulty behaviors of the zoned extent
> > allocator.
> > 
> > Patch 1 fixes the case where allocation fails in a dedicated block group.
> > 
> > Patches 2 and 3 fix the chunk allocation condition for the zoned
> > allocator, so that it won't block a possible chunk allocation.
> > 
> > Naohiro Aota (3):
> >   btrfs: zoned: unset dedicated block group on allocation failure
> >   btrfs: add extent allocator hook to decide to allocate chunk or not
> >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> 
> > All seem to be relevant for 5.16-rc, so I'll add them to misc-next now to
> > give them some testing; pull request next week. Thanks.

Hello David, thank you as always for your maintainership.

When I run my test set for the zoned btrfs configuration, I keep observing
the issue that Naohiro addressed with these three patches. The patches are
not yet merged into 5.16-rc7. Can I expect them to be merged in rc8?

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] btrfs: zoned: fix zoned extent allocator
  2021-12-29  0:22   ` Shinichiro Kawasaki
@ 2022-01-03 19:13     ` David Sterba
  2022-01-05  2:00       ` Shinichiro Kawasaki
  0 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2022-01-03 19:13 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: dsterba, Naohiro Aota, linux-btrfs, David Sterba

On Wed, Dec 29, 2021 at 12:22:31AM +0000, Shinichiro Kawasaki wrote:
> On Dec 08, 2021 / 17:18, David Sterba wrote:
> > On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > > There are several reports of hung_task on btrfs recently.
> > > 
> > > - https://github.com/naota/linux/issues/59
> > > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> 
> (snip)
> 
> > > While debugging this issue, we found some faulty behaviors in the
> > > zoned extent allocator. They are not the root cause of the hang, as
> > > we see a similar report on regular btrfs as well. But it looks like
> > > the early -ENOSPC is, at least, making the hang happen more often.
> > > 
> > > So, this series fixes the faulty behaviors of the zoned extent
> > > allocator.
> > > 
> > > Patch 1 fixes the case where allocation fails in a dedicated block group.
> > > 
> > > Patches 2 and 3 fix the chunk allocation condition for the zoned
> > > allocator, so that it won't block a possible chunk allocation.
> > > 
> > > Naohiro Aota (3):
> > >   btrfs: zoned: unset dedicated block group on allocation failure
> > >   btrfs: add extent allocator hook to decide to allocate chunk or not
> > >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> > 
> > All seem to be relevant for 5.16-rc, so I'll add them to misc-next now to
> > give them some testing; pull request next week. Thanks.
> 
> Hello David, thank you as always for your maintainership.
> 
> When I run my test set for the zoned btrfs configuration, I keep observing
> the issue that Naohiro addressed with these three patches. The patches are
> not yet merged into 5.16-rc7. Can I expect them to be merged in rc8?

Sorry, I did not get around to sending the pull request due to the
holidays. The timing of the 5.16 release next week is too close, so I'll
tag the patches for stable and they'll get into 5.16 later.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] btrfs: zoned: fix zoned extent allocator
  2022-01-03 19:13     ` David Sterba
@ 2022-01-05  2:00       ` Shinichiro Kawasaki
  0 siblings, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2022-01-05  2:00 UTC (permalink / raw)
  To: dsterba, Naohiro Aota, linux-btrfs, David Sterba

On Jan 03, 2022 / 20:13, David Sterba wrote:
> On Wed, Dec 29, 2021 at 12:22:31AM +0000, Shinichiro Kawasaki wrote:
> > On Dec 08, 2021 / 17:18, David Sterba wrote:
> > > On Wed, Dec 08, 2021 at 12:35:46AM +0900, Naohiro Aota wrote:
> > > > There are several reports of hung_task on btrfs recently.
> > > > 
> > > > - https://github.com/naota/linux/issues/59
> > > > - https://lore.kernel.org/linux-btrfs/CAJCQCtR=jztS3P34U_iUNoBodExHcud44OQ8oe4Jn3TK=1yFNw@mail.gmail.com/T/
> > 
> > (snip)
> > 
> > > > While debugging this issue, we found some faulty behaviors in the
> > > > zoned extent allocator. They are not the root cause of the hang, as
> > > > we see a similar report on regular btrfs as well. But it looks like
> > > > the early -ENOSPC is, at least, making the hang happen more often.
> > > > 
> > > > So, this series fixes the faulty behaviors of the zoned extent
> > > > allocator.
> > > > 
> > > > Patch 1 fixes the case where allocation fails in a dedicated block group.
> > > > 
> > > > Patches 2 and 3 fix the chunk allocation condition for the zoned
> > > > allocator, so that it won't block a possible chunk allocation.
> > > > 
> > > > Naohiro Aota (3):
> > > >   btrfs: zoned: unset dedicated block group on allocation failure
> > > >   btrfs: add extent allocator hook to decide to allocate chunk or not
> > > >   btrfs: zoned: fix chunk allocation condition for zoned allocator
> > > 
> > > All seem to be relevant for 5.16-rc, so I'll add them to misc-next now to
> > > give them some testing; pull request next week. Thanks.
> > 
> > Hello David, thank you as always for your maintainership.
> > 
> > When I run my test set for the zoned btrfs configuration, I keep observing
> > the issue that Naohiro addressed with these three patches. The patches are
> > not yet merged into 5.16-rc7. Can I expect them to be merged in rc8?
> 
> Sorry, I did not get around to sending the pull request due to the
> holidays. The timing of the 5.16 release next week is too close, so I'll
> tag the patches for stable and they'll get into 5.16 later.

Sounds good. Thank you for taking care of this.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-01-05  2:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07 15:35 [PATCH 0/3] btrfs: zoned: fix zoned extent allocator Naohiro Aota
2021-12-07 15:35 ` [PATCH 1/3] btrfs: zoned: unset dedicated block group on allocation failure Naohiro Aota
2021-12-07 15:35 ` [PATCH 2/3] btrfs: add extent allocator hook to decide to allocate chunk or not Naohiro Aota
2021-12-07 15:35 ` [PATCH 3/3] btrfs: zoned: fix chunk allocation condition for zoned allocator Naohiro Aota
2021-12-08 16:18 ` [PATCH 0/3] btrfs: zoned: fix zoned extent allocator David Sterba
2021-12-29  0:22   ` Shinichiro Kawasaki
2022-01-03 19:13     ` David Sterba
2022-01-05  2:00       ` Shinichiro Kawasaki
