Tasks stuck jbd2 for a long time

* Tasks stuck jbd2 for a long time
@ 2023-08-15 19:01 Bhatnagar, Rishabh
  2023-08-16  2:28 ` Theodore Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Bhatnagar, Rishabh @ 2023-08-15 19:01 UTC (permalink / raw)
  To: jack, tytso; +Cc: linux-ext4, linux-kernel, gregkh, Park, SeongJae

Hi Jan/Ted

We are seeing lockups in journaling EXT4 code (5.10 - 6.1) under heavy 
load. The stack traces seem to suggest that kjournald thread is blocked 
for a long time.
Kjournald thread seem to be waiting on writeback thread to decrement 
t_updates and other writeback threads seem to be waiting on kjournald to 
flush the current transaction.
The system completely hangs in this case and the system IO drops to zero 
after sometime.

This is a RAID0 setup with 4 nvme (7TB each) disks. There is 390GB RAM 
available. The issue occurs when user starts downloading a big enough 
data set (60-70% disk capacity).
This is observed on 5.10 kernels (5.10.184). We tried moving to 6.1 
kernels and saw similar issue. The system completely freezes and we see 
these stack traces in serial console.

We have tried experimenting with dirty_ratio, dirty_background_ratio, 
noatime/lazytime updates but don't see much improvement.
One thing that helps is disabling journaling completely. Testing is 
ongoing after increasing the journal size. (current size 128MB).
We are trying to understand why journal threads are stuck for such a 
long time. This causes the entire IO stall in the system. Let us know if 
you have seen something similar before and if there are any suggestions 
that we can try.

INFO: task kworker/u145:1:376 blocked for more than 120 seconds.
       Not tainted 5.10.184-175.731.amzn2.x86_64
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u145:1  state:D stack :    0 pid:  376 ppid:     2 
flags:0x00004000
Workqueue : writeback wb_workfn (flush-9:0)
Call Trace:
  __schedule+0x1f9/0x660
  schedule+0x46/0xb0
  wait_transaction_locked+0x8a/0xd0 [jbd2]
  ? add_wait_queue_exclusive+0x70/0x70
  add_transaction_credits+0xd6/0x2a0 [jbd2]
  ? blk_mq_flush_plug_list+0x100/0x1a0
  start_this_handle+0x12d/0x4d0 [jbd2]
  ? jbd2__journal_start+0xc3/0x1e0 [jbd2]
  ? kmem_cache_alloc+0x132/0x270
  jbd2__journal_start+0xfb/0x1e0 [jbd2]
  __ext4_journal_start_sb+0xfb/0x110 [ext4]
  ext4_writepages+0x32c/0x790 [ext4]
  do_writepages+0x34/0xc0
  ? write_inode+0x54/0xd0
  __writeback_single_inode+0x39/0x200
  writeback_sb_inodes+0x20d/0x4a0
  __writeback_inodes_wb+0x4c/0xe0
  wb_writeback+0x1d8/0x2a0
  wb_do_writeback+0x166/0x180
  wb_workfn+0x6e/0x250
  ? __switch_to_asm+0x3a/0x60
  ? __schedule+0x201/0x660
  process_one_work+0x1b0/0x350
  worker_thread+0x49/0x310
  ? process_one_work+0x350/0x350
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

INFO: task jbd2/md0-8:8068 blocked for more than 120 seconds.

       Not tainted 5.10.184-175.731.amzn2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/md0-8      state:D stack:    0 pid: 8068 ppid:     2 
flags:0x00004080
Call Trace:
__schedule+0x1f9/0x660
  schedule+0x46/0xb0
  jbd2_journal_commit_transaction+0x35d/0x1880 [jbd2]
  ? update_load_avg+0x7a/0x5d0
  ? add_wait_queue_exclusive+0x70/0x70
  ? lock_timer_base+0x61/0x80
  ? kjournald2+0xcf/0x360 [jbd2]
  kjournald2+0xcf/0x360 [jbd2]
  ? add_wait_queue_exclusive+0x70/0x70
  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

INFO: task kvstore-leaf:39161 blocked for more than 121 seconds.
       Not tainted 5.10.184-175.731.amzn2.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kvstore-leaf    state:D stack:    0 pid:39161 ppid: 39046 
flags:0x00000080
Call Trace:
  __schedule+0x1f9/0x660
  schedule+0x46/0xb0
  wait_transaction_locked+0x8a/0xd0 [jbd2]
  ? add_wait_queue_exclusive+0x70/0x70
  add_transaction_credits+0xd6/0x2a0 [jbd2]
  start_this_handle+0x12d/0x4d0 [jbd2]
  ? jbd2__journal_start+0x91/0x1e0 [jbd2]
  ? kmem_cache_alloc+0x132/0x270
  jbd2__journal_start+0xfb/0x1e0 [jbd2]
  __ext4_journal_start_sb+0xfb/0x110 [ext4]
  ext4_dirty_inode+0x3d/0x90 [ext4]
  __mark_inode_dirty+0x196/0x300
  generic_update_time+0x68/0xd0
  file_update_time+0x127/0x140
  ? generic_write_checks+0x61/0xd0
  ext4_buffered_write_iter+0x52/0x160 [ext4]
  new_sync_write+0x11c/0x1b0
  vfs_write+0x1c9/0x260
  ksys_write+0x5f/0xe0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6

Thanks
Rishabh

^ permalink raw reply	[flat|nested] 15+ messages in thread