[bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)

* [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
       [not found] <94883604.6936811.1596720770623.JavaMail.zimbra@redhat.com>
@ 2020-08-06 17:47 ` Jan Stancek
  2020-08-18  2:29   ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Stancek @ 2020-08-06 17:47 UTC (permalink / raw)
  To: linux-block

Hi,

I'm seeing sporadic mkfs.ext[23] hangs on loop device while running various
LTP tests. It seems to hang indefinitely once in bad state:
  0 D root       29782   29761  0  80   0 -  1006 rq_qos 15:09 ?        00:00:00             mkfs.ext3 /dev/loop0

[19809.932566] mkfs.ext3       D    0 29782  29761 0x00000000 
[19809.934000] Call trace: 
[19809.934624]  __switch_to+0xfc/0x150 
[19809.935533]  __schedule+0x364/0x828 
[19809.936432]  schedule+0x58/0xe0 
[19809.937261]  io_schedule+0x24/0xc0 
[19809.938144]  rq_qos_wait+0xe4/0x150 
[19809.939044]  wbt_wait+0x98/0xd8 
[19809.939864]  __rq_qos_throttle+0x38/0x50 
[19809.940847]  blk_mq_submit_bio+0x108/0x620 
[19809.941890]  submit_bio_noacct+0x358/0x3d8 
[19809.942909]  submit_bio+0x40/0x1a8 
[19809.943770]  submit_bh_wbc+0x16c/0x1e8 
[19809.944701]  __block_write_full_page+0x238/0x5c8 
[19809.945862]  block_write_full_page+0x124/0x138 
[19809.947000]  blkdev_writepage+0x24/0x30 
[19809.948031]  __writepage+0x28/0xc8 
[19809.948905]  write_cache_pages+0x1ac/0x410 
[19809.949988]  generic_writepages+0x4c/0x88 
[19809.950947]  blkdev_writepages+0x18/0x28 
[19809.951934]  do_writepages+0x40/0xe8 
[19809.952856]  __filemap_fdatawrite_range+0xe0/0x150 
[19809.954066]  file_write_and_wait_range+0x9c/0x108 
[19809.955266]  blkdev_fsync+0x24/0x50 
[19809.956170]  vfs_fsync_range+0x3c/0x88 
[19809.957126]  do_fsync+0x44/0x90 
[19809.957925]  __arm64_sys_fsync+0x20/0x30 
[19809.958961]  el0_svc_common.constprop.0+0x7c/0x188 
[19809.960242]  do_el0_svc+0x2c/0x98 
[19809.961028]  el0_sync_handler+0x84/0x110 
[19809.962003]  el0_sync+0x15c/0x180 

It started happening in recent weeks and appears to be aarch64 exclusive so far.

Affected kernels are at least:
  v5.8-475-g382625d0d432
  v5.8-607-gcdc8fcb49905
  v5.8-rc2-87-g6b7b181b67aa
  v5.8-rc2-105-g492d76b21566

6b7b181b67aa is the oldest commit I could reproduce it with, but my current
reproducer (running LTP fgetxattr01 in loop for 30 minutes) doesn't look very
reliable for bisect. 

Does this ring any bells?

Thanks,
Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread