All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug report] Format FS failed with ublk device
@ 2024-04-25  1:54 Guangwu Zhang
  2024-04-26  1:09 ` Ming Lei
  0 siblings, 1 reply; 3+ messages in thread
From: Guangwu Zhang @ 2024-04-25  1:54 UTC (permalink / raw)
  To: Ming Lei, linux-block

Hi,
the format FS command will hung up  with ublk device.

# ublk --version
ublksrv 1.1-7-gf01c509

kerne: 6.9.0-rc4.kasan


nvme0n1                     259:1    0   1.5T  0 disk
└─nvme0n1p1                 259:2    0     5G  0 part
# ublk add -t loop -f /dev/nvme0n1p1
dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
ublkc: 245:0 ublkb: 259:3 owner: 0:0
queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}

# mkfs.xfs -f /dev/ublkb0    << can not finish,  pid 3239
meta-data=/dev/ublkb0            isize=512    agcount=4, agsize=327680 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=1310720, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# cat /proc/3239/stack
[<0>] rq_qos_wait+0x12a/0x1f0
[<0>] wbt_wait+0x11a/0x240
[<0>] __rq_qos_throttle+0x49/0x90
[<0>] blk_mq_submit_bio+0x58c/0x19d0
[<0>] submit_bio_noacct_nocheck+0x40d/0x780
[<0>] blk_next_bio+0x41/0x50
[<0>] __blkdev_issue_zero_pages+0x1ba/0x370
[<0>] blkdev_issue_zeroout+0x1a7/0x390
[<0>] blkdev_fallocate+0x264/0x3d0
[<0>] vfs_fallocate+0x2b0/0xad0
[<0>] __x64_sys_fallocate+0xb4/0x100
[<0>] do_syscall_64+0x7b/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
[  862.178073]       Not tainted 6.9.0-rc4.kasan+ #1
[  862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  862.190721] task:mkfs.xfs        state:D stack:0     pid:3239
tgid:3239  ppid:1801   flags:0x00004002
[  862.190733] Call Trace:
[  862.190737]  <TASK>
[  862.190742]  __schedule+0x65f/0x15a0
[  862.190759]  ? blk_mq_flush_plug_list.part.0+0xd7/0x4a0
[  862.190774]  ? __pfx___schedule+0x10/0x10
[  862.190779]  ? __blk_flush_plug+0x286/0x4a0
[  862.190790]  ? __pfx___blk_flush_plug+0x10/0x10
[  862.190795]  ? kmem_cache_alloc+0x145/0x390
[  862.190803]  ? __pfx_rq_wait_inc_below+0x10/0x10
[  862.190812]  ? __pfx_wbt_inflight_cb+0x10/0x10
[  862.190820]  schedule+0x70/0x1b0
[  862.190826]  io_schedule+0xc0/0x130
[  862.190832]  rq_qos_wait+0x12a/0x1f0
[  862.190840]  ? __pfx_rq_qos_wait+0x10/0x10
[  862.190845]  ? __pfx_rq_qos_wake_function+0x10/0x10
[  862.190850]  ? __pfx_wbt_inflight_cb+0x10/0x10
[  862.190856]  ? submit_bio_noacct_nocheck+0x78/0x780
[  862.190862]  wbt_wait+0x11a/0x240
[  862.190868]  ? __pfx_wbt_wait+0x10/0x10
[  862.190874]  __rq_qos_throttle+0x49/0x90
[  862.190880]  blk_mq_submit_bio+0x58c/0x19d0
[  862.190888]  ? __pfx_blk_mq_submit_bio+0x10/0x10
[  862.190893]  ? bio_associate_blkg_from_css+0x246/0xad0
[  862.190901]  ? blk_cgroup_bio_start+0x281/0x3e0
[  862.190908]  submit_bio_noacct_nocheck+0x40d/0x780
[  862.190912]  ? bio_alloc_bioset+0x43c/0x770
[  862.190923]  ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[  862.190929]  ? submit_bio_noacct+0x210/0x1780
[  862.190935]  blk_next_bio+0x41/0x50
[  862.190939]  __blkdev_issue_zero_pages+0x1ba/0x370
[  862.190948]  ? __pfx_vfs_write+0x10/0x10
[  862.190957]  blkdev_issue_zeroout+0x1a7/0x390
[  862.190964]  ? __pfx_blkdev_issue_zeroout+0x10/0x10
[  862.190974]  blkdev_fallocate+0x264/0x3d0
[  862.190981]  vfs_fallocate+0x2b0/0xad0
[  862.190989]  __x64_sys_fallocate+0xb4/0x100
[  862.190994]  do_syscall_64+0x7b/0x160
[  862.191005]  ? __vm_munmap+0x139/0x230
[  862.191012]  ? __pfx___vm_munmap+0x10/0x10
[  862.191016]  ? fpregs_restore_userregs+0xe3/0x1f0
[  862.191029]  ? syscall_exit_work+0xff/0x130
[  862.191037]  ? syscall_exit_to_user_mode+0x78/0x200
[  862.191045]  ? do_syscall_64+0x87/0x160
[  862.191050]  ? do_user_addr_fault+0x477/0xad0
[  862.191061]  ? exc_page_fault+0x59/0xc0
[  862.191067]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  862.191077] RIP: 0033:0x7f2c9e3025da
[  862.191083] RSP: 002b:00007ffd2f328b38 EFLAGS: 00000246 ORIG_RAX:
000000000000011d
[  862.191089] RAX: ffffffffffffffda RBX: 00000000a0009000 RCX: 00007f2c9e3025da
[  862.191094] RDX: 00000000a0009000 RSI: 0000000000000010 RDI: 0000000000000004
[  862.191097] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[  862.191100] R10: 0000000004000000 R11: 0000000000000246 R12: 00005629d6f66870
[  862.191104] R13: 00007ffd2f329260 R14: 0000000000000004 R15: 0000000000000002
[  862.191111]  </TASK


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report] Format FS failed with ublk device
  2024-04-25  1:54 [bug report] Format FS failed with ublk device Guangwu Zhang
@ 2024-04-26  1:09 ` Ming Lei
  2024-04-28 10:09   ` Ming Lei
  0 siblings, 1 reply; 3+ messages in thread
From: Ming Lei @ 2024-04-26  1:09 UTC (permalink / raw)
  To: Guangwu Zhang; +Cc: linux-block, ming.lei

Hi Guangwu,

Thanks for the report!

On Thu, Apr 25, 2024 at 09:54:04AM +0800, Guangwu Zhang wrote:
> Hi,
> the format FS command will hung up  with ublk device.
> 
> # ublk --version
> ublksrv 1.1-7-gf01c509
> 
> kerne: 6.9.0-rc4.kasan
> 
> 
> nvme0n1                     259:1    0   1.5T  0 disk
> └─nvme0n1p1                 259:2    0     5G  0 part
> # ublk add -t loop -f /dev/nvme0n1p1
> dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
> max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
> ublkc: 245:0 ublkb: 259:3 owner: 0:0
> queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
> target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
> 
> # mkfs.xfs -f /dev/ublkb0    << can not finish,  pid 3239
> meta-data=/dev/ublkb0            isize=512    agcount=4, agsize=327680 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> data     =                       bsize=4096   blocks=1310720, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> # cat /proc/3239/stack
> [<0>] rq_qos_wait+0x12a/0x1f0
> [<0>] wbt_wait+0x11a/0x240
> [<0>] __rq_qos_throttle+0x49/0x90
> [<0>] blk_mq_submit_bio+0x58c/0x19d0
> [<0>] submit_bio_noacct_nocheck+0x40d/0x780
> [<0>] blk_next_bio+0x41/0x50
> [<0>] __blkdev_issue_zero_pages+0x1ba/0x370
> [<0>] blkdev_issue_zeroout+0x1a7/0x390
> [<0>] blkdev_fallocate+0x264/0x3d0
> [<0>] vfs_fallocate+0x2b0/0xad0
> [<0>] __x64_sys_fallocate+0xb4/0x100
> [<0>] do_syscall_64+0x7b/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> [  862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
> [  862.178073]       Not tainted 6.9.0-rc4.kasan+ #1
> [  862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.

Looks it might be one blk-wbt issue, and ublk-loop doesn't setup
write_zero_max_bytes and it may take a bit long for __blkdev_issue_zero_pages
to complete, but it shouldn't hang.

Can you collect the following bpftrace by starting it before running mkfs?
And I can't reproduce it in my environment.

#!/usr/bin/bpftrace
kretfunc:vfs_fallocate
{
	printf("vfs_fallocate on %s ret %d (%x %lx %u)\n",
		str(args->file->f_path.dentry->d_name.name),
		retval, args->mode, args->offset, args->len);
}


Thanks,
Ming


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report] Format FS failed with ublk device
  2024-04-26  1:09 ` Ming Lei
@ 2024-04-28 10:09   ` Ming Lei
  0 siblings, 0 replies; 3+ messages in thread
From: Ming Lei @ 2024-04-28 10:09 UTC (permalink / raw)
  To: Guangwu Zhang; +Cc: linux-block, ming.lei

On Fri, Apr 26, 2024 at 09:09:57AM +0800, Ming Lei wrote:
> Hi Guangwu,
> 
> Thanks for the report!
> 
> On Thu, Apr 25, 2024 at 09:54:04AM +0800, Guangwu Zhang wrote:
> > Hi,
> > the format FS command will hung up  with ublk device.
> > 
> > # ublk --version
> > ublksrv 1.1-7-gf01c509
> > 
> > kerne: 6.9.0-rc4.kasan
> > 
> > 
> > nvme0n1                     259:1    0   1.5T  0 disk
> > └─nvme0n1p1                 259:2    0     5G  0 part
> > # ublk add -t loop -f /dev/nvme0n1p1
> > dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
> > max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
> > ublkc: 245:0 ublkb: 259:3 owner: 0:0
> > queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> > 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
> > target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
> > 
> > # mkfs.xfs -f /dev/ublkb0    << can not finish,  pid 3239
> > meta-data=/dev/ublkb0            isize=512    agcount=4, agsize=327680 blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=1, rmapbt=0
> >          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> > data     =                       bsize=4096   blocks=1310720, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> > log      =internal log           bsize=4096   blocks=16384, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > # cat /proc/3239/stack
> > [<0>] rq_qos_wait+0x12a/0x1f0
> > [<0>] wbt_wait+0x11a/0x240
> > [<0>] __rq_qos_throttle+0x49/0x90
> > [<0>] blk_mq_submit_bio+0x58c/0x19d0
> > [<0>] submit_bio_noacct_nocheck+0x40d/0x780
> > [<0>] blk_next_bio+0x41/0x50
> > [<0>] __blkdev_issue_zero_pages+0x1ba/0x370
> > [<0>] blkdev_issue_zeroout+0x1a7/0x390
> > [<0>] blkdev_fallocate+0x264/0x3d0
> > [<0>] vfs_fallocate+0x2b0/0xad0
> > [<0>] __x64_sys_fallocate+0xb4/0x100
> > [<0>] do_syscall_64+0x7b/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > [  862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
> > [  862.178073]       Not tainted 6.9.0-rc4.kasan+ #1
> > [  862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> 
> Looks it might be one blk-wbt issue, and ublk-loop doesn't setup
> write_zero_max_bytes and it may take a bit long for __blkdev_issue_zero_pages
> to complete, but it shouldn't hang.
> 
> Can you collect the following bpftrace by starting it before running mkfs?
> And I can't reproduce it in my environment.
> 
> #!/usr/bin/bpftrace
> kretfunc:vfs_fallocate
> {
> 	printf("vfs_fallocate on %s ret %d (%x %lx %u)\n",
> 		str(args->file->f_path.dentry->d_name.name),
> 		retval, args->mode, args->offset, args->len);
> }

After co-working with Guangwu, the issue is now root-caused:

1) vfs_fallocate() can't translate block DISCARD into real discard, and
'FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE'
is supposed to be capable of doing that, but vfs doesn't allow
FALLOC_FL_NO_HIDE_STALE

2) so ublk discard is actually converted to write-zeroes because ublksrv
converts discard into fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE),
and that is the reason why mkfs.xfs takes too long, and Guangwu confirmed
that zeroing out is in-progress actually, not hang.

Does FALLOC_FL_PUNCH_HOLE have to imply zeroing out for block device?

3) now fix is pushed to ublksrv by translating ublk discard into
ioctl(DISCARD) for block device

And same issue exists on kernel loop driver too.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-28 10:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-25  1:54 [bug report] Format FS failed with ublk device Guangwu Zhang
2024-04-26  1:09 ` Ming Lei
2024-04-28 10:09   ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.