* [bug report] Format FS failed with ublk device
@ 2024-04-25 1:54 Guangwu Zhang
2024-04-26 1:09 ` Ming Lei
0 siblings, 1 reply; 3+ messages in thread
From: Guangwu Zhang @ 2024-04-25 1:54 UTC (permalink / raw)
To: Ming Lei, linux-block
Hi,
The format FS command hangs with a ublk device.
# ublk --version
ublksrv 1.1-7-gf01c509
kernel: 6.9.0-rc4.kasan
nvme0n1 259:1 0 1.5T 0 disk
└─nvme0n1p1 259:2 0 5G 0 part
# ublk add -t loop -f /dev/nvme0n1p1
dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
ublkc: 245:0 ublkb: 259:3 owner: 0:0
queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
# mkfs.xfs -f /dev/ublkb0 << cannot finish, pid 3239
meta-data=/dev/ublkb0 isize=512 agcount=4, agsize=327680 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=1310720, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=16384, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# cat /proc/3239/stack
[<0>] rq_qos_wait+0x12a/0x1f0
[<0>] wbt_wait+0x11a/0x240
[<0>] __rq_qos_throttle+0x49/0x90
[<0>] blk_mq_submit_bio+0x58c/0x19d0
[<0>] submit_bio_noacct_nocheck+0x40d/0x780
[<0>] blk_next_bio+0x41/0x50
[<0>] __blkdev_issue_zero_pages+0x1ba/0x370
[<0>] blkdev_issue_zeroout+0x1a7/0x390
[<0>] blkdev_fallocate+0x264/0x3d0
[<0>] vfs_fallocate+0x2b0/0xad0
[<0>] __x64_sys_fallocate+0xb4/0x100
[<0>] do_syscall_64+0x7b/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
[ 862.178073] Not tainted 6.9.0-rc4.kasan+ #1
[ 862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.190721] task:mkfs.xfs state:D stack:0 pid:3239 tgid:3239 ppid:1801 flags:0x00004002
[ 862.190733] Call Trace:
[ 862.190737] <TASK>
[ 862.190742] __schedule+0x65f/0x15a0
[ 862.190759] ? blk_mq_flush_plug_list.part.0+0xd7/0x4a0
[ 862.190774] ? __pfx___schedule+0x10/0x10
[ 862.190779] ? __blk_flush_plug+0x286/0x4a0
[ 862.190790] ? __pfx___blk_flush_plug+0x10/0x10
[ 862.190795] ? kmem_cache_alloc+0x145/0x390
[ 862.190803] ? __pfx_rq_wait_inc_below+0x10/0x10
[ 862.190812] ? __pfx_wbt_inflight_cb+0x10/0x10
[ 862.190820] schedule+0x70/0x1b0
[ 862.190826] io_schedule+0xc0/0x130
[ 862.190832] rq_qos_wait+0x12a/0x1f0
[ 862.190840] ? __pfx_rq_qos_wait+0x10/0x10
[ 862.190845] ? __pfx_rq_qos_wake_function+0x10/0x10
[ 862.190850] ? __pfx_wbt_inflight_cb+0x10/0x10
[ 862.190856] ? submit_bio_noacct_nocheck+0x78/0x780
[ 862.190862] wbt_wait+0x11a/0x240
[ 862.190868] ? __pfx_wbt_wait+0x10/0x10
[ 862.190874] __rq_qos_throttle+0x49/0x90
[ 862.190880] blk_mq_submit_bio+0x58c/0x19d0
[ 862.190888] ? __pfx_blk_mq_submit_bio+0x10/0x10
[ 862.190893] ? bio_associate_blkg_from_css+0x246/0xad0
[ 862.190901] ? blk_cgroup_bio_start+0x281/0x3e0
[ 862.190908] submit_bio_noacct_nocheck+0x40d/0x780
[ 862.190912] ? bio_alloc_bioset+0x43c/0x770
[ 862.190923] ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[ 862.190929] ? submit_bio_noacct+0x210/0x1780
[ 862.190935] blk_next_bio+0x41/0x50
[ 862.190939] __blkdev_issue_zero_pages+0x1ba/0x370
[ 862.190948] ? __pfx_vfs_write+0x10/0x10
[ 862.190957] blkdev_issue_zeroout+0x1a7/0x390
[ 862.190964] ? __pfx_blkdev_issue_zeroout+0x10/0x10
[ 862.190974] blkdev_fallocate+0x264/0x3d0
[ 862.190981] vfs_fallocate+0x2b0/0xad0
[ 862.190989] __x64_sys_fallocate+0xb4/0x100
[ 862.190994] do_syscall_64+0x7b/0x160
[ 862.191005] ? __vm_munmap+0x139/0x230
[ 862.191012] ? __pfx___vm_munmap+0x10/0x10
[ 862.191016] ? fpregs_restore_userregs+0xe3/0x1f0
[ 862.191029] ? syscall_exit_work+0xff/0x130
[ 862.191037] ? syscall_exit_to_user_mode+0x78/0x200
[ 862.191045] ? do_syscall_64+0x87/0x160
[ 862.191050] ? do_user_addr_fault+0x477/0xad0
[ 862.191061] ? exc_page_fault+0x59/0xc0
[ 862.191067] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 862.191077] RIP: 0033:0x7f2c9e3025da
[ 862.191083] RSP: 002b:00007ffd2f328b38 EFLAGS: 00000246 ORIG_RAX: 000000000000011d
[ 862.191089] RAX: ffffffffffffffda RBX: 00000000a0009000 RCX: 00007f2c9e3025da
[ 862.191094] RDX: 00000000a0009000 RSI: 0000000000000010 RDI: 0000000000000004
[ 862.191097] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[ 862.191100] R10: 0000000004000000 R11: 0000000000000246 R12: 00005629d6f66870
[ 862.191104] R13: 00007ffd2f329260 R14: 0000000000000004 R15: 0000000000000002
[ 862.191111] </TASK>
* Re: [bug report] Format FS failed with ublk device
2024-04-25 1:54 [bug report] Format FS failed with ublk device Guangwu Zhang
@ 2024-04-26 1:09 ` Ming Lei
2024-04-28 10:09 ` Ming Lei
0 siblings, 1 reply; 3+ messages in thread
From: Ming Lei @ 2024-04-26 1:09 UTC (permalink / raw)
To: Guangwu Zhang; +Cc: linux-block, ming.lei
Hi Guangwu,
Thanks for the report!
On Thu, Apr 25, 2024 at 09:54:04AM +0800, Guangwu Zhang wrote:
> Hi,
> The format FS command hangs with a ublk device.
>
> # ublk --version
> ublksrv 1.1-7-gf01c509
>
> kernel: 6.9.0-rc4.kasan
>
>
> nvme0n1 259:1 0 1.5T 0 disk
> └─nvme0n1p1 259:2 0 5G 0 part
> # ublk add -t loop -f /dev/nvme0n1p1
> dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
> max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
> ublkc: 245:0 ublkb: 259:3 owner: 0:0
> queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
> target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
>
> # mkfs.xfs -f /dev/ublkb0 << cannot finish, pid 3239
> meta-data=/dev/ublkb0 isize=512 agcount=4, agsize=327680 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=0
> = reflink=1 bigtime=1 inobtcount=1 nrext64=0
> data = bsize=4096 blocks=1310720, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=16384, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> # cat /proc/3239/stack
> [<0>] rq_qos_wait+0x12a/0x1f0
> [<0>] wbt_wait+0x11a/0x240
> [<0>] __rq_qos_throttle+0x49/0x90
> [<0>] blk_mq_submit_bio+0x58c/0x19d0
> [<0>] submit_bio_noacct_nocheck+0x40d/0x780
> [<0>] blk_next_bio+0x41/0x50
> [<0>] __blkdev_issue_zero_pages+0x1ba/0x370
> [<0>] blkdev_issue_zeroout+0x1a7/0x390
> [<0>] blkdev_fallocate+0x264/0x3d0
> [<0>] vfs_fallocate+0x2b0/0xad0
> [<0>] __x64_sys_fallocate+0xb4/0x100
> [<0>] do_syscall_64+0x7b/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> [ 862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
> [ 862.178073] Not tainted 6.9.0-rc4.kasan+ #1
> [ 862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
This looks like a blk-wbt issue: ublk-loop doesn't set up
write_zeroes_max_bytes, so __blkdev_issue_zero_pages() may take quite a
while to complete, but it shouldn't hang.
Can you collect the following bpftrace output, starting the script before
running mkfs? I can't reproduce the issue in my environment.
#!/usr/bin/bpftrace

kretfunc:vfs_fallocate
{
    printf("vfs_fallocate on %s ret %d (%x %lx %u)\n",
           str(args->file->f_path.dentry->d_name.name),
           retval, args->mode, args->offset, args->len);
}
Thanks,
Ming
* Re: [bug report] Format FS failed with ublk device
2024-04-26 1:09 ` Ming Lei
@ 2024-04-28 10:09 ` Ming Lei
0 siblings, 0 replies; 3+ messages in thread
From: Ming Lei @ 2024-04-28 10:09 UTC (permalink / raw)
To: Guangwu Zhang; +Cc: linux-block, ming.lei
On Fri, Apr 26, 2024 at 09:09:57AM +0800, Ming Lei wrote:
> Hi Guangwu,
>
> Thanks for the report!
>
> On Thu, Apr 25, 2024 at 09:54:04AM +0800, Guangwu Zhang wrote:
> > Hi,
> > The format FS command hangs with a ublk device.
> >
> > # ublk --version
> > ublksrv 1.1-7-gf01c509
> >
> > kernel: 6.9.0-rc4.kasan
> >
> >
> > nvme0n1 259:1 0 1.5T 0 disk
> > └─nvme0n1p1 259:2 0 5G 0 part
> > # ublk add -t loop -f /dev/nvme0n1p1
> > dev id 0: nr_hw_queues 1 queue_depth 128 block size 4096 dev_capacity 10485760
> > max rq size 524288 daemon pid 3227 flags 0x42 state LIVE
> > ublkc: 245:0 ublkb: 259:3 owner: 0:0
> > queue 0: tid 3228 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
> > 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
> > target {"backing_file":"/dev/nvme0n1p1","dev_size":5368709120,"direct_io":1,"name":"loop","type":1}
> >
> > # mkfs.xfs -f /dev/ublkb0 << cannot finish, pid 3239
> > meta-data=/dev/ublkb0 isize=512 agcount=4, agsize=327680 blks
> > = sectsz=4096 attr=2, projid32bit=1
> > = crc=1 finobt=1, sparse=1, rmapbt=0
> > = reflink=1 bigtime=1 inobtcount=1 nrext64=0
> > data = bsize=4096 blocks=1310720, imaxpct=25
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> > log =internal log bsize=4096 blocks=16384, version=2
> > = sectsz=4096 sunit=1 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> >
> > # cat /proc/3239/stack
> > [<0>] rq_qos_wait+0x12a/0x1f0
> > [<0>] wbt_wait+0x11a/0x240
> > [<0>] __rq_qos_throttle+0x49/0x90
> > [<0>] blk_mq_submit_bio+0x58c/0x19d0
> > [<0>] submit_bio_noacct_nocheck+0x40d/0x780
> > [<0>] blk_next_bio+0x41/0x50
> > [<0>] __blkdev_issue_zero_pages+0x1ba/0x370
> > [<0>] blkdev_issue_zeroout+0x1a7/0x390
> > [<0>] blkdev_fallocate+0x264/0x3d0
> > [<0>] vfs_fallocate+0x2b0/0xad0
> > [<0>] __x64_sys_fallocate+0xb4/0x100
> > [<0>] do_syscall_64+0x7b/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [ 862.171377] INFO: task mkfs.xfs:3239 blocked for more than 122 seconds.
> > [ 862.178073] Not tainted 6.9.0-rc4.kasan+ #1
> > [ 862.182820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> This looks like a blk-wbt issue: ublk-loop doesn't set up
> write_zeroes_max_bytes, so __blkdev_issue_zero_pages() may take quite a
> while to complete, but it shouldn't hang.
>
> Can you collect the following bpftrace output, starting the script before
> running mkfs? I can't reproduce the issue in my environment.
>
> #!/usr/bin/bpftrace
>
> kretfunc:vfs_fallocate
> {
>     printf("vfs_fallocate on %s ret %d (%x %lx %u)\n",
>            str(args->file->f_path.dentry->d_name.name),
>            retval, args->mode, args->offset, args->len);
> }
After working through this with Guangwu, the issue is now root-caused:

1) vfs_fallocate() can't translate a block DISCARD into a real discard.
'FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE'
is supposed to be capable of doing that, but the VFS doesn't allow
FALLOC_FL_NO_HIDE_STALE.

2) So a ublk discard is actually converted into write-zeroes, because
ublksrv turns discard into fallocate(FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE). That is why mkfs.xfs takes so long; Guangwu
confirmed that the zeroing out is actually in progress, not hung.

Does FALLOC_FL_PUNCH_HOLE have to imply zeroing out for a block device?

3) A fix has now been pushed to ublksrv: translate a ublk discard into
ioctl(BLKDISCARD) when the backing file is a block device.

The same issue exists in the kernel loop driver too.
Thanks,
Ming