* [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
From: Jan Stancek @ 2020-08-06 17:47 UTC
To: linux-block
Hi,
I'm seeing sporadic mkfs.ext[23] hangs on a loop device while running various
LTP tests. Once it gets into the bad state, it seems to hang indefinitely:
0 D root 29782 29761 0 80 0 - 1006 rq_qos 15:09 ? 00:00:00 mkfs.ext3 /dev/loop0
[19809.932566] mkfs.ext3 D 0 29782 29761 0x00000000
[19809.934000] Call trace:
[19809.934624] __switch_to+0xfc/0x150
[19809.935533] __schedule+0x364/0x828
[19809.936432] schedule+0x58/0xe0
[19809.937261] io_schedule+0x24/0xc0
[19809.938144] rq_qos_wait+0xe4/0x150
[19809.939044] wbt_wait+0x98/0xd8
[19809.939864] __rq_qos_throttle+0x38/0x50
[19809.940847] blk_mq_submit_bio+0x108/0x620
[19809.941890] submit_bio_noacct+0x358/0x3d8
[19809.942909] submit_bio+0x40/0x1a8
[19809.943770] submit_bh_wbc+0x16c/0x1e8
[19809.944701] __block_write_full_page+0x238/0x5c8
[19809.945862] block_write_full_page+0x124/0x138
[19809.947000] blkdev_writepage+0x24/0x30
[19809.948031] __writepage+0x28/0xc8
[19809.948905] write_cache_pages+0x1ac/0x410
[19809.949988] generic_writepages+0x4c/0x88
[19809.950947] blkdev_writepages+0x18/0x28
[19809.951934] do_writepages+0x40/0xe8
[19809.952856] __filemap_fdatawrite_range+0xe0/0x150
[19809.954066] file_write_and_wait_range+0x9c/0x108
[19809.955266] blkdev_fsync+0x24/0x50
[19809.956170] vfs_fsync_range+0x3c/0x88
[19809.957126] do_fsync+0x44/0x90
[19809.957925] __arm64_sys_fsync+0x20/0x30
[19809.958961] el0_svc_common.constprop.0+0x7c/0x188
[19809.960242] do_el0_svc+0x2c/0x98
[19809.961028] el0_sync_handler+0x84/0x110
[19809.962003] el0_sync+0x15c/0x180
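The ps line above shows the stuck task in state `D` (uninterruptible sleep) with wait channel `rq_qos`. When a reproducer runs unattended for a long time, it can help to poll for such tasks. A minimal sketch, assuming a Linux /proc layout; the helper names `parse_stat` and `uninterruptible_tasks` are hypothetical and not part of LTP or any existing tool:

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is enclosed in parentheses and may itself contain spaces or ')',
    so the state letter is located after the *last* ')'.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]                 # ") D ..." -> "D"
    return pid, comm, state


def uninterruptible_tasks() -> list[tuple[int, str, str]]:
    """Return (pid, comm, wchan) for every task currently in state D (Linux only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue                       # skip non-process entries
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue                       # task exited while we were scanning
        if state != "D":
            continue
        try:
            with open(f"/proc/{entry}/wchan") as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"                    # may be unreadable without privileges
        found.append((pid, comm, wchan))
    return found
```

A harness could call `uninterruptible_tasks()` periodically and, once the same task has stayed in `D` for a long time, dump blocked-task stacks (for example via SysRq-w); the polling interval and threshold are left to the harness.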
It started happening in recent weeks and appears to be aarch64 exclusive so far.
Affected kernels are at least:
v5.8-475-g382625d0d432
v5.8-607-gcdc8fcb49905
v5.8-rc2-87-g6b7b181b67aa
v5.8-rc2-105-g492d76b21566
6b7b181b67aa is the oldest commit I could reproduce it with, but my current
reproducer (running LTP fgetxattr01 in a loop for 30 minutes) doesn't look
reliable enough for bisecting.
Does this ring any bells?
Thanks,
Jan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
From: Ming Lei @ 2020-08-18 2:29 UTC
To: Jan Stancek; +Cc: linux-block
On Thu, Aug 06, 2020 at 01:47:33PM -0400, Jan Stancek wrote:
> Hi,
>
> I'm seeing sporadic mkfs.ext[23] hangs on a loop device while running various
> LTP tests. Once it gets into the bad state, it seems to hang indefinitely:
> 0 D root 29782 29761 0 80 0 - 1006 rq_qos 15:09 ? 00:00:00 mkfs.ext3 /dev/loop0
>
> [...]
>
> It started happening in recent weeks and appears to be aarch64 exclusive so far.
>
> Affected kernels are at least:
> v5.8-475-g382625d0d432
> v5.8-607-gcdc8fcb49905
> v5.8-rc2-87-g6b7b181b67aa
> v5.8-rc2-105-g492d76b21566
>
> 6b7b181b67aa is the oldest commit I could reproduce it with, but my current
> reproducer (running LTP fgetxattr01 in a loop for 30 minutes) doesn't look
> reliable enough for bisecting.
>
> Does this ring any bells?
I saw this kind of I/O hang reliably in the ltp/fs_fill test, where the loop
device is backed by an image in tmpfs:
https://lkml.org/lkml/2020/7/26/77
And I have verified that the following patch can fix the issue:
https://lore.kernel.org/linux-block/bc5fa941-3b7c-f28e-dd46-1a1d6e5c40a8@kernel.dk/T/#t
Thanks,
Ming
* Re: [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
From: Jan Stancek @ 2020-08-18 6:19 UTC
To: Ming Lei; +Cc: linux-block
----- Original Message -----
> I saw this kind of I/O hang reliably in the ltp/fs_fill test, where the loop
> device is backed by an image in tmpfs:
>
> https://lkml.org/lkml/2020/7/26/77
>
> And I have verified that the following patch can fix the issue:
>
> https://lore.kernel.org/linux-block/bc5fa941-3b7c-f28e-dd46-1a1d6e5c40a8@kernel.dk/T/#t
Thanks, I'll test your patch with my setup.
In my case, I traced requests as far as blk_mq_sched_insert_requests(),
but they never made it to the loop driver code (loop_queue_rq / lo_complete_rq),
so I assumed they were getting lost somewhere in blk-mq scheduling.
After the hang, there were always several requests stuck "inflight":
# cat /sys/kernel/debug/block/loop0/rqos/wbt/inflight
0: inflight 41
1: inflight 0
2: inflight 0
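The counters above come from debugfs, one line per wbt wait class. A minimal sketch for reading them programmatically, assuming the `<index>: inflight <count>` format shown above (it may differ across kernel versions) and debugfs mounted at /sys/kernel/debug; the function names are hypothetical:

```python
from __future__ import annotations

import re

# One line per wait class, e.g. "0: inflight 41" (format as shown above).
_LINE = re.compile(r"^\s*(\d+):\s+inflight\s+(\d+)\s*$")


def parse_wbt_inflight(text: str) -> dict[int, int]:
    """Map wait-class index -> inflight count; raise ValueError on unexpected lines."""
    counts: dict[int, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue                       # tolerate blank lines
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        counts[int(m.group(1))] = int(m.group(2))
    return counts


def read_wbt_inflight(dev: str = "loop0") -> dict[int, int]:
    """Read the counters for one device (needs root and debugfs at /sys/kernel/debug)."""
    path = f"/sys/kernel/debug/block/{dev}/rqos/wbt/inflight"
    with open(path) as f:
        return parse_wbt_inflight(f.read())
```

For the output above, `parse_wbt_inflight` returns `{0: 41, 1: 0, 2: 0}`. Seeing the same nonzero count in two samples taken while the device should be idle is consistent with requests being stuck, as reported here.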
With some additional tracing I could see the requests sitting on the hctx dispatch
list with hctx state == 0, which appears to fit the description of the problem you've seen:
blk_mq_sched_insert_requests: blk_mq_sched_insert_requests hctx: ffff000168598000, ctx: fffffdffbff16dc0
wbt_wait: wbt_wait rqos: ffff00016a5e1358, rqw: ffff00016a5e1388, bio: ffff0000da6bbd00, inflight: 41
<wbt_wait goes to sleep>
crash> bio.bi_disk ffff0000da6bbd00
bi_disk = 0xffff000168599800
crash> gendisk.disk_name 0xffff000168599800
disk_name = "loop0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
crash> blk_mq_hw_ctx.queue 0xffff000168598000
queue = 0xffff000117c06800
crash> request_queue.rq_qos 0xffff000117c06800
rq_qos = 0xffff00016a5e1358
crash> blk_mq_hw_ctx.state 0xffff000168598000
state = 0
crash> list blk_mq_hw_ctx.dispatch -h 0xffff000168598000 | wc -l
42
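In the trace, mkfs sleeps in wbt_wait()/rq_qos_wait() until the inflight count drops, and that count only drops when requests complete. Requests parked on a dispatch list that nothing reruns never complete, so the sleeper is never woken. The toy model below, a hypothetical `ToyThrottle` class with a single count and limit, illustrates only that dependency; real wbt has several wait classes and adjusts its limits dynamically, and none of this code is taken from the kernel:

```python
from __future__ import annotations

import threading


class ToyThrottle:
    """Hypothetical inflight throttle: a count plus a limit, nothing more."""

    def __init__(self, limit: int) -> None:
        self.limit = limit          # assumed fixed depth (real wbt limits vary)
        self.inflight = 0           # admitted but not yet completed
        self._cv = threading.Condition()

    def acquire(self, timeout: float | None = None) -> bool:
        """Sleep until inflight < limit, then take a slot.

        Returns False if the timeout expires first; the toy's stand-in for a hang.
        """
        with self._cv:
            ok = self._cv.wait_for(lambda: self.inflight < self.limit, timeout)
            if ok:
                self.inflight += 1
            return ok

    def complete(self) -> None:
        """Request completion: the only event that frees a slot and wakes sleepers."""
        with self._cv:
            self.inflight -= 1
            self._cv.notify_all()


if __name__ == "__main__":
    t = ToyThrottle(limit=4)
    for _ in range(4):
        t.acquire(timeout=0)        # fill every slot; these requests get "parked"
    # Nothing completes, so a fifth submitter cannot get a slot.
    print("without completion:", t.acquire(timeout=0.1))   # False
    # Simulate a queue rerun that completes one parked request shortly.
    timer = threading.Timer(0.05, t.complete)
    timer.start()
    print("with completion:", t.acquire(timeout=2.0))      # True
    timer.join()
```

Here a timer thread stands in for the queue rerun that eventually completes a request. Without it, `acquire()` only returns because of its timeout; the sleep in the trace goes through io_schedule() and has no timeout.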
Regards,
Jan
* Re: [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
From: Jan Stancek @ 2020-08-19 6:37 UTC
To: Ming Lei; +Cc: linux-block
----- Original Message -----
>
>
> ----- Original Message -----
> > I saw this kind of I/O hang reliably in the ltp/fs_fill test, where the loop
> > device is backed by an image in tmpfs:
> >
> > https://lkml.org/lkml/2020/7/26/77
> >
> > And I have verified that the following patch can fix the issue:
> >
> > https://lore.kernel.org/linux-block/bc5fa941-3b7c-f28e-dd46-1a1d6e5c40a8@kernel.dk/T/#t
>
> Thanks, I'll test your patch with my setup.
I've seen no hangs over the past ~24 hours with the patch above.
Thanks,
Jan
* [LTP] Fwd: [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+)
From: Jan Stancek @ 2020-08-21 9:45 UTC
To: ltp
> > ----- Original Message -----
> > > I saw this kind of I/O hang reliably in the ltp/fs_fill test, where the loop
> > > device is backed by an image in tmpfs:
> > >
> > > https://lkml.org/lkml/2020/7/26/77
> > >
> > > And I have verified that the following patch can fix the issue:
> > >
> > > https://lore.kernel.org/linux-block/bc5fa941-3b7c-f28e-dd46-1a1d6e5c40a8@kernel.dk/T/#t
> >
> > Thanks, I'll test your patch with my setup.
>
> I've seen no hangs over the past ~24 hours with the patch above.
>
> Thanks,
> Jan
FYI, in case you see sporadic I/O hangs with recent upstream kernels, usually while
running LTP tests that make use of a loop device:
https://lore.kernel.org/linux-block/1929570063.6965184.1596736053281.JavaMail.zimbra@redhat.com/
Thread overview: 5+ messages
2020-08-06 17:47 ` [bug] mkfs.ext[23] hangs on loop device (aarch64, 5.8+) Jan Stancek
2020-08-18 2:29 ` Ming Lei
2020-08-18 6:19 ` Jan Stancek
2020-08-19 6:37 ` Jan Stancek
2020-08-21 9:45 ` [LTP] Fwd: " Jan Stancek