linux-kernel.vger.kernel.org archive mirror
* VM Boot Hangs with Commit "Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"" on linux-5.4.y
From: Sherry Yang @ 2023-07-24  5:27 UTC (permalink / raw)
  To: dianders; +Cc: Harshit Mogalapalli, George Kennedy, linux-scsi, linux-kernel

Hi Douglas,

We observed VM boot hangs on linux-stable v5.4, at a rate of roughly one in several thousand boots (fewer than 10,000 boots per occurrence). We started 16 VMs on a bare-metal host in a reboot loop, chose 10,000 boots as the threshold, and bisected. After a painful bisection, I found the culprit: commit 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle""). This commit was first merged upstream in v5.8 as part of a 4-patch series (https://www.spinics.net/lists/linux-block/msg51866.html). Two of the four patches have already been backported to linux-stable v5.4, but not at the same time:

1) ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget" is a reason to kick") in tag v5.4.86
2) 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"") in tag v5.4.235; it was backported as a stable dependency of another commit:

	Signed-off-by: Douglas Anderson <dianders@chromium.org>
	Reviewed-by: Ming Lei <ming.lei@redhat.com>
	Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
	Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx")
	Signed-off-by: Sasha Levin <sashal@kernel.org>

I also tried backporting the other 2 patches to v5.4, but the issue is still reproducible.

I tested multiple kernels; the issue is not reproducible within 10,000 boots on the following:
1) Linux v5.9
2) Linux v5.4.249 + revert of 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"")
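
A rough sanity check on a boot-count threshold like this can be sketched as follows. It assumes every boot hangs independently with the same probability; the rate of 1 in 3,000 is purely hypothetical (the real rate was only observed as "1 in thousands"), and the function names are illustrative.

```python
import math


def prob_no_hang(hang_rate, boots):
    """Probability that `boots` independent boots all succeed on a buggy kernel."""
    return (1.0 - hang_rate) ** boots


def boots_for_confidence(hang_rate, confidence):
    """Smallest boot count that hits at least one hang with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


# Hypothetical hang rate: an assumption for illustration, not a measured value.
assumed_rate = 1.0 / 3000

miss = prob_no_hang(assumed_rate, 10_000)
needed = boots_for_confidence(assumed_rate, 0.95)

print(f"chance a buggy kernel survives 10,000 boots: {miss:.3f}")
print(f"boots needed for 95% detection confidence: {needed}")
```

Under that assumed rate, a clean 10,000-boot run is strong but not conclusive evidence that a kernel is good; a lower real hang rate would weaken it further.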

I'm not sure exactly how this commit affects linux-stable v5.4, but I suspect some prerequisite commits are missing, which leads to boot hangs on linux-stable v5.4 but not on later releases. Could you take a look at this issue and share your insight?

Here is the call trace:
[  369.850681] INFO: task systemd-udevd:313 blocked for more than 122 seconds.
[  369.852180]       Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[  369.853631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.855406] systemd-udevd   D    0   313    282 0x80004106
[  369.856662] Call Trace:
[  369.857276]  __schedule+0x213/0x570
[  369.858017]  schedule+0x42/0xb0
[  369.858699]  io_schedule+0x45/0x70
[  369.859671]  wait_on_page_bit_common+0x131/0x380
[  369.860884]  ? find_get_entries+0x1a9/0x260
[  369.862011]  ? __filemap_set_wb_err+0x70/0x70
[  369.863239]  __lock_page+0x44/0x50
[  369.864122]  truncate_inode_pages_range+0x463/0x8a0
[  369.865516]  ? pagevec_lookup_range_tag+0x28/0x40
[  369.866891]  ? free_cpumask_var+0x9/0x10
[  369.867909]  ? mark_buffer_async_write+0x30/0x30
[  369.869019]  ? get_ksm_page+0xf6/0x210
[  369.869936]  ? free_cpumask_var+0x9/0x10
[  369.871020]  ? on_each_cpu_cond_mask+0xb1/0x130
[  369.872062]  truncate_inode_pages+0x15/0x20
[  369.872739]  __blkdev_put+0xa7/0x220
[  369.873408]  ? exit_mmap+0x121/0x1b0
[  369.874092]  blkdev_put+0x4e/0xe0
[  369.874674]  blkdev_close+0x26/0x30
[  369.875281]  __fput+0xcc/0x260
[  369.875787]  ____fput+0xe/0x10
[  369.876298]  task_work_run+0x8b/0xb0
[  369.876892]  do_exit+0x1ff/0x420
[  369.877414]  do_group_exit+0x3b/0xb0
[  369.877989]  get_signal+0x169/0x8b0
[  369.878549]  do_signal+0x2a/0x100
[  369.879130]  ? __vfs_read+0x29/0x40
[  369.879713]  ? vfs_read+0xaa/0x160
[  369.880281]  ? ksys_read+0x67/0xe0
[  369.880858]  prepare_exit_to_usermode+0x12b/0x1a0
[  369.881614]  do_syscall_64+0x8e/0x100
[  369.882199]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  369.883056] RIP: 0033:0x7f6d466de8c2
[  369.883664] Code: Bad RIP value.
[  369.884217] RSP: 002b:00007fffffadb558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  369.885433] RAX: fffffffffffffffc RBX: 000055b0b3c40ab0 RCX: 00007f6d466de8c2
[  369.886552] RDX: 0000000000000200 RSI: 000055b0b3c40ad8 RDI: 000000000000000f
[  369.887692] RBP: 000055b0b3c3dd20 R08: 000055b0b3c40ab0 R09: 0000000000000001
[  369.888842] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000004000
[  369.889963] R13: 0000000000000200 R14: 000055b0b3c3dd70 R15: 000055b0b3c40ac8
[  369.891529] INFO: task systemd-udevd:315 blocked for more than 122 seconds.
[  369.893182]       Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[  369.894557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.896318] systemd-udevd   D    0   315    282 0x80004106
[  369.897596] Call Trace:
[  369.898197]  __schedule+0x213/0x570
[  369.899068]  schedule+0x42/0xb0
[  369.899813]  io_schedule+0x45/0x70
[  369.900640]  wait_on_page_bit_common+0x131/0x380
[  369.901711]  ? find_get_entries+0x1a9/0x260
[  369.902688]  ? __filemap_set_wb_err+0x70/0x70
[  369.903710]  __lock_page+0x44/0x50
[  369.904504]  truncate_inode_pages_range+0x463/0x8a0
[  369.905648]  ? find_get_pages_range_tag+0x7e/0x2d0
[  369.906777]  ? pagevec_lookup_range_tag+0x28/0x40
[  369.907881]  ? free_cpumask_var+0x9/0x10
[  369.908816]  ? mark_buffer_async_write+0x30/0x30
[  369.909836]  ? get_ksm_page+0xf0/0x210
[  369.910738]  ? __x64_sys_fsopen+0x160/0x160
[  369.911743]  ? free_cpumask_var+0x9/0x10
[  369.912694]  ? on_each_cpu_cond_mask+0xb1/0x130
[  369.913763]  truncate_inode_pages+0x15/0x20
[  369.914747]  __blkdev_put+0xa7/0x220
[  369.915565]  ? exit_mmap+0x121/0x1b0
[  369.916407]  blkdev_put+0x4e/0xe0
[  369.917163]  blkdev_close+0x26/0x30
[  369.917960]  __fput+0xcc/0x260
[  369.918669]  ____fput+0xe/0x10
[  369.919362]  task_work_run+0x8b/0xb0
[  369.920185]  do_exit+0x1ff/0x420
[  369.920953]  do_group_exit+0x3b/0xb0
[  369.921776]  get_signal+0x169/0x8b0
[  369.922597]  do_signal+0x2a/0x100
[  369.923466]  ? __vfs_read+0x29/0x40
[  369.924424]  ? vfs_read+0xaa/0x160
[  369.925256]  ? ksys_read+0x67/0xe0
[  369.926085]  prepare_exit_to_usermode+0x12b/0x1a0
[  369.927204]  do_syscall_64+0x8e/0x100
[  369.928063]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  369.929238] RIP: 0033:0x7f6d466de8c2
[  369.930056] Code: Bad RIP value.
[  369.930799] RSP: 002b:00007fffffadb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  369.932496] RAX: fffffffffffffffc RBX: 000055b0b3c3dd30 RCX: 00007f6d466de8c2
[  369.934119] RDX: 0000000000000040 RSI: 000055b0b3c3dd58 RDI: 000000000000000e
[  369.935700] RBP: 000055b0b3c3dbe0 R08: 000055b0b3c3dd30 R09: 0000000000000001
[  369.937300] R10: 0000000000000001 R11: 0000000000000246 R12: 00000002e6df0000
[  369.938873] R13: 0000000000000040 R14: 000055b0b3c3dc30 R15: 000055b0b3c3dd48

Thanks,
Sherry


* Re: VM Boot Hangs with Commit "Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"" on linux-5.4.y
From: Doug Anderson @ 2023-07-24 14:39 UTC (permalink / raw)
  To: Sherry Yang; +Cc: Harshit Mogalapalli, George Kennedy, linux-scsi, linux-kernel

Hi,

On Sun, Jul 23, 2023 at 10:28 PM Sherry Yang <sherry.yang@oracle.com> wrote:
>
> Hi Douglas,
>
> We observed VM boot hangs on linux-stable v5.4, at a rate of roughly one in several thousand boots (fewer than 10,000 boots per occurrence). We started 16 VMs on a bare-metal host in a reboot loop, chose 10,000 boots as the threshold, and bisected. After a painful bisection, I found the culprit: commit 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle""). This commit was first merged upstream in v5.8 as part of a 4-patch series (https://www.spinics.net/lists/linux-block/msg51866.html). Two of the four patches have already been backported to linux-stable v5.4, but not at the same time:
>
> 1) ab3cee3762e5 ("blk-mq: In blk_mq_dispatch_rq_list() "no budget" is a reason to kick") in tag v5.4.86
> 2) 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"") in tag v5.4.235; it was backported as a stable dependency of another commit:
>
>         Signed-off-by: Douglas Anderson <dianders@chromium.org>
>         Reviewed-by: Ming Lei <ming.lei@redhat.com>
>         Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>         Signed-off-by: Jens Axboe <axboe@kernel.dk>
>         Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx")
>         Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> I also tried backporting the other 2 patches to v5.4, but the issue is still reproducible.
>
> I tested multiple kernels; the issue is not reproducible within 10,000 boots on the following:
> 1) Linux v5.9
> 2) Linux v5.4.249 + revert of 578c8f09c04b ("Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"")
>
> I'm not sure exactly how this commit affects linux-stable v5.4, but I suspect some prerequisite commits are missing, which leads to boot hangs on linux-stable v5.4 but not on later releases. Could you take a look at this issue and share your insight?

Ugh, I spent many days poring over the code and digging through debug
traces in order to write those patches. I don't think I'd be able to
give any concrete advice without spending many days and being able to
reproduce multiple times with traces since pretty much any knowledge I
learned during the course of developing those patches has decayed over
the last several years. :( I don't happen to know any dependencies
offhand...

That being said, it seems like:

1. Backporting the revert (the 4th patch in the series) without all
the other patches in the series feels wrong. In the text of the revert
I explicitly refer to the other patches in the series as
prerequisites. I guess you said you tried backporting the other two
patches and they didn't help, though? That's no good. :(

2. I don't think the revert is actually important to backport to
stable. While the first 3 patches were important to fix the problems I
was seeing, the revert was just a cleanup. If the revert is causing
problems in 5.4.x then I'd suggest removing it from 5.4.x.

Does that make sense? So ideally you'd submit 3 patches to the stable kernel:

a) Revert the revert

b) Pick ("blk-mq: Add blk_mq_delay_run_hw_queues() API call")

c) Pick ("blk-mq: Rerun dispatching in the case of budget contention")


FWIW, we seem to have all 4 patches in the ChromeOS 5.4 kernel tree.
They all landed together plus 1 prerequisite.

* https://crrev.com/c/2155423 - FROMGIT: Revert "scsi: core: run queue
if SCSI device queue isn't ready and queue is idle"
* https://crrev.com/c/2133069 - FROMGIT: blk-mq: Rerun dispatching in
the case of budget contention
* https://crrev.com/c/2155422 - FROMGIT: blk-mq: Add
blk_mq_delay_run_hw_queues() API call
* https://crrev.com/c/2125232 - FROMGIT: blk-mq: In
blk_mq_dispatch_rq_list() "no budget" is a reason to kick
* https://crrev.com/c/2155421 - UPSTREAM: blk-mq: Put driver tag in
blk_mq_dispatch_rq_list() when no budget

When I saw the prerequisite in there I was hopeful that it was the one
you needed, but it looks like that's already in 5.4 stable so (I
presume) you've already been testing with it...

-Doug
