From: bugzilla-daemon@kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16
Date: Fri, 26 Aug 2022 07:00:38 +0000 [thread overview]
Message-ID: <bug-215880-11613-KMetxdJz9f@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-215880-11613@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=215880
--- Comment #39 from damien.lemoal@opensource.wdc.com ---
On 8/26/22 07:15, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>
> --- Comment #38 from damien.lemoal@opensource.wdc.com ---
> On 8/26/22 05:31, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>
>> --- Comment #37 from Bart Van Assche (bvanassche@acm.org) ---
>> On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>>
>>> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
>>> (In reply to Bart Van Assche from comment #27)
>>>> Thanks for testing! The patches from the sd-resume branch have been posted
>>>> on the linux-scsi mailing list. See also
>>>>
>>>>
>>>>
>>>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
>>>> T/#t
>>>
>>> Hi Bart, just an update for you. I noticed this had been removed from the
>>> 6.0-rc1 for freezing after suspend.
>>>
>>> I've been compiling my kernel with this fix on various 5.18 kernels (with
>>> opensuse tumbleweed), it has worked fine, no freezing on resume as others
>>> have
>>> mentioned.
>>>
>>> Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it
>>> froze after the first suspend. Rebooted and the same thing happened again.
>>> I
>>> recompiled the kernel with the fix, just to make sure i didn't mess it up,
>>> and
>>> the same happened again.
>>>
>>> When you originally did this fix, you based it on 5.18, and indeed, it
>>> works
>>> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
>>> file
>>> for 5.19, presumably it was those changes that made this fix start freezing
>>> after suspend.
>>>
>>> Perhaps you could check if the other people that experienced freezing were
>>> using either 5.19 or 6.0-rc1.
>>
>> Multiple people reported issues with freezes during suspend with kernel
>> v6.0-rc1. Please take a look at the following report:
>>
>> https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/.
>> It shows that if zoned ATA disks are present that blk_mq_freeze_queue()
>> may be called from inside ata_scsi_dev_rescan() on the context of a work
>> queue. ATA rescanning happens from inside the SCSI error handler. So
>> there is potential for a lockup because of the following:
>> * Execution of the START command being postponed because the SCSI error
>> handler is active.
>> * blk_mq_freeze_queue() waiting for the START command to finish.
>> * The START completion handler not being executed because it got queued
>> on the same work queue as the ATA rescan work.
Checking the code, the dev pm resume call chain for ATA look like this.
ata_port_resume() -> ata_port_request_pm() -> ata_port_schedule_eh() ->
scsi_error_handler() thread runs -> shost->transportt->eh_strategy_handler
== ata_scsi_error() -> ata_scsi_port_error_handler() ->
ata_eh_handle_port_resume() -> ap->ops->port_resume() ==
ahci_port_resume() for AHCI adapters.
There are no commands issued to the device by this chain, only
registers/port settings being changed. So this should always complete
quickly and in itself not be the reason for the START_STOP command issued
by the sd driver to get stuck.
After calling ata_eh_handle_port_resume(), ata_scsi_port_error_handler()
will trigger a reset through ata_std_error_handler() -> ata_do_eh() and
that will cause the device rescan in EH context. Device rescan will
definitely spin up the device so the START_STOP command that sd_resume()
issues seems rather useless...
As a hack, it would be good to try something like this:
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 29e2f55c6faa..1bc92c04f048 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1081,6 +1081,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev,
struct ata_device *dev)
} else {
sdev->sector_size = ata_id_logical_sector_size(dev->id);
sdev->manage_start_stop = 1;
+ sdev->no_start_on_resume = 1;
}
/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8f79fa6318fe..4c28ca4d038b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3761,7 +3761,7 @@ static int sd_suspend_runtime(struct device *dev)
static int sd_resume(struct device *dev)
{
struct scsi_disk *sdkp = dev_get_drvdata(dev);
- int ret;
+ int ret = 0;
if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
return 0;
@@ -3770,7 +3770,8 @@ static int sd_resume(struct device *dev)
return 0;
sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
- ret = sd_start_stop_device(sdkp, 1);
+ if (!sdkp->device->no_start_on_resume)
+ ret = sd_start_stop_device(sdkp, 1);
if (!ret)
opal_unlock_from_suspend(sdkp->opal_dev);
return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3113471ca375..92e141536c6c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -192,6 +192,7 @@ struct scsi_device {
unsigned no_start_on_add:1; /* do not issue start on add */
unsigned allow_restart:1; /* issue START_UNIT in error handler */
unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */
+ unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on
resume */
unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT
*/
unsigned no_uld_attach:1; /* disable connecting to upper level drivers
*/
unsigned select_no_atn:1;
I am not sure at all this is correct though. This actually may break other
suspend/resume flavors. If we could somehow synchronize scsi pm resume to
run *after* ata pm resume, all problems should go away I think.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
next prev parent reply other threads:[~2022-08-26 7:01 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <bug-215880-11613@https.bugzilla.kernel.org/>
2022-06-27 4:09 ` [Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16 bugzilla-daemon
2022-06-28 21:17 ` bugzilla-daemon
2022-06-28 22:30 ` bugzilla-daemon
2022-07-15 18:44 ` bugzilla-daemon
2022-08-15 11:02 ` bugzilla-daemon
2022-08-15 13:36 ` bugzilla-daemon
2022-08-16 11:08 ` bugzilla-daemon
2022-08-16 15:44 ` bugzilla-daemon
2022-08-16 16:10 ` bugzilla-daemon
2022-08-16 16:14 ` bugzilla-daemon
2022-08-16 17:11 ` bugzilla-daemon
2022-08-25 20:01 ` bugzilla-daemon
2022-08-25 20:31 ` bugzilla-daemon
2022-08-25 22:15 ` bugzilla-daemon
2022-08-26 7:00 ` Damien Le Moal
2022-08-26 7:00 ` bugzilla-daemon [this message]
2022-10-06 15:37 ` bugzilla-daemon
2022-10-06 23:48 ` bugzilla-daemon
2022-10-07 0:10 ` bugzilla-daemon
2022-10-07 0:15 ` bugzilla-daemon
2023-07-08 23:17 ` bugzilla-daemon
2023-07-09 7:09 ` bugzilla-daemon
2023-07-09 23:18 ` bugzilla-daemon
2023-07-10 0:18 ` bugzilla-daemon
2023-07-10 0:47 ` bugzilla-daemon
2023-07-10 1:02 ` bugzilla-daemon
2023-07-10 1:07 ` bugzilla-daemon
2023-07-10 2:51 ` bugzilla-daemon
2023-07-10 3:48 ` bugzilla-daemon
2023-07-10 17:00 ` bugzilla-daemon
2023-07-10 22:48 ` bugzilla-daemon
2023-07-11 21:39 ` bugzilla-daemon
2023-07-11 22:55 ` bugzilla-daemon
2023-07-20 21:52 ` bugzilla-daemon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bug-215880-11613-KMetxdJz9f@https.bugzilla.kernel.org/ \
--to=bugzilla-daemon@kernel.org \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).