linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: bugzilla-daemon@kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16
Date: Fri, 26 Aug 2022 07:00:38 +0000	[thread overview]
Message-ID: <bug-215880-11613-KMetxdJz9f@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-215880-11613@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=215880

--- Comment #39 from damien.lemoal@opensource.wdc.com ---
On 8/26/22 07:15, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
> 
> --- Comment #38 from damien.lemoal@opensource.wdc.com ---
> On 8/26/22 05:31, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>
>> --- Comment #37 from Bart Van Assche (bvanassche@acm.org) ---
>> On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>>
>>> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
>>> (In reply to Bart Van Assche from comment #27)
>>>> Thanks for testing! The patches from the sd-resume branch have been posted
>>>> on the linux-scsi mailing list. See also
>>>>
>>>>
>>>>
>>>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
>>>> T/#t
>>>
>>> Hi Bart, just an update for you. I noticed this had been removed from the
>>> 6.0-rc1 for freezing after suspend.
>>>
>>> I've been compiling my kernel with this fix on various 5.18 kernels (with
>>> opensuse tumbleweed), it has worked fine, no freezing on resume as others
>>> have
>>> mentioned.
>>>
>>> Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it
>>> froze after the first suspend. Rebooted and the same thing happened again.
>>> I
>>> recompiled the kernel with the fix, just to make sure i didn't mess it up,
>>> and
>>> the same happened again.
>>>
>>> When you originally did this fix, you based it on 5.18, and indeed, it
>>> works
>>> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
>>> file
>>> for 5.19, presumably it was those changes that made this fix start freezing
>>> after suspend.
>>>
>>> Perhaps you could check if the other people that experienced freezing were
>>> using either 5.19 or 6.0-rc1.
>>
>> Multiple people reported issues with freezes during suspend with kernel 
>> v6.0-rc1. Please take a look at the following report: 
>>
>> https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/. 
>> It shows that if zoned ATA disks are present that blk_mq_freeze_queue() 
>> may be called from inside ata_scsi_dev_rescan() on the context of a work 
>> queue. ATA rescanning happens from inside the SCSI error handler. So 
>> there is potential for a lockup because of the following:
>> * Execution of the START command being postponed because the SCSI error 
>> handler is active.
>> * blk_mq_freeze_queue() waiting for the START command to finish.
>> * The START completion handler not being executed because it got queued 
>> on the same work queue as the ATA rescan work.

Checking the code, the dev pm resume call chain for ATA look like this.

ata_port_resume() -> ata_port_request_pm() -> ata_port_schedule_eh() ->
scsi_error_handler() thread runs -> shost->transportt->eh_strategy_handler
== ata_scsi_error() -> ata_scsi_port_error_handler() ->
ata_eh_handle_port_resume() -> ap->ops->port_resume() ==
ahci_port_resume() for AHCI adapters.

There are no commands issued to the device by this chain, only
registers/port settings being changed. So this should always complete
quickly and in itself not be the reason for the START_STOP command issued
by the sd driver to get stuck.

After calling ata_eh_handle_port_resume(), ata_scsi_port_error_handler()
will trigger a reset through ata_std_error_handler() -> ata_do_eh() and
that will cause the device rescan in EH context. Device rescan will
definitely spin up the device so the START_STOP command that sd_resume()
issues seems rather useless...

As a hack, it would be good to try something like this:

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 29e2f55c6faa..1bc92c04f048 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1081,6 +1081,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev,
struct ata_device *dev)
        } else {
                sdev->sector_size = ata_id_logical_sector_size(dev->id);
                sdev->manage_start_stop = 1;
+               sdev->no_start_on_resume = 1;
        }

        /*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8f79fa6318fe..4c28ca4d038b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3761,7 +3761,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
        struct scsi_disk *sdkp = dev_get_drvdata(dev);
-       int ret;
+       int ret = 0;

        if (!sdkp)      /* E.g.: runtime resume at the start of sd_probe() */
                return 0;
@@ -3770,7 +3770,8 @@ static int sd_resume(struct device *dev)
                return 0;

        sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-       ret = sd_start_stop_device(sdkp, 1);
+       if (!sdkp->device->no_start_on_resume)
+               ret = sd_start_stop_device(sdkp, 1);
        if (!ret)
                opal_unlock_from_suspend(sdkp->opal_dev);
        return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3113471ca375..92e141536c6c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -192,6 +192,7 @@ struct scsi_device {
        unsigned no_start_on_add:1;     /* do not issue start on add */
        unsigned allow_restart:1; /* issue START_UNIT in error handler */
        unsigned manage_start_stop:1;   /* Let HLD (sd) manage start/stop */
+       unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on
resume */
        unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT
*/
        unsigned no_uld_attach:1; /* disable connecting to upper level drivers
*/
        unsigned select_no_atn:1;

I am not sure at all this is correct though. This actually may break other
suspend/resume flavors. If we could somehow synchronize scsi pm resume to
run *after* ata pm resume, all problems should go away I think.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

  parent reply	other threads:[~2022-08-26  7:01 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bug-215880-11613@https.bugzilla.kernel.org/>
2022-06-27  4:09 ` [Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16 bugzilla-daemon
2022-06-28 21:17 ` bugzilla-daemon
2022-06-28 22:30 ` bugzilla-daemon
2022-07-15 18:44 ` bugzilla-daemon
2022-08-15 11:02 ` bugzilla-daemon
2022-08-15 13:36 ` bugzilla-daemon
2022-08-16 11:08 ` bugzilla-daemon
2022-08-16 15:44 ` bugzilla-daemon
2022-08-16 16:10 ` bugzilla-daemon
2022-08-16 16:14 ` bugzilla-daemon
2022-08-16 17:11 ` bugzilla-daemon
2022-08-25 20:01 ` bugzilla-daemon
2022-08-25 20:31 ` bugzilla-daemon
2022-08-25 22:15 ` bugzilla-daemon
2022-08-26  7:00   ` Damien Le Moal
2022-08-26  7:00 ` bugzilla-daemon [this message]
2022-10-06 15:37 ` bugzilla-daemon
2022-10-06 23:48 ` bugzilla-daemon
2022-10-07  0:10 ` bugzilla-daemon
2022-10-07  0:15 ` bugzilla-daemon
2023-07-08 23:17 ` bugzilla-daemon
2023-07-09  7:09 ` bugzilla-daemon
2023-07-09 23:18 ` bugzilla-daemon
2023-07-10  0:18 ` bugzilla-daemon
2023-07-10  0:47 ` bugzilla-daemon
2023-07-10  1:02 ` bugzilla-daemon
2023-07-10  1:07 ` bugzilla-daemon
2023-07-10  2:51 ` bugzilla-daemon
2023-07-10  3:48 ` bugzilla-daemon
2023-07-10 17:00 ` bugzilla-daemon
2023-07-10 22:48 ` bugzilla-daemon
2023-07-11 21:39 ` bugzilla-daemon
2023-07-11 22:55 ` bugzilla-daemon
2023-07-20 21:52 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-215880-11613-KMetxdJz9f@https.bugzilla.kernel.org/ \
    --to=bugzilla-daemon@kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).