Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
       [not found] <0e272abe-292d-d58f-cf80-55868e793abc@gmail.com>
@ 2023-07-26 13:47 ` Thorsten Leemhuis
  2023-07-26 23:39   ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Thorsten Leemhuis @ 2023-07-26 13:47 UTC (permalink / raw)
  To: TW
  Cc: regressions, Damien Le Moal, Mario Limonciello, Bart Van Assche,
	LKML, stable

Hi, Thorsten here, the Linux kernel's regression tracker.

On 26.07.23 13:54, TW wrote:
> I have been having issues with the 6.x series of kernels resuming from
> suspend with one of my drives. Far as I can tell it has trouble with the
> cache on the drive when coming out of s3 sleep. Tried a few different
> distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same
> error message. It appears to work fine on the 5.15 kernel just fine
> however.
> 
> This is the error or errors that I have been getting and assume has been
> holding up the system from resuming from suspend.
>
> Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0
> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5 

Thx for your report. I CCed a few people, with a bit of luck they have
an idea. But I doubt it. If no one replies you likely will need a
bisection to find the root of the problem. But before going down that
route you want to check if latest mainline kernel (vanilla!) works better.

FWIW, this is not my area of expertise, so the following might be a
misleading comment, but the problem looks somewhat similar to this one
that iirc was never solved:
https://bugzilla.kernel.org/show_bug.cgi?id=216087

> Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully.
> Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions.
> Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend... 

That sounds like you are using out-of tree drivers which can cause all
sorts of issues. Please recheck if the problem happens without those as
well and do not use them in all further tests to debug the issue.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-26 13:47 ` Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep Thorsten Leemhuis
@ 2023-07-26 23:39   ` Damien Le Moal
  2023-07-27 10:06     ` TW
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-26 23:39 UTC (permalink / raw)
  To: Thorsten Leemhuis, TW
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/26/23 22:47, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
> 
> On 26.07.23 13:54, TW wrote:
>> I have been having issues with the 6.x series of kernels resuming from
>> suspend with one of my drives. Far as I can tell it has trouble with the
>> cache on the drive when coming out of s3 sleep. Tried a few different
>> distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same
>> error message. It appears to work fine on the 5.15 kernel just fine
>> however.
>>
>> This is the error or errors that I have been getting and assume has been
>> holding up the system from resuming from suspend.
>>
>> Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command

This sense is garbage. This issue was reported already, but it is hard
to deal with as it seems to be due to drives/adapters not correctly
reporting status bits. So for now, let's ignore this sense codes.

The start/stop unit failure is weird. On another case, I am suspecting
that this command is causing a delay on resume, but not an error like this.

>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5 
> 
> Thx for your report. I CCed a few people, with a bit of luck they have
> an idea. But I doubt it. If no one replies you likely will need a
> bisection to find the root of the problem. But before going down that
> route you want to check if latest mainline kernel (vanilla!) works better.
> 
> FWIW, this is not my area of expertise, so the following might be a
> misleading comment, but the problem looks somewhat similar to this one
> that iirc was never solved:
> https://bugzilla.kernel.org/show_bug.cgi?id=216087
> 
>> Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully.
>> Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions.
>> Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend... 
> 
> That sounds like you are using out-of tree drivers which can cause all
> sorts of issues. Please recheck if the problem happens without those as
> well and do not use them in all further tests to debug the issue.

Yes. Please retest with the latest 6.5-rc3.

And can you try this patch to see if it solves your issue ?

commit 29e81d11812ee924d19425343ec69acd34af9d35
Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Jul 24 13:23:14 2023 +0900

    ata,scsi: do not issue START STOP UNIT on resume

    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 370d18aca71e..6184c7bcc16c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct
ata_device *dev)
 		}
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
+		/*
+		 * Stop the drive on suspend but do not issue START STOP UNIT
+		 * on resume as this is not necessary: the port is reset on
+		 * resume, which wakes up the drive.
+		 */
 		sdev->manage_start_stop = 1;
+		sdev->no_start_on_resume = 1;
 	}

 	/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..b8584fe3123e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;

 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
@@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev)
 		return 0;

 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
+	if (!sdkp->device->no_start_on_resume)
+		ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 75b2235b99e2..b9230b6add04 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -194,6 +194,7 @@ struct scsi_device {
 	unsigned no_start_on_add:1;	/* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
+	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-26 23:39   ` Damien Le Moal
@ 2023-07-27 10:06     ` TW
  2023-07-27 10:22       ` TW
  2023-07-27 10:24       ` Damien Le Moal
  0 siblings, 2 replies; 13+ messages in thread
From: TW @ 2023-07-27 10:06 UTC (permalink / raw)
  To: Damien Le Moal, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

I retried on 6.5 rc3 without the Nvidia drivers and still received the 
same error and going to try for the patch next but got a malformed patch 
error on line 6 for the first patch for libata-scsi.c. The other two 
seem to go through just fine however.

Also the bugzilla link is similar to what I have but the disk doesn't 
disappear, comes back but just takes awhile to come back out of sleep mode.

On 7/26/23 17:39, Damien Le Moal wrote:
> On 7/26/23 22:47, Thorsten Leemhuis wrote:
>> Hi, Thorsten here, the Linux kernel's regression tracker.
>>
>> On 26.07.23 13:54, TW wrote:
>>> I have been having issues with the 6.x series of kernels resuming from
>>> suspend with one of my drives. Far as I can tell it has trouble with the
>>> cache on the drive when coming out of s3 sleep. Tried a few different
>>> distros (Manjaro, OpenMandriva Rome, EndeavourOS) all that give the same
>>> error message. It appears to work fine on the 5.15 kernel just fine
>>> however.
>>>
>>> This is the error or errors that I have been getting and assume has been
>>> holding up the system from resuming from suspend.
>>>
>>> Jul 20 04:13:41 rageworks kernel: ata10.00: device reported invalid CHS sector 0
>>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
>>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
> This sense is garbage. This issue was reported already, but it is hard
> to deal with as it seems to be due to drives/adapters not correctly
> reporting status bits. So for now, let's ignore this sense codes.
>
> The start/stop unit failure is weird. On another case, I am suspecting
> that this command is causing a delay on resume, but not an error like this.
>
>>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
>>> Jul 20 04:13:41 rageworks kernel: sd 9:0:0:0: PM: failed to resume async: error -5
>> Thx for your report. I CCed a few people, with a bit of luck they have
>> an idea. But I doubt it. If no one replies you likely will need a
>> bisection to find the root of the problem. But before going down that
>> route you want to check if latest mainline kernel (vanilla!) works better.
>>
>> FWIW, this is not my area of expertise, so the following might be a
>> misleading comment, but the problem looks somewhat similar to this one
>> that iirc was never solved:
>> https://bugzilla.kernel.org/show_bug.cgi?id=216087
>>
>>> Jul 20 04:12:51 rageworks systemd[1]: nvidia-suspend.service: Deactivated successfully.
>>> Jul 20 04:12:51 rageworks systemd[1]: Finished NVIDIA system suspend actions.
>>> Jul 20 04:12:51 rageworks systemd[1]: Starting System Suspend...
>> That sounds like you are using out-of tree drivers which can cause all
>> sorts of issues. Please recheck if the problem happens without those as
>> well and do not use them in all further tests to debug the issue.
> Yes. Please retest with the latest 6.5-rc3.
>
> And can you try this patch to see if it solves your issue ?
>
> commit 29e81d11812ee924d19425343ec69acd34af9d35
> Author: Damien Le Moal <dlemoal@kernel.org>
> Date:   Mon Jul 24 13:23:14 2023 +0900
>
>      ata,scsi: do not issue START STOP UNIT on resume
>
>      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 370d18aca71e..6184c7bcc16c 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct
> ata_device *dev)
>   		}
>   	} else {
>   		sdev->sector_size = ata_id_logical_sector_size(dev->id);
> +		/*
> +		 * Stop the drive on suspend but do not issue START STOP UNIT
> +		 * on resume as this is not necessary: the port is reset on
> +		 * resume, which wakes up the drive.
> +		 */
>   		sdev->manage_start_stop = 1;
> +		sdev->no_start_on_resume = 1;
>   	}
>
>   	/*
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 68b12afa0721..b8584fe3123e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev)
>   static int sd_resume(struct device *dev)
>   {
>   	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> -	int ret;
> +	int ret = 0;
>
>   	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
>   		return 0;
> @@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev)
>   		return 0;
>
>   	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
> -	ret = sd_start_stop_device(sdkp, 1);
> +	if (!sdkp->device->no_start_on_resume)
> +		ret = sd_start_stop_device(sdkp, 1);
>   	if (!ret)
>   		opal_unlock_from_suspend(sdkp->opal_dev);
>   	return ret;
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 75b2235b99e2..b9230b6add04 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -194,6 +194,7 @@ struct scsi_device {
>   	unsigned no_start_on_add:1;	/* do not issue start on add */
>   	unsigned allow_restart:1; /* issue START_UNIT in error handler */
>   	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
> +	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
>   	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
>   	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
>   	unsigned select_no_atn:1;
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 10:06     ` TW
@ 2023-07-27 10:22       ` TW
  2023-07-27 10:25         ` Damien Le Moal
  2023-07-27 10:27         ` Damien Le Moal
  2023-07-27 10:24       ` Damien Le Moal
  1 sibling, 2 replies; 13+ messages in thread
From: TW @ 2023-07-27 10:22 UTC (permalink / raw)
  To: Damien Le Moal, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

I managed to fix the patch file, guess the formatting messed up a bit. 
So will try with those patches installed.

On 7/27/23 04:06, TW wrote:
> I retried on 6.5 rc3 without the Nvidia drivers and still received the 
> same error and going to try for the patch next but got a malformed 
> patch error on line 6 for the first patch for libata-scsi.c. The other 
> two seem to go through just fine however.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 10:06     ` TW
  2023-07-27 10:22       ` TW
@ 2023-07-27 10:24       ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2023-07-27 10:24 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/27/23 19:06, TW wrote:
> I retried on 6.5 rc3 without the Nvidia drivers and still received the same
> error and going to try for the patch next but got a malformed patch error on
> line 6 for the first patch for libata-scsi.c. The other two seem to go through
> just fine however.

Which other two patches are you talking about ?

> Also the bugzilla link is similar to what I have but the disk doesn't
> disappear, comes back but just takes awhile to come back out of sleep mode.

The switch to async resume has revealed many issues with the way ata devices
are resumed with regard to their scsi representation. The issues manifest in
different form. The drive "gone" problem was fixed recently. The start-stop
command seem to cause most of the time a delay in resume that several users
noticed. In your case though, it is an outright failure.

Hence the *single* patch I asked you to test (another user with a delay issue
is testing the same as well). Not sure what other patches you are talking about.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 10:22       ` TW
@ 2023-07-27 10:25         ` Damien Le Moal
  2023-07-27 10:27         ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2023-07-27 10:25 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/27/23 19:22, TW wrote:
> I managed to fix the patch file, guess the formatting messed up a bit. So will
> try with those patches installed.

Please do not top post.

Again, I sent you a single patch. What other patches are you talking about ?

> 
> On 7/27/23 04:06, TW wrote:
>> I retried on 6.5 rc3 without the Nvidia drivers and still received the same
>> error and going to try for the patch next but got a malformed patch error on
>> line 6 for the first patch for libata-scsi.c. The other two seem to go
>> through just fine however.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 10:22       ` TW
  2023-07-27 10:25         ` Damien Le Moal
@ 2023-07-27 10:27         ` Damien Le Moal
  2023-07-27 12:25           ` TW
  1 sibling, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-27 10:27 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

[-- Attachment #1: Type: text/plain, Size: 253 bytes --]

On 7/27/23 19:22, TW wrote:
> I managed to fix the patch file, guess the formatting messed up a bit. So will
> try with those patches installed.

Just in case, patch fil attached to avoid formatting issues.


-- 
Damien Le Moal
Western Digital Research

[-- Attachment #2: 0001-ata-scsi-do-not-issue-START-STOP-UNIT-on-resume.patch --]
[-- Type: text/x-patch, Size: 2454 bytes --]

From 29e81d11812ee924d19425343ec69acd34af9d35 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal@kernel.org>
Date: Mon, 24 Jul 2023 13:23:14 +0900
Subject: [PATCH] ata,scsi: do not issue START STOP UNIT on resume

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-scsi.c  | 6 ++++++
 drivers/scsi/sd.c          | 5 +++--
 include/scsi/scsi_device.h | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 370d18aca71e..6184c7bcc16c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1100,7 +1100,13 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 		}
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
+		/*
+		 * Stop the drive on suspend but do not issue START STOP UNIT
+		 * on resume as this is not necessary: the port is reset on
+		 * resume, which wakes up the drive.
+		 */
 		sdev->manage_start_stop = 1;
+		sdev->no_start_on_resume = 1;
 	}
 
 	/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..b8584fe3123e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;
 
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
@@ -3885,7 +3885,8 @@ static int sd_resume(struct device *dev)
 		return 0;
 
 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
+	if (!sdkp->device->no_start_on_resume)
+		ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 75b2235b99e2..b9230b6add04 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -194,6 +194,7 @@ struct scsi_device {
 	unsigned no_start_on_add:1;	/* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
+	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 10:27         ` Damien Le Moal
@ 2023-07-27 12:25           ` TW
  2023-07-28  2:33             ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: TW @ 2023-07-27 12:25 UTC (permalink / raw)
  To: Damien Le Moal, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

It was all 1 patch but the first change had a formatting issue from the 
email format I guess. So I fixed that and the patch went through and 
looks like the drive error message has stopped. Still a little slow 
coming back but that error is gone at least.

Jul 27 05:05:05 rageworks systemd[1]: Starting System Suspend...
Jul 27 05:05:05 rageworks systemd-sleep[1624]: Entering sleep state 
'suspend'...
Jul 27 05:05:05 rageworks kernel: PM: suspend entry (deep)
Jul 27 05:05:05 rageworks kernel: Filesystems sync: 0.246 seconds
Jul 27 05:05:26 rageworks kernel: Freezing user space processes
Jul 27 05:05:26 rageworks kernel: Freezing user space processes 
completed (elapsed 0.001 seconds)
Jul 27 05:05:26 rageworks kernel: OOM killer disabled.
Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks
Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks 
completed (elapsed 0.000 seconds)
Jul 27 05:05:26 rageworks kernel: printk: Suspending console(s) (use 
no_console_suspend to debug)
Jul 27 05:05:26 rageworks kernel: serial 00:05: disabled
Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Synchronizing SCSI cache
Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Stopping disk
Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Stopping disk
Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Stopping disk
Jul 27 05:05:26 rageworks kernel: ACPI: PM: Preparing to enter system 
sleep state S3
Jul 27 05:05:26 rageworks kernel: ACPI: PM: Saving platform NVS memory
Jul 27 05:05:26 rageworks kernel: Disabling non-boot CPUs ...
Jul 27 05:05:26 rageworks kernel: smpboot: CPU 1 is now offline
Jul 27 05:05:26 rageworks kernel: smpboot: CPU 2 is now offline
Jul 27 05:05:26 rageworks kernel: smpboot: CPU 3 is now offline
Jul 27 05:05:26 rageworks kernel: ACPI: PM: Low-level resume complete
Jul 27 05:05:26 rageworks kernel: ACPI: PM: Restoring platform NVS memory
Jul 27 05:05:26 rageworks kernel: Enabling non-boot CPUs ...
Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 1 
APIC 0x1
Jul 27 05:05:26 rageworks kernel: CPU1 is up
Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 2 
APIC 0x2
Jul 27 05:05:26 rageworks kernel: CPU2 is up
Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 3 
APIC 0x3
Jul 27 05:05:26 rageworks kernel: CPU3 is up
Jul 27 05:05:26 rageworks kernel: ACPI: PM: Waking up from system sleep 
state S3
Jul 27 05:05:26 rageworks kernel: xhci_hcd 0000:02:00.0: xHC error in 
resume, USBSTS 0x401, Reinit
Jul 27 05:05:26 rageworks kernel: usb usb1: root hub lost power or was reset
Jul 27 05:05:26 rageworks kernel: usb usb2: root hub lost power or was reset
Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Starting disk
Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Starting disk
Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Starting disk
Jul 27 05:05:26 rageworks kernel: serial 00:05: activated
Jul 27 05:05:26 rageworks kernel: ata6: SATA link down (SStatus 0 
SControl 330)
Jul 27 05:05:26 rageworks kernel: ata5: SATA link down (SStatus 0 
SControl 330)
Jul 27 05:05:26 rageworks kernel: ata9: SATA link down (SStatus 0 
SControl 300)
Jul 27 05:05:26 rageworks kernel: usb 1-10: reset full-speed USB device 
number 4 using xhci_hcd
Jul 27 05:05:26 rageworks kernel: usb 1-8: reset full-speed USB device 
number 3 using xhci_hcd
Jul 27 05:05:26 rageworks kernel: usb 1-7: reset full-speed USB device 
number 2 using xhci_hcd
Jul 27 05:05:26 rageworks kernel: OOM killer enabled.
Jul 27 05:05:26 rageworks kernel: Restarting tasks ... done.
Jul 27 05:05:26 rageworks kernel: random: crng reseeded on system resumption
Jul 27 05:05:26 rageworks kernel: PM: suspend exit

On 7/27/23 04:27, Damien Le Moal wrote:
> On 7/27/23 19:22, TW wrote:
>> I managed to fix the patch file, guess the formatting messed up a bit. So will
>> try with those patches installed.
> Just in case, patch fil attached to avoid formatting issues.
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-27 12:25           ` TW
@ 2023-07-28  2:33             ` Damien Le Moal
  2023-07-28  2:49               ` TW
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-28  2:33 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/27/23 21:25, TW wrote:
> It was all 1 patch but the first change had a formatting issue from the 
> email format I guess. So I fixed that and the patch went through and 
> looks like the drive error message has stopped. Still a little slow 
> coming back but that error is gone at least.

"Slow coming back" -> Compared to which version of the kernel ? Do you have
numbers ?

If the devices are HDDs, resume will wait for these to spin up. That takes a
while (about 10s normally).

> 
> Jul 27 05:05:05 rageworks systemd[1]: Starting System Suspend...
> Jul 27 05:05:05 rageworks systemd-sleep[1624]: Entering sleep state 
> 'suspend'...
> Jul 27 05:05:05 rageworks kernel: PM: suspend entry (deep)
> Jul 27 05:05:05 rageworks kernel: Filesystems sync: 0.246 seconds
> Jul 27 05:05:26 rageworks kernel: Freezing user space processes
> Jul 27 05:05:26 rageworks kernel: Freezing user space processes 
> completed (elapsed 0.001 seconds)
> Jul 27 05:05:26 rageworks kernel: OOM killer disabled.
> Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks
> Jul 27 05:05:26 rageworks kernel: Freezing remaining freezable tasks 
> completed (elapsed 0.000 seconds)
> Jul 27 05:05:26 rageworks kernel: printk: Suspending console(s) (use 
> no_console_suspend to debug)
> Jul 27 05:05:26 rageworks kernel: serial 00:05: disabled
> Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Synchronizing SCSI cache
> Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
> Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
> Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Stopping disk
> Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Stopping disk
> Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Stopping disk
> Jul 27 05:05:26 rageworks kernel: ACPI: PM: Preparing to enter system 
> sleep state S3
> Jul 27 05:05:26 rageworks kernel: ACPI: PM: Saving platform NVS memory
> Jul 27 05:05:26 rageworks kernel: Disabling non-boot CPUs ...
> Jul 27 05:05:26 rageworks kernel: smpboot: CPU 1 is now offline
> Jul 27 05:05:26 rageworks kernel: smpboot: CPU 2 is now offline
> Jul 27 05:05:26 rageworks kernel: smpboot: CPU 3 is now offline
> Jul 27 05:05:26 rageworks kernel: ACPI: PM: Low-level resume complete
> Jul 27 05:05:26 rageworks kernel: ACPI: PM: Restoring platform NVS memory
> Jul 27 05:05:26 rageworks kernel: Enabling non-boot CPUs ...
> Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 1 
> APIC 0x1
> Jul 27 05:05:26 rageworks kernel: CPU1 is up
> Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 2 
> APIC 0x2
> Jul 27 05:05:26 rageworks kernel: CPU2 is up
> Jul 27 05:05:26 rageworks kernel: smpboot: Booting Node 0 Processor 3 
> APIC 0x3
> Jul 27 05:05:26 rageworks kernel: CPU3 is up
> Jul 27 05:05:26 rageworks kernel: ACPI: PM: Waking up from system sleep 
> state S3
> Jul 27 05:05:26 rageworks kernel: xhci_hcd 0000:02:00.0: xHC error in 
> resume, USBSTS 0x401, Reinit
> Jul 27 05:05:26 rageworks kernel: usb usb1: root hub lost power or was reset
> Jul 27 05:05:26 rageworks kernel: usb usb2: root hub lost power or was reset
> Jul 27 05:05:26 rageworks kernel: sd 0:0:0:0: [sda] Starting disk
> Jul 27 05:05:26 rageworks kernel: sd 1:0:0:0: [sdb] Starting disk
> Jul 27 05:05:26 rageworks kernel: sd 9:0:0:0: [sdc] Starting disk
> Jul 27 05:05:26 rageworks kernel: serial 00:05: activated
> Jul 27 05:05:26 rageworks kernel: ata6: SATA link down (SStatus 0 
> SControl 330)
> Jul 27 05:05:26 rageworks kernel: ata5: SATA link down (SStatus 0 
> SControl 330)
> Jul 27 05:05:26 rageworks kernel: ata9: SATA link down (SStatus 0 
> SControl 300)
> Jul 27 05:05:26 rageworks kernel: usb 1-10: reset full-speed USB device 
> number 4 using xhci_hcd
> Jul 27 05:05:26 rageworks kernel: usb 1-8: reset full-speed USB device 
> number 3 using xhci_hcd
> Jul 27 05:05:26 rageworks kernel: usb 1-7: reset full-speed USB device 
> number 2 using xhci_hcd
> Jul 27 05:05:26 rageworks kernel: OOM killer enabled.
> Jul 27 05:05:26 rageworks kernel: Restarting tasks ... done.
> Jul 27 05:05:26 rageworks kernel: random: crng reseeded on system resumption
> Jul 27 05:05:26 rageworks kernel: PM: suspend exit
> 
> 
> On 7/27/23 04:27, Damien Le Moal wrote:
>> On 7/27/23 19:22, TW wrote:
>>> I managed to fix the patch file, guess the formatting messed up a bit. So will
>>> try with those patches installed.
>> Just in case, patch fil attached to avoid formatting issues.
>>
>>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-28  2:33             ` Damien Le Moal
@ 2023-07-28  2:49               ` TW
  2023-07-28  4:09                 ` Damien Le Moal
  2023-07-31  0:41                 ` Damien Le Moal
  0 siblings, 2 replies; 13+ messages in thread
From: TW @ 2023-07-28  2:49 UTC (permalink / raw)
  To: Damien Le Moal, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

Comparing to the 5.15 kernel which had almost no delay. They are HDDs 
though, it was working flawlessly earlier but didn't want to top post 
again. Tried a systemd suspend instead of from the Xfce4 logout menu and 
everything works as intended. So I'd say that it's fixed now.

On 7/27/23 20:33, Damien Le Moal wrote:
> "Slow coming back" -> Compared to which version of the kernel ? Do you 
> have numbers ? If the devices are HDDs, resume will wait for these to 
> spin up. That takes a while (about 10s normally).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-28  2:49               ` TW
@ 2023-07-28  4:09                 ` Damien Le Moal
  2023-07-31  0:41                 ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2023-07-28  4:09 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/28/23 11:49, TW wrote:
> Comparing to the 5.15 kernel which had almost no delay. They are HDDs 
> though, it was working flawlessly earlier but didn't want to top post 
> again. Tried a systemd suspend instead of from the Xfce4 logout menu and 
> everything works as intended. So I'd say that it's fixed now.

I think that asynchronous scsi suspend/resume using the PM infrastructure,
introduced in 5.16, is creating this additional delay. The reason is that now,
ata and scsi drivers do their resume operations from within the PM resume
process, resulting in some delays to reach "PM: suspend exit".

I am looking into improving this, but it may take some more time.

I will post officially the patch you tried and CC you. A "tested-by" tag from
you would be appreciated.

Thanks.

> 
> On 7/27/23 20:33, Damien Le Moal wrote:
>> "Slow coming back" -> Compared to which version of the kernel ? Do you 
>> have numbers ? If the devices are HDDs, resume will wait for these to 
>> spin up. That takes a while (about 10s normally).

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-28  2:49               ` TW
  2023-07-28  4:09                 ` Damien Le Moal
@ 2023-07-31  0:41                 ` Damien Le Moal
  2023-07-31  3:47                   ` TW
  1 sibling, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31  0:41 UTC (permalink / raw)
  To: TW, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

On 7/28/23 11:49, TW wrote:
> Comparing to the 5.15 kernel which had almost no delay. They are HDDs 
> though, it was working flawlessly earlier but didn't want to top post 
> again. Tried a systemd suspend instead of from the Xfce4 logout menu and 
> everything works as intended. So I'd say that it's fixed now.

I posted a proper patch and CC-ed you. A "Tested-by:" tag from you would be much
appreciated (please use your full name in that case, instead of only "TW").
Thanks.

> 
> On 7/27/23 20:33, Damien Le Moal wrote:
>> "Slow coming back" -> Compared to which version of the kernel ? Do you 
>> have numbers ? If the devices are HDDs, resume will wait for these to 
>> spin up. That takes a while (about 10s normally).

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep
  2023-07-31  0:41                 ` Damien Le Moal
@ 2023-07-31  3:47                   ` TW
  0 siblings, 0 replies; 13+ messages in thread
From: TW @ 2023-07-31  3:47 UTC (permalink / raw)
  To: Damien Le Moal, Thorsten Leemhuis
  Cc: regressions, Mario Limonciello, Bart Van Assche, LKML, stable

Oh excellent I'll go do that now.

On 7/30/23 18:41, Damien Le Moal wrote:
> I posted a proper patch and CC-ed you. A "Tested-by:" tag from you 
> would be much appreciated (please use your full name in that case, 
> instead of only "TW"). Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-07-31  3:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0e272abe-292d-d58f-cf80-55868e793abc@gmail.com>
2023-07-26 13:47 ` Scsi_bus_resume+0x0/0x90 returns -5 when resuming from s3 sleep Thorsten Leemhuis
2023-07-26 23:39   ` Damien Le Moal
2023-07-27 10:06     ` TW
2023-07-27 10:22       ` TW
2023-07-27 10:25         ` Damien Le Moal
2023-07-27 10:27         ` Damien Le Moal
2023-07-27 12:25           ` TW
2023-07-28  2:33             ` Damien Le Moal
2023-07-28  2:49               ` TW
2023-07-28  4:09                 ` Damien Le Moal
2023-07-31  0:41                 ` Damien Le Moal
2023-07-31  3:47                   ` TW
2023-07-27 10:24       ` Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).