All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RESEND 1/2] Hard disk S3 resume time optimization
@ 2013-09-06  0:38 Todd E Brandt
  2013-09-06 16:54 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 5+ messages in thread
From: Todd E Brandt @ 2013-09-06  0:38 UTC (permalink / raw)
  To: linux-ide, linux-scsi; +Cc: tj, JBottomley

This is the final draft of the non-blocking hard disk resume patch. I've
included some performance results to demonstrate the real benefits
of this patch. Please note that this patch provides a MASSIVE performance
improvement in hard disk resume. It's too valuable to ignore, so I really
need the help of the maintainers to get this implemented. Even if this
patch is deemed the wrong approach I hope you won't abandon the idea
altogether. There is so much potential in this kind of optimization and
I'm highly motivated to make this work.

To demonstrate the substantial performance improvement I've run the 
AnalyzeSuspend tool on three different platforms patched with the new 
behavior. Each is running Ubuntu Raring with a kernel built from the 
upstream kernel source.

The complete analysis and graphical outputs of the tool are available
online at 01.org:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Here's a synopsis of the results.

-------------------------------------------------------
[Computer One]
PLATFORM: Ubuntu Raring Ringtail (13.04)
KERNEL: 3.11.0-rc7
CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
DISK CONFIG:
    ATA1: 240 GB SSD
    ATA2: 3 TB Hard Disk
    ATA3: 500 GB Hard Disk
    ATA4: DVD-ROM (with cd inserted)
    ATA5: 2 TB Hard Disk
    ATA6: 1 TB Hard Disk
RESUME TIME WITHOUT PATCH: 11656 ms
RESUME TIME WITH PATCH: 1110 ms

IMPROVEMENT: 10.5X speedup

-------------------------------------------------------
[Computer Two]
PLATFORM: Ubuntu Raring Ringtail (13.04)
KERNEL: 3.11.0-rc7
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
DISK CONFIG:
    ATA1: 320 GB Hard Disk
    ATA2 - ATA6: Empty slots
RESUME TIME WITHOUT PATCH: 5416 ms
RESUME TIME WITH PATCH: 448 ms

IMPROVEMENT: 12X speedup

-------------------------------------------------------
[Computer Three]
PLATFORM: Ubuntu Raring Ringtail (13.04)
KERNEL: 3.11.0-rc7
CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
DISK CONFIG:
    ATA1,3,4,6: Empty Slots
    ATA2: DVD-ROM (empty)
    ATA5: 500 GB Hard Disk
RESUME TIME WITHOUT PATCH: 5385 ms
RESUME TIME WITH PATCH: 688 ms

IMPROVEMENT: 7.8X speedup

-------------------------------------------------------

Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

 drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c24354d..6cf0c15 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
 	return rc;
 }
 
+static int ata_port_resume_async(struct device *dev)
+{
+	struct ata_port *ap = to_ata_port(dev);
+	struct ata_link *link;
+	unsigned long flags;
+	int ret = 0;
+
+	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
+		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	spin_lock_irqsave(ap->lock, flags);
+
+	ap->pm_mesg = PMSG_RESUME;
+	ap->pm_result = NULL;
+	ap->pflags |= ATA_PFLAG_PM_PENDING;
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		link->eh_info.action |= ATA_EH_RESET;
+		link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+	}
+
+	ata_port_schedule_eh(ap);
+
+	spin_unlock_irqrestore(ap->lock, flags);
+
+ out:
+	pm_runtime_disable(dev);
+	pm_runtime_set_active(dev);
+	pm_runtime_enable(dev);
+	return ret;
+}
+
 /*
  * For ODDs, the upper layer will poll for media change every few seconds,
  * which will make it enter and leave suspend state every few seconds. And
@@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
 
 static const struct dev_pm_ops ata_port_pm_ops = {
 	.suspend = ata_port_suspend,
-	.resume = ata_port_resume,
+	.resume = ata_port_resume_async,
 	.freeze = ata_port_do_freeze,
 	.thaw = ata_port_resume,
 	.poweroff = ata_port_poweroff,


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization
  2013-09-06  0:38 [PATCH/RESEND 1/2] Hard disk S3 resume time optimization Todd E Brandt
@ 2013-09-06 16:54 ` Bartlomiej Zolnierkiewicz
  2013-09-06 22:45   ` Todd E Brandt
  0 siblings, 1 reply; 5+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2013-09-06 16:54 UTC (permalink / raw)
  To: todd.e.brandt; +Cc: linux-ide, linux-scsi, tj, JBottomley


Hi,

On Thursday, September 05, 2013 05:38:53 PM Todd E Brandt wrote:
> This is the final draft of the non-blocking hard disk resume patch. I've
> included some performance results to demonstrate the real benefits
> of this patch. Please note that this patch provides a MASSIVE performance
> improvement in hard disk resume. It's too valuable to ignore, so I really
> need the help of the maintainers to get this implemented. Even if this
> patch is deemed the wrong approach I hope you won't abandon the idea
> altogether. There is so much potential in this kind of optimization and
> I'm highly motivated to make this work.
> 
> To demonstrate the substantial performance improvement I've run the 
> AnalyzeSuspend tool on three different platforms patched with the new 
> behavior. Each is running Ubuntu Raring with a kernel built from the 
> upstream kernel source.
> 
> The complete analysis and graphical outputs of the tool are available
> online at 01.org:
> https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Please include changes descriptions from the link above in the actual patch
descriptions.

"The essential issue behind hard disks' lengthy resume time is the ata
port driver blocking until the ATA port hardware is finished coming online.
So the kernel isn't really doing anything during all those seconds that
the disks are resuming, it's just blocking until the hardware says it's
ready to accept commands. This patch changes the ATA port driver to issue
the wakeup command and then return immediately. Any commands issued to
the hardware will be queued up and will be executed once the port is
physically online. Thus no information is lost, and although the wait
time itself isn't removed, it doesn't hold up the rest of the system
which can function on what's left in RAM and cache."

What happens when somebody requests suspend while ATA port resume is still
running?

> Here's a synopsis of the results.
> 
> -------------------------------------------------------
> [Computer One]
> PLATFORM: Ubuntu Raring Ringtail (13.04)
> KERNEL: 3.11.0-rc7
> CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
> SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
> DISK CONFIG:
>     ATA1: 240 GB SSD
>     ATA2: 3 TB Hard Disk
>     ATA3: 500 GB Hard Disk
>     ATA4: DVD-ROM (with cd inserted)
>     ATA5: 2 TB Hard Disk
>     ATA6: 1 TB Hard Disk
> RESUME TIME WITHOUT PATCH: 11656 ms
> RESUME TIME WITH PATCH: 1110 ms

These results are with both patches applied, could you tell what is
the improvement from the patch #1 alone?

Where is the 11s delay coming from? Are we doing the resume for all 
ports sequentially instead of in parallel? In such case you should be
fixing the power management layer instead.

> IMPROVEMENT: 10.5X speedup
> 
> -------------------------------------------------------
> [Computer Two]
> PLATFORM: Ubuntu Raring Ringtail (13.04)
> KERNEL: 3.11.0-rc7
> CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
> DISK CONFIG:
>     ATA1: 320 GB Hard Disk
>     ATA2 - ATA6: Empty slots
> RESUME TIME WITHOUT PATCH: 5416 ms
> RESUME TIME WITH PATCH: 448 ms
> 
> IMPROVEMENT: 12X speedup
> 
> -------------------------------------------------------
> [Computer Three]
> PLATFORM: Ubuntu Raring Ringtail (13.04)
> KERNEL: 3.11.0-rc7
> CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
> SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
> DISK CONFIG:
>     ATA1,3,4,6: Empty Slots
>     ATA2: DVD-ROM (empty)
>     ATA5: 500 GB Hard Disk
> RESUME TIME WITHOUT PATCH: 5385 ms
> RESUME TIME WITH PATCH: 688 ms
> 
> IMPROVEMENT: 7.8X speedup
> 
> -------------------------------------------------------
> 
> Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> 
>  drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c24354d..6cf0c15 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
>  	return rc;
>  }
>  
> +static int ata_port_resume_async(struct device *dev)
> +{
> +	struct ata_port *ap = to_ata_port(dev);
> +	struct ata_link *link;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
> +		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
> +		ret = -EAGAIN;
> +		goto out;
> +	}
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +
> +	ap->pm_mesg = PMSG_RESUME;
> +	ap->pm_result = NULL;
> +	ap->pflags |= ATA_PFLAG_PM_PENDING;
> +	ata_for_each_link(link, ap, HOST_FIRST) {
> +		link->eh_info.action |= ATA_EH_RESET;
> +		link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
> +	}
> +
> +	ata_port_schedule_eh(ap);
> +
> +	spin_unlock_irqrestore(ap->lock, flags);

Why have you open-coded ata_port_request_pm() instead of just re-using it
but with "async" parameter set?

> + out:
> +	pm_runtime_disable(dev);
> +	pm_runtime_set_active(dev);
> +	pm_runtime_enable(dev);
> +	return ret;
> 
> +}
> 
>  /*
>   * For ODDs, the upper layer will poll for media change every few seconds,
>   * which will make it enter and leave suspend state every few seconds. And
> @@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
>  
>  static const struct dev_pm_ops ata_port_pm_ops = {
>  	.suspend = ata_port_suspend,
> -	.resume = ata_port_resume,
> +	.resume = ata_port_resume_async,

->resume will now return success even though it can later fail during
the async operation (error value will be lost), this is no-go, sorry.

Also it seems that ata_port_resume() becomes unused after this change.

>  	.freeze = ata_port_do_freeze,
>  	.thaw = ata_port_resume,
>  	.poweroff = ata_port_poweroff,

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization
  2013-09-06 16:54 ` Bartlomiej Zolnierkiewicz
@ 2013-09-06 22:45   ` Todd E Brandt
  2013-09-10 12:59     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 5+ messages in thread
From: Todd E Brandt @ 2013-09-06 22:45 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: linux-scsi

On Fri, Sep 06, 2013 at 06:54:48PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Thursday, September 05, 2013 05:38:53 PM Todd E Brandt wrote:
> > This is the final draft of the non-blocking hard disk resume patch. I've
> > included some performance results to demonstrate the real benefits
> > of this patch. Please note that this patch provides a MASSIVE performance
> > improvement in hard disk resume. It's too valuable to ignore, so I really
> > need the help of the maintainers to get this implemented. Even if this
> > patch is deemed the wrong approach I hope you won't abandon the idea
> > altogether. There is so much potential in this kind of optimization and
> > I'm highly motivated to make this work.
> > 
> > To demonstrate the substantial performance improvement I've run the 
> > AnalyzeSuspend tool on three different platforms patched with the new 
> > behavior. Each is running Ubuntu Raring with a kernel built from the 
> > upstream kernel source.
> > 
> > The complete analysis and graphical outputs of the tool are available
> > online at 01.org:
> > https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> 
> Please include changes descriptions from the link above in the actual patch
> descriptions.
> 
> "The essential issue behind hard disks' lengthy resume time is the ata
> port driver blocking until the ATA port hardware is finished coming online.
> So the kernel isn't really doing anything during all those seconds that
> the disks are resuming, it's just blocking until the hardware says it's
> ready to accept commands. This patch changes the ATA port driver to issue
> the wakeup command and then return immediately. Any commands issued to
> the hardware will be queued up and will be executed once the port is
> physically online. Thus no information is lost, and although the wait
> time itself isn't removed, it doesn't hold up the rest of the system
> which can function on what's left in RAM and cache."
> 

 What happens when somebody requests suspend while ATA port resume is still
 running?

The suspend request is queued until the resume has completed. I've tested
that case very heavily. Basically if you do two suspends in a row (within
seconds of each other) you lose any performance benefit, but that's a use
case that should happen only very rarerly (and wouldn't be expected to
be of any power benefit since too many sequential suspends actually
takes more power than just letting the machine stay in run mode).

> 
> > Here's a synopsis of the results.
> > 
> > -------------------------------------------------------
> > [Computer One]
> > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > KERNEL: 3.11.0-rc7
> > CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
> > SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
> > DISK CONFIG:
> >     ATA1: 240 GB SSD
> >     ATA2: 3 TB Hard Disk
> >     ATA3: 500 GB Hard Disk
> >     ATA4: DVD-ROM (with cd inserted)
> >     ATA5: 2 TB Hard Disk
> >     ATA6: 1 TB Hard Disk
> > RESUME TIME WITHOUT PATCH: 11656 ms
> > RESUME TIME WITH PATCH: 1110 ms

 These results are with both patches applied, could you tell what is
 the improvement from the patch #1 alone?

With either patch alone there is no benefit. I've tested with each
patch applied independently to verify that they function, but you 
can't get the benefit without both. 

 Where is the 11s delay coming from? Are we doing the resume for all 
 ports sequentially instead of in parallel? In such case you should be
 fixing the power management layer instead.

The ATA ports are all resumed in parallel. The long resume time
comes from extremely large drives. The ATA port wakeup delay is
the hardware running an internal subroutine to reinitialize the
connection, which I guess takes longer for bigger disks. I'm not
completely sure what it's doing, I just know that there's nothing
software can do to speed it up.

> > IMPROVEMENT: 10.5X speedup
> > 
> > -------------------------------------------------------
> > [Computer Two]
> > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > KERNEL: 3.11.0-rc7
> > CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> > SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
> > DISK CONFIG:
> >     ATA1: 320 GB Hard Disk
> >     ATA2 - ATA6: Empty slots
> > RESUME TIME WITHOUT PATCH: 5416 ms
> > RESUME TIME WITH PATCH: 448 ms
> > 
> > IMPROVEMENT: 12X speedup
> > 
> > -------------------------------------------------------
> > [Computer Three]
> > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > KERNEL: 3.11.0-rc7
> > CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
> > SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
> > DISK CONFIG:
> >     ATA1,3,4,6: Empty Slots
> >     ATA2: DVD-ROM (empty)
> >     ATA5: 500 GB Hard Disk
> > RESUME TIME WITHOUT PATCH: 5385 ms
> > RESUME TIME WITH PATCH: 688 ms
> > 
> > IMPROVEMENT: 7.8X speedup
> > 
> > -------------------------------------------------------
> > 
> > Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
> > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > 
> >  drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 35 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > index c24354d..6cf0c15 100644
> > --- a/drivers/ata/libata-core.c
> > +++ b/drivers/ata/libata-core.c
> > @@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
> >  	return rc;
> >  }
> >  
> > +static int ata_port_resume_async(struct device *dev)
> > +{
> > +	struct ata_port *ap = to_ata_port(dev);
> > +	struct ata_link *link;
> > +	unsigned long flags;
> > +	int ret = 0;
> > +
> > +	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
> > +		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
> > +		ret = -EAGAIN;
> > +		goto out;
> > +	}
> > +
> > +	spin_lock_irqsave(ap->lock, flags);
> > +
> > +	ap->pm_mesg = PMSG_RESUME;
> > +	ap->pm_result = NULL;
> > +	ap->pflags |= ATA_PFLAG_PM_PENDING;
> > +	ata_for_each_link(link, ap, HOST_FIRST) {
> > +		link->eh_info.action |= ATA_EH_RESET;
> > +		link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
> > +	}
> > +
> > +	ata_port_schedule_eh(ap);
> > +
> > +	spin_unlock_irqrestore(ap->lock, flags);
> 

 Why have you open-coded ata_port_request_pm() instead of just re-using it
 but with "async" parameter set?

Because the async mode call requires a place to capture the value, and in
this case I want to ignore it. The PM subsystem doesn't supply any method
of checking on the status of a resume callback that's running asynchronously
because it has no way of knowing. So without altering the pm code there's
no point in collecting it.

> 
> > + out:
> > +	pm_runtime_disable(dev);
> > +	pm_runtime_set_active(dev);
> > +	pm_runtime_enable(dev);
> > +	return ret;
> > 
> > +}
> > 
> >  /*
> >   * For ODDs, the upper layer will poll for media change every few seconds,
> >   * which will make it enter and leave suspend state every few seconds. And
> > @@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
> >  
> >  static const struct dev_pm_ops ata_port_pm_ops = {
> >  	.suspend = ata_port_suspend,
> > -	.resume = ata_port_resume,
> > +	.resume = ata_port_resume_async,
> 

 ->resume will now return success even though it can later fail during
 the async operation (error value will be lost), this is no-go, sorry.

Yes, that's the nature of the asynchronous paradigm. The return value in this 
case is the status of whether the resume was initiated, rather than if it was 
completed. I'm letting the scsi and ata drivers assume control over their
own error reporting in this case (which they already do). If you look at 
the ata_port resume code you'll see that the ata_port_resume callback returns
the status of the ahci_port_resume callback, which is always 0. So for SATA 
disks the returned value for the resume callback is always 0. Error 
handling and reporting is taken care of in the scsi error handler thread, 
which runs separately from the pm core. I agree that in general it's a 
"no-go" to ignore a return value, but in this specific case for this 
specific driver, the return value is erronuous anyway, so there's no 
harm in ignoring it.

> 
> Also it seems that ata_port_resume() becomes unused after this change.
> 
> >  	.freeze = ata_port_do_freeze,
> >  	.thaw = ata_port_resume,
> >  	.poweroff = ata_port_poweroff,
> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung R&D Institute Poland
> Samsung Electronics
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization
  2013-09-06 22:45   ` Todd E Brandt
@ 2013-09-10 12:59     ` Bartlomiej Zolnierkiewicz
  2013-09-10 20:23       ` Brandt, Todd E
  0 siblings, 1 reply; 5+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2013-09-10 12:59 UTC (permalink / raw)
  To: todd.e.brandt; +Cc: linux-scsi

On Friday, September 06, 2013 03:45:35 PM Todd E Brandt wrote:
> On Fri, Sep 06, 2013 at 06:54:48PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > 
> > Hi,
> > 
> > On Thursday, September 05, 2013 05:38:53 PM Todd E Brandt wrote:
> > > This is the final draft of the non-blocking hard disk resume patch. I've
> > > included some performance results to demonstrate the real benefits
> > > of this patch. Please note that this patch provides a MASSIVE performance
> > > improvement in hard disk resume. It's too valuable to ignore, so I really
> > > need the help of the maintainers to get this implemented. Even if this
> > > patch is deemed the wrong approach I hope you won't abandon the idea
> > > altogether. There is so much potential in this kind of optimization and
> > > I'm highly motivated to make this work.
> > > 
> > > To demonstrate the substantial performance improvement I've run the 
> > > AnalyzeSuspend tool on three different platforms patched with the new 
> > > behavior. Each is running Ubuntu Raring with a kernel built from the 
> > > upstream kernel source.
> > > 
> > > The complete analysis and graphical outputs of the tool are available
> > > online at 01.org:
> > > https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> > 
> > Please include changes descriptions from the link above in the actual patch
> > descriptions.
> > 
> > "The essential issue behind hard disks' lengthy resume time is the ata
> > port driver blocking until the ATA port hardware is finished coming online.
> > So the kernel isn't really doing anything during all those seconds that
> > the disks are resuming, it's just blocking until the hardware says it's
> > ready to accept commands. This patch changes the ATA port driver to issue
> > the wakeup command and then return immediately. Any commands issued to
> > the hardware will be queued up and will be executed once the port is
> > physically online. Thus no information is lost, and although the wait
> > time itself isn't removed, it doesn't hold up the rest of the system
> > which can function on what's left in RAM and cache."
> > 
> 
>  What happens when somebody requests suspend while ATA port resume is still
>  running?
> 
> The suspend request is queued until the resume has completed. I've tested
> that case very heavily. Basically if you do two suspends in a row (within
> seconds of each other) you lose any performance benefit, but that's a use
> case that should happen only very rarerly (and wouldn't be expected to
> be of any power benefit since too many sequential suspends actually
> takes more power than just letting the machine stay in run mode).

OK.

> > 
> > > Here's a synopsis of the results.
> > > 
> > > -------------------------------------------------------
> > > [Computer One]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
> > > SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
> > > DISK CONFIG:
> > >     ATA1: 240 GB SSD
> > >     ATA2: 3 TB Hard Disk
> > >     ATA3: 500 GB Hard Disk
> > >     ATA4: DVD-ROM (with cd inserted)
> > >     ATA5: 2 TB Hard Disk
> > >     ATA6: 1 TB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 11656 ms
> > > RESUME TIME WITH PATCH: 1110 ms
> 
>  These results are with both patches applied, could you tell what is
>  the improvement from the patch #1 alone?
> 
> With either patch alone there is no benefit. I've tested with each
> patch applied independently to verify that they function, but you 
> can't get the benefit without both. 
> 
>  Where is the 11s delay coming from? Are we doing the resume for all 
>  ports sequentially instead of in parallel? In such case you should be
>  fixing the power management layer instead.
> 
> The ATA ports are all resumed in parallel. The long resume time
> comes from extremely large drives. The ATA port wakeup delay is
> the hardware running an internal subroutine to reinitialize the
> connection, which I guess takes longer for bigger disks. I'm not
> completely sure what it's doing, I just know that there's nothing
> software can do to speed it up.

OK.

> > > IMPROVEMENT: 10.5X speedup
> > > 
> > > -------------------------------------------------------
> > > [Computer Two]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> > > SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
> > > DISK CONFIG:
> > >     ATA1: 320 GB Hard Disk
> > >     ATA2 - ATA6: Empty slots
> > > RESUME TIME WITHOUT PATCH: 5416 ms
> > > RESUME TIME WITH PATCH: 448 ms
> > > 
> > > IMPROVEMENT: 12X speedup
> > > 
> > > -------------------------------------------------------
> > > [Computer Three]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
> > > SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
> > > DISK CONFIG:
> > >     ATA1,3,4,6: Empty Slots
> > >     ATA2: DVD-ROM (empty)
> > >     ATA5: 500 GB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 5385 ms
> > > RESUME TIME WITH PATCH: 688 ms
> > > 
> > > IMPROVEMENT: 7.8X speedup
> > > 
> > > -------------------------------------------------------
> > > 
> > > Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
> > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > > 
> > >  drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 35 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > index c24354d..6cf0c15 100644
> > > --- a/drivers/ata/libata-core.c
> > > +++ b/drivers/ata/libata-core.c
> > > @@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
> > >  	return rc;
> > >  }
> > >  
> > > +static int ata_port_resume_async(struct device *dev)
> > > +{
> > > +	struct ata_port *ap = to_ata_port(dev);
> > > +	struct ata_link *link;
> > > +	unsigned long flags;
> > > +	int ret = 0;
> > > +
> > > +	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
> > > +		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
> > > +		ret = -EAGAIN;
> > > +		goto out;
> > > +	}
> > > +
> > > +	spin_lock_irqsave(ap->lock, flags);
> > > +
> > > +	ap->pm_mesg = PMSG_RESUME;
> > > +	ap->pm_result = NULL;
> > > +	ap->pflags |= ATA_PFLAG_PM_PENDING;
> > > +	ata_for_each_link(link, ap, HOST_FIRST) {
> > > +		link->eh_info.action |= ATA_EH_RESET;
> > > +		link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
> > > +	}
> > > +
> > > +	ata_port_schedule_eh(ap);
> > > +
> > > +	spin_unlock_irqrestore(ap->lock, flags);
> > 
> 
>  Why have you open-coded ata_port_request_pm() instead of just re-using it
>  but with "async" parameter set?
> 
> Because the async mode call requires a place to capture the value, and in
> this case I want to ignore it. The PM subsystem doesn't supply any method
> of checking on the status of a resume callback that's running asynchronously
> because it has no way of knowing. So without altering the pm code there's
> no point in collecting it.

Then how's about changing ata_port_request_pm() to have async bool flag and
another argument (i.e. async_rc) which will be used to store the return value?

Something like:

static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
			       unsigned int action, unsigned int ehi_flags,
			       bool async, int *async_rc)
{
	struct ata_link *link;
	unsigned long flags;
	int rc = 0;

	/* Previous resume operation might still be in
	 * progress.  Wait for PM_PENDING to clear.
	 */
	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
		if (async) {
			if (async_rc) {
				*async_rc = -EAGAIN;
				return 0;
			}
			return -EAGAIN;
		}
		ata_port_wait_eh(ap);
		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
	}

	/* request PM ops to EH */
	spin_lock_irqsave(ap->lock, flags);

	ap->pm_mesg = mesg;
	if (async)
		ap->pm_result = async_rc ? async_rc : NULL;
	else
		ap->pm_result = &rc;

	ap->pflags |= ATA_PFLAG_PM_PENDING;
	ata_for_each_link(link, ap, HOST_FIRST) {
		link->eh_info.action |= action;
		link->eh_info.flags |= ehi_flags;
	}

	ata_port_schedule_eh(ap);

	spin_unlock_irqrestore(ap->lock, flags);

	/* wait and check result */
	if (!async) {
		ata_port_wait_eh(ap);
		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
	}

	return rc;
}

> > 
> > > + out:
> > > +	pm_runtime_disable(dev);
> > > +	pm_runtime_set_active(dev);
> > > +	pm_runtime_enable(dev);
> > > +	return ret;
> > > 
> > > +}
> > > 
> > >  /*
> > >   * For ODDs, the upper layer will poll for media change every few seconds,
> > >   * which will make it enter and leave suspend state every few seconds. And
> > > @@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
> > >  
> > >  static const struct dev_pm_ops ata_port_pm_ops = {
> > >  	.suspend = ata_port_suspend,
> > > -	.resume = ata_port_resume,
> > > +	.resume = ata_port_resume_async,
> > 
> 
>  ->resume will now return success even though it can later fail during
>  the async operation (error value will be lost), this is no-go, sorry.
> 
> Yes, that's the nature of the asynchronous paradigm. The return value in this 
> case is the status of whether the resume was initiated, rather than if it was 
> completed. I'm letting the scsi and ata drivers assume control over their
> own error reporting in this case (which they already do). If you look at 
> the ata_port resume code you'll see that the ata_port_resume callback returns
> the status of the ahci_port_resume callback, which is always 0. So for SATA 
> disks the returned value for the resume callback is always 0. Error 
> handling and reporting is taken care of in the scsi error handler thread, 
> which runs separately from the pm core. I agree that in general it's a 
> "no-go" to ignore a return value, but in this specific case for this 
> specific driver, the return value is erronuous anyway, so there's no 
> harm in ignoring it.

Makes sense, thanks for explaining this.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization
  2013-09-10 12:59     ` Bartlomiej Zolnierkiewicz
@ 2013-09-10 20:23       ` Brandt, Todd E
  0 siblings, 0 replies; 5+ messages in thread
From: Brandt, Todd E @ 2013-09-10 20:23 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz, todd.e.brandt; +Cc: linux-scsi, linux-ide, Tejun Heo

Good suggestion on the reuse of the existing ata_port_request_pm function. I have this habit of introducing new changes in such a way that they can be easily turned on and off for testing, but that's not appropriate here. I'll redo this patch with the functionality weaved into all the existing functions instead of creating new ones. Thanks for the feedback!

Todd Brandt
Linux Kernel Developer OTC, Hillsboro OR
https://opensource.intel.com/linux-wiki/ToddBrandt


________________________________________
From: linux-scsi-owner@vger.kernel.org [linux-scsi-owner@vger.kernel.org] on behalf of Bartlomiej Zolnierkiewicz [b.zolnierkie@samsung.com]
Sent: Tuesday, September 10, 2013 5:59 AM
To: todd.e.brandt@linux.intel.com
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization

On Friday, September 06, 2013 03:45:35 PM Todd E Brandt wrote:
> On Fri, Sep 06, 2013 at 06:54:48PM +0200, Bartlomiej Zolnierkiewicz wrote:
> >
> > Hi,
> >
> > On Thursday, September 05, 2013 05:38:53 PM Todd E Brandt wrote:
> > > This is the final draft of the non-blocking hard disk resume patch. I've
> > > included some performance results to demonstrate the real benefits
> > > of this patch. Please note that this patch provides a MASSIVE performance
> > > improvement in hard disk resume. It's too valuable to ignore, so I really
> > > need the help of the maintainers to get this implemented. Even if this
> > > patch is deemed the wrong approach I hope you won't abandon the idea
> > > altogether. There is so much potential in this kind of optimization and
> > > I'm highly motivated to make this work.
> > >
> > > To demonstrate the substantial performance improvement I've run the
> > > AnalyzeSuspend tool on three different platforms patched with the new
> > > behavior. Each is running Ubuntu Raring with a kernel built from the
> > > upstream kernel source.
> > >
> > > The complete analysis and graphical outputs of the tool are available
> > > online at 01.org:
> > > https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> >
> > Please include changes descriptions from the link above in the actual patch
> > descriptions.
> >
> > "The essential issue behind hard disks' lengthy resume time is the ata
> > port driver blocking until the ATA port hardware is finished coming online.
> > So the kernel isn't really doing anything during all those seconds that
> > the disks are resuming, it's just blocking until the hardware says it's
> > ready to accept commands. This patch changes the ATA port driver to issue
> > the wakeup command and then return immediately. Any commands issued to
> > the hardware will be queued up and will be executed once the port is
> > physically online. Thus no information is lost, and although the wait
> > time itself isn't removed, it doesn't hold up the rest of the system
> > which can function on what's left in RAM and cache."
> >
>
>  What happens when somebody requests suspend while ATA port resume is still
>  running?
>
> The suspend request is queued until the resume has completed. I've tested
> that case very heavily. Basically if you do two suspends in a row (within
> seconds of each other) you lose any performance benefit, but that's a use
> case that should happen only very rarerly (and wouldn't be expected to
> be of any power benefit since too many sequential suspends actually
> takes more power than just letting the machine stay in run mode).

OK.

> >
> > > Here's a synopsis of the results.
> > >
> > > -------------------------------------------------------
> > > [Computer One]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
> > > SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
> > > DISK CONFIG:
> > >     ATA1: 240 GB SSD
> > >     ATA2: 3 TB Hard Disk
> > >     ATA3: 500 GB Hard Disk
> > >     ATA4: DVD-ROM (with cd inserted)
> > >     ATA5: 2 TB Hard Disk
> > >     ATA6: 1 TB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 11656 ms
> > > RESUME TIME WITH PATCH: 1110 ms
>
>  These results are with both patches applied, could you tell what is
>  the improvement from the patch #1 alone?
>
> With either patch alone there is no benefit. I've tested with each
> patch applied independently to verify that they function, but you
> can't get the benefit without both.
>
>  Where is the 11s delay coming from? Are we doing the resume for all
>  ports sequentially instead of in parallel? In such case you should be
>  fixing the power management layer instead.
>
> The ATA ports are all resumed in parallel. The long resume time
> comes from extremely large drives. The ATA port wakeup delay is
> the hardware running an internal subroutine to reinitialize the
> connection, which I guess takes longer for bigger disks. I'm not
> completely sure what it's doing, I just know that there's nothing
> software can do to speed it up.

OK.

> > > IMPROVEMENT: 10.5X speedup
> > >
> > > -------------------------------------------------------
> > > [Computer Two]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> > > SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
> > > DISK CONFIG:
> > >     ATA1: 320 GB Hard Disk
> > >     ATA2 - ATA6: Empty slots
> > > RESUME TIME WITHOUT PATCH: 5416 ms
> > > RESUME TIME WITH PATCH: 448 ms
> > >
> > > IMPROVEMENT: 12X speedup
> > >
> > > -------------------------------------------------------
> > > [Computer Three]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
> > > SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
> > > DISK CONFIG:
> > >     ATA1,3,4,6: Empty Slots
> > >     ATA2: DVD-ROM (empty)
> > >     ATA5: 500 GB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 5385 ms
> > > RESUME TIME WITH PATCH: 688 ms
> > >
> > > IMPROVEMENT: 7.8X speedup
> > >
> > > -------------------------------------------------------
> > >
> > > Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
> > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > >
> > >  drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 35 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > index c24354d..6cf0c15 100644
> > > --- a/drivers/ata/libata-core.c
> > > +++ b/drivers/ata/libata-core.c
> > > @@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
> > >   return rc;
> > >  }
> > >
> > > +static int ata_port_resume_async(struct device *dev)
> > > +{
> > > + struct ata_port *ap = to_ata_port(dev);
> > > + struct ata_link *link;
> > > + unsigned long flags;
> > > + int ret = 0;
> > > +
> > > + if (ap->pflags & ATA_PFLAG_PM_PENDING) {
> > > +         WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
> > > +         ret = -EAGAIN;
> > > +         goto out;
> > > + }
> > > +
> > > + spin_lock_irqsave(ap->lock, flags);
> > > +
> > > + ap->pm_mesg = PMSG_RESUME;
> > > + ap->pm_result = NULL;
> > > + ap->pflags |= ATA_PFLAG_PM_PENDING;
> > > + ata_for_each_link(link, ap, HOST_FIRST) {
> > > +         link->eh_info.action |= ATA_EH_RESET;
> > > +         link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
> > > + }
> > > +
> > > + ata_port_schedule_eh(ap);
> > > +
> > > + spin_unlock_irqrestore(ap->lock, flags);
> >
>
>  Why have you open-coded ata_port_request_pm() instead of just re-using it
>  but with "async" parameter set?
>
> Because the async mode call requires a place to capture the value, and in
> this case I want to ignore it. The PM subsystem doesn't supply any method
> of checking on the status of a resume callback that's running asynchronously
> because it has no way of knowing. So without altering the pm code there's
> no point in collecting it.

Then how's about changing ata_port_request_pm() to have async bool flag and
another argument (i.e. async_rc) which will be used to store the return value?

Something like:

static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
                               unsigned int action, unsigned int ehi_flags,
                               bool async, int *async_rc)
{
        struct ata_link *link;
        unsigned long flags;
        int rc = 0;

        /* Previous resume operation might still be in
         * progress.  Wait for PM_PENDING to clear.
         */
        if (ap->pflags & ATA_PFLAG_PM_PENDING) {
                if (async) {
                        if (async_rc) {
                                *async_rc = -EAGAIN;
                                return 0;
                        }
                        return -EAGAIN;
                }
                ata_port_wait_eh(ap);
                WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
        }

        /* request PM ops to EH */
        spin_lock_irqsave(ap->lock, flags);

        ap->pm_mesg = mesg;
        if (async)
                ap->pm_result = async_rc ? async_rc : NULL;
        else
                ap->pm_result = &rc;

        ap->pflags |= ATA_PFLAG_PM_PENDING;
        ata_for_each_link(link, ap, HOST_FIRST) {
                link->eh_info.action |= action;
                link->eh_info.flags |= ehi_flags;
        }

        ata_port_schedule_eh(ap);

        spin_unlock_irqrestore(ap->lock, flags);

        /* wait and check result */
        if (!async) {
                ata_port_wait_eh(ap);
                WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
        }

        return rc;
}

> >
> > > + out:
> > > + pm_runtime_disable(dev);
> > > + pm_runtime_set_active(dev);
> > > + pm_runtime_enable(dev);
> > > + return ret;
> > >
> > > +}
> > >
> > >  /*
> > >   * For ODDs, the upper layer will poll for media change every few seconds,
> > >   * which will make it enter and leave suspend state every few seconds. And
> > > @@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
> > >
> > >  static const struct dev_pm_ops ata_port_pm_ops = {
> > >   .suspend = ata_port_suspend,
> > > - .resume = ata_port_resume,
> > > + .resume = ata_port_resume_async,
> >
>
>  ->resume will now return success even though it can later fail during
>  the async operation (error value will be lost), this is no-go, sorry.
>
> Yes, that's the nature of the asynchronous paradigm. The return value in this
> case is the status of whether the resume was initiated, rather than if it was
> completed. I'm letting the scsi and ata drivers assume control over their
> own error reporting in this case (which they already do). If you look at
> the ata_port resume code you'll see that the ata_port_resume callback returns
> the status of the ahci_port_resume callback, which is always 0. So for SATA
> disks the returned value for the resume callback is always 0. Error
> handling and reporting is taken care of in the scsi error handler thread,
> which runs separately from the pm core. I agree that in general it's a
> "no-go" to ignore a return value, but in this specific case for this
> specific driver, the return value is erronuous anyway, so there's no
> harm in ignoring it.

Makes sense, thanks for explaining this.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-10 20:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-06  0:38 [PATCH/RESEND 1/2] Hard disk S3 resume time optimization Todd E Brandt
2013-09-06 16:54 ` Bartlomiej Zolnierkiewicz
2013-09-06 22:45   ` Todd E Brandt
2013-09-10 12:59     ` Bartlomiej Zolnierkiewicz
2013-09-10 20:23       ` Brandt, Todd E

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.