* [PATCH] cpufreq: intel_pstate: Improve IO performance
@ 2017-08-02  3:45 Srinivas Pandruvada
  2017-08-04  0:34 ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Srinivas Pandruvada @ 2017-08-02  3:45 UTC (permalink / raw)
  To: rjw, lenb; +Cc: linux-pm, Srinivas Pandruvada

In the current implementation, the latency from receiving SCHED_CPUFREQ_IOWAIT
to the actual P-state adjustment can be up to 10ms. This can be improved by
reacting to SCHED_CPUFREQ_IOWAIT and jumping to the max P-state immediately.
With this change, IO performance improves significantly.

With a simple "grep -r . linux" (here "linux" is the kernel source folder),
with caches dropped before every run, on a Broadwell Xeon workstation with
per-core P-states, user and system time improve by as much as 30% to 40%.

The same performance difference was not observed on client platforms, which
don't have per-core P-state support.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/intel_pstate.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8c67b77..7762255 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1527,6 +1527,15 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
 
 	if (flags & SCHED_CPUFREQ_IOWAIT) {
 		cpu->iowait_boost = int_tofp(1);
+		/*
+		 * The last sample was 100% busy, so the P-state was
+		 * already at max; avoid the computation overhead.
+		 */
+		if (fp_toint(cpu->sample.busy_scaled) == 100) {
+			cpu->last_update = time;
+			return;
+		}
+		goto set_pstate;
 	} else if (cpu->iowait_boost) {
 		/* Clear iowait_boost if the CPU may have been idle. */
 		delta_ns = time - cpu->last_update;
@@ -1538,6 +1547,7 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
 	if ((s64)delta_ns < INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL)
 		return;
 
+set_pstate:
 	if (intel_pstate_sample(cpu, time)) {
 		int target_pstate;
 
-- 
2.7.4


* Re: [PATCH] cpufreq: intel_pstate: Improve IO performance
  2017-08-02  3:45 [PATCH] cpufreq: intel_pstate: Improve IO performance Srinivas Pandruvada
@ 2017-08-04  0:34 ` Rafael J. Wysocki
  2017-08-04  1:47   ` Srinivas Pandruvada
  0 siblings, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2017-08-04  0:34 UTC (permalink / raw)
  To: Srinivas Pandruvada; +Cc: Rafael J. Wysocki, Len Brown, Linux PM

On Wed, Aug 2, 2017 at 5:45 AM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> In the current implementation, the latency from receiving SCHED_CPUFREQ_IOWAIT
> to the actual P-state adjustment can be up to 10ms. This can be improved by
> reacting to SCHED_CPUFREQ_IOWAIT and jumping to the max P-state immediately.
> With this change, IO performance improves significantly.
>
> With a simple "grep -r . linux" (here "linux" is the kernel source folder),
> with caches dropped before every run, on a Broadwell Xeon workstation with
> per-core P-states, user and system time improve by as much as 30% to 40%.
>
> The same performance difference was not observed on client platforms, which
> don't have per-core P-state support.
>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 8c67b77..7762255 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1527,6 +1527,15 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
>
>         if (flags & SCHED_CPUFREQ_IOWAIT) {
>                 cpu->iowait_boost = int_tofp(1);
> +               /*
> +                * The last sample was 100% busy, so the P-state was
> +                * already at max; avoid the computation overhead.
> +                */
> +               if (fp_toint(cpu->sample.busy_scaled) == 100) {
> +                       cpu->last_update = time;
> +                       return;
> +               }
> +               goto set_pstate;

cpu->last_update should also be updated when you jump to set_pstate,
shouldn't it?

>         } else if (cpu->iowait_boost) {
>                 /* Clear iowait_boost if the CPU may have been idle. */
>                 delta_ns = time - cpu->last_update;
> @@ -1538,6 +1547,7 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
>         if ((s64)delta_ns < INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL)
>                 return;
>
> +set_pstate:
>         if (intel_pstate_sample(cpu, time)) {
>                 int target_pstate;
>
> --
> 2.7.4
>


* Re: [PATCH] cpufreq: intel_pstate: Improve IO performance
  2017-08-04  0:34 ` Rafael J. Wysocki
@ 2017-08-04  1:47   ` Srinivas Pandruvada
  0 siblings, 0 replies; 3+ messages in thread
From: Srinivas Pandruvada @ 2017-08-04  1:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Len Brown, Linux PM

On Fri, 2017-08-04 at 02:34 +0200, Rafael J. Wysocki wrote:
> On Wed, Aug 2, 2017 at 5:45 AM, Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > In the current implementation, the latency from receiving
> > SCHED_CPUFREQ_IOWAIT to the actual P-state adjustment can be up to
> > 10ms. This can be improved by reacting to SCHED_CPUFREQ_IOWAIT and
> > jumping to the max P-state immediately. With this change, IO
> > performance improves significantly.
> > 
> > With a simple "grep -r . linux" (here "linux" is the kernel source
> > folder), with caches dropped before every run, on a Broadwell Xeon
> > workstation with per-core P-states, user and system time improve by
> > as much as 30% to 40%.
> > 
> > The same performance difference was not observed on client
> > platforms, which don't have per-core P-state support.
> > 
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > ---
> >  drivers/cpufreq/intel_pstate.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c
> > index 8c67b77..7762255 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1527,6 +1527,15 @@ static void intel_pstate_update_util(struct
> > update_util_data *data, u64 time,
> > 
> >         if (flags & SCHED_CPUFREQ_IOWAIT) {
> >                 cpu->iowait_boost = int_tofp(1);
> > +               /*
> > +                * The last sample was 100% busy, so the P-state was
> > +                * already at max; avoid the computation overhead.
> > +                */
> > +               if (fp_toint(cpu->sample.busy_scaled) == 100) {
> > +                       cpu->last_update = time;
> > +                       return;
> > +               }
> > +               goto set_pstate;
> cpu->last_update should also be updated when you jump to set_pstate,
> shouldn't it?
Yes. It should be updated.

Thanks,
Srinivas

> 
> > 
> >         } else if (cpu->iowait_boost) {
> >                 /* Clear iowait_boost if the CPU may have been
> > idle. */
> >                 delta_ns = time - cpu->last_update;
> > @@ -1538,6 +1547,7 @@ static void intel_pstate_update_util(struct
> > update_util_data *data, u64 time,
> >         if ((s64)delta_ns < INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL)
> >                 return;
> > 
> > +set_pstate:
> >         if (intel_pstate_sample(cpu, time)) {
> >                 int target_pstate;
> > 
> > --
> > 2.7.4
> > 

