From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, linux-pm@vger.kernel.org,
	joedecke@de.ibm.com, Michal Suchanek <msuchanek@suse.de>,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Subject: Re: [PATCH v2] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards
Date: Thu, 29 Apr 2021 16:40:40 +0530
Message-ID: <20210429111040.GA13183@in.ibm.com>
In-Reply-To: <87r1it9zxy.fsf@mpe.ellerman.id.au>

Hello Michael,

On Thu, Apr 29, 2021 at 07:56:25PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >
> > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> > of the Extended CEDE states advertised by the platform.
> >
> > On POWER9 LPARs, the firmware advertises a very low value of 2us for
> > the CEDE1 exit latency on a dedicated LPAR. The latency advertised by
> > the PHYP hypervisor corresponds to the latency required to wake up from
> > the underlying hardware idle state. However, the wakeup latency from the
> > LPAR's perspective should also include:
> >
> > 1. The time taken to transition the CPU from the hypervisor into the
> >    LPAR after waking up from the platform idle state.
> >
> > 2. The time taken to send the IPI from the source CPU (waker) to the
> >    idle target CPU (wakee).
> >
> > Component 1 can be measured via a timer idle test, where we queue a
> > timer (say, for 1ms) and enter the CEDE state. When the timer fires,
> > the timer handler computes how much time beyond the expected 1ms has
> > elapsed. On a POWER9 LPAR the numbers are:
> >
> > CEDE latency measured using a timer (numbers in ns)
> > N     Min    Median  Avg      90%ile  99%ile  Max    Stddev
> > 400   2601   5677    5668.74  5917    6413    9299   455.01
> >
> > Components 1 and 2 combined can be determined by an IPI latency test,
> > where we send an IPI to an idle CPU and, in the IPI handler, compute the
> > time difference between when the IPI was sent and when the handler ran.
> > We see the following numbers on a POWER9 LPAR:
> >
> > CEDE latency measured using an IPI (numbers in ns)
> > N     Min    Median  Avg      90%ile  99%ile  Max    Stddev
> > 400   711    7564    7369.43  8559    9514    9698   1200.01
> >
> > If we consider the 99th percentile latency measured using the IPI to be
> > the wakeup latency, the value would be 9.5us. This is in the ballpark
> > of the default value of 10us.
> >
> > Hence, fix up the exit latency of CEDE(0) using the latency values
> > advertised by the platform only from POWER10 onwards. The values
>                                                 ^^^^^^^
> > advertised on POWER10 platforms are more realistic and are informed
> > by the latency measurements. For earlier platforms, stick to the
> > default value of 10us.
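The timer idle test described above can be approximated from userspace as a rough sketch. This is not the kernel-side test that produced the numbers above (that code is not shown here): it assumes a Linux system with POSIX clock_nanosleep(), it cannot force the CPU into CEDE, and the helper names timer_overshoot_ns() and percentile_ns() are hypothetical. Percentiles use the nearest-rank method, which is also an assumption since the changelog does not say how its percentiles were computed.

```c
/*
 * Userspace approximation of the "timer idle test": sleep until an
 * absolute deadline and record how late the wakeup was. This measures
 * whatever idle path the OS takes plus scheduler latency, not CEDE
 * specifically. Helper names are hypothetical.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Sleep for period_ns; return how many ns past the deadline we woke up. */
int64_t timer_overshoot_ns(int64_t period_ns)
{
	struct timespec now, deadline;
	int64_t target;
	int err;

	clock_gettime(CLOCK_MONOTONIC, &now);
	target = ts_to_ns(&now) + period_ns;
	deadline.tv_sec = (time_t)(target / NSEC_PER_SEC);
	deadline.tv_nsec = (long)(target % NSEC_PER_SEC);

	/* With an absolute deadline we can simply retry after a signal. */
	do {
		err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				      &deadline, NULL);
	} while (err == EINTR);
	assert(err == 0);

	clock_gettime(CLOCK_MONOTONIC, &now);
	return ts_to_ns(&now) - target;
}

static int cmp_i64(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;

	return (x > y) - (x < y);
}

/* Nearest-rank percentile (pct in 1..100); sorts samples in place. */
int64_t percentile_ns(int64_t *samples, size_t n, unsigned int pct)
{
	size_t rank;

	assert(n > 0 && pct >= 1 && pct <= 100);
	qsort(samples, n, sizeof(*samples), cmp_i64);
	rank = ((size_t)pct * n + 99) / 100;	/* ceil(pct * n / 100) */
	return samples[rank - 1];
}
```

For example, filling an array of 400 samples with timer_overshoot_ns(1000000) and then calling percentile_ns(samples, 400, 99) gives a figure comparable in spirit, though not in mechanism, to the 99%ile column in the tables.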
> 
> ...
> 
> > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> > index a2b5c6f..7207467 100644
> > --- a/drivers/cpuidle/cpuidle-pseries.c
> > +++ b/drivers/cpuidle/cpuidle-pseries.c
> > @@ -419,7 +419,8 @@ static int pseries_idle_probe(void)
> >  			cpuidle_state_table = shared_states;
> >  			max_idle_state = ARRAY_SIZE(shared_states);
> >  		} else {
> > -			fixup_cede0_latency();
> > +			if (pvr_version_is(PVR_POWER10))
> > +				fixup_cede0_latency();
> 
> A PVR check like that tests for *only* Power10, not Power10 and onwards
> as you say in the change log.

Right. The accurate check would be to skip the fixup on the older
processors, i.e. to do it only when

!(PVR_POWER4 || PVR_POWER4p || PVR_POWER5 || PVR_POWER5p || PVR_POWER6 ||
  PVR_POWER7 || PVR_POWER8 || PVR_POWER9)

But that is a bit of a mouthful. I will go with your suggestion (from
private correspondence):

if (cpu_has_feature(CPU_FTR_ARCH_31) || pvr_version_is(PVR_POWER10))
	fixup_cede0_latency(); 

since it will allow the fixup on processors supporting ISA 3.1
(POWER10 and above) and also on POWER10 CPUs running in compat mode.
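To make the intended behaviour of this condition concrete, here is a small self-contained model of it. It is a sketch, not kernel code: struct cpu_model and should_fixup_cede0() are hypothetical names, the PVR constants and PVR_VER() are reproduced here and should be checked against arch/powerpc/include/asm/reg.h, and it assumes, as described above, that the raw PVR reports the physical chip while CPU_FTR_ARCH_31 is absent when a POWER10 LPAR runs in an earlier compat mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reproduced from arch/powerpc/include/asm/reg.h; verify before reuse. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)
#define PVR_POWER9	0x004E
#define PVR_POWER10	0x0080

/*
 * Hypothetical stand-in for the state the kernel consults:
 *  - pvr:     raw Processor Version Register (the physical chip)
 *  - arch_31: models cpu_has_feature(CPU_FTR_ARCH_31), assumed false
 *             when running in an earlier compat mode
 */
struct cpu_model {
	uint32_t pvr;
	bool arch_31;
};

/* Models: cpu_has_feature(CPU_FTR_ARCH_31) || pvr_version_is(PVR_POWER10) */
bool should_fixup_cede0(const struct cpu_model *c)
{
	assert(c != NULL);
	return c->arch_31 || PVR_VER(c->pvr) == PVR_POWER10;
}
```

In this model a native POWER9 gets no fixup, while a native POWER10, a POWER10 in POWER9 compat mode (ARCH_31 clear, PVR still POWER10), and a hypothetical later chip with an unlisted PVR but ARCH_31 set all do.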


> 
> The other question is what should happen on a Power10 LPAR that's
> running in Power9 compat mode. I assume in that case we *do* want to use
> the firmware-provided values, because they're tied to the underlying
> CPU, not the compat mode?
>

Yes, the firmware-provided values are tied to the underlying CPU, not
to the compat mode.


> In which case a check for !PVR_POWER9 would seem to achieve what we
> want?
> 
> cheers
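For comparison, the !PVR_POWER9 alternative can be modelled the same way. Again this is a hypothetical sketch, not kernel code: fixup_unless_power9() is an illustrative name, it looks only at the raw PVR, and the PVR constants are reproduced here and should be verified against arch/powerpc/include/asm/reg.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reproduced from arch/powerpc/include/asm/reg.h; verify before reuse. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)
#define PVR_POWER8	0x004D
#define PVR_POWER9	0x004E
#define PVR_POWER10	0x0080

/* Models: !pvr_version_is(PVR_POWER9), using the raw (physical) PVR. */
bool fixup_unless_power9(uint32_t pvr)
{
	return PVR_VER(pvr) != PVR_POWER9;
}
```

Because it tests the raw PVR, a POWER10 LPAR in POWER9 compat mode still gets the fixup. Unlike the ARCH_31/PVR_POWER10 check, however, it also returns true for POWER8 and earlier PVRs, so those platforms would use the firmware-provided values too; whether that matters depends on what firmware advertises there, which this thread does not say.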

--
Thanks and Regards
gautham.
