All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
To: Michal Such?nek <msuchanek@suse.de>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, joedecke@de.ibm.com,
	linux-pm@vger.kernel.org
Subject: Re: [PATCH] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards
Date: Fri, 23 Apr 2021 23:59:30 +0530	[thread overview]
Message-ID: <YIMSCjTzcSwjQtRi@drishya.in.ibm.com> (raw)
In-Reply-To: <20210423174505.GE6564@kitsune.suse.cz>

* Michal Such?nek <msuchanek@suse.de> [2021-04-23 19:45:05]:

> On Fri, Apr 23, 2021 at 09:29:39PM +0530, Vaidyanathan Srinivasan wrote:
> > * Michal Such?nek <msuchanek@suse.de> [2021-04-23 09:35:51]:
> > 
> > > On Thu, Apr 22, 2021 at 08:37:29PM +0530, Gautham R. Shenoy wrote:
> > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > > > 
> > > > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> > > > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> > > > of the Extended CEDE states advertised by the platform
> > > > 
> > > > On some of the POWER9 LPARs, the older firmwares advertise a very low
> > > > value of 2us for CEDE1 exit latency on a Dedicated LPAR. However the
> > > Can you be more specific about 'older firmwares'?
> > 
> > Hi Michal,
> > 
> > This is POWER9 vs POWER10 difference, not really an obsolete FW.  The
> > key idea behind the original patch was to make the H_CEDE latency and
> > hence target residency come from firmware instead of being decided by
> > the kernel.  The advantage is such that, different type of systems in
> > POWER10 generation can adjust this value and have an optimal H_CEDE
> > entry criteria which balances good single thread performance and
> > wakeup latency.  Further we can have additional H_CEDE state to feed
> > into the cpuidle.  
> 
> So all POWER9 machines are affected by the firmware bug where firmware
> reports CEDE1 exit latency of 2us and the real latency is 5us which
> causes the kernel to prefer CEDE1 too much when relying on the values
> supplied by the firmware. It is not about 'older firmware'.

Correct.  All POWER9 systems running Linux as guest LPARs will see
extra usage of CEDE idle state, but not baremetal (PowerNV).

The correct definition of the bug or miss-match in expectation is that
firmware reports wakeup latency from a core/thread wakeup timing, but
not end-to-end time from sending a wakeup event like an IPI using
H_calls and receiving the events on the target.  Practically there are
few extra micro-seconds needed after deciding to wakeup a target
core/thread to getting the target to start executing instructions
within the LPAR instance.

> I still think it would be preferrable to adjust the latency value
> reported by the firmware to match reality over a kernel workaround.

Right, practically we can fix for future releases and as such we
targeted this scheme from POWER10 but expected no harm on POWER9 which
proved to be wrong.

We can possibly change this FW value for POWER9, but it is too
expensive and not practical because many release streams exist for
different platforms and further customers are at different streams as
well.  We cannot force all of them to update because that blows up
co-dependency matrix.

From a user/customer's view current kernel worked fine, why is
a kernel update changing my performance :(

Looking back, we should have boxed any kernel-firmware dependent
feature like this one to a future releases only.  We have all options
open for a future release like POWER10, but on a released product
stream, we have to manage with kernel changes.  In this specific case
we should have been very conservative and not allow the kernel to
change behaviour on released products.

Thanks,
Vaidy


WARNING: multiple messages have this Message-ID (diff)
From: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
To: Michal Such?nek <msuchanek@suse.de>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	linux-pm@vger.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	joedecke@de.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards
Date: Fri, 23 Apr 2021 23:59:30 +0530	[thread overview]
Message-ID: <YIMSCjTzcSwjQtRi@drishya.in.ibm.com> (raw)
In-Reply-To: <20210423174505.GE6564@kitsune.suse.cz>

* Michal Such?nek <msuchanek@suse.de> [2021-04-23 19:45:05]:

> On Fri, Apr 23, 2021 at 09:29:39PM +0530, Vaidyanathan Srinivasan wrote:
> > * Michal Such?nek <msuchanek@suse.de> [2021-04-23 09:35:51]:
> > 
> > > On Thu, Apr 22, 2021 at 08:37:29PM +0530, Gautham R. Shenoy wrote:
> > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > > > 
> > > > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> > > > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> > > > of the Extended CEDE states advertised by the platform
> > > > 
> > > > On some of the POWER9 LPARs, the older firmwares advertise a very low
> > > > value of 2us for CEDE1 exit latency on a Dedicated LPAR. However the
> > > Can you be more specific about 'older firmwares'?
> > 
> > Hi Michal,
> > 
> > This is POWER9 vs POWER10 difference, not really an obsolete FW.  The
> > key idea behind the original patch was to make the H_CEDE latency and
> > hence target residency come from firmware instead of being decided by
> > the kernel.  The advantage is such that, different type of systems in
> > POWER10 generation can adjust this value and have an optimal H_CEDE
> > entry criteria which balances good single thread performance and
> > wakeup latency.  Further we can have additional H_CEDE state to feed
> > into the cpuidle.  
> 
> So all POWER9 machines are affected by the firmware bug where firmware
> reports CEDE1 exit latency of 2us and the real latency is 5us which
> causes the kernel to prefer CEDE1 too much when relying on the values
> supplied by the firmware. It is not about 'older firmware'.

Correct.  All POWER9 systems running Linux as guest LPARs will see
extra usage of CEDE idle state, but not baremetal (PowerNV).

The correct definition of the bug or miss-match in expectation is that
firmware reports wakeup latency from a core/thread wakeup timing, but
not end-to-end time from sending a wakeup event like an IPI using
H_calls and receiving the events on the target.  Practically there are
few extra micro-seconds needed after deciding to wakeup a target
core/thread to getting the target to start executing instructions
within the LPAR instance.

> I still think it would be preferrable to adjust the latency value
> reported by the firmware to match reality over a kernel workaround.

Right, practically we can fix for future releases and as such we
targeted this scheme from POWER10 but expected no harm on POWER9 which
proved to be wrong.

We can possibly change this FW value for POWER9, but it is too
expensive and not practical because many release streams exist for
different platforms and further customers are at different streams as
well.  We cannot force all of them to update because that blows up
co-dependency matrix.

From a user/customer's view current kernel worked fine, why is
a kernel update changing my performance :(

Looking back, we should have boxed any kernel-firmware dependent
feature like this one to a future releases only.  We have all options
open for a future release like POWER10, but on a released product
stream, we have to manage with kernel changes.  In this specific case
we should have been very conservative and not allow the kernel to
change behaviour on released products.

Thanks,
Vaidy


  reply	other threads:[~2021-04-23 18:29 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 15:07 [PATCH] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards Gautham R. Shenoy
2021-04-22 15:07 ` Gautham R. Shenoy
2021-04-22 17:57 ` Vaidyanathan Srinivasan
2021-04-22 17:57   ` Vaidyanathan Srinivasan
2021-04-23  7:35 ` Michal Suchánek
2021-04-23  7:35   ` Michal Suchánek
2021-04-23 15:59   ` Vaidyanathan Srinivasan
2021-04-23 15:59     ` Vaidyanathan Srinivasan
2021-04-23 17:45     ` Michal Suchánek
2021-04-23 17:45       ` Michal Suchánek
2021-04-23 18:29       ` Vaidyanathan Srinivasan [this message]
2021-04-23 18:29         ` Vaidyanathan Srinivasan
2021-04-23 18:42         ` Michal Suchánek
2021-04-23 18:42           ` Michal Suchánek
2021-04-24  7:37           ` Vaidyanathan Srinivasan
2021-04-24  7:37             ` Vaidyanathan Srinivasan
2021-04-25 11:07             ` Michal Suchánek
2021-04-25 11:07               ` Michal Suchánek
2021-04-28  5:58               ` Gautham R Shenoy
2021-04-28  5:58                 ` Gautham R Shenoy
2021-04-28  8:03                 ` Michal Suchánek
2021-04-28  8:03                   ` Michal Suchánek
2021-04-28 11:34                   ` Gautham R Shenoy
2021-04-28 11:34                     ` Gautham R Shenoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YIMSCjTzcSwjQtRi@drishya.in.ibm.com \
    --to=svaidy@linux.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=ego@linux.vnet.ibm.com \
    --cc=joedecke@de.ibm.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=msuchanek@suse.de \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.