linux-kernel.vger.kernel.org archive mirror
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Kamalesh Babulal <kamaleshb@in.ibm.com>,
	"Naveen N . Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Tyrel Datwyler <tyreld@linux.ibm.com>
Subject: Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline
Date: Wed, 18 Sep 2019 18:00:39 +0530
Message-ID: <20190918123039.GA12534@in.ibm.com>
In-Reply-To: <87a7b2rfj0.fsf@linux.ibm.com>

Hello Nathan, Michael,

On Tue, Sep 17, 2019 at 12:36:35PM -0500, Nathan Lynch wrote:
> Gautham R Shenoy <ego@linux.vnet.ibm.com> writes:
> > On Thu, Sep 12, 2019 at 10:39:45AM -0500, Nathan Lynch wrote:
> >> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
> >> > The patchset also defines a new sysfs attribute
> >> > "/sys/devices/system/cpu/cede_offline_enabled" on PSeries Linux guests
> >> > to allow userspace programs to change, at runtime, the state into
> >> > which offlined CPUs are put.
> >> 
> >> A boolean sysfs interface will become awkward if we need to add another
> >> mode in the future.
> >> 
> >> What do you think about naming the attribute something like
> >> 'offline_mode', with the possible values 'extended-cede' and
> >> 'rtas-stopped'?
> >
> > We can do that. However, IMHO in the longer term, on PSeries guests,
> > we should have only one offline state: rtas-stopped. The reason is
> > that on Linux, an SMT switch is brought into effect through the CPU
> > Hotplug interface, and rtas-stopped is the only state in which
> > hypervisors such as PHYP will recognize the SMT switch.
> 
> OK. Why "longer term" though, instead of doing it now?

Because turning extended-cede into a cpuidle state is non-trivial,
since a CPU in that state does not respond to external interrupts. We
will need additional changes in the IPI, Timer and Interrupt code to
ensure that these get translated to an H_PROD in order to wake up the
target CPU in extended CEDE.

Timer: This is relatively easy since the cpuidle infrastructure has the
       timer-offload framework (used for fastsleep on POWER8), where we
       can offload the timers of an idling CPU to another CPU, which
       can wake up the idling CPU via an IPI when the timer expires.

IPIs: We need to ensure that icp_hv_set_qirr() correctly sends H_IPI
      or H_PROD depending on whether or not the target CPU is in
      extended CEDE.

Interrupts: Either we migrate away the interrupts from the CPU that is
            entering extended CEDE or we prevent a CPU that is the
            sole target for an interrupt from entering extended CEDE.
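To make the three items above concrete, here is a minimal userspace sketch (not kernel code) of the decisions involved. Everything in it is hypothetical: cpu_state[], ipi_method(), pick_timer_host(), timer_expiry_wakeup() and may_enter_extended_cede() are illustrative names, and the sketch only reports which wakeup hcall would be needed; it does not issue H_IPI or H_PROD.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU idle bookkeeping, for illustration only. */
enum cpu_idle_state { CPU_RUNNING, CPU_SNOOZE, CPU_EXTENDED_CEDE };

/* Which hcall a wakeup would need; no hcall is actually made here. */
enum wake_hcall { WAKE_H_IPI, WAKE_H_PROD };

static enum cpu_idle_state cpu_state[NR_CPUS];

/* IPIs: a CPU in extended CEDE does not respond to H_IPI, so it has to be
 * woken with H_PROD instead (the choice icp_hv_set_qirr() would need to make). */
static enum wake_hcall ipi_method(int target)
{
    return cpu_state[target] == CPU_EXTENDED_CEDE ? WAKE_H_PROD : WAKE_H_IPI;
}

/* Timers: pick another CPU, itself not in extended CEDE, to host the
 * timers of a CPU entering extended CEDE. Returns -1 if none exists. */
static int pick_timer_host(int idling)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu != idling && cpu_state[cpu] != CPU_EXTENDED_CEDE)
            return cpu;
    return -1;
}

/* On timer expiry the host wakes the sleeper using whatever method the
 * IPI path selects, i.e. H_PROD if the sleeper is in extended CEDE. */
static enum wake_hcall timer_expiry_wakeup(int sleeper)
{
    return ipi_method(sleeper);
}

/* Interrupts: irq_affinity[i] is a bitmask of the CPUs allowed to take IRQ i.
 * This models the second option: refuse extended CEDE for a CPU that is the
 * sole target of any IRQ (migrating the IRQ away would be the alternative). */
static bool may_enter_extended_cede(int cpu, const unsigned int *irq_affinity,
                                    int nr_irqs)
{
    for (int i = 0; i < nr_irqs; i++)
        if (irq_affinity[i] == (1u << cpu))
            return false;
    return true;
}
```

With CPU 2 in extended CEDE, ipi_method(2) yields WAKE_H_PROD while ipi_method(1) still yields WAKE_H_IPI, which is exactly the per-target distinction the IPI item asks for.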

The accounting problem that "cede_offline=on" causes in tools such as
lparstat is affecting customers who use these tools for capacity
planning. That problem needs a fix in the short term, for which Patch 1
changes the default behaviour of cede_offline from "on" to "off".
Since this change would break existing userspace tools that use the
CPU-Offline infrastructure to fold CPUs for saving power, Patch 2
provides a sysfs interface that allows cede_offline_enabled to be
changed at runtime, so that these tools can cope with minimal change.
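A minimal userspace model of the "cede_offline=" boot parameter with its default flipped to off, as Patch 1 describes. It assumes a kstrtobool()-style parser (y/Y/1/on enable, n/N/0/off disable); setup_cede_offline() and parse_bool() here are illustrative stand-ins, not the code in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Default after Patch 1: offlined CPUs go to rtas-stopped, not extended CEDE. */
static bool cede_offline_enabled = false;

/* Userspace approximation of kstrtobool()-style spellings.
 * *res is written only on success. */
static int parse_bool(const char *s, bool *res)
{
    if (!s)
        return -EINVAL;
    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    case 'o': case 'O':
        if (s[1] == 'n' || s[1] == 'N') { *res = true; return 0; }
        if (s[1] == 'f' || s[1] == 'F') { *res = false; return 0; }
        break;
    }
    return -EINVAL;
}

/* Models handling of "cede_offline=<value>": an unparsable value keeps
 * whatever setting was already in effect. */
static void setup_cede_offline(const char *value)
{
    parse_bool(value, &cede_offline_enabled);
}
```

Users who still want the old behaviour would boot with "cede_offline=on"; everyone else gets rtas-stopped without touching the command line.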

> 
> 
> > All other states (such as extended-cede) should in the long-term be
> > exposed via the cpuidle interface.
> >
> > With this in mind, I made the sysfs interface boolean to mirror the
> > current "cede_offline" commandline parameter. Eventually when we have
> > only one offline-state, we can deprecate the commandline parameter as
> > well as the sysfs interface.
> 
> I don't care for adding a sysfs interface that is intended from the
> beginning to become vestigial...

Fair point. Come to think of it, if the behaviour of the cpuidle menu
governor doesn't match the expectations set by the current userspace
solutions for folding idle CPUs for power savings, it would be useful
to keep this option around so that existing users who prefer the
userspace solution can continue to use it.
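If the option does stay around, the string-valued 'offline_mode' attribute suggested earlier in the thread extends more gracefully than a boolean. A minimal sketch of parsing such a value, assuming the two mode names proposed above; the enum and parse_offline_mode() are hypothetical. In-kernel code would more likely use the existing sysfs_match_string() helper, which likewise tolerates the trailing newline that echo adds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offline modes, named after the values suggested in the thread. */
enum offline_mode {
    OFFLINE_MODE_INVALID = -1,
    OFFLINE_MODE_EXTENDED_CEDE,
    OFFLINE_MODE_RTAS_STOPPED,
};

static const char *const offline_mode_names[] = {
    [OFFLINE_MODE_EXTENDED_CEDE] = "extended-cede",
    [OFFLINE_MODE_RTAS_STOPPED]  = "rtas-stopped",
};

/* Parse a value written to the attribute, ignoring a trailing newline.
 * Supporting a new mode later only means adding one table entry. */
static enum offline_mode parse_offline_mode(const char *buf)
{
    size_t len = strcspn(buf, "\n");
    size_t n = sizeof(offline_mode_names) / sizeof(offline_mode_names[0]);

    for (size_t i = 0; i < n; i++)
        if (strlen(offline_mode_names[i]) == len &&
            strncmp(buf, offline_mode_names[i], len) == 0)
            return (enum offline_mode)i;
    return OFFLINE_MODE_INVALID;
}
```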

> 
> This strikes me as unnecessarily incremental if you're changing the
> default offline state. Any user space programs depending on the current
> behavior will have to change anyway (and why is it OK to break them?)
>

Yes, the current userspace programs will need to be modified to check
for the sysfs interface and set cede_offline_enabled to 1.
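A minimal sketch of that check in a userspace tool, assuming the attribute lands at /sys/devices/system/cpu/cede_offline_enabled (the usual sysfs location for CPU attributes) and accepts "1". enable_cede_offline() is a hypothetical helper. It opens without O_CREAT so that a missing attribute is reported as -ENOENT, which a tool could take to mean the running kernel lacks the interface (presumably one that still defaults to cede_offline=on).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed path of the attribute proposed in Patch 2/2. */
#define CEDE_OFFLINE_ATTR "/sys/devices/system/cpu/cede_offline_enabled"

/* Returns 0 on success, -ENOENT if the attribute does not exist, or
 * another negative errno value on failure. */
static int enable_cede_offline(const char *path)
{
    const char *val = "1\n";
    size_t len = strlen(val);
    int ret = 0;

    /* No O_CREAT: a missing attribute must fail, not create a file. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, val, len);
    if (n < 0)
        ret = -errno;           /* the kernel's store handler rejected it */
    else if ((size_t)n != len)
        ret = -EIO;             /* short write */

    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}
```

A tool would call enable_cede_offline(CEDE_OFFLINE_ATTR) before folding CPUs and simply carry on if it returns -ENOENT.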

> Why isn't the plan:
> 
>   1. Add extended cede support to the pseries cpuidle driver
>   2. Make stop-self the only cpu offline state for pseries (no sysfs
>      interface necessary)

This is the plan, except that step 1 requires some additional work;
this patchset is proposed as a short-term mitigation until we get
step 1 right.

> 
> ?

--
Thanks and Regards
gautham.


Thread overview: 15+ messages
2019-09-12 10:35 [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline Gautham R. Shenoy
2019-09-12 10:35 ` [PATCH 1/2] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off" Gautham R. Shenoy
2019-09-12 10:35 ` [PATCH 2/2] pseries/hotplug-cpu: Add sysfs attribute for cede_offline Gautham R. Shenoy
2019-09-12 15:39 ` [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline Nathan Lynch
2019-09-15  7:42   ` Gautham R Shenoy
2019-09-17 17:36     ` Nathan Lynch
2019-09-18  5:17       ` Michael Ellerman
2019-09-18 12:30       ` Gautham R Shenoy [this message]
2019-09-18 17:08         ` Nathan Lynch
2019-09-18  5:14 ` Michael Ellerman
2019-09-18  6:52   ` Naveen N. Rao
2019-09-18 11:31     ` Michael Ellerman
2019-09-18 13:38       ` Aneesh Kumar K.V
2019-09-18 16:24       ` Naveen N. Rao
2019-09-18 12:51   ` Gautham R Shenoy
