* [PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline
@ 2019-10-22 10:33 Gautham R. Shenoy
2019-10-22 10:33 ` [PATCH v2 1/1] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off" Gautham R. Shenoy
2019-10-25 23:03 ` [PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline Nathan Lynch
0 siblings, 2 replies; 6+ messages in thread
From: Gautham R. Shenoy @ 2019-10-22 10:33 UTC
To: Nathan Lynch, Michael Ellerman, Nicholas Piggin, Tyrel Datwyler,
Vaidyanathan Srinivasan, Kamalesh Babulal, Naveen N. Rao,
Aneesh Kumar K.V
Cc: linux-kernel, linuxppc-dev, Gautham R. Shenoy
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
This is the v2 of the fix to change the default behaviour of cede_offline.
The previous version can be found here: https://lkml.org/lkml/2019/9/12/222
The main change from v1 is that patch 2, which created a sysfs file to
report and control the value of cede_offline_enabled, has been dropped.
Problem Description:
====================
Currently on pSeries Linux guests, an offlined CPU can be put into one
of the following two states:
- Long term processor cede (also called extended cede)
- Returned to the Hypervisor via RTAS "stop-self" call.
This is controlled by the kernel boot parameter "cede_offline=on/off".
By default the offlined CPUs enter extended cede. The PHYP hypervisor
considers CPUs in extended cede to be "active" since the CPUs are
still under the control of the Linux Guests. Hence, when we change the
SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
will continue to count the values for offlined CPUs in extended cede
as if they are online.
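As a rough illustration of how the boot parameter selects between the two
offline states, the following sketch resolves the effective setting from a
kernel command line string. The function name and the "default" argument
are hypothetical; only the "cede_offline=on/off" spellings named above are
handled, and the kernel's own parser may accept other forms.

```python
def cede_offline_enabled(cmdline, default=True):
    """Return the effective cede_offline setting for a command line string.

    default=True models the behaviour described above, where offlined
    CPUs enter extended cede unless the parameter says otherwise.
    """
    enabled = default
    for token in cmdline.split():
        # Only the two documented spellings are recognised in this sketch.
        if token == "cede_offline=on":
            enabled = True
        elif token == "cede_offline=off":
            enabled = False
    return enabled


def offline_state(cmdline, default=True):
    """Name the state an offlined CPU would be put into."""
    if cede_offline_enabled(cmdline, default):
        return "extended cede"
    return "RTAS stop-self"
```

For example, offline_state("root=/dev/sda1 cede_offline=off") yields
"RTAS stop-self", while an empty command line falls back to whatever
default is passed in.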
One of the expectations with PURR is that, for an interval of time,
the sum of the PURR increments across the online CPUs of a core should
equal the number of timebase ticks for that interval.
This is currently not the case.
In the following data (Generated using
https://github.com/gautshen/misc/blob/master/purr_tb.py):
SD-PURR = Sum of PURR increments on online CPUs of that core in 1 second
SMT=off
===========================================
Core                SD-PURR      SD-PURR
                    (expected)   (observed)
===========================================
core00 [ 0]          512000000     69883784
core01 [ 8]          512000000     88782536
core02 [ 16]         512000000     94296824
core03 [ 24]         512000000     80951968
SMT=2
===========================================
Core                SD-PURR      SD-PURR
                    (expected)   (observed)
===========================================
core00 [ 0,1]        512000000    136147792
core01 [ 8,9]        512000000    128636784
core02 [ 16,17]      512000000    135426488
core03 [ 24,25]      512000000    153027520
SMT=4
===================================================
Core                        SD-PURR      SD-PURR
                            (expected)   (observed)
===================================================
core00 [ 0,1,2,3]            512000000    258331616
core01 [ 8,9,10,11]          512000000    274220072
core02 [ 16,17,18,19]        512000000    260013736
core03 [ 24,25,26,27]        512000000    260079672
SMT=on
===================================================================
Core                                        SD-PURR      SD-PURR
                                            (expected)   (observed)
===================================================================
core00 [ 0,1,2,3,4,5,6,7]                    512000000    512941248
core01 [ 8,9,10,11,12,13,14,15]              512000000    512936544
core02 [ 16,17,18,19,20,21,22,23]            512000000    512931544
core03 [ 24,25,26,27,28,29,30,31]            512000000    512923800
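The gap between the expected and observed columns can be expressed as a
simple coverage ratio. The sketch below is not the purr_tb.py script; it
only redoes the comparison on the core00 values copied from the tables
above. The helper names and the 1% tolerance are assumptions made for
illustration.

```python
# Timebase ticks in one second, as used for the "expected" column.
TB_TICKS_PER_SEC = 512_000_000


def purr_coverage(sd_purr, tb_ticks=TB_TICKS_PER_SEC):
    """Fraction of the interval's timebase ticks accounted for by SD-PURR."""
    return sd_purr / tb_ticks


def matches_timebase(sd_purr, tb_ticks=TB_TICKS_PER_SEC, tolerance=0.01):
    """True if SD-PURR is within `tolerance` (hypothetical 1%) of tb_ticks."""
    return abs(sd_purr - tb_ticks) <= tolerance * tb_ticks


# Observed SD-PURR for core00 in each SMT mode, taken from the tables.
observed_core00 = {
    "SMT=off": 69883784,
    "SMT=2": 136147792,
    "SMT=4": 258331616,
    "SMT=on": 512941248,
}

for mode, sd_purr in observed_core00.items():
    print(f"{mode:8s} coverage={purr_coverage(sd_purr):6.1%} "
          f"ok={matches_timebase(sd_purr)}")
```

Only the SMT=on row passes the check. In the other modes the online
threads account for a fraction of the ticks that grows with the number
of online threads, which is consistent with the offlined threads in
extended cede still accumulating PURR.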
This patchset addresses this issue by ensuring that by default, the
offlined CPUs are returned to the Hypervisor via RTAS "stop-self" call
by changing the default value of "cede_offline_enabled" to false.
With the patch, we see that the observed sum of the PURR increments
across the online threads of a core in 1 second matches the number of
timebase ticks in 1 second.
SMT=off
===========================================
Core                SD-PURR      SD-PURR
                    (expected)   (observed)
===========================================
core00 [ 0]          512000000    512527568
core01 [ 8]          512000000    512556128
core02 [ 16]         512000000    512590016
core03 [ 24]         512000000    512589440
SMT=2
===========================================
Core                SD-PURR      SD-PURR
                    (expected)   (observed)
===========================================
core00 [ 0,1]        512000000    512635328
core01 [ 8,9]        512000000    512610416
core02 [ 16,17]      512000000    512639360
core03 [ 24,25]      512000000    512638720
SMT=4
===================================================
Core                        SD-PURR      SD-PURR
                            (expected)   (observed)
===================================================
core00 [ 0,1,2,3]            512000000    512757328
core01 [ 8,9,10,11]          512000000    512727920
core02 [ 16,17,18,19]        512000000    512754712
core03 [ 24,25,26,27]        512000000    512739040
SMT=on
==============================================================
Core                                   SD-PURR      SD-PURR
                                       (expected)   (observed)
==============================================================
core00 [ 0,1,2,3,4,5,6,7]               512000000    512920936
core01 [ 8,9,10,11,12,13,14,15]         512000000    512878728
core02 [ 16,17,18,19,20,21,22,23]       512000000    512921192
core03 [ 24,25,26,27,28,29,30,31]       512000000    512924816
Further, the patch:
- gives an improvement of 5% in offlining of a core on POWER8,
- gives an improvement of 18% in offlining of a core on POWER9,
- causes a regression of 2.5% in onlining of a core on POWER8,
- causes a regression of 4.5% in onlining of a core on POWER9.
POWER8
=================================================================================
| Operation | Patch status  | #Samples | Min  | Max  | Median | Avg    | Stddev |
|           |               |          | (ms) | (ms) | (ms)   | (ms)   |        |
=================================================================================
| Offline   | Without Patch |       20 |  822 | 1232 |    972 |  986.8 | 112.58 |
| Offline   | With Patch    |       20 |  831 | 1152 |    941 |  938.6 |  80.33 |
|-----------|---------------|----------|------|------|--------|--------|--------|
| Online    | Without Patch |       20 | 1460 | 1760 |   1620 | 1591.2 |  82.72 |
| Online    | With Patch    |       20 | 1489 | 1839 |   1629 | 1629.6 |  94.90 |
=================================================================================
POWER9
=================================================================================
| Operation | Patch status  | #Samples | Min  | Max  | Median | Avg    | Stddev |
|           |               |          | (ms) | (ms) | (ms)   | (ms)   |        |
=================================================================================
| Offline   | Without Patch |       20 | 1120 | 1653 |   1394 | 1392.9 | 133.63 |
| Offline   | With Patch    |       20 |  930 | 1316 |   1161 | 1130.8 | 117.76 |
|-----------|---------------|----------|------|------|--------|--------|--------|
| Online    | Without Patch |       20 | 1652 | 2108 |   1903 | 1891.6 | 130.74 |
| Online    | With Patch    |       20 | 1824 | 2222 |   1960 | 1976.1 |  93.98 |
=================================================================================
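The percentages quoted above appear to be derived from the Avg column of
the two tables; that is an inference, not something the cover letter
states. A minimal sketch of the arithmetic, with a hypothetical helper
name, is:

```python
def percent_change(before_ms, after_ms):
    """Relative change from before to after, in percent.

    Negative means the operation got faster with the patch.
    """
    return (after_ms - before_ms) / before_ms * 100.0


# (without patch, with patch) averages in ms, copied from the tables.
avg_ms = {
    ("POWER8", "Offline"): (986.8, 938.6),
    ("POWER8", "Online"): (1591.2, 1629.6),
    ("POWER9", "Offline"): (1392.9, 1130.8),
    ("POWER9", "Online"): (1891.6, 1976.1),
}

for (machine, operation), (without, with_patch) in avg_ms.items():
    change = percent_change(without, with_patch)
    print(f"{machine} {operation:7s} {change:+.1f}%")
```

The results land close to the rounded figures in the summary: roughly a
5% and 19% reduction in offline time, and roughly a 2.4% and 4.5%
increase in online time.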
Gautham R. Shenoy (1):
pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"
Documentation/core-api/cpu_hotplug.rst | 2 +-
arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++++++++++-
2 files changed, 12 insertions(+), 2 deletions(-)
--
1.9.4
* [PATCH v2 1/1] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"
From: Gautham R. Shenoy @ 2019-10-22 10:33 UTC
To: Nathan Lynch, Michael Ellerman, Nicholas Piggin, Tyrel Datwyler,
Vaidyanathan Srinivasan, Kamalesh Babulal, Naveen N. Rao,
Aneesh Kumar K.V
Cc: linux-kernel, linuxppc-dev, Gautham R. Shenoy
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Currently on pSeries Linux guests, an offlined CPU can be put into one
of the following two states:
- Long term processor cede (also called extended cede)
- Returned to the hypervisor via RTAS "stop-self" call.
This is controlled by the kernel boot parameter "cede_offline=on/off".
By default the offlined CPUs enter extended cede. The PHYP hypervisor
considers CPUs in extended cede to be "active" since they are still
under the control of the Linux guests. Hence, when we change the SMT
modes by offlining the secondary CPUs, the PURR and the RWMR SPRs will
continue to count the values for offlined CPUs in extended cede as if
they are online. This breaks the accounting in tools such as lparstat.
To fix this, ensure that by default the offlined CPUs are returned to
the hypervisor via RTAS "stop-self" call by changing the default value
of "cede_offline_enabled" to false.
Fixes: 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
Documentation/core-api/cpu_hotplug.rst | 2 +-
arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 +++++++++++-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index 4a50ab7..5319593 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -53,7 +53,7 @@ Command Line Switches
``cede_offline={"off","on"}``
Use this option to disable/enable putting offlined processors to an extended
``H_CEDE`` state on supported pseries platforms. If nothing is specified,
- ``cede_offline`` is set to "on".
+ ``cede_offline`` is set to "off".
This option is limited to the PowerPC architecture.
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index bbda646..f9d0366 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -46,7 +46,17 @@ static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
-static bool cede_offline_enabled __read_mostly = true;
+/*
+ * Determines whether the offlined CPUs should be put to a long term
+ * processor cede (called extended cede) for power-saving
+ * purposes. The CPUs in extended cede are still with the Linux Guest
+ * and are not returned to the Hypervisor.
+ *
+ * By default, the offlined CPUs are returned to the hypervisor via
+ * RTAS "stop-self". This behaviour can be changed by passing the
+ * kernel commandline parameter "cede_offline=on".
+ */
+static bool cede_offline_enabled __read_mostly;
/*
* Enable/disable cede_offline when available.
--
1.9.4
* Re: [PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline
From: Nathan Lynch @ 2019-10-25 23:03 UTC
To: Gautham R. Shenoy
Cc: linux-kernel, linuxppc-dev, Michael Ellerman, Nicholas Piggin,
Tyrel Datwyler, Vaidyanathan Srinivasan, Kamalesh Babulal,
Naveen N. Rao, Aneesh Kumar K.V
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
> This is the v2 of the fix to change the default behaviour of
> cede_offline.
OK, but why keep the cede offline behavior at all? Can we remove it? I
think doing so would allow us to remove all the code that temporarily
onlines threads for partition migration.
* Re: [PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline
From: Gautham R Shenoy @ 2019-10-29 11:13 UTC
To: Nathan Lynch
Cc: Gautham R. Shenoy, linux-kernel, linuxppc-dev, Michael Ellerman,
Nicholas Piggin, Tyrel Datwyler, Vaidyanathan Srinivasan,
Kamalesh Babulal, Naveen N. Rao, Aneesh Kumar K.V
Hello Nathan,
On Fri, Oct 25, 2019 at 06:03:26PM -0500, Nathan Lynch wrote:
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
> > This is the v2 of the fix to change the default behaviour of
> > cede_offline.
>
> OK, but why keep the cede offline behavior at all? Can we remove it? I
> think doing so would allow us to remove all the code that temporarily
> onlines threads for partition migration.
Maybe I am missing something, but don't we want all the CPUs to come
online and execute the H_JOIN hcall before performing partition
migration? How will this change whether the offlined CPUs are in
H_CEDE or rtas-stop-self?
--
Thanks and Regards
gautham.
* Re: [PATCH v2 0/1] pseries/hotplug: Change the default behaviour of cede_offline
From: Nathan Lynch @ 2019-10-29 15:29 UTC
To: Gautham R Shenoy
Cc: linux-kernel, linuxppc-dev, Michael Ellerman, Nicholas Piggin,
Tyrel Datwyler, Vaidyanathan Srinivasan, Kamalesh Babulal,
Naveen N. Rao, Aneesh Kumar K.V
Gautham R Shenoy <ego@linux.vnet.ibm.com> writes:
> On Fri, Oct 25, 2019 at 06:03:26PM -0500, Nathan Lynch wrote:
>> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
>> > This is the v2 of the fix to change the default behaviour of
>> > cede_offline.
>>
>> OK, but why keep the cede offline behavior at all? Can we remove it? I
>> think doing so would allow us to remove all the code that temporarily
>> onlines threads for partition migration.
>
> Maybe I am missing something, but don't we want all the CPUs to come
> online and execute the H_JOIN hcall before performing partition
> migration? How will this change whether the offlined CPUs are in
> H_CEDE or rtas-stop-self?
The platform considers threads in H_CEDE to be active. It considers
threads that have performed stop-self to be inactive until they have
been restarted. The Thread Join Option section of the PAPR says active
threads must perform the H_JOIN. I have confirmed with hypervisor
development that this implies that the OS needn't involve inactive
threads in the join/suspend sequence.
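The rule described above can be written down as a toy model. This is only
an illustration of the active/inactive distinction, not kernel or
hypervisor code; the enum, the function names, and the CPU numbering are
all hypothetical.

```python
from enum import Enum


class ThreadState(Enum):
    ONLINE = "online"
    EXTENDED_CEDE = "extended cede"  # offlined while cede offline is in use
    STOPPED = "stop-self"            # offlined via RTAS stop-self


def is_active(state):
    """The platform treats every thread that has not done stop-self as active."""
    return state is not ThreadState.STOPPED


def threads_needing_join(states):
    """Return the CPUs that would have to perform H_JOIN, in CPU order."""
    return [cpu for cpu, state in sorted(states.items()) if is_active(state)]


# One SMT4 core with only the first thread online.
with_cede = {0: ThreadState.ONLINE, 1: ThreadState.EXTENDED_CEDE,
             2: ThreadState.EXTENDED_CEDE, 3: ThreadState.EXTENDED_CEDE}
with_stop_self = {0: ThreadState.ONLINE, 1: ThreadState.STOPPED,
                  2: ThreadState.STOPPED, 3: ThreadState.STOPPED}

print(threads_needing_join(with_cede))
print(threads_needing_join(with_stop_self))
```

In this model the ceded threads still appear in the join list, while
threads offlined with stop-self drop out of it.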
It isn't quite explicit in the log for 120496ac2d2d ("powerpc: Bring all
threads online prior to migration/hibernation"), but it stands to reason
that using cede for offline is the reason that the code to online all
threads for join/suspend was introduced in the first place.
* Re: [PATCH v2 1/1] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off"
From: Nathan Lynch @ 2019-10-31 16:30 UTC
To: Gautham R. Shenoy
Cc: linux-kernel, linuxppc-dev, Michael Ellerman, Nicholas Piggin,
Tyrel Datwyler, Vaidyanathan Srinivasan, Kamalesh Babulal,
Naveen N. Rao, Aneesh Kumar K.V
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> Currently on PSeries Linux guests, the offlined CPU can be put to one
> of the following two states:
> - Long term processor cede (also called extended cede)
> - Returned to the hypervisor via RTAS "stop-self" call.
>
> This is controlled by the kernel boot parameter "cede_offline=on/off".
>
> By default the offlined CPUs enter extended cede. The PHYP hypervisor
> considers CPUs in extended cede to be "active" since they are still
> under the control of the Linux guests. Hence, when we change the SMT
> modes by offlining the secondary CPUs, the PURR and the RWMR SPRs will
> continue to count the values for offlined CPUs in extended cede as if
> they are online. This breaks the accounting in tools such as lparstat.
>
> To fix this, ensure that by default the offlined CPUs are returned to
> the hypervisor via RTAS "stop-self" call by changing the default value
> of "cede_offline_enabled" to false.
>
> Fixes: commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU
> into an appropriate offline state")
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
I'm OK with changing the default as a precursor to removing the code
that implements the cede offline mode.
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>