linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks
@ 2020-03-27 11:32 Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 1/6] powerpc: Move idle_loop_prolog()/epilog() functions to header file Gautham R. Shenoy
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This is the fourth version of the patches to track and expose idle PURR
and SPURR ticks. These patches are required by tools such as lparstat
to compute system utilization for capacity planning purposes.

The previous versions can be found here:
v3: https://lkml.org/lkml/2020/3/11/331
v2: https://lkml.org/lkml/2020/2/21/21
v1: https://lore.kernel.org/patchwork/cover/1159341/

They key changes from v3 are:

   - Fixed the build errors on !CONFIG_PPC64 and !CONFIG_PPC_PSERIES
     configurations notified by the kbuild bot.


Motivation:
===========
On PSeries LPARs, the data centers planners desire a more accurate
view of system utilization per resource such as CPU to plan the system
capacity requirements better. Such accuracy can be obtained by reading
PURR/SPURR registers for CPU resource utilization.

Tools such as lparstat which are used to compute the utilization need
to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
counters are already exposed through sysfs.  We already account for
PURR ticks when we go to idle so that we can update the VPA area. This
patchset extends support to account for SPURR ticks when idle, and
expose both via per-cpu sysfs files.

This patch series also introduces a patch (Patch 6/6) to send an IPI
in order to read and cache the values of purr, spurr, idle_purr and
idle_spurr of the target CPU when any one of them is read via
sysfs. These cached values will be presented if any of these sysfs are
read within the next 10ms. If these sysfs files are read after 10ms
from the earlier IPI, a fresh IPI is issued to read and cache the
values again. This minimizes the number of IPIs required to be sent
when these values are read back-to-back via the sysfs interface.

    Without patch 6/6 (Without caching): 
                 16 [XICS 2 Edge IPI] = 422 times
                 DBL [Doorbell interrupts] = 13 times
                 Total : 435 IPIs.
    
    With patch 6/6 (With caching):
                  16 [XICS 2 Edge IPI] = 111 times
                  DBL [Doorbell interrupts] = 17 times
                  Total : 128 IPIs.

These patches are required for enhancement to the lparstat utility
that compute the CPU utilization based on PURR and SPURR which can be
found here :
https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4


With the patches, when lparstat is run on a LPAR running CPU-Hogs,
=========================================================================
sudo ./src/lparstat -E 1 3
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=2 mem=4834176 kB cpus=0 ent=2.00 
---Actual---                 -Normalized-
%busy  %idle   Frequency     %busy  %idle
------ ------  ------------- ------ ------
 99.99   0.00  3.35GHz[111%] 110.99   0.00
100.00   0.00  3.35GHz[111%] 111.00   0.00
100.00   0.00  3.35GHz[111%] 111.00   0.00

With patches, when lparstat is run on and idle LPAR
=========================================================================
---Actual---                 -Normalized-
%busy  %idle   Frequency     %busy  %idle
------ ------  ------------- ------ ------
0.20  99.81  2.17GHz[ 72%]   0.19  71.82
0.42  99.58  2.11GHz[ 70%]   0.31  69.69
0.41  99.59  2.11GHz[ 70%]   0.31  69.69

Gautham R. Shenoy (6):
  powerpc: Move idle_loop_prolog()/epilog() functions to header file
  powerpc/idle: Add accessor function to always read latest idle PURR
  powerpc/pseries: Account for SPURR ticks on idle CPUs
  powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
  Documentation: Document sysfs interfaces purr, spurr, idle_purr,
    idle_spurr
  pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr

 Documentation/ABI/testing/sysfs-devices-system-cpu |  39 +++++
 arch/powerpc/include/asm/idle.h                    |  93 ++++++++++++
 arch/powerpc/kernel/sysfs.c                        | 167 ++++++++++++++++++++-
 arch/powerpc/platforms/pseries/setup.c             |   8 +-
 drivers/cpuidle/cpuidle-pseries.c                  |  39 +----
 5 files changed, 305 insertions(+), 41 deletions(-)
 create mode 100644 arch/powerpc/include/asm/idle.h

-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/6] powerpc: Move idle_loop_prolog()/epilog() functions to header file
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR Gautham R. Shenoy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently prior to entering an idle state on a Linux Guest, the
pseries cpuidle driver implement an idle_loop_prolog() and
idle_loop_epilog() functions which ensure that idle_purr is correctly
computed, and the hypervisor is informed that the CPU cycles have been
donated.

These prolog and epilog functions are also required in the default
idle call, i.e pseries_lpar_idle(). Hence move these accessor
functions to a common header file and call them from
pseries_lpar_idle(). Since the existing header files such as
asm/processor.h have enough clutter, create a new header file
asm/idle.h. Finally rename idle_loop_prolog() and idle_loop_epilog()
to pseries_idle_prolog() and pseries_idle_epilog() as they are only
relavent for on pseries guests.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/idle.h        | 31 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/setup.c |  7 +++++--
 drivers/cpuidle/cpuidle-pseries.c      | 36 +++++++---------------------------
 3 files changed, 43 insertions(+), 31 deletions(-)
 create mode 100644 arch/powerpc/include/asm/idle.h

diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
new file mode 100644
index 0000000..32064a4c
--- /dev/null
+++ b/arch/powerpc/include/asm/idle.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _ASM_POWERPC_IDLE_H
+#define _ASM_POWERPC_IDLE_H
+#include <asm/runlatch.h>
+#include <asm/paca.h>
+
+#ifdef CONFIG_PPC_PSERIES
+static inline void pseries_idle_prolog(unsigned long *in_purr)
+{
+	ppc64_runlatch_off();
+	*in_purr = mfspr(SPRN_PURR);
+	/*
+	 * Indicate to the HV that we are idle. Now would be
+	 * a good time to find other work to dispatch.
+	 */
+	get_lppaca()->idle = 1;
+}
+
+static inline void pseries_idle_epilog(unsigned long in_purr)
+{
+	u64 wait_cycles;
+
+	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
+	wait_cycles += mfspr(SPRN_PURR) - in_purr;
+	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+	get_lppaca()->idle = 0;
+
+	ppc64_runlatch_on();
+}
+#endif /* CONFIG_PPC_PSERIES */
+#endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421d..2f53e6b 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -68,6 +68,7 @@
 #include <asm/isa-bridge.h>
 #include <asm/security_features.h>
 #include <asm/asm-const.h>
+#include <asm/idle.h>
 #include <asm/swiotlb.h>
 #include <asm/svm.h>
 
@@ -319,6 +320,8 @@ static int alloc_dispatch_log_kmem_cache(void)
 
 static void pseries_lpar_idle(void)
 {
+	unsigned long in_purr;
+
 	/*
 	 * Default handler to go into low thread priority and possibly
 	 * low power mode by ceding processor to hypervisor
@@ -328,7 +331,7 @@ static void pseries_lpar_idle(void)
 		return;
 
 	/* Indicate to hypervisor that we are idle. */
-	get_lppaca()->idle = 1;
+	pseries_idle_prolog(&in_purr);
 
 	/*
 	 * Yield the processor to the hypervisor.  We return if
@@ -339,7 +342,7 @@ static void pseries_lpar_idle(void)
 	 */
 	cede_processor();
 
-	get_lppaca()->idle = 0;
+	pseries_idle_epilog(in_purr);
 }
 
 /*
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 74c2479..46d5e05 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -19,6 +19,7 @@
 #include <asm/machdep.h>
 #include <asm/firmware.h>
 #include <asm/runlatch.h>
+#include <asm/idle.h>
 #include <asm/plpar_wrappers.h>
 
 struct cpuidle_driver pseries_idle_driver = {
@@ -31,29 +32,6 @@ struct cpuidle_driver pseries_idle_driver = {
 static u64 snooze_timeout __read_mostly;
 static bool snooze_timeout_en __read_mostly;
 
-static inline void idle_loop_prolog(unsigned long *in_purr)
-{
-	ppc64_runlatch_off();
-	*in_purr = mfspr(SPRN_PURR);
-	/*
-	 * Indicate to the HV that we are idle. Now would be
-	 * a good time to find other work to dispatch.
-	 */
-	get_lppaca()->idle = 1;
-}
-
-static inline void idle_loop_epilog(unsigned long in_purr)
-{
-	u64 wait_cycles;
-
-	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
-	wait_cycles += mfspr(SPRN_PURR) - in_purr;
-	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
-	get_lppaca()->idle = 0;
-
-	ppc64_runlatch_on();
-}
-
 static int snooze_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
@@ -63,7 +41,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
-	idle_loop_prolog(&in_purr);
+	pseries_idle_prolog(&in_purr);
 	local_irq_enable();
 	snooze_exit_time = get_tb() + snooze_timeout;
 
@@ -87,7 +65,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 
 	local_irq_disable();
 
-	idle_loop_epilog(in_purr);
+	pseries_idle_epilog(in_purr);
 
 	return index;
 }
@@ -115,7 +93,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 {
 	unsigned long in_purr;
 
-	idle_loop_prolog(&in_purr);
+	pseries_idle_prolog(&in_purr);
 	get_lppaca()->donate_dedicated_cpu = 1;
 
 	HMT_medium();
@@ -124,7 +102,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	local_irq_disable();
 	get_lppaca()->donate_dedicated_cpu = 0;
 
-	idle_loop_epilog(in_purr);
+	pseries_idle_epilog(in_purr);
 
 	return index;
 }
@@ -135,7 +113,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 {
 	unsigned long in_purr;
 
-	idle_loop_prolog(&in_purr);
+	pseries_idle_prolog(&in_purr);
 
 	/*
 	 * Yield the processor to the hypervisor.  We return if
@@ -147,7 +125,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 	check_and_cede_processor();
 
 	local_irq_disable();
-	idle_loop_epilog(in_purr);
+	pseries_idle_epilog(in_purr);
 
 	return index;
 }
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 1/6] powerpc: Move idle_loop_prolog()/epilog() functions to header file Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-04-01  9:42   ` Naveen N. Rao
  2020-03-27 11:32 ` [PATCH v4 3/6] powerpc/pseries: Account for SPURR ticks on idle CPUs Gautham R. Shenoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently when CPU goes idle, we take a snapshot of PURR via
pseries_idle_prolog() which is used at the CPU idle exit to compute
the idle PURR cycles via the function pseries_idle_epilog().  Thus,
the value of idle PURR cycle thus read before pseries_idle_prolog() and
after pseries_idle_epilog() is always correct.

However, if we were to read the idle PURR cycles from an interrupt
context between pseries_idle_prolog() and pseries_idle_epilog() (this will
be done in a future patch), then, the value of the idle PURR thus read
will not include the cycles spent in the most recent idle period.

This patch addresses the issue by providing accessor function to read
the idle PURR such such that it includes the cycles spent in the most
recent idle period, if we read it between pseries_idle_prolog() and
pseries_idle_epilog(). In order to achieve it, the patch saves the
snapshot of PURR in pseries_idle_prolog() in a per-cpu variable,
instead of on the stack, so that it can be accessed from an interrupt
context.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/idle.h        | 47 +++++++++++++++++++++++++++-------
 arch/powerpc/platforms/pseries/setup.c |  7 +++--
 drivers/cpuidle/cpuidle-pseries.c      | 15 +++++------
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
index 32064a4c..d4bfb6a 100644
--- a/arch/powerpc/include/asm/idle.h
+++ b/arch/powerpc/include/asm/idle.h
@@ -5,10 +5,27 @@
 #include <asm/paca.h>
 
 #ifdef CONFIG_PPC_PSERIES
-static inline void pseries_idle_prolog(unsigned long *in_purr)
+DECLARE_PER_CPU(u64, idle_entry_purr_snap);
+
+static inline void snapshot_purr_idle_entry(void)
+{
+	*this_cpu_ptr(&idle_entry_purr_snap) = mfspr(SPRN_PURR);
+}
+
+static inline void update_idle_purr_accounting(void)
+{
+	u64 wait_cycles;
+	u64 in_purr = *this_cpu_ptr(&idle_entry_purr_snap);
+
+	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
+	wait_cycles += mfspr(SPRN_PURR) - in_purr;
+	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+}
+
+static inline void pseries_idle_prolog(void)
 {
 	ppc64_runlatch_off();
-	*in_purr = mfspr(SPRN_PURR);
+	snapshot_purr_idle_entry();
 	/*
 	 * Indicate to the HV that we are idle. Now would be
 	 * a good time to find other work to dispatch.
@@ -16,16 +33,28 @@ static inline void pseries_idle_prolog(unsigned long *in_purr)
 	get_lppaca()->idle = 1;
 }
 
-static inline void pseries_idle_epilog(unsigned long in_purr)
+static inline void pseries_idle_epilog(void)
 {
-	u64 wait_cycles;
-
-	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
-	wait_cycles += mfspr(SPRN_PURR) - in_purr;
-	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+	update_idle_purr_accounting();
 	get_lppaca()->idle = 0;
-
 	ppc64_runlatch_on();
 }
+
+static inline u64 read_this_idle_purr(void)
+{
+	/*
+	 * If we are reading from an idle context, update the
+	 * idle-purr cycles corresponding to the last idle period.
+	 * Since the idle context is not yet over, take a fresh
+	 * snapshot of the idle-purr.
+	 */
+	if (unlikely(get_lppaca()->idle == 1)) {
+		update_idle_purr_accounting();
+		snapshot_purr_idle_entry();
+	}
+
+	return be64_to_cpu(get_lppaca()->wait_state_cycles);
+}
+
 #endif /* CONFIG_PPC_PSERIES */
 #endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2f53e6b..4905c96 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -318,10 +318,9 @@ static int alloc_dispatch_log_kmem_cache(void)
 }
 machine_early_initcall(pseries, alloc_dispatch_log_kmem_cache);
 
+DEFINE_PER_CPU(u64, idle_entry_purr_snap);
 static void pseries_lpar_idle(void)
 {
-	unsigned long in_purr;
-
 	/*
 	 * Default handler to go into low thread priority and possibly
 	 * low power mode by ceding processor to hypervisor
@@ -331,7 +330,7 @@ static void pseries_lpar_idle(void)
 		return;
 
 	/* Indicate to hypervisor that we are idle. */
-	pseries_idle_prolog(&in_purr);
+	pseries_idle_prolog();
 
 	/*
 	 * Yield the processor to the hypervisor.  We return if
@@ -342,7 +341,7 @@ static void pseries_lpar_idle(void)
 	 */
 	cede_processor();
 
-	pseries_idle_epilog(in_purr);
+	pseries_idle_epilog();
 }
 
 /*
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 46d5e05..6513ef2 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -36,12 +36,11 @@ static int snooze_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
 {
-	unsigned long in_purr;
 	u64 snooze_exit_time;
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
-	pseries_idle_prolog(&in_purr);
+	pseries_idle_prolog();
 	local_irq_enable();
 	snooze_exit_time = get_tb() + snooze_timeout;
 
@@ -65,7 +64,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 
 	local_irq_disable();
 
-	pseries_idle_epilog(in_purr);
+	pseries_idle_epilog();
 
 	return index;
 }
@@ -91,9 +90,8 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv,
 				int index)
 {
-	unsigned long in_purr;
 
-	pseries_idle_prolog(&in_purr);
+	pseries_idle_prolog();
 	get_lppaca()->donate_dedicated_cpu = 1;
 
 	HMT_medium();
@@ -102,7 +100,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	local_irq_disable();
 	get_lppaca()->donate_dedicated_cpu = 0;
 
-	pseries_idle_epilog(in_purr);
+	pseries_idle_epilog();
 
 	return index;
 }
@@ -111,9 +109,8 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
 {
-	unsigned long in_purr;
 
-	pseries_idle_prolog(&in_purr);
+	pseries_idle_prolog();
 
 	/*
 	 * Yield the processor to the hypervisor.  We return if
@@ -125,7 +122,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
 	check_and_cede_processor();
 
 	local_irq_disable();
-	pseries_idle_epilog(in_purr);
+	pseries_idle_epilog();
 
 	return index;
 }
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 3/6] powerpc/pseries: Account for SPURR ticks on idle CPUs
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 1/6] powerpc: Move idle_loop_prolog()/epilog() functions to header file Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 4/6] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU Gautham R. Shenoy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

Via pseries_idle_prolog(), pseries_idle_epilog(), we track the idle
PURR ticks in the VPA variable "wait_state_cycles". This patch extends
the support to account for the idle SPURR ticks. It also provides an
accessor function to accurately reads idle SPURR ticks.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/idle.h        | 33 +++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/setup.c |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
index d4bfb6a..accd1f5 100644
--- a/arch/powerpc/include/asm/idle.h
+++ b/arch/powerpc/include/asm/idle.h
@@ -5,13 +5,20 @@
 #include <asm/paca.h>
 
 #ifdef CONFIG_PPC_PSERIES
+DECLARE_PER_CPU(u64, idle_spurr_cycles);
 DECLARE_PER_CPU(u64, idle_entry_purr_snap);
+DECLARE_PER_CPU(u64, idle_entry_spurr_snap);
 
 static inline void snapshot_purr_idle_entry(void)
 {
 	*this_cpu_ptr(&idle_entry_purr_snap) = mfspr(SPRN_PURR);
 }
 
+static inline void snapshot_spurr_idle_entry(void)
+{
+	*this_cpu_ptr(&idle_entry_spurr_snap) = mfspr(SPRN_SPURR);
+}
+
 static inline void update_idle_purr_accounting(void)
 {
 	u64 wait_cycles;
@@ -22,10 +29,19 @@ static inline void update_idle_purr_accounting(void)
 	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
 }
 
+static inline void update_idle_spurr_accounting(void)
+{
+	u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
+	u64 in_spurr = *this_cpu_ptr(&idle_entry_spurr_snap);
+
+	*idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
+}
+
 static inline void pseries_idle_prolog(void)
 {
 	ppc64_runlatch_off();
 	snapshot_purr_idle_entry();
+	snapshot_spurr_idle_entry();
 	/*
 	 * Indicate to the HV that we are idle. Now would be
 	 * a good time to find other work to dispatch.
@@ -36,6 +52,7 @@ static inline void pseries_idle_prolog(void)
 static inline void pseries_idle_epilog(void)
 {
 	update_idle_purr_accounting();
+	update_idle_spurr_accounting();
 	get_lppaca()->idle = 0;
 	ppc64_runlatch_on();
 }
@@ -56,5 +73,21 @@ static inline u64 read_this_idle_purr(void)
 	return be64_to_cpu(get_lppaca()->wait_state_cycles);
 }
 
+static inline u64 read_this_idle_spurr(void)
+{
+	/*
+	 * If we are reading from an idle context, update the
+	 * idle-spurr cycles corresponding to the last idle period.
+	 * Since the idle context is not yet over, take a fresh
+	 * snapshot of the idle-spurr.
+	 */
+	if (get_lppaca()->idle == 1) {
+		update_idle_spurr_accounting();
+		snapshot_spurr_idle_entry();
+	}
+
+	return *this_cpu_ptr(&idle_spurr_cycles);
+}
+
 #endif /* CONFIG_PPC_PSERIES */
 #endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 4905c96..1b55e80 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -318,7 +318,9 @@ static int alloc_dispatch_log_kmem_cache(void)
 }
 machine_early_initcall(pseries, alloc_dispatch_log_kmem_cache);
 
+DEFINE_PER_CPU(u64, idle_spurr_cycles);
 DEFINE_PER_CPU(u64, idle_entry_purr_snap);
+DEFINE_PER_CPU(u64, idle_entry_spurr_snap);
 static void pseries_lpar_idle(void)
 {
 	/*
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 4/6] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
                   ` (2 preceding siblings ...)
  2020-03-27 11:32 ` [PATCH v4 3/6] powerpc/pseries: Account for SPURR ticks on idle CPUs Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr Gautham R. Shenoy
  2020-03-27 11:32 ` [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr Gautham R. Shenoy
  5 siblings, 0 replies; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

The total PURR and SPURR ticks are already exposed via the per-cpu
sysfs files "purr" and "spurr". This patch adds support for exposing
the idle PURR and SPURR ticks via new per-cpu sysfs files named
"idle_purr" and "idle_spurr".

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c | 82 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 479c706..571b325 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -19,6 +19,7 @@
 #include <asm/smp.h>
 #include <asm/pmc.h>
 #include <asm/firmware.h>
+#include <asm/idle.h>
 #include <asm/svm.h>
 
 #include "cacheinfo.h"
@@ -760,6 +761,74 @@ static void create_svm_file(void)
 }
 #endif /* CONFIG_PPC_SVM */
 
+#ifdef CONFIG_PPC_PSERIES
+static void read_idle_purr(void *val)
+{
+	u64 *ret = val;
+
+	*ret = read_this_idle_purr();
+}
+
+static ssize_t idle_purr_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	u64 val;
+
+	smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
+	return sprintf(buf, "%llx\n", val);
+}
+static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
+
+static void create_idle_purr_file(struct device *s)
+{
+	if (firmware_has_feature(FW_FEATURE_LPAR))
+		device_create_file(s, &dev_attr_idle_purr);
+}
+
+static void remove_idle_purr_file(struct device *s)
+{
+	if (firmware_has_feature(FW_FEATURE_LPAR))
+		device_remove_file(s, &dev_attr_idle_purr);
+}
+
+static void read_idle_spurr(void *val)
+{
+	u64 *ret = val;
+
+	*ret = read_this_idle_spurr();
+}
+
+static ssize_t idle_spurr_show(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	u64 val;
+
+	smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
+	return sprintf(buf, "%llx\n", val);
+}
+static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
+
+static void create_idle_spurr_file(struct device *s)
+{
+	if (firmware_has_feature(FW_FEATURE_LPAR))
+		device_create_file(s, &dev_attr_idle_spurr);
+}
+
+static void remove_idle_spurr_file(struct device *s)
+{
+	if (firmware_has_feature(FW_FEATURE_LPAR))
+		device_remove_file(s, &dev_attr_idle_spurr);
+}
+
+#else /* CONFIG_PPC_PSERIES */
+#define create_idle_purr_file(s)
+#define remove_idle_purr_file(s)
+#define create_idle_spurr_file(s)
+#define remove_idle_spurr_file(s)
+#endif /* CONFIG_PPC_PSERIES */
+
 static int register_cpu_online(unsigned int cpu)
 {
 	struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -823,10 +892,13 @@ static int register_cpu_online(unsigned int cpu)
 		if (!firmware_has_feature(FW_FEATURE_LPAR))
 			add_write_permission_dev_attr(&dev_attr_purr);
 		device_create_file(s, &dev_attr_purr);
+		create_idle_purr_file(s);
 	}
 
-	if (cpu_has_feature(CPU_FTR_SPURR))
+	if (cpu_has_feature(CPU_FTR_SPURR)) {
 		device_create_file(s, &dev_attr_spurr);
+		create_idle_spurr_file(s);
+	}
 
 	if (cpu_has_feature(CPU_FTR_DSCR))
 		device_create_file(s, &dev_attr_dscr);
@@ -910,11 +982,15 @@ static int unregister_cpu_online(unsigned int cpu)
 		device_remove_file(s, &dev_attr_mmcra);
 #endif /* CONFIG_PMU_SYSFS */
 
-	if (cpu_has_feature(CPU_FTR_PURR))
+	if (cpu_has_feature(CPU_FTR_PURR)) {
 		device_remove_file(s, &dev_attr_purr);
+		remove_idle_purr_file(s);
+	}
 
-	if (cpu_has_feature(CPU_FTR_SPURR))
+	if (cpu_has_feature(CPU_FTR_SPURR)) {
 		device_remove_file(s, &dev_attr_spurr);
+		remove_idle_spurr_file(s);
+	}
 
 	if (cpu_has_feature(CPU_FTR_DSCR))
 		device_remove_file(s, &dev_attr_dscr);
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
                   ` (3 preceding siblings ...)
  2020-03-27 11:32 ` [PATCH v4 4/6] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-04-01  9:45   ` Naveen N. Rao
  2020-03-27 11:32 ` [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr Gautham R. Shenoy
  5 siblings, 1 reply; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Add documentation for the following sysfs interfaces:
/sys/devices/system/cpu/cpuX/purr
/sys/devices/system/cpu/cpuX/spurr
/sys/devices/system/cpu/cpuX/idle_purr
/sys/devices/system/cpu/cpuX/idle_spurr

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2e0e3b4..bc07677 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -580,3 +580,42 @@ Description:	Secure Virtual Machine
 		If 1, it means the system is using the Protected Execution
 		Facility in POWER9 and newer processors. i.e., it is a Secure
 		Virtual Machine.
+
+What: 		/sys/devices/system/cpu/cpuX/purr
+Date:		Apr 2005
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	PURR ticks for this CPU since the system boot.
+
+		The Processor Utilization Resources Register (PURR) is
+		a 64-bit counter which provides an estimate of the
+		resources used by the CPU thread. The contents of this
+		register increases monotonically. This sysfs interface
+		exposes the number of PURR ticks for cpuX.
+
+What: 		/sys/devices/system/cpu/cpuX/spurr
+Date:		Dec 2006
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	SPURR ticks for this CPU since the system boot.
+
+		The Scaled Processor Utilization Resources Register
+		(SPURR) is a 64-bit counter that provides a frequency
+		invariant estimate of the resources used by the CPU
+		thread. The contents of this register increases
+		monotonically. This sysfs interface exposes the number
+		of SPURR ticks for cpuX.
+
+What: 		/sys/devices/system/cpu/cpuX/idle_purr
+Date:		Mar 2020
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	PURR ticks for cpuX when it was idle.
+
+		This sysfs interface exposes the number of PURR ticks
+		for cpuX when it was idle.
+
+What: 		/sys/devices/system/cpu/cpuX/idle_spurr
+Date:		Mar 2020
+Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	SPURR ticks for cpuX when it was idle.
+
+		This sysfs interface exposes the number of SPURR ticks
+		for cpuX when it was idle.
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
                   ` (4 preceding siblings ...)
  2020-03-27 11:32 ` [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr Gautham R. Shenoy
@ 2020-03-27 11:32 ` Gautham R. Shenoy
  2020-04-01  9:58   ` Naveen N. Rao
  5 siblings, 1 reply; 17+ messages in thread
From: Gautham R. Shenoy @ 2020-03-27 11:32 UTC (permalink / raw)
  To: Nathan Lynch, Michael Ellerman, Vaidyanathan Srinivasan,
	Kamalesh Babulal, Naveen N. Rao, Tyrel Datwyler
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently purr, spurr, idle_purr, idle_spurr are exposed for every CPU
via the sysfs interface
/sys/devices/system/cpu/cpuX/[idle_][s]purr. Each sysfs read currently
generates an IPI to obtain the desired value from the target CPU X.
Since these aforementioned sysfs are typically read one after another,
we end up generating 4 IPIs per CPU in a short duration.

In order to minimize the IPI noise, this patch caches the values of
all the four entities whenever one of them is read. If subsequently
any of these are read within the next 10ms, the cached value is
returned. With this, we will generate at most one IPI every 10ms for
every CPU.

Test-results: While reading the four sysfs files back-to-back for a
given CPU every second for 100 seconds.

Without the patch:
		 16 [XICS 2 Edge IPI] = 422 times
		 DBL [Doorbell interrupts] = 13 times
		 Total : 435 IPIs.

With the patch:
		  16 [XICS 2 Edge IPI] = 111 times
		  DBL [Doorbell interrupts] = 17 times
		  Total : 128 IPIs.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c | 117 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 571b325..bd92023 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -586,8 +586,6 @@ void ppc_enable_pmcs(void)
  * SPRs which are not related to PMU.
  */
 #ifdef CONFIG_PPC64
-SYSFS_SPRSETUP(purr, SPRN_PURR);
-SYSFS_SPRSETUP(spurr, SPRN_SPURR);
 SYSFS_SPRSETUP(pir, SPRN_PIR);
 SYSFS_SPRSETUP(tscr, SPRN_TSCR);
 
@@ -596,8 +594,6 @@ void ppc_enable_pmcs(void)
   enable write when needed with a separate function.
   Lets be conservative and default to pseries.
 */
-static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
-static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
 static DEVICE_ATTR(pir, 0400, show_pir, NULL);
 static DEVICE_ATTR(tscr, 0600, show_tscr, store_tscr);
 #endif /* CONFIG_PPC64 */
@@ -761,22 +757,110 @@ static void create_svm_file(void)
 }
 #endif /* CONFIG_PPC_SVM */
 
+#ifdef CONFIG_PPC64
+/*
+ * The duration (in ms) from the last IPI to the target CPU until
+ * which a cached value of purr, spurr, idle_purr, idle_spurr can be
+ * reported to the user on a corresponding sysfs file read. Beyond
+ * this duration, fresh values need to be obtained by sending IPIs to
+ * the target CPU when the sysfs files are read.
+ */
+static unsigned long util_stats_staleness_tolerance_ms = 10;
+struct util_acct_stats {
+	u64 latest_purr;
+	u64 latest_spurr;
+#ifdef CONFIG_PPC_PSERIES
+	u64 latest_idle_purr;
+	u64 latest_idle_spurr;
+#endif
+	unsigned long last_update_jiffies;
+};
+
+DEFINE_PER_CPU(struct util_acct_stats, util_acct_stats);
+
+static void update_util_acct_stats(void *ptr)
+{
+	struct util_acct_stats *stats = ptr;
+
+	stats->latest_purr = mfspr(SPRN_PURR);
+	stats->latest_spurr = mfspr(SPRN_SPURR);
 #ifdef CONFIG_PPC_PSERIES
-static void read_idle_purr(void *val)
+	stats->latest_idle_purr = read_this_idle_purr();
+	stats->latest_idle_spurr = read_this_idle_spurr();
+#endif
+	stats->last_update_jiffies = jiffies;
+}
+
+struct util_acct_stats *get_util_stats_ptr(int cpu)
+{
+	struct util_acct_stats *stats = per_cpu_ptr(&util_acct_stats, cpu);
+	unsigned long delta_jiffies;
+
+	delta_jiffies = jiffies - stats->last_update_jiffies;
+
+	/*
+	 * If we have a recent enough data, reuse that instead of
+	 * sending an IPI.
+	 */
+	if (jiffies_to_msecs(delta_jiffies) < util_stats_staleness_tolerance_ms)
+		return stats;
+
+	smp_call_function_single(cpu, update_util_acct_stats, stats, 1);
+	return stats;
+}
+
+static ssize_t show_purr(struct device *dev,
+			 struct device_attribute *attr, char *buf)
 {
-	u64 *ret = val;
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	struct util_acct_stats *stats;
 
-	*ret = read_this_idle_purr();
+	stats = get_util_stats_ptr(cpu->dev.id);
+	return sprintf(buf, "%llx\n", stats->latest_purr);
 }
 
+static void write_purr(void *val)
+{
+	mtspr(SPRN_PURR, *(unsigned long *)val);
+}
+
+static ssize_t __used store_purr(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	unsigned long val;
+	int ret = kstrtoul(buf, 16, &val);
+
+	if (ret != 0)
+		return -EINVAL;
+
+	smp_call_function_single(cpu->dev.id, write_purr, &val, 1);
+	return count;
+}
+static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
+
+static ssize_t show_spurr(struct device *dev,
+			  struct device_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	struct util_acct_stats *stats;
+
+	stats = get_util_stats_ptr(cpu->dev.id);
+	return sprintf(buf, "%llx\n", stats->latest_spurr);
+}
+static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
+#endif /* CONFIG_PPC64 */
+
+#ifdef CONFIG_PPC_PSERIES
 static ssize_t idle_purr_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
-	u64 val;
+	struct util_acct_stats *stats;
 
-	smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
-	return sprintf(buf, "%llx\n", val);
+	stats = get_util_stats_ptr(cpu->dev.id);
+	return sprintf(buf, "%llx\n", stats->latest_idle_purr);
 }
 static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
 
@@ -792,21 +876,14 @@ static void remove_idle_purr_file(struct device *s)
 		device_remove_file(s, &dev_attr_idle_purr);
 }
 
-static void read_idle_spurr(void *val)
-{
-	u64 *ret = val;
-
-	*ret = read_this_idle_spurr();
-}
-
 static ssize_t idle_spurr_show(struct device *dev,
 			       struct device_attribute *attr, char *buf)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
-	u64 val;
+	struct util_acct_stats *stats;
 
-	smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
-	return sprintf(buf, "%llx\n", val);
+	stats =  get_util_stats_ptr(cpu->dev.id);
+	return sprintf(buf, "%llx\n", stats->latest_idle_spurr);
 }
 static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
 
-- 
1.9.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR
  2020-03-27 11:32 ` [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR Gautham R. Shenoy
@ 2020-04-01  9:42   ` Naveen N. Rao
  2020-04-03  6:15     ` Gautham R Shenoy
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2020-04-01  9:42 UTC (permalink / raw)
  To: Gautham R. Shenoy, Kamalesh Babulal, Michael Ellerman,
	Nathan Lynch, Vaidyanathan Srinivasan, Tyrel Datwyler
  Cc: linux-kernel, linuxppc-dev

Hi Gautham,

Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> Currently when CPU goes idle, we take a snapshot of PURR via
> pseries_idle_prolog() which is used at the CPU idle exit to compute
> the idle PURR cycles via the function pseries_idle_epilog().  Thus,
> the value of idle PURR cycle thus read before pseries_idle_prolog() and
> after pseries_idle_epilog() is always correct.
> 
> However, if we were to read the idle PURR cycles from an interrupt
> context between pseries_idle_prolog() and pseries_idle_epilog() (this will
> be done in a future patch), then, the value of the idle PURR thus read
> will not include the cycles spent in the most recent idle period.
> 
> This patch addresses the issue by providing accessor function to read
> the idle PURR such such that it includes the cycles spent in the most
> recent idle period, if we read it between pseries_idle_prolog() and
> pseries_idle_epilog(). In order to achieve it, the patch saves the
> snapshot of PURR in pseries_idle_prolog() in a per-cpu variable,
> instead of on the stack, so that it can be accessed from an interrupt
> context.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/idle.h        | 47 +++++++++++++++++++++++++++-------
>  arch/powerpc/platforms/pseries/setup.c |  7 +++--
>  drivers/cpuidle/cpuidle-pseries.c      | 15 +++++------
>  3 files changed, 47 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
> index 32064a4c..d4bfb6a 100644
> --- a/arch/powerpc/include/asm/idle.h
> +++ b/arch/powerpc/include/asm/idle.h
> @@ -5,10 +5,27 @@
>  #include <asm/paca.h>
> 
>  #ifdef CONFIG_PPC_PSERIES
> -static inline void pseries_idle_prolog(unsigned long *in_purr)
> +DECLARE_PER_CPU(u64, idle_entry_purr_snap);
> +
> +static inline void snapshot_purr_idle_entry(void)
> +{
> +	*this_cpu_ptr(&idle_entry_purr_snap) = mfspr(SPRN_PURR);
> +}
> +
> +static inline void update_idle_purr_accounting(void)
> +{
> +	u64 wait_cycles;
> +	u64 in_purr = *this_cpu_ptr(&idle_entry_purr_snap);
> +
> +	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> +	wait_cycles += mfspr(SPRN_PURR) - in_purr;
> +	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> +}
> +
> +static inline void pseries_idle_prolog(void)
>  {
>  	ppc64_runlatch_off();
> -	*in_purr = mfspr(SPRN_PURR);
> +	snapshot_purr_idle_entry();
>  	/*
>  	 * Indicate to the HV that we are idle. Now would be
>  	 * a good time to find other work to dispatch.
> @@ -16,16 +33,28 @@ static inline void pseries_idle_prolog(unsigned long *in_purr)
>  	get_lppaca()->idle = 1;
>  }
> 
> -static inline void pseries_idle_epilog(unsigned long in_purr)
> +static inline void pseries_idle_epilog(void)
>  {
> -	u64 wait_cycles;
> -
> -	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> -	wait_cycles += mfspr(SPRN_PURR) - in_purr;
> -	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> +	update_idle_purr_accounting();
>  	get_lppaca()->idle = 0;
> -
>  	ppc64_runlatch_on();
>  }
> +
> +static inline u64 read_this_idle_purr(void)
> +{
> +	/*
> +	 * If we are reading from an idle context, update the
> +	 * idle-purr cycles corresponding to the last idle period.
> +	 * Since the idle context is not yet over, take a fresh
> +	 * snapshot of the idle-purr.
> +	 */
> +	if (unlikely(get_lppaca()->idle == 1)) {
> +		update_idle_purr_accounting();
> +		snapshot_purr_idle_entry();
> +	}
> +
> +	return be64_to_cpu(get_lppaca()->wait_state_cycles);
> +}
> +

I think this and read_this_idle_spurr() from the next patch should be 
moved to Patch 4/6, where they are actually used.

- Naveen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr
  2020-03-27 11:32 ` [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr Gautham R. Shenoy
@ 2020-04-01  9:45   ` Naveen N. Rao
  0 siblings, 0 replies; 17+ messages in thread
From: Naveen N. Rao @ 2020-04-01  9:45 UTC (permalink / raw)
  To: Gautham R. Shenoy, Kamalesh Babulal, Michael Ellerman,
	Nathan Lynch, Vaidyanathan Srinivasan, Tyrel Datwyler
  Cc: linux-kernel, linuxppc-dev

Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> Add documentation for the following sysfs interfaces:
> /sys/devices/system/cpu/cpuX/purr
> /sys/devices/system/cpu/cpuX/spurr
> /sys/devices/system/cpu/cpuX/idle_purr
> /sys/devices/system/cpu/cpuX/idle_spurr
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 2e0e3b4..bc07677 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -580,3 +580,42 @@ Description:	Secure Virtual Machine
>  		If 1, it means the system is using the Protected Execution
>  		Facility in POWER9 and newer processors. i.e., it is a Secure
>  		Virtual Machine.
> +
> +What: 		/sys/devices/system/cpu/cpuX/purr
> +Date:		Apr 2005
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	PURR ticks for this CPU since the system boot.
> +
> +		The Processor Utilization Resources Register (PURR) is
> +		a 64-bit counter which provides an estimate of the
> +		resources used by the CPU thread. The contents of this
> +		register increases monotonically. This sysfs interface
> +		exposes the number of PURR ticks for cpuX.
> +
> +What: 		/sys/devices/system/cpu/cpuX/spurr
> +Date:		Dec 2006
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	SPURR ticks for this CPU since the system boot.
> +
> +		The Scaled Processor Utilization Resources Register
> +		(SPURR) is a 64-bit counter that provides a frequency
> +		invariant estimate of the resources used by the CPU
> +		thread. The contents of this register increases
> +		monotonically. This sysfs interface exposes the number
> +		of SPURR ticks for cpuX.
> +
> +What: 		/sys/devices/system/cpu/cpuX/idle_purr
> +Date:		Mar 2020
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	PURR ticks for cpuX when it was idle.
> +
> +		This sysfs interface exposes the number of PURR ticks
> +		for cpuX when it was idle.
> +
> +What: 		/sys/devices/system/cpu/cpuX/idle_spurr
> +Date:		Mar 2020
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	SPURR ticks for cpuX when it was idle.
> +
> +		This sysfs interface exposes the number of SPURR ticks
> +		for cpuX when it was idle.

Apart from the minor comment on patches 2 and 3, from the sysfs 
interface standpoint, for patches 1 to 5:
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

- Naveen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-03-27 11:32 ` [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr Gautham R. Shenoy
@ 2020-04-01  9:58   ` Naveen N. Rao
  2020-04-01 12:01     ` Gautham R Shenoy
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2020-04-01  9:58 UTC (permalink / raw)
  To: Gautham R. Shenoy, Kamalesh Babulal, Michael Ellerman,
	Nathan Lynch, Vaidyanathan Srinivasan, Tyrel Datwyler
  Cc: linux-kernel, linuxppc-dev

Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> Currently purr, spurr, idle_purr, idle_spurr are exposed for every CPU
> via the sysfs interface
> /sys/devices/system/cpu/cpuX/[idle_][s]purr. Each sysfs read currently
> generates an IPI to obtain the desired value from the target CPU X.
> Since these aforementioned sysfs are typically read one after another,
> we end up generating 4 IPIs per CPU in a short duration.
> 
> In order to minimize the IPI noise, this patch caches the values of
> all the four entities whenever one of them is read. If subsequently
> any of these are read within the next 10ms, the cached value is
> returned. With this, we will generate at most one IPI every 10ms for
> every CPU.
> 
> Test-results: While reading the four sysfs files back-to-back for a
> given CPU every second for 100 seconds.
> 
> Without the patch:
> 		 16 [XICS 2 Edge IPI] = 422 times
> 		 DBL [Doorbell interrupts] = 13 times
> 		 Total : 435 IPIs.
> 
> With the patch:
> 		  16 [XICS 2 Edge IPI] = 111 times
> 		  DBL [Doorbell interrupts] = 17 times
> 		  Total : 128 IPIs.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c | 117 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 97 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 571b325..bd92023 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -586,8 +586,6 @@ void ppc_enable_pmcs(void)
>   * SPRs which are not related to PMU.
>   */
>  #ifdef CONFIG_PPC64
> -SYSFS_SPRSETUP(purr, SPRN_PURR);
> -SYSFS_SPRSETUP(spurr, SPRN_SPURR);
>  SYSFS_SPRSETUP(pir, SPRN_PIR);
>  SYSFS_SPRSETUP(tscr, SPRN_TSCR);
> 
> @@ -596,8 +594,6 @@ void ppc_enable_pmcs(void)
>    enable write when needed with a separate function.
>    Lets be conservative and default to pseries.
>  */
> -static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
> -static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
>  static DEVICE_ATTR(pir, 0400, show_pir, NULL);
>  static DEVICE_ATTR(tscr, 0600, show_tscr, store_tscr);
>  #endif /* CONFIG_PPC64 */
> @@ -761,22 +757,110 @@ static void create_svm_file(void)
>  }
>  #endif /* CONFIG_PPC_SVM */
> 
> +#ifdef CONFIG_PPC64
> +/*
> + * The duration (in ms) from the last IPI to the target CPU until
> + * which a cached value of purr, spurr, idle_purr, idle_spurr can be
> + * reported to the user on a corresponding sysfs file read. Beyond
> + * this duration, fresh values need to be obtained by sending IPIs to
> + * the target CPU when the sysfs files are read.
> + */
> +static unsigned long util_stats_staleness_tolerance_ms = 10;

This is a nice optimization for our use in lparstat, though I have a 
concern below.

> +struct util_acct_stats {
> +	u64 latest_purr;
> +	u64 latest_spurr;
> +#ifdef CONFIG_PPC_PSERIES
> +	u64 latest_idle_purr;
> +	u64 latest_idle_spurr;
> +#endif

You can probably drop the 'latest_' prefix.

> +	unsigned long last_update_jiffies;
> +};
> +
> +DEFINE_PER_CPU(struct util_acct_stats, util_acct_stats);

Per snowpatch, this should be static, and so should get_util_stats_ptr() 
below:
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/16601//artifact/linux/report.txt

> +
> +static void update_util_acct_stats(void *ptr)
> +{
> +	struct util_acct_stats *stats = ptr;
> +
> +	stats->latest_purr = mfspr(SPRN_PURR);
> +	stats->latest_spurr = mfspr(SPRN_SPURR);
>  #ifdef CONFIG_PPC_PSERIES
> -static void read_idle_purr(void *val)
> +	stats->latest_idle_purr = read_this_idle_purr();
> +	stats->latest_idle_spurr = read_this_idle_spurr();
> +#endif
> +	stats->last_update_jiffies = jiffies;
> +}
> +
> +struct util_acct_stats *get_util_stats_ptr(int cpu)
> +{
> +	struct util_acct_stats *stats = per_cpu_ptr(&util_acct_stats, cpu);
> +	unsigned long delta_jiffies;
> +
> +	delta_jiffies = jiffies - stats->last_update_jiffies;
> +
> +	/*
> +	 * If we have a recent enough data, reuse that instead of
> +	 * sending an IPI.
> +	 */
> +	if (jiffies_to_msecs(delta_jiffies) < util_stats_staleness_tolerance_ms)
> +		return stats;
> +
> +	smp_call_function_single(cpu, update_util_acct_stats, stats, 1);
> +	return stats;
> +}
> +
> +static ssize_t show_purr(struct device *dev,
> +			 struct device_attribute *attr, char *buf)
>  {
> -	u64 *ret = val;
> +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> +	struct util_acct_stats *stats;
> 
> -	*ret = read_this_idle_purr();
> +	stats = get_util_stats_ptr(cpu->dev.id);
> +	return sprintf(buf, "%llx\n", stats->latest_purr);

This alters the behavior of the current sysfs purr file. I am not sure 
if it is reasonable to return the same PURR value across a 10ms window.

I wonder if we should introduce a sysctl interface to control 
thresholding. It can default to 0, which disables thresholding so that 
the existing behavior continues. Applications (lparstat) can optionally 
set it to suit their use.

- Naveen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-04-01  9:58   ` Naveen N. Rao
@ 2020-04-01 12:01     ` Gautham R Shenoy
  2020-04-02  7:34       ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Gautham R Shenoy @ 2020-04-01 12:01 UTC (permalink / raw)
  To: Naveen N. Rao, Michael Ellerman
  Cc: Gautham R. Shenoy, Kamalesh Babulal, Vaidyanathan Srinivasan,
	Tyrel Datwyler, linux-kernel, linuxppc-dev

Hello Naveen,


On Wed, Apr 01, 2020 at 03:28:48PM +0530, Naveen N. Rao wrote:
> Gautham R. Shenoy wrote:
> >From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >
 [..snip..]

> >-static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
> >-static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
> > static DEVICE_ATTR(pir, 0400, show_pir, NULL);
> > static DEVICE_ATTR(tscr, 0600, show_tscr, store_tscr);
> > #endif /* CONFIG_PPC64 */
> >@@ -761,22 +757,110 @@ static void create_svm_file(void)
> > }
> > #endif /* CONFIG_PPC_SVM */
> >
> >+#ifdef CONFIG_PPC64
> >+/*
> >+ * The duration (in ms) from the last IPI to the target CPU until
> >+ * which a cached value of purr, spurr, idle_purr, idle_spurr can be
> >+ * reported to the user on a corresponding sysfs file read. Beyond
> >+ * this duration, fresh values need to be obtained by sending IPIs to
> >+ * the target CPU when the sysfs files are read.
> >+ */
> >+static unsigned long util_stats_staleness_tolerance_ms = 10;
> 
> This is a nice optimization for our use in lparstat, though I have a concern
> below.
> 
> >+struct util_acct_stats {
> >+	u64 latest_purr;
> >+	u64 latest_spurr;
> >+#ifdef CONFIG_PPC_PSERIES
> >+	u64 latest_idle_purr;
> >+	u64 latest_idle_spurr;
> >+#endif
> 
> You can probably drop the 'latest_' prefix.


Sure.

> 
> >+	unsigned long last_update_jiffies;
> >+};
> >+
> >+DEFINE_PER_CPU(struct util_acct_stats, util_acct_stats);
> 
> Per snowpatch, this should be static, and so should get_util_stats_ptr()
> below:
> https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/16601//artifact/linux/report.txt

Ok, will fix this in v5.

> 
> >+
> >+static void update_util_acct_stats(void *ptr)
> >+{
> >+	struct util_acct_stats *stats = ptr;
> >+
> >+	stats->latest_purr = mfspr(SPRN_PURR);
> >+	stats->latest_spurr = mfspr(SPRN_SPURR);
> > #ifdef CONFIG_PPC_PSERIES
> >-static void read_idle_purr(void *val)
> >+	stats->latest_idle_purr = read_this_idle_purr();
> >+	stats->latest_idle_spurr = read_this_idle_spurr();
> >+#endif
> >+	stats->last_update_jiffies = jiffies;
> >+}
> >+
> >+struct util_acct_stats *get_util_stats_ptr(int cpu)
> >+{
> >+	struct util_acct_stats *stats = per_cpu_ptr(&util_acct_stats, cpu);
> >+	unsigned long delta_jiffies;
> >+
> >+	delta_jiffies = jiffies - stats->last_update_jiffies;
> >+
> >+	/*
> >+	 * If we have a recent enough data, reuse that instead of
> >+	 * sending an IPI.
> >+	 */
> >+	if (jiffies_to_msecs(delta_jiffies) < util_stats_staleness_tolerance_ms)
> >+		return stats;
> >+
> >+	smp_call_function_single(cpu, update_util_acct_stats, stats, 1);
> >+	return stats;
> >+}
> >+
> >+static ssize_t show_purr(struct device *dev,
> >+			 struct device_attribute *attr, char *buf)
> > {
> >-	u64 *ret = val;
> >+	struct cpu *cpu = container_of(dev, struct cpu, dev);
> >+	struct util_acct_stats *stats;
> >
> >-	*ret = read_this_idle_purr();
> >+	stats = get_util_stats_ptr(cpu->dev.id);
> >+	return sprintf(buf, "%llx\n", stats->latest_purr);
> 
> This alters the behavior of the current sysfs purr file. I am not sure if it
> is reasonable to return the same PURR value across a 10ms window.


It does reduce it to 10ms window. I am not sure if anyone samples PURR
etc faster than that rate.

I measured how much time it takes to read the purr, spurr, idle_purr,
idle_spurr files back-to-back. It takes not more than 150us.  From
lparstat will these values be read back-to-back ? If so, we can reduce
the staleness_tolerance to something like 500us and still avoid extra
IPIs. If not, what is the maximum delay between the first sysfs file
read and the last sysfs file read ?

>
> I wonder if we should introduce a sysctl interface to control thresholding.
> It can default to 0, which disables thresholding so that the existing
> behavior continues. Applications (lparstat) can optionally set it to suit
> their use.

We would be introducing 3 new sysfs interfaces that way instead of
two.

/sys/devices/system/cpu/purr_spurr_staleness
/sys/devices/system/cpu/cpuX/idle_purr
/sys/devices/system/cpu/cpuX/idle_spurr

I don't have a problem with this. Nathan, Michael, thoughts on this?


The alternative is to have a procfs interface, something like
/proc/powerpc/resource_util_stats

which gives a listing similar to /proc/stat, i.e

      CPUX  <purr>  <idle_purr>  <spurr>  <idle_spurr>

Even in this case, the values can be obtained in one-shot with a
single IPI and be printed in the row corresponding to the CPU.

> 
> - Naveen
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-04-01 12:01     ` Gautham R Shenoy
@ 2020-04-02  7:34       ` Naveen N. Rao
  2020-04-03  6:28         ` Gautham R Shenoy
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2020-04-02  7:34 UTC (permalink / raw)
  To: ego, Michael Ellerman
  Cc: Kamalesh Babulal, linux-kernel, linuxppc-dev,
	Vaidyanathan Srinivasan, Tyrel Datwyler

Gautham R Shenoy wrote:
> Hello Naveen,
> 
> 
> On Wed, Apr 01, 2020 at 03:28:48PM +0530, Naveen N. Rao wrote:
>> Gautham R. Shenoy wrote:
>> >From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>> >
>  [..snip..]
> 
>> >+
>> >+static ssize_t show_purr(struct device *dev,
>> >+			 struct device_attribute *attr, char *buf)
>> > {
>> >-	u64 *ret = val;
>> >+	struct cpu *cpu = container_of(dev, struct cpu, dev);
>> >+	struct util_acct_stats *stats;
>> >
>> >-	*ret = read_this_idle_purr();
>> >+	stats = get_util_stats_ptr(cpu->dev.id);
>> >+	return sprintf(buf, "%llx\n", stats->latest_purr);
>> 
>> This alters the behavior of the current sysfs purr file. I am not sure if it
>> is reasonable to return the same PURR value across a 10ms window.
> 
> 
> It does reduce it to 10ms window. I am not sure if anyone samples PURR
> etc faster than that rate.
> 
> I measured how much time it takes to read the purr, spurr, idle_purr,
> idle_spurr files back-to-back. It takes not more than 150us.  From
> lparstat will these values be read back-to-back ? If so, we can reduce
> the staleness_tolerance to something like 500us and still avoid extra
> IPIs. If not, what is the maximum delay between the first sysfs file
> read and the last sysfs file read ?

Oh, for lparstat usage, this is perfectly fine.

I meant that there could be other users of [s]purr who might care. I 
don't know of one, but since this is an existing sysfs interface, I 
wanted to point out that the behavior might change.

> 
>>
>> I wonder if we should introduce a sysctl interface to control thresholding.
>> It can default to 0, which disables thresholding so that the existing
>> behavior continues. Applications (lparstat) can optionally set it to suit
>> their use.
> 
> We would be introducing 3 new sysfs interfaces that way instead of
> two.
> 
> /sys/devices/system/cpu/purr_spurr_staleness
> /sys/devices/system/cpu/cpuX/idle_purr
> /sys/devices/system/cpu/cpuX/idle_spurr
> 
> I don't have a problem with this. Nathan, Michael, thoughts on this?
> 
> 
> The alternative is to have a procfs interface, something like
> /proc/powerpc/resource_util_stats
> 
> which gives a listing similar to /proc/stat, i.e
> 
>       CPUX  <purr>  <idle_purr>  <spurr>  <idle_spurr>
> 
> Even in this case, the values can be obtained in one-shot with a
> single IPI and be printed in the row corresponding to the CPU.

Right -- and that would be optimal requiring a single system call, at 
the cost of using a legacy interface.

The other option would be to drop this patch and to just go with patches 
1-5 introducing the new sysfs interfaces for idle_[s]purr. It isn't 
entirely clear how often this would be used, or its actual impact. We 
can perhaps consider this optimization if and when this causes 
problems...


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR
  2020-04-01  9:42   ` Naveen N. Rao
@ 2020-04-03  6:15     ` Gautham R Shenoy
  2020-04-03 10:34       ` Naveen N. Rao
  0 siblings, 1 reply; 17+ messages in thread
From: Gautham R Shenoy @ 2020-04-03  6:15 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Gautham R. Shenoy, Kamalesh Babulal, Michael Ellerman,
	Nathan Lynch, Vaidyanathan Srinivasan, Tyrel Datwyler,
	linux-kernel, linuxppc-dev

On Wed, Apr 01, 2020 at 03:12:53PM +0530, Naveen N. Rao wrote:
> Hi Gautham,
> 
> Gautham R. Shenoy wrote:
> >From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >
> >Currently when CPU goes idle, we take a snapshot of PURR via
> >pseries_idle_prolog() which is used at the CPU idle exit to compute
> >the idle PURR cycles via the function pseries_idle_epilog().  Thus,
> >the value of idle PURR cycle thus read before pseries_idle_prolog() and
> >after pseries_idle_epilog() is always correct.
> >
> >However, if we were to read the idle PURR cycles from an interrupt
> >context between pseries_idle_prolog() and pseries_idle_epilog() (this will
> >be done in a future patch), then, the value of the idle PURR thus read
> >will not include the cycles spent in the most recent idle period.
> >
> >This patch addresses the issue by providing accessor function to read
> >the idle PURR such such that it includes the cycles spent in the most
> >recent idle period, if we read it between pseries_idle_prolog() and
> >pseries_idle_epilog(). In order to achieve it, the patch saves the
> >snapshot of PURR in pseries_idle_prolog() in a per-cpu variable,
> >instead of on the stack, so that it can be accessed from an interrupt
> >context.
> >
> >Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> >---
> > arch/powerpc/include/asm/idle.h        | 47 +++++++++++++++++++++++++++-------
> > arch/powerpc/platforms/pseries/setup.c |  7 +++--
> > drivers/cpuidle/cpuidle-pseries.c      | 15 +++++------
> > 3 files changed, 47 insertions(+), 22 deletions(-)
> >
> >diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
> >index 32064a4c..d4bfb6a 100644
> >--- a/arch/powerpc/include/asm/idle.h
> >+++ b/arch/powerpc/include/asm/idle.h
> >@@ -5,10 +5,27 @@
> > #include <asm/paca.h>
> >
> > #ifdef CONFIG_PPC_PSERIES
> >-static inline void pseries_idle_prolog(unsigned long *in_purr)
> >+DECLARE_PER_CPU(u64, idle_entry_purr_snap);
> >+
> >+static inline void snapshot_purr_idle_entry(void)
> >+{
> >+	*this_cpu_ptr(&idle_entry_purr_snap) = mfspr(SPRN_PURR);
> >+}
> >+
> >+static inline void update_idle_purr_accounting(void)
> >+{
> >+	u64 wait_cycles;
> >+	u64 in_purr = *this_cpu_ptr(&idle_entry_purr_snap);
> >+
> >+	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> >+	wait_cycles += mfspr(SPRN_PURR) - in_purr;
> >+	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> >+}
> >+
> >+static inline void pseries_idle_prolog(void)
> > {
> > 	ppc64_runlatch_off();
> >-	*in_purr = mfspr(SPRN_PURR);
> >+	snapshot_purr_idle_entry();
> > 	/*
> > 	 * Indicate to the HV that we are idle. Now would be
> > 	 * a good time to find other work to dispatch.
> >@@ -16,16 +33,28 @@ static inline void pseries_idle_prolog(unsigned long *in_purr)
> > 	get_lppaca()->idle = 1;
> > }
> >
> >-static inline void pseries_idle_epilog(unsigned long in_purr)
> >+static inline void pseries_idle_epilog(void)
> > {
> >-	u64 wait_cycles;
> >-
> >-	wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> >-	wait_cycles += mfspr(SPRN_PURR) - in_purr;
> >-	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> >+	update_idle_purr_accounting();
> > 	get_lppaca()->idle = 0;
> >-
> > 	ppc64_runlatch_on();
> > }
> >+
> >+static inline u64 read_this_idle_purr(void)
> >+{
> >+	/*
> >+	 * If we are reading from an idle context, update the
> >+	 * idle-purr cycles corresponding to the last idle period.
> >+	 * Since the idle context is not yet over, take a fresh
> >+	 * snapshot of the idle-purr.
> >+	 */
> >+	if (unlikely(get_lppaca()->idle == 1)) {
> >+		update_idle_purr_accounting();
> >+		snapshot_purr_idle_entry();
> >+	}
> >+
> >+	return be64_to_cpu(get_lppaca()->wait_state_cycles);
> >+}
> >+
> 
> I think this and read_this_idle_spurr() from the next patch should be moved
> to Patch 4/6, where they are actually used.

The reason I included this function in this patch was to justify why
we were introducing snapshotting the purr values in a global per-cpu
variable instead of on a stack variable. The reason being that someone
might want to read the PURR value from an interrupt context which had
woken up the CPU from idle. At this point, since epilog() function
wasn't called, the idle PURR count corresponding to this latest idle
period would have been accumulated in lppaca->wait_cycles. Thus, this
helper function safely reads the value by
   1) First updating the lppaca->wait_cycles with the latest idle_purr
   count.
   2) Take a fresh snapshot, since the time from now to the epilog()
   call is also counted under idle CPU. So the PURR cycle increment
   during this short period should also be accumulated in lppaca->wait_cycles.


prolog()
|	snapshot PURR
|
|
|
Idle
|
| <----- Interrupt . Read idle PURR ---- update idle PURR;
|                              	         snapshot PURR;
|                                   	 Read idle PURR.       
|
epilog()
	update idle PURR



> 
> - Naveen
> 

However, if you feel that moving this function to Patch 4 where it is
actually used makes it more readable, I can do that.

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-04-02  7:34       ` Naveen N. Rao
@ 2020-04-03  6:28         ` Gautham R Shenoy
  2020-04-03 18:10           ` Nathan Lynch
  0 siblings, 1 reply; 17+ messages in thread
From: Gautham R Shenoy @ 2020-04-03  6:28 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: ego, Michael Ellerman, Kamalesh Babulal, linux-kernel,
	linuxppc-dev, Vaidyanathan Srinivasan, Tyrel Datwyler

Hi Naveen,

On Thu, Apr 02, 2020 at 01:04:34PM +0530, Naveen N. Rao wrote:
[..snip..]

> >
> >It does reduce it to 10ms window. I am not sure if anyone samples PURR
> >etc faster than that rate.
> >
> >I measured how much time it takes to read the purr, spurr, idle_purr,
> >idle_spurr files back-to-back. It takes not more than 150us.  From
> >lparstat will these values be read back-to-back ? If so, we can reduce
> >the staleness_tolerance to something like 500us and still avoid extra
> >IPIs. If not, what is the maximum delay between the first sysfs file
> >read and the last sysfs file read ?
> 
> Oh, for lparstat usage, this is perfectly fine.
> 
> I meant that there could be other users of [s]purr who might care. I don't
> know of one, but since this is an existing sysfs interface, I wanted to
> point out that the behavior might change.

Fair point. Perhaps this should be documented in the Documentation, if
we are going to continue with this patch.

> 
> >
> >>
> >>I wonder if we should introduce a sysctl interface to control thresholding.
> >>It can default to 0, which disables thresholding so that the existing
> >>behavior continues. Applications (lparstat) can optionally set it to suit
> >>their use.
> >
> >We would be introducing 3 new sysfs interfaces that way instead of
> >two.
> >
> >/sys/devices/system/cpu/purr_spurr_staleness
> >/sys/devices/system/cpu/cpuX/idle_purr
> >/sys/devices/system/cpu/cpuX/idle_spurr
> >
> >I don't have a problem with this. Nathan, Michael, thoughts on this?
> >
> >
> >The alternative is to have a procfs interface, something like
> >/proc/powerpc/resource_util_stats
> >
> >which gives a listing similar to /proc/stat, i.e
> >
> >      CPUX  <purr>  <idle_purr>  <spurr>  <idle_spurr>
> >
> >Even in this case, the values can be obtained in one-shot with a
> >single IPI and be printed in the row corresponding to the CPU.
> 
> Right -- and that would be optimal requiring a single system call, at the
> cost of using a legacy interface.
> 
> The other option would be to drop this patch and to just go with patches 1-5
> introducing the new sysfs interfaces for idle_[s]purr. It isn't entirely
> clear how often this would be used, or its actual impact. We can perhaps
> consider this optimization if and when this causes problems...

I am ok with that. We can revisit the problem if IPI noise becomes
noticable. However, if Nathan or Michael feel that this problem is
better solved now, than leaving it for the future, we will have to
take a call on what the interface is going to be.

> 
> 
> Thanks,
> Naveen
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR
  2020-04-03  6:15     ` Gautham R Shenoy
@ 2020-04-03 10:34       ` Naveen N. Rao
  2020-04-03 11:24         ` Gautham R Shenoy
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen N. Rao @ 2020-04-03 10:34 UTC (permalink / raw)
  To: ego
  Cc: Kamalesh Babulal, linux-kernel, linuxppc-dev, Michael Ellerman,
	Nathan Lynch, Vaidyanathan Srinivasan, Tyrel Datwyler

Gautham R Shenoy wrote:
> On Wed, Apr 01, 2020 at 03:12:53PM +0530, Naveen N. Rao wrote:
>> Hi Gautham,
>> 
>> Gautham R. Shenoy wrote:
>> >From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>> >
>> >+
>> >+static inline u64 read_this_idle_purr(void)
>> >+{
>> >+	/*
>> >+	 * If we are reading from an idle context, update the
>> >+	 * idle-purr cycles corresponding to the last idle period.
>> >+	 * Since the idle context is not yet over, take a fresh
>> >+	 * snapshot of the idle-purr.
>> >+	 */
>> >+	if (unlikely(get_lppaca()->idle == 1)) {
>> >+		update_idle_purr_accounting();
>> >+		snapshot_purr_idle_entry();
>> >+	}
>> >+
>> >+	return be64_to_cpu(get_lppaca()->wait_state_cycles);
>> >+}
>> >+
>> 
>> I think this and read_this_idle_spurr() from the next patch should be moved
>> to Patch 4/6, where they are actually used.
> 
> The reason I included this function in this patch was to justify why
> we were introducing snapshotting the purr values in a global per-cpu
> variable instead of on a stack variable. The reason being that someone
> might want to read the PURR value from an interrupt context which had
> woken up the CPU from idle. At this point, since epilog() function
> wasn't called, the idle PURR count corresponding to this latest idle
> period would have been accumulated in lppaca->wait_cycles. Thus, this
> helper function safely reads the value by
>    1) First updating the lppaca->wait_cycles with the latest idle_purr
>    count.
>    2) Take a fresh snapshot, since the time from now to the epilog()
>    call is also counted under idle CPU. So the PURR cycle increment
>    during this short period should also be accumulated in lppaca->wait_cycles.
> 
> 
> prolog()
> |	snapshot PURR
> |
> |
> |
> Idle
> |
> | <----- Interrupt . Read idle PURR ---- update idle PURR;
> |                              	         snapshot PURR;
> |                                   	 Read idle PURR.       
> |
> epilog()
> 	update idle PURR
> 

Yes, I understand. It makes sense.

> 
> However, if you feel that moving this function to Patch 4 where it is
> actually used makes it more readable, I can do that.

My suggestion was from a bisectability standpoint though. This is a 
fairly simple function, but it is generally recommended to ensure that 
newly added code gets exercized in the patch that it is introduced in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst#n119


Regards,
Naveen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR
  2020-04-03 10:34       ` Naveen N. Rao
@ 2020-04-03 11:24         ` Gautham R Shenoy
  0 siblings, 0 replies; 17+ messages in thread
From: Gautham R Shenoy @ 2020-04-03 11:24 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: ego, Kamalesh Babulal, linux-kernel, linuxppc-dev,
	Michael Ellerman, Nathan Lynch, Vaidyanathan Srinivasan,
	Tyrel Datwyler

On Fri, Apr 03, 2020 at 04:04:56PM +0530, Naveen N. Rao wrote:
> Gautham R Shenoy wrote:
> >On Wed, Apr 01, 2020 at 03:12:53PM +0530, Naveen N. Rao wrote:
> >>Hi Gautham,
> >>
> >>Gautham R. Shenoy wrote:
> >>>From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >>>
> >>>+
> >>>+static inline u64 read_this_idle_purr(void)
> >>>+{
> >>>+	/*
> >>>+	 * If we are reading from an idle context, update the
> >>>+	 * idle-purr cycles corresponding to the last idle period.
> >>>+	 * Since the idle context is not yet over, take a fresh
> >>>+	 * snapshot of the idle-purr.
> >>>+	 */
> >>>+	if (unlikely(get_lppaca()->idle == 1)) {
> >>>+		update_idle_purr_accounting();
> >>>+		snapshot_purr_idle_entry();
> >>>+	}
> >>>+
> >>>+	return be64_to_cpu(get_lppaca()->wait_state_cycles);
> >>>+}
> >>>+
> >>
> >>I think this and read_this_idle_spurr() from the next patch should be moved
> >>to Patch 4/6, where they are actually used.
> >
> >The reason I included this function in this patch was to justify why
> >we were introducing snapshotting the purr values in a global per-cpu
> >variable instead of on a stack variable. The reason being that someone
> >might want to read the PURR value from an interrupt context which had
> >woken up the CPU from idle. At this point, since epilog() function
> >wasn't called, the idle PURR count corresponding to this latest idle
> >period would have been accumulated in lppaca->wait_cycles. Thus, this
> >helper function safely reads the value by
> >   1) First updating the lppaca->wait_cycles with the latest idle_purr
> >   count.
> >   2) Take a fresh snapshot, since the time from now to the epilog()
> >   call is also counted under idle CPU. So the PURR cycle increment
> >   during this short period should also be accumulated in lppaca->wait_cycles.
> >
> >
> >prolog()
> >|	snapshot PURR
> >|
> >|
> >|
> >Idle
> >|
> >| <----- Interrupt . Read idle PURR ---- update idle PURR;
> >|                              	         snapshot PURR;
> >|                                   	 Read idle PURR.       |
> >epilog()
> >	update idle PURR
> >
> 
> Yes, I understand. It makes sense.
> 
> >
> >However, if you feel that moving this function to Patch 4 where it is
> >actually used makes it more readable, I can do that.
> 
> My suggestion was from a bisectability standpoint though. This is a fairly
> simple function, but it is generally recommended to ensure that newly added
> code gets exercized in the patch that it is introduced in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst#n119
>

Fair point. Will move those functions to Patch 4.


> 
> Regards,
> Naveen
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr
  2020-04-03  6:28         ` Gautham R Shenoy
@ 2020-04-03 18:10           ` Nathan Lynch
  0 siblings, 0 replies; 17+ messages in thread
From: Nathan Lynch @ 2020-04-03 18:10 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Tyrel Datwyler, linux-kernel, Kamalesh Babulal,
	Vaidyanathan Srinivasan, linuxppc-dev, Naveen N. Rao

Gautham R Shenoy <ego@linux.vnet.ibm.com> writes:
> On Thu, Apr 02, 2020 at 01:04:34PM +0530, Naveen N. Rao wrote:
>> >>
>> >>I wonder if we should introduce a sysctl interface to control thresholding.
>> >>It can default to 0, which disables thresholding so that the existing
>> >>behavior continues. Applications (lparstat) can optionally set it to suit
>> >>their use.
>> >
>> >We would be introducing 3 new sysfs interfaces that way instead of
>> >two.
>> >
>> >/sys/devices/system/cpu/purr_spurr_staleness
>> >/sys/devices/system/cpu/cpuX/idle_purr
>> >/sys/devices/system/cpu/cpuX/idle_spurr
>> >
>> >I don't have a problem with this. Nathan, Michael, thoughts on this?

No, I don't think this warrants a tunable when the issue it's intended
to address is still a bit speculative at this point. (Also, note that
this would be a system-wide value, but you could have multiple
concurrent users of the interface with different needs.)


>> >The alternative is to have a procfs interface, something like
>> >/proc/powerpc/resource_util_stats
>> >
>> >which gives a listing similar to /proc/stat, i.e
>> >
>> >      CPUX  <purr>  <idle_purr>  <spurr>  <idle_spurr>
>> >
>> >Even in this case, the values can be obtained in one-shot with a
>> >single IPI and be printed in the row corresponding to the CPU.
>> 
>> Right -- and that would be optimal requiring a single system call, at the
>> cost of using a legacy interface.
>> 
>> The other option would be to drop this patch and to just go with patches 1-5
>> introducing the new sysfs interfaces for idle_[s]purr. It isn't entirely
>> clear how often this would be used, or its actual impact. We can perhaps
>> consider this optimization if and when this causes problems...
>
> I am ok with that. We can revisit the problem if IPI noise becomes
> noticable. However, if Nathan or Michael feel that this problem is
> better solved now, than leaving it for the future, we will have to
> take a call on what the interface is going to be.

While I maintain some concern about the overhead on larger LPARs (150us
per CPU works out to ~0.15s total to serially sample 1024 CPUs, ~0.3s
for 2048 and so on), I am OK with the straightforward addition of the
attributes without any batching or sampling thresholds behind the scenes
for now. I appreciate your consideration of the issue.

If this turns out to be too inefficient then I think we should consider
a non-sysfs mechanism such as chardev+ioctl.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-04-06  4:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 11:32 [PATCH v4 0/6] [PATCH v4 0/6] Track and expose idle PURR and SPURR ticks Gautham R. Shenoy
2020-03-27 11:32 ` [PATCH v4 1/6] powerpc: Move idle_loop_prolog()/epilog() functions to header file Gautham R. Shenoy
2020-03-27 11:32 ` [PATCH v4 2/6] powerpc/idle: Add accessor function to always read latest idle PURR Gautham R. Shenoy
2020-04-01  9:42   ` Naveen N. Rao
2020-04-03  6:15     ` Gautham R Shenoy
2020-04-03 10:34       ` Naveen N. Rao
2020-04-03 11:24         ` Gautham R Shenoy
2020-03-27 11:32 ` [PATCH v4 3/6] powerpc/pseries: Account for SPURR ticks on idle CPUs Gautham R. Shenoy
2020-03-27 11:32 ` [PATCH v4 4/6] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU Gautham R. Shenoy
2020-03-27 11:32 ` [PATCH v4 5/6] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr Gautham R. Shenoy
2020-04-01  9:45   ` Naveen N. Rao
2020-03-27 11:32 ` [PATCH v4 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr Gautham R. Shenoy
2020-04-01  9:58   ` Naveen N. Rao
2020-04-01 12:01     ` Gautham R Shenoy
2020-04-02  7:34       ` Naveen N. Rao
2020-04-03  6:28         ` Gautham R Shenoy
2020-04-03 18:10           ` Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).