All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	Anton Blanchard <anton@ozlabs.org>,
	Nathan Lynch <nathanl@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Michael Neuling <mikey@neuling.org>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org,
	"Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Subject: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework
Date: Tue,  7 Jul 2020 16:41:38 +0530	[thread overview]
Message-ID: <1594120299-31389-5-git-send-email-ego@linux.vnet.ibm.com> (raw)
In-Reply-To: <1594120299-31389-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

This patch exposes those extended CEDE states to the cpuidle framework
which are responsive to external interrupts and do not need an H_PROD.

Since as per the PAPR, all the extended CEDE states are non-responsive
to timers, we indicate this to the cpuidle subsystem via the
CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
can wake up on external interrupts.

With the patch, we are able to see the extended CEDE state with
latency hint = 1 exposed via the cpuidle framework.

	$ cpupower idle-info
	CPUidle driver: pseries_idle
	CPUidle governor: menu
	analyzing CPU 0:

	Number of idle states: 3
	Available idle states: snooze CEDE XCEDE1
	snooze:
	Flags/Description: snooze
	Latency: 0
	Usage: 33429446
	Duration: 27006062
	CEDE:
	Flags/Description: CEDE
	Latency: 1
	Usage: 10272
	Duration: 110786770
	XCEDE1:
	Flags/Description: XCEDE1
	Latency: 12
	Usage: 26445
	Duration: 1436433815

Benchmark results:
TLDR: Over all we do not see any additional benefit from having XCEDE1 over
CEDE.

ebizzy :
2 threads bound to a big-core. With this patch, we see a 3.39%
regression compared to with only CEDE0 latency fixup.
x With only CEDE0 latency fixup
* With CEDE0 latency fixup + CEDE1
    N           Min           Max        Median           Avg        Stddev
x  10       2893813       5834474       5832448     5327281.3     1055941.4
*  10       2907329       5834923       5831398     5146614.6     1193874.8

context_switch2:
With the context_switch2 there are no observable regressions in the
results.

context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
No difference with and without patch.
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        343644        348778        345444     345584.02     1035.1658
* 500        344310        347646        345776     345877.22     802.19501

context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
with patch
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        287562        288756        288162     288134.76     262.24328
* 500        287874        288960        288306     288274.66     187.57034

schbench:
No regressions observed with schbench

Without Patch:
Latency percentiles (usec)
	50.0th: 29
	75.0th: 40
	90.0th: 50
	95.0th: 61
	*99.0th: 13648
	99.5th: 14768
	99.9th: 15664
	min=0, max=29812

With Patch:
Latency percentiles (usec)
	50.0th: 30
	75.0th: 40
	90.0th: 51
	95.0th: 59
	*99.0th: 13616
	99.5th: 14512
	99.9th: 15696
	min=0, max=15996

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 502f906..6f893cd 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
 	for (i = 0; i < nr_xcede_records; i++) {
 		u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
 		u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+		char name[CPUIDLE_NAME_LEN];
+		unsigned int latency_hint = xcede_records[i].latency_hint;
+		u64 residency_us;
+
+		if (!xcede_records[i].responsive_to_irqs) {
+			pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
+				latency_hint);
+			continue;
+		}
 
 		if (latency_us < min_latency_us)
 			min_latency_us = latency_us;
+		snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
+
+		/*
+		 * As per the section 14.14.1 of PAPR version 2.8.1
+		 * says that alling H_CEDE with the value of the cede
+		 * latency specifier set greater than zero allows the
+		 * processor timer facility to be disabled (so as not
+		 * to cause gratuitous wake-ups - the use of H_PROD,
+		 * or other external interrupt is required to wake the
+		 * processor in this case).
+		 *
+		 * So, inform the cpuidle-subsystem that the timer
+		 * will be stopped for these states.
+		 *
+		 * Also, bump up the latency by 10us, since cpuidle
+		 * would use timer-offload framework which will need
+		 * to send an IPI to wakeup a CPU whose timer has
+		 * expired.
+		 */
+		if (latency_hint > 0) {
+			dedicated_states[nr_states].flags = CPUIDLE_FLAG_TIMER_STOP;
+			latency_us += 10;
+		}
+
+		/*
+		 * Thumb rule : Reside in the XCEDE state for at least
+		 * 10x the time required to enter and exit that state.
+		 */
+		residency_us = latency_us * 10;
+
+		strlcpy(dedicated_states[nr_states].name, (const char *)name,
+			CPUIDLE_NAME_LEN);
+		strlcpy(dedicated_states[nr_states].desc, (const char *)name,
+			CPUIDLE_NAME_LEN);
+		dedicated_states[nr_states].exit_latency = latency_us;
+		dedicated_states[nr_states].target_residency = residency_us;
+		dedicated_states[nr_states].enter = &dedicated_cede_loop;
+		cede_latency_hint[nr_states] = latency_hint;
+		pr_info("cpuidle : Added %s. latency-hint = %d\n",
+			name, latency_hint);
+		nr_states++;
 	}
 
 	/*
-- 
1.9.4


WARNING: multiple messages have this Message-ID (diff)
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	Anton Blanchard <anton@ozlabs.org>,
	Nathan Lynch <nathanl@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Michael Neuling <mikey@neuling.org>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org,
	"Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework
Date: Tue,  7 Jul 2020 16:41:38 +0530	[thread overview]
Message-ID: <1594120299-31389-5-git-send-email-ego@linux.vnet.ibm.com> (raw)
In-Reply-To: <1594120299-31389-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

This patch exposes those extended CEDE states to the cpuidle framework
which are responsive to external interrupts and do not need an H_PROD.

Since as per the PAPR, all the extended CEDE states are non-responsive
to timers, we indicate this to the cpuidle subsystem via the
CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
can wake up on external interrupts.

With the patch, we are able to see the extended CEDE state with
latency hint = 1 exposed via the cpuidle framework.

	$ cpupower idle-info
	CPUidle driver: pseries_idle
	CPUidle governor: menu
	analyzing CPU 0:

	Number of idle states: 3
	Available idle states: snooze CEDE XCEDE1
	snooze:
	Flags/Description: snooze
	Latency: 0
	Usage: 33429446
	Duration: 27006062
	CEDE:
	Flags/Description: CEDE
	Latency: 1
	Usage: 10272
	Duration: 110786770
	XCEDE1:
	Flags/Description: XCEDE1
	Latency: 12
	Usage: 26445
	Duration: 1436433815

Benchmark results:
TLDR: Over all we do not see any additional benefit from having XCEDE1 over
CEDE.

ebizzy :
2 threads bound to a big-core. With this patch, we see a 3.39%
regression compared to with only CEDE0 latency fixup.
x With only CEDE0 latency fixup
* With CEDE0 latency fixup + CEDE1
    N           Min           Max        Median           Avg        Stddev
x  10       2893813       5834474       5832448     5327281.3     1055941.4
*  10       2907329       5834923       5831398     5146614.6     1193874.8

context_switch2:
With the context_switch2 there are no observable regressions in the
results.

context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
No difference with and without patch.
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        343644        348778        345444     345584.02     1035.1658
* 500        344310        347646        345776     345877.22     802.19501

context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
with patch
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        287562        288756        288162     288134.76     262.24328
* 500        287874        288960        288306     288274.66     187.57034

schbench:
No regressions observed with schbench

Without Patch:
Latency percentiles (usec)
	50.0th: 29
	75.0th: 40
	90.0th: 50
	95.0th: 61
	*99.0th: 13648
	99.5th: 14768
	99.9th: 15664
	min=0, max=29812

With Patch:
Latency percentiles (usec)
	50.0th: 30
	75.0th: 40
	90.0th: 51
	95.0th: 59
	*99.0th: 13616
	99.5th: 14512
	99.9th: 15696
	min=0, max=15996

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 502f906..6f893cd 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
 	for (i = 0; i < nr_xcede_records; i++) {
 		u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
 		u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+		char name[CPUIDLE_NAME_LEN];
+		unsigned int latency_hint = xcede_records[i].latency_hint;
+		u64 residency_us;
+
+		if (!xcede_records[i].responsive_to_irqs) {
+			pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
+				latency_hint);
+			continue;
+		}
 
 		if (latency_us < min_latency_us)
 			min_latency_us = latency_us;
+		snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
+
+		/*
+		 * As per the section 14.14.1 of PAPR version 2.8.1
+		 * says that alling H_CEDE with the value of the cede
+		 * latency specifier set greater than zero allows the
+		 * processor timer facility to be disabled (so as not
+		 * to cause gratuitous wake-ups - the use of H_PROD,
+		 * or other external interrupt is required to wake the
+		 * processor in this case).
+		 *
+		 * So, inform the cpuidle-subsystem that the timer
+		 * will be stopped for these states.
+		 *
+		 * Also, bump up the latency by 10us, since cpuidle
+		 * would use timer-offload framework which will need
+		 * to send an IPI to wakeup a CPU whose timer has
+		 * expired.
+		 */
+		if (latency_hint > 0) {
+			dedicated_states[nr_states].flags = CPUIDLE_FLAG_TIMER_STOP;
+			latency_us += 10;
+		}
+
+		/*
+		 * Thumb rule : Reside in the XCEDE state for at least
+		 * 10x the time required to enter and exit that state.
+		 */
+		residency_us = latency_us * 10;
+
+		strlcpy(dedicated_states[nr_states].name, (const char *)name,
+			CPUIDLE_NAME_LEN);
+		strlcpy(dedicated_states[nr_states].desc, (const char *)name,
+			CPUIDLE_NAME_LEN);
+		dedicated_states[nr_states].exit_latency = latency_us;
+		dedicated_states[nr_states].target_residency = residency_us;
+		dedicated_states[nr_states].enter = &dedicated_cede_loop;
+		cede_latency_hint[nr_states] = latency_hint;
+		pr_info("cpuidle : Added %s. latency-hint = %d\n",
+			name, latency_hint);
+		nr_states++;
 	}
 
 	/*
-- 
1.9.4


  parent reply	other threads:[~2020-07-07 11:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 11:11 [PATCH 0/5] cpuidle-pseries: Parse extended CEDE information for idle Gautham R. Shenoy
2020-07-07 11:11 ` Gautham R. Shenoy
2020-07-07 11:11 ` [PATCH 1/5] cpuidle-pseries: Set the latency-hint before entering CEDE Gautham R. Shenoy
2020-07-07 11:11   ` Gautham R. Shenoy
2020-07-20  5:55   ` Vaidyanathan Srinivasan
2020-07-20  5:55     ` Vaidyanathan Srinivasan
2020-07-07 11:11 ` [PATCH 2/5] cpuidle-pseries: Add function to parse extended CEDE records Gautham R. Shenoy
2020-07-07 11:11   ` Gautham R. Shenoy
2020-07-20  6:09   ` Vaidyanathan Srinivasan
2020-07-20  6:09     ` Vaidyanathan Srinivasan
2020-07-07 11:11 ` [PATCH 3/5] cpuidle-pseries : Fixup exit latency for CEDE(0) Gautham R. Shenoy
2020-07-07 11:11   ` Gautham R. Shenoy
2020-07-20  6:24   ` Vaidyanathan Srinivasan
2020-07-20  6:24     ` Vaidyanathan Srinivasan
2020-07-07 11:11 ` Gautham R. Shenoy [this message]
2020-07-07 11:11   ` [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework Gautham R. Shenoy
2020-07-20  6:31   ` Vaidyanathan Srinivasan
2020-07-20  6:31     ` Vaidyanathan Srinivasan
2020-07-07 11:11 ` [PATCH 5/5] cpuidle-pseries: Block Extended CEDE(1) which adds no additional value Gautham R. Shenoy
2020-07-07 11:11   ` Gautham R. Shenoy
2020-07-20  6:34   ` Vaidyanathan Srinivasan
2020-07-20  6:34     ` Vaidyanathan Srinivasan
2020-07-07 11:32 ` [PATCH 0/5] cpuidle-pseries: Parse extended CEDE information for idle Gautham R Shenoy
2020-07-07 11:32   ` Gautham R Shenoy
2020-07-27 14:14   ` Rafael J. Wysocki
2020-07-27 14:14     ` Rafael J. Wysocki
2020-07-27 18:55     ` Gautham R Shenoy
2020-07-27 18:55       ` Gautham R Shenoy
2020-07-20  5:48 ` Vaidyanathan Srinivasan
2020-07-20  5:48   ` Vaidyanathan Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1594120299-31389-5-git-send-email-ego@linux.vnet.ibm.com \
    --to=ego@linux.vnet.ibm.com \
    --cc=anton@ozlabs.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mikey@neuling.org \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=svaidy@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.