All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
@ 2015-03-17  4:13 ` Shreyas B. Prabhu
  0 siblings, 0 replies; 9+ messages in thread
From: Shreyas B. Prabhu @ 2015-03-17  4:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Shreyas B. Prabhu, Michael Ellerman, Paul Mackerras,
	Benjamin Herrenschmidt, linuxppc-dev

Fastsleep is one of the idle state which cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing. OPAL provides an
interface to workaround this bug, and in the current implementation,
every time before a core enters fastsleep OPAL call is made to 'apply'
the workarond and when the core wakes up from fastsleep OPAL call is
made to 'undo' the workaround. These OPAL calls account for roughly
4000 cycles everytime the core has to enter or wakeup from fastsleep.
The other alternative is to apply this workaround once at boot, and not
undo it at all. While this would quicken fastsleep entry/wakeup path,
downside is, any correctable error detected in L2 directory will result
in a checkstop.

This patch adds a new kernel paramerter
pnv_fastsleep_workaround_once, which can be used to override
the default behavior and apply the workaround once at boot and not undo
it.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Paul Mackerras <paulus@samba.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@lists.ozlabs.org
---
Changes in v2:
--------------
Accurately describes the downside of running workaround always applied.

 Documentation/kernel-parameters.txt            |  4 +++
 arch/powerpc/include/asm/opal.h                |  8 +++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/setup.c         | 45 +++++++++++++++++++++++++-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a6..006863b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2857,6 +2857,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			autoconfiguration.
 			Ranges are in pairs (memory base and size).
 
+	pnv_fastsleep_workaround_once=
+			[BUGS=ppc64] Tells kernel to apply fastsleep workaround
+			once at boot.
+
 	ports=		[IP_VS_FTP] IPVS ftp helper module
 			Default is 21.
 			Up to 8 (IP_VS_APP_MAX_PORTS) ports
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9ee0a30..8bea8fc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -180,6 +180,13 @@ struct opal_sg_list {
 #define OPAL_PM_WINKLE_ENABLED	0x00040000
 #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP	1
+#define OPAL_CONFIG_IDLE_UNDO		0
+#define OPAL_CONFIG_IDLE_APPLY		1
+
 #ifndef __ASSEMBLY__
 
 #include <linux/notifier.h>
@@ -924,6 +931,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
 		uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 0509bca..84a20bb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -283,6 +283,7 @@ OPAL_CALL(opal_sensor_read,			OPAL_SENSOR_READ);
 OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
+OPAL_CALL(opal_config_cpu_idle_state,		OPAL_CONFIG_CPU_IDLE_STATE);
 OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index d2de7d5..21dde6c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -28,6 +28,7 @@
 #include <linux/bug.h>
 #include <linux/pci.h>
 #include <linux/cpufreq.h>
+#include <linux/cpumask.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
@@ -405,6 +406,20 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+u8 pnv_apply_fastsleep_workaround_once;
+
+static int __init pnv_fastsleep_workaround_once(char *str)
+{
+	pnv_apply_fastsleep_workaround_once = 1;
+	return 0;
+}
+early_param("pnv_fastsleep_workaround_once", pnv_fastsleep_workaround_once);
+
+static void __init pnv_fastsleep_workaround_apply(void *info)
+{
+	opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+					OPAL_CONFIG_IDLE_APPLY);
+}
 static int __init pnv_init_idle_states(void)
 {
 	struct device_node *power_mgt;
@@ -440,7 +455,35 @@ static int __init pnv_init_idle_states(void)
 		flags = be32_to_cpu(idle_state_flags[i]);
 		supported_cpuidle_states |= flags;
 	}
-	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
+
+	/*
+	 * If OPAL_PM_SLEEP_ENABLED_ER1 is set, it indicates that workaround is
+	 * needed to use fastsleep. Check whether the workaround has to be
+	 * applied only once.
+	 */
+	if ((supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)
+			&& pnv_apply_fastsleep_workaround_once) {
+		/*
+		 * Workaround needs to be applied by one thread in each core
+		 */
+		cpumask_t primary_thread_mask = cpu_thread_mask_to_cores(cpu_online_mask);
+
+		on_each_cpu_mask(&primary_thread_mask,
+					pnv_fastsleep_workaround_apply,
+					NULL, 1);
+	}
+
+	/*
+	 * In the fastsleep entry/exit path, calls to workaround are always
+	 * made with an expectation that they will be patched out when not
+	 * needed.
+	 * Patch out these calls in following scenarios-
+	 * 1. OPAL_PM_SLEEP_ENABLED_ER1 is not set. Indicating the underlying
+	 * hardware does not have the bug.
+	 * 2. Kernel is running with workaround always applied.
+	 */
+	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)
+			|| pnv_apply_fastsleep_workaround_once) {
 		patch_instruction(
 			(unsigned int *)pnv_fastsleep_workaround_at_entry,
 			PPC_INST_NOP);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
@ 2015-03-17  4:13 ` Shreyas B. Prabhu
  0 siblings, 0 replies; 9+ messages in thread
From: Shreyas B. Prabhu @ 2015-03-17  4:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Shreyas B. Prabhu, Paul Mackerras, linuxppc-dev

Fastsleep is one of the idle state which cpuidle subsystem currently
uses on power8 machines. In this state L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing. OPAL provides an
interface to workaround this bug, and in the current implementation,
every time before a core enters fastsleep OPAL call is made to 'apply'
the workarond and when the core wakes up from fastsleep OPAL call is
made to 'undo' the workaround. These OPAL calls account for roughly
4000 cycles everytime the core has to enter or wakeup from fastsleep.
The other alternative is to apply this workaround once at boot, and not
undo it at all. While this would quicken fastsleep entry/wakeup path,
downside is, any correctable error detected in L2 directory will result
in a checkstop.

This patch adds a new kernel paramerter
pnv_fastsleep_workaround_once, which can be used to override
the default behavior and apply the workaround once at boot and not undo
it.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Paul Mackerras <paulus@samba.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@lists.ozlabs.org
---
Changes in v2:
--------------
Accurately describes the downside of running workaround always applied.

 Documentation/kernel-parameters.txt            |  4 +++
 arch/powerpc/include/asm/opal.h                |  8 +++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/setup.c         | 45 +++++++++++++++++++++++++-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a6..006863b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2857,6 +2857,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			autoconfiguration.
 			Ranges are in pairs (memory base and size).
 
+	pnv_fastsleep_workaround_once=
+			[BUGS=ppc64] Tells kernel to apply fastsleep workaround
+			once at boot.
+
 	ports=		[IP_VS_FTP] IPVS ftp helper module
 			Default is 21.
 			Up to 8 (IP_VS_APP_MAX_PORTS) ports
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9ee0a30..8bea8fc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -180,6 +180,13 @@ struct opal_sg_list {
 #define OPAL_PM_WINKLE_ENABLED	0x00040000
 #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000
 
+/*
+ * OPAL_CONFIG_CPU_IDLE_STATE parameters
+ */
+#define OPAL_CONFIG_IDLE_FASTSLEEP	1
+#define OPAL_CONFIG_IDLE_UNDO		0
+#define OPAL_CONFIG_IDLE_APPLY		1
+
 #ifndef __ASSEMBLY__
 
 #include <linux/notifier.h>
@@ -924,6 +931,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
+int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
 int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
 		uint64_t msg_len);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 0509bca..84a20bb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -283,6 +283,7 @@ OPAL_CALL(opal_sensor_read,			OPAL_SENSOR_READ);
 OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
+OPAL_CALL(opal_config_cpu_idle_state,		OPAL_CONFIG_CPU_IDLE_STATE);
 OPAL_CALL(opal_slw_set_reg,			OPAL_SLW_SET_REG);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index d2de7d5..21dde6c 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -28,6 +28,7 @@
 #include <linux/bug.h>
 #include <linux/pci.h>
 #include <linux/cpufreq.h>
+#include <linux/cpumask.h>
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
@@ -405,6 +406,20 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
+u8 pnv_apply_fastsleep_workaround_once;
+
+static int __init pnv_fastsleep_workaround_once(char *str)
+{
+	pnv_apply_fastsleep_workaround_once = 1;
+	return 0;
+}
+early_param("pnv_fastsleep_workaround_once", pnv_fastsleep_workaround_once);
+
+static void __init pnv_fastsleep_workaround_apply(void *info)
+{
+	opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
+					OPAL_CONFIG_IDLE_APPLY);
+}
 static int __init pnv_init_idle_states(void)
 {
 	struct device_node *power_mgt;
@@ -440,7 +455,35 @@ static int __init pnv_init_idle_states(void)
 		flags = be32_to_cpu(idle_state_flags[i]);
 		supported_cpuidle_states |= flags;
 	}
-	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
+
+	/*
+	 * If OPAL_PM_SLEEP_ENABLED_ER1 is set, it indicates that workaround is
+	 * needed to use fastsleep. Check whether the workaround has to be
+	 * applied only once.
+	 */
+	if ((supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)
+			&& pnv_apply_fastsleep_workaround_once) {
+		/*
+		 * Workaround needs to be applied by one thread in each core
+		 */
+		cpumask_t primary_thread_mask = cpu_thread_mask_to_cores(cpu_online_mask);
+
+		on_each_cpu_mask(&primary_thread_mask,
+					pnv_fastsleep_workaround_apply,
+					NULL, 1);
+	}
+
+	/*
+	 * In the fastsleep entry/exit path, calls to workaround are always
+	 * made with an expectation that they will be patched out when not
+	 * needed.
+	 * Patch out these calls in following scenarios-
+	 * 1. OPAL_PM_SLEEP_ENABLED_ER1 is not set. Indicating the underlying
+	 * hardware does not have the bug.
+	 * 2. Kernel is running with workaround always applied.
+	 */
+	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)
+			|| pnv_apply_fastsleep_workaround_once) {
 		patch_instruction(
 			(unsigned int *)pnv_fastsleep_workaround_at_entry,
 			PPC_INST_NOP);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
  2015-03-17  4:13 ` Shreyas B. Prabhu
  (?)
@ 2015-03-17  8:57 ` Michael Ellerman
  2015-03-17  9:39     ` Benjamin Herrenschmidt
  -1 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2015-03-17  8:57 UTC (permalink / raw)
  To: Shreyas B. Prabhu, linux-kernel
  Cc: Shreyas B. Prabhu, Paul Mackerras, linuxppc-dev

On Tue, 2015-17-03 at 04:13:53 UTC, "Shreyas B. Prabhu" wrote:
> Fastsleep is one of the idle state which cpuidle subsystem currently
> uses on power8 machines. In this state L2 cache is brought down to a
> threshold voltage. Therefore when the core is in fastsleep, the
> communication between L2 and L3 needs to be fenced. But there is a bug
> in the current power8 chips surrounding this fencing. OPAL provides an
> interface to workaround this bug, and in the current implementation,
> every time before a core enters fastsleep OPAL call is made to 'apply'
> the workarond and when the core wakes up from fastsleep OPAL call is
> made to 'undo' the workaround. These OPAL calls account for roughly
> 4000 cycles everytime the core has to enter or wakeup from fastsleep.

OK. The bit you don't explain is that while the workaround is applied there is
a risk ...

> The other alternative is to apply this workaround once at boot, and not
> undo it at all. While this would quicken fastsleep entry/wakeup path,
> downside is, any correctable error detected in L2 directory will result
> in a checkstop.

Of this happening.

Which is why we don't just always apply the workaround. Am I right?

> This patch adds a new kernel paramerter
> pnv_fastsleep_workaround_once, which can be used to override
> the default behavior and apply the workaround once at boot and not undo
> it.

So my first preference is that you just bite the bullet and decide to either
always apply the workaround, or just stick with the current behaviour. That's a
trade-off between (I think) better idle latency but a risk of checkstops, vs.
slower idle latency but less (how much less?) risk of checkstops.

I think the reason you're proposing a kernel parameter is because we aren't
willing to make that decision, ie. we're saying that users should decide. Is
that right?

I'm not a big fan of kernel parameters. They are a pain to use, and are often
just pushing a decision down one layer for no reason. What I mean is that
individual users are probably just going to accept whatever the default value
is from their distro.

But anyway, that's a bit of a rant.

As far as this patch is concerned, I don't think it actually needs to be a
kernel parameter.

>From what I can see below, the decision as to whether you apply the workaround
or not doesn't affect the list of idle states. So this could just as well be a
runtime parameter, ie. a sysfs file, which can then be set by the user whenever
they like? They might do it in a boot script, but that's up to them.

For simplicity I think it would also be fine to make it a write-once parameter,
ie. you don't need to handle undoing it.

I think the only complication that would add is that you'd need to be a little
careful about the order in which you nop out the calls vs applying the
workaround, in case some threads are idle when you're called.

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
  2015-03-17  8:57 ` [RFC, " Michael Ellerman
@ 2015-03-17  9:39     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2015-03-17  9:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Shreyas B. Prabhu, linux-kernel, Paul Mackerras, linuxppc-dev

On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
> 
> So my first preference is that you just bite the bullet and decide to either
> always apply the workaround, or just stick with the current behaviour. That's a
> trade-off between (I think) better idle latency but a risk of checkstops, vs.
> slower idle latency but less (how much less?) risk of checkstops.
> 
> I think the reason you're proposing a kernel parameter is because we aren't
> willing to make that decision, ie. we're saying that users should decide. Is
> that right?

Correct. More specifically, a fairly high profile user that I will not
name here has expressed interest in such a feature...

> I'm not a big fan of kernel parameters. They are a pain to use, and are often
> just pushing a decision down one layer for no reason. What I mean is that
> individual users are probably just going to accept whatever the default value
> is from their distro.

Right. This is quite an obscure tunable.

> But anyway, that's a bit of a rant.
> 
> As far as this patch is concerned, I don't think it actually needs to be a
> kernel parameter.
> 
> >From what I can see below, the decision as to whether you apply the workaround
> or not doesn't affect the list of idle states. So this could just as well be a
> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
> they like? They might do it in a boot script, but that's up to them.

Right, that would work too.

> For simplicity I think it would also be fine to make it a write-once parameter,
> ie. you don't need to handle undoing it.

It would be easy enough to make it rw using stop machine I think... 

> I think the only complication that would add is that you'd need to be a little
> careful about the order in which you nop out the calls vs applying the
> workaround, in case some threads are idle when you're called.

I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
in the measurement.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
@ 2015-03-17  9:39     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2015-03-17  9:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Shreyas B. Prabhu, linuxppc-dev, Paul Mackerras, linux-kernel

On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
> 
> So my first preference is that you just bite the bullet and decide to either
> always apply the workaround, or just stick with the current behaviour. That's a
> trade-off between (I think) better idle latency but a risk of checkstops, vs.
> slower idle latency but less (how much less?) risk of checkstops.
> 
> I think the reason you're proposing a kernel parameter is because we aren't
> willing to make that decision, ie. we're saying that users should decide. Is
> that right?

Correct. More specifically, a fairly high profile user that I will not
name here has expressed interest in such a feature...

> I'm not a big fan of kernel parameters. They are a pain to use, and are often
> just pushing a decision down one layer for no reason. What I mean is that
> individual users are probably just going to accept whatever the default value
> is from their distro.

Right. This is quite an obscure tunable.

> But anyway, that's a bit of a rant.
> 
> As far as this patch is concerned, I don't think it actually needs to be a
> kernel parameter.
> 
> >From what I can see below, the decision as to whether you apply the workaround
> or not doesn't affect the list of idle states. So this could just as well be a
> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
> they like? They might do it in a boot script, but that's up to them.

Right, that would work too.

> For simplicity I think it would also be fine to make it a write-once parameter,
> ie. you don't need to handle undoing it.

It would be easy enough to make it rw using stop machine I think... 

> I think the only complication that would add is that you'd need to be a little
> careful about the order in which you nop out the calls vs applying the
> workaround, in case some threads are idle when you're called.

I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
in the measurement.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
  2015-03-17  9:39     ` Benjamin Herrenschmidt
@ 2015-03-17 15:49       ` Shreyas B Prabhu
  -1 siblings, 0 replies; 9+ messages in thread
From: Shreyas B Prabhu @ 2015-03-17 15:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Michael Ellerman
  Cc: linux-kernel, Paul Mackerras, linuxppc-dev



On Tuesday 17 March 2015 03:09 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
>>
>> So my first preference is that you just bite the bullet and decide to either
>> always apply the workaround, or just stick with the current behaviour. That's a
>> trade-off between (I think) better idle latency but a risk of checkstops, vs.
>> slower idle latency but less (how much less?) risk of checkstops.
>>
>> I think the reason you're proposing a kernel parameter is because we aren't
>> willing to make that decision, ie. we're saying that users should decide. Is
>> that right?
> 
> Correct. More specifically, a fairly high profile user that I will not
> name here has expressed interest in such a feature...
> 
>> I'm not a big fan of kernel parameters. They are a pain to use, and are often
>> just pushing a decision down one layer for no reason. What I mean is that
>> individual users are probably just going to accept whatever the default value
>> is from their distro.
> 
> Right. This is quite an obscure tunable.
> 
>> But anyway, that's a bit of a rant.
>>
>> As far as this patch is concerned, I don't think it actually needs to be a
>> kernel parameter.
>>
>> >From what I can see below, the decision as to whether you apply the workaround
>> or not doesn't affect the list of idle states. So this could just as well be a
>> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
>> they like? They might do it in a boot script, but that's up to them.
> 
> Right, that would work too.

Okay. I'll send a patch with this design.
> 
>> For simplicity I think it would also be fine to make it a write-once parameter,
>> ie. you don't need to handle undoing it.
> 
> It would be easy enough to make it rw using stop machine I think... 
> 
>> I think the only complication that would add is that you'd need to be a little
>> careful about the order in which you nop out the calls vs applying the
>> workaround, in case some threads are idle when you're called.

Right, we should be safe with this sequence-
- NOP call to undo workaround
- Apply workaround on all cores.
- NOP call to apply workaround

> 
> I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
> in the measurement.
> 

Didn't get your point here. Do you mean, ignore the request if some
cores are in sleep or deeper state?
> Cheers,
> Ben.
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
@ 2015-03-17 15:49       ` Shreyas B Prabhu
  0 siblings, 0 replies; 9+ messages in thread
From: Shreyas B Prabhu @ 2015-03-17 15:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Michael Ellerman
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel



On Tuesday 17 March 2015 03:09 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
>>
>> So my first preference is that you just bite the bullet and decide to either
>> always apply the workaround, or just stick with the current behaviour. That's a
>> trade-off between (I think) better idle latency but a risk of checkstops, vs.
>> slower idle latency but less (how much less?) risk of checkstops.
>>
>> I think the reason you're proposing a kernel parameter is because we aren't
>> willing to make that decision, ie. we're saying that users should decide. Is
>> that right?
> 
> Correct. More specifically, a fairly high profile user that I will not
> name here has expressed interest in such a feature...
> 
>> I'm not a big fan of kernel parameters. They are a pain to use, and are often
>> just pushing a decision down one layer for no reason. What I mean is that
>> individual users are probably just going to accept whatever the default value
>> is from their distro.
> 
> Right. This is quite an obscure tunable.
> 
>> But anyway, that's a bit of a rant.
>>
>> As far as this patch is concerned, I don't think it actually needs to be a
>> kernel parameter.
>>
>> >From what I can see below, the decision as to whether you apply the workaround
>> or not doesn't affect the list of idle states. So this could just as well be a
>> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
>> they like? They might do it in a boot script, but that's up to them.
> 
> Right, that would work too.

Okay. I'll send a patch with this design.
> 
>> For simplicity I think it would also be fine to make it a write-once parameter,
>> ie. you don't need to handle undoing it.
> 
> It would be easy enough to make it rw using stop machine I think... 
> 
>> I think the only complication that would add is that you'd need to be a little
>> careful about the order in which you nop out the calls vs applying the
>> workaround, in case some threads are idle when you're called.

Right, we should be safe with this sequence-
- NOP call to undo workaround
- Apply workaround on all cores.
- NOP call to apply workaround

> 
> I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
> in the measurement.
> 

Didn't get your point here. Do you mean, ignore the request if some
cores are in sleep or deeper state?
> Cheers,
> Ben.
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
  2015-03-17 15:49       ` Shreyas B Prabhu
@ 2015-03-18  4:55         ` Michael Ellerman
  -1 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2015-03-18  4:55 UTC (permalink / raw)
  To: Shreyas B Prabhu
  Cc: Benjamin Herrenschmidt, linux-kernel, Paul Mackerras, linuxppc-dev

On Tue, 2015-03-17 at 21:19 +0530, Shreyas B Prabhu wrote:
> 
> On Tuesday 17 March 2015 03:09 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
> >>
> >> >From what I can see below, the decision as to whether you apply the workaround
> >> or not doesn't affect the list of idle states. So this could just as well be a
> >> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
> >> they like? They might do it in a boot script, but that's up to them.
> > 
> > Right, that would work too.
> 
> Okay. I'll send a patch with this design.

Thanks.

> >> For simplicity I think it would also be fine to make it a write-once parameter,
> >> ie. you don't need to handle undoing it.
> > 
> > It would be easy enough to make it rw using stop machine I think... 
> > 
> >> I think the only complication that would add is that you'd need to be a little
> >> careful about the order in which you nop out the calls vs applying the
> >> workaround, in case some threads are idle when you're called.
> 
> Right, we should be safe with this sequence-
> - NOP call to undo workaround
> - Apply workaround on all cores.
> - NOP call to apply workaround
 
Yeah that sounds right.

> > I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
> > in the measurement.
> 
> Didn't get your point here. Do you mean, ignore the request if some
> cores are in sleep or deeper state?

I *think* what he means is we probably don't actually need to patch a nop
in/out. Instead we could just test a flag, because the cost of testing a flag
is miniscule compared to the rest of the logic.

cheers



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior
@ 2015-03-18  4:55         ` Michael Ellerman
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2015-03-18  4:55 UTC (permalink / raw)
  To: Shreyas B Prabhu; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel

On Tue, 2015-03-17 at 21:19 +0530, Shreyas B Prabhu wrote:
> 
> On Tuesday 17 March 2015 03:09 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2015-03-17 at 19:57 +1100, Michael Ellerman wrote:
> >>
> >> >From what I can see below, the decision as to whether you apply the workaround
> >> or not doesn't affect the list of idle states. So this could just as well be a
> >> runtime parameter, ie. a sysfs file, which can then be set by the user whenever
> >> they like? They might do it in a boot script, but that's up to them.
> > 
> > Right, that would work too.
> 
> Okay. I'll send a patch with this design.

Thanks.

> >> For simplicity I think it would also be fine to make it a write-once parameter,
> >> ie. you don't need to handle undoing it.
> > 
> > It would be easy enough to make it rw using stop machine I think... 
> > 
> >> I think the only complication that would add is that you'd need to be a little
> >> careful about the order in which you nop out the calls vs applying the
> >> workaround, in case some threads are idle when you're called.
> 
> Right, we should be safe with this sequence-
> - NOP call to undo workaround
> - Apply workaround on all cores.
> - NOP call to apply workaround
 
Yeah that sounds right.

> > I wouldn't bother with NOP'ing in that case, a runtime test will probably be noise
> > in the measurement.
> 
> Didn't get your point here. Do you mean, ignore the request if some
> cores are in sleep or deeper state?

I *think* what he means is we probably don't actually need to patch a nop
in/out. Instead we could just test a flag, because the cost of testing a flag
is miniscule compared to the rest of the logic.

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-03-18  4:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-17  4:13 [PATCH RFC v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior Shreyas B. Prabhu
2015-03-17  4:13 ` Shreyas B. Prabhu
2015-03-17  8:57 ` [RFC, " Michael Ellerman
2015-03-17  9:39   ` Benjamin Herrenschmidt
2015-03-17  9:39     ` Benjamin Herrenschmidt
2015-03-17 15:49     ` Shreyas B Prabhu
2015-03-17 15:49       ` Shreyas B Prabhu
2015-03-18  4:55       ` Michael Ellerman
2015-03-18  4:55         ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.