linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs
@ 2016-06-02  4:33 dbasehore
  2016-06-02  4:33 ` [PATCH 1/5] x86: stub out pmc function dbasehore
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

This patch set adds support for catching errors when entering freeze
on Intel Skylake SoCs. Support for this can be added to newer SoCs in
later patches.

Verification is done by waking up the CPU once every X (default 10)
seconds to check the residency of S0ix. This can't be verified before
attempting to enter S0ix through mwait, so we have to repeatedly
verify entry into that state. Successfully entering S0ix is no
guarantee that it will be entered on the next attempt, so we have to
schedule another check. This has a minimal power impact of <1% of the
total system power on our systems.

This relies on the recently added patch "platform/x86: Add PMC Driver
for Intel Core SoC"

Derek Basehore (5):
  x86: stub out pmc function
  clockevents: Add timed freeze
  x86, apic: Add timed freeze support
  freeze: Add error reporting
  intel_idle: Add S0ix validation

 arch/x86/include/asm/pmc_core.h |   6 +-
 arch/x86/kernel/apic/apic.c     |  25 +++++-
 drivers/acpi/processor_idle.c   |  10 ++-
 drivers/cpuidle/cpuidle.c       |  31 +++++--
 drivers/idle/intel_idle.c       | 185 +++++++++++++++++++++++++++++++++++++---
 include/linux/clockchips.h      |  10 +++
 include/linux/cpuidle.h         |  10 ++-
 include/linux/suspend.h         |  10 +++
 kernel/power/suspend.c          |  11 ++-
 kernel/time/clockevents.c       | 117 +++++++++++++++++++++++++
 10 files changed, 385 insertions(+), 30 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] x86: stub out pmc function
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
@ 2016-06-02  4:33 ` dbasehore
  2016-06-02  4:33 ` [PATCH 2/5] clockevents: Add timed freeze dbasehore
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

This creates an inline function of intel_pmc_slp_s0_counter_read for
!CONFIG_INTEL_PMC_CORE.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 arch/x86/include/asm/pmc_core.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pmc_core.h b/arch/x86/include/asm/pmc_core.h
index d4855f1..786e526 100644
--- a/arch/x86/include/asm/pmc_core.h
+++ b/arch/x86/include/asm/pmc_core.h
@@ -22,6 +22,10 @@
 #define _ASM_PMC_CORE_H
 
 /* API to read SLP_S0_RESIDENCY counter */
-int intel_pmc_slp_s0_counter_read(u32 *data);
+#ifdef CONFIG_INTEL_PMC_CORE
+extern int intel_pmc_slp_s0_counter_read(u32 *data);
+#else
+static inline int intel_pmc_slp_s0_counter_read(u32 *data) { return -ENOSYS; }
+#endif
 
 #endif /* _ASM_PMC_CORE_H */
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] clockevents: Add timed freeze
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
  2016-06-02  4:33 ` [PATCH 1/5] x86: stub out pmc function dbasehore
@ 2016-06-02  4:33 ` dbasehore
  2016-06-02  4:33 ` [PATCH 3/5] x86, apic: Add timed freeze support dbasehore
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

Adds a new feature to clockevents to schedule wakeups on a CPU during
freeze. These won't fully wake up the system, but allow simple
platform callbacks that don't require device support to be run during
freeze with little power impact.

This implementation allows an idle driver to setup a timer event with
the clock event device when entering freeze by calling timed_freeze.
timed_freeze is a wrapper around entering freeze and setting up the
timer event so that clock event device state is not exposed to the
caller. This only allows one user of a timed freeze event, but that
will likely be the case for some time.

The reason this isn't implemented on top of the RTC is that wake irqs
unconditionally wake up the system by calling pm_system_wakeup. This
is done by both the SCI and RTC irqs on x86 when an RTC alarm fires.

An abstraction could be easily added on top of this if there is ever
more than one user for freeze events.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 include/linux/clockchips.h |  10 ++++
 include/linux/suspend.h    |  10 ++++
 kernel/time/clockevents.c  | 117 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0d442e3..e1b66d4 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -66,12 +66,20 @@ enum clock_event_state {
  */
 # define CLOCK_EVT_FEAT_HRTIMER		0x000080
 
+/*
+ * Clockevent device may run during freeze
+ */
+# define CLOCK_EVT_FEAT_FREEZE		0x000100
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler:	Assigned by the framework to be called by the low
  *			level handler of the event source
  * @set_next_event:	set next event function using a clocksource delta
  * @set_next_ktime:	set next event function using a direct ktime value
+ * @event_pending:	check if the programmed event is still pending. Used
+ *			for freeze events when timekeeping is suspended and
+ *			irqs are disabled.
  * @next_event:		local storage for the next event in oneshot mode
  * @max_delta_ns:	maximum delta value in ns
  * @min_delta_ns:	minimum delta value in ns
@@ -100,7 +108,9 @@ struct clock_event_device {
 	void			(*event_handler)(struct clock_event_device *);
 	int			(*set_next_event)(unsigned long evt, struct clock_event_device *);
 	int			(*set_next_ktime)(ktime_t expires, struct clock_event_device *);
+	bool			(*event_expired)(struct clock_event_device *);
 	ktime_t			next_event;
+	bool			freeze_event_programmed;
 	u64			max_delta_ns;
 	u64			min_delta_ns;
 	u32			mult;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8b6ec7e..d8d4296 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -194,6 +194,11 @@ struct platform_freeze_ops {
 	void (*end)(void);
 };
 
+struct timed_freeze_ops {
+	int (*enter_freeze)(void *);
+	int (*callback)(void *);
+};
+
 #ifdef CONFIG_SUSPEND
 /**
  * suspend_set_ops - set platform dependent suspend operations
@@ -246,6 +251,9 @@ static inline bool idle_should_freeze(void)
 	return unlikely(suspend_freeze_state == FREEZE_STATE_ENTER);
 }
 
+extern int timed_freeze(struct timed_freeze_ops *ops, void *data,
+			ktime_t delta);
+
 extern void freeze_set_ops(const struct platform_freeze_ops *ops);
 extern void freeze_wake(void);
 
@@ -280,6 +288,8 @@ static inline bool pm_resume_via_firmware(void) { return false; }
 static inline void suspend_set_ops(const struct platform_suspend_ops *ops) {}
 static inline int pm_suspend(suspend_state_t state) { return -ENOSYS; }
 static inline bool idle_should_freeze(void) { return false; }
+static inline int timed_freeze(struct timed_freeze_ops *ops, void *data,
+			       ktime_t delta) { return -ENOSYS; }
 static inline void freeze_set_ops(const struct platform_freeze_ops *ops) {}
 static inline void freeze_wake(void) {}
 #endif /* !CONFIG_SUSPEND */
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a9b76a4..e7ca673 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/smp.h>
 #include <linux/device.h>
+#include <linux/suspend.h>
 
 #include "tick-internal.h"
 
@@ -341,6 +342,122 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
 }
 
+static int clockevents_program_freeze_event(struct clock_event_device *dev,
+					    ktime_t delta)
+{
+	int64_t delta_ns = ktime_to_ns(delta);
+	unsigned long long clc;
+	int ret;
+
+	if (delta_ns > (int64_t) dev->max_delta_ns) {
+		printk_deferred(KERN_WARNING
+				"Freeze event time longer than max delta\n");
+		delta_ns = (int64_t) dev->max_delta_ns;
+	}
+
+	clockevents_tick_resume(dev);
+	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
+	delta_ns = max_t(int64_t, delta_ns, dev->min_delta_ns);
+	clc = ((unsigned long long) delta_ns * dev->mult) >> dev->shift;
+	ret = dev->set_next_event((unsigned long) clc, dev);
+	if (ret < 0) {
+		printk_deferred(KERN_WARNING
+				"Failed to program freeze event\n");
+		clockevents_shutdown(dev);
+	} else {
+		dev->freeze_event_programmed = true;
+	}
+
+	return ret;
+}
+
+static bool clockevents_freeze_event_expired(struct clock_event_device *dev)
+{
+	if (dev->freeze_event_programmed)
+		return dev->event_expired(dev);
+
+	return false;
+}
+
+static void clockevents_cleanup_freeze_event(struct clock_event_device *dev)
+{
+	if (!(dev->features & CLOCK_EVT_FEAT_FREEZE))
+		return;
+
+	clockevents_shutdown(dev);
+	dev->freeze_event_programmed = false;
+}
+
+/**
+ * timed_freeze - Enter freeze on a CPU for a timed duration
+ * @ops:	Pointers for enter freeze and callback functions.
+ * @data:	Pointer to pass arguments to the function pointers.
+ * @delta:	Time to freeze for. If this amount of time passes in freeze, the
+ *		callback in ops will be called.
+ *
+ * Returns the value from ops->enter_freeze or ops->callback on success, -EERROR
+ * otherwise. If an error is encountered while setting up the clock event,
+ * freeze with still be entered, but it will not be timed nor will the callback
+ * function be run.
+ */
+int timed_freeze(struct timed_freeze_ops *ops, void *data, ktime_t delta)
+{
+	int cpu = smp_processor_id();
+	struct tick_device *td = tick_get_device(cpu);
+	struct clock_event_device *dev;
+	int ret;
+
+	if (!ops || !ops->enter_freeze) {
+		printk_deferred(KERN_ERR
+				"[%s] called with invalid ops\n", __func__);
+		return -EINVAL;
+	}
+
+	if (!td || !td->evtdev ||
+	    !(td->evtdev->features & CLOCK_EVT_FEAT_FREEZE)) {
+		printk_deferred(KERN_WARNING
+				"[%s] called with invalid clock event device\n",
+				__func__);
+		ret = -ENOSYS;
+		goto freeze_no_check;
+	}
+
+	dev = td->evtdev;
+	if (!clockevent_state_shutdown(dev)) {
+		printk_deferred(KERN_WARNING
+				"[%s] called while clock event device in use\n",
+				__func__);
+		ret = -EBUSY;
+		goto freeze_no_check;
+	}
+
+	ret = clockevents_program_freeze_event(dev, delta);
+	if (ret < 0)
+		goto freeze_no_check;
+
+	ret = ops->enter_freeze(data);
+	if (ret < 0)
+		goto out;
+
+	if (ops->callback && clockevents_freeze_event_expired(dev))
+		ret = ops->callback(data);
+
+out:
+	clockevents_cleanup_freeze_event(dev);
+	return ret;
+
+freeze_no_check:
+	/*
+	 * If an error happens before enter_freeze, enter freeze normally and
+	 * return an error. The called can't tell if freeze should be entered on
+	 * an error (since errors can happen after returning from freeze), so
+	 * just handle it here.
+	 */
+	ops->enter_freeze(data);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(timed_freeze);
+
 /*
  * Called after a notify add to make devices available which were
  * released from the notifier call.
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] x86, apic: Add timed freeze support
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
  2016-06-02  4:33 ` [PATCH 1/5] x86: stub out pmc function dbasehore
  2016-06-02  4:33 ` [PATCH 2/5] clockevents: Add timed freeze dbasehore
@ 2016-06-02  4:33 ` dbasehore
  2016-06-02  4:33 ` [PATCH 4/5] freeze: Add error reporting dbasehore
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

This adds support to the clock event devices created by apic to use
timed freeze. The apic is able to run a timer during freeze with near
izero impact on modern CPUs such as skylake. This will allow S0ix,
suspend-to-idle, to be validated on Intel CPUs that support it.

This is needed because bugs with power settings on the SoC can prevent
S0ix entry. There is also no way to check this before idling all of
the CPUs.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 arch/x86/kernel/apic/apic.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a6..f0c5f92 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -475,6 +475,26 @@ static int lapic_next_deadline(unsigned long delta,
 	return 0;
 }
 
+static bool lapic_event_expired(struct clock_event_device *evt)
+{
+	u32 cct;
+
+	cct = apic_read(APIC_TMCCT);
+	return cct == 0;
+}
+
+static bool lapic_deadline_expired(struct clock_event_device *evt)
+{
+	u64 msr;
+
+	/*
+	 * When the timer interrupt is triggered, the register is cleared, so a
+	 * non-zero value indicates a pending timer event.
+	 */
+	rdmsrl(MSR_IA32_TSC_DEADLINE, msr);
+	return msr == 0;
+}
+
 static int lapic_timer_shutdown(struct clock_event_device *evt)
 {
 	unsigned int v;
@@ -529,12 +549,14 @@ static struct clock_event_device lapic_clockevent = {
 	.name			= "lapic",
 	.features		= CLOCK_EVT_FEAT_PERIODIC |
 				  CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP
-				  | CLOCK_EVT_FEAT_DUMMY,
+				  | CLOCK_EVT_FEAT_DUMMY |
+				  CLOCK_EVT_FEAT_FREEZE,
 	.shift			= 32,
 	.set_state_shutdown	= lapic_timer_shutdown,
 	.set_state_periodic	= lapic_timer_set_periodic,
 	.set_state_oneshot	= lapic_timer_set_oneshot,
 	.set_next_event		= lapic_next_event,
+	.event_expired		= lapic_event_expired,
 	.broadcast		= lapic_timer_broadcast,
 	.rating			= 100,
 	.irq			= -1,
@@ -562,6 +584,7 @@ static void setup_APIC_timer(void)
 		levt->features &= ~(CLOCK_EVT_FEAT_PERIODIC |
 				    CLOCK_EVT_FEAT_DUMMY);
 		levt->set_next_event = lapic_next_deadline;
+		levt->event_expired = lapic_deadline_expired;
 		clockevents_config_and_register(levt,
 						(tsc_khz / TSC_DIVISOR) * 1000,
 						0xF, ~0UL);
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] freeze: Add error reporting
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
                   ` (2 preceding siblings ...)
  2016-06-02  4:33 ` [PATCH 3/5] x86, apic: Add timed freeze support dbasehore
@ 2016-06-02  4:33 ` dbasehore
  2016-06-02  4:33 ` [PATCH 5/5] intel_idle: Add S0ix validation dbasehore
  2016-06-07  7:46 ` [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs Pavel Machek
  5 siblings, 0 replies; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

This adds error reporting for cpuidle to freeze so suspend-to-idle can
report errors when the CPU/SoC is unable to idle properly. Freeze will
abort when an error is encounted.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 drivers/acpi/processor_idle.c | 10 ++++++----
 drivers/cpuidle/cpuidle.c     | 31 ++++++++++++++++++++++++++-----
 drivers/idle/intel_idle.c     |  8 +++++---
 include/linux/cpuidle.h       | 10 ++++++----
 kernel/power/suspend.c        | 11 +++++++----
 5 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 444e374..a959b32 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -783,8 +783,8 @@ static int acpi_idle_enter(struct cpuidle_device *dev,
 	return index;
 }
 
-static void acpi_idle_enter_freeze(struct cpuidle_device *dev,
-				   struct cpuidle_driver *drv, int index)
+static int acpi_idle_enter_freeze(struct cpuidle_device *dev,
+				  struct cpuidle_driver *drv, int index)
 {
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 
@@ -792,16 +792,18 @@ static void acpi_idle_enter_freeze(struct cpuidle_device *dev,
 		struct acpi_processor *pr = __this_cpu_read(processors);
 
 		if (unlikely(!pr))
-			return;
+			return 0;
 
 		if (pr->flags.bm_check) {
 			acpi_idle_enter_bm(pr, cx, false);
-			return;
+			return 0;
 		} else {
 			ACPI_FLUSH_CPU_CACHE();
 		}
 	}
 	acpi_idle_do_entry(cx);
+
+	return 0;
 }
 
 struct cpuidle_driver acpi_idle_driver = {
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..2664a6c 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -34,6 +34,7 @@ LIST_HEAD(cpuidle_detected_devices);
 static int enabled_devices;
 static int off __read_mostly;
 static int initialized __read_mostly;
+static int cpuidle_freeze_error;
 
 int cpuidle_disabled(void)
 {
@@ -109,9 +110,11 @@ int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 	return find_deepest_state(drv, dev, UINT_MAX, 0, false);
 }
 
-static void enter_freeze_proper(struct cpuidle_driver *drv,
+static int enter_freeze_proper(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev, int index)
 {
+	int ret;
+
 	/*
 	 * trace_suspend_resume() called by tick_freeze() for the last CPU
 	 * executing it contains RCU usage regarded as invalid in the idle
@@ -124,7 +127,7 @@ static void enter_freeze_proper(struct cpuidle_driver *drv,
 	 * suspended is generally unsafe.
 	 */
 	stop_critical_timings();
-	drv->states[index].enter_freeze(dev, drv, index);
+	ret = drv->states[index].enter_freeze(dev, drv, index);
 	WARN_ON(!irqs_disabled());
 	/*
 	 * timekeeping_resume() that will be called by tick_unfreeze() for the
@@ -133,6 +136,7 @@ static void enter_freeze_proper(struct cpuidle_driver *drv,
 	 */
 	RCU_NONIDLE(tick_unfreeze());
 	start_critical_timings();
+	return ret;
 }
 
 /**
@@ -145,7 +149,7 @@ static void enter_freeze_proper(struct cpuidle_driver *drv,
  */
 int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
-	int index;
+	int index, ret = 0;
 
 	/*
 	 * Find the deepest state with ->enter_freeze present, which guarantees
@@ -153,8 +157,13 @@ int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * be frozen safely.
 	 */
 	index = find_deepest_state(drv, dev, UINT_MAX, 0, true);
-	if (index > 0)
-		enter_freeze_proper(drv, dev, index);
+	if (index >= 0)
+		ret = enter_freeze_proper(drv, dev, index);
+
+	if (ret < 0) {
+		cpuidle_freeze_error = ret;
+		freeze_wake();
+	}
 
 	return index;
 }
@@ -353,6 +362,18 @@ void cpuidle_resume(void)
 	mutex_unlock(&cpuidle_lock);
 }
 
+void cpuidle_prepare_freeze(void)
+{
+	cpuidle_freeze_error = 0;
+	cpuidle_resume();
+}
+
+int cpuidle_complete_freeze(void)
+{
+	cpuidle_pause();
+	return cpuidle_freeze_error;
+}
+
 /**
  * cpuidle_enable_device - enables idle PM for a CPU
  * @dev: the CPU
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c966492..98565de 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -97,8 +97,8 @@ static const struct idle_cpu *icpu;
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 static int intel_idle(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv, int index);
-static void intel_idle_freeze(struct cpuidle_device *dev,
-			      struct cpuidle_driver *drv, int index);
+static int intel_idle_freeze(struct cpuidle_device *dev,
+			     struct cpuidle_driver *drv, int index);
 static int intel_idle_cpu_init(int cpu);
 
 static struct cpuidle_state *cpuidle_state_table;
@@ -870,13 +870,15 @@ static int intel_idle(struct cpuidle_device *dev,
  * @drv: cpuidle driver
  * @index: state index
  */
-static void intel_idle_freeze(struct cpuidle_device *dev,
+static int intel_idle_freeze(struct cpuidle_device *dev,
 			     struct cpuidle_driver *drv, int index)
 {
 	unsigned long ecx = 1; /* break on interrupt flag */
 	unsigned long eax = flg2MWAIT(drv->states[index].flags);
 
 	mwait_idle_with_hints(eax, ecx);
+
+	return 0;
 }
 
 static void __setup_broadcast_timer(void *arg)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 786ad32..27f6b11 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -54,11 +54,11 @@ struct cpuidle_state {
 	/*
 	 * CPUs execute ->enter_freeze with the local tick or entire timekeeping
 	 * suspended, so it must not re-enable interrupts at any point (even
-	 * temporarily) or attempt to change states of clock event devices.
+	 * temporarily). Returns 0 on success and non-zero if an error occurred.
 	 */
-	void (*enter_freeze) (struct cpuidle_device *dev,
-			      struct cpuidle_driver *drv,
-			      int index);
+	int (*enter_freeze) (struct cpuidle_device *dev,
+			     struct cpuidle_driver *drv,
+			     int index);
 };
 
 /* Idle State Flags */
@@ -147,6 +147,8 @@ extern void cpuidle_pause_and_lock(void);
 extern void cpuidle_resume_and_unlock(void);
 extern void cpuidle_pause(void);
 extern void cpuidle_resume(void);
+extern void cpuidle_prepare_freeze(void);
+extern int cpuidle_complete_freeze(void);
 extern int cpuidle_enable_device(struct cpuidle_device *dev);
 extern void cpuidle_disable_device(struct cpuidle_device *dev);
 extern int cpuidle_play_dead(void);
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 5b70d64..419154b 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -57,8 +57,10 @@ static void freeze_begin(void)
 	suspend_freeze_state = FREEZE_STATE_NONE;
 }
 
-static void freeze_enter(void)
+static int freeze_enter(void)
 {
+	int error = 0;
+
 	spin_lock_irq(&suspend_freeze_lock);
 	if (pm_wakeup_pending())
 		goto out;
@@ -67,7 +69,7 @@ static void freeze_enter(void)
 	spin_unlock_irq(&suspend_freeze_lock);
 
 	get_online_cpus();
-	cpuidle_resume();
+	cpuidle_prepare_freeze();
 
 	/* Push all the CPUs into the idle loop. */
 	wake_up_all_idle_cpus();
@@ -77,7 +79,7 @@ static void freeze_enter(void)
 		   suspend_freeze_state == FREEZE_STATE_WAKE);
 	pr_debug("PM: resume from suspend-to-idle\n");
 
-	cpuidle_pause();
+	error = cpuidle_complete_freeze();
 	put_online_cpus();
 
 	spin_lock_irq(&suspend_freeze_lock);
@@ -85,6 +87,7 @@ static void freeze_enter(void)
  out:
 	suspend_freeze_state = FREEZE_STATE_NONE;
 	spin_unlock_irq(&suspend_freeze_lock);
+	return error;
 }
 
 void freeze_wake(void)
@@ -347,7 +350,7 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 	 */
 	if (state == PM_SUSPEND_FREEZE) {
 		trace_suspend_resume(TPS("machine_suspend"), state, true);
-		freeze_enter();
+		error = freeze_enter();
 		trace_suspend_resume(TPS("machine_suspend"), state, false);
 		goto Platform_wake;
 	}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
                   ` (3 preceding siblings ...)
  2016-06-02  4:33 ` [PATCH 4/5] freeze: Add error reporting dbasehore
@ 2016-06-02  4:33 ` dbasehore
  2016-06-02  9:25   ` Peter Zijlstra
  2016-06-07  7:46 ` [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs Pavel Machek
  5 siblings, 1 reply; 17+ messages in thread
From: dbasehore @ 2016-06-02  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: dbasehore, linux-pm, rjw, pavel, len.brown, tglx

From: Derek Basehore <dbasehore@chromium.org>

This adds validation of S0ix entry and enables it on Skylake. Using
the new timed_freeze function, we program the CPU to wake up X seconds
after entering freeze. After X seconds, it will wake the CPU to check
the S0ix residency counters and make sure we entered the lowest power
state for suspend-to-idle.

It exits freeze and reports an error to userspace when the SoC does
not enter S0ix on suspend-to-idle.

One example of a bug that can prevent a Skylake CPU from entering S0ix
(suspend-to-idle) is a leaked reference count to one of the i915 power
wells. The CPU will not be able to enter Package C10 and will
therefore use about 4x as much power for the entire system. The issue
is not specific to the i915 power wells though.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 drivers/idle/intel_idle.c | 177 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 169 insertions(+), 8 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 98565de..e748e0d 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -55,15 +55,18 @@
 
 #include <linux/kernel.h>
 #include <linux/cpuidle.h>
+#include <linux/debugfs.h>
 #include <linux/tick.h>
 #include <trace/events/power.h>
 #include <linux/sched.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/module.h>
+#include <linux/suspend.h>
 #include <asm/cpu_device_id.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
+#include <asm/pmc_core.h>
 
 #define INTEL_IDLE_VERSION "0.4.1"
 #define PREFIX "intel_idle: "
@@ -93,12 +96,39 @@ struct idle_cpu {
 	bool disable_promotion_to_c1e;
 };
 
+/*
+ * Default chosen to have <= 1% power increase while allowing fast detection of
+ * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
+ * system power on Skylake Y. Waking up once every 10 seconds is
+ * indistinguishable from not waking up at all (as ~0.3% power increase would
+ * be). Any reasonable power increases above this will not be visible to the
+ * user.
+ */
+#define DEFAULT_SLP_S0_SECONDS 10
+
+struct timed_freeze_data {
+	u32 slp_s0_saved_count;
+	struct cpuidle_device *dev;
+	struct cpuidle_driver *drv;
+	int index;
+};
+
+static struct dentry *debugfs_dir;
+static struct dentry *slp_s0_file;
+static unsigned int slp_s0_seconds = DEFAULT_SLP_S0_SECONDS;
+
+static DEFINE_SPINLOCK(slp_s0_check_lock);
+static unsigned int slp_s0_num_cpus;
+static bool slp_s0_check_inprogress;
+
 static const struct idle_cpu *icpu;
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 static int intel_idle(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv, int index);
 static int intel_idle_freeze(struct cpuidle_device *dev,
 			     struct cpuidle_driver *drv, int index);
+static int intel_idle_freeze_and_check(struct cpuidle_device *dev,
+				       struct cpuidle_driver *drv, int index);
 static int intel_idle_cpu_init(int cpu);
 
 static struct cpuidle_state *cpuidle_state_table;
@@ -599,7 +629,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C1E-SKL",
 		.desc = "MWAIT 0x01",
@@ -607,7 +637,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C3-SKL",
 		.desc = "MWAIT 0x10",
@@ -615,7 +645,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 70,
 		.target_residency = 100,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C6-SKL",
 		.desc = "MWAIT 0x20",
@@ -623,7 +653,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 85,
 		.target_residency = 200,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C7s-SKL",
 		.desc = "MWAIT 0x33",
@@ -631,7 +661,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 124,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C8-SKL",
 		.desc = "MWAIT 0x40",
@@ -639,7 +669,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 200,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C9-SKL",
 		.desc = "MWAIT 0x50",
@@ -647,7 +677,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 480,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C10-SKL",
 		.desc = "MWAIT 0x60",
@@ -655,7 +685,7 @@ static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 890,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.enter = NULL }
 };
@@ -869,6 +899,8 @@ static int intel_idle(struct cpuidle_device *dev,
  * @dev: cpuidle_device
  * @drv: cpuidle driver
  * @index: state index
+ *
+ * @return 0 for success, no failure state
  */
 static int intel_idle_freeze(struct cpuidle_device *dev,
 			     struct cpuidle_driver *drv, int index)
@@ -881,6 +913,121 @@ static int intel_idle_freeze(struct cpuidle_device *dev,
 	return 0;
 }
 
+static int intel_idle_freeze_wrapper(void *data)
+{
+	struct timed_freeze_data *tfd = data;
+
+	return intel_idle_freeze(tfd->dev, tfd->drv, tfd->index);
+}
+
+static int check_slp_s0(void *data)
+{
+	struct timed_freeze_data *tfd = data;
+	u32 slp_s0_new_count;
+
+	if (intel_pmc_slp_s0_counter_read(&slp_s0_new_count)) {
+		pr_warn("Unable to read SLP S0 residency counter\n");
+		return -EIO;
+	}
+
+	if (tfd->slp_s0_saved_count == slp_s0_new_count) {
+		pr_warn("CPU did not enter SLP S0 for suspend-to-idle.\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * intel_idle_freeze_and_check - enters suspend-to-idle and validates the power
+ * state
+ *
+ * This function enters suspend-to-idle with intel_idle_freeze, but also sets up
+ * a timer to check that S0ix (low power state for suspend-to-idle on Intel
+ * CPUs) is properly entered.
+ *
+ * @dev: cpuidle_device
+ * @drv: cpuidle_driver
+ * @index: state index
+ * @return 0 for success, -EERROR if S0ix was not entered.
+ */
+static int intel_idle_freeze_and_check(struct cpuidle_device *dev,
+				       struct cpuidle_driver *drv, int index)
+{
+	bool check_on_this_cpu = false;
+	struct timed_freeze_ops ops;
+	struct timed_freeze_data tfd;
+	unsigned long flags;
+	int ret = 0;
+
+	/* The last CPU to freeze sets up checking SLP S0 assertion. */
+	spin_lock_irqsave(&slp_s0_check_lock, flags);
+	if (slp_s0_seconds &&
+	    ++slp_s0_num_cpus == num_online_cpus() &&
+	    !slp_s0_check_inprogress &&
+	    !intel_pmc_slp_s0_counter_read(&tfd.slp_s0_saved_count)) {
+		tfd.dev = dev;
+		tfd.drv = drv;
+		tfd.index = index;
+		ops.enter_freeze = intel_idle_freeze_wrapper;
+		ops.callback = check_slp_s0;
+		check_on_this_cpu = true;
+		/*
+		 * Make sure check_slp_s0 isn't scheduled on another CPU if it
+		 * were to leave freeze and enter it again before this CPU
+		 * leaves freeze.
+		 */
+		slp_s0_check_inprogress = true;
+	}
+	spin_unlock_irqrestore(&slp_s0_check_lock, flags);
+
+	if (check_on_this_cpu)
+		ret = timed_freeze(&ops, &tfd, ktime_set(slp_s0_seconds, 0));
+	else
+		ret = intel_idle_freeze(dev, drv, index);
+
+	spin_lock_irqsave(&slp_s0_check_lock, flags);
+	slp_s0_num_cpus--;
+	if (check_on_this_cpu)
+		slp_s0_check_inprogress = false;
+
+	spin_unlock_irqrestore(&slp_s0_check_lock, flags);
+	return ret;
+}
+
+static ssize_t slp_s0_seconds_read(struct file *fp, char __user *ubuf,
+				   size_t count, loff_t *ppos)
+{
+	char buf[10];
+	int len;
+
+	len = snprintf(buf, 10, "%u\n", slp_s0_seconds);
+	return simple_read_from_buffer(ubuf, count, ppos, buf, len);
+}
+
+static ssize_t slp_s0_seconds_write(struct file *fp,
+				    const char __user *user_buf, size_t count,
+				    loff_t *ppos)
+{
+	unsigned int value;
+	int ret;
+
+	ret = kstrtouint_from_user(user_buf, count, 10, &value);
+	if (ret < 0)
+		return -EINVAL;
+
+	slp_s0_seconds = value;
+	if (!slp_s0_seconds)
+		pr_info("SLP S0 validation disabled by userspace\n");
+	return count;
+}
+
+static const struct file_operations slp_s0_fops = {
+	.open = simple_open,
+	.read = slp_s0_seconds_read,
+	.write = slp_s0_seconds_write,
+};
+
 static void __setup_broadcast_timer(void *arg)
 {
 	unsigned long on = (unsigned long)arg;
@@ -1415,6 +1562,17 @@ static int __init intel_idle_init(void)
 	pr_debug(PREFIX "lapic_timer_reliable_states 0x%x\n",
 		lapic_timer_reliable_states);
 
+	debugfs_dir = debugfs_create_dir("intel_idle", NULL);
+	if (!debugfs_dir) {
+		pr_warn("intel_idle failed to create debugfs directory\n");
+		return 0;
+	}
+
+	slp_s0_file = debugfs_create_file("slp_s0_seconds", S_IRUSR | S_IWUSR,
+					  debugfs_dir, NULL, &slp_s0_fops);
+	if (!slp_s0_file)
+		pr_warn("intel_idle failed to create debugfs file\n");
+
 	return 0;
 }
 
@@ -1423,6 +1581,9 @@ static void __exit intel_idle_exit(void)
 	struct cpuidle_device *dev;
 	int i;
 
+	debugfs_remove(slp_s0_file);
+	debugfs_remove(debugfs_dir);
+
 	cpu_notifier_register_begin();
 
 	if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02  4:33 ` [PATCH 5/5] intel_idle: Add S0ix validation dbasehore
@ 2016-06-02  9:25   ` Peter Zijlstra
  2016-06-02 13:23     ` One Thousand Gnomes
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2016-06-02  9:25 UTC (permalink / raw)
  To: dbasehore; +Cc: linux-kernel, linux-pm, rjw, pavel, len.brown, tglx

On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
> +/*
> + * Default chosen to have <= 1% power increase while allowing fast detection of
> + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
> + * system power on Skylake Y. Waking up once every 10 seconds is
> + * indistinguishable from not waking up at all (as ~0.3% power increase would
> + * be). Any reasonable power increases above this will not be visible to the
> + * user.
> + */
> +#define DEFAULT_SLP_S0_SECONDS 10

So I don't think anybody waits for 10 seconds to see if suspend worked.
After 10 seconds its in the bag and I'm out the door.

Then what?


Why can't you fire a single timer after 0.5 seconds to see if you hit
C10 and leave it at that? What's the point any further wakeup, if you
know you hit C10, you're good continue on.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02  9:25   ` Peter Zijlstra
@ 2016-06-02 13:23     ` One Thousand Gnomes
  2016-06-02 18:31       ` dbasehore .
  2016-06-02 18:55       ` dbasehore .
  0 siblings, 2 replies; 17+ messages in thread
From: One Thousand Gnomes @ 2016-06-02 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dbasehore, linux-kernel, linux-pm, rjw, pavel, len.brown, tglx

On Thu, 2 Jun 2016 11:25:05 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
> > +/*
> > + * Default chosen to have <= 1% power increase while allowing fast detection of
> > + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
> > + * system power on Skylake Y. Waking up once every 10 seconds is
> > + * indistinguishable from not waking up at all (as ~0.3% power increase would
> > + * be). Any reasonable power increases above this will not be visible to the
> > + * user.
> > + */
> > +#define DEFAULT_SLP_S0_SECONDS 10  
> 
> So I don't think anybody waits for 10 seconds to see if suspend worked.
> After 10 seconds its in the bag and I'm out the door.
> 
> Then what?
> 
> 
> Why can't you fire a single timer after 0.5 seconds to see if you hit
> C10 and leave it at that? What's the point any further wakeup, if you
> know you hit C10, you're good continue on.

There are plenty of Skylake configurations where at the moment you won't
get s0ix entry because the ISH driver is not yet merged. Spamming those
users with useless messages is not helpful. Likewise on systems with
modular kernels your warning may spuriously trigger during boot until the
ISH, i915 and audio modules and firmware have loaded and are active. I
know Chrome doesn't like modules but the rest of us do !

I'm also a bit at a loss to understand why anyone needs this except
validation engineers for Chrome products and kernel hackers doing
debug. It seems a bit odd to burden the entire world with a pile of
checks they can't use that cost even 0.3% of power (that's 15 minutes on
an 8 hour battery multiplied by every Skylake user!).

Having to have debugfs present to turn it off, but not to use it is also
a bit weird...

IMHO this should be one of the hacking/kernel debug options and not even
compiled into normal kernels.

Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02 13:23     ` One Thousand Gnomes
@ 2016-06-02 18:31       ` dbasehore .
  2016-06-02 18:55       ` dbasehore .
  1 sibling, 0 replies; 17+ messages in thread
From: dbasehore . @ 2016-06-02 18:31 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

On Thu, Jun 2, 2016 at 6:23 AM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> On Thu, 2 Jun 2016 11:25:05 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
>> > +/*
>> > + * Default chosen to have <= 1% power increase while allowing fast detection of
>> > + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
>> > + * system power on Skylake Y. Waking up once every 10 seconds is
>> > + * indistinguishable from not waking up at all (as ~0.3% power increase would
>> > + * be). Any reasonable power increases above this will not be visible to the
>> > + * user.
>> > + */
>> > +#define DEFAULT_SLP_S0_SECONDS 10
>>
>> So I don't think anybody waits for 10 seconds to see if suspend worked.
>> After 10 seconds its in the bag and I'm out the door.
>>
>> Then what?
>>
>>
>> Why can't you fire a single timer after 0.5 seconds to see if you hit
>> C10 and leave it at that? What's the point any further wakeup, if you
>> know you hit C10, you're good continue on.

That will take care of most of the problems I have seen, but that
doesn't handle everything. Say your audio codec is misconfigured and
causes an interrupt storm when you plug in headphones. The irq handler
won't run, but it could still wake up the system repeatedly and
prevent entry into S0ix.

If this fails, it's not expected that the user catch and handle it,
unless he/she uses "echo freeze > /sys/power/state" to suspend to
idle. It's intended that whatever daemon in user space handles power
state transitions will catch the error and either retry, suspend to
RAM, or shut down the system.

What could happen is we could wake up after 1 second the first time,
then wake up at a slp_s0_seconds after that. This will allow us to
fail faster, still catch issues that happen later, and increase
DEFAULT_SLP_S0_SECONDS to something longer.

>
> There are plenty of Skylake configurations where at the moment you won't
> get s0ix entry because the ISH driver is not yet merged. Spamming those
> users with useless messages is not helpful. Likewise on systems with
> modular kernels your warning may spuriously trigger during boot until the
> ISH, i915 and audio modules and firmware have loaded and are active. I
> know Chrome doesn't like modules but the rest of us do !
>
> I'm also a bit at a loss to understand why anyone needs this except
> validation engineers for Chrome products and kernel hackers doing
> debug. It seems a bit odd to burden the entire world with a pile of
> checks they can't use that cost even 0.3% of power (that's 15 minutes on
> an 8 hour battery multiplied by every Skylake user!).

15 minutes is 4% of 8 hours, but let's take a system that has 8 hours
of battery life for use and 10 days of suspend to idle. 0.3% of 10
days is < 1 hour. That's only for suspend time, though. A user could
lose 1.5 minutes of use, but that's only if the user left his or her
machine suspended for 10 days. I'll probably add the single early wake
that I mentioned before and change this to 100 seconds. At that point,
we're looking at 0.03% power increase, which is < 9 seconds of lost
use for 10 days of suspend to idle.

>
> Having to have debugfs present to turn it off, but not to use it is also
> a bit weird...

I could look into putting this into the cpuidle sysfs.

>
> IMHO this should be one of the hacking/kernel debug options and not even
> compiled into normal kernels.

This patch isn't only about finding the bugs, but doing something more
graceful than burning a lot of power during suspend to idle. Whether
that's switching to suspend to RAM or shutting down is up to whatever
daemon handles power transitions. That doesn't necessarily cover users
that just use "echo |state| > /sys/power/state", but those users
already have spurious wakes, devices that take a long time to suspend
followed by failures, and other problems to handle.

This patch set does nothing if CONFIG_INTEL_PMC_CORE is not set. If
Linux distros don't want this running they can compile without that
config set since this is currently the only user of that. I could also
add another config flag if that's preferred or if anything else starts
using the INTEL_PMC_CORE code.

>
> Alan

Thanks for the reviews.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02 13:23     ` One Thousand Gnomes
  2016-06-02 18:31       ` dbasehore .
@ 2016-06-02 18:55       ` dbasehore .
  2016-06-02 19:53         ` One Thousand Gnomes
  1 sibling, 1 reply; 17+ messages in thread
From: dbasehore . @ 2016-06-02 18:55 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

On Thu, Jun 2, 2016 at 6:23 AM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
>
> There are plenty of Skylake configurations where at the moment you won't
> get s0ix entry because the ISH driver is not yet merged. Spamming those
> users with useless messages is not helpful. Likewise on systems with
> modular kernels your warning may spuriously trigger during boot until the
> ISH, i915 and audio modules and firmware have loaded and are active. I
> know Chrome doesn't like modules but the rest of us do !

How would this spuriously trigger during boot? This code is only run
during freeze. If there's some issue with not entering S0ix before a
module or firmware is loaded, it's better to not use suspend to idle
until those are in place.

I'm not familiar with the ISH driver. How does it prevent entry into
S0ix? I would argue that you don't want to use suspend to idle on a
Skylake system that can't enter S0ix and instead use suspend to RAM.

>
> I'm also a bit at a loss to understand why anyone needs this except
> validation engineers for Chrome products and kernel hackers doing
> debug. It seems a bit odd to burden the entire world with a pile of
> checks they can't use that cost even 0.3% of power (that's 15 minutes on
> an 8 hour battery multiplied by every Skylake user!).
>
> Having to have debugfs present to turn it off, but not to use it is also
> a bit weird...
>
> IMHO this should be one of the hacking/kernel debug options and not even
> compiled into normal kernels.
>
> Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02 18:55       ` dbasehore .
@ 2016-06-02 19:53         ` One Thousand Gnomes
  2016-06-02 20:35           ` dbasehore .
  0 siblings, 1 reply; 17+ messages in thread
From: One Thousand Gnomes @ 2016-06-02 19:53 UTC (permalink / raw)
  To: dbasehore .
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

> How would this spuriously trigger during boot? This code is only run
> during freeze. If there's some issue with not entering S0ix before a
> module or firmware is loaded, it's better to not use suspend to idle
> until those are in place.

Ok yes you are correct it's not likely to trigger during boot (although
if you close the lid during boot...)

> I'm not familiar with the ISH driver. How does it prevent entry into
> S0ix? I would argue that you don't want to use suspend to idle on a
> Skylake system that can't enter S0ix and instead use suspend to RAM.

Several IP blocks on the systems do their own power management if present
and enabled by the firmware. If they do not have drivers loaded then you
will not be able to enter some power states. Thus you'll now get warnings
if you try to freeze such a system and those warnings are not themselves
terribly helpful to a user.

It's a good debug feature, but it doesn't belong anywhere but debug menus
and off by default. If you want to ship it enabled in ChromeOS for
product fine, but I don't think it belongs on for everywhere and
everything.

Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02 19:53         ` One Thousand Gnomes
@ 2016-06-02 20:35           ` dbasehore .
  2016-06-04 12:22             ` Alan
  0 siblings, 1 reply; 17+ messages in thread
From: dbasehore . @ 2016-06-02 20:35 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

On Thu, Jun 2, 2016 at 12:53 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
>> How would this spuriously trigger during boot? This code is only run
>> during freeze. If there's some issue with not entering S0ix before a
>> module or firmware is loaded, it's better to not use suspend to idle
>> until those are in place.
>
> Ok yes you are correct it's not likely to trigger during boot (although
> if you close the lid during boot...)
>
>> I'm not familiar with the ISH driver. How does it prevent entry into
>> S0ix? I would argue that you don't want to use suspend to idle on a
>> Skylake system that can't enter S0ix and instead use suspend to RAM.
>
> Several IP blocks on the systems do their own power management if present
> and enabled by the firmware. If they do not have drivers loaded then you
> will not be able to enter some power states. Thus you'll now get warnings
> if you try to freeze such a system and those warnings are not themselves
> terribly helpful to a user.

I would expect those IP blocks to do nothing and not block lower power
states if the firmware is not loaded. If that is not the case, I think
that should be fixed such that those lower power states are at least
available during suspend (if not during runtime). If your Skylake+
system is not entering S0ix during freeze, I consider that a bug that
needs to be addressed.

>
> It's a good debug feature, but it doesn't belong anywhere but debug menus
> and off by default. If you want to ship it enabled in ChromeOS for
> product fine, but I don't think it belongs on for everywhere and
> everything.

If other Linux distributions choose not to enable it in their kernel
configs, that's their decision. As I said, this does nothing in the
!CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
I can add that.

I would prefer if others used this more, since there would be better
debug coverage and I would have to fix fewer bugs.

>
> Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-02 20:35           ` dbasehore .
@ 2016-06-04 12:22             ` Alan
  2016-06-06 21:39               ` dbasehore .
  0 siblings, 1 reply; 17+ messages in thread
From: Alan @ 2016-06-04 12:22 UTC (permalink / raw)
  To: dbasehore .
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

> I would expect those IP blocks to do nothing and not block lower power
> states if the firmware is not loaded. If that is not the case, I think
> that should be fixed such that those lower power states are at least
> available during suspend (if not during runtime). If your Skylake+
> system is not entering S0ix during freeze, I consider that a bug that
> needs to be addressed.

You would assume wrongly. Several parts of the system do their own
power management so if present need to have a driver loaded. Graphics
is the example everyone is familiar with but ADSP audio and ISH are two
others.

> configs, that's their decision. As I said, this does nothing in the
> !CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
> I can add that.

IMHO it belongs as a config item because it has a power cost, and you
can't turn it off without enabling debugfs when it's compiled in.

> I would prefer if others used this more, since there would be better
> debug coverage and I would have to fix fewer bugs.

I'd be more concerned about getting 10,000 emails bisecting the warning
to your commit 8)

Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] intel_idle: Add S0ix validation
  2016-06-04 12:22             ` Alan
@ 2016-06-06 21:39               ` dbasehore .
  0 siblings, 0 replies; 17+ messages in thread
From: dbasehore . @ 2016-06-06 21:39 UTC (permalink / raw)
  To: Alan
  Cc: Peter Zijlstra, linux-kernel, Linux-pm mailing list,
	Rafael J. Wysocki, Pavel Machek, Len Brown, Thomas Gleixner

On Sat, Jun 4, 2016 at 5:22 AM, Alan <gnomes@lxorguk.ukuu.org.uk> wrote:
>> I would expect those IP blocks to do nothing and not block lower power
>> states if the firmware is not loaded. If that is not the case, I think
>> that should be fixed such that those lower power states are at least
>> available during suspend (if not during runtime). If your Skylake+
>> system is not entering S0ix during freeze, I consider that a bug that
>> needs to be addressed.
>
> You would assume wrongly. Several parts of the system do their own
> power management so if present need to have a driver loaded. Graphics
> is the example everyone is familiar with but ADSP audio and ISH are two
> others.

Okay, I remember running into this with the display actually.

>
>> configs, that's their decision. As I said, this does nothing in the
>> !CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
>> I can add that.
>
> IMHO it belongs as a config item because it has a power cost, and you
> can't turn it off without enabling debugfs when it's compiled in.
>

It could also be default off. It's not a lot of code being added, so
it could just be part of the Intel Idle driver.

After thinking about it, I plan on moving to exponential back off for
the freeze time (1, 10, 100, 1000 seconds). This way, the power impact
won't be measurable, yet it will still catch errors. There will just
be a sysfs entry added to the cpuidle node to enable/disable the
feature. The feature will be turned off by default.

>> I would prefer if others used this more, since there would be better
>> debug coverage and I would have to fix fewer bugs.
>
> I'd be more concerned about getting 10,000 emails bisecting the warning
> to your commit 8)
>
> Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs
  2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
                   ` (4 preceding siblings ...)
  2016-06-02  4:33 ` [PATCH 5/5] intel_idle: Add S0ix validation dbasehore
@ 2016-06-07  7:46 ` Pavel Machek
  2016-06-08  0:07   ` dbasehore .
  5 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2016-06-07  7:46 UTC (permalink / raw)
  To: dbasehore; +Cc: linux-kernel, linux-pm, rjw, len.brown, tglx

On Wed 2016-06-01 21:33:24, dbasehore@chromium.org wrote:
> From: Derek Basehore <dbasehore@chromium.org>
> 
> This patch set adds support for catching errors when entering freeze
> on Intel Skylake SoCs. Support for this can be added to newer SoCs in
> later patches.
> 
> Verification is done by waking up the CPU once every X (default 10)
> seconds to check the residency of S0ix. This can't be verified before
> attempting to enter S0ix through mwait, so we have to repeatedly
> verify entry into that state. Successfully entering S0ix is no
> guarantee that it will be entered on the next attempt, so we have to
> schedule another check. This has a minimal power impact of <1% of the
> total system power on our systems.

Dunno. Should this be protected with something like CONFIG_TEST_SLEEP?
People probably don't want this for production...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs
  2016-06-07  7:46 ` [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs Pavel Machek
@ 2016-06-08  0:07   ` dbasehore .
  2016-06-11 20:31     ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: dbasehore . @ 2016-06-08  0:07 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Linux-pm mailing list, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner

On Tue, Jun 7, 2016 at 12:46 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Wed 2016-06-01 21:33:24, dbasehore@chromium.org wrote:
>> From: Derek Basehore <dbasehore@chromium.org>
>>
>> This patch set adds support for catching errors when entering freeze
>> on Intel Skylake SoCs. Support for this can be added to newer SoCs in
>> later patches.
>>
>> Verification is done by waking up the CPU once every X (default 10)
>> seconds to check the residency of S0ix. This can't be verified before
>> attempting to enter S0ix through mwait, so we have to repeatedly
>> verify entry into that state. Successfully entering S0ix is no
>> guarantee that it will be entered on the next attempt, so we have to
>> schedule another check. This has a minimal power impact of <1% of the
>> total system power on our systems.
>
> Dunno. Should this be protected with something like CONFIG_TEST_SLEEP?
> People probably don't want this for production...
>

That depends, if you switch to using suspend to idle instead of
suspend to RAM, would you rather not catch power bugs due to
misconfigured hardware in production?

I agree that it shouldn't be on by default since freeze shouldn't fail
because some IP on the SoC doesn't have firmware loaded (this happens
with i915), but I was just going to leave it off by default instead of
adding yet another config option for a small feature.

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs
  2016-06-08  0:07   ` dbasehore .
@ 2016-06-11 20:31     ` Pavel Machek
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Machek @ 2016-06-11 20:31 UTC (permalink / raw)
  To: dbasehore .
  Cc: linux-kernel, Linux-pm mailing list, Rafael J. Wysocki,
	Len Brown, Thomas Gleixner

On Tue 2016-06-07 17:07:21, dbasehore . wrote:
> On Tue, Jun 7, 2016 at 12:46 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > On Wed 2016-06-01 21:33:24, dbasehore@chromium.org wrote:
> >> From: Derek Basehore <dbasehore@chromium.org>
> >>
> >> This patch set adds support for catching errors when entering freeze
> >> on Intel Skylake SoCs. Support for this can be added to newer SoCs in
> >> later patches.
> >>
> >> Verification is done by waking up the CPU once every X (default 10)
> >> seconds to check the residency of S0ix. This can't be verified before
> >> attempting to enter S0ix through mwait, so we have to repeatedly
> >> verify entry into that state. Successfully entering S0ix is no
> >> guarantee that it will be entered on the next attempt, so we have to
> >> schedule another check. This has a minimal power impact of <1% of the
> >> total system power on our systems.
> >
> > Dunno. Should this be protected with something like CONFIG_TEST_SLEEP?
> > People probably don't want this for production...
> >
> 
> That depends, if you switch to using suspend to idle instead of
> suspend to RAM, would you rather not catch power bugs due to
> misconfigured hardware in production?

> I agree that it shouldn't be on by default since freeze shouldn't fail
> because some IP on the SoC doesn't have firmware loaded (this happens
> with i915), but I was just going to leave it off by default instead of
> adding yet another config option for a small feature.

I'd rather have testing features optional. In 2 years, hopefully the
drivers are debugged, and people can set this to off...

									Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-06-11 20:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-02  4:33 [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs dbasehore
2016-06-02  4:33 ` [PATCH 1/5] x86: stub out pmc function dbasehore
2016-06-02  4:33 ` [PATCH 2/5] clockevents: Add timed freeze dbasehore
2016-06-02  4:33 ` [PATCH 3/5] x86, apic: Add timed freeze support dbasehore
2016-06-02  4:33 ` [PATCH 4/5] freeze: Add error reporting dbasehore
2016-06-02  4:33 ` [PATCH 5/5] intel_idle: Add S0ix validation dbasehore
2016-06-02  9:25   ` Peter Zijlstra
2016-06-02 13:23     ` One Thousand Gnomes
2016-06-02 18:31       ` dbasehore .
2016-06-02 18:55       ` dbasehore .
2016-06-02 19:53         ` One Thousand Gnomes
2016-06-02 20:35           ` dbasehore .
2016-06-04 12:22             ` Alan
2016-06-06 21:39               ` dbasehore .
2016-06-07  7:46 ` [PATCH 0/5] Add suspend-to-idle validation for Intel SoCs Pavel Machek
2016-06-08  0:07   ` dbasehore .
2016-06-11 20:31     ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).