linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
@ 2013-01-21  6:38 Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP Feng Tang
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

Hi All,

On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop S3, say the TSC value won't be
reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycles.

The enabling efforts include adding new flags for this feature, 
modifying clocksource.c and timekeeping.c to support and utilizing
it.

One remaining question is inside the timekeeping_resume(), we don't
know if it is called by resuming from suspend(s2ram) or from
hibernate(s2disk), as there is no easy way to check it currently.
But it doesn't hurt as these Penwell/Cloverview platforms only have
S3 state, and no S4.

Please help to review them, thanks!

- Feng

---------

Feng Tang (5):
  x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP
  x86: tsc: Add support for new S3_NOTSTOP feature
  clocksource: Enlarge the maxim time interval when configuring the
    scale and shift
  timekeeping: Add support for clocksource which doesn't stop during
    suspend

 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/kernel/cpu/intel.c       |   12 ++++++++++++
 arch/x86/kernel/tsc.c             |    6 +++++-
 include/linux/clocksource.h       |    1 +
 kernel/time/clocksource.c         |    6 +++---
 kernel/time/timekeeping.c         |   23 ++++++++++++++++++-----
 6 files changed, 40 insertions(+), 9 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
@ 2013-01-21  6:38 ` Feng Tang
  2013-01-21  7:27   ` Chen Gong
  2013-01-21  6:38 ` [RFC PATCH 2/5] clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP Feng Tang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop S3, say the TSC value won't be
reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycle, so add a flag for it.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/cpufeature.h |    1 +
 arch/x86/kernel/cpu/intel.c       |   12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2d9075e..f7e1eac 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -100,6 +100,7 @@
 #define X86_FEATURE_AMD_DCM     (3*32+27) /* multi-node processor */
 #define X86_FEATURE_APERFMPERF	(3*32+28) /* APERFMPERF */
 #define X86_FEATURE_EAGER_FPU	(3*32+29) /* "eagerfpu" Non lazy FPU restore */
+#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	(4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fcaabd0..532f873 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -97,6 +97,18 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 			sched_clock_stable = 1;
 	}
 
+	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
+	if (c->x86 == 6) {
+		switch (c->x86_model) {
+		case 0x27:	/* Penwell */
+		case 0x35:	/* Cloverview */
+			set_cpu_cap(c, X86_FEATURE_TSC_S3_NOTSTOP);
+			break;
+		default:
+			;
+		}
+	}
+
 	/*
 	 * There is a known erratum on Pentium III and Core Solo
 	 * and Core Duo CPUs.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/5] clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP Feng Tang
@ 2013-01-21  6:38 ` Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 3/5] x86: tsc: Add support for new S3_NOTSTOP feature Feng Tang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

Some x86 processors have a TSC clocksource, which continue to work when
system is suspend. Add a feature flag so that it could be utilized.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 include/linux/clocksource.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4dceaf8..2d53a8a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -206,6 +206,7 @@ struct clocksource {
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
+#define CLOCK_SOURCE_SUSPEND_NOTSTOP		0x80
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/5] x86: tsc: Add support for new S3_NOTSTOP feature
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 2/5] clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP Feng Tang
@ 2013-01-21  6:38 ` Feng Tang
  2013-01-21  6:38 ` [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift Feng Tang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/kernel/tsc.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 06ccb50..4cc33ca 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -767,7 +767,8 @@ static cycle_t read_tsc(struct clocksource *cs)
 
 static void resume_tsc(struct clocksource *cs)
 {
-	clocksource_tsc.cycle_last = 0;
+	if (!boot_cpu_has(X86_FEATURE_TSC_S3_NOTSTOP))
+		clocksource_tsc.cycle_last = 0;
 }
 
 static struct clocksource clocksource_tsc = {
@@ -938,6 +939,9 @@ static int __init init_tsc_clocksource(void)
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_TSC_S3_NOTSTOP))
+		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NOTSTOP;
+
 	/*
 	 * Trust the results of the earlier calibration on systems
 	 * exporting a reliable TSC.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
                   ` (2 preceding siblings ...)
  2013-01-21  6:38 ` [RFC PATCH 3/5] x86: tsc: Add support for new S3_NOTSTOP feature Feng Tang
@ 2013-01-21  6:38 ` Feng Tang
  2013-01-21  7:25   ` Chen Gong
  2013-01-21  6:38 ` [RFC PATCH 5/5] timekeeping: Add support for clocksource which doesn't stop during suspend Feng Tang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

On our x86 platform, we see a failure case of calling clocksource_cyc2ns(),
which return a negative value. The reason is the time interval was large
(more than 1000 seconds), while its TSC frequency is 2GHz, so the following
fomular overflowed:
	((u64) cycles * mult) >> shift

So enlarge the time interval from 10 mins to 40 mins to fix the bug.

Another solution may be adding a "max_interval" in struct clocksource, and
use a default value (like current 10 minutes) when clocksource driver
doesn't set it.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 kernel/time/clocksource.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..48fbfcb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -663,7 +663,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * Calc the maximum number of seconds which we can run before
 	 * wrapping around. For clocksources which have a mask > 32bit
 	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
+	 * conversion precision. 40 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
 	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
@@ -674,8 +674,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	do_div(sec, scale);
 	if (!sec)
 		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
+	else if (sec > 2400 && cs->mask > UINT_MAX)
+		sec = 2400;
 
 	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
 			       NSEC_PER_SEC / scale, sec * scale);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 5/5] timekeeping: Add support for clocksource which doesn't stop during suspend
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
                   ` (3 preceding siblings ...)
  2013-01-21  6:38 ` [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift Feng Tang
@ 2013-01-21  6:38 ` Feng Tang
  2013-01-21 13:55 ` [RFC PATCH 0/5] Add support for S3 non-stop TSC support Rafael J. Wysocki
  2013-01-21 18:46 ` John Stultz
  6 siblings, 0 replies; 30+ messages in thread
From: Feng Tang @ 2013-01-21  6:38 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel
  Cc: Feng Tang

There are some new processors whose TSC clocksource won't stop during
suspend. Currently, after system resumes from sleep state, kernel will
use persistent clock or RTC to compensate the sleep time, but for those
new types of clocksources, we could skip the special compensation from
external sources, and just use current clocksource for recounting.

This can solve some time drift bugs caused by the not-so-accurate RTC
devices.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 kernel/time/timekeeping.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cbc6acb..628c9ba 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -749,22 +749,36 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 static void timekeeping_resume(void)
 {
 	struct timekeeper *tk = &timekeeper;
+	struct clocksource *clock = tk->clock;
 	unsigned long flags;
 	struct timespec ts;
+	cycle_t cycle_now, cycle_delta;
+	s64 nsec;
 
 	read_persistent_clock(&ts);
-
 	clockevents_resume();
 	clocksource_resume();
 
 	write_seqlock_irqsave(&tk->lock, flags);
 
-	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
+	if (clock->flags & CLOCK_SOURCE_SUSPEND_NOTSTOP) {
+		cycle_now = clock->read(clock);
+		cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+		clock->cycle_last = cycle_now;
+
+		nsec = clocksource_cyc2ns(cycle_delta, clock->mult, clock->shift);
+		ts = ns_to_timespec(nsec);
+	} else if (timespec_compare(&ts, &timekeeping_suspend_time) > 0)
 		ts = timespec_sub(ts, timekeeping_suspend_time);
-		__timekeeping_inject_sleeptime(tk, &ts);
+	else {
+		ts.tv_sec = 0;
+		ts.tv_nsec = 0;
 	}
+
+	__timekeeping_inject_sleeptime(tk, &ts);
+
 	/* re-base the last cycle value */
-	tk->clock->cycle_last = tk->clock->read(tk->clock);
+	clock->cycle_last = clock->read(clock);
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, false);
@@ -1134,7 +1148,6 @@ static inline void old_vsyscall_fixup(struct timekeeper *tk)
 #endif
 
 
-
 /**
  * update_wall_time - Uses the current clocksource to increment the wall time
  *
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift
  2013-01-21  6:38 ` [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift Feng Tang
@ 2013-01-21  7:25   ` Chen Gong
  0 siblings, 0 replies; 30+ messages in thread
From: Chen Gong @ 2013-01-21  7:25 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2973 bytes --]

On Mon, Jan 21, 2013 at 02:38:44PM +0800, Feng Tang wrote:
> Date:	Mon, 21 Jan 2013 14:38:44 +0800
> From: Feng Tang <feng.tang@intel.com>
> To: Thomas Gleixner <tglx@linutronix.de>, John Stultz
>  <john.stultz@linaro.org>, Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin"
>  <hpa@linux.intel.com>, x86@kernel.org, Len Brown <lenb@kernel.org>,
>  "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
>  linux-kernel@vger.kernel.org
> Cc: Feng Tang <feng.tang@intel.com>
> Subject: [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when
>  configuring the scale and shift
> X-Mailer: git-send-email 1.7.9.5
> 
> On our x86 platform, we see a failure case of calling clocksource_cyc2ns(),
> which return a negative value. The reason is the time interval was large
> (more than 1000 seconds), while its TSC frequency is 2GHz, so the following
> fomular overflowed:
> 	((u64) cycles * mult) >> shift
> 
> So enlarge the time interval from 10 mins to 40 mins to fix the bug.
> 
> Another solution may be adding a "max_interval" in struct clocksource, and
> use a default value (like current 10 minutes) when clocksource driver
> doesn't set it.
> 
As you said, it looks like it is a littleb it arbitrary from 10m -> 40m, I
think max_interval is a better choice, if timer guys not minding too many
control knobs :-).

> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
>  kernel/time/clocksource.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index c958338..48fbfcb 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -663,7 +663,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>  	 * Calc the maximum number of seconds which we can run before
>  	 * wrapping around. For clocksources which have a mask > 32bit
>  	 * we need to limit the max sleep time to have a good
> -	 * conversion precision. 10 minutes is still a reasonable
> +	 * conversion precision. 40 minutes is still a reasonable
>  	 * amount. That results in a shift value of 24 for a
>  	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
>  	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
> @@ -674,8 +674,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>  	do_div(sec, scale);
>  	if (!sec)
>  		sec = 1;
> -	else if (sec > 600 && cs->mask > UINT_MAX)
> -		sec = 600;
> +	else if (sec > 2400 && cs->mask > UINT_MAX)
> +		sec = 2400;
>  
>  	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
>  			       NSEC_PER_SEC / scale, sec * scale);
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  2013-01-21  6:38 ` [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP Feng Tang
@ 2013-01-21  7:27   ` Chen Gong
  2013-01-21  7:59     ` Feng Tang
  0 siblings, 1 reply; 30+ messages in thread
From: Chen Gong @ 2013-01-21  7:27 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1842 bytes --]

On Mon, Jan 21, 2013 at 02:38:41PM +0800, Feng Tang wrote:
> Date:	Mon, 21 Jan 2013 14:38:41 +0800
> From: Feng Tang <feng.tang@intel.com>
> To: Thomas Gleixner <tglx@linutronix.de>, John Stultz
>  <john.stultz@linaro.org>, Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin"
>  <hpa@linux.intel.com>, x86@kernel.org, Len Brown <lenb@kernel.org>,
>  "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
>  linux-kernel@vger.kernel.org
> Cc: Feng Tang <feng.tang@intel.com>
> Subject: [RFC PATCH 1/5] x86: Add cpu capability flag
>  X86_FEATURE_TSC_S3_NOTSTOP
> X-Mailer: git-send-email 1.7.9.5
> 
> On some new Intel Atom processors (Penwell and Cloverview), there is
> a feature that the TSC won't stop S3, say the TSC value won't be
> reset to 0 after resume. This feature makes TSC a more reliable
> clocksource and could benefit the timekeeping code during system
> suspend/resume cycle, so add a flag for it.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
>  arch/x86/include/asm/cpufeature.h |    1 +
>  arch/x86/kernel/cpu/intel.c       |   12 ++++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 2d9075e..f7e1eac 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -100,6 +100,7 @@
>  #define X86_FEATURE_AMD_DCM     (3*32+27) /* multi-node processor */
>  #define X86_FEATURE_APERFMPERF	(3*32+28) /* APERFMPERF */
>  #define X86_FEATURE_EAGER_FPU	(3*32+29) /* "eagerfpu" Non lazy FPU restore */
> +#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
>  
We have an existed "TSC always running in C3+" feature and name it as
X86_FEATURE_NONSTOP_TSC, so how about naming it with the same style,
like X86_FEATURE_NONSTOP_TSC_S3?


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  2013-01-21  7:27   ` Chen Gong
@ 2013-01-21  7:59     ` Feng Tang
  2013-01-21 15:58       ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Feng Tang @ 2013-01-21  7:59 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Mon, Jan 21, 2013 at 02:27:29AM -0500, Chen Gong wrote:
> On Mon, Jan 21, 2013 at 02:38:41PM +0800, Feng Tang wrote:
> > Date:	Mon, 21 Jan 2013 14:38:41 +0800
> > From: Feng Tang <feng.tang@intel.com>
> > To: Thomas Gleixner <tglx@linutronix.de>, John Stultz
> >  <john.stultz@linaro.org>, Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin"
> >  <hpa@linux.intel.com>, x86@kernel.org, Len Brown <lenb@kernel.org>,
> >  "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
> >  linux-kernel@vger.kernel.org
> > Cc: Feng Tang <feng.tang@intel.com>
> > Subject: [RFC PATCH 1/5] x86: Add cpu capability flag
> >  X86_FEATURE_TSC_S3_NOTSTOP
> > X-Mailer: git-send-email 1.7.9.5
> > 
> > On some new Intel Atom processors (Penwell and Cloverview), there is
> > a feature that the TSC won't stop S3, say the TSC value won't be
> > reset to 0 after resume. This feature makes TSC a more reliable
> > clocksource and could benefit the timekeeping code during system
> > suspend/resume cycle, so add a flag for it.
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> >  arch/x86/include/asm/cpufeature.h |    1 +
> >  arch/x86/kernel/cpu/intel.c       |   12 ++++++++++++
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 2d9075e..f7e1eac 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -100,6 +100,7 @@
> >  #define X86_FEATURE_AMD_DCM     (3*32+27) /* multi-node processor */
> >  #define X86_FEATURE_APERFMPERF	(3*32+28) /* APERFMPERF */
> >  #define X86_FEATURE_EAGER_FPU	(3*32+29) /* "eagerfpu" Non lazy FPU restore */
> > +#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
> >  
> We have an existed "TSC always running in C3+" feature and name it as
> X86_FEATURE_NONSTOP_TSC, so how about naming it with the same style,
> like X86_FEATURE_NONSTOP_TSC_S3?

Yeah, actually I used a name X86_FEATURE_xxx_TSC, then I did a grep,
and found there is no unified name convention for TSC, so I chose such
a name.

--------------
#grep _TSC arch/x86/include/asm/cpufeature.h
#define X86_FEATURE_TSC         (0*32+ 4) /* Time Stamp Counter */
#define X86_FEATURE_CONSTANT_TSC (3*32+ 8) /* TSC ticks at a constant rate */
#define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
#define X86_FEATURE_NONSTOP_TSC (3*32+24) /* TSC does not stop in C states */
#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
#define X86_FEATURE_TSC_DEADLINE_TIMER  (4*32+24) /* Tsc deadline timer */
#define X86_FEATURE_TSCRATEMSR  (8*32+ 9) /* "tsc_scale" AMD TSC scaling support */
#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */

Thanks,
Feng







^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
                   ` (4 preceding siblings ...)
  2013-01-21  6:38 ` [RFC PATCH 5/5] timekeeping: Add support for clocksource which doesn't stop during suspend Feng Tang
@ 2013-01-21 13:55 ` Rafael J. Wysocki
  2013-03-30 18:14   ` Pavel Machek
  2013-01-21 18:46 ` John Stultz
  6 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2013-01-21 13:55 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, John Stultz, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, linux-kernel

On 1/21/2013 7:38 AM, Feng Tang wrote:
> Hi All,
>
> On some new Intel Atom processors (Penwell and Cloverview), there is
> a feature that the TSC won't stop S3, say the TSC value won't be
> reset to 0 after resume. This feature makes TSC a more reliable
> clocksource and could benefit the timekeeping code during system
> suspend/resume cycles.
>
> The enabling efforts include adding new flags for this feature,
> modifying clocksource.c and timekeeping.c to support and utilizing
> it.
>
> One remaining question is inside the timekeeping_resume(), we don't
> know if it is called by resuming from suspend(s2ram) or from
> hibernate(s2disk), as there is no easy way to check it currently.
> But it doesn't hurt as these Penwell/Cloverview platforms only have
> S3 state, and no S4.
>
> Please help to review them, thanks!

The patches look reasonable to me.

Thanks,
Rafael

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  2013-01-21  7:59     ` Feng Tang
@ 2013-01-21 15:58       ` H. Peter Anvin
  2013-01-22 14:07         ` Feng Tang
  0 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-01-21 15:58 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, John Stultz, Ingo Molnar, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel

On 01/21/2013 01:59 AM, Feng Tang wrote:
>>>  
>> We have an existed "TSC always running in C3+" feature and name it as
>> X86_FEATURE_NONSTOP_TSC, so how about naming it with the same style,
>> like X86_FEATURE_NONSTOP_TSC_S3?
> 
> Yeah, actually I used a name X86_FEATURE_xxx_TSC, then I did a grep,
> and found there is no unified name convention for TSC, so I chose such
> a name.
> 

I think NONSTOP_TSC_S3 is quite a bit more consistent than S3_NOTSTOP...

	-hpa



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
                   ` (5 preceding siblings ...)
  2013-01-21 13:55 ` [RFC PATCH 0/5] Add support for S3 non-stop TSC support Rafael J. Wysocki
@ 2013-01-21 18:46 ` John Stultz
  2013-01-22 14:55   ` Feng Tang
  2013-01-22 19:57   ` Jason Gunthorpe
  6 siblings, 2 replies; 30+ messages in thread
From: John Stultz @ 2013-01-21 18:46 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel, Jason Gunthorpe

On 01/20/2013 10:38 PM, Feng Tang wrote:
> Hi All,
>
> On some new Intel Atom processors (Penwell and Cloverview), there is
> a feature that the TSC won't stop S3, say the TSC value won't be
> reset to 0 after resume. This feature makes TSC a more reliable
> clocksource and could benefit the timekeeping code during system
> suspend/resume cycles.
>
> The enabling efforts include adding new flags for this feature,
> modifying clocksource.c and timekeeping.c to support and utilizing
> it.
>
> One remaining question is inside the timekeeping_resume(), we don't
> know if it is called by resuming from suspend(s2ram) or from
> hibernate(s2disk), as there is no easy way to check it currently.
> But it doesn't hurt as these Penwell/Cloverview platforms only have
> S3 state, and no S4.
>

Ooof. This is an interesting feature, but it does complicate things a bit.

So just a few high-level thoughts initially.

The clocksource code has to balance being able to make fine tuned 
adjustments with also being able to properly account for time when no 
timer interrupts occur. So by stretching the maximum time interval out, 
you end up hurting the adjustment granularity.

Also, since you still have a limited time value (40 minutes instead of 
10), you will still run into lost time issues if the system suspends for 
longer then that. I think its reasonable to expect we get timer 
interrupts at least every 10 minutes while the system is running, but 
that's maybe not a reasonable expectation in suspend (even if we push it 
out to 40 minutes).

Because of this, I think trying to integrate this feature into the 
clocksource code is the wrong approach.


What this feature really reminds me of, is our discussion with Jason, 
and how the 32k counter is used on some ARM platforms with 
read_persistent_clock(). While read_persistent_clock() was initially a 
sort of special RTC interface, which let us initialize time properly in 
early boot and manage tracking suspend/resume time (before interrupts 
are enabled). The ARM platforms with the 32k counter really only use it 
for suspend/resume tracking (since it doesn't give a valid time at 
boot), and instead initialize time some other way.  I always considered 
it an interesting and creative slight misuse of the interface, but now 
that there's a good example of other systems where this approach would 
be usable, I think we should probably formalize it some.


What I'd propose is that we break the read_persistent_clock() 
functionality up. So we need two interfaces:
1) An interface to access a time value we used to initialize the 
system's CLOCK_REALTIME time.
2) An interface to measure the length of suspend.


Interface #1 could be possibly just replaced with the RTCTOSYS 
functionality. Although the downside there is that for some time at 
bootup between the timekeeping_init() function running (prior to 
interrupts being enabled) and the RTC driver being available (after 
interrupts are enabled), where we'd have an incorrect system clock. So 
we may want to preserve something like the existing 
read_persistent_clock() interface, but as Jason suggested, we could push 
that access into the RTC driver itself.

Interface #2 could then be either RTC based, or countinuous counter 
based. Since we still want to do this measurement with interrupts off, 
we still would need that interrupt-free RTC method like 
read_persistent_clock() where supported (falling back to the RTC 
driver's suspend/resume handler to try to fix things up as best it can 
if that's not available).


There is still plenty of ugly details as to how interface #2 would work. 
Since it could return something as coarse as seconds, or it could 
provide nanosecond granularity, you probably want to return a timespec 
that we'd capture at suspend and resume, and calculate the delta of. 
However, in order to properly provide a timespec from a raw TSC counter, 
you need to be careful with the math to avoid overflows as TSC counter 
value grows (take a look at the sched_clock code). Also whatever 
function backs this would need to have the logic to know when to use the 
TSC counter vs falling back to the RTC in the case where we're actually 
able to go into S4.

thanks
-john















^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  2013-01-21 15:58       ` H. Peter Anvin
@ 2013-01-22 14:07         ` Feng Tang
  0 siblings, 0 replies; 30+ messages in thread
From: Feng Tang @ 2013-01-22 14:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, John Stultz, Ingo Molnar, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel

On Mon, Jan 21, 2013 at 09:58:32AM -0600, H. Peter Anvin wrote:
> On 01/21/2013 01:59 AM, Feng Tang wrote:
> >>>  
> >> We have an existed "TSC always running in C3+" feature and name it as
> >> X86_FEATURE_NONSTOP_TSC, so how about naming it with the same style,
> >> like X86_FEATURE_NONSTOP_TSC_S3?
> > 
> > Yeah, actually I used a name X86_FEATURE_xxx_TSC, then I did a grep,
> > and found there is no unified name convention for TSC, so I chose such
> > a name.
> > 
> 
> I think NONSTOP_TSC_S3 is quite a bit more consistent than S3_NOTSTOP...

Got it, will use that in new version.

Thanks
Feng

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-21 18:46 ` John Stultz
@ 2013-01-22 14:55   ` Feng Tang
  2013-01-22 21:56     ` John Stultz
  2013-01-22 19:57   ` Jason Gunthorpe
  1 sibling, 1 reply; 30+ messages in thread
From: Feng Tang @ 2013-01-22 14:55 UTC (permalink / raw)
  To: John Stultz
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel, Jason Gunthorpe

Hi John,

On Mon, Jan 21, 2013 at 10:46:31AM -0800, John Stultz wrote:
> On 01/20/2013 10:38 PM, Feng Tang wrote:
> >Hi All,
> >
> >On some new Intel Atom processors (Penwell and Cloverview), there is
> >a feature that the TSC won't stop S3, say the TSC value won't be
> >reset to 0 after resume. This feature makes TSC a more reliable
> >clocksource and could benefit the timekeeping code during system
> >suspend/resume cycles.
> >
> >The enabling efforts include adding new flags for this feature,
> >modifying clocksource.c and timekeeping.c to support and utilizing
> >it.
> >
> >One remaining question is inside the timekeeping_resume(), we don't
> >know if it is called by resuming from suspend(s2ram) or from
> >hibernate(s2disk), as there is no easy way to check it currently.
> >But it doesn't hurt as these Penwell/Cloverview platforms only have
> >S3 state, and no S4.
> >
> 
> Ooof. This is an interesting feature, but it does complicate things a bit.
> 
> So just a few high-level thoughts initially.
> 
> The clocksource code has to balance being able to make fine tuned
> adjustments with also being able to properly account for time when
> no timer interrupts occur. So by stretching the maximum time
> interval out, you end up hurting the adjustment granularity.
> 
> Also, since you still have a limited time value (40 minutes instead
> of 10), you will still run into lost time issues if the system
> suspends for longer then that. I think its reasonable to expect we
> get timer interrupts at least every 10 minutes while the system is
> running, but that's maybe not a reasonable expectation in suspend
> (even if we push it out to 40 minutes).

Good point. There were 2 reasons I chose 40 mins, one is the Android
running on our platform will set a RTC alarm to wake up system no
longer than 30 minutes, the other was to not hurt the precision too
much. I agree this change has some problems, and should be dumped.

> 
> Because of this, I think trying to integrate this feature into the
> clocksource code is the wrong approach.
> 
> 
> What this feature really reminds me of, is our discussion with
> Jason, and how the 32k counter is used on some ARM platforms with
> read_persistent_clock(). While read_persistent_clock() was initially
> a sort of special RTC interface, which let us initialize time
> properly in early boot and manage tracking suspend/resume time
> (before interrupts are enabled). The ARM platforms with the 32k
> counter really only use it for suspend/resume tracking (since it
> doesn't give a valid time at boot), and instead initialize time some
> other way.  I always considered it an interesting and creative
> slight misuse of the interface, but now that there's a good example
> of other systems where this approach would be usable, I think we
> should probably formalize it some.

Yes, that ARM platform's usage model is really interesting.

> 
> What I'd propose is that we break the read_persistent_clock()
> functionality up. So we need two interfaces:
> 1) An interface to access a time value we used to initialize the
> system's CLOCK_REALTIME time.
> 2) An interface to measure the length of suspend.
> 
> 
> Interface #1 could be possibly just replaced with the RTCTOSYS
> functionality. Although the downside there is that for some time at
> bootup between the timekeeping_init() function running (prior to
> interrupts being enabled) and the RTC driver being available (after
> interrupts are enabled), where we'd have an incorrect system clock.
> So we may want to preserve something like the existing
> read_persistent_clock() interface, but as Jason suggested, we could
> push that access into the RTC driver itself.

One case is one platform need a minimum size of kernel, which only
needs to use the read_persistent_clock for time init, and chose
to not compile in the "drivers/rtc/". So I think read_persistent_clock()
is needed anyway to remove the dependency over the rtc system.

IIRC, some EFI backed x86 system's read_persistent_clock() is
implemented by EFI's runtime gettime service.

> 
> Interface #2 could then be either RTC based, or countinuous counter
> based. Since we still want to do this measurement with interrupts
> off, we still would need that interrupt-free RTC method like
> read_persistent_clock() where supported (falling back to the RTC
> driver's suspend/resume handler to try to fix things up as best it
> can if that's not available).

Do you mean to create a new function and not embed the suspend/hibernate
time compensation code inside timekeeping_suspend/resume()?

> There is still plenty of ugly details as to how interface #2 would
> work. Since it could return something as coarse as seconds, or it
> could provide nanosecond granularity, you probably want to return a
> timespec that we'd capture at suspend and resume, and calculate the

Yes, we should keep to use the timespec way in current code.

> delta of. However, in order to properly provide a timespec from a
> raw TSC counter, you need to be careful with the math to avoid
> overflows as TSC counter value grows (take a look at the sched_clock
> code). Also whatever function backs this would need to have the
> logic to know when to use the TSC counter vs falling back to the RTC
> in the case where we're actually able to go into S4.

Thanks for the hint, will study the sched_clock code. And yes, how
to tell s2ram or s2disk remains a tough task.

Thanks,
Feng
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-21 18:46 ` John Stultz
  2013-01-22 14:55   ` Feng Tang
@ 2013-01-22 19:57   ` Jason Gunthorpe
  2013-01-22 20:22     ` John Stultz
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-22 19:57 UTC (permalink / raw)
  To: John Stultz
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Mon, Jan 21, 2013 at 10:46:31AM -0800, John Stultz wrote:
 
> What I'd propose is that we break the read_persistent_clock()
> functionality up. So we need two interfaces:
> 1) An interface to access a time value we used to initialize the
> system's CLOCK_REALTIME time.
> 2) An interface to measure the length of suspend.

> Interface #1 could be possibly just replaced with the RTCTOSYS
> functionality. Although the downside there is that for some time at
> bootup between the timekeeping_init() function running (prior to
> interrupts being enabled) and the RTC driver being available (after
> interrupts are enabled), where we'd have an incorrect system clock.
> So we may want to preserve something like the existing
> read_persistent_clock() interface, but as Jason suggested, we could
> push that access into the RTC driver itself.

How big of an issue is this? Could the RTCTOSYS function be moved to
the moment the RTC driver is registered rather than using a
late_initcall?
 
> Interface #2 could then be either RTC based, or countinuous counter
> based. Since we still want to do this measurement with interrupts
> off, we still would need that interrupt-free RTC method like
> read_persistent_clock() where supported (falling back to the RTC
> driver's suspend/resume handler to try to fix things up as best it
> can if that's not available).

Could the counter version of this be bundled into the clocksource
framework? It already has generic APIs for handling cycle counters and
things. Isn't there a TSC driver for clocksource already? Is all that
is missing is a way to tell if the counter survived suspend?

clocksource already has suspend/resume callbacks stuff, so the counter
driver could sense if the sleep was too deep and mark itself as
invalid.

This would help solve the problem on ARM with muxing persistent clock
on multi-platform.

A RTC device flag 'readable with interrupts off' still seems like a
good idea for the RTC case..

Cheers,
Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-22 19:57   ` Jason Gunthorpe
@ 2013-01-22 20:22     ` John Stultz
  2013-01-23  0:26       ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-01-22 20:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On 01/22/2013 11:57 AM, Jason Gunthorpe wrote:
> On Mon, Jan 21, 2013 at 10:46:31AM -0800, John Stultz wrote:
>   
>> What I'd propose is that we break the read_persistent_clock()
>> functionality up. So we need two interfaces:
>> 1) An interface to access a time value we used to initialize the
>> system's CLOCK_REALTIME time.
>> 2) An interface to measure the length of suspend.
>> Interface #1 could be possibly just replaced with the RTCTOSYS
>> functionality. Although the downside there is that for some time at
>> bootup between the timekeeping_init() function running (prior to
>> interrupts being enabled) and the RTC driver being available (after
>> interrupts are enabled), where we'd have an incorrect system clock.
>> So we may want to preserve something like the existing
>> read_persistent_clock() interface, but as Jason suggested, we could
>> push that access into the RTC driver itself.
> How big of an issue is this? Could the RTCTOSYS function be moved to
> the moment the RTC driver is registered rather than using a
> late_initcall?

It may not be huge. Most early boot items are going to be 
CLOCK_MONOTONIC based, which would be unaffected. So that's a potential 
solution, but I'm hesitant to claim there'd be no side effects.


>> Interface #2 could then be either RTC based, or countinuous counter
>> based. Since we still want to do this measurement with interrupts
>> off, we still would need that interrupt-free RTC method like
>> read_persistent_clock() where supported (falling back to the RTC
>> driver's suspend/resume handler to try to fix things up as best it
>> can if that's not available).
> Could the counter version of this be bundled into the clocksource
> framework? It already has generic APIs for handling cycle counters and
> things. Isn't there a TSC driver for clocksource already? Is all that
> is missing is a way to tell if the counter survived suspend?

So without *major* rework, I'd rather not do this. Again, the 
clocksource code has quite a few assumptions built in that are optimized 
for timekeeping (where we avoid overflows by expecting relatively 
frequent updates), and very different approaches are needed for 
something like suspend (where valid suspend times could be potentially 
months to years).

That said, given ARM's use of clocksources for sched_clock, there may be 
some reasonable claim to splitting the clocksource mult/shift selection 
management away from the clocksource itself (instead having it managed 
by the timekeeping core). That would allow other subsystems to use the 
clocksource accessor function, managing their own mult/shift selection 
trade-offs independently.

But such an effort would be of substantial size, given how invasive it 
would be. So I'm not sure this is likely to happen in the near term 
without a dedicated effort.

So I think the smaller effort of splitting up the 
read_persistent_clock() and making it more reasonably handle counters 
like this S3 non-stop TSC and the 32k counter is a more reasonable 
mid-step (especially since there will still be the same logical division 
from a timekeeping perspective between runtime clocksources and 
suspend-measuring clocksources if we were to do the major overhaul 
eventually).


> clocksource already has suspend/resume callbacks stuff, so the counter
> driver could sense if the sleep was too deep and mark itself as
> invalid.
But at that point you've lost time. If this was all centrally 
controlled, you have to know before hand what the bounds would be. With 
the TSC, we know it won't wrap around our starting measurement for at 
least 10 years. That's a reasonable range for suspend.  We don't want to 
resume and just get a "oh, bad call, you picked the wrong clocksource 
for such a long suspend", and really without the clocksource checking 
with the RTC I don't think it can even know if its been too long itself 
(since maybe the counter wrapped, but maybe not).

> This would help solve the problem on ARM with muxing persistent clock
> on multi-platform.
>
> A RTC device flag 'readable with interrupts off' still seems like a
> good idea for the RTC case..

Yea, I think this point you're probably right, as I'm warming to the 
idea of pushing the existing read_persistent_clock() into the rtc device 
code. Just need to make sure not initializing CLOCK_REALTIME at 
timekeeping_init isn't going to have negative effects.

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-22 14:55   ` Feng Tang
@ 2013-01-22 21:56     ` John Stultz
  2013-01-24  3:37       ` Feng Tang
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-01-22 21:56 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel, Jason Gunthorpe

On 01/22/2013 06:55 AM, Feng Tang wrote:
> Hi John,
>
> On Mon, Jan 21, 2013 at 10:46:31AM -0800, John Stultz wrote:
>> What I'd propose is that we break the read_persistent_clock()
>> functionality up. So we need two interfaces:
>> 1) An interface to access a time value we used to initialize the
>> system's CLOCK_REALTIME time.
>> 2) An interface to measure the length of suspend.
>>
>>
>> Interface #1 could be possibly just replaced with the RTCTOSYS
>> functionality. Although the downside there is that for some time at
>> bootup between the timekeeping_init() function running (prior to
>> interrupts being enabled) and the RTC driver being available (after
>> interrupts are enabled), where we'd have an incorrect system clock.
>> So we may want to preserve something like the existing
>> read_persistent_clock() interface, but as Jason suggested, we could
>> push that access into the RTC driver itself.
> One case is one platform need a minimum size of kernel, which only
> needs to use the read_persistent_clock for time init, and chose
> to not compile in the "drivers/rtc/". So I think read_persistent_clock()
> is needed anyway to remove the dependency over the rtc system.

I think hard numbers would be needed to show the rtc layer is causing 
major issues for space constrained kernels, so this trade-off could be 
properly prioritized. Having duplicate code paths in standard kernels is 
wasteful as well.

> IIRC, some EFI backed x86 system's read_persistent_clock() is
> implemented by EFI's runtime gettime service.
Interesting, does the rtc driver not support this?



>
>> Interface #2 could then be either RTC based, or countinuous counter
>> based. Since we still want to do this measurement with interrupts
>> off, we still would need that interrupt-free RTC method like
>> read_persistent_clock() where supported (falling back to the RTC
>> driver's suspend/resume handler to try to fix things up as best it
>> can if that's not available).
> Do you mean to create a new function and not embed the suspend/hibernate
> time compensation code inside timekeeping_suspend/resume()?

No, that's not what I mean.  timekeeping_suspend/resume still would do 
the proper timekeeping adjustments, but there would be a new interface 
that is similar to read_persistent_clock() that does not necessarily 
return a time value to be used to initialize CLOCK_REALTIME. It could 
return the same value read_persistent_clock() does today, but it would 
be acceptable for it to return time values that are not based from the 
unix epoch (as platforms with the 32k counter do already).


>
>> There is still plenty of ugly details as to how interface #2 would
>> work. Since it could return something as coarse as seconds, or it
>> could provide nanosecond granularity, you probably want to return a
>> timespec that we'd capture at suspend and resume, and calculate the
> Yes, we should keep to use the timespec way in current code.
>
>> delta of. However, in order to properly provide a timespec from a
>> raw TSC counter, you need to be careful with the math to avoid
>> overflows as TSC counter value grows (take a look at the sched_clock
>> code). Also whatever function backs this would need to have the
>> logic to know when to use the TSC counter vs falling back to the RTC
>> in the case where we're actually able to go into S4.
> Thanks for the hint, will study the sched_clock code. And yes, how
> to tell s2ram or s2disk remains a tough task.
Although from whatever the new read_persistent_clock interface would be, 
you might be able to detect things like the TSC value being reset (less 
then what it was at suspend time), and fall back to an RTC approximation 
of what the timestamp should be? Or alternatively, on hardware that can 
hybernate, avoid using the tsc counter entirely. Either way, these 
implementation details should be contained in the architecture's new 
read_persistent_clock() implementation, and likely not need any changes 
in the timekeeping code (other then to adapt to use the new interface).

thanks
-john


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-22 20:22     ` John Stultz
@ 2013-01-23  0:26       ` Jason Gunthorpe
  2013-01-23  0:41         ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-23  0:26 UTC (permalink / raw)
  To: John Stultz
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Tue, Jan 22, 2013 at 12:22:29PM -0800, John Stultz wrote:

> >How big of an issue is this? Could the RTCTOSYS function be moved to
> >the moment the RTC driver is registered rather than using a
> >late_initcall?
> 
> It may not be huge. Most early boot items are going to be
> CLOCK_MONOTONIC based, which would be unaffected. So that's a
> potential solution, but I'm hesitant to claim there'd be no side
> effects.

Well, ARM/PPC/etc pretty much rely on RTCTOSYS for time, so if there
are side effects then they are going to be problematic for not-x86
today and should be fixed up.. But that probably also says there are
not many side effects because folks are not complaining??

> >>Interface #2 could then be either RTC based, or countinuous counter
> >>based. Since we still want to do this measurement with interrupts
> >>off, we still would need that interrupt-free RTC method like
> >>read_persistent_clock() where supported (falling back to the RTC
> >>driver's suspend/resume handler to try to fix things up as best it
> >>can if that's not available).
> >Could the counter version of this be bundled into the clocksource
> >framework? It already has generic APIs for handling cycle counters and
> >things. Isn't there a TSC driver for clocksource already? Is all that
> >is missing is a way to tell if the counter survived suspend?
 
> So without *major* rework, I'd rather not do this. Again, the
> clocksource code has quite a few assumptions built in that are
> optimized for timekeeping (where we avoid overflows by expecting
> relatively frequent updates), and very different approaches are
> needed for something like suspend (where valid suspend times could
> be potentially months to years).

Well, I was thinking something very simple..

The reason to be interested in the clocksource code is there is
already so much support code to make it easy to use for many timers
out there, and there is already TSC support for it. Plus there is
already the full architecture for muxing multiple drivers, which is
pretty important...

The simple case is that any clocksource intended for suspend time
keeping must not over flow for reasonable times (years?), so you can
ignore the overflow problem entirely. The 32kHz ARM counter and the 64
bit TSC both seem to be OK in this regard.

> >clocksource already has suspend/resume callbacks stuff, so the counter
> >driver could sense if the sleep was too deep and mark itself as
> >invalid.

> But at that point you've lost time. If this was all centrally
> controlled, you have to know before hand what the bounds would be.
> With the TSC, we know it won't wrap around our starting measurement
> for at least 10 years. That's a reasonable range for suspend.  We
> don't want to resume and just get a "oh, bad call, you picked the
> wrong clocksource for such a long suspend", and really without the
> clocksource checking with the RTC I don't think it can even know if
> its been too long itself (since maybe the counter wrapped, but maybe
> not).

I'm not worrying about overflow here, I was thinking about different
sleep states. Eg a timer may only function in suspend to ram but not
hibernate to disk, so on transition in/out of hibernate it would allow
the clock source driver to detect that transition and mark itself as
invalid.

So, it would work something like:
 - Prior to suspend record the result of read() from all clock_sources
 - Run through all the suspend call backs. If the suspend state (eg
   hibernate) is too deep then the clock source PM call back will mark
   it as invalid
 - Upon resume do another read from all clock sources, and also do a
   'survived_suspend' type of call. Take the highest priority
   clock source that survived suspend and use that delta to update the
   realtime clock.
 - If no clock sources survived then attempt to read without
   interrupts from the RTC driver
 - If you couldn't read without interrupts from the RTC driver then
   schedual a RTC read when interrupts are on

A fancier version could sanity check the clocksource delta with the
RTC delta, if they differ by more max(~10%,2 sec) then use the RTC
delta, this would handle clocksource overflows fairly simply.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  0:26       ` Jason Gunthorpe
@ 2013-01-23  0:41         ` John Stultz
  2013-01-23  1:37           ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-01-23  0:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On 01/22/2013 04:26 PM, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2013 at 12:22:29PM -0800, John Stultz wrote:
>
>>> How big of an issue is this? Could the RTCTOSYS function be moved to
>>> the moment the RTC driver is registered rather than using a
>>> late_initcall?
>> It may not be huge. Most early boot items are going to be
>> CLOCK_MONOTONIC based, which would be unaffected. So that's a
>> potential solution, but I'm hesitant to claim there'd be no side
>> effects.
> Well, ARM/PPC/etc pretty much rely on RTCTOSYS for time, so if there
> are side effects then they are going to be problematic for not-x86
> today and should be fixed up.. But that probably also says there are
> not many side effects because folks are not complaining??
>
>>>> Interface #2 could then be either RTC based, or countinuous counter
>>>> based. Since we still want to do this measurement with interrupts
>>>> off, we still would need that interrupt-free RTC method like
>>>> read_persistent_clock() where supported (falling back to the RTC
>>>> driver's suspend/resume handler to try to fix things up as best it
>>>> can if that's not available).
>>> Could the counter version of this be bundled into the clocksource
>>> framework? It already has generic APIs for handling cycle counters and
>>> things. Isn't there a TSC driver for clocksource already? Is all that
>>> is missing is a way to tell if the counter survived suspend?
>   
>> So without *major* rework, I'd rather not do this. Again, the
>> clocksource code has quite a few assumptions built in that are
>> optimized for timekeeping (where we avoid overflows by expecting
>> relatively frequent updates), and very different approaches are
>> needed for something like suspend (where valid suspend times could
>> be potentially months to years).
> Well, I was thinking something very simple..
>
> The reason to be interested in the clocksource code is there is
> already so much support code to make it easy to use for many timers
> out there, and there is already TSC support for it. Plus there is
> already the full architecture for muxing multiple drivers, which is
> pretty important...
>
> The simple case is that any clocksource intended for suspend time
> keeping must not over flow for reasonable times (years?), so you can
> ignore the overflow problem entirely. The 32kHz ARM counter and the 64
> bit TSC both seem to be OK in this regard.

Right but to calculate an suspend interval (since they are likely many 
orders of magnitude larger then the intervals between timer interrupts), 
you need different mult/shift selection.  Its splitting out the 
mult/shift management into a per-subsystem level that is the complicated 
part. Additionally, there may be cases where the timekeeping clocksource 
is one clocksource and the suspend clocksource is another. So I think to 
properly integrate this sort of functionality w/ clocksources is going 
to require a serious rework of the clocksource code.


>
>>> clocksource already has suspend/resume callbacks stuff, so the counter
>>> driver could sense if the sleep was too deep and mark itself as
>>> invalid.
>> But at that point you've lost time. If this was all centrally
>> controlled, you have to know before hand what the bounds would be.
>> With the TSC, we know it won't wrap around our starting measurement
>> for at least 10 years. That's a reasonable range for suspend.  We
>> don't want to resume and just get a "oh, bad call, you picked the
>> wrong clocksource for such a long suspend", and really without the
>> clocksource checking with the RTC I don't think it can even know if
>> its been too long itself (since maybe the counter wrapped, but maybe
>> not).
> I'm not worrying about overflow here, I was thinking about different
> sleep states. Eg a timer may only function in suspend to ram but not
> hibernate to disk, so on transition in/out of hibernate it would allow
> the clock source driver to detect that transition and mark itself as
> invalid.
>
> So, it would work something like:
>   - Prior to suspend record the result of read() from all clock_sources
>   - Run through all the suspend call backs. If the suspend state (eg
>     hibernate) is too deep then the clock source PM call back will mark
>     it as invalid
>   - Upon resume do another read from all clock sources, and also do a
>     'survived_suspend' type of call. Take the highest priority
>     clock source that survived suspend and use that delta to update the
>     realtime clock.
>   - If no clock sources survived then attempt to read without
>     interrupts from the RTC driver
>   - If you couldn't read without interrupts from the RTC driver then
>     schedual a RTC read when interrupts are on
>
> A fancier version could sanity check the clocksource delta with the
> RTC delta, if they differ by more max(~10%,2 sec) then use the RTC
> delta, this would handle clocksource overflows fairly simply.

So something like this flow sounds fine, but I think that doing it 
behind the new read_persistent_clock()-like call is the right approach 
in the near term.

thanks
-john


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  0:41         ` John Stultz
@ 2013-01-23  1:37           ` Jason Gunthorpe
  2013-01-23  1:54             ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-23  1:37 UTC (permalink / raw)
  To: John Stultz
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Tue, Jan 22, 2013 at 04:41:58PM -0800, John Stultz wrote:

> Right but to calculate an suspend interval (since they are likely
> many orders of magnitude larger then the intervals between timer
> interrupts), you need different mult/shift selection.  Its splitting
> out the mult/shift management into a per-subsystem level that is the

You are talking about overflow in cyclecounter_cyc2ns and the like
right? The 64 bit cycle_t and the underlying hw counter (eg 64 bit
rdtsc) are not going to overflow..

An alternate version of cyclecounter_cyc2ns for use by the suspend
code that handles overflow during the mult/shift operation solves that
problem:

// Drops some small precision along the way but is simple..
static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
                                          cycle_t cycles)
{
    u64 max = U64_MAX/cc->mult;
    u64 num = cycles/max;
    u64 result = num * ((max * cc->mult) >> cc->shift);
    return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
}

Or am I missing the issue?

> complicated part. Additionally, there may be cases where the
> timekeeping clocksource is one clocksource and the suspend
> clocksource is another. So I think to properly integrate this sort

Does the difference matter? The clocksource to use is detected at
runtime, if the timekeeping clocksource is no good for suspend time
keeping then it just won't be used. With a distinct
read_persistent_clock API then they are already seperate??

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  1:37           ` Jason Gunthorpe
@ 2013-01-23  1:54             ` John Stultz
  2013-01-23  2:35               ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-01-23  1:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On 01/22/2013 05:37 PM, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2013 at 04:41:58PM -0800, John Stultz wrote:
>
>> Right but to calculate an suspend interval (since they are likely
>> many orders of magnitude larger then the intervals between timer
>> interrupts), you need different mult/shift selection.  Its splitting
>> out the mult/shift management into a per-subsystem level that is the
> You are talking about overflow in cyclecounter_cyc2ns and the like
> right? The 64 bit cycle_t and the underlying hw counter (eg 64 bit
> rdtsc) are not going to overflow..
>
> An alternate version of cyclecounter_cyc2ns for use by the suspend
> code that handles overflow during the mult/shift operation solves that
> problem:
>
> // Drops some small precision along the way but is simple..
> static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
>                                            cycle_t cycles)
> {
>      u64 max = U64_MAX/cc->mult;
>      u64 num = cycles/max;
>      u64 result = num * ((max * cc->mult) >> cc->shift);
>      return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
> }
>
> Or am I missing the issue?

Well, cyclecounters and clocksources are currently different things. 
There was some hope that cyclecounters would be a simpler base structure 
that would supersede clocksources, but the complexity of all the 
variants of clocksources have limited the ability to make such a 
conversion. At least so far. I hope to eventually clean that up as the 
potential overlap is obvious - although as the 
cyclecounters/timecounters code never grew as I expected. But I'm not 
sure how soon "eventually" will end up being.

But regardless of historical tangents :), you're right, an alternate and 
slower cyc2ns function could be used to avoid overflow issues.


>
>> complicated part. Additionally, there may be cases where the
>> timekeeping clocksource is one clocksource and the suspend
>> clocksource is another. So I think to properly integrate this sort
> Does the difference matter? The clocksource to use is detected at
> runtime, if the timekeeping clocksource is no good for suspend time
> keeping then it just won't be used. With a distinct
> read_persistent_clock API then they are already seperate??

Not sure I'm following you here.  I still think the selection of which 
clocksource to use for suspend timing is problematic (especially if its 
not the active timekeeping clocksource).  So I think instead of 
complicating the generic timekeeping code with the logic, I'd rather 
push it off to new read_presistent_clock api.

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  1:54             ` John Stultz
@ 2013-01-23  2:35               ` Jason Gunthorpe
  2013-01-23  3:07                 ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-23  2:35 UTC (permalink / raw)
  To: John Stultz
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Tue, Jan 22, 2013 at 05:54:21PM -0800, John Stultz wrote:

> >>complicated part. Additionally, there may be cases where the
> >>timekeeping clocksource is one clocksource and the suspend
> >>clocksource is another. So I think to properly integrate this sort
> >Does the difference matter? The clocksource to use is detected at
> >runtime, if the timekeeping clocksource is no good for suspend time
> >keeping then it just won't be used. With a distinct
> >read_persistent_clock API then they are already seperate??
> 
> Not sure I'm following you here.  I still think the selection of
> which clocksource to use for suspend timing is problematic
> (especially if its not the active timekeeping clocksource).  So I
> think instead of complicating the generic timekeeping code with the
> logic, I'd rather push it off to new read_presistent_clock api.

Well, where do you see the complication?

My thought was to save the cycle_t from *all* the clock sources during
suspend, and then on resume look for the highest priority one where
the driver reports it continued to tick the whole time. The active
timekeeping clock source doesn't seem to come into the equation??

Add
      int (*active_during_suspend)(struct clocksource *cs);
      cycle_t value_before_suspend;
To struct clocksource.

Before suspend:
  foreach(cs,all clocksources) {
      if (cs->active_during_suspend)
            cs->value_before_suspend = cs->read(cs);
  }

After resume:
  cycle_t best_delta = 0;
  struct clocksource *best_cs = NULL;
  foreach(cs,all clocksources) {
      if (cs->active_during_suspend &&
          (best_cs == NULL || cs->rating > best_cs->rating) &&
          cs->active_during_suspend(cs)) {
            best_delta = cs->read(cs) - cs->value_before_suspend;
            best_cs = cs;
      }
  }

Update the TSC driver to set the function pointer
active_during_suspend when the CPU supports non-stop TSC. Have it
return true if S3 was the deepest sleep entered during the last
suspend/resume cycle, false otherwise. Ditto for that ARM case.

This seems reasonably simple, compared to adding a new API, new
drivers, new function pointer multiplexors to arches...

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  2:35               ` Jason Gunthorpe
@ 2013-01-23  3:07                 ` John Stultz
  2013-01-23 19:23                   ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-01-23  3:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On 01/22/2013 06:35 PM, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2013 at 05:54:21PM -0800, John Stultz wrote:
>
>>>> complicated part. Additionally, there may be cases where the
>>>> timekeeping clocksource is one clocksource and the suspend
>>>> clocksource is another. So I think to properly integrate this sort
>>> Does the difference matter? The clocksource to use is detected at
>>> runtime, if the timekeeping clocksource is no good for suspend time
>>> keeping then it just won't be used. With a distinct
>>> read_persistent_clock API then they are already seperate??
>> Not sure I'm following you here.  I still think the selection of
>> which clocksource to use for suspend timing is problematic
>> (especially if its not the active timekeeping clocksource).  So I
>> think instead of complicating the generic timekeeping code with the
>> logic, I'd rather push it off to new read_presistent_clock api.
> Well, where do you see the complication?
>
> My thought was to save the cycle_t from *all* the clock sources during
> suspend, and then on resume look for the highest priority one where
> the driver reports it continued to tick the whole time. The active
> timekeeping clock source doesn't seem to come into the equation??
>
> Add
>        int (*active_during_suspend)(struct clocksource *cs);
>        cycle_t value_before_suspend;
> To struct clocksource.
>
> Before suspend:
>    foreach(cs,all clocksources) {
>        if (cs->active_during_suspend)
>              cs->value_before_suspend = cs->read(cs);
>    }
>
> After resume:
>    cycle_t best_delta = 0;
>    struct clocksource *best_cs = NULL;
>    foreach(cs,all clocksources) {
>        if (cs->active_during_suspend &&
>            (best_cs == NULL || cs->rating > best_cs->rating) &&
>            cs->active_during_suspend(cs)) {
>              best_delta = cs->read(cs) - cs->value_before_suspend;
>              best_cs = cs;
>        }
>    }
>
> Update the TSC driver to set the function pointer
> active_during_suspend when the CPU supports non-stop TSC. Have it
> return true if S3 was the deepest sleep entered during the last
> suspend/resume cycle, false otherwise. Ditto for that ARM case.
>
> This seems reasonably simple, compared to adding a new API, new
> drivers, new function pointer multiplexors to arches...

So yea, your suggestion is attractive in some ways.

But personally, I'm less fond of adding additional state to the 
clocksources, as I'm (admittedly, very) slowly trying to go the other 
way, and make the clocksources mostly state free. This is in part to 
allow for faster timekeeping updates (see: 
https://lkml.org/lkml/2012/3/2/66) - but again, I've not made much 
progress there recently, so this probably isn't a strong enough argument 
against it.

Another downside is that accessing a clocksource can be costly, so doing 
so for every clocksource could unnecessarily slow suspend/resume down. 
Reading all the clocksources avoids the complexity of creating the 
secondary selection and management of a suspend-time measuring 
clocksource, but it also feels a little hackish to me. And iterating 
over the clocksource list requires exposing currently private 
clocksource data to the timekeeping core.

The reason I like the idea of a new persistent_clock api, is that it 
formalizes existing usage, and doesn't require changes to the 
timekeeping logic, or to architectures that don't have running 
suspend-time counters (whatever the new read_persistent_clock interface 
ends up being can call the existing read_persistent_clock function). 
Only on arm and x86 might there be a more complicated function.

But don't let my naysaying stop you from submitting a patch. It would be 
interesting to see your idea fully fleshed out.

I appreciate your persistence here, and apologies for my thick-headed-ness.

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-23  3:07                 ` John Stultz
@ 2013-01-23 19:23                   ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-23 19:23 UTC (permalink / raw)
  To: John Stultz
  Cc: Feng Tang, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Tue, Jan 22, 2013 at 07:07:04PM -0800, John Stultz wrote:

> But personally, I'm less fond of adding additional state to the
> clocksources, as I'm (admittedly, very) slowly trying to go the
> other way, and make the clocksources mostly state free. This is in
> part to allow for faster timekeeping updates (see:
> https://lkml.org/lkml/2012/3/2/66) - but again, I've not made much
> progress there recently, so this probably isn't a strong enough
> argument against it.

I think there should be ways to avoid storing the suspend time in the
clocksource struct, but since the suspend time is orthogonal to
timekeeping updates maybe it doesn't matter?

> Another downside is that accessing a clocksource can be costly, so
> doing so for every clocksource could unnecessarily slow
> suspend/resume down. Reading all the clocksources avoids the
> complexity of creating the secondary selection and management of a
> suspend-time measuring clocksource, but it also feels a little
> hackish to me. And iterating over the clocksource list requires
> exposing currently private clocksource data to the timekeeping core.

I was imagining these functions would be in the clocksource code and
called from suspend (clocksource_suspend_prepare,
clocksource_suspend_delta or some such). Not sure on iteration
expense, but you only need to look at clock sources that have a
active_during_suspend function pointer, so there would be various ways
to minimize the cost of finding that list, including precomputing it
during clocksource registration.

Generally there would be 0 or 1 active_during_suspend sources, I
expect. So in practice this probably boils down to locking only one
clocksource.

> The reason I like the idea of a new persistent_clock api, is that it
> formalizes existing usage, and doesn't require changes to the
> timekeeping logic, or to architectures that don't have running

Having seen ARM go through so many iterations of removing these sorts
of non-driver APIs and moving to dynamic bindings just makes it seem
wrong to add more, especially when the API is expected to work with
hardware already handled by a dynamically bound driver.

> But don't let my naysaying stop you from submitting a patch. It
> would be interesting to see your idea fully fleshed out.

Maybe Feng will try a v2 of his patch with some of these ideas? He has
hardware to test it :) I agree it would be clearer to see with code!!
 
> I appreciate your persistence here, and apologies for my thick-headed-ness.

NP

Regards,
Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-22 21:56     ` John Stultz
@ 2013-01-24  3:37       ` Feng Tang
  2013-01-24 18:15         ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Feng Tang @ 2013-01-24  3:37 UTC (permalink / raw)
  To: John Stultz
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Len Brown,
	Rafael J. Wysocki, linux-kernel, Jason Gunthorpe

On Tue, Jan 22, 2013 at 01:56:09PM -0800, John Stultz wrote:
> On 01/22/2013 06:55 AM, Feng Tang wrote:
> >Hi John,
> >
> >On Mon, Jan 21, 2013 at 10:46:31AM -0800, John Stultz wrote:
> >>What I'd propose is that we break the read_persistent_clock()
> >>functionality up. So we need two interfaces:
> >>1) An interface to access a time value we used to initialize the
> >>system's CLOCK_REALTIME time.
> >>2) An interface to measure the length of suspend.
> >>
> >>
> >>Interface #1 could be possibly just replaced with the RTCTOSYS
> >>functionality. Although the downside there is that for some time at
> >>bootup between the timekeeping_init() function running (prior to
> >>interrupts being enabled) and the RTC driver being available (after
> >>interrupts are enabled), where we'd have an incorrect system clock.
> >>So we may want to preserve something like the existing
> >>read_persistent_clock() interface, but as Jason suggested, we could
> >>push that access into the RTC driver itself.
> >One case is one platform need a minimum size of kernel, which only
> >needs to use the read_persistent_clock for time init, and chose
> >to not compile in the "drivers/rtc/". So I think read_persistent_clock()
> >is needed anyway to remove the dependency over the rtc system.
>
> I think hard numbers would be needed to show the rtc layer is
> causing major issues for space constrained kernels, so this
> trade-off could be properly prioritized. Having duplicate code paths
> in standard kernels is wasteful as well.

Here are some sizes of rtc codes:
size rtc-core.o rtc-lib.o hctosys.o rtc-cmos.o
text    data     bss     dec     hex filename
14810     304     132   15246    3b8e rtc-core.o
1425       0       0    1425     591 rtc-lib.o
486       8       0     494     1ee hctosys.o
7169     456      90    7715    1e23 rtc-cmos.o

Another thing is currently the CONFIG_RTC_XXX is selectable option for
kernel, if we push the read_persistent_clock() from kernel code down to
rtc  driver layer, then some of the CONFIG_RTC_XXX have to be always 'y' 

>
> >IIRC, some EFI backed x86 system's read_persistent_clock() is
> >implemented by EFI's runtime gettime service.
> Interesting, does the rtc driver not support this?

x86's read_persistent_clock() is actually implemented with 
	retval = x86_platform.get_wallclock()

And for x86_32 platform, the efi.c has code to set  x86_platform.get_wallclock()
to efi_get_time() which is efi's runtime service.

I don't know the detail how it works, but I think it could co-exist with a
rtc driver if there is.


>
> >
> >>There is still plenty of ugly details as to how interface #2 would
> >>work. Since it could return something as coarse as seconds, or it
> >>could provide nanosecond granularity, you probably want to return a
> >>timespec that we'd capture at suspend and resume, and calculate the
> >Yes, we should keep to use the timespec way in current code.
> >
> >>delta of. However, in order to properly provide a timespec from a
> >>raw TSC counter, you need to be careful with the math to avoid
> >>overflows as TSC counter value grows (take a look at the sched_clock
> >>code). Also whatever function backs this would need to have the
> >>logic to know when to use the TSC counter vs falling back to the RTC
> >>in the case where we're actually able to go into S4.
> >Thanks for the hint, will study the sched_clock code. And yes, how
> >to tell s2ram or s2disk remains a tough task.
> Although from whatever the new read_persistent_clock interface would
> be, you might be able to detect things like the TSC value being
> reset (less then what it was at suspend time), and fall back to an

Good idea! This could be used to judge the S3/S4, as no clocksource should
still tick in S4 (hibernate) mode.

> RTC approximation of what the timestamp should be? Or alternatively,
> on hardware that can hybernate, avoid using the tsc counter
> entirely. Either way, these implementation details should be
> contained in the architecture's new read_persistent_clock()
> implementation, and likely not need any changes in the timekeeping
> code (other then to adapt to use the new interface).

One concern is this way may push some clocksource ops into arch's
read_persistent_clock() implementation. Currently read_persistent_clock()
is only called in three phases: boot, suspend and resume. If we want it to  
trace the suspended time, then we need to detect which phase is calling it.

One rough thought is adding a struct suspend_time_tracker, and make it part
of struct timekeeper. It can embed the 2 existing global variables
timekeeping_suspended and timekeeping_suspend_time. And add 2 wrappers like
suspend_time_prepare(), suspend_time_calc() as Jason mentioned to take care
the suspend time. 

Thanks,
Feng



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-24  3:37       ` Feng Tang
@ 2013-01-24 18:15         ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2013-01-24 18:15 UTC (permalink / raw)
  To: Feng Tang
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Len Brown, Rafael J. Wysocki, linux-kernel

On Thu, Jan 24, 2013 at 11:37:30AM +0800, Feng Tang wrote:

> > I think hard numbers would be needed to show the rtc layer is
> > causing major issues for space constrained kernels, so this
> > trade-off could be properly prioritized. Having duplicate code paths
> > in standard kernels is wasteful as well.

> Another thing is currently the CONFIG_RTC_XXX is selectable option for
> kernel, if we push the read_persistent_clock() from kernel code down to
> rtc  driver layer, then some of the CONFIG_RTC_XXX have to be always 'y' 

All my space constrained embedded kernels (ARM, PPC) already need
CONFIG_RTC.. Configurations that can still access the RTC without
CONFIG_RTC seem to be very limited, the notable one is x86 - and I
don't think 14k is going to be a problem for any modern embedded x86
systems.

CONFIG_RTC_xx doesn't have to be forced to yes, it is like any other
driver, if you don't have/load a RTC driver then you don't get a RTC,
*shrug*

> > >IIRC, some EFI backed x86 system's read_persistent_clock() is
> > >implemented by EFI's runtime gettime service.
> > Interesting, does the rtc driver not support this?
> 
> x86's read_persistent_clock() is actually implemented with 
> 	retval = x86_platform.get_wallclock()
> 
> And for x86_32 platform, the efi.c has code to set  x86_platform.get_wallclock()
> to efi_get_time() which is efi's runtime service.
> 
> I don't know the detail how it works, but I think it could co-exist with a
> rtc driver if there is.

Like the CMOS path, it completely duplicates the code in the EFI RTC
driver, so it should co-exist. The locking seems to be handled by the
EFI stuff:

unsigned long efi_get_time(void)
{
        efi_status_t status;
        efi_time_t eft;
        efi_time_cap_t cap;

        status = efi.get_time(&eft, &cap);
        if (status != EFI_SUCCESS)
                pr_err("Oops: efitime: can't read time!\n");

        return mktime(eft.year, eft.month, eft.day, eft.hour,
                      eft.minute, eft.second);
}

vs:

static int efi_read_time(struct device *dev, struct rtc_time *tm)
{
        efi_status_t status;
        efi_time_t eft;
        efi_time_cap_t cap;

        status = efi.get_time(&eft, &cap);

        if (status != EFI_SUCCESS) {
                /* should never happen */
                printk(KERN_ERR "efitime: can't read time\n");
                return -EINVAL;
        }

        convert_from_efi_time(&eft, tm);

        return rtc_valid_tm(tm);
}

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-01-21 13:55 ` [RFC PATCH 0/5] Add support for S3 non-stop TSC support Rafael J. Wysocki
@ 2013-03-30 18:14   ` Pavel Machek
  2013-04-01 17:32     ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2013-03-30 18:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Feng Tang, Thomas Gleixner, John Stultz, Ingo Molnar,
	H. Peter Anvin, x86, Len Brown, linux-kernel

Hi!

> >On some new Intel Atom processors (Penwell and Cloverview), there is
> >a feature that the TSC won't stop S3, say the TSC value won't be
> >reset to 0 after resume. This feature makes TSC a more reliable
> >clocksource and could benefit the timekeeping code during system
> >suspend/resume cycles.
> >
> >The enabling efforts include adding new flags for this feature,
> >modifying clocksource.c and timekeeping.c to support and utilizing
> >it.
> >
> >One remaining question is inside the timekeeping_resume(), we don't
> >know if it is called by resuming from suspend(s2ram) or from
> >hibernate(s2disk), as there is no easy way to check it currently.
> >But it doesn't hurt as these Penwell/Cloverview platforms only have
> >S3 state, and no S4.
> >
> >Please help to review them, thanks!
> 
> The patches look reasonable to me.

Not sure... what are advantages? TSC is high resolution, but not
exactly precise time source... and this only makes resume more
complex.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-03-30 18:14   ` Pavel Machek
@ 2013-04-01 17:32     ` John Stultz
  2013-04-01 20:31       ` Pavel Machek
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2013-04-01 17:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Feng Tang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Len Brown, linux-kernel

On 03/30/2013 11:14 AM, Pavel Machek wrote:
> Hi!
>
>>> On some new Intel Atom processors (Penwell and Cloverview), there is
>>> a feature that the TSC won't stop S3, say the TSC value won't be
>>> reset to 0 after resume. This feature makes TSC a more reliable
>>> clocksource and could benefit the timekeeping code during system
>>> suspend/resume cycles.
>>>
>>> The enabling efforts include adding new flags for this feature,
>>> modifying clocksource.c and timekeeping.c to support and utilizing
>>> it.
>>>
>>> One remaining question is inside the timekeeping_resume(), we don't
>>> know if it is called by resuming from suspend(s2ram) or from
>>> hibernate(s2disk), as there is no easy way to check it currently.
>>> But it doesn't hurt as these Penwell/Cloverview platforms only have
>>> S3 state, and no S4.
>>>
>>> Please help to review them, thanks!
>> The patches look reasonable to me.
> Not sure... what are advantages? TSC is high resolution, but not
> exactly precise time source... and this only makes resume more
> complex.
Providing sub-second granularity for suspend time measurement is a 
pretty compelling use, which greatly reduces time error across 
suspend/resume.

I agree that the code logic is more complex, but the TSC path should be 
a good bit faster then hitting the CMOS.
And overall, I'm hoping we can eventually move the RTC based 
read_persistent_clock() implementations into the rtc core, then allow 
for clocksource based suspend timing where available (since ARM boards 
are already basically doing this, but hiding it behind 
read_persistent_clock() calls).

There is the open concern that given their history, x86 designers will 
find yet another way to break the TSC in some future chip and we'll end 
up with non-stop TSCs that run at different frequencies in suspend or 
some other such nonsense. But we'll just have to detect that and disable 
functionality where appropriate, much as we do with the TSC now.

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-04-01 17:32     ` John Stultz
@ 2013-04-01 20:31       ` Pavel Machek
  2013-04-01 20:41         ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2013-04-01 20:31 UTC (permalink / raw)
  To: John Stultz
  Cc: Rafael J. Wysocki, Feng Tang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Len Brown, linux-kernel

Hi!

> >>>On some new Intel Atom processors (Penwell and Cloverview), there is
> >>>a feature that the TSC won't stop S3, say the TSC value won't be
> >>>reset to 0 after resume. This feature makes TSC a more reliable
> >>>clocksource and could benefit the timekeeping code during system
> >>>suspend/resume cycles.
> >>>
> >>>The enabling efforts include adding new flags for this feature,
> >>>modifying clocksource.c and timekeeping.c to support and utilizing
> >>>it.
> >>>
> >>>One remaining question is inside the timekeeping_resume(), we don't
> >>>know if it is called by resuming from suspend(s2ram) or from
> >>>hibernate(s2disk), as there is no easy way to check it currently.
> >>>But it doesn't hurt as these Penwell/Cloverview platforms only have
> >>>S3 state, and no S4.
> >>>
> >>>Please help to review them, thanks!
> >>The patches look reasonable to me.
> >Not sure... what are advantages? TSC is high resolution, but not
> >exactly precise time source... and this only makes resume more
> >complex.
> Providing sub-second granularity for suspend time measurement is a
> pretty compelling use, which greatly reduces time error across
> suspend/resume.

Certainly for short sleeps. Is TSC actually precise enough to keep
precise time for hours? I thought TSC sucked at precision.

> I agree that the code logic is more complex, but the TSC path should
> be a good bit faster then hitting the CMOS.

You are right here.

> There is the open concern that given their history, x86 designers
> will find yet another way to break the TSC in some future chip and
> we'll end up with non-stop TSCs that run at different frequencies in
> suspend or some other such nonsense. But we'll just have to detect
> that and disable functionality where appropriate, much as we do with
> the TSC now.

Ok.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.
  2013-04-01 20:31       ` Pavel Machek
@ 2013-04-01 20:41         ` John Stultz
  0 siblings, 0 replies; 30+ messages in thread
From: John Stultz @ 2013-04-01 20:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Feng Tang, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Len Brown, linux-kernel

On 04/01/2013 01:31 PM, Pavel Machek wrote:
>
> Certainly for short sleeps. Is TSC actually precise enough to keep
> precise time for hours? I thought TSC sucked at precision.

Well, we can measure suspend using the ntp corrected frequency, so that 
should be ok, assuming the freq doesn't change during suspend.

But yes, the long term accuracy will depend on the actual hardware 
implementation, so we'll have to see what the hardware does.

Even so, other boards have similar non-stop clocksources in suspend that 
could be used in a similar fashion, so I think this is still the right 
appraoch.

thanks
-john

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-04-01 20:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-21  6:38 [RFC PATCH 0/5] Add support for S3 non-stop TSC support Feng Tang
2013-01-21  6:38 ` [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP Feng Tang
2013-01-21  7:27   ` Chen Gong
2013-01-21  7:59     ` Feng Tang
2013-01-21 15:58       ` H. Peter Anvin
2013-01-22 14:07         ` Feng Tang
2013-01-21  6:38 ` [RFC PATCH 2/5] clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP Feng Tang
2013-01-21  6:38 ` [RFC PATCH 3/5] x86: tsc: Add support for new S3_NOTSTOP feature Feng Tang
2013-01-21  6:38 ` [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift Feng Tang
2013-01-21  7:25   ` Chen Gong
2013-01-21  6:38 ` [RFC PATCH 5/5] timekeeping: Add support for clocksource which doesn't stop during suspend Feng Tang
2013-01-21 13:55 ` [RFC PATCH 0/5] Add support for S3 non-stop TSC support Rafael J. Wysocki
2013-03-30 18:14   ` Pavel Machek
2013-04-01 17:32     ` John Stultz
2013-04-01 20:31       ` Pavel Machek
2013-04-01 20:41         ` John Stultz
2013-01-21 18:46 ` John Stultz
2013-01-22 14:55   ` Feng Tang
2013-01-22 21:56     ` John Stultz
2013-01-24  3:37       ` Feng Tang
2013-01-24 18:15         ` Jason Gunthorpe
2013-01-22 19:57   ` Jason Gunthorpe
2013-01-22 20:22     ` John Stultz
2013-01-23  0:26       ` Jason Gunthorpe
2013-01-23  0:41         ` John Stultz
2013-01-23  1:37           ` Jason Gunthorpe
2013-01-23  1:54             ` John Stultz
2013-01-23  2:35               ` Jason Gunthorpe
2013-01-23  3:07                 ` John Stultz
2013-01-23 19:23                   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).