All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
@ 2016-07-19  4:51 Chen Yu
  2016-07-19  8:36 ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2016-07-19  4:51 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: Rafael J. Wysock, Linux PM list, Linux Kernel list, Chen Yu,
	Stable # 3 . 17+

It is reported the hibernation fails at 2nd attempt, which
hangs at hibernate() -> syscore_resume() -> i8237A_resume()
-> claim_dma_lock(), because the lock has already been taken.
However there is actually no other process would like to grab
this lock on that problematic platform.

Further investigation shows that, the problem is caused by setting
/sys/power/pm_trace to 1 before the 1st hibernation, since once
pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
which might bring a significant long sleep time in timekeeping_resume,
thus in tk_debug_account_sleep_time, if the bit31 happened to be set to 1,
the fls might return 32 and then we add 1 to sleep_time_bin[32], which
caused a memory overwritten. As System.map shows:

ffffffff81c9d080 b sleep_time_bin
ffffffff81c9d100 B dma_spin_lock

Thus set the dma_spin_lock.val to 1, which caused this problem.

To fix this problem, we ignore those abnormal sleep time,
since no one would like to sleep that long.

Cc: Stable <stable@vger.kernel.org> # 3.17+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Janek Kozicki <cosurgi@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/time/timekeeping_debug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..b164bb9 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -24,6 +24,7 @@
 #include "timekeeping_internal.h"
 
 static unsigned int sleep_time_bin[32] = {0};
+#define MAX_SLEEP_TIME 0x7fffffff
 
 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +70,7 @@ late_initcall(tk_debug_sleep_time_init);
 
 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-	sleep_time_bin[fls(t->tv_sec)]++;
+	if ((t->tv_sec >= 0) && (t->tv_sec <= MAX_SLEEP_TIME))
+		sleep_time_bin[fls(t->tv_sec)]++;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-19  4:51 [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array Chen Yu
@ 2016-07-19  8:36 ` Thomas Gleixner
  2016-07-19  9:07   ` Chen Yu
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2016-07-19  8:36 UTC (permalink / raw)
  To: Chen Yu
  Cc: John Stultz, Rafael J. Wysock, Linux PM list, Linux Kernel list,
	Stable # 3 . 17+

On Tue, 19 Jul 2016, Chen Yu wrote:

> It is reported the hibernation fails at 2nd attempt, which
> hangs at hibernate() -> syscore_resume() -> i8237A_resume()
> -> claim_dma_lock(), because the lock has already been taken.
> However there is actually no other process would like to grab
> this lock on that problematic platform.
> 
> Further investigation shows that, the problem is caused by setting
> /sys/power/pm_trace to 1 before the 1st hibernation, since once
> pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,

So why is the RTC value useless if pm_trace is enabled? I really have a hard
time to understand why pm_trace would affect the sleep time readout from RTC.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-19  8:36 ` Thomas Gleixner
@ 2016-07-19  9:07   ` Chen Yu
  2016-07-19 10:40     ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Yu @ 2016-07-19  9:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Rafael J. Wysock, Linux PM list, Linux Kernel list,
	Stable # 3 . 17+

Hi Thomas,

On 2016年07月19日 16:36, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Chen Yu wrote:
>
>> It is reported the hibernation fails at 2nd attempt, which
>> hangs at hibernate() -> syscore_resume() -> i8237A_resume()
>> -> claim_dma_lock(), because the lock has already been taken.
>> However there is actually no other process would like to grab
>> this lock on that problematic platform.
>>
>> Further investigation shows that, the problem is caused by setting
>> /sys/power/pm_trace to 1 before the 1st hibernation, since once
>> pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
> So why is the RTC value useless if pm_trace is enabled? I really have a hard
> time to understand why pm_trace would affect the sleep time readout from RTC.
>
> Thanks,
>
> 	tglx
After pm_trace is enabled, during system suspend/hibernate, the hash name of
each devices will be written to rtc, so  the rtc value depends on what 
we write in last suspend
round, thus pm_trace can be  used for diagnose which device failed to 
suspend(eg, the suspending
on this device hang the system, we reboot the system , and check rtc 
hash value).

  In our case, after first hibernate/resume round,  we found our current 
system time
is  at  2117, so syscore_resume -> timekeeping_resume :
__timekeeping_inject_sleeptime(tk, &ts_delta)
would inject a quite large delta :  2117 - 2017 year, thus the 
sleep_time_bin is overflow.

thanks,
Yu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-19  9:07   ` Chen Yu
@ 2016-07-19 10:40     ` Thomas Gleixner
  2016-07-20 11:06       ` Chen Yu
  2016-07-29  9:50       ` Chen Yu
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2016-07-19 10:40 UTC (permalink / raw)
  To: Chen Yu
  Cc: John Stultz, Rafael J. Wysock, Linux PM list, Linux Kernel list,
	Stable # 3 . 17+

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1513 bytes --]

On Tue, 19 Jul 2016, Chen Yu wrote:
> On 2016年07月19日 16:36, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > Further investigation shows that, the problem is caused by setting
> > > /sys/power/pm_trace to 1 before the 1st hibernation, since once
> > > pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
> >
> > So why is the RTC value useless if pm_trace is enabled? I really have a hard
> > time to understand why pm_trace would affect the sleep time readout from
> > RTC.
>
> After pm_trace is enabled, during system suspend/hibernate, the hash name of
> each devices will be written to rtc, so the rtc value depends on what we
> write in last suspend round, thus pm_trace can be used for diagnose which
> device failed to suspend(eg, the suspending on this device hang the system,
> we reboot the system , and check rtc hash value).
> 
> In our case, after first hibernate/resume round, we found our current system
> time is at 2117, so syscore_resume -> timekeeping_resume :
> __timekeeping_inject_sleeptime(tk, &ts_delta) would inject a quite large
> delta : 2117 - 2017 year, thus the sleep_time_bin is overflow.

While the range check is certainly correct and a good thing to have it's wrong
in the first place to call __timekeeping_inject_sleeptime() in case that
pm_trace is enabled simply because that "hash" time value will also wreckage
timekeeping. Your patch is just curing the symptom in the debug code but not
fixing the root cause.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-19 10:40     ` Thomas Gleixner
@ 2016-07-20 11:06       ` Chen Yu
  2016-07-20 12:59         ` Rafael J. Wysocki
  2016-07-29  9:50       ` Chen Yu
  1 sibling, 1 reply; 9+ messages in thread
From: Chen Yu @ 2016-07-20 11:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J . Wysocki, John Stultz, Linux PM, Linux Kernel Mailing List

Hi Thomas,
On Tue, Jul 19, 2016 at 12:40:14PM +0200, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Chen Yu wrote:
> > On 2016年07月19日 16:36, Thomas Gleixner wrote:
> > > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > > Further investigation shows that, the problem is caused by setting
> > > > /sys/power/pm_trace to 1 before the 1st hibernation, since once
> > > > pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
> > >
> > > So why is the RTC value useless if pm_trace is enabled? I really have a hard
> > > time to understand why pm_trace would affect the sleep time readout from
> > > RTC.
> >
> > After pm_trace is enabled, during system suspend/hibernate, the hash name of
> > each devices will be written to rtc, so the rtc value depends on what we
> > write in last suspend round, thus pm_trace can be used for diagnose which
> > device failed to suspend(eg, the suspending on this device hang the system,
> > we reboot the system , and check rtc hash value).
> > 
> > In our case, after first hibernate/resume round, we found our current system
> > time is at 2117, so syscore_resume -> timekeeping_resume :
> > __timekeeping_inject_sleeptime(tk, &ts_delta) would inject a quite large
> > delta : 2117 - 2017 year, thus the sleep_time_bin is overflow.
> 
> While the range check is certainly correct and a good thing to have it's wrong
> in the first place to call __timekeeping_inject_sleeptime() in case that
> pm_trace is enabled simply because that "hash" time value will also wreckage
> timekeeping. Your patch is just curing the symptom in the debug code but not
> fixing the root cause.
> 
OK. I've modified the patch.
In case I break any other stuff :p, could you help check
if this patch is in the right direction, thanks:

1.  There are two places would invoke __timekeeping_inject_sleeptime(),
    they are timekeeping_resume and rtc_resume, so we need to deal with
    them respctively.

2.  for rtc_resume, if the pm_trace has once been enabled,
    we bypass the injection of sleep time.

3. for timekeeping_resume,
     Currently we either use nonstop clock source, or use persistent
     clock to get the sleep time. As pm_trace breaks systems who use rtc
     as a persistent clock, x86 is affected. So we add a
     check for x86 that, if the pm_trace has been enabled, we can not
     trust the persistent clock delta read from rtc, thus bypass
     the injection of sleep time in this case.

4. Why we checked the history of pm_trace: once pm_trace
   has been enabled, the delta of rtc would not be reliable anymore.
   For example, if we only check current pm_trace, we might still get
   memory overwrite:

   4.1 echo 1 > /sys/power/pm_trace
   4.2 hibernate/resume (rtc is broken, do not add delta from rtc because pm_trace is 1)
   4.3 echo 0 > /sys/power/pm_trace
   4.4 hibernate/resume (rtc is still broken, but add delta from rtc because pm_trace is 0)

   so we have to check if pm_trace has once been enabled, if it is, we will not add
   any delta from tsc until system reboots.

Thanks,
Yu

Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -1448,6 +1448,11 @@ void __weak read_boot_clock64(struct tim
 	ts->tv_nsec = 0;
 }
 
+bool __weak persistent_clock_is_usable(void)
+{
+	return true;
+}
+
 /* Flag for if timekeeping_resume() has injected sleeptime */
 static bool sleeptime_injected;
 
@@ -1609,6 +1614,7 @@ void timekeeping_resume(void)
 	unsigned long flags;
 	struct timespec64 ts_new, ts_delta;
 	cycle_t cycle_now, cycle_delta;
+	bool persist_clock_usable = true;
 
 	sleeptime_injected = false;
 	read_persistent_clock64(&ts_new);
@@ -1660,9 +1666,11 @@ void timekeeping_resume(void)
 	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
 		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
 		sleeptime_injected = true;
+		if (!persistent_clock_is_usable())
+			persist_clock_usable = false;
 	}
 
-	if (sleeptime_injected)
+	if (sleeptime_injected && persist_clock_usable)
 		__timekeeping_inject_sleeptime(tk, &ts_delta);
 
 	/* Re-base the last cycle value */
Index: linux/arch/x86/kernel/rtc.c
===================================================================
--- linux.orig/arch/x86/kernel/rtc.c
+++ linux/arch/x86/kernel/rtc.c
@@ -8,6 +8,7 @@
 #include <linux/export.h>
 #include <linux/pnp.h>
 #include <linux/of.h>
+#include <linux/pm-trace.h>
 
 #include <asm/vsyscall.h>
 #include <asm/x86_init.h>
@@ -147,6 +148,10 @@ void read_persistent_clock(struct timesp
 	x86_platform.get_wallclock(ts);
 }
 
+bool persistent_clock_is_usable(void)
+{
+	return !pm_trace_once_enabled();
+}
 
 static struct resource rtc_resources[] = {
 	[0] = {
Index: linux/drivers/rtc/class.c
===================================================================
--- linux.orig/drivers/rtc/class.c
+++ linux/drivers/rtc/class.c
@@ -20,6 +20,7 @@
 #include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
+#include <linux/pm-trace.h>
 
 #include "rtc-core.h"
 
@@ -138,7 +139,7 @@ static int rtc_resume(struct device *dev
 	sleep_time = timespec64_sub(sleep_time,
 			timespec64_sub(new_system, old_system));
 
-	if (sleep_time.tv_sec >= 0)
+	if ((sleep_time.tv_sec >= 0) && (!pm_trace_once_enabled()) )
 		timekeeping_inject_sleeptime64(&sleep_time);
 	rtc_hctosys_ret = 0;
 	return 0;
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -532,6 +532,7 @@ power_attr(wake_unlock);
 
 #ifdef CONFIG_PM_TRACE
 int pm_trace_enabled;
+bool pm_trace_been_enabled;
 
 static ssize_t pm_trace_show(struct kobject *kobj, struct kobj_attribute *attr,
 			     char *buf)
@@ -548,6 +549,7 @@ pm_trace_store(struct kobject *kobj, str
 	if (sscanf(buf, "%d", &val) == 1) {
 		pm_trace_enabled = !!val;
 		if (pm_trace_enabled) {
+			pm_trace_been_enabled = true;
 			pr_warn("PM: Enabling pm_trace changes system date and time during resume.\n"
 				"PM: Correct system time has to be restored manually after resume.\n");
 		}
Index: linux/include/linux/pm-trace.h
===================================================================
--- linux.orig/include/linux/pm-trace.h
+++ linux/include/linux/pm-trace.h
@@ -6,12 +6,18 @@
 #include <linux/types.h>
 
 extern int pm_trace_enabled;
+extern bool pm_trace_been_enabled;
 
 static inline int pm_trace_is_enabled(void)
 {
        return pm_trace_enabled;
 }
 
+static inline bool pm_trace_once_enabled(void)
+{
+       return pm_trace_been_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_pm_trace(const void *tracedata, unsigned int user);
@@ -25,6 +31,7 @@ extern int show_trace_dev_match(char *bu
 #else
 
 static inline int pm_trace_is_enabled(void) { return 0; }
+static inline bool pm_trace_once_enabled(void) { return false; }
 
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-20 11:06       ` Chen Yu
@ 2016-07-20 12:59         ` Rafael J. Wysocki
  2016-07-20 16:59           ` Chen, Yu C
  0 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2016-07-20 12:59 UTC (permalink / raw)
  To: Chen Yu; +Cc: Thomas Gleixner, John Stultz, Linux PM, Linux Kernel Mailing List

On Wednesday, July 20, 2016 07:06:58 PM Chen Yu wrote:
> Hi Thomas,
> On Tue, Jul 19, 2016 at 12:40:14PM +0200, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > On 2016年07月19日 16:36, Thomas Gleixner wrote:
> > > > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > > > Further investigation shows that, the problem is caused by setting
> > > > > /sys/power/pm_trace to 1 before the 1st hibernation, since once
> > > > > pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
> > > >
> > > > So why is the RTC value useless if pm_trace is enabled? I really have a hard
> > > > time to understand why pm_trace would affect the sleep time readout from
> > > > RTC.
> > >
> > > After pm_trace is enabled, during system suspend/hibernate, the hash name of
> > > each devices will be written to rtc, so the rtc value depends on what we
> > > write in last suspend round, thus pm_trace can be used for diagnose which
> > > device failed to suspend(eg, the suspending on this device hang the system,
> > > we reboot the system , and check rtc hash value).
> > > 
> > > In our case, after first hibernate/resume round, we found our current system
> > > time is at 2117, so syscore_resume -> timekeeping_resume :
> > > __timekeeping_inject_sleeptime(tk, &ts_delta) would inject a quite large
> > > delta : 2117 - 2017 year, thus the sleep_time_bin is overflow.
> > 
> > While the range check is certainly correct and a good thing to have it's wrong
> > in the first place to call __timekeeping_inject_sleeptime() in case that
> > pm_trace is enabled simply because that "hash" time value will also wreckage
> > timekeeping. Your patch is just curing the symptom in the debug code but not
> > fixing the root cause.
> > 
> OK. I've modified the patch.
> In case I break any other stuff :p, could you help check
> if this patch is in the right direction, thanks:
> 
> 1.  There are two places would invoke __timekeeping_inject_sleeptime(),
>     they are timekeeping_resume and rtc_resume, so we need to deal with
>     them respctively.
> 
> 2.  for rtc_resume, if the pm_trace has once been enabled,
>     we bypass the injection of sleep time.
> 
> 3. for timekeeping_resume,
>      Currently we either use nonstop clock source, or use persistent
>      clock to get the sleep time. As pm_trace breaks systems who use rtc
>      as a persistent clock, x86 is affected. So we add a
>      check for x86 that, if the pm_trace has been enabled, we can not
>      trust the persistent clock delta read from rtc, thus bypass
>      the injection of sleep time in this case.
> 
> 4. Why we checked the history of pm_trace: once pm_trace
>    has been enabled, the delta of rtc would not be reliable anymore.
>    For example, if we only check current pm_trace, we might still get
>    memory overwrite:
> 
>    4.1 echo 1 > /sys/power/pm_trace
>    4.2 hibernate/resume (rtc is broken, do not add delta from rtc because pm_trace is 1)
>    4.3 echo 0 > /sys/power/pm_trace
>    4.4 hibernate/resume (rtc is still broken, but add delta from rtc because pm_trace is 0)

The initial state of the RTC is invalid, but will the delta be still invalid?

And what if the admin fixes up the RTC before hibernating?  You will still discard
the RTC delta until the next reboot, right?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-20 12:59         ` Rafael J. Wysocki
@ 2016-07-20 16:59           ` Chen, Yu C
  0 siblings, 0 replies; 9+ messages in thread
From: Chen, Yu C @ 2016-07-20 16:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, John Stultz, Linux PM, Linux Kernel Mailing List

Hi,
> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Sent: Wednesday, July 20, 2016 9:00 PM
> To: Chen, Yu C
> Cc: Thomas Gleixner; John Stultz; Linux PM; Linux Kernel Mailing List
> Subject: Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin
> array
> 
> On Wednesday, July 20, 2016 07:06:58 PM Chen Yu wrote:
> > Hi Thomas,
> > On Tue, Jul 19, 2016 at 12:40:14PM +0200, Thomas Gleixner wrote:
> > > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > > On 2016年07月19日 16:36, Thomas Gleixner wrote:
> > > > > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > > > > Further investigation shows that, the problem is caused by
> > > > > > setting /sys/power/pm_trace to 1 before the 1st hibernation,
> > > > > > since once pm_trace is enabled, the rtc becomes an
> > > > > > unmeaningful value after resumed,
> > > > >
> > > > > So why is the RTC value useless if pm_trace is enabled? I really
> > > > > have a hard time to understand why pm_trace would affect the
> > > > > sleep time readout from RTC.
> > > >
> > > > After pm_trace is enabled, during system suspend/hibernate, the
> > > > hash name of each devices will be written to rtc, so the rtc value
> > > > depends on what we write in last suspend round, thus pm_trace can
> > > > be used for diagnose which device failed to suspend(eg, the
> > > > suspending on this device hang the system, we reboot the system , and
> check rtc hash value).
> > > >
> > > > In our case, after first hibernate/resume round, we found our
> > > > current system time is at 2117, so syscore_resume ->
> timekeeping_resume :
> > > > __timekeeping_inject_sleeptime(tk, &ts_delta) would inject a quite
> > > > large delta : 2117 - 2017 year, thus the sleep_time_bin is overflow.
> > >
> > > While the range check is certainly correct and a good thing to have
> > > it's wrong in the first place to call
> > > __timekeeping_inject_sleeptime() in case that pm_trace is enabled
> > > simply because that "hash" time value will also wreckage
> > > timekeeping. Your patch is just curing the symptom in the debug code but
> not fixing the root cause.
> > >
> > OK. I've modified the patch.
> > In case I break any other stuff :p, could you help check if this patch
> > is in the right direction, thanks:
> >
> > 1.  There are two places would invoke __timekeeping_inject_sleeptime(),
> >     they are timekeeping_resume and rtc_resume, so we need to deal with
> >     them respctively.
> >
> > 2.  for rtc_resume, if the pm_trace has once been enabled,
> >     we bypass the injection of sleep time.
> >
> > 3. for timekeeping_resume,
> >      Currently we either use nonstop clock source, or use persistent
> >      clock to get the sleep time. As pm_trace breaks systems who use rtc
> >      as a persistent clock, x86 is affected. So we add a
> >      check for x86 that, if the pm_trace has been enabled, we can not
> >      trust the persistent clock delta read from rtc, thus bypass
> >      the injection of sleep time in this case.
> >
> > 4. Why we checked the history of pm_trace: once pm_trace
> >    has been enabled, the delta of rtc would not be reliable anymore.
> >    For example, if we only check current pm_trace, we might still get
> >    memory overwrite:
> >
> >    4.1 echo 1 > /sys/power/pm_trace
> >    4.2 hibernate/resume (rtc is broken, do not add delta from rtc because
> pm_trace is 1)
> >    4.3 echo 0 > /sys/power/pm_trace
> >    4.4 hibernate/resume (rtc is still broken, but add delta from rtc
> > because pm_trace is 0)
> 
> The initial state of the RTC is invalid, but will the delta be still invalid?
>
According to feedback from the bug reporter, 
with previous patch applied which uses pm_trace_is_enabled() directly in
 resume, then after several rounds,  once the pm_trace is disabled, 
the following hibernate/resume  would bring  a overflow sleep delta.
 It looks like the delta also considers the historical(previous) delta, 
so I added the history pm_once_trace . I'll confirm and give feedback later.
 
> And what if the admin fixes up the RTC before hibernating?  You will still
> discard the RTC delta until the next reboot, right?
Yes, it will be discarded, I agree we should not bypass the delta if someone has
fixed the rtc, I'll dig into the code.
 
Thanks,
Yu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-19 10:40     ` Thomas Gleixner
  2016-07-20 11:06       ` Chen Yu
@ 2016-07-29  9:50       ` Chen Yu
  2016-07-29 12:59         ` Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Chen Yu @ 2016-07-29  9:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, John Stultz, Xunlei Pang, Zhang Rui,
	Linux PM list, Linux Kernel Mail List

Hi Thomas,

On Tue, Jul 19, 2016 at 12:40:14PM +0200, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Chen Yu wrote:
> > On 2016年07月19日 16:36, Thomas Gleixner wrote:
> > > On Tue, 19 Jul 2016, Chen Yu wrote:
> > > > Further investigation shows that, the problem is caused by setting
> > > > /sys/power/pm_trace to 1 before the 1st hibernation, since once
> > > > pm_trace is enabled, the rtc becomes an unmeaningful value after resumed,
> > >
> > > So why is the RTC value useless if pm_trace is enabled? I really have a hard
> > > time to understand why pm_trace would affect the sleep time readout from
> > > RTC.
> >
> > After pm_trace is enabled, during system suspend/hibernate, the hash name of
> > each devices will be written to rtc, so the rtc value depends on what we
> > write in last suspend round, thus pm_trace can be used for diagnose which
> > device failed to suspend(eg, the suspending on this device hang the system,
> > we reboot the system , and check rtc hash value).
> > 
> > In our case, after first hibernate/resume round, we found our current system
> > time is at 2117, so syscore_resume -> timekeeping_resume :
> > __timekeeping_inject_sleeptime(tk, &ts_delta) would inject a quite large
> > delta : 2117 - 2017 year, thus the sleep_time_bin is overflow.
> 
> While the range check is certainly correct and a good thing to have it's wrong
> in the first place to call __timekeeping_inject_sleeptime() in case that
> pm_trace is enabled simply because that "hash" time value will also wreckage
> timekeeping. Your patch is just curing the symptom in the debug code but not
> fixing the root cause.
> 
I've done more testings on that problematical machine, and the result shows that,
the real offender is that, some BIOSes would like to adjust the 'INVALID' rtc value
to the motherboard released date during POST stage. That is to say, once the rtc
is set to an invalid value, the subsequent hibernate/resume would get a meaningless
tsc delta due to the aggressive BIOSes.

Here's one of scenarios on how the problem is triggered:

1. pm_trace = 1, suspend to disk sets the rtc to year 1912.
2. system reboots, BIOS adjusts the rtc to 2013, which is the release date of that motherboard.
3. resume from disk, and the sleep time(i.e, delta of tsc) is (2013 - 1912), which caused overflow
   in timekeepting_debug.

We can avoid this problem by ignoring the idle injection if pm_trace is enabled,
but we might still miss other cases which might also break the rtc, e.g, buggy
clocksoure/rtc driver, or even user space tool such as hwclock, so actually we
can not distinguish whether the delta of tsc is reasonable(long time sleep or not?), IMO.

I've prepared two version of solutions:

1st is to bypass the sleep time if pm_trace is involved(a little complicated
    because it needs to deal with historical pm_trace), or

2nd version is to introduce a sysfs interface to allow the
    users to specify the threshold of sleep time, any delta
    of tsc bigger than the threshold would be regarded as invalid.

Both two solutions contain the fix for the overflow in timekeeping_debug.
May I know which one do you prefer, or any suggestions would be appreciated.

Thanks,
Yu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array
  2016-07-29  9:50       ` Chen Yu
@ 2016-07-29 12:59         ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2016-07-29 12:59 UTC (permalink / raw)
  To: Chen Yu
  Cc: Rafael J. Wysocki, John Stultz, Xunlei Pang, Zhang Rui,
	Linux PM list, Linux Kernel Mail List

On Fri, 29 Jul 2016, Chen Yu wrote:
> On Tue, Jul 19, 2016 at 12:40:14PM +0200, Thomas Gleixner wrote:
> 1st is to bypass the sleep time if pm_trace is involved(a little complicated
>     because it needs to deal with historical pm_trace), or
> 
> 2nd version is to introduce a sysfs interface to allow the
>     users to specify the threshold of sleep time, any delta
>     of tsc bigger than the threshold would be regarded as invalid.

That one is really overkill. We provide a proper sanity measure for this and
be done with it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-29 13:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19  4:51 [PATCH][v2] timekeeping: Fix memory overwrite of sleep_time_bin array Chen Yu
2016-07-19  8:36 ` Thomas Gleixner
2016-07-19  9:07   ` Chen Yu
2016-07-19 10:40     ` Thomas Gleixner
2016-07-20 11:06       ` Chen Yu
2016-07-20 12:59         ` Rafael J. Wysocki
2016-07-20 16:59           ` Chen, Yu C
2016-07-29  9:50       ` Chen Yu
2016-07-29 12:59         ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.