* [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
@ 2012-07-15 20:40 Rafael J. Wysocki
  2012-07-16  9:47 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2012-07-15 20:40 UTC
  To: Linus Torvalds
  Cc: Linux PM list, LKML, John Stultz, Ingo Molnar, Peter Zijlstra,
	Prarit Bhargava, stable, Thomas Gleixner, Andreas Schwab

Hi Linus,

Please revert:

commit 5baefd6d84163443215f4a99f6a20f054ef11236
Author: John Stultz <johnstul@us.ibm.com>
Date:   Tue Jul 10 18:43:25 2012 -0400

    hrtimer: Update hrtimer base offsets each hrtimer_interrupt

This breaks resume on the iBook G4 and Toshiba Portege R500 (at least), by
adding an excessive delay to it (the Toshiba box sometimes hangs hard during
resume from system suspend).  According to Andreas
(https://lkml.org/lkml/2012/7/15/66):

"Apparently during or before noirq resume the system is hanging by the same
amount of time as the system was sleeping."

which seems to agree with my observations.

Given that the two known-affected boxes are so different, it is quite probable
that the total number of affected systems is actually quite high.

Thanks!


To everyone involved: the fact that this change, which was likely to introduce
regressions from the look of it alone, has been pushed to Linus (and to -stable
at the same time!) so late in the cycle, is seriously disappointing.

Thanks,
Rafael




* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-15 20:40 [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit Rafael J. Wysocki
@ 2012-07-16  9:47 ` Thomas Gleixner
  2012-07-16 11:16   ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2012-07-16  9:47 UTC
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> To everyone involved: the fact that this change, which was likely to introduce
> regressions from the look of it alone, has been pushed to Linus (and to -stable
> at the same time!) so late in the cycle, is seriously disappointing.

Well, we spent a massive amount of time in testing, reviewing and
discussion and it definitely did not break suspend/resume here.

This was not pushed without a lot of thought and in fact what you are
seeing is another long-standing bug in the timekeeping resume code,
which was just papered over by the incorrect handling of the
clock-was-set cases in the other parts of the system.

Does the following patch fix the problem for you ?

@John: Should that clear ntp as well or is it enough to set ntp_error
       to 0 ?

/me really goes on vacation now.

Thanks,

	tglx

---------
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 269b1fe..3447cfa 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
 	touch_softlockup_watchdog();



* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-16 11:16   ` Rafael J. Wysocki
@ 2012-07-16 11:15     ` Thomas Gleixner
  2012-07-16 11:26       ` Thomas Gleixner
  2012-07-16 12:48     ` Andreas Schwab
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2012-07-16 11:15 UTC
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:

> On Monday, July 16, 2012, Thomas Gleixner wrote:
> > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > To everyone involved: the fact that this change, which was likely to introduce
> > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > at the same time!) so late in the cycle, is seriously disappointing.
> > 
> > Well, we spent a massive amount of time in testing, reviewing and
> > discussion and it definitely did not break suspend/resume here.
> 
> I'm not saying that you didn't consider it thoroughly, but unfortunately you
> did overlook this particular issue, didn't you?
> 
> > This was not pushed without a lot of thought and in fact what you are
> > seeing is another long-standing bug in the timekeeping resume code,
> > which was just papered over by the incorrect handling of the
> > clock-was-set cases in the other parts of the system.
> > 
> > Does the following patch fix the problem for you ?
> 
> Yes, it does, thanks!
> 
> > @John: Should that clear ntp as well or is it enough to set ntp_error
> >        to 0 ?
> > 
> > /me really goes on vacation now.
> 
> So who's going to take care of the patch? :-)

I'm still packing gear. So I'll push it into timers/urgent.



* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-16  9:47 ` Thomas Gleixner
@ 2012-07-16 11:16   ` Rafael J. Wysocki
  2012-07-16 11:15     ` Thomas Gleixner
  2012-07-16 12:48     ` Andreas Schwab
  0 siblings, 2 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2012-07-16 11:16 UTC
  To: Thomas Gleixner
  Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Monday, July 16, 2012, Thomas Gleixner wrote:
> On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > To everyone involved: the fact that this change, which was likely to introduce
> > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > at the same time!) so late in the cycle, is seriously disappointing.
> 
> Well, we spent a massive amount of time in testing, reviewing and
> discussion and it definitely did not break suspend/resume here.

I'm not saying that you didn't consider it thoroughly, but unfortunately you
did overlook this particular issue, didn't you?

> This was not pushed without a lot of thought and in fact what you are
> seeing is another long-standing bug in the timekeeping resume code,
> which was just papered over by the incorrect handling of the
> clock-was-set cases in the other parts of the system.
> 
> Does the following patch fix the problem for you ?

Yes, it does, thanks!

> @John: Should that clear ntp as well or is it enough to set ntp_error
>        to 0 ?
> 
> /me really goes on vacation now.

So who's going to take care of the patch? :-)

Rafael


> ---------
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 269b1fe..3447cfa 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -717,6 +717,7 @@ static void timekeeping_resume(void)
>  	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
>  	timekeeper.ntp_error = 0;
>  	timekeeping_suspended = 0;
> +	timekeeping_update(false);
>  	write_sequnlock_irqrestore(&timekeeper.lock, flags);
>  
>  	touch_softlockup_watchdog();
> 
> 
> 



* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-16 11:15     ` Thomas Gleixner
@ 2012-07-16 11:26       ` Thomas Gleixner
  2012-07-16 15:47         ` John Stultz
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2012-07-16 11:26 UTC
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Linux PM list, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, stable, Andreas Schwab

On Mon, 16 Jul 2012, Thomas Gleixner wrote:

> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
> 
> > On Monday, July 16, 2012, Thomas Gleixner wrote:
> > > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > > To everyone involved: the fact that this change, which was likely to introduce
> > > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > > at the same time!) so late in the cycle, is seriously disappointing.
> > > 
> > > Well, we spent a massive amount of time in testing, reviewing and
> > > discussion and it definitely did not break suspend/resume here.
> > 
> > I'm not saying that you didn't consider it thoroughly, but unfortunately you
> > did overlook this particular issue, didn't you?
> > 
> > > This was not pushed without a lot of thought and in fact what you are
> > > seeing is another long-standing bug in the timekeeping resume code,
> > > which was just papered over by the incorrect handling of the
> > > clock-was-set cases in the other parts of the system.
> > > 
> > > Does the following patch fix the problem for you ?
> > 
> > Yes, it does, thanks!
> > 
> > > @John: Should that clear ntp as well or is it enough to set ntp_error
> > >        to 0 ?
> > > 
> > > /me really goes on vacation now.
> > 
> > So who's going to take care of the patch? :-)
> 
> I'm still packing gear. So I'll push it into timers/urgent.

Actually that's a bad idea. John wants to double check vs. the
ntp_clear question. So John can send it to Linus directly.

@John: Should it be: timekeeping_update(true)

Now I'm gone for real.

Thanks,

	tglx
-----
Subject: timekeeping: Add missing update call in timekeeping_resume()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 16 Jul 2012 11:47:31 +0200 (CEST)

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for
quite some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: tip/kernel/time/timekeeping.c
===================================================================
--- tip.orig/kernel/time/timekeeping.c
+++ tip/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
 	touch_softlockup_watchdog();
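
The failure mode described in the changelog above can be illustrated with a
toy userspace model. This is only a sketch, not kernel code: the struct and
the toy_* functions are hypothetical names invented for illustration. It
assumes a core time value plus a derived copy that readers consume, and shows
that when the resume path skips the update step, the readers' view lags by
exactly the time slept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct toy_timekeeper {
	uint64_t now_ns;        /* authoritative time, written by the core code */
	uint64_t published_ns;  /* derived copy that readers (e.g. a timer path) use */
};

/* Stand-in for the update step: republish derived state from core state. */
static void toy_update(struct toy_timekeeper *tk)
{
	tk->published_ns = tk->now_ns;
}

/* Resume path: account for the time slept, optionally republishing. */
static void toy_resume(struct toy_timekeeper *tk, uint64_t slept_ns,
		       bool call_update)
{
	tk->now_ns += slept_ns;
	if (call_update)
		toy_update(tk);
}

/* How far the readers' view lags behind the authoritative time. */
static uint64_t toy_staleness(const struct toy_timekeeper *tk)
{
	assert(tk->now_ns >= tk->published_ns);
	return tk->now_ns - tk->published_ns;
}
```

In this model, calling toy_resume() with call_update set to false leaves
toy_staleness() equal to the sleep duration, while passing true brings it back
to zero. That mirrors the one-line fix in the diff only by analogy.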



* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-16 11:16   ` Rafael J. Wysocki
  2012-07-16 11:15     ` Thomas Gleixner
@ 2012-07-16 12:48     ` Andreas Schwab
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2012-07-16 12:48 UTC
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Linus Torvalds, Linux PM list, LKML,
	John Stultz, Ingo Molnar, Peter Zijlstra, Prarit Bhargava,
	stable

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Monday, July 16, 2012, Thomas Gleixner wrote:
>> Does the following patch fix the problem for you ?
>
> Yes, it does, thanks!

Works for me as well.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit
  2012-07-16 11:26       ` Thomas Gleixner
@ 2012-07-16 15:47         ` John Stultz
  0 siblings, 0 replies; 7+ messages in thread
From: John Stultz @ 2012-07-16 15:47 UTC
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Linus Torvalds, Linux PM list, LKML,
	Ingo Molnar, Peter Zijlstra, Prarit Bhargava, stable,
	Andreas Schwab

On 07/16/2012 04:26 AM, Thomas Gleixner wrote:
> On Mon, 16 Jul 2012, Thomas Gleixner wrote:
>
>> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
>>
>>> On Monday, July 16, 2012, Thomas Gleixner wrote:
>>>> On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
>>>>> To everyone involved: the fact that this change, which was likely to introduce
>>>>> regressions from the look of it alone, has been pushed to Linus (and to -stable
>>>>> at the same time!) so late in the cycle, is seriously disappointing.
>>>> Well, we spent a massive amount of time in testing, reviewing and
>>>> discussion and it definitely did not break suspend/resume here.
>>> I'm not saying that you didn't consider it thoroughly, but unfortunately you
>>> did overlook this particular issue, didn't you?
>>>
>>>> This was not pushed without a lot of thought and in fact what you are
>>>> seeing is another long-standing bug in the timekeeping resume code,
>>>> which was just papered over by the incorrect handling of the
>>>> clock-was-set cases in the other parts of the system.
>>>>
>>>> Does the following patch fix the problem for you ?
>>> Yes, it does, thanks!
>>>
>>>> @John: Should that clear ntp as well or is it enough to set ntp_error
>>>>         to 0 ?
>>>>
>>>> /me really goes on vacation now.
>>> So who's going to take care of the patch? :-)
>> I'm still packing gear. So I'll push it into timers/urgent.
> Actually that's a bad idea. John wants to double check vs. the
> ntp_clear question. So John can send it to Linus directly.
>
> @John: Should it be: timekeeping_update(true)
I think it's better to leave it as false, so we don't reset the NTP state
machine completely after suspend.

When we come back from suspend our error is usually off by the 
persistent_clock/rtc granularity, so it might make sense, but I'd want a 
lot more testing of using ntp over suspend before changing the existing 
behavior of not doing it.

> Now I'm gone for real.
Ok. Thanks for spinning this up so quickly. I'll go ahead and send it on 
to Linus.

thanks
-john
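
The true/false question discussed above can be sketched with a small userspace
model. This is an illustration under stated assumptions, not the kernel
implementation: struct toy_ntp_state, its fields, and both toy_* functions are
hypothetical. The sketch assumes the boolean only decides whether accumulated
NTP state is discarded, while the republishing of time data happens either
way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; the real NTP state machine is far richer. */
struct toy_ntp_state {
	int64_t ntp_error;       /* accumulated error term */
	int64_t pending_offset;  /* correction still being applied */
	bool synced;             /* whether the clock counts as synchronized */
};

/* Toy update step: clear_ntp decides whether NTP state survives. */
static void toy_timekeeping_update(struct toy_ntp_state *s, bool clear_ntp)
{
	assert(s);
	if (clear_ntp) {
		s->ntp_error = 0;
		s->pending_offset = 0;
		s->synced = false;
	}
	/* Republishing derived time data would happen here in both cases. */
}

/* Toy resume path: zero the error term as the patch context does, then update. */
static void toy_resume_ntp(struct toy_ntp_state *s, bool clear_ntp)
{
	s->ntp_error = 0;
	toy_timekeeping_update(s, clear_ntp);
}
```

With clear_ntp set to false, the model keeps pending_offset and synced across
resume and only ntp_error is zeroed, which corresponds to the behavior John
prefers to keep. Passing true discards everything.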

