linux-kernel.vger.kernel.org archive mirror
* Suspend to RAM regression tracked down
@ 2006-07-02 10:47 Jean-Marc Valin
  2006-07-02 18:06 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-02 10:47 UTC (permalink / raw)
  To: Linux Kernel

Hi,

A while ago, I reported a suspend to RAM regression (fail to resume). I
have since then tracked down the regression to the changes between
2.6.12-rc5-git5 and 2.6.12-rc5-git6. On my laptop, I have only been able
to reproduce the problem with the ondemand cpufreq governor, but I've
heard of another user with the same (Dell D600) laptop having problems
with the userspace governor as well. All the details are in
http://bugzilla.kernel.org/show_bug.cgi?id=6166 but it seems like it's
being ignored. It's currently assigned to the ACPI category, but maybe
it belongs to cpufreq? Can anyone help here?

	Jean-Marc

P.S. Please CC me since I'm not subscribed to the list.


* Re: Suspend to RAM regression tracked down
  2006-07-02 10:47 Suspend to RAM regression tracked down Jean-Marc Valin
@ 2006-07-02 18:06 ` Jeremy Fitzhardinge
  2006-07-02 22:52   ` Jean-Marc Valin
  2006-07-07 11:25   ` Jean-Marc Valin
  0 siblings, 2 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2006-07-02 18:06 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Linux Kernel, cpufreq

Jean-Marc Valin wrote:
> A while ago, I reported a suspend to RAM regression (fail to resume). I
> have since then tracked down the regression to the changes between
> 2.6.12-rc5-git5 and 2.6.12-rc5-git6. On my laptop, I have only been able
> to reproduce the problem with the ondemand cpufreq governor, but I've
> heard of another user with the same (Dell D600) laptop having problems
> with the userspace governor as well. All the details are in
> http://bugzilla.kernel.org/show_bug.cgi?id=6166 but it seems like it's
> being ignored. It's currently assigned to the ACPI category, but maybe
> it belongs to cpufreq? Can anyone help here?
>   

There was a race in ondemand and conservative which made them lock up on 
resume (possibly only on SMP systems though).  There's a patch for that 
in current -mm, but I suspect there's another problem (still haven't had 
any time to track it down).

The workaround is to switch to one of the performance/powersave/user 
governors just before suspend, and restore the governor on resume.

    J
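
The workaround described above can be sketched as a small shell wrapper.
This is only a sketch, assuming the usual sysfs layout
/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor; the function names
and the CPUFREQ_ROOT and STATE_DIR variables are hypothetical and do not
come from the thread.

```shell
#!/bin/sh
# Sketch only: park every CPU on a static governor before suspend and
# put the saved governor back on resume.
# CPUFREQ_ROOT and STATE_DIR are assumed names, overridable for testing.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu}"
STATE_DIR="${STATE_DIR:-/var/run/cpufreq-governor-save}"

save_and_switch() {
    # $1: governor to use across suspend, e.g. performance
    mkdir -p "$STATE_DIR" || return 1
    for f in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -f "$f" ] || continue
        # Directory name two levels up, e.g. cpu0
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        # Remember the current governor, then switch to the static one
        cat "$f" > "$STATE_DIR/$cpu"
        echo "$1" > "$f"
    done
}

restore_governor() {
    for f in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -f "$f" ] || continue
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        # Only restore CPUs for which a governor was saved earlier
        if [ -f "$STATE_DIR/$cpu" ]; then
            cat "$STATE_DIR/$cpu" > "$f"
        fi
    done
}
```

A suspend script run as root would call "save_and_switch performance"
before writing "mem" to /sys/power/state and "restore_governor" once the
machine is back up.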



* Re: Suspend to RAM regression tracked down
  2006-07-02 18:06 ` Jeremy Fitzhardinge
@ 2006-07-02 22:52   ` Jean-Marc Valin
  2006-07-03  6:02     ` Jeremy Fitzhardinge
  2006-07-03  6:08     ` Jeff Chua
  2006-07-07 11:25   ` Jean-Marc Valin
  1 sibling, 2 replies; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-02 22:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Linux Kernel, cpufreq

> There was a race in ondemand and conservative which made them lock up on 
> resume (possibly only on SMP systems though).  There's a patch for that 
> in current -mm, but I suspect there's another problem (still haven't had 
> any time to track it down).

Any link to the patch and the thread about the problem (if any)? Also,
was the race introduced in 2.6.12-rc5-git6? If not, it's a completely
different problem because my machine worked fine with 2.6.12-rc5-git5.

	Jean-Marc


* Re: Suspend to RAM regression tracked down
  2006-07-02 22:52   ` Jean-Marc Valin
@ 2006-07-03  6:02     ` Jeremy Fitzhardinge
  2006-07-03  6:31       ` Jean-Marc Valin
  2006-07-03  6:08     ` Jeff Chua
  1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2006-07-03  6:02 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: cpufreq, Linux Kernel

Jean-Marc Valin wrote:
> Any link to the patch and the thread about the problem (if any)? Also,
> was the race introduced in 2.6.12-rc5-git6? If not, it's a completely
> different problem because my machine worked fine with 2.6.12-rc5-git5.
>   

It's in the thread on the cpufreq list titled "ondemand vs suspend"; 
Venkatesh Pallipadi posted the patch.

    J


* Re: Suspend to RAM regression tracked down
  2006-07-02 22:52   ` Jean-Marc Valin
  2006-07-03  6:02     ` Jeremy Fitzhardinge
@ 2006-07-03  6:08     ` Jeff Chua
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Chua @ 2006-07-03  6:08 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

Can you try 2.6.17 with this patch from Greg? I had the same problem on
my IBM X60s as you mentioned: sometimes it can't resume after a few
suspend/resume cycles. But this patch seems to do the trick, even if I
sometimes don't load the USB modules.

Subject: USB: get USB suspend to work again

Yeah, it's a hack, but it is only temporary until Alan's patches
reworking this area make it in.  We really should not care what devices
below us are doing, especially when we do not really know what type of
devices they are.  This patch relies on the fact that the endpoint
devices do not have a driver assigned to them.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/usb/core/usb.c |    2 ++
 1 file changed, 2 insertions(+)

--- gregkh-2.6.orig/drivers/usb/core/usb.c
+++ gregkh-2.6/drivers/usb/core/usb.c
@@ -991,6 +991,8 @@ void usb_buffer_unmap_sg (struct usb_dev

 static int verify_suspended(struct device *dev, void *unused)
 {
+       if (dev->driver == NULL)
+               return 0;
        return (dev->power.power_state.event == PM_EVENT_ON) ? -EBUSY : 0;
 }



On 7/3/06, Jean-Marc Valin <Jean-Marc.Valin@usherbrooke.ca> wrote:
> > There was a race in ondemand and conservative which made them lock up on
> > resume (possibly only on SMP systems though).  There's a patch for that
> > in current -mm, but I suspect there's another problem (still haven't had
> > any time to track it down).
>
> Any link to the patch and the thread about the problem (if any)? Also,
> was the race introduced in 2.6.12-rc5-git6? If not, it's a completely
> different problem because my machine worked fine with 2.6.12-rc5-git5.


* Re: Suspend to RAM regression tracked down
  2006-07-03  6:02     ` Jeremy Fitzhardinge
@ 2006-07-03  6:31       ` Jean-Marc Valin
  0 siblings, 0 replies; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-03  6:31 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: cpufreq, Linux Kernel, venkatesh.pallipadi

On Sun, 2006-07-02 at 23:02 -0700, Jeremy Fitzhardinge wrote: 
> Jean-Marc Valin wrote:
> > Any link to the patch and the thread about the problem (if any)? Also,
> > was the race introduced in 2.6.12-rc5-git6? If not, it's a completely
> > different problem because my machine worked fine with 2.6.12-rc5-git5.
> >   
> 
> It's in the thread on the cpufreq list titled "ondemand vs suspend"; 
> Venkatesh Pallipadi posted the patch.

Just read the thread and it's not clear to me whether it's the same
problem. Venkatesh, does the thread describe the same problem as this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=6166
which appeared in 2.6.12-rc5-git6 or is it a separate problem?

	Jean-Marc


* Re: Suspend to RAM regression tracked down
  2006-07-02 18:06 ` Jeremy Fitzhardinge
  2006-07-02 22:52   ` Jean-Marc Valin
@ 2006-07-07 11:25   ` Jean-Marc Valin
  2006-07-07 16:21     ` Dave Jones
  1 sibling, 1 reply; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-07 11:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Linux Kernel, cpufreq

> There was a race in ondemand and conservative which made them lock up on 
> resume (possibly only on SMP systems though).  There's a patch for that 
> in current -mm, but I suspect there's another problem (still haven't had 
> any time to track it down).

OK, I tried the patch with 2.6.17 and it didn't work. My laptop failed
to resume on the first try, so it must be something else. Could someone
actually have a look at the changes in 2.6.12-rc5-git6 (which happen to
be cpufreq-related)? I spent months pinpointing the problem to that
version (it takes several days to reproduce). I'd appreciate it if
someone could at least have a look at what changed there and maybe fix
it.

Thanks,

	Jean-Marc



* Re: Suspend to RAM regression tracked down
  2006-07-07 11:25   ` Jean-Marc Valin
@ 2006-07-07 16:21     ` Dave Jones
  2006-07-07 22:48       ` Jean-Marc Valin
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jones @ 2006-07-07 16:21 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

On Fri, Jul 07, 2006 at 09:25:37PM +1000, Jean-Marc Valin wrote:
 > > There was a race in ondemand and conservative which made them lock up on 
 > > resume (possibly only on SMP systems though).  There's a patch for that 
 > > in current -mm, but I suspect there's another problem (still haven't had 
 > > any time to track it down).
 > 
 > OK, I tried the patch with 2.6.17 and it didn't work. My laptop failed
 > to resume on the first try, so it must be something else. Could someone
 > actually have a look at the changes in 2.6.12-rc5-git6 (which happen to
 > be cpufreq-related)? I spent months pinpointing the problem to that
 > version (it takes several days to reproduce). I'd appreciate it if
 > someone could at least have a look at what changed there and maybe fix
 > it.

Can you show /proc/cpuinfo for the affected system ?
If it's 15/3/4 or 15/4/1, that would explain why this kernel,
as this was when support for those models got introduced to
speedstep-centrino.

If it's not that, there is a pretty large delta in the ondemand
governor in this update, but I don't see anything blindingly
obvious from looking over it.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: Suspend to RAM regression tracked down
  2006-07-07 16:21     ` Dave Jones
@ 2006-07-07 22:48       ` Jean-Marc Valin
  2006-07-08  6:23         ` Dave Jones
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-07 22:48 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

On Friday, 7 July 2006 at 12:21 -0400, Dave Jones wrote:
> On Fri, Jul 07, 2006 at 09:25:37PM +1000, Jean-Marc Valin wrote:
>  > > There was a race in ondemand and conservative which made them lock up on 
>  > > resume (possibly only on SMP systems though).  There's a patch for that 
>  > > in current -mm, but I suspect there's another problem (still haven't had 
>  > > any time to track it down).
>  > 
>  > OK, I tried the patch with 2.6.17 and it didn't work. My laptop failed
>  > to resume on the first try, so it must be something else. Could someone
>  > actually have a look at the changes in 2.6.12-rc5-git6 (which happen to
>  > be cpufreq-related)? I spent months pinpointing the problem to that
>  > version (it takes several days to reproduce). I'd appreciate it if
>  > someone could at least have a look at what changed there and maybe fix
>  > it.
> 
> Can you show /proc/cpuinfo for the affected system ?
> If it's 15/3/4 or 15/4/1, that would explain why this kernel,
> as this was when support for those models got introduced to
> speedstep-centrino.

Not sure what the 15/... is, but here's the content:
% cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 9
model name      : Intel(R) Pentium(R) M processor 1600MHz
stepping        : 5
cpu MHz         : 598.132
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 tm pbe est tm2
bogomips        : 1197.24

BTW, speedstep worked fine on my laptop with 2.6.12-rc5-git5 and
earlier.
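
For reference, the 15/3/4 notation in the question presumably reads as
cpu family/model/stepping. A small sketch that pulls those three fields
out of /proc/cpuinfo for the first processor (the function name
cpu_signature is made up here, not an existing tool):

```shell
#!/bin/sh
# Sketch only: print "family/model/stepping" for the first CPU listed
# in a cpuinfo-style file. cpu_signature is a hypothetical helper name.
cpu_signature() {
    # $1: path to the file to read (defaults to /proc/cpuinfo)
    awk -F':[ \t]*' '
        /^cpu family/   { fam = $2 }
        # Match the bare "model" field, not "model name"
        /^model[ \t]*:/ { mod = $2 }
        # stepping comes last of the three; print and stop at the first CPU
        /^stepping/     { print fam "/" mod "/" $2; exit }
    ' "${1:-/proc/cpuinfo}"
}
```

For the output above this prints 6/9/5.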

> If it's not that, there is a pretty large delta in the ondemand
> governor in this update, but I don't see anything blindingly
> obvious from looking over it.

Well, is there some way of doing a bisection over these changes? As far
as I know, the problem probably affects all Dell D600 owners, and
probably others.

	Jean-Marc


* Re: Suspend to RAM regression tracked down
  2006-07-07 22:48       ` Jean-Marc Valin
@ 2006-07-08  6:23         ` Dave Jones
  2006-07-08 22:55           ` Jean-Marc Valin
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jones @ 2006-07-08  6:23 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

On Sat, Jul 08, 2006 at 08:48:49AM +1000, Jean-Marc Valin wrote:
 > On Friday, 7 July 2006 at 12:21 -0400, Dave Jones wrote:
 > > On Fri, Jul 07, 2006 at 09:25:37PM +1000, Jean-Marc Valin wrote:
 > >  > > There was a race in ondemand and conservative which made them lock up on 
 > >  > > resume (possibly only on SMP systems though).  There's a patch for that 
 > >  > > in current -mm, but I suspect there's another problem (still haven't had 
 > >  > > any time to track it down).
 > >  > 
 > >  > OK, I tried the patch with 2.6.17 and it didn't work. My laptop failed
 > >  > to resume on the first try, so it must be something else. Could someone
 > >  > actually have a look at the changes in 2.6.12-rc5-git6 (which happen to
 > >  > be cpufreq-related)? I spent months pinpointing the problem to that
 > >  > version (it takes several days to reproduce). I'd appreciate it if
 > >  > someone could at least have a look at what changed there and maybe fix
 > >  > it.
 > > 
 > > Can you show /proc/cpuinfo for the affected system ?
 > > If it's 15/3/4 or 15/4/1, that would explain why this kernel,
 > > as this was when support for those models got introduced to
 > > speedstep-centrino.
 > 
 > Not sure what the 15/... is, but here's the content:

it was the family, of which yours is 6, so this isn't it.
which means it must be ..

 > > If it's not that, there is a pretty large delta in the ondemand
 > > governor in this update, but I don't see anything blindingly
 > > obvious from looking over it.
 > 
 > Well, is there some way of doing a bisection over these changes? As far
 > as I know, the problem probably affects all Dell D600 owners, and
 > probably others.

If you're prepared to play around with 'git bisect' a little, it shouldn't
take that many iterations, as you've already narrowed it down quite a lot.

$ git bisect start drivers/cpufreq/cpufreq_ondemand.c
$ git bisect bad
$ git bisect good v2.6.12-rc5

should get you most of the way there.

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
has more info.
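
For anyone who has not used git before, the overall flow can be tried on
a throwaway repository first. The following is a toy sketch, not the
kernel procedure: the repository, the file governor.c, the tags c1..c5
and the grep test are all invented for illustration; only the git bisect
subcommands themselves are real.

```shell
#!/bin/sh
# Toy demonstration of the bisect workflow on a temporary repository.
# Everything except the git subcommands is made up for illustration.
bisect_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.email "demo@example.invalid"
        git config user.name "Bisect Demo"
        # Five commits; the fourth one introduces the "regression".
        for i in 1 2 3 4 5; do
            if [ "$i" -ge 4 ]; then
                echo "broken $i" > governor.c
            else
                echo "ok $i" > governor.c
            fi
            git add governor.c
            git commit -q -m "change $i"
            git tag "c$i"
        done
        # Same shape as the commands above: limit the search to one path,
        # mark the current HEAD bad, mark an older known-good point good.
        git bisect start -- governor.c >/dev/null
        git bisect bad >/dev/null
        git bisect good c1 >/dev/null
        # Automated stand-in for "build, boot, suspend, resume":
        # exit status 0 means good, non-zero means bad.
        git bisect run sh -c '! grep -q broken governor.c' >/dev/null
        # refs/bisect/bad now points at the first bad commit.
        git log -1 --format=%s refs/bisect/bad
        git bisect reset >/dev/null 2>&1
    )
    rm -rf "$repo"
}
```

Calling bisect_demo prints the subject of the commit that git identifies
as the first bad one. On a kernel tree there is no such one-line test:
each step means building and booting the checked-out revision, running
suspend/resume cycles, and then typing "git bisect good" or
"git bisect bad" by hand until git names the first bad commit.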

		Dave

-- 
http://www.codemonkey.org.uk


* Re: Suspend to RAM regression tracked down
  2006-07-08  6:23         ` Dave Jones
@ 2006-07-08 22:55           ` Jean-Marc Valin
  2006-07-09  3:21             ` Dave Jones
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-08 22:55 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

> If you're prepared to play around with 'git bisect' a little, it shouldn't
> take that many iterations, as you've already narrowed it down quite a lot.
> 
> $ git bisect start drivers/cpufreq/cpufreq_ondemand.c
> $ git bisect bad
> $ git bisect good v2.6.12-rc5
> 
> should get you most of the way there.
> 
> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
> has more info.

Could you give me a bit more info, since I've never used git before (I
only downloaded the git snapshots)? Also, if I understand correctly,
cpufreq_ondemand.c is the only file that could cause the problem. Is
that right? Also, is it possible to use an old version of it on a new
kernel?

	Jean-Marc


* Re: Suspend to RAM regression tracked down
  2006-07-08 22:55           ` Jean-Marc Valin
@ 2006-07-09  3:21             ` Dave Jones
  2006-07-09  5:28               ` Jean-Marc Valin
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jones @ 2006-07-09  3:21 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

On Sun, Jul 09, 2006 at 08:55:02AM +1000, Jean-Marc Valin wrote:
 > > If you're prepared to play around with 'git bisect' a little, it shouldn't
 > > take that many iterations, as you've already narrowed it down quite a lot.
 > > 
 > > $ git bisect start drivers/cpufreq/cpufreq_ondemand.c
 > > $ git bisect bad
 > > $ git bisect good v2.6.12-rc5
 > > 
 > > should get you most of the way there.
 > > 
 > > http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
 > > has more info.
 > 
 > Could you give me a bit more info, since I've never used git before (I
 > only downloaded the git snapshots)? Also, if I understand correctly,
 > cpufreq_ondemand.c is the only file that could cause the problem. Is
 > that right? Also, is it possible to use an old version of it on a new
 > kernel?

Actually, before deep diving into chasing bugs in ondemand, we should
probably confirm that the same behaviour doesn't happen with a different
governor.  Can you try that ?   Try setting it to userspace, and then
running a userspace app like cpuspeed/powernowd etc.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: Suspend to RAM regression tracked down
  2006-07-09  3:21             ` Dave Jones
@ 2006-07-09  5:28               ` Jean-Marc Valin
  0 siblings, 0 replies; 13+ messages in thread
From: Jean-Marc Valin @ 2006-07-09  5:28 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jeremy Fitzhardinge, Linux Kernel, cpufreq

> Actually, before deep diving into chasing bugs in ondemand, we should
> probably confirm that the same behaviour doesn't happen with a different
> governor.  Can you try that ?   Try setting it to userspace, and then
> running a userspace app like cpuspeed/powernowd etc.

Well, that's the thing here. I tried userspace and couldn't get it to
crash (for several weeks in a row). However, someone I know with the
same laptop model (not sure it's the exact same revision) told me it
would crash on him with userspace as well.

	Jean-Marc

