linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
@ 2008-01-27 21:29 Rafael J. Wysocki
  2008-01-27 21:59 ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-27 21:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

Hi,

2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation code paths
(probably related to the disabling of nonboot CPUs), which is !@#$%^&*()
annoying.

It's 100% reproducible on my HP nx6325 and bisection idendified the following
commit as the first bad one:

commit 764a9d6fe4b52995c8aba277e3634385699354f4
Author: Steven Rostedt <srostedt@redhat.com>
Date:   Fri Jan 25 21:08:04 2008 +0100

    sched: track highest prio task queued

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-27 21:29 [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected) Rafael J. Wysocki
@ 2008-01-27 21:59 ` Ingo Molnar
  2008-01-28  1:26   ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-01-27 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Hi,
> 
> 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation 
> code paths (probably related to the disabling of nonboot CPUs), which 
> is !@#$%^&*() annoying.
> 
> It's 100% reproducible on my HP nx6325 and bisection idendified the 
> following commit as the first bad one:
> 
> commit 764a9d6fe4b52995c8aba277e3634385699354f4
> Author: Steven Rostedt <srostedt@redhat.com>
> Date:   Fri Jan 25 21:08:04 2008 +0100
> 
>     sched: track highest prio task queued

hm, this patch is a NOP, so it's weird that it has an effect.

Do you have serial logging enabled perhaps? If the following WARN_ON() 
triggers:

+               WARN_ON(p->prio < rq->rt.highest_prio);

then perhaps that can cause a 5-10 seconds delay. (that's how much time 
it takes to printk a warning on the slowest serial settings)

but if you use suspend, then any such printks would be preserved in the 
dmesg, right? If the WARN_ON() triggers, and if you remove it, do things 
get faster?

this does have the feel of being scheduling related, but are you 
absolutely sure about the precise identity of the patch? It's not the 
next patch or the preceding patch by any chance? (which would also be 
scheduler patches)

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-27 21:59 ` Ingo Molnar
@ 2008-01-28  1:26   ` Rafael J. Wysocki
  2008-01-28  1:40     ` Steven Rostedt
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28  1:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

On Sunday, 27 of January 2008, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Hi,
> > 
> > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation 
> > code paths (probably related to the disabling of nonboot CPUs), which 
> > is !@#$%^&*() annoying.
> > 
> > It's 100% reproducible on my HP nx6325 and bisection idendified the 
> > following commit as the first bad one:
> > 
> > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > Author: Steven Rostedt <srostedt@redhat.com>
> > Date:   Fri Jan 25 21:08:04 2008 +0100
> > 
> >     sched: track highest prio task queued
> 
> hm, this patch is a NOP, so it's weird that it has an effect.
> 
> Do you have serial logging enabled perhaps? If the following WARN_ON() 
> triggers:
> 
> +               WARN_ON(p->prio < rq->rt.highest_prio);
> 
> then perhaps that can cause a 5-10 seconds delay. (that's how much time 
> it takes to printk a warning on the slowest serial settings)
> 
> but if you use suspend, then any such printks would be preserved in the 
> dmesg, right? If the WARN_ON() triggers, and if you remove it, do things 
> get faster?

No, this isn't the WARN_ON().

> this does have the feel of being scheduling related, but are you 
> absolutely sure about the precise identity of the patch?

Actually, not quite.  That's why I have verified it and found that another
patch is really responsible for the issue, namely:

commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 25 21:08:02 2008 +0100

    softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

Reverting this commit (it reverts with some minor modifications) fixes the
problem for me.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  1:26   ` Rafael J. Wysocki
@ 2008-01-28  1:40     ` Steven Rostedt
  2008-01-28 11:31       ` Rafael J. Wysocki
  2008-01-28  8:56     ` Dmitry Adamushko
  2008-01-31 15:58     ` Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2008-01-28  1:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

Rafael J. Wysocki wrote:

> No, this isn't the WARN_ON().
> 
>> this does have the feel of being scheduling related, but are you 
>> absolutely sure about the precise identity of the patch?
> 
> Actually, not quite.  That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
> 
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Fri Jan 25 21:08:02 2008 +0100
> 
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

Are you getting a bunch of prints from the softlockup detector in dmesg?
I wonder if the detector can detect a long timeout caused by suspend and 
resume and if not is triggering false positives?

-- Steve

> 
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  1:26   ` Rafael J. Wysocki
  2008-01-28  1:40     ` Steven Rostedt
@ 2008-01-28  8:56     ` Dmitry Adamushko
  2008-01-28 11:32       ` Rafael J. Wysocki
  2008-01-28 16:08       ` Rafael J. Wysocki
  2008-01-31 15:58     ` Peter Zijlstra
  2 siblings, 2 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-01-28  8:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Peter Zijlstra

On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Sunday, 27 of January 2008, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > > Hi,
> > >
> > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > code paths (probably related to the disabling of nonboot CPUs), which
> > > is !@#$%^&*() annoying.
> > >
> > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > following commit as the first bad one:
> > >
> > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > Author: Steven Rostedt <srostedt@redhat.com>
> > > Date:   Fri Jan 25 21:08:04 2008 +0100
> > >
> > >     sched: track highest prio task queued
> >
> > hm, this patch is a NOP, so it's weird that it has an effect.
> >
> > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > triggers:
> >
> > +               WARN_ON(p->prio < rq->rt.highest_prio);
> >
> > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > it takes to printk a warning on the slowest serial settings)
> >
> > but if you use suspend, then any such printks would be preserved in the
> > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > get faster?
>
> No, this isn't the WARN_ON().
>
> > this does have the feel of being scheduling related, but are you
> > absolutely sure about the precise identity of the patch?
>
> Actually, not quite.  That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Fri Jan 25 21:08:02 2008 +0100
>
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.

What if you use the same kernel that triggers a problem and just disable
this new 'softlockup' functionality:

echo 0 > /proc/sys/kernel/hung_task_timeout_secs

does the problem disapear?

TIA,

>
> Thanks,
> Rafael
>


-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  1:40     ` Steven Rostedt
@ 2008-01-28 11:31       ` Rafael J. Wysocki
  2008-01-28 16:31         ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 11:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Steven Rostedt wrote:
> Rafael J. Wysocki wrote:
> 
> > No, this isn't the WARN_ON().
> > 
> >> this does have the feel of being scheduling related, but are you 
> >> absolutely sure about the precise identity of the patch?
> > 
> > Actually, not quite.  That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> > 
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date:   Fri Jan 25 21:08:02 2008 +0100
> > 
> >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> 
> Are you getting a bunch of prints from the softlockup detector in dmesg?

No, I don't.  In fact, I don't get _any_ messages from it whatsoever.

> I wonder if the detector can detect a long timeout caused by suspend and 
> resume and if not is triggering false positives?

I'm not sure, but the code is supposed to be suspend-aware, IIRC.  However,
I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
more directly related to the CPU hotplug.  I'll try to verify that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  8:56     ` Dmitry Adamushko
@ 2008-01-28 11:32       ` Rafael J. Wysocki
  2008-01-28 16:08       ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 11:32 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Peter Zijlstra

On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > > code paths (probably related to the disabling of nonboot CPUs), which
> > > > is !@#$%^&*() annoying.
> > > >
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > > following commit as the first bad one:
> > > >
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date:   Fri Jan 25 21:08:04 2008 +0100
> > > >
> > > >     sched: track highest prio task queued
> > >
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > >
> > > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > > triggers:
> > >
> > > +               WARN_ON(p->prio < rq->rt.highest_prio);
> > >
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > > it takes to printk a warning on the slowest serial settings)
> > >
> > > but if you use suspend, then any such printks would be preserved in the
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > > get faster?
> >
> > No, this isn't the WARN_ON().
> >
> > > this does have the feel of being scheduling related, but are you
> > > absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite.  That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date:   Fri Jan 25 21:08:02 2008 +0100
> >
> >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
> 
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
> 
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> 
> does the problem disapear?

I haven't checked that yet, will do it later today.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  8:56     ` Dmitry Adamushko
  2008-01-28 11:32       ` Rafael J. Wysocki
@ 2008-01-28 16:08       ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 16:08 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Peter Zijlstra

On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > > code paths (probably related to the disabling of nonboot CPUs), which
> > > > is !@#$%^&*() annoying.
> > > >
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > > following commit as the first bad one:
> > > >
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date:   Fri Jan 25 21:08:04 2008 +0100
> > > >
> > > >     sched: track highest prio task queued
> > >
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > >
> > > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > > triggers:
> > >
> > > +               WARN_ON(p->prio < rq->rt.highest_prio);
> > >
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > > it takes to printk a warning on the slowest serial settings)
> > >
> > > but if you use suspend, then any such printks would be preserved in the
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > > get faster?
> >
> > No, this isn't the WARN_ON().
> >
> > > this does have the feel of being scheduling related, but are you
> > > absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite.  That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date:   Fri Jan 25 21:08:02 2008 +0100
> >
> >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
> 
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
> 
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> 
> does the problem disapear?

No, it doesn't.  The setting doesn't seem to have any effect on it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28 11:31       ` Rafael J. Wysocki
@ 2008-01-28 16:31         ` Rafael J. Wysocki
  2008-01-28 16:46           ` Steven Rostedt
  2008-01-29  0:08           ` Rafael J. Wysocki
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 16:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Steven Rostedt wrote:
> > Rafael J. Wysocki wrote:
> > 
> > > No, this isn't the WARN_ON().
> > > 
> > >> this does have the feel of being scheduling related, but are you 
> > >> absolutely sure about the precise identity of the patch?
> > > 
> > > Actually, not quite.  That's why I have verified it and found that another
> > > patch is really responsible for the issue, namely:
> > > 
> > > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > > Author: Ingo Molnar <mingo@elte.hu>
> > > Date:   Fri Jan 25 21:08:02 2008 +0100
> > > 
> > >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> > 
> > Are you getting a bunch of prints from the softlockup detector in dmesg?
> 
> No, I don't.  In fact, I don't get _any_ messages from it whatsoever.
> 
> > I wonder if the detector can detect a long timeout caused by suspend and 
> > resume and if not is triggering false positives?
> 
> I'm not sure, but the code is supposed to be suspend-aware, IIRC.  However,
> I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
> more directly related to the CPU hotplug.  I'll try to verify that.

As I expected, the delay is also observable when I do:

echo 0 > /sys/devices/system/cpu/cpu1/online

(it's variable, between 3 and 30 seconds).  Again, no messages appear in dmesg
when this happens.

I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
to try that later today).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28 16:31         ` Rafael J. Wysocki
@ 2008-01-28 16:46           ` Steven Rostedt
  2008-01-29  0:08           ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2008-01-28 16:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
>> On Monday, 28 of January 2008, Steven Rostedt wrote:
>>> Rafael J. Wysocki wrote:
>>>
>>>> No, this isn't the WARN_ON().
>>>>
>>>>> this does have the feel of being scheduling related, but are you 
>>>>> absolutely sure about the precise identity of the patch?
>>>> Actually, not quite.  That's why I have verified it and found that another
>>>> patch is really responsible for the issue, namely:
>>>>
>>>> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
>>>> Author: Ingo Molnar <mingo@elte.hu>
>>>> Date:   Fri Jan 25 21:08:02 2008 +0100
>>>>
>>>>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>>> Are you getting a bunch of prints from the softlockup detector in dmesg?
>> No, I don't.  In fact, I don't get _any_ messages from it whatsoever.
>>
>>> I wonder if the detector can detect a long timeout caused by suspend and 
>>> resume and if not is triggering false positives?
>> I'm not sure, but the code is supposed to be suspend-aware, IIRC.  However,
>> I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
>> more directly related to the CPU hotplug.  I'll try to verify that.
> 
> As I expected, the delay is also observable when I do:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> (it's variable, between 3 and 30 seconds).  Again, no messages appear in dmesg
> when this happens.
> 
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).

Could you download my tracing patches from here:

 
http://people.redhat.com/srostedt/tracing/mcount-tracing-patches-v6.tar.bz2

Apply the above patches and select all but the histogram tracers. Then 
run this program

   http://people.redhat.com/srostedt/tracing/trace-cmd.c

  ./trace-cmd -f echo 0 > /sys/devices/system/cpu/cpu1/online

then do this (mount debugfs):

   bzip2 -c /debugfs/tracing/latency_trace > /tmp/trace.bz2

and send the result to myself and Ingo.

Thanks,

-- Steve

> 
> Thanks,
> Rafael


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28 16:31         ` Rafael J. Wysocki
  2008-01-28 16:46           ` Steven Rostedt
@ 2008-01-29  0:08           ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-29  0:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko, Peter Zijlstra

On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> > On Monday, 28 of January 2008, Steven Rostedt wrote:
> > > Rafael J. Wysocki wrote:
> > > 
> > > > No, this isn't the WARN_ON().
> > > > 
> > > >> this does have the feel of being scheduling related, but are you 
> > > >> absolutely sure about the precise identity of the patch?
> > > > 
> > > > Actually, not quite.  That's why I have verified it and found that another
> > > > patch is really responsible for the issue, namely:
> > > > 
> > > > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > > > Author: Ingo Molnar <mingo@elte.hu>
> > > > Date:   Fri Jan 25 21:08:02 2008 +0100
> > > > 
> > > >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> > > 
> > > Are you getting a bunch of prints from the softlockup detector in dmesg?
> > 
> > No, I don't.  In fact, I don't get _any_ messages from it whatsoever.
> > 
> > > I wonder if the detector can detect a long timeout caused by suspend and 
> > > resume and if not is triggering false positives?
> > 
> > I'm not sure, but the code is supposed to be suspend-aware, IIRC.  However,
> > I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
> > more directly related to the CPU hotplug.  I'll try to verify that.
> 
> As I expected, the delay is also observable when I do:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> (it's variable, between 3 and 30 seconds).  Again, no messages appear in dmesg
> when this happens.
> 
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).

FWIW, the problem is reproducible on the second machine, with the same
symptoms.  Both machines are AMD-based.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-28  1:26   ` Rafael J. Wysocki
  2008-01-28  1:40     ` Steven Rostedt
  2008-01-28  8:56     ` Dmitry Adamushko
@ 2008-01-31 15:58     ` Peter Zijlstra
  2008-01-31 20:54       ` Rafael J. Wysocki
  2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-01-31 15:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko


On Mon, 2008-01-28 at 02:26 +0100, Rafael J. Wysocki wrote:
> On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > Hi,
> > > 
> > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation 
> > > code paths (probably related to the disabling of nonboot CPUs), which 
> > > is !@#$%^&*() annoying.
> > > 
> > > It's 100% reproducible on my HP nx6325 and bisection idendified the 
> > > following commit as the first bad one:
> > > 
> > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > Author: Steven Rostedt <srostedt@redhat.com>
> > > Date:   Fri Jan 25 21:08:04 2008 +0100
> > > 
> > >     sched: track highest prio task queued
> > 
> > hm, this patch is a NOP, so it's weird that it has an effect.
> > 
> > Do you have serial logging enabled perhaps? If the following WARN_ON() 
> > triggers:
> > 
> > +               WARN_ON(p->prio < rq->rt.highest_prio);
> > 
> > then perhaps that can cause a 5-10 seconds delay. (that's how much time 
> > it takes to printk a warning on the slowest serial settings)
> > 
> > but if you use suspend, then any such printks would be preserved in the 
> > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things 
> > get faster?
> 
> No, this isn't the WARN_ON().
> 
> > this does have the feel of being scheduling related, but are you 
> > absolutely sure about the precise identity of the patch?
> 
> Actually, not quite.  That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
> 
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Fri Jan 25 21:08:02 2008 +0100
> 
>     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> 
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.

I can seem to reproduce this:

[root@opteron cpu1]# time echo 0 > online

real    0m6.230s
user    0m0.000s
sys     0m0.010s
[root@opteron cpu1]# echo 1 > online
[root@opteron cpu1]# time echo 0 > online

real    0m7.966s
user    0m0.000s
sys     0m0.011s


I'll have a look at it.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-31 15:58     ` Peter Zijlstra
@ 2008-01-31 20:54       ` Rafael J. Wysocki
  2008-02-01 12:04         ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-31 20:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko

On Thursday, 31 of January 2008, Peter Zijlstra wrote:
> 
> On Mon, 2008-01-28 at 02:26 +0100, Rafael J. Wysocki wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > > 
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation 
> > > > code paths (probably related to the disabling of nonboot CPUs), which 
> > > > is !@#$%^&*() annoying.
> > > > 
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the 
> > > > following commit as the first bad one:
> > > > 
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date:   Fri Jan 25 21:08:04 2008 +0100
> > > > 
> > > >     sched: track highest prio task queued
> > > 
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > > 
> > > Do you have serial logging enabled perhaps? If the following WARN_ON() 
> > > triggers:
> > > 
> > > +               WARN_ON(p->prio < rq->rt.highest_prio);
> > > 
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time 
> > > it takes to printk a warning on the slowest serial settings)
> > > 
> > > but if you use suspend, then any such printks would be preserved in the 
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things 
> > > get faster?
> > 
> > No, this isn't the WARN_ON().
> > 
> > > this does have the feel of being scheduling related, but are you 
> > > absolutely sure about the precise identity of the patch?
> > 
> > Actually, not quite.  That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> > 
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date:   Fri Jan 25 21:08:02 2008 +0100
> > 
> >     softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> > 
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
> 
> I can seem to reproduce this:
> 
> [root@opteron cpu1]# time echo 0 > online
> 
> real    0m6.230s
> user    0m0.000s
> sys     0m0.010s
> [root@opteron cpu1]# echo 1 > online
> [root@opteron cpu1]# time echo 0 > online
> 
> real    0m7.966s
> user    0m0.000s
> sys     0m0.011s
> 
> 
> I'll have a look at it.

Much appreciated, thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-01-31 20:54       ` Rafael J. Wysocki
@ 2008-02-01 12:04         ` Peter Zijlstra
  2008-02-01 12:47           ` Ingo Molnar
  2008-02-01 15:11           ` Dmitry Adamushko
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2008-02-01 12:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
	Dmitry Adamushko


On Thu, 2008-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Thursday, 31 of January 2008, Peter Zijlstra wrote:

> > I can seem to reproduce this:
> > 
> > [root@opteron cpu1]# time echo 0 > online
> > 
> > real    0m6.230s
> > user    0m0.000s
> > sys     0m0.010s
> > [root@opteron cpu1]# echo 1 > online
> > [root@opteron cpu1]# time echo 0 > online
> > 
> > real    0m7.966s
> > user    0m0.000s
> > sys     0m0.011s
> > 
> > 
> > I'll have a look at it.
> 
> Much appreciated, thanks!

The below fixes it for me..

---
 - restore the old wakeup mechanism
 - fix break usage in do_each_thread() { } while_eac_thread().
 - fix the hotplug switch stmt, a fall-through case was broken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index c1d7655..7c2da88 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
 
 	now = get_timestamp(this_cpu);
 
+	/* Wake up the high-prio watchdog task every second: */
+	if (now > (touch_timestamp + 1))
+		wake_up_process(per_cpu(watchdog_task, this_cpu));
+
 	/* Warn about unreasonable delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_tasks(int this_cpu)
 	read_lock(&tasklist_lock);
 	do_each_thread(g, t) {
 		if (!--max_count)
-			break;
+			goto unlock;
 		if (t->state & TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, now);
 	} while_each_thread(g, t);
-
+ unlock:
 	read_unlock(&tasklist_lock);
 }
 
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule();
+
+		if (kthread_should_stop())
+			break;
 
 		if (this_cpu != check_cpu)
 			continue;
 
 		if (sysctl_hung_task_timeout_secs)
 			check_hung_uninterruptible_tasks(this_cpu);
+
 	}
 
 	return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		wake_up_process(per_cpu(watchdog_task, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(watchdog_task, hotcpu))
-			break;
-		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(watchdog_task, hotcpu),
-			     any_online_cpu(cpu_online_map));
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 			check_cpu = any_online_cpu(temp_cpu_online_map);
 		}
 		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(watchdog_task, hotcpu))
+			break;
+		/* Unbind so it can run.  Fall thru. */
+		kthread_bind(per_cpu(watchdog_task, hotcpu),
+			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		p = per_cpu(watchdog_task, hotcpu);



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 12:04         ` Peter Zijlstra
@ 2008-02-01 12:47           ` Ingo Molnar
  2008-02-01 14:42             ` Rafael J. Wysocki
  2008-02-01 15:11           ` Dmitry Adamushko
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds,
	LKML, Dmitry Adamushko


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> The below fixes it for me..
> 
> ---
>  - restore the old wakeup mechanism
>  - fix break usage in do_each_thread() { } while_eac_thread().
>  - fix the hotplug switch stmt, a fall-through case was broken.

thanks Peter, i've picked your fix up. I hope this solves the problem 
for Rafael too.

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 12:47           ` Ingo Molnar
@ 2008-02-01 14:42             ` Rafael J. Wysocki
  2008-02-01 23:19               ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 14:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
	LKML, Dmitry Adamushko

On Friday, 1 of February 2008, Ingo Molnar wrote:
> 
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > The below fixes it for me..
> > 
> > ---
> >  - restore the old wakeup mechanism
> >  - fix break usage in do_each_thread() { } while_eac_thread().
> >  - fix the hotplug switch stmt, a fall-through case was broken.
> 
> thanks Peter, i've picked your fix up. I hope this solves the problem 
> for Rafael too.

No, it doesn't, unfortunately.  Actually, it breaks things even more, as
suspend now hangs (probably deadlocks) while disabling the nonboot CPU.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 12:04         ` Peter Zijlstra
  2008-02-01 12:47           ` Ingo Molnar
@ 2008-02-01 15:11           ` Dmitry Adamushko
  2008-02-01 17:10             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 15:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Ingo Molnar, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML

On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> ---
>  - restore the old wakeup mechanism

and how does it change behavior, logically-wise?

do we somehow miss a 'wake-up' from kthread_stop() so that its caller
gets blocked until watchdog's msleep_interruptible(10000) timeouts? On
average, it would take +-5 sec. and might explain the first
observation of Ravael -- "...adds a 5 - 10 sec delay..." (although,
lately he reported up to +30 sec. delays).

(/me goint to also try reproducing it later today)

> [ ... ]

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 15:11           ` Dmitry Adamushko
@ 2008-02-01 17:10             ` Ingo Molnar
  2008-02-01 21:54               ` Dmitry Adamushko
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 17:10 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > ---
> >  - restore the old wakeup mechanism
> 
> and how does it change behavior, logically-wise?
> 
> do we somehow miss a 'wake-up' from kthread_stop() so that its caller 
> gets blocked until watchdog's msleep_interruptible(10000) timeouts? On 
> average, it would take +-5 sec. and might explain the first 
> observation of Ravael -- "...adds a 5 - 10 sec delay..." (although, 
> lately he reported up to +30 sec. delays).
> 
> (/me goint to also try reproducing it later today)

thanks - i cannot reproduce it on my usual suspend/resume testbox 
because e1000 broke on it, and this is a pretty annoying regression. 
We'll have to undo the hung-tasks detection feature if it's not fixed 
quickly. (there's no point in debugging features that _add_ bugs)

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 17:10             ` Ingo Molnar
@ 2008-02-01 21:54               ` Dmitry Adamushko
  2008-02-01 22:44                 ` Dmitry Adamushko
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 21:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML

On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> thanks - i cannot reproduce it on my usual suspend/resume testbox
> because e1000 broke on it, and this is a pretty annoying regression.
> We'll have to undo the hung-tasks detection feature if it's not fixed
> quickly. (there's no point in debugging features that _add_ bugs)

I can reproduce it and Peter's patch does fix it for me.

e.g.
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real    0m6.743s
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real    0m7.770s

I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
10s. delay of msleep_interruptible() might be related but
I'm still looking for the reason why this fix helps (and what goes
wrong with the current code).


>
>         Ingo
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 21:54               ` Dmitry Adamushko
@ 2008-02-01 22:44                 ` Dmitry Adamushko
  2008-02-01 22:48                   ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 22:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML

On 01/02/2008, Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > thanks - i cannot reproduce it on my usual suspend/resume testbox
> > because e1000 broke on it, and this is a pretty annoying regression.
> > We'll have to undo the hung-tasks detection feature if it's not fixed
> > quickly. (there's no point in debugging features that _add_ bugs)
>
> I can reproduce it and Peter's patch does fix it for me.
>
> e.g.
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real    0m6.743s
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real    0m7.770s
>
> I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
> 10s. delay of msleep_interruptible() might be related but
> I'm still looking for the reason why this fix helps (and what goes
> wrong with the current code).
>

heh... it's pretty obvious indeed. What's msleep_interruptible() is
all about? :-)

"sleep waiting for signals"

so the 'watchdog' thread gets woken up

[ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]

just to be immediately scheduled out again for as long as the
remaining timeout > 0.

So it should work if we substitute msleep_interruptible() with
schedule_timeout_interruptible().


-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 22:44                 ` Dmitry Adamushko
@ 2008-02-01 22:48                   ` Ingo Molnar
  2008-02-01 23:22                     ` Dmitry Adamushko
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 22:48 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so 
> > the 10s. delay of msleep_interruptible() might be related but I'm 
> > still looking for the reason why this fix helps (and what goes wrong 
> > with the current code).
> 
> heh... it's pretty obvious indeed. What's msleep_interruptible() is 
> all about? :-)
> 
> "sleep waiting for signals"
> 
> so the 'watchdog' thread gets woken up
> 
> [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> 
> just to be immediately scheduled out again for as long as the
> remaining timeout > 0.
> 
> So it should work if we substitute msleep_interruptible() with 
> schedule_timeout_interruptible().

Doh. Could someone with ths problem please test the patch below, does it 
do the trick?

	Ingo

---
 kernel/softlockup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -228,7 +228,7 @@ static int watchdog(void *__bind_cpu)
 	 */
 	while (!kthread_should_stop()) {
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule_timeout_interruptible(10*HZ);
 
 		if (this_cpu != check_cpu)
 			continue;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 14:42             ` Rafael J. Wysocki
@ 2008-02-01 23:19               ` Rafael J. Wysocki
  2008-02-01 23:24                 ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 23:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
	LKML, Dmitry Adamushko

On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > 
> > > The below fixes it for me..
> > > 
> > > ---
> > >  - restore the old wakeup mechanism
> > >  - fix break usage in do_each_thread() { } while_eac_thread().
> > >  - fix the hotplug switch stmt, a fall-through case was broken.
> > 
> > thanks Peter, i've picked your fix up. I hope this solves the problem 
> > for Rafael too.
> 
> No, it doesn't, unfortunately.  Actually, it breaks things even more, as
> suspend now hangs (probably deadlocks) while disabling the nonboot CPU.

For some obscure reason, I'm no longer able to reproduce the problem
with the Peter's patch and I think it's correct.  You can add

Tested-by: Rafael J. Wysocki <rjw@sisk.pl>

to it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 22:48                   ` Ingo Molnar
@ 2008-02-01 23:22                     ` Dmitry Adamushko
  2008-02-01 23:25                       ` Ingo Molnar
  2008-02-01 23:26                       ` Rafael J. Wysocki
  0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 23:22 UTC (permalink / raw)
  To: Ingo Molnar, Rafael J. Wysocki
  Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML

On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > still looking for the reason why this fix helps (and what goes wrong
> > > with the current code).
> >
> > heh... it's pretty obvious indeed. What's msleep_interruptible() is
> > all about? :-)
> >
> > "sleep waiting for signals"
> >
> > so the 'watchdog' thread gets woken up
> >
> > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> >
> > just to be immediately scheduled out again for as long as the
> > remaining timeout > 0.
> >
> > So it should work if we substitute msleep_interruptible() with
> > schedule_timeout_interruptible().
>
> Doh. Could someone with ths problem please test the patch below, does it
> do the trick?

yeah, I was already on a half-way to check it out.

It does fix a problem for me.

Don't forget to take along these 2 fixes from Peter's patch:

- fix break usage in do_each_thread() { } while_each_thread().
- fix the hotplug switch stmt, a fall-through case was broken.


Rafael, does your system still hangs? I'd expect, yes -- as
effectively this fix is not that different from Peter's one when it
comes to suspend-case.

Can you then send your config? Anything special about this machine
(say, some /proc tunables are altered)?



>
>         Ingo
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 23:19               ` Rafael J. Wysocki
@ 2008-02-01 23:24                 ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 23:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
	LKML, Dmitry Adamushko


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Ingo Molnar wrote:
> > > 
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > 
> > > > The below fixes it for me..
> > > > 
> > > > ---
> > > >  - restore the old wakeup mechanism
> > > >  - fix break usage in do_each_thread() { } while_eac_thread().
> > > >  - fix the hotplug switch stmt, a fall-through case was broken.
> > > 
> > > thanks Peter, i've picked your fix up. I hope this solves the problem 
> > > for Rafael too.
> > 
> > No, it doesn't, unfortunately.  Actually, it breaks things even more, as
> > suspend now hangs (probably deadlocks) while disabling the nonboot CPU.
> 
> For some obscure reason, I'm no longer able to reproduce the problem 
> with the Peter's patch and I think it's correct.  You can add
> 
> Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> to it.

Great! I sent the fix to Linus with your Tested-by line. Thanks Rafael 
for all the testing!

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 23:22                     ` Dmitry Adamushko
@ 2008-02-01 23:25                       ` Ingo Molnar
  2008-02-02  0:03                         ` Dmitry Adamushko
  2008-02-01 23:26                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 23:25 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> yeah, I was already on a half-way to check it out.
> 
> It does fix a problem for me.
> 
> Don't forget to take along these 2 fixes from Peter's patch:
> 
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.

Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it 
fixes all the practical and theoretical problems you could see with the 
code too?

	Ingo

--------------->
Subject: debug: softlockup looping fix
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Rafael J. Wysocki reported weird, multi-seconds delays during
suspend/resume and bisected it back to:

  commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
  Author: Ingo Molnar <mingo@elte.hu>
  Date:   Fri Jan 25 21:08:02 2008 +0100

      softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

fix it:

 - restore the old wakeup mechanism
 - fix break usage in do_each_thread() { } while_each_thread().
 - fix the hotplug switch stmt, a fall-through case was broken.

Bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/softlockup.c |   30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
 
 	now = get_timestamp(this_cpu);
 
+	/* Wake up the high-prio watchdog task every second: */
+	if (now > (touch_timestamp + 1))
+		wake_up_process(per_cpu(watchdog_task, this_cpu));
+
 	/* Warn about unreasonable delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
 	read_lock(&tasklist_lock);
 	do_each_thread(g, t) {
 		if (!--max_count)
-			break;
+			goto unlock;
 		if (t->state & TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, now);
 	} while_each_thread(g, t);
-
+ unlock:
 	read_unlock(&tasklist_lock);
 }
 
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule();
+
+		if (kthread_should_stop())
+			break;
 
 		if (this_cpu != check_cpu)
 			continue;
 
 		if (sysctl_hung_task_timeout_secs)
 			check_hung_uninterruptible_tasks(this_cpu);
+
 	}
 
 	return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
 		wake_up_process(per_cpu(watchdog_task, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(watchdog_task, hotcpu))
-			break;
-		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(watchdog_task, hotcpu),
-			     any_online_cpu(cpu_online_map));
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
 			check_cpu = any_online_cpu(temp_cpu_online_map);
 		}
 		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(watchdog_task, hotcpu))
+			break;
+		/* Unbind so it can run.  Fall thru. */
+		kthread_bind(per_cpu(watchdog_task, hotcpu),
+			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		p = per_cpu(watchdog_task, hotcpu);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 23:22                     ` Dmitry Adamushko
  2008-02-01 23:25                       ` Ingo Molnar
@ 2008-02-01 23:26                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 23:26 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML

On Saturday, 2 of February 2008, Dmitry Adamushko wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> >
> > > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > > still looking for the reason why this fix helps (and what goes wrong
> > > > with the current code).
> > >
> > > heh... it's pretty obvious indeed. What's msleep_interruptible() is
> > > all about? :-)
> > >
> > > "sleep waiting for signals"
> > >
> > > so the 'watchdog' thread gets woken up
> > >
> > > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> > >
> > > just to be immediately scheduled out again for as long as the
> > > remaining timeout > 0.
> > >
> > > So it should work if we substitute msleep_interruptible() with
> > > schedule_timeout_interruptible().
> >
> > Doh. Could someone with ths problem please test the patch below, does it
> > do the trick?
> 
> yeah, I was already on a half-way to check it out.
> 
> It does fix a problem for me.
> 
> Don't forget to take along these 2 fixes from Peter's patch:
> 
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
> 
> 
> Rafael, does your system still hangs? I'd expect, yes -- as
> effectively this fix is not that different from Peter's one when it
> comes to suspend-case.

I haven't tested it yet, but I don't expect it to hang, as the Peter's patch
started to work for me.  I'm going to test it in a while.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
  2008-02-01 23:25                       ` Ingo Molnar
@ 2008-02-02  0:03                         ` Dmitry Adamushko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-02  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton,
	Linus Torvalds, LKML

On 02/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > yeah, I was already on a half-way to check it out.
> >
> > It does fix a problem for me.
> >
> > Don't forget to take along these 2 fixes from Peter's patch:
> >
> > - fix break usage in do_each_thread() { } while_each_thread().
> > - fix the hotplug switch stmt, a fall-through case was broken.
>
> Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it
> fixes all the practical and theoretical problems you could see with the
> code too?
>

One comment: any argument for waking up the 'watchdog' thread from
softlockup_tick() wrt just having schedule_timeout_interruptible(HZ)
in watchdog() ?
(although, it's not that important).

IMHO, it looks good.

(I'll look again tomorrow when I'm less sleepy :-/)

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-02-02  0:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-27 21:29 [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected) Rafael J. Wysocki
2008-01-27 21:59 ` Ingo Molnar
2008-01-28  1:26   ` Rafael J. Wysocki
2008-01-28  1:40     ` Steven Rostedt
2008-01-28 11:31       ` Rafael J. Wysocki
2008-01-28 16:31         ` Rafael J. Wysocki
2008-01-28 16:46           ` Steven Rostedt
2008-01-29  0:08           ` Rafael J. Wysocki
2008-01-28  8:56     ` Dmitry Adamushko
2008-01-28 11:32       ` Rafael J. Wysocki
2008-01-28 16:08       ` Rafael J. Wysocki
2008-01-31 15:58     ` Peter Zijlstra
2008-01-31 20:54       ` Rafael J. Wysocki
2008-02-01 12:04         ` Peter Zijlstra
2008-02-01 12:47           ` Ingo Molnar
2008-02-01 14:42             ` Rafael J. Wysocki
2008-02-01 23:19               ` Rafael J. Wysocki
2008-02-01 23:24                 ` Ingo Molnar
2008-02-01 15:11           ` Dmitry Adamushko
2008-02-01 17:10             ` Ingo Molnar
2008-02-01 21:54               ` Dmitry Adamushko
2008-02-01 22:44                 ` Dmitry Adamushko
2008-02-01 22:48                   ` Ingo Molnar
2008-02-01 23:22                     ` Dmitry Adamushko
2008-02-01 23:25                       ` Ingo Molnar
2008-02-02  0:03                         ` Dmitry Adamushko
2008-02-01 23:26                       ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).