linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
@ 2018-07-26 15:56 Eduardo Valentin
  2018-07-30  7:15 ` Rafael J. Wysocki
  2018-07-30  8:53 ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Eduardo Valentin @ 2018-07-26 15:56 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J . Wysocki
  Cc: Eduardo Valentin, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dou Liyang, Len Brown, Rafael J. Wysocki, mike.travis,
	Rajvi Jingar, Pavel Tatashin, Philippe Ombredanne, Kate Stewart,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pm

System instability are seen during resume from hibernation when system
is under heavy CPU load. This is due to the lack of update of sched
clock data, and the scheduler would then think that heavy CPU hog
tasks need more time in CPU, causing the system to freeze
during the unfreezing of tasks. For example, threaded irqs,
and kernel processes servicing network interface may be delayed
for several tens of seconds, causing the system to be unreachable.

Situation like this can be reported by using lockup detectors
such as workqueue lockup detectors:

[root@ip-172-31-67-114 ec2-user]# echo disk > /sys/power/state

Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May  7 18:29:06 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!

The fix for this situation is to mark the sched clock as unstable
as early as possible in the resume path, leaving it unstable
for the duration of the resume process. This will force the
scheduler to attempt to align the sched clock across CPUs using
the delta with time of day, updating sched clock data. In a post
hibernation event, we can then mark the sched clock as stable
again, avoiding unnecessary syncs with time of day on systems
in which TSC is reliable.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduval@amazon.com>
---

Hey,

No changes from first attempt, no pressure on resending. The RESEND
tag is just because I missed linux-pm in the first attempt.

BR,

 arch/x86/kernel/tsc.c       | 29 +++++++++++++++++++++++++++++
 include/linux/sched/clock.h |  5 +++++
 kernel/sched/clock.c        |  4 ++--
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f8142e..f197c9742fef 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -13,6 +13,7 @@
 #include <linux/percpu.h>
 #include <linux/timex.h>
 #include <linux/static_key.h>
+#include <linux/suspend.h>
 
 #include <asm/hpet.h>
 #include <asm/timer.h>
@@ -1377,3 +1378,31 @@ unsigned long calibrate_delay_is_known(void)
 	return 0;
 }
 #endif
+
+static int tsc_pm_notifier(struct notifier_block *notifier,
+			   unsigned long pm_event, void *unused)
+{
+	switch (pm_event) {
+	case PM_HIBERNATION_PREPARE:
+		clear_sched_clock_stable();
+		break;
+	case PM_POST_HIBERNATION:
+		/* Set back to the default */
+		if (!check_tsc_unstable())
+			set_sched_clock_stable();
+		break;
+	}
+
+	return 0;
+};
+
+static struct notifier_block tsc_pm_notifier_block = {
+	.notifier_call = tsc_pm_notifier,
+};
+
+static int tsc_setup_pm_notifier(void)
+{
+	return register_pm_notifier(&tsc_pm_notifier_block);
+}
+
+subsys_initcall(tsc_setup_pm_notifier);
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index 867d588314e0..902654ac5f7e 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -32,6 +32,10 @@ static inline void clear_sched_clock_stable(void)
 {
 }
 
+static inline void set_sched_clock_stable(void)
+{
+}
+
 static inline void sched_clock_idle_sleep_event(void)
 {
 }
@@ -51,6 +55,7 @@ static inline u64 local_clock(void)
 }
 #else
 extern int sched_clock_stable(void);
+extern void set_sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
 /*
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index e086babe6c61..8453440e236c 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -131,7 +131,7 @@ static void __scd_stamp(struct sched_clock_data *scd)
 	scd->tick_raw = sched_clock();
 }
 
-static void __set_sched_clock_stable(void)
+void set_sched_clock_stable(void)
 {
 	struct sched_clock_data *scd;
 
@@ -228,7 +228,7 @@ static int __init sched_clock_init_late(void)
 	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
 
 	if (__sched_clock_stable_early)
-		__set_sched_clock_stable();
+		set_sched_clock_stable();
 
 	return 0;
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
  2018-07-26 15:56 [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation Eduardo Valentin
@ 2018-07-30  7:15 ` Rafael J. Wysocki
  2018-07-30 16:44   ` Eduardo Valentin
  2018-07-30  8:53 ` Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2018-07-30  7:15 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: Peter Zijlstra, Rafael J . Wysocki, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Dou Liyang, Len Brown, Rafael J. Wysocki,
	mike.travis, Rajvi Jingar, Pavel Tatashin, Philippe Ombredanne,
	Kate Stewart, Greg Kroah-Hartman, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux PM

On Thu, Jul 26, 2018 at 5:56 PM, Eduardo Valentin <eduval@amazon.com> wrote:
> System instability are seen during resume from hibernation when system
> is under heavy CPU load. This is due to the lack of update of sched
> clock data,

Isn't that the actual bug?

> and the scheduler would then think that heavy CPU hog
> tasks need more time in CPU, causing the system to freeze
> during the unfreezing of tasks. For example, threaded irqs,
> and kernel processes servicing network interface may be delayed
> for several tens of seconds, causing the system to be unreachable.
>
> Situation like this can be reported by using lockup detectors
> such as workqueue lockup detectors:
>
> [root@ip-172-31-67-114 ec2-user]# echo disk > /sys/power/state
>
> Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
>  kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
>  kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
>  kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May  7 18:29:06 ...
>  kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!
>
> The fix for this situation is to mark the sched clock as unstable
> as early as possible in the resume path, leaving it unstable
> for the duration of the resume process.

I would rather call it a workaround.

> This will force the
> scheduler to attempt to align the sched clock across CPUs using
> the delta with time of day, updating sched clock data. In a post
> hibernation event, we can then mark the sched clock as stable
> again, avoiding unnecessary syncs with time of day on systems
> in which TSC is reliable.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: Eduardo Valentin <eduval@amazon.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Rajvi Jingar <rajvi.jingar@intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> Cc: Philippe Ombredanne <pombredanne@nexb.com>
> Cc: Kate Stewart <kstewart@linuxfoundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
>
> Hey,
>
> No changes from first attempt, no pressure on resending. The RESEND
> tag is just because I missed linux-pm in the first attempt.
>
> BR,
>
>  arch/x86/kernel/tsc.c       | 29 +++++++++++++++++++++++++++++
>  include/linux/sched/clock.h |  5 +++++
>  kernel/sched/clock.c        |  4 ++--
>  3 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 8ea117f8142e..f197c9742fef 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -13,6 +13,7 @@
>  #include <linux/percpu.h>
>  #include <linux/timex.h>
>  #include <linux/static_key.h>
> +#include <linux/suspend.h>
>
>  #include <asm/hpet.h>
>  #include <asm/timer.h>
> @@ -1377,3 +1378,31 @@ unsigned long calibrate_delay_is_known(void)
>         return 0;
>  }
>  #endif
> +
> +static int tsc_pm_notifier(struct notifier_block *notifier,
> +                          unsigned long pm_event, void *unused)
> +{
> +       switch (pm_event) {
> +       case PM_HIBERNATION_PREPARE:
> +               clear_sched_clock_stable();
> +               break;

This is too early IMO.  This happens before hibernation starts, even
before the image is created.

> +       case PM_POST_HIBERNATION:
> +               /* Set back to the default */
> +               if (!check_tsc_unstable())
> +                       set_sched_clock_stable();
> +               break;
> +       }
> +
> +       return 0;
> +};

If anything like this is the way to go, which honestly I doubt, I
would prefer it to be done in hibernate() in the !in_suspend case.

But why does it only affect hibernation?  Do we do something extra for
system-wide suspend/resume that is not done for hibernation?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
  2018-07-26 15:56 [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation Eduardo Valentin
  2018-07-30  7:15 ` Rafael J. Wysocki
@ 2018-07-30  8:53 ` Peter Zijlstra
  2018-07-30 16:41   ` Eduardo Valentin
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2018-07-30  8:53 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: Rafael J . Wysocki, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dou Liyang, Len Brown, Rafael J. Wysocki, mike.travis,
	Rajvi Jingar, Pavel Tatashin, Philippe Ombredanne, Kate Stewart,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pm

On Thu, Jul 26, 2018 at 08:56:56AM -0700, Eduardo Valentin wrote:
> System instability are seen during resume from hibernation when system
> is under heavy CPU load. This is due to the lack of update of sched
> clock data

Which would suggest you're already running with unstable sched clock.
Otherwise nobody would care about the scd stuff.

What kind of machine are you running? What does:

  dmesg | grep -i tsc

say?

> The fix for this situation is to mark the sched clock as unstable
> as early as possible in the resume path, leaving it unstable
> for the duration of the resume process. This will force the
> scheduler to attempt to align the sched clock across CPUs using
> the delta with time of day, updating sched clock data. In a post
> hibernation event, we can then mark the sched clock as stable
> again, avoiding unnecessary syncs with time of day on systems
> in which TSC is reliable.

None of this makes any sense. Either you were already unstable and it
should already have worked and them marking it stable is an outright
bug, or your sched clock was stable but then your initial diagnosis of
lack of scd updates is complete garbage.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
  2018-07-30  8:53 ` Peter Zijlstra
@ 2018-07-30 16:41   ` Eduardo Valentin
  2018-07-30 19:49     ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Eduardo Valentin @ 2018-07-30 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eduardo Valentin, Rafael J . Wysocki, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Dou Liyang, Len Brown,
	Rafael J. Wysocki, mike.travis, Rajvi Jingar, Pavel Tatashin,
	Philippe Ombredanne, Kate Stewart, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pm

Hey Peter,

On Mon, Jul 30, 2018 at 10:53:54AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 26, 2018 at 08:56:56AM -0700, Eduardo Valentin wrote:
> > System instability are seen during resume from hibernation when system
> > is under heavy CPU load. This is due to the lack of update of sched
> > clock data
> 
> Which would suggest you're already running with unstable sched clock.
> Otherwise nobody would care about the scd stuff.


Yes.

> 
> What kind of machine are you running? What does:
> 
>   dmesg | grep -i tsc
> 
> say?

Here:
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.004005] tsc: Detected 3000.000 MHz processor
[    0.066796] TSC deadline timer enabled
[    3.904269] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2b3e459bf4c, max_idle_ns: 440795289890 ns

> 
> > The fix for this situation is to mark the sched clock as unstable
> > as early as possible in the resume path, leaving it unstable
> > for the duration of the resume process. This will force the
> > scheduler to attempt to align the sched clock across CPUs using
> > the delta with time of day, updating sched clock data. In a post
> > hibernation event, we can then mark the sched clock as stable
> > again, avoiding unnecessary syncs with time of day on systems
> > in which TSC is reliable.
> 
> None of this makes any sense. Either you were already unstable and it
> should already have worked and them marking it stable is an outright
> bug, or your sched clock was stable but then your initial diagnosis of
> lack of scd updates is complete garbage.
> 

I see, or it is just a workaround for the underling issue. I, for sure, see no
lockups anymore after forcing the scd updates. The other thing which are not
super clear is that this happens during the unfreezing of tasks. If I get a
set of cpu hog tasks while unfreezing, I see the system throwing worqueue lockup
detectors in hibernation restore.

> 

-- 
All the best,
Eduardo Valentin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
  2018-07-30  7:15 ` Rafael J. Wysocki
@ 2018-07-30 16:44   ` Eduardo Valentin
  0 siblings, 0 replies; 6+ messages in thread
From: Eduardo Valentin @ 2018-07-30 16:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Eduardo Valentin, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Dou Liyang, Len Brown, Rafael J. Wysocki,
	mike.travis, Rajvi Jingar, Pavel Tatashin, Philippe Ombredanne,
	Kate Stewart, Greg Kroah-Hartman, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux PM

On Mon, Jul 30, 2018 at 09:15:48AM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 26, 2018 at 5:56 PM, Eduardo Valentin <eduval@amazon.com> wrote:
> > System instability are seen during resume from hibernation when system
> > is under heavy CPU load. This is due to the lack of update of sched
> > clock data,
> 
> Isn't that the actual bug?
> 
> > and the scheduler would then think that heavy CPU hog
> > tasks need more time in CPU, causing the system to freeze
> > during the unfreezing of tasks. For example, threaded irqs,
> > and kernel processes servicing network interface may be delayed
> > for several tens of seconds, causing the system to be unreachable.
> >
> > Situation like this can be reported by using lockup detectors
> > such as workqueue lockup detectors:
> >
> > [root@ip-172-31-67-114 ec2-user]# echo disk > /sys/power/state
> >
> > Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
> >  kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!
> >
> > Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
> >  kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!
> >
> > Message from syslogd@ip-172-31-67-114 at May  7 18:23:21 ...
> >  kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!
> >
> > Message from syslogd@ip-172-31-67-114 at May  7 18:29:06 ...
> >  kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!
> >
> > The fix for this situation is to mark the sched clock as unstable
> > as early as possible in the resume path, leaving it unstable
> > for the duration of the resume process.
> 
> I would rather call it a workaround.

ok.

> 
> > This will force the
> > scheduler to attempt to align the sched clock across CPUs using
> > the delta with time of day, updating sched clock data. In a post
> > hibernation event, we can then mark the sched clock as stable
> > again, avoiding unnecessary syncs with time of day on systems
> > in which TSC is reliable.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > Cc: Eduardo Valentin <eduval@amazon.com>
> > Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> > Cc: Rajvi Jingar <rajvi.jingar@intel.com>
> > Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> > Cc: Philippe Ombredanne <pombredanne@nexb.com>
> > Cc: Kate Stewart <kstewart@linuxfoundation.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> >
> > Hey,
> >
> > No changes from first attempt, no pressure on resending. The RESEND
> > tag is just because I missed linux-pm in the first attempt.
> >
> > BR,
> >
> >  arch/x86/kernel/tsc.c       | 29 +++++++++++++++++++++++++++++
> >  include/linux/sched/clock.h |  5 +++++
> >  kernel/sched/clock.c        |  4 ++--
> >  3 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 8ea117f8142e..f197c9742fef 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/percpu.h>
> >  #include <linux/timex.h>
> >  #include <linux/static_key.h>
> > +#include <linux/suspend.h>
> >
> >  #include <asm/hpet.h>
> >  #include <asm/timer.h>
> > @@ -1377,3 +1378,31 @@ unsigned long calibrate_delay_is_known(void)
> >         return 0;
> >  }
> >  #endif
> > +
> > +static int tsc_pm_notifier(struct notifier_block *notifier,
> > +                          unsigned long pm_event, void *unused)
> > +{
> > +       switch (pm_event) {
> > +       case PM_HIBERNATION_PREPARE:
> > +               clear_sched_clock_stable();
> > +               break;
> 
> This is too early IMO.  This happens before hibernation starts, even
> before the image is created.

Yeah, I think, as long as it is marked, it should be fine.

> 
> > +       case PM_POST_HIBERNATION:
> > +               /* Set back to the default */
> > +               if (!check_tsc_unstable())
> > +                       set_sched_clock_stable();
> > +               break;
> > +       }
> > +
> > +       return 0;
> > +};
> 
> If anything like this is the way to go, which honestly I doubt, I
> would prefer it to be done in hibernate() in the !in_suspend case.
> 

The problem is more in the unfreeze of tasks..

> But why does it only affect hibernation?  Do we do something extra for
> system-wide suspend/resume that is not done for hibernation?

I don't think we do anything special in hibernation per si. 
Only thing is the unfreezing of tasks seams to get confused when 
CPU hog tasks are present. 
> 

-- 
All the best,
Eduardo Valentin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation
  2018-07-30 16:41   ` Eduardo Valentin
@ 2018-07-30 19:49     ` Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2018-07-30 19:49 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: Peter Zijlstra, Rafael J . Wysocki, Ingo Molnar, H. Peter Anvin,
	Dou Liyang, Len Brown, Rafael J. Wysocki, mike.travis,
	Rajvi Jingar, Pavel Tatashin, Philippe Ombredanne, Kate Stewart,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pm

On Mon, 30 Jul 2018, Eduardo Valentin wrote:
> On Mon, Jul 30, 2018 at 10:53:54AM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 26, 2018 at 08:56:56AM -0700, Eduardo Valentin wrote:
> > > System instability are seen during resume from hibernation when system
> > > is under heavy CPU load. This is due to the lack of update of sched
> > > clock data
> > 
> > Which would suggest you're already running with unstable sched clock.
> > Otherwise nobody would care about the scd stuff.
> 
> Yes.

I doubt that...

> > 
> > What kind of machine are you running? What does:
> > 
> >   dmesg | grep -i tsc
> > 
> > say?
> 
> Here:
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.004005] tsc: Detected 3000.000 MHz processor
> [    0.066796] TSC deadline timer enabled
> [    3.904269] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2b3e459bf4c, max_idle_ns: 440795289890 ns
>

... because if the sched clock would be unstable then you'd have something
like 'TSC unstable' in dmesg, which you obviously do not.

'sched_clock: Marking unstable' is the other message which would be
emitted.

> > > The fix for this situation is to mark the sched clock as unstable
> > > as early as possible in the resume path, leaving it unstable
> > > for the duration of the resume process. This will force the
> > > scheduler to attempt to align the sched clock across CPUs using
> > > the delta with time of day, updating sched clock data. In a post
> > > hibernation event, we can then mark the sched clock as stable
> > > again, avoiding unnecessary syncs with time of day on systems
> > > in which TSC is reliable.
> > 
> > None of this makes any sense. Either you were already unstable and it
> > should already have worked and them marking it stable is an outright
> > bug, or your sched clock was stable but then your initial diagnosis of
> > lack of scd updates is complete garbage.
> > 
> 
> I see, or it is just a workaround for the underling issue. I, for sure, see no
> lockups anymore after forcing the scd updates. The other thing which are not
> super clear is that this happens during the unfreezing of tasks. If I get a
> set of cpu hog tasks while unfreezing, I see the system throwing worqueue lockup
> detectors in hibernation restore.

Yes, it pretty much papers over something else. Can you please provide a
full dmesg from boot to failure case?

Another question: Does the system recover after issuing the lockup messages
or is it hosed completely?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-07-30 19:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-26 15:56 [PATCH RESEND 1/1] x86: tsc: avoid system instability in hibernation Eduardo Valentin
2018-07-30  7:15 ` Rafael J. Wysocki
2018-07-30 16:44   ` Eduardo Valentin
2018-07-30  8:53 ` Peter Zijlstra
2018-07-30 16:41   ` Eduardo Valentin
2018-07-30 19:49     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).