* Re: 2.6.36-rc3 suspend issue (was: 2.6.35-rc4 / X201 issues)
@ 2010-09-10  5:36 Jeff Chua
  2010-09-10  7:48 ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Chua @ 2010-09-10  5:36 UTC
  To: Nico Schottelius, Rafael J. Wysocki, Nico Schottelius, Jeff Chua,
	Jesse Barnes, LKML, Linus Torvalds, Florian Pritz, Suresh Siddha,
	stable, Peter Zijlstra, Ingo Molnar


On Wed, Sep 8, 2010 at 2:21 PM, Nico Schottelius 
<nico-nospam@schottelius.org> wrote:
> Rafael J. Wysocki [Wed, Sep 08, 2010 at 01:28:52AM +0200]:
>> On Wednesday, September 08, 2010, Nico Schottelius wrote:
>> > Rafael J. Wysocki [Tue, Sep 07, 2010 at 11:48:41PM +0200]:
>> > > On Tuesday, September 07, 2010, Jeff Chua wrote:
>> > > > Cool. Thanks for the short-cut! At least now, I can resume, but got a
>> > > > lot of BUGS showing up upon resume after applying the patch.
>> > > This also was reported IIRC, but there's no resolution so far. It's a
>> > > different issue.
>> > Can somebody ping me, as soon as a git pull on linux-2.6
>> > should be as "stable" (or more stable) than 2.6.34?
>> No one can say when that happens for your machine.
> True. I was more wondering, when the bisected issue will
> be fixed, as this may give my machine some more chances
> to work on Linux.


I've bisected, and the result points to the following commit as the cause
of the errors after resume. Reverting the commit solves the problem.


commit cd7240c0b900eb6d690ccee088a6c9b46dae815a
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date:   Thu Aug 19 17:03:38 2010 -0700

     x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states

     TSC's get reset after suspend/resume (even on cpu's with invariant TSC
     which runs at a constant rate across ACPI P-, C- and T-states). And in
     some systems BIOS seem to reinit TSC to arbitrary large value (still
     sync'd across cpu's) during resume.

     This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
     than rq->age_stamp (introduced in 2.6.32). This leads to a big value
     returned by scale_rt_power() and the resulting big group power set by the
     update_group_power() is causing improper load balancing between busy and
     idle cpu's after suspend/resume.

     This resulted in multi-threaded workloads (like kernel-compilation) go
     slower after suspend/resume cycle on core i5 laptops.

     Fix this by recomputing cyc2ns_offset's during resume, so that
     sched_clock() continues from the point where it was left off during
     suspend.

     Reported-by: Florian Pritz <flo@xssn.at>
     Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
     Cc: <stable@kernel.org> # [v2.6.32+]
     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
     LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
     Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index c042729..1ca132f 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -59,5 +59,7 @@ extern void check_tsc_sync_source(int cpu);
  extern void check_tsc_sync_target(void);

  extern int notsc_setup(char *);
+extern void save_sched_clock_state(void);
+extern void restore_sched_clock_state(void);

  #endif /* _ASM_X86_TSC_H */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..d632934 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -626,6 +626,44 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
  	local_irq_restore(flags);
  }

+static unsigned long long cyc2ns_suspend;
+
+void save_sched_clock_state(void)
+{
+	if (!sched_clock_stable)
+		return;
+
+	cyc2ns_suspend = sched_clock();
+}
+
+/*
+ * Even on processors with invariant TSC, TSC gets reset in some the
+ * ACPI system sleep states. And in some systems BIOS seem to reinit TSC to
+ * arbitrary value (still sync'd across cpu's) during resume from such sleep
+ * states. To cope up with this, recompute the cyc2ns_offset for each cpu so
+ * that sched_clock() continues from the point where it was left off during
+ * suspend.
+ */
+void restore_sched_clock_state(void)
+{
+	unsigned long long offset;
+	unsigned long flags;
+	int cpu;
+
+	if (!sched_clock_stable)
+		return;
+
+	local_irq_save(flags);
+
+	get_cpu_var(cyc2ns_offset) = 0;
+	offset = cyc2ns_suspend - sched_clock();
+
+	for_each_possible_cpu(cpu)
+		per_cpu(cyc2ns_offset, cpu) = offset;
+
+	local_irq_restore(flags);
+}
+
  #ifdef CONFIG_CPU_FREQ

  /* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index e7e8c5f..87bb35e 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -113,6 +113,7 @@ static void __save_processor_state(struct saved_context *ctxt)
  void save_processor_state(void)
  {
  	__save_processor_state(&saved_context);
+	save_sched_clock_state();
  }
  #ifdef CONFIG_X86_32
  EXPORT_SYMBOL(save_processor_state);
@@ -229,6 +230,7 @@ static void __restore_processor_state(struct saved_context *ctxt)
  void restore_processor_state(void)
  {
  	__restore_processor_state(&saved_context);
+	restore_sched_clock_state();
  }
  #ifdef CONFIG_X86_32
  EXPORT_SYMBOL(restore_processor_state);
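
The offset arithmetic in the patch above can be tried out in userspace. The following is only a minimal sketch under stated assumptions: every name in it (struct model_clock, model_save(), model_restore(), and so on) is hypothetical and not a kernel API, and the cycles-to-nanoseconds conversion is reduced to a fixed multiplier. It shows why subtracting the post-resume raw clock from the saved value lets the clock continue where it stopped, even when the counter comes back larger than before and the unsigned offset wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_clock {
	uint64_t tsc;        /* raw cycle counter, may be reset across resume */
	uint64_t ns_per_cyc; /* assumption: fixed integer scale factor */
	uint64_t offset;     /* plays the role of cyc2ns_offset */
	uint64_t saved;      /* plays the role of cyc2ns_suspend */
};

/* Scaled counter plus offset, loosely mirroring how sched_clock() is derived. */
static uint64_t model_sched_clock(const struct model_clock *c)
{
	return c->tsc * c->ns_per_cyc + c->offset;
}

/* Remember the clock value at suspend time. */
static void model_save(struct model_clock *c)
{
	c->saved = model_sched_clock(c);
}

/*
 * Zero the offset so the clock reads the raw scaled counter, then pick the
 * offset that maps that raw value back onto the saved one. The subtraction
 * is unsigned, so a counter that came back larger simply wraps modulo 2^64.
 */
static void model_restore(struct model_clock *c)
{
	c->offset = 0;
	c->offset = c->saved - model_sched_clock(c);
}

/* Simulate one suspend/resume cycle in which the counter becomes new_tsc. */
static uint64_t model_suspend_resume(struct model_clock *c, uint64_t new_tsc)
{
	model_save(c);
	c->tsc = new_tsc;
	model_restore(c);
	assert(model_sched_clock(c) == c->saved);
	return model_sched_clock(c);
}
```

Calling model_suspend_resume() with a counter value either smaller or larger than the one before suspend returns the same clock value that was saved, which is the continuity property the commit message describes.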




The errors after resume look like the one below:

cpi_ds_exec_end_op+0x8e/0x3cd
  [<ffffffff8121497d>] ? acpi_ps_parse_loop+0x7dd/0x96c
  [<ffffffff81213af7>] ? acpi_ps_parse_aml+0x8e/0x29a
  [<ffffffff8121512e>] ? acpi_ps_execute_method+0x1bf/0x28d
  [<ffffffff81210741>] ? acpi_ns_evaluate+0xdd/0x19a
  [<ffffffff812101f3>] ? acpi_evaluate_object+0x145/0x246
  [<ffffffff811f79b2>] ? acpi_os_signal_semaphore+0x23/0x27
  [<ffffffff811fa41e>] ? acpi_device_resume+0x0/0x2b
  [<ffffffff81222892>] ? acpi_battery_get_state+0x7f/0x121
  [<ffffffff812118c2>] ? acpi_get_handle+0x7b/0x99
  [<ffffffff81222b99>] ? acpi_battery_update+0x265/0x26e
  [<ffffffff81222c70>] ? acpi_battery_resume+0x25/0x2a
  [<ffffffff81295c8d>] ? legacy_resume+0x1e/0x55
  [<ffffffff81295d24>] ? device_resume+0x60/0xdd
  [<ffffffff811c2102>] ? kobject_get+0x12/0x17
  [<ffffffff812963e1>] ? dpm_resume_end+0xf2/0x349
  [<ffffffff8105c9a4>] ? suspend_devices_and_enter+0x15b/0x188
  [<ffffffff8105ca6a>] ? enter_state+0x99/0xcb
  [<ffffffff8105c2da>] ? state_store+0xb1/0xcf
  [<ffffffff810e9f0f>] ? sysfs_write_file+0xd6/0x112
  [<ffffffff810a2f82>] ? vfs_write+0xad/0x132
  [<ffffffff810a30bd>] ? sys_write+0x45/0x6e
  [<ffffffff81001f02>] ? system_call_fastpath+0x16/0x1b
BUG: scheduling while atomic: lid/2486/0x00000002
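
The thread overview lists a follow-up patch titled "x86, tsc: Fix a preemption leak in restore_sched_clock_state()". One plausible reading, offered here as an assumption rather than as a statement of what that patch contains, is that the get_cpu_var() call in the diff above has no matching put_cpu_var(): in the kernel, get_cpu_var() disables preemption and put_cpu_var() re-enables it. The sketch below is a userspace model of that pairing only. All names in it are hypothetical, and the counter merely imitates a preemption count.

```c
#include <assert.h>

/* Hypothetical model of a preemption counter; not kernel code. */
static int model_preempt_count;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0); /* an enable needs an earlier disable */
	model_preempt_count--;
}

/* Stands in for a per-CPU variable. */
static int model_slot;

/* Like get_cpu_var(): hands out the variable with preemption disabled. */
static int *model_get_var(void)
{
	model_preempt_disable();
	return &model_slot;
}

/* Like put_cpu_var(): the matching release. */
static void model_put_var(void)
{
	model_preempt_enable();
}

/* Unbalanced access: the count stays raised after the function returns. */
static void restore_leaky(void)
{
	*model_get_var() = 0;
}

/* Balanced access: the count is back to its old value on return. */
static void restore_balanced(void)
{
	*model_get_var() = 0;
	model_put_var();
}

/* In this model, sleeping is only legal when the count is zero. */
static int model_may_sleep(void)
{
	return model_preempt_count == 0;
}
```

In this model, model_may_sleep() still returns 1 after restore_balanced() but returns 0 after restore_leaky(), which is the kind of state in which a later attempt to schedule would be flagged as happening in atomic context.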



In short, to solve the resume problem, revert these two commits:
         drm/i915: Enable RC6 on Ironlake
                 ce17178094f368d9e3f39b2cb4303da5ed633dd4

         x86, tsc, sched: Recompute cyc2ns_offset's during resume ...
                 cd7240c0b900eb6d690ccee088a6c9b46dae815a



Thanks,
Jeff

* 2.6.35-rc4 / X201 issues
@ 2010-07-09 17:04 Nico Schottelius
  2010-07-09 17:29 ` Jesse Barnes
  0 siblings, 1 reply; 25+ messages in thread
From: Nico Schottelius @ 2010-07-09 17:04 UTC
  To: LKML; +Cc: Nico Schottelius


Good evening hackers,

latest report from 2.6.35-rc4 on X201:

  - (almost) every time I close and reopen the lid, the image is screwed up
    - switching to vt1 (away from xorg) and back to vt7 fixes that almost always
    - did not happen on 2.6.34 (iirc)
  - if it's not screwed up, the system is frozen
    - does not respond to ping, last image is displayed, no keyboard shortcut works
    - ATTENTION: it's *NOT* necessary to suspend directly beforehand to get a freeze!
      - simply closing & opening the lid is enough
  - on freeze, JFS screws up many (open?) files:
    - .viminfo contains stuff from other files / memory?
    - branch of git is damaged:
      [18:46] kr:ceofhack% cat .git/refs/heads/ceof-crypto
      --- !ruby/object:RDoc::RI::MethodDescript%
  - netdev notification for carrier change still broken (see previous reports)

The first two could probably also be xorg related, though in theory an application
shouldn't be able to freeze the kernel... (yeah, know that xorg is "special" here).

Cheers,

Nico

-- 
New PGP key: 7ED9 F7D3 6B10 81D7 0EC5  5C09 D7DC C8E4 3187 7DF0
Please resign, if you signed 9885188C or 8D0E27A4.

Currently moving *.schottelius.org to http://www.nico.schottelius.org/ ...



Thread overview: 25+ messages
2010-09-10  5:36 2.6.36-rc3 suspend issue (was: 2.6.35-rc4 / X201 issues) Jeff Chua
2010-09-10  7:48 ` Peter Zijlstra
2010-09-10 11:53   ` Jeff Chua
2010-09-10 12:05     ` Peter Zijlstra
2010-09-10 17:19       ` Jeff Chua
2010-09-10 20:32         ` [PATCH] x86, tsc: Fix a preemption leak in restore_sched_clock_state() Peter Zijlstra
2010-09-10 21:01           ` Suresh Siddha
2010-09-11  0:59             ` Jeff Chua
2010-09-11  7:49           ` [tip:sched/urgent] " tip-bot for Peter Zijlstra
2010-09-13  7:42             ` Nico Schottelius
2010-09-13  8:46               ` Ingo Molnar
2010-09-13 22:56                 ` Jeff Chua
2010-09-15  8:03                   ` Nico Schottelius
2010-09-28 11:59                     ` 2.6.36-rc3-00464-g84e1d83 resume issue (was: [tip:sched/urgent] x86, tsc: Fix a preemption leak in restore_sched_clock_state()) Nico Schottelius
  -- strict thread matches above, loose matches on Subject: below --
2010-07-09 17:04 2.6.35-rc4 / X201 issues Nico Schottelius
2010-07-09 17:29 ` Jesse Barnes
2010-09-06  7:17   ` 2.6.36-rc3 suspend issue (was: 2.6.35-rc4 / X201 issues) Nico Schottelius
2010-09-06  8:49     ` Nico Schottelius
2010-09-06 13:50     ` Jeff Chua
2010-09-06 18:37       ` Rafael J. Wysocki
2010-09-07  3:42         ` Jeff Chua
2010-09-07 21:48           ` Rafael J. Wysocki
2010-09-07 22:50             ` Nico Schottelius
2010-09-07 23:28               ` Rafael J. Wysocki
2010-09-08  6:21                 ` Nico Schottelius
2010-09-08 22:27                   ` Rafael J. Wysocki
2010-09-10  8:26                 ` Nico Schottelius
