All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1
@ 2010-09-15 10:38 Martin Kepplinger
  2010-09-15 11:14 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Kepplinger @ 2010-09-15 10:38 UTC (permalink / raw)
  To: tglx; +Cc: johnstul, damm, akpm, linux-kernel

Hi,
My Ubuntu 10.04, 32bit on a Lenovo G550 Laptop has the following issue since .35-rc1: After a short uptime doing little things like video playback, the system gets slow and stuck after some time. At first, compiz-animations and everything (video playback, terminal-commands) gets unusably slow. /var/log/messages comes up with the following _during_ video playback. These warnings stop when the video is stopped. The system stays (quite) unresponsive / unusable.

Sep 15 12:15:51 mobil pulseaudio[1456]: ratelimit.c: 19 events suppressed
Sep 15 12:15:57 mobil pulseaudio[1456]: ratelimit.c: 25 events suppressed
Sep 15 12:16:02 mobil pulseaudio[1456]: ratelimit.c: 6 events suppressed
Sep 15 12:16:07 mobil pulseaudio[1456]: ratelimit.c: 11 events suppressed
Sep 15 12:16:17 mobil pulseaudio[1456]: ratelimit.c: 12 events suppressed
Sep 15 12:16:23 mobil pulseaudio[1456]: ratelimit.c: 3 events suppressed

Shutdown also hangs very soon. The problem is still present in 2.6.36-rc3! I'm about to test -rc4. The last "good" one is 2.6.34.

I bisected the problem. Unfortunately I could not revert the patch. Additionally, I'm only 90% sure about the result because I had to skip one step of the bisect (but since the toplevel-Makefile showed "2.6.34-something", I could guess it to be "good")

The rest of this E-Mail is the result (git show). I can only report it. I'd happily test patches to it and hope this makes sense to someone of you.

thanks,
Martin

commit d7e81c269db899b800e0963dc4aceece1f82a680
Author: John Stultz <johnstul@us.ibm.com>
Date:   Fri May 7 18:07:38 2010 -0700

    clocksource: Add clocksource_register_hz/khz interface
    
    How to pick good mult/shift pairs has always been difficult to
    describe to folks writing clocksource drivers, since it requires
    careful tradeoffs in adjustment accuracy vs overflow limits.
    
    Now, with the clocks_calc_mult_shift function, its much
    easier. However, not many clocksources have converted to using that
    function, and there is still the issue of the max interval length
    assumption being made by each clocksource driver independently.
    
    So this patch simplifies the registration process by having
    clocksources be registered with a hz/khz value and the registration
    function taking care of setting mult/shift.
    
    This should take most of the confusion out of writing a clocksource
    driver.
    
    Additionally it also keeps the shift size tradeoff (more accuracy vs
    longer possible nohz times) centralized so the timekeeping core can
    keep track of the assumptions being made.
    
    [ tglx: Coding style and comments fixed ]
    
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    LKML-Reference: <1273280858-30143-1-git-send-email-johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4bca8b6..5ea3c60 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -273,7 +273,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mul
 }
 
 
-/* used to install a new clocksource */
 extern int clocksource_register(struct clocksource*);
 extern void clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
@@ -287,6 +286,24 @@ extern void clocksource_mark_unstable(struct clocksource *c
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
+/*
+ * Don't call __clocksource_register_scale directly, use
+ * clocksource_register_hz/khz
+ */
+extern int
+__clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
+
+static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
+{
+       return __clocksource_register_scale(cs, 1, hz);
+}
+
+static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
+{
+       return __clocksource_register_scale(cs, 1000, khz);
+}
+
+
 static inline void
 clocksource_calc_mult_shift(struct clocksource *cs, u32 freq, u32 minsec)
 {
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 1f5dde6..f08e99c 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -625,6 +625,54 @@ static void clocksource_enqueue(struct clocksource *cs)
        list_add(&cs->list, entry);
 }
 
+
+/*
+ * Maximum time we expect to go between ticks. This includes idle
+ * tickless time. It provides the trade off between selecting a
+ * mult/shift pair that is very precise but can only handle a short
+ * period of time, vs. a mult/shift pair that can handle long periods
+ * of time but isn't as precise.
+ *
+ * This is a subsystem constant, and actual hardware limitations
+ * may override it (ie: clocksources that wrap every 3 seconds).
+ */
+#define MAX_UPDATE_LENGTH 5 /* Seconds */
+
+/**
+ * __clocksource_register_scale - Used to install new clocksources
+ * @t:         clocksource to be registered
+ * @scale:     Scale factor multiplied against freq to get clocksource hz
+ * @freq:      clocksource frequency (cycles per second) divided by scale
+ *
+ * Returns -EBUSY if registration fails, zero otherwise.
+ *
+ * This *SHOULD NOT* be called directly! Please use the
+ * clocksource_register_hz() or clocksource_register_khz helper functions.
+ */
+int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
+{
+
+       /*
+        * Ideally we want to use  some of the limits used in
+        * clocksource_max_deferment, to provide a more informed
+        * MAX_UPDATE_LENGTH. But for now this just gets the
+        * register interface working properly.
+        */
+       clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
+                                     NSEC_PER_SEC/scale,
+                                     MAX_UPDATE_LENGTH*scale);
+       cs->max_idle_ns = clocksource_max_deferment(cs);
+
+       mutex_lock(&clocksource_mutex);
+       clocksource_enqueue(cs);
+       clocksource_select();
+       clocksource_enqueue_watchdog(cs);
+       mutex_unlock(&clocksource_mutex);
+       return 0;
+}
+EXPORT_SYMBOL_GPL(__clocksource_register_scale);
+
+
 /**
  * clocksource_register - Used to install new clocksources
  * @t:         clocksource to be registered

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1
  2010-09-15 10:38 [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1 Martin Kepplinger
@ 2010-09-15 11:14 ` Thomas Gleixner
  2010-09-15 14:58   ` Martin Kepplinger
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2010-09-15 11:14 UTC (permalink / raw)
  To: Martin Kepplinger; +Cc: johnstul, damm, akpm, linux-kernel

On Wed, 15 Sep 2010, Martin Kepplinger wrote:

> Hi,
> 
> My Ubuntu 10.04, 32bit on a Lenovo G550 Laptop has the following
> issue since .35-rc1: After a short uptime doing little things like
> video playback, the system gets slow and stuck after some time. At
> first, compiz-animations and everything (video playback,
> terminal-commands) gets unusably slow. /var/log/messages comes up
> with the following _during_ video playback. These warnings stop when
> the video is stopped. The system stays (quite) unresponsive /
> unusable.
> 
> Sep 15 12:15:51 mobil pulseaudio[1456]: ratelimit.c: 19 events suppressed
> Sep 15 12:15:57 mobil pulseaudio[1456]: ratelimit.c: 25 events suppressed
> Sep 15 12:16:02 mobil pulseaudio[1456]: ratelimit.c: 6 events suppressed
> Sep 15 12:16:07 mobil pulseaudio[1456]: ratelimit.c: 11 events suppressed
> Sep 15 12:16:17 mobil pulseaudio[1456]: ratelimit.c: 12 events suppressed
> Sep 15 12:16:23 mobil pulseaudio[1456]: ratelimit.c: 3 events suppressed
> 
> Shutdown also hangs very soon. The problem is still present in
> 2.6.36-rc3! I'm about to test -rc4. The last "good" one is 2.6.34.
> 
> I bisected the problem. Unfortunately I could not revert the
> patch. Additionally, I'm only 90% sure about the result because I
> had to skip one step of the bisect (but since the toplevel-Makefile
> showed "2.6.34-something", I could guess it to be "good")
> 
> The rest of this E-Mail is the result (git show). I can only report
> it. I'd happily test patches to it and hope this makes sense to
> someone of you.

I doubt that this is the real culprit. We just decoded a HPET problem,
which might be the real cause. Patch is below.

Can you provide the output of /proc/timer_list please ?

Thanks,

	tglx
---
commit 54ff7e595d763d894104d421b103a89f7becf47c
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Sep 14 22:10:21 2010 +0200

    x86: hpet: Work around hardware stupidity
    
    This more or less reverts commits 08be979 (x86: Force HPET
    readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
    read back to affected ATI chipsets) to the status of commit 8da854c
    (x86, hpet: Erratum workaround for read after write of HPET
    comparator).
    
    The delta to commit 8da854c is mostly comments and the change from
    WARN_ONCE to printk_once as we know the call path of this function
    already.
    
    This needs really in depth explanation:
    
    First of all the HPET design is a complete failure. Having a counter
    compare register which generates an interrupt on matching values
    forces the software to do at least one superfluous readback of the
    counter register.
    
    While it is nice in theory to program "absolute" time events it is
    practically useless because the timer runs at some absurd frequency
    which can never be matched to real world units. So we are forced to
    calculate a relative delta and this forces a readout of the actual
    counter value, adding the delta and programming the compare
    register. When the delta is small enough we run into the danger that
    we program a compare value which is already in the past. Due to the
    compare for equal nature of HPET we need to read back the counter
    value after writing the compare rehgister (btw. this is necessary for
    absolute timeouts as well) to make sure that we did not miss the timer
    event. We try to work around that by setting the minimum delta to a
    value which is larger than the theoretical time which elapses between
    the counter readout and the compare register write, but that's only
    true in theory. A NMI or SMI which hits between the readout and the
    write can easily push us beyond that limit. This would result in
    waiting for the next HPET timer interrupt until the 32bit wraparound
    of the counter happens which takes about 306 seconds.
    
    So we designed the next event function to look like:
    
       match = read_cnt() + delta;
       write_compare_ref(match);
       return read_cnt() < match ? 0 : -ETIME;
    
    At some point we got into trouble with certain ATI chipsets. Even the
    above "safe" procedure failed. The reason was that the write to the
    compare register was delayed probably for performance reasons. The
    theory was that they wanted to avoid the synchronization of the write
    with the HPET clock, which is understandable. So the write does not
    hit the compare register directly instead it goes to some intermediate
    register which is copied to the real compare register in sync with the
    HPET clock. That opens another window for hitting the dreaded "wait
    for a wraparound" problem.
    
    To work around that "optimization" we added a read back of the compare
    register which either enforced the update of the just written value or
    just delayed the readout of the counter enough to avoid the issue. We
    unfortunately never got any affirmative info from ATI/AMD about this.
    
    One thing is sure, that we nuked the performance "optimization" that
    way completely and I'm pretty sure that the result is worse than
    before some HW folks came up with those.
    
    Just for paranoia reasons I added a check whether the read back
    compare register value was the same as the value we wrote right
    before. That paranoia check triggered a couple of years after it was
    added on an Intel ICH9 chipset. Venki added a workaround (commit
    8da854c) which was reading the compare register twice when the first
    check failed. We considered this to be a penalty in general and
    restricted the readback (thus the wasted CPU cycles) to the known to
    be affected ATI chipsets.
    
    This turned out to be a utterly wrong decision. 2.6.35 testers
    experienced massive problems and finally one of them bisected it down
    to commit 30a564be which spured some further investigation.
    
    Finally we got confirmation that the write to the compare register can
    be delayed by up to two HPET clock cycles which explains the problems
    nicely. All we can do about this is to go back to Venki's initial
    workaround in a slightly modified version.
    
    Just for the record I need to say, that all of this could have been
    avoided if hardware designers and of course the HPET committee would
    have thought about the consequences for a split second. It's out of my
    comprehension why designing a working timer is so hard. There are two
    ways to achieve it:
    
     1) Use a counter wrap around aware compare_reg <= counter_reg
        implementation instead of the easy compare_reg == counter_reg
    
        Downsides:
    
    	- It needs more silicon.
    
    	- It needs a readout of the counter to apply a relative
    	  timeout. This is necessary as the counter does not run in
    	  any useful (and adjustable) frequency and there is no
    	  guarantee that the counter which is used for timer events is
    	  the same which is used for reading the actual time (and
    	  therefor for calculating the delta)
    
        Upsides:
    
    	- None
    
      2) Use a simple down counter for relative timer events
    
        Downsides:
    
    	- Absolute timeouts are not possible, which is not a problem
    	  at all in the context of an OS and the expected
    	  max. latencies/jitter (also see Downsides of #1)
    
       Upsides:
    
    	- It needs less or equal silicon.
    
    	- It works ALWAYS
    
    	- It is way faster than a compare register based solution (One
    	  write versus one write plus at least one and up to four
    	  reads)
    
    I would not be so grumpy about all of this, if I would not have been
    ignored for many years when pointing out these flaws to various
    hardware folks. I really hate timers (at least those which seem to be
    designed by janitors).
    
    Though finally we got a reasonable explanation plus a solution and I
    want to thank all the folks involved in chasing it down and providing
    valuable input to this.
    
    Bisected-by: Nix <nix@esperi.org.uk>
    Reported-by: Artur Skawina <art.08.09@gmail.com>
    Reported-by: Damien Wyart <damien.wyart@free.fr>
    Reported-by: John Drescher <drescherjm@gmail.com>
    Cc: Venkatesh Pallipadi <venki@google.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
    Cc: Borislav Petkov <borislav.petkov@amd.com>
    Cc: stable@kernel.org
    Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 004e6e2..1d5c08a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -68,7 +68,6 @@ extern unsigned long force_hpet_address;
 extern u8 hpet_blockid;
 extern int hpet_force_user;
 extern u8 hpet_msi_disable;
-extern u8 hpet_readback_cmp;
 extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index e5cc7e8..ebdb85c 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -18,7 +18,6 @@
 #include <asm/apic.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
-#include <asm/hpet.h>
 
 static void __init fix_hypertransport_config(int num, int slot, int func)
 {
@@ -192,21 +191,6 @@ static void __init ati_bugs_contd(int num, int slot, int func)
 }
 #endif
 
-/*
- * Force the read back of the CMP register in hpet_next_event()
- * to work around the problem that the CMP register write seems to be
- * delayed. See hpet_next_event() for details.
- *
- * We do this on all SMBUS incarnations for now until we have more
- * information about the affected chipsets.
- */
-static void __init ati_hpet_bugs(int num, int slot, int func)
-{
-#ifdef CONFIG_HPET_TIMER
-	hpet_readback_cmp = 1;
-#endif
-}
-
 #define QFLAG_APPLY_ONCE 	0x1
 #define QFLAG_APPLIED		0x2
 #define QFLAG_DONE		(QFLAG_APPLY_ONCE|QFLAG_APPLIED)
@@ -236,8 +220,6 @@ static struct chipset early_qrk[] __initdata = {
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
 	{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
-	{ PCI_VENDOR_ID_ATI, PCI_ANY_ID,
-	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_hpet_bugs },
 	{}
 };
 
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 351f9c0..410fdb3 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -35,7 +35,6 @@
 unsigned long				hpet_address;
 u8					hpet_blockid; /* OS timer block num */
 u8					hpet_msi_disable;
-u8					hpet_readback_cmp;
 
 #ifdef CONFIG_PCI_MSI
 static unsigned long			hpet_num_timers;
@@ -395,23 +394,27 @@ static int hpet_next_event(unsigned long delta,
 	 * at that point and we would wait for the next hpet interrupt
 	 * forever. We found out that reading the CMP register back
 	 * forces the transfer so we can rely on the comparison with
-	 * the counter register below.
+	 * the counter register below. If the read back from the
+	 * compare register does not match the value we programmed
+	 * then we might have a real hardware problem. We can not do
+	 * much about it here, but at least alert the user/admin with
+	 * a prominent warning.
 	 *
-	 * That works fine on those ATI chipsets, but on newer Intel
-	 * chipsets (ICH9...) this triggers due to an erratum: Reading
-	 * the comparator immediately following a write is returning
-	 * the old value.
+	 * An erratum on some chipsets (ICH9,..), results in
+	 * comparator read immediately following a write returning old
+	 * value. Workaround for this is to read this value second
+	 * time, when first read returns old value.
 	 *
-	 * We restrict the read back to the affected ATI chipsets (set
-	 * by quirks) and also run it with hpet=verbose for debugging
-	 * purposes.
+	 * In fact the write to the comparator register is delayed up
+	 * to two HPET cycles so the workaround we tried to restrict
+	 * the readback to those known to be borked ATI chipsets
+	 * failed miserably. So we give up on optimizations forever
+	 * and penalize all HPET incarnations unconditionally.
 	 */
-	if (hpet_readback_cmp || hpet_verbose) {
-		u32 cmp = hpet_readl(HPET_Tn_CMP(timer));
-
-		if (cmp != cnt)
+	if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
+		if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
 			printk_once(KERN_WARNING
-			    "hpet: compare register read back failed.\n");
+				"hpet: compare register read back failed.\n");
 	}
 
 	return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1
  2010-09-15 11:14 ` Thomas Gleixner
@ 2010-09-15 14:58   ` Martin Kepplinger
  2010-09-15 15:21     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Kepplinger @ 2010-09-15 14:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: johnstul, damm, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2327 bytes --]

Am 2010-09-15 13:14, schrieb Thomas Gleixner:
> On Wed, 15 Sep 2010, Martin Kepplinger wrote:
> 
>> Hi,
>>
>> My Ubuntu 10.04, 32bit on a Lenovo G550 Laptop has the following
>> issue since .35-rc1: After a short uptime doing little things like
>> video playback, the system gets slow and stuck after some time. At
>> first, compiz-animations and everything (video playback,
>> terminal-commands) gets unusably slow. /var/log/messages comes up
>> with the following _during_ video playback. These warnings stop when
>> the video is stopped. The system stays (quite) unresponsive /
>> unusable.
>>
>> Sep 15 12:15:51 mobil pulseaudio[1456]: ratelimit.c: 19 events suppressed
>> Sep 15 12:15:57 mobil pulseaudio[1456]: ratelimit.c: 25 events suppressed
>> Sep 15 12:16:02 mobil pulseaudio[1456]: ratelimit.c: 6 events suppressed
>> Sep 15 12:16:07 mobil pulseaudio[1456]: ratelimit.c: 11 events suppressed
>> Sep 15 12:16:17 mobil pulseaudio[1456]: ratelimit.c: 12 events suppressed
>> Sep 15 12:16:23 mobil pulseaudio[1456]: ratelimit.c: 3 events suppressed
>>
>> Shutdown also hangs very soon. The problem is still present in
>> 2.6.36-rc3! I'm about to test -rc4. The last "good" one is 2.6.34.
>>
>> I bisected the problem. Unfortunately I could not revert the
>> patch. Additionally, I'm only 90% sure about the result because I
>> had to skip one step of the bisect (but since the toplevel-Makefile
>> showed "2.6.34-something", I could guess it to be "good")
>>
>> The rest of this E-Mail is the result (git show). I can only report
>> it. I'd happily test patches to it and hope this makes sense to
>> someone of you.
> 
> I doubt that this is the real culprit. We just decoded a HPET problem,
> which might be the real cause. Patch is below.
> 
> Can you provide the output of /proc/timer_list please ?
> 
> Thanks,
> 
> 	tglx
> ---
> commit 54ff7e595d763d894104d421b103a89f7becf47c
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Tue Sep 14 22:10:21 2010 +0200
> 
>     x86: hpet: Work around hardware stupidity
>     

I tested this against 2.6.35. Since I can write this after watching half an
hour of LinuxCon Videos, this fixes the problem. I hope it to get merged soon.
I'll test it against the current tree as well and will stay happy and quiet as
long as I can't find a problem.

Thanks a lot!

	Martin

[-- Attachment #2: timer_list.dump --]
[-- Type: text/plain, Size: 9953 bytes --]

Timer List Version: v0.6
HRTIMER_MAX_CLOCK_BASES: 2
now at 48763215587 nsecs

cpu: 0
 clock 0:
  .base:       c78042c4
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1284553359316792381 nsecs
active timers:
 clock 1:
  .base:       c78042f0
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c7804380>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0
 # expires at 48764000000-48764000000 nsecs [in 784413 to 784413 nsecs]
 #1: <f687db64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gedit/1508
 # expires at 48770791462-48770841462 nsecs [in 7575875 to 7625875 nsecs]
 #2: <c2117b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nautilus/1416
 # expires at 48770851246-48770901246 nsecs [in 7635659 to 7685659 nsecs]
 #3: <c1f0df40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gvfs-afc-volume/1454
 # expires at 48916775057-48916825057 nsecs [in 153559470 to 153609470 nsecs]
 #4: <c2217b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, empathy/1408
 # expires at 49186991118-49188951116 nsecs [in 423775531 to 425735529 nsecs]
 #5: <c2359acc>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, Xorg/996
 # expires at 49257833999-49258333997 nsecs [in 494618412 to 495118410 nsecs]
 #6: <c1fc9e9c>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, indicator-apple/1481
 # expires at 49692030085-49692080085 nsecs [in 928814498 to 928864498 nsecs]
 #7: <c1f53e9c>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nautilus/1467
 # expires at 50120922876-50120972876 nsecs [in 1357707289 to 1357757289 nsecs]
 #8: <c1f9db64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, indicator-apple/1474
 # expires at 50186662984-50189457982 nsecs [in 1423447397 to 1426242395 nsecs]
 #9: <c2241b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-panel/1413
 # expires at 50187007949-50189971947 nsecs [in 1423792362 to 1426756360 nsecs]
 #10: <c1f9bb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, clock-applet/1473
 # expires at 50187184403-50190150401 nsecs [in 1423968816 to 1426934814 nsecs]
 #11: <c20efb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, udisks-daemon/1433
 # expires at 50684122950-50686110948 nsecs [in 1920907363 to 1922895361 nsecs]
 #12: <c1e85b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rtkit-daemon/1406
 # expires at 51892055563-51897055561 nsecs [in 3128839976 to 3133839974 nsecs]
 #13: <c2243b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-power-man/1414
 # expires at 52186614186-52191578184 nsecs [in 3423398599 to 3428362597 nsecs]
 #14: <c1df3b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gconfd-2/1376
 # expires at 61187451824-61201413821 nsecs [in 12424236237 to 12438198234 nsecs]
 #15: <c1dc7b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-session/1335
 # expires at 62187405694-62202365692 nsecs [in 13424190107 to 13439150105 nsecs]
 #16: <c20edb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, udisks-daemon/1432
 # expires at 62683451247-62698908245 nsecs [in 13920235660 to 13935692658 nsecs]
 #17: <c1e87f40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, sleep/1235
 # expires at 83560438223-83560488223 nsecs [in 34797222636 to 34797272636 nsecs]
 #18: <c1fd3b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, indicator-appli/1486
 # expires at 94187527252-94234493250 nsecs [in 45424311665 to 45471277663 nsecs]
 #19: <c1fd1b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gvfsd-metadata/1485
 # expires at 94187214347-94246851345 nsecs [in 45423998760 to 45483635758 nsecs]
 #20: <c223bf40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gdm-session-wor/1301
 # expires at 330478636536-330478686536 nsecs [in 281715420949 to 281715470949 nsecs]
 #21: <c20e1b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nm-applet/1411
 # expires at 332187153847-332287153847 nsecs [in 283423938260 to 283523938260 nsecs]
 #22: <c1f51b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-screensav/1506
 # expires at 637404954416-637504954416 nsecs [in 588641738829 to 588741738829 nsecs]
  .expires_next   : 48764000000 nsecs
  .hres_active    : 1
  .nr_events      : 8949
  .nr_retries     : 1
  .nr_hangs       : 0
  .max_hang_time  : 0 nsecs
  .nohz_mode      : 2
  .idle_tick      : 48708000000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4294904473
  .idle_calls     : 18731
  .idle_sleeps    : 5026
  .idle_entrytime : 48761860185 nsecs
  .idle_waketime  : 48757284701 nsecs
  .idle_exittime  : 48757313126 nsecs
  .idle_sleeptime : 36258257723 nsecs
  .iowait_sleeptime: 10879152787 nsecs
  .last_jiffies   : 4294904486
  .next_jiffies   : 4294904487
  .idle_expires   : 48760000000 nsecs
jiffies: 4294904486

cpu: 1
 clock 0:
  .base:       c7a042c4
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     1284553359316792381 nsecs
active timers:
 clock 1:
  .base:       c7a042f0
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <c7a04380>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0
 # expires at 48765000000-48765000000 nsecs [in 1784413 to 1784413 nsecs]
 #1: <f68adb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-terminal/1517
 # expires at 48780988348-48781038348 nsecs [in 17772761 to 17822761 nsecs]
 #2: <c2199f40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, syndaemon/1426
 # expires at 48789418137-48789468137 nsecs [in 26202550 to 26252550 nsecs]
 #3: <c1effb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, indicator-sessi/1495
 # expires at 48988676165-48990277163 nsecs [in 225460578 to 227061576 nsecs]
 #4: <c1df7acc>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gpsd/1108
 # expires at 49660673314-49661673312 nsecs [in 897457727 to 898457725 nsecs]
 #5: <f68ffb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-user-shar/1542
 # expires at 50187366844-50190321842 nsecs [in 1424151257 to 1427106255 nsecs]
 #6: <c2283acc>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rsyslogd/774
 # expires at 50857704507-50887704504 nsecs [in 2094488920 to 2124488917 nsecs]
 #7: <c1e83acc>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, ssh-agent/1369
 # expires at 50952443949-50962443947 nsecs [in 2189228362 to 2199228360 nsecs]
 #8: <c1de1b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-settings-/1383
 # expires at 52185885821-52189313819 nsecs [in 3422670234 to 3426098232 nsecs]
 #9: <c1df5b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, compiz/1394
 # expires at 52757945535-52761945533 nsecs [in 3994729948 to 3998729946 nsecs]
 #10: <c1df1acc>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, winbindd/1099
 # expires at 52994025916-53024025914 nsecs [in 4230810329 to 4260810327 nsecs]
 #11: <f684de9c>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gedit/1509
 # expires at 55496508225-55496558225 nsecs [in 6733292638 to 6733342638 nsecs]
 #12: <c1e89b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rtkit-daemon/1407
 # expires at 56892083151-56892083151 nsecs [in 8128867564 to 8128867564 nsecs]
 #13: <c1eafb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, upowerd/1246
 # expires at 59683477647-59695934645 nsecs [in 10920262060 to 10932719058 nsecs]
 #14: <c1fcbe9c>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, indicator-apple/1490
 # expires at 62890059877-62890109877 nsecs [in 14126844290 to 14126894290 nsecs]
 #15: <c1efbb64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-clock-app/1498
 # expires at 64682362549-64699818547 nsecs [in 15919146962 to 15936602960 nsecs]
 #16: <c2331f40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, cron/961
 # expires at 82143942281-82143992281 nsecs [in 33380726694 to 33380776694 nsecs]
 #17: <c2251e9c>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rsyslogd/786
 # expires at 92488203208-92488253208 nsecs [in 43724987621 to 43725037621 nsecs]
 #18: <c22b3b64>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, polipo/967
 # expires at 1521115124444-1521215124444 nsecs [in 1472351908857 to 1472451908857 nsecs]
 #19: <c1d19f40>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, atd/962
 # expires at 3621115433141-3621115483141 nsecs [in 3572352217554 to 3572352267554 nsecs]
  .expires_next   : 48765000000 nsecs
  .hres_active    : 1
  .nr_events      : 8218
  .nr_retries     : 1
  .nr_hangs       : 0
  .max_hang_time  : 0 nsecs
  .nohz_mode      : 2
  .idle_tick      : 48709000000 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 4294904473
  .idle_calls     : 16954
  .idle_sleeps    : 3813
  .idle_entrytime : 48761516638 nsecs
  .idle_waketime  : 48756053407 nsecs
  .idle_exittime  : 48757476973 nsecs
  .idle_sleeptime : 37238099743 nsecs
  .iowait_sleeptime: 7873142111 nsecs
  .last_jiffies   : 4294904486
  .next_jiffies   : 4294904487
  .idle_expires   : 48756000000 nsecs
jiffies: 4294904486


Tick Device: mode:     1
Broadcast device
Clock Event Device: hpet
 max_delta_ns:   149983005959
 min_delta_ns:   5000
 mult:           61496114
 shift:          32
 mode:           3
 next_event:     48764000000 nsecs
 set_next_event: hpet_legacy_next_event
 set_mode:       hpet_legacy_set_mode
 event_handler:  tick_handle_oneshot_broadcast
 retries:        1706
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000001


Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: lapic
 max_delta_ns:   672747792
 min_delta_ns:   1202
 mult:           53554680
 shift:          32
 mode:           1
 next_event:     48764000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        1917

Tick Device: mode:     1
Per CPU device: 1
Clock Event Device: lapic
 max_delta_ns:   672747792
 min_delta_ns:   1202
 mult:           53554680
 shift:          32
 mode:           3
 next_event:     48765000000 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  hrtimer_interrupt
 retries:        1780


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1
  2010-09-15 14:58   ` Martin Kepplinger
@ 2010-09-15 15:21     ` Thomas Gleixner
  2010-09-16  9:53       ` Martin Kepplinger
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2010-09-15 15:21 UTC (permalink / raw)
  To: Martin Kepplinger; +Cc: johnstul, damm, akpm, linux-kernel

Martin,

On Wed, 15 Sep 2010, Martin Kepplinger wrote:
> Am 2010-09-15 13:14, schrieb Thomas Gleixner:
> > commit 54ff7e595d763d894104d421b103a89f7becf47c
> > Author: Thomas Gleixner <tglx@linutronix.de>
> > Date:   Tue Sep 14 22:10:21 2010 +0200
> > 
> >     x86: hpet: Work around hardware stupidity
> >     
> 
> I tested this against 2.6.35. Since I can write this after watching half an
> hour of LinuxCon Videos, this fixes the problem. I hope it to get merged soon.
> I'll test it against the current tree as well and will stay happy and quiet as
> long as I can't find a problem.
> 
> Thanks a lot!

Thanks for testing! Could you please test the patch below on top of
this one as well ?

Thanks,

	tglx
---
Subject: x86: hpet: Avoid the readback penalty
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 15 Sep 2010 14:32:17 +0200

Due to the overly intelligent design of HPETs, we need to workaround
the problem that the compare value which we write is already behind
the actual counter value at the point where the value hits the real
compare register. This happens for two reasons:

1) We read out the counter, add the delta and write the result to the
   compare register. When a NMI or SMI hits between the read out and
   the write then the counter can be ahead of the event already

2) The write to the compare register is delayed by up to two HPET
   cycles in certain chipsets.

We worked around this by reading back the compare register to make
sure that the written value has hit the hardware. For certain ICH9+
chipsets this can require two readouts, as the first one can return
the previous compare register value. That's bad performance wise for
the normal case where the event is far enough in the future.

As we already know that the write can be delayed by up to two cycles
we can avoid the read back of the compare register completely if we
make the decision whether the delta has elapsed already or not based
on the following calculation:

  cmp = event - actual_count;

If cmp is less than 8 HPET clock cycles, then we decide that the event
has happened already and return -ETIME. That covers the above #1 and
#2 problems which can cause a wait for HPET wraparound (~306 seconds).

Works nice on my affected ATI system.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/hpet.c |   51 ++++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 30 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/hpet.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/hpet.c
+++ linux-2.6-tip/arch/x86/kernel/hpet.c
@@ -380,44 +380,35 @@ static int hpet_next_event(unsigned long
 			   struct clock_event_device *evt, int timer)
 {
 	u32 cnt;
+	s32 res;
 
 	cnt = hpet_readl(HPET_COUNTER);
 	cnt += (u32) delta;
 	hpet_writel(cnt, HPET_Tn_CMP(timer));
 
 	/*
-	 * We need to read back the CMP register on certain HPET
-	 * implementations (ATI chipsets) which seem to delay the
-	 * transfer of the compare register into the internal compare
-	 * logic. With small deltas this might actually be too late as
-	 * the counter could already be higher than the compare value
-	 * at that point and we would wait for the next hpet interrupt
-	 * forever. We found out that reading the CMP register back
-	 * forces the transfer so we can rely on the comparison with
-	 * the counter register below. If the read back from the
-	 * compare register does not match the value we programmed
-	 * then we might have a real hardware problem. We can not do
-	 * much about it here, but at least alert the user/admin with
-	 * a prominent warning.
-	 *
-	 * An erratum on some chipsets (ICH9,..), results in
-	 * comparator read immediately following a write returning old
-	 * value. Workaround for this is to read this value second
-	 * time, when first read returns old value.
-	 *
-	 * In fact the write to the comparator register is delayed up
-	 * to two HPET cycles so the workaround we tried to restrict
-	 * the readback to those known to be borked ATI chipsets
-	 * failed miserably. So we give up on optimizations forever
-	 * and penalize all HPET incarnations unconditionally.
+	 * HPETs are a complete disaster. The compare register is
+	 * based on a equal comparison and does provide a less than or
+	 * equal functionality (which would require to take the
+	 * wraparound into account) and it does not provide a simple
+	 * count down event mode. Further the write to the comparator
+	 * register is delayed internaly up to two HPET clock cycles
+	 * in certain chipsets (ATI, ICH9,10). We worked around that
+	 * by reading back the compare register, but that required
+	 * another workaround for ICH9,10 chips where the first
+	 * readout after write can return the old stale value. We
+	 * already have a minimum delta of 5us enforced, but a NMI or
+	 * SMI hitting between the counter readout and the comparator
+	 * write can move us behind that point easily. Now instead of
+	 * reading the compare register back several times, we make
+	 * the ETIME decision based on the following: Return ETIME if
+	 * the counter value after the write is less than 8 HPET
+	 * cycles away from the event or if the counter is already
+	 * ahead of the event.
 	 */
-	if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
-		if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
-			printk_once(KERN_WARNING
-				"hpet: compare register read back failed.\n");
-	}
+	res = (s32)(cnt - hpet_readl(HPET_COUNTER));
 
-	return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
+	return res < 8 ? -ETIME : 0;
 }
 
 static void hpet_legacy_set_mode(enum clock_event_mode mode,

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1
  2010-09-15 15:21     ` Thomas Gleixner
@ 2010-09-16  9:53       ` Martin Kepplinger
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Kepplinger @ 2010-09-16  9:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: johnstul, damm, akpm, linux-kernel

Am 2010-09-15 17:21, schrieb Thomas Gleixner:
> Martin,
> 
> On Wed, 15 Sep 2010, Martin Kepplinger wrote:
>> Am 2010-09-15 13:14, schrieb Thomas Gleixner:
>>> commit 54ff7e595d763d894104d421b103a89f7becf47c
>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>> Date:   Tue Sep 14 22:10:21 2010 +0200
>>>
>>>     x86: hpet: Work around hardware stupidity
>>>     
>>
>> I tested this against 2.6.35. Since I can write this after watching half an
>> hour of LinuxCon Videos, this fixes the problem. I hope it to get merged soon.
>> I'll test it against the current tree as well and will stay happy and quiet as
>> long as I can't find a problem.
>>
>> Thanks a lot!
> 
> Thanks for testing! Could you please test the patch below on top of
> this one as well ?
> 
> Thanks,
> 
> 	tglx
> ---
> Subject: x86: hpet: Avoid the readback penalty
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 15 Sep 2010 14:32:17 +0200
> 

Both patches tested against 2.6.36. My system seems very stable.

Thank you,

	Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-09-16  9:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-15 10:38 [BUG] [BISECTED] System gets unresponsive since 2.6.35-rc1 Martin Kepplinger
2010-09-15 11:14 ` Thomas Gleixner
2010-09-15 14:58   ` Martin Kepplinger
2010-09-15 15:21     ` Thomas Gleixner
2010-09-16  9:53       ` Martin Kepplinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.