* Re: [tip:x86/urgent] x86: hpet: Work around hardware stupidity
       [not found] <tip-54ff7e595d763d894104d421b103a89f7becf47c@git.kernel.org>
@ 2010-09-14 23:36 ` Venkatesh Pallipadi
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
  1 sibling, 0 replies; 13+ messages in thread
From: Venkatesh Pallipadi @ 2010-09-14 23:36 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, arjan, andreas.herrmann3, drescherjm,
	art.08.09, damien.wyart, suresh.b.siddha, tglx, nix, mingo,
	borislav.petkov, venki
  Cc: linux-tip-commits

On Tue, Sep 14, 2010 at 4:10 PM, tip-bot for Thomas Gleixner
<tglx@linutronix.de> wrote:
> Commit-ID:  54ff7e595d763d894104d421b103a89f7becf47c
> Gitweb:     http://git.kernel.org/tip/54ff7e595d763d894104d421b103a89f7becf47c
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Tue, 14 Sep 2010 22:10:21 +0200
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Wed, 15 Sep 2010 00:55:13 +0200
>
> x86: hpet: Work around hardware stupidity
>
> This more or less reverts commits 08be979 (x86: Force HPET
> readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
> read back to affected ATI chipsets) to the status of commit 8da854c
> (x86, hpet: Erratum workaround for read after write of HPET
> comparator).
>
> The delta to commit 8da854c is mostly comments and the change from
> WARN_ONCE to printk_once as we know the call path of this function
> already.
>
> This really needs an in-depth explanation:
>
> First of all the HPET design is a complete failure. Having a counter
> compare register which generates an interrupt on matching values
> forces the software to do at least one superfluous readback of the
> counter register.
>
> While it is nice in theory to program "absolute" time events it is
> practically useless because the timer runs at some absurd frequency
> which can never be matched to real world units. So we are forced to
> calculate a relative delta and this forces a readout of the actual
> counter value, adding the delta and programming the compare
> register. When the delta is small enough we run into the danger that
> we program a compare value which is already in the past. Due to the
> compare for equal nature of HPET we need to read back the counter
> value after writing the compare register (btw. this is necessary for
> absolute timeouts as well) to make sure that we did not miss the timer
> event. We try to work around that by setting the minimum delta to a
> value which is larger than the theoretical time which elapses between
> the counter readout and the compare register write, but that's only
> true in theory. An NMI or SMI which hits between the readout and the
> write can easily push us beyond that limit. This would result in
> waiting for the next HPET timer interrupt until the 32bit wraparound
> of the counter happens which takes about 306 seconds.
>
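The "about 306 seconds" above is simply the time a free-running 32-bit counter needs to wrap: 2^32 / f. The thread does not say at what rate the affected HPETs run, so this small C sketch takes the frequency as a parameter (the helper name is hypothetical). About 14.04 MHz gives roughly 306 s; the 14.31818 MHz rate commonly seen on HPETs gives roughly 300 s.

```c
/*
 * Hypothetical helper: seconds until a free-running 32-bit counter
 * ticking at freq_hz wraps back around to the same value.
 */
static double hpet_wrap_seconds(double freq_hz)
{
	return 4294967296.0 / freq_hz;	/* 2^32 ticks per full wrap */
}
```

Either way, a missed equal-compare match costs on the order of five minutes of silence from that timer.
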
> So we designed the next event function to look like:
>
>   match = read_cnt() + delta;
>   write_compare_reg(match);
>   return read_cnt() < match ? 0 : -ETIME;
>
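The `read_cnt() < match` test in the pseudocode is only a sketch: once `match` wraps past 2^32, a plain unsigned comparison gives the wrong answer. A minimal C sketch, assuming a free-running 32-bit up-counter (both helper names are hypothetical), contrasts it with the signed-difference form the kernel's `hpet_next_event()` actually uses, `(s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0`:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers: both answer "is the event at 'match' still in
 * the future of counter value 'now'?" for a 32-bit up-counter.
 */

/* Plain unsigned comparison, as in the pseudocode. Wrong across a wrap. */
static bool event_pending_naive(uint32_t now, uint32_t match)
{
	return now < match;
}

/*
 * Signed-difference comparison: treat (now - match) modulo 2^32 as a
 * signed value. Correct as long as the two values are less than 2^31
 * ticks apart. (Converting an out-of-range uint32_t to int32_t is
 * implementation-defined before C23; GCC and Clang wrap modulo 2^32.)
 */
static bool event_pending_wrapsafe(uint32_t now, uint32_t match)
{
	return (int32_t)(now - match) < 0;
}
```

For example, with `match = 0xFFFFFFF8` and the counter already wrapped to `now = 0x4`, the naive test still reports the event as pending (that is, "wait for the interrupt", which now means waiting for the next wraparound), whereas the signed form correctly reports that the target has passed.
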
> At some point we got into trouble with certain ATI chipsets. Even the
> above "safe" procedure failed. The reason was that the write to the
> compare register was delayed probably for performance reasons. The
> theory was that they wanted to avoid the synchronization of the write
> with the HPET clock, which is understandable. So the write does not
> hit the compare register directly; instead it goes to some intermediate
> register which is copied to the real compare register in sync with the
> HPET clock. That opens another window for hitting the dreaded "wait
> for a wraparound" problem.
>
> To work around that "optimization" we added a read back of the compare
> register which either enforced the update of the just written value or
> just delayed the readout of the counter enough to avoid the issue. We
> unfortunately never got any affirmative info from ATI/AMD about this.
>
> One thing is sure: we nuked the performance "optimization" that
> way completely and I'm pretty sure that the result is worse than
> before some HW folks came up with those.
>
> Just for paranoia reasons I added a check whether the read back
> compare register value was the same as the value we wrote right
> before. That paranoia check triggered a couple of years after it was
> added on an Intel ICH9 chipset. Venki added a workaround (commit
> 8da854c) which was reading the compare register twice when the first
> check failed. We considered this to be a penalty in general and
> restricted the readback (thus the wasted CPU cycles) to the known to
> be affected ATI chipsets.
>
> This turned out to be an utterly wrong decision. 2.6.35 testers
> experienced massive problems and finally one of them bisected it down
> to commit 30a564be which spurred some further investigation.
>
> Finally we got confirmation that the write to the compare register can
> be delayed by up to two HPET clock cycles which explains the problems
> nicely. All we can do about this is to go back to Venki's initial
> workaround in a slightly modified version.
>
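The delayed-write problem can be reproduced in a toy model. The C sketch below is a hypothetical software model, not a description of any real chipset: a write to the compare register is staged and only transferred to the comparator `delay` ticks later, and the event fires only when the counter equals the transferred value. Right after the write, the kernel's check `(s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0` sees a counter that has not yet reached `cnt` and reports success, yet with a 1-tick delta and a 2-tick delay the model never fires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one comparator with a delayed register write. */
struct sim_hpet {
	uint32_t counter;	/* main counter, +1 per tick */
	uint32_t cmp;		/* value the comparator actually uses */
	uint32_t staged_cmp;	/* written value not yet transferred */
	int latch_in;		/* ticks until transfer; -1 = nothing staged */
	bool fired;		/* set when counter == cmp after a tick */
};

static void sim_write_cmp(struct sim_hpet *h, uint32_t val, int delay)
{
	if (delay <= 0) {
		h->cmp = val;
		h->latch_in = -1;
	} else {
		h->staged_cmp = val;
		h->latch_in = delay;
	}
}

static void sim_tick(struct sim_hpet *h)
{
	h->counter++;
	if (h->latch_in > 0 && --h->latch_in == 0) {
		h->cmp = h->staged_cmp;	/* intermediate register copied over */
		h->latch_in = -1;
	}
	if (h->counter == h->cmp)
		h->fired = true;
}

/*
 * Program an event 'delta' ticks after 'start', run 'ticks' ticks and
 * report whether it fired. The stale compare value (start - 1) is
 * already in the past, so it cannot match again until a wraparound.
 */
static bool sim_event_fires(uint32_t start, uint32_t delta, int delay,
			    int ticks)
{
	struct sim_hpet h = { .counter = start, .cmp = start - 1,
			      .latch_in = -1, .fired = false };

	sim_write_cmp(&h, start + delta, delay);
	for (int i = 0; i < ticks; i++)
		sim_tick(&h);
	return h.fired;
}
```

In this model `sim_event_fires(1000, 1, 2, 100)` is false while the same call with `delay = 0` is true. Whether an event exactly `delay` ticks away survives depends on when the transfer happens relative to the comparison, a detail the model simply assumes; that uncertainty is one reason to leave a margin well beyond two cycles.
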
> Just for the record I need to say, that all of this could have been
> avoided if hardware designers and of course the HPET committee would
> have thought about the consequences for a split second. It's beyond my
> comprehension why designing a working timer is so hard. There are two
> ways to achieve it:
>
>  1) Use a counter wrap around aware compare_reg <= counter_reg
>    implementation instead of the easy compare_reg == counter_reg
>
>    Downsides:
>
>        - It needs more silicon.
>
>        - It needs a readout of the counter to apply a relative
>          timeout. This is necessary as the counter does not run in
>          any useful (and adjustable) frequency and there is no
>          guarantee that the counter which is used for timer events is
>          the same which is used for reading the actual time (and
>          therefore for calculating the delta)
>
>    Upsides:
>
>        - None
>
>  2) Use a simple down counter for relative timer events
>
>    Downsides:
>
>        - Absolute timeouts are not possible, which is not a problem
>          at all in the context of an OS and the expected
>          max. latencies/jitter (also see Downsides of #1)
>
>    Upsides:
>
>        - It needs less or equal silicon.
>
>        - It works ALWAYS
>
>        - It is way faster than a compare register based solution (One
>          write versus one write plus at least one and up to four
>          reads)
>
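Option 2 is easy to model. In the hypothetical C sketch below (all names invented for illustration), arming the timer is a single write of the relative delta; there is no counter readout, no compare register and no readback. An NMI or SMI that delays the write only makes the event late; it cannot turn it into a wait for a wraparound. A delta of 0 or 1 simply expires on the next tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a one-shot down counter (option 2). */
struct sim_downcounter {
	uint32_t remaining;	/* ticks left until expiry */
	bool armed;
	bool fired;
};

/* Arming is one write of the relative delta. */
static void dc_program(struct sim_downcounter *t, uint32_t delta)
{
	t->remaining = delta;
	t->armed = true;
	t->fired = false;
}

static void dc_tick(struct sim_downcounter *t)
{
	if (!t->armed)
		return;
	if (t->remaining <= 1) {	/* expires on this tick */
		t->remaining = 0;
		t->armed = false;
		t->fired = true;
	} else {
		t->remaining--;
	}
}
```

Programming a delta of 3 and ticking three times sets `fired` on the third tick, regardless of what any free-running counter happened to read at the time of the write.
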
> I would not be so grumpy about all of this if I had not been
> ignored for many years when pointing out these flaws to various
> hardware folks. I really hate timers (at least those which seem to be
> designed by janitors).
>
> Though finally we got a reasonable explanation plus a solution and I
> want to thank all the folks involved in chasing it down and providing
> valuable input to this.
>
> Bisected-by: Nix <nix@esperi.org.uk>
> Reported-by: Artur Skawina <art.08.09@gmail.com>
> Reported-by: Damien Wyart <damien.wyart@free.fr>
> Reported-by: John Drescher <drescherjm@gmail.com>
> Cc: Venkatesh Pallipadi <venki@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
> Cc: Borislav Petkov <borislav.petkov@amd.com>
> Cc: stable@kernel.org
> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/hpet.h    |    1 -
>  arch/x86/kernel/early-quirks.c |   18 ------------------
>  arch/x86/kernel/hpet.c         |   31 +++++++++++++++++--------------
>  3 files changed, 17 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> index 004e6e2..1d5c08a 100644
> --- a/arch/x86/include/asm/hpet.h
> +++ b/arch/x86/include/asm/hpet.h
> @@ -68,7 +68,6 @@ extern unsigned long force_hpet_address;
>  extern u8 hpet_blockid;
>  extern int hpet_force_user;
>  extern u8 hpet_msi_disable;
> -extern u8 hpet_readback_cmp;
>  extern int is_hpet_enabled(void);
>  extern int hpet_enable(void);
>  extern void hpet_disable(void);
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index e5cc7e8..ebdb85c 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -18,7 +18,6 @@
>  #include <asm/apic.h>
>  #include <asm/iommu.h>
>  #include <asm/gart.h>
> -#include <asm/hpet.h>
>
>  static void __init fix_hypertransport_config(int num, int slot, int func)
>  {
> @@ -192,21 +191,6 @@ static void __init ati_bugs_contd(int num, int slot, int func)
>  }
>  #endif
>
> -/*
> - * Force the read back of the CMP register in hpet_next_event()
> - * to work around the problem that the CMP register write seems to be
> - * delayed. See hpet_next_event() for details.
> - *
> - * We do this on all SMBUS incarnations for now until we have more
> - * information about the affected chipsets.
> - */
> -static void __init ati_hpet_bugs(int num, int slot, int func)
> -{
> -#ifdef CONFIG_HPET_TIMER
> -       hpet_readback_cmp = 1;
> -#endif
> -}
> -
>  #define QFLAG_APPLY_ONCE       0x1
>  #define QFLAG_APPLIED          0x2
>  #define QFLAG_DONE             (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
> @@ -236,8 +220,6 @@ static struct chipset early_qrk[] __initdata = {
>          PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
>        { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>          PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
> -       { PCI_VENDOR_ID_ATI, PCI_ANY_ID,
> -         PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_hpet_bugs },
>        {}
>  };
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 351f9c0..410fdb3 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -35,7 +35,6 @@
>  unsigned long                          hpet_address;
>  u8                                     hpet_blockid; /* OS timer block num */
>  u8                                     hpet_msi_disable;
> -u8                                     hpet_readback_cmp;
>
>  #ifdef CONFIG_PCI_MSI
>  static unsigned long                   hpet_num_timers;
> @@ -395,23 +394,27 @@ static int hpet_next_event(unsigned long delta,
>         * at that point and we would wait for the next hpet interrupt
>         * forever. We found out that reading the CMP register back
>         * forces the transfer so we can rely on the comparison with
> -        * the counter register below.
> +        * the counter register below. If the read back from the
> +        * compare register does not match the value we programmed
> +        * then we might have a real hardware problem. We can not do
> +        * much about it here, but at least alert the user/admin with
> +        * a prominent warning.
>         *
> -        * That works fine on those ATI chipsets, but on newer Intel
> -        * chipsets (ICH9...) this triggers due to an erratum: Reading
> -        * the comparator immediately following a write is returning
> -        * the old value.
> +        * An erratum on some chipsets (ICH9,..), results in
> +        * comparator read immediately following a write returning old
> +        * value. Workaround for this is to read this value second
> +        * time, when first read returns old value.
>         *
> -        * We restrict the read back to the affected ATI chipsets (set
> -        * by quirks) and also run it with hpet=verbose for debugging
> -        * purposes.
> +        * In fact the write to the comparator register is delayed up
> +        * to two HPET cycles so the workaround we tried to restrict
> +        * the readback to those known to be borked ATI chipsets
> +        * failed miserably. So we give up on optimizations forever
> +        * and penalize all HPET incarnations unconditionally.
>         */
> -       if (hpet_readback_cmp || hpet_verbose) {
> -               u32 cmp = hpet_readl(HPET_Tn_CMP(timer));
> -
> -               if (cmp != cnt)
> +       if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
> +               if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)

Minor nit.
I guess (u32) in first check above is not needed as hpet_readl
actually returns unsigned int.
Otherwise
Acked-by: Venkatesh Pallipadi <venki@google.com>

>                        printk_once(KERN_WARNING
> -                           "hpet: compare register read back failed.\n");
> +                               "hpet: compare register read back failed.\n");
>        }
>
>        return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
>


* [PATCH RFC] x86: hpet: Avoid the readback penalty
       [not found] <tip-54ff7e595d763d894104d421b103a89f7becf47c@git.kernel.org>
  2010-09-14 23:36 ` [tip:x86/urgent] x86: hpet: Work around hardware stupidity Venkatesh Pallipadi
@ 2010-09-15 13:11 ` Thomas Gleixner
  2010-09-15 13:43   ` Anders Larsen
                     ` (5 more replies)
  1 sibling, 6 replies; 13+ messages in thread
From: Thomas Gleixner @ 2010-09-15 13:11 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina, Damien Wyart,
	John Drescher, Venkatesh Pallipadi, Arjan van de Ven,
	Andreas Herrmann, Borislav Petkov, Suresh Siddha

On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
> x86: hpet: Work around hardware stupidity

After my brain recovered from yesterday's exposure with the x86 timer
horror, I came up with a different solution for this problem, which
avoids the readback of the compare register completely. It works
nicely on my affected ATI system, but needs some exposure to the other
machines.

Comments ?

Thanks,

	tglx
---
Subject: x86: hpet: Avoid the readback penalty
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 15 Sep 2010 14:32:17 +0200

Due to the overly intelligent design of HPETs, we need to work around
the problem that the compare value which we write is already behind
the actual counter value at the point where the value hits the real
compare register. This happens for two reasons:

1) We read out the counter, add the delta and write the result to the
   compare register. When an NMI or SMI hits between the read out and
   the write then the counter can be ahead of the event already.

2) The write to the compare register is delayed by up to two HPET
   cycles in certain chipsets.

We worked around this by reading back the compare register to make
sure that the written value has hit the hardware. For certain ICH9+
chipsets this can require two readouts, as the first one can return
the previous compare register value. That's bad performance-wise for
the normal case where the event is far enough in the future.

As we already know that the write can be delayed by up to two cycles
we can avoid the read back of the compare register completely if we
make the decision whether the delta has elapsed already or not based
on the following calculation:

  cmp = event - actual_count;

If cmp is less than 8 HPET clock cycles, then we decide that the event
has happened already and return -ETIME. That covers the above #1 and
#2 problems which would cause a wait for HPET wraparound (~306
seconds).
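
The decision described above reduces to one subtraction and one comparison. A standalone C sketch of it follows, with plain `uint32_t`/`int32_t` standing in for the kernel's `u32`/`s32` and the counter value passed in rather than read via `hpet_readl()`; the function and macro names are hypothetical, chosen only so the logic can be exercised without hardware.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name for the RFC's 8-cycle guard band. */
#define GUARD_CYCLES 8

/*
 *   cnt - the value just written to the compare register
 *   now - the main counter, read once after that write
 * Returns -ETIME if the event is fewer than GUARD_CYCLES ticks ahead
 * or already in the past, 0 otherwise.
 */
static int next_event_decision(uint32_t cnt, uint32_t now)
{
	int32_t res = (int32_t)(cnt - now);	/* signed: wrap-safe */

	return res < GUARD_CYCLES ? -ETIME : 0;
}
```

The signed cast keeps the test correct across a counter wraparound: `cnt = 0x10` with `now = 0xFFFFFFF0` is 32 cycles ahead and accepted, while the reverse is 32 cycles in the past and rejected. A single counter read replaces the one or two compare-register reads of the previous version.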

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/hpet.c |   51 ++++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 30 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/hpet.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/hpet.c
+++ linux-2.6-tip/arch/x86/kernel/hpet.c
@@ -380,44 +380,35 @@ static int hpet_next_event(unsigned long
 			   struct clock_event_device *evt, int timer)
 {
 	u32 cnt;
+	s32 res;
 
 	cnt = hpet_readl(HPET_COUNTER);
 	cnt += (u32) delta;
 	hpet_writel(cnt, HPET_Tn_CMP(timer));
 
 	/*
-	 * We need to read back the CMP register on certain HPET
-	 * implementations (ATI chipsets) which seem to delay the
-	 * transfer of the compare register into the internal compare
-	 * logic. With small deltas this might actually be too late as
-	 * the counter could already be higher than the compare value
-	 * at that point and we would wait for the next hpet interrupt
-	 * forever. We found out that reading the CMP register back
-	 * forces the transfer so we can rely on the comparison with
-	 * the counter register below. If the read back from the
-	 * compare register does not match the value we programmed
-	 * then we might have a real hardware problem. We can not do
-	 * much about it here, but at least alert the user/admin with
-	 * a prominent warning.
-	 *
-	 * An erratum on some chipsets (ICH9,..), results in
-	 * comparator read immediately following a write returning old
-	 * value. Workaround for this is to read this value second
-	 * time, when first read returns old value.
-	 *
-	 * In fact the write to the comparator register is delayed up
-	 * to two HPET cycles so the workaround we tried to restrict
-	 * the readback to those known to be borked ATI chipsets
-	 * failed miserably. So we give up on optimizations forever
-	 * and penalize all HPET incarnations unconditionally.
+	 * HPETs are a complete disaster. The compare register is
+	 * based on a equal comparison and does provide a less than or
+	 * equal functionality (which would require to take the
+	 * wraparound into account) and it does not provide a simple
+	 * count down event mode. Further the write to the comparator
+	 * register is delayed internaly up to two HPET clock cycles
+	 * in certain chipsets (ATI, ICH9,10). We worked around that
+	 * by reading back the compare register, but that required
+	 * another workaround for ICH9,10 chips where the first
+	 * readout after write can return the old stale value. We
+	 * already have a minimum delta of 5us enforced, but a NMI or
+	 * SMI hitting between the counter readout and the comparator
+	 * write can move us behind that point easily. Now instead of
+	 * reading the compare register back several times, we make
+	 * the ETIME decision based on the following: Return ETIME if
+	 * the counter value after the write is less than 8 HPET
+	 * cycles away from the event or if the counter is already
+	 * ahead of the event.
 	 */
-	if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
-		if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
-			printk_once(KERN_WARNING
-				"hpet: compare register read back failed.\n");
-	}
+	res = (s32)(cnt - hpet_readl(HPET_COUNTER));
 
-	return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
+	return res < 8 ? -ETIME : 0;
 }
 
 static void hpet_legacy_set_mode(enum clock_event_mode mode,


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
@ 2010-09-15 13:43   ` Anders Larsen
  2010-09-15 13:54   ` John Drescher
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Anders Larsen @ 2010-09-15 13:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	Damien Wyart, John Drescher, Venkatesh Pallipadi,
	Arjan van de Ven, Andreas Herrmann, Borislav Petkov,
	Suresh Siddha

On 2010-09-15 15:11:57, Thomas Gleixner wrote:
> -	 * We need to read back the CMP register on certain HPET
> -	 * implementations (ATI chipsets) which seem to delay the
> -	 * transfer of the compare register into the internal compare
> -	 * logic. With small deltas this might actually be too late as
> -	 * the counter could already be higher than the compare value
> -	 * at that point and we would wait for the next hpet interrupt
> -	 * forever. We found out that reading the CMP register back
> -	 * forces the transfer so we can rely on the comparison with
> -	 * the counter register below. If the read back from the
> -	 * compare register does not match the value we programmed
> -	 * then we might have a real hardware problem. We can not do
> -	 * much about it here, but at least alert the user/admin with
> -	 * a prominent warning.
> -	 *
> -	 * An erratum on some chipsets (ICH9,..), results in
> -	 * comparator read immediately following a write returning old
> -	 * value. Workaround for this is to read this value second
> -	 * time, when first read returns old value.
> -	 *
> -	 * In fact the write to the comparator register is delayed up
> -	 * to two HPET cycles so the workaround we tried to restrict
> -	 * the readback to those known to be borked ATI chipsets
> -	 * failed miserably. So we give up on optimizations forever
> -	 * and penalize all HPET incarnations unconditionally.
> +	 * HPETs are a complete disaster. The compare register is
> +	 * based on a equal comparison and does provide a less than or

s/does provide/does not provide/

> +	 * equal functionality (which would require to take the
> +	 * wraparound into account) and it does not provide a simple
> +	 * count down event mode. Further the write to the comparator
> +	 * register is delayed internaly up to two HPET clock cycles

s/internaly/internally/

> +	 * in certain chipsets (ATI, ICH9,10). We worked around that
> +	 * by reading back the compare register, but that required
> +	 * another workaround for ICH9,10 chips where the first
> +	 * readout after write can return the old stale value. We
> +	 * already have a minimum delta of 5us enforced, but a NMI or
> +	 * SMI hitting between the counter readout and the comparator
> +	 * write can move us behind that point easily. Now instead of
> +	 * reading the compare register back several times, we make
> +	 * the ETIME decision based on the following: Return ETIME if
> +	 * the counter value after the write is less than 8 HPET
> +	 * cycles away from the event or if the counter is already
> +	 * ahead of the event.

Cheers
Anders




* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
  2010-09-15 13:43   ` Anders Larsen
@ 2010-09-15 13:54   ` John Drescher
       [not found]     ` <AANLkTikHfDUh3LdmyA5eMNACFPfi6pqGLFJiyzVm79X_@mail.gmail.com>
  2010-09-15 14:40   ` Borislav Petkov
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: John Drescher @ 2010-09-15 13:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	Damien Wyart, Venkatesh Pallipadi, Arjan van de Ven,
	Andreas Herrmann, Borislav Petkov, Suresh Siddha

On Wed, Sep 15, 2010 at 9:11 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
>> x86: hpet: Work around hardware stupidity
>
> After my brain recovered from yesterday's exposure with the x86 timer
> horror, I came up with a different solution for this problem, which
> avoids the readback of the compare register completely. It works
> nicely on my affected ATI system, but needs some exposure to the other
> machines.
>
> Comments ?
>

I can test this on my i7 machine tonight (around 12 hours from now).

John


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
  2010-09-15 13:43   ` Anders Larsen
  2010-09-15 13:54   ` John Drescher
@ 2010-09-15 14:40   ` Borislav Petkov
  2010-09-15 14:42     ` Thomas Gleixner
  2010-09-16 12:28     ` Borislav Petkov
  2010-09-16 20:04   ` Nix
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2010-09-15 14:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	Damien Wyart, John Drescher, Venkatesh Pallipadi,
	Arjan van de Ven, Herrmann3, Andreas, Suresh Siddha

From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, Sep 15, 2010 at 09:11:57AM -0400

> On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
> > x86: hpet: Work around hardware stupidity
> 
> After my brain recovered from yesterday's exposure with the x86 timer
> horror, I came up with a different solution for this problem, which
> avoids the readback of the compare register completely. It works
> nicely on my affected ATI system, but needs some exposure to the other
> machines.

Will run it on a couple of SBx00 machines I got here.

...

> If cmp is less than 8 HPET clock cycles, then we decide that the event
> has happened already and return -ETIME. That covers the above #1 and
> #2 problems which would cause a wait for HPET wraparound (~306
> seconds).

Makes sense. I guess you're choosing a value of 8 just to be on the safe
side wrt the HPET clock cycles it takes to write the cmp register?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632



* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 14:40   ` Borislav Petkov
@ 2010-09-15 14:42     ` Thomas Gleixner
  2010-09-16 12:28     ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2010-09-15 14:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	Damien Wyart, John Drescher, Venkatesh Pallipadi,
	Arjan van de Ven, Herrmann3, Andreas, Suresh Siddha

On Wed, 15 Sep 2010, Borislav Petkov wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, Sep 15, 2010 at 09:11:57AM -0400
> 
> > On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
> > > x86: hpet: Work around hardware stupidity
> > 
> > After my brain recovered from yesterday's exposure with the x86 timer
> > horror, I came up with a different solution for this problem, which
> > avoids the readback of the compare register completely. It works
> > nicely on my affected ATI system, but needs some exposure to the other
> > machines.
> 
> Will run it on a couple of SBx00 machines I got here.
> 
> ...
> 
> > If cmp is less than 8 HPET clock cycles, then we decide that the event
> > has happened already and return -ETIME. That covers the above #1 and
> > #2 problems which would cause a wait for HPET wraparound (~306
> > seconds).
> 
> Makes sense. I guess you're choosing a value of 8 just to be on the safe
> side wrt the HPET clock cycles it takes to write the cmp register?

Yes, I do _NOT_ trust those hardware dudes at all. A factor of 4 seems to
be an appropriate choice :)

Thanks,

	tglx

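The "factor of 4" is the 8-cycle guard band measured against the confirmed worst-case write delay of 2 cycles. To put the guard band into real time, here is a quick C sketch; the helper names are hypothetical, and the 14.31818 MHz frequency used below is an assumption, since the thread never states the rate of the affected HPETs. At that rate 8 cycles are roughly 0.56 µs, comfortably below the 5 µs minimum delta the RFC comment mentions, which corresponds to roughly 72 cycles.

```c
#include <stdint.h>

/* Hypothetical helper: duration of n HPET cycles in ns at freq_hz. */
static double hpet_cycles_to_ns(uint32_t n, double freq_hz)
{
	return (double)n * 1e9 / freq_hz;
}

/* Hypothetical helper: HPET cycles elapsing in ns nanoseconds. */
static double hpet_ns_to_cycles(double ns, double freq_hz)
{
	return ns * freq_hz / 1e9;
}
```

So the guard band costs well under a microsecond of precision for very short timeouts, which the kernel then handles through the -ETIME path instead of risking a missed match.
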

* Fwd: [PATCH RFC] x86: hpet: Avoid the readback penalty
       [not found]     ` <AANLkTikHfDUh3LdmyA5eMNACFPfi6pqGLFJiyzVm79X_@mail.gmail.com>
@ 2010-09-16  4:45       ` John Drescher
  0 siblings, 0 replies; 13+ messages in thread
From: John Drescher @ 2010-09-16  4:45 UTC (permalink / raw)
  To: LKML

On Wed, Sep 15, 2010 at 9:54 AM, John Drescher <drescherjm@gmail.com> wrote:
> On Wed, Sep 15, 2010 at 9:11 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
>>> x86: hpet: Work around hardware stupidity
>>
>> After my brain recovered from yesterday's exposure with the x86 timer
>> horror, I came up with a different solution for this problem, which
>> avoids the readback of the compare register completely. It works
>> nicely on my affected ATI system, but needs some exposure to the other
>> machines.
>>
>> Comments ?
>>
>
> I can test this on my i7 machine tonight (around 12 hours from now).

Seems to work fine on my i7 box. I applied the first patch to
2.6.36-rc4-git2 and then applied the new patch from this
email. Both patches applied cleanly and I am now running the patched
kernel without any issue. I turned off the clocksource=acpi_pm
workaround I had enabled from the original problem.

jmd1 ~ # uname -a
Linux jmd1 2.6.36-rc4-git2-no-penalty #3 SMP PREEMPT Wed Sep 15
21:25:53 EDT 2010 x86_64 Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
GenuineIntel GNU/Linux

John



-- 
John M. Drescher


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 14:40   ` Borislav Petkov
  2010-09-15 14:42     ` Thomas Gleixner
@ 2010-09-16 12:28     ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2010-09-16 12:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	Damien Wyart, John Drescher, Venkatesh Pallipadi,
	Arjan van de Ven, Herrmann3, Andreas, Suresh Siddha

> > On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
> > > x86: hpet: Work around hardware stupidity
> > 
> > After my brain recovered from yesterdays exposure with the x86 timer
> > horror, I came up with a different solution for this problem, which
> > avoids the readback of the compare register completely. It works
> > nicely on my affected ATI system, but needs some exposure to the other
> > machines.
> 
> Will run it on a couple of SBx00 machines I got here.

FWIW, the patch doesn't break my SBx00 mini-farm here. Let me know if
you need more testing done, otherwise:

Tested-by: Borislav Petkov <borislav.petkov@amd.com>

-- 
Regards/Gruss,
Boris.




* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
                     ` (2 preceding siblings ...)
  2010-09-15 14:40   ` Borislav Petkov
@ 2010-09-16 20:04   ` Nix
  2010-09-18  7:49   ` Damien Wyart
  2010-09-18 10:13   ` [tip:x86/timers] x86: Hpet: Avoid the comparator " tip-bot for Thomas Gleixner
  5 siblings, 0 replies; 13+ messages in thread
From: Nix @ 2010-09-16 20:04 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Neil Brown, linux-nfs

On 15 Sep 2010, Thomas Gleixner said:

> On Tue, 14 Sep 2010, tip-bot for Thomas Gleixner wrote:
>> x86: hpet: Work around hardware stupidity
>
> After my brain recovered from yesterday's exposure with the x86 timer
> horror, I came up with a different solution for this problem, which
> avoids the readback of the compare register completely. It works
> nicely on my affected ATI system, but needs some exposure to the other
> machines.
>
> Comments ?

Works for me. (I still have boot problems, but these seem to be
NFS-related: -ESTALE from an apparently random subset of my mount points
at initial mount time. Restarting rpc.mountd on the server fixes
it. This is with nfs-utils 1.2.2, but I see it right up to 60abb98, that
being the most recent version I tested. This doesn't happen if the
client is running 2.6.34.x, as far as I can tell. It also doesn't happen
with all my 2.6.35.x clients. Sigh. I hate bugs like this.)


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
                     ` (3 preceding siblings ...)
  2010-09-16 20:04   ` Nix
@ 2010-09-18  7:49   ` Damien Wyart
  2010-09-18  9:41     ` Thomas Gleixner
  2010-09-18 13:43     ` Nix
  2010-09-18 10:13   ` [tip:x86/timers] x86: Hpet: Avoid the comparator " tip-bot for Thomas Gleixner
  5 siblings, 2 replies; 13+ messages in thread
From: Damien Wyart @ 2010-09-18  7:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	John Drescher, Venkatesh Pallipadi, Arjan van de Ven,
	Andreas Herrmann, Borislav Petkov, Suresh Siddha

* Thomas Gleixner <tglx@linutronix.de> [2010-09-15 15:11]:
> > x86: hpet: Work around hardware stupidity

> After my brain recovered from yesterday's exposure with the x86 timer
> horror, I came up with a different solution for this problem, which
> avoids the readback of the compare register completely. It works
> nicely on my affected ATI system, but needs some exposure to the other
> machines.

Comments for this different solution seemed fine, but it seems only the
first one was committed into mainline and then stable. Is this intended?


Thanks,

Damien


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-18  7:49   ` Damien Wyart
@ 2010-09-18  9:41     ` Thomas Gleixner
  2010-09-18 13:43     ` Nix
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2010-09-18  9:41 UTC (permalink / raw)
  To: Damien Wyart
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Nix, Artur Skawina,
	John Drescher, Venkatesh Pallipadi, Arjan van de Ven,
	Andreas Herrmann, Borislav Petkov, Suresh Siddha

On Sat, 18 Sep 2010, Damien Wyart wrote:

> * Thomas Gleixner <tglx@linutronix.de> [2010-09-15 15:11]:
> > > x86: hpet: Work around hardware stupidity
> 
> > After my brain recovered from yesterday's exposure with the x86 timer
> > horror, I came up with a different solution for this problem, which
> > avoids the readback of the compare register completely. It works
> > nicely on my affected ATI system, but needs some exposure to the other
> > machines.
> 
> Comments for this different solution seemed fine, but it seems only the
> first one was committed into mainline and then stable. Is this intended?

Yes. I wanted to have confirmation for the second solution (also from
the HW folks). I'm queueing it for .37

Thanks,

	tglx


* [tip:x86/timers] x86: Hpet: Avoid the comparator readback penalty
  2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
                     ` (4 preceding siblings ...)
  2010-09-18  7:49   ` Damien Wyart
@ 2010-09-18 10:13   ` tip-bot for Thomas Gleixner
  5 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Thomas Gleixner @ 2010-09-18 10:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, andreas.herrmann3, arjan, drescherjm,
	art.08.09, damien.wyart, suresh.b.siddha, tglx, nix,
	borislav.petkov, venki

Commit-ID:  995bd3bb5c78f3ff71339803c0b8337ed36d64fb
Gitweb:     http://git.kernel.org/tip/995bd3bb5c78f3ff71339803c0b8337ed36d64fb
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 15 Sep 2010 15:11:57 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 18 Sep 2010 12:09:13 +0200

x86: Hpet: Avoid the comparator readback penalty

Due to the overly intelligent design of HPETs, we need to work around
the problem that the compare value which we write is already behind
the actual counter value at the point where the value hits the real
compare register. This happens for two reasons:

1) We read out the counter, add the delta and write the result to the
   compare register. When an NMI or SMI hits between the readout and
   the write, the counter can already be ahead of the event.

2) The write to the compare register is delayed by up to two HPET
   cycles in certain chipsets.
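Both failure modes can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: it assumes a free-running 32-bit counter and a comparator that fires only on exact equality, and the names `cycles_until_match()` and `program_event()` are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model (not kernel code) of a comparator
 * that fires only when a free-running 32-bit counter becomes exactly
 * equal to the value in the compare register.
 */

/*
 * Cycles from 'now' until the counter next equals 'cmp'. Unsigned
 * subtraction wraps modulo 2^32, so a compare value the counter has
 * already passed is only matched after a full counter wraparound.
 */
uint32_t cycles_until_match(uint32_t now, uint32_t cmp)
{
	return cmp - now;
}

/*
 * Model of the programming sequence: read the counter, add 'delta'
 * and write the sum to the comparator. 'lost' is the number of cycles
 * that pass before the written value is live in the comparator
 * (an NMI/SMI between read and write, plus the internal write delay).
 * Returns the cycles remaining, from that point, until the match.
 */
uint32_t program_event(uint32_t counter, uint32_t delta, uint32_t lost)
{
	uint32_t cmp = counter + delta;  /* value written to the comparator */
	uint32_t live = counter + lost;  /* counter when the write takes effect */

	return cycles_until_match(live, cmp);
}
```

For example, `program_event(1000, 5, 7)` models a 5-cycle delta that loses 7 cycles: the comparator holds 1005 while the counter is already at 1007, so the next match is 2^32 - 2 cycles away, i.e. a full wraparound.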

We worked around this by reading back the compare register to make
sure that the written value has hit the hardware. For certain ICH9+
chipsets this can require two readouts, as the first one can return
the previous compare register value. That's bad performance-wise for
the normal case where the event is far enough in the future.

Since we already know that the write can be delayed by up to two
cycles, we can avoid reading back the compare register completely if
we decide whether the delta has already elapsed based on the following
calculation:

  cmp = event - actual_count;

If cmp is less than 8 HPET clock cycles, we decide that the event has
already happened and return -ETIME. That covers problems #1 and #2
above, which would otherwise cause a wait for the HPET counter to wrap
around (~306 seconds).
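The -ETIME decision can be sketched as a pure function. This is a hypothetical helper (`hpet_etime_decision()` and `MIN_CYCLES` are illustrative names, not kernel symbols), modelled on the calculation above and the `res < 8` test in the patch. It assumes the compiler converts an out-of-range u32 to s32 by wrapping modulo 2^32, as GCC and Clang do.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name for the 8-cycle threshold from the commit message. */
#define MIN_CYCLES 8

/*
 * Sketch of the -ETIME decision: 'event' is the value written to the
 * comparator, 'actual_count' is the counter read after the write.
 * Returns -ETIME if the event is too close or already in the past.
 */
int hpet_etime_decision(uint32_t event, uint32_t actual_count)
{
	/*
	 * The u32 difference wraps modulo 2^32; reinterpreting it as
	 * signed makes a counter that is already past the event yield
	 * a negative distance, even across counter wraparound.
	 */
	int32_t cmp = (int32_t)(event - actual_count);

	return cmp < MIN_CYCLES ? -ETIME : 0;
}
```

The signed cast is what keeps the test correct across wraparound: with `event = 0x10` and `actual_count = 0xfffffff0` the distance is 32 cycles, so no -ETIME is returned, while a counter 3 cycles past the event gives a distance of -3 and returns -ETIME.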

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nix <nix@esperi.org.uk>
Tested-by: Artur Skawina <art.08.09@gmail.com>
Cc: Damien Wyart <damien.wyart@free.fr>
Tested-by: John Drescher <drescherjm@gmail.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6>

---
 arch/x86/kernel/hpet.c |   51 +++++++++++++++++++----------------------------
 1 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 410fdb3..0b568b3 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -380,44 +380,35 @@ static int hpet_next_event(unsigned long delta,
 			   struct clock_event_device *evt, int timer)
 {
 	u32 cnt;
+	s32 res;
 
 	cnt = hpet_readl(HPET_COUNTER);
 	cnt += (u32) delta;
 	hpet_writel(cnt, HPET_Tn_CMP(timer));
 
 	/*
-	 * We need to read back the CMP register on certain HPET
-	 * implementations (ATI chipsets) which seem to delay the
-	 * transfer of the compare register into the internal compare
-	 * logic. With small deltas this might actually be too late as
-	 * the counter could already be higher than the compare value
-	 * at that point and we would wait for the next hpet interrupt
-	 * forever. We found out that reading the CMP register back
-	 * forces the transfer so we can rely on the comparison with
-	 * the counter register below. If the read back from the
-	 * compare register does not match the value we programmed
-	 * then we might have a real hardware problem. We can not do
-	 * much about it here, but at least alert the user/admin with
-	 * a prominent warning.
-	 *
-	 * An erratum on some chipsets (ICH9,..), results in
-	 * comparator read immediately following a write returning old
-	 * value. Workaround for this is to read this value second
-	 * time, when first read returns old value.
-	 *
-	 * In fact the write to the comparator register is delayed up
-	 * to two HPET cycles so the workaround we tried to restrict
-	 * the readback to those known to be borked ATI chipsets
-	 * failed miserably. So we give up on optimizations forever
-	 * and penalize all HPET incarnations unconditionally.
+	 * HPETs are a complete disaster. The compare register is
+	 * based on a equal comparison and neither provides a less
+	 * than or equal functionality (which would require to take
+	 * the wraparound into account) nor a simple count down event
+	 * mode. Further the write to the comparator register is
+	 * delayed internally up to two HPET clock cycles in certain
+	 * chipsets (ATI, ICH9,10). We worked around that by reading
+	 * back the compare register, but that required another
+	 * workaround for ICH9,10 chips where the first readout after
+	 * write can return the old stale value. We already have a
+	 * minimum delta of 5us enforced, but a NMI or SMI hitting
+	 * between the counter readout and the comparator write can
+	 * move us behind that point easily. Now instead of reading
+	 * the compare register back several times, we make the ETIME
+	 * decision based on the following: Return ETIME if the
+	 * counter value after the write is less than 8 HPET cycles
+	 * away from the event or if the counter is already ahead of
+	 * the event.
 	 */
-	if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
-		if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
-			printk_once(KERN_WARNING
-				"hpet: compare register read back failed.\n");
-	}
+	res = (s32)(cnt - hpet_readl(HPET_COUNTER));
 
-	return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
+	return res < 8 ? -ETIME : 0;
 }
 
 static void hpet_legacy_set_mode(enum clock_event_mode mode,


* Re: [PATCH RFC] x86: hpet: Avoid the readback penalty
  2010-09-18  7:49   ` Damien Wyart
  2010-09-18  9:41     ` Thomas Gleixner
@ 2010-09-18 13:43     ` Nix
  1 sibling, 0 replies; 13+ messages in thread
From: Nix @ 2010-09-18 13:43 UTC (permalink / raw)
  To: Damien Wyart
  Cc: Thomas Gleixner, LKML, Ingo Molnar, H. Peter Anvin,
	Artur Skawina, John Drescher, Venkatesh Pallipadi,
	Arjan van de Ven, Andreas Herrmann, Borislav Petkov,
	Suresh Siddha

On 18 Sep 2010, Damien Wyart uttered the following:

> * Thomas Gleixner <tglx@linutronix.de> [2010-09-15 15:11]:
>> > x86: hpet: Work around hardware stupidity
>
>> After my brain recovered from yesterday's exposure with the x86 timer
>> horror, I came up with a different solution for this problem, which
>> avoids the readback of the compare register completely. It works
>> nicely on my affected ATI system, but needs some exposure to the other
>> machines.
>
> Comments for this different solution seemed fine, but it seems only the
> first one was committed into mainline and then stable. Is this intended?

I assumed he was letting people shake the bugs out rather than inflict
what is basically just a performance improvement on -stable. For the
record: no bugs here. I also know why one of my machines was unaffected,
and it provides even more confirmation that this bug is correctly
identified, as if we needed more. It's not using the HPET at all:

Clock Event Device: cs5535-clockevt

Every single machine I owned with an HPET was affected by this bug: they
all work perfectly well now.


Thread overview: 13+ messages
     [not found] <tip-54ff7e595d763d894104d421b103a89f7becf47c@git.kernel.org>
2010-09-14 23:36 ` [tip:x86/urgent] x86: hpet: Work around hardware stupidity Venkatesh Pallipadi
2010-09-15 13:11 ` [PATCH RFC] x86: hpet: Avoid the readback penalty Thomas Gleixner
2010-09-15 13:43   ` Anders Larsen
2010-09-15 13:54   ` John Drescher
     [not found]     ` <AANLkTikHfDUh3LdmyA5eMNACFPfi6pqGLFJiyzVm79X_@mail.gmail.com>
2010-09-16  4:45       ` Fwd: " John Drescher
2010-09-15 14:40   ` Borislav Petkov
2010-09-15 14:42     ` Thomas Gleixner
2010-09-16 12:28     ` Borislav Petkov
2010-09-16 20:04   ` Nix
2010-09-18  7:49   ` Damien Wyart
2010-09-18  9:41     ` Thomas Gleixner
2010-09-18 13:43     ` Nix
2010-09-18 10:13   ` [tip:x86/timers] x86: Hpet: Avoid the comparator " tip-bot for Thomas Gleixner
