All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
@ 2022-07-14 21:42 Robbie Harwood
  2022-07-14 21:42 ` [PATCH v3 1/1] " Robbie Harwood
  0 siblings, 1 reply; 6+ messages in thread
From: Robbie Harwood @ 2022-07-14 21:42 UTC (permalink / raw)
  To: grub-devel; +Cc: Robbie Harwood

This is actually the fourth time this patch has been posted, but last
time it was v2, so I guess we're zero-indexing.

Changes from v2:
- Rebase forward 6 months
- Grammar pass over commit message to address Paul Menzel's review

Be well,
--Robbie

Peter Jones (1):
  i386: Make pmtimer tsc calibration not take 51 seconds to fail

 grub-core/kern/i386/tsc_pmtimer.c | 112 ++++++++++++++++++++++++------
 1 file changed, 92 insertions(+), 20 deletions(-)

-- 
2.35.1



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
  2022-07-14 21:42 [PATCH v3 0/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail Robbie Harwood
@ 2022-07-14 21:42 ` Robbie Harwood
  2022-07-19 13:28   ` Daniel Kiper
  0 siblings, 1 reply; 6+ messages in thread
From: Robbie Harwood @ 2022-07-14 21:42 UTC (permalink / raw)
  To: grub-devel; +Cc: Peter Jones, Javier Martinez Canillas, Robbie Harwood

From: Peter Jones <pjones@redhat.com>

On my laptop running at 2.4GHz, if I run a VM where tsc calibration
using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
to do so (as measured with the stopwatch on my phone), with a tsc delta
of 0x1cd1c85300, or around 125 billion cycles.

If instead of trying to wait for 5-200ms to show up on the pmtimer, we try
to wait for 5-200us, it decides it's broken in ~0x2626aa0 TSCs, aka ~2.4
million cycles, or more or less instantly.

Additionally, this reading the pmtimer was returning 0xffffffff anyway,
and that's obviously an invalid return.  I've added a check for that and
0 so we don't bother waiting for the test if what we're seeing is dead
pins with no response at all.

If "debug" includes "pmtimer", you will see one of the following three
outcomes.  If pmtimer gives all 0 or all 1 bits, you will see:

pmtimer: 0xffffff bad_reads: 1
pmtimer: 0xffffff bad_reads: 2
pmtimer: 0xffffff bad_reads: 3
pmtimer: 0xffffff bad_reads: 4
pmtimer: 0xffffff bad_reads: 5
pmtimer: 0xffffff bad_reads: 6
pmtimer: 0xffffff bad_reads: 7
pmtimer: 0xffffff bad_reads: 8
pmtimer: 0xffffff bad_reads: 9
pmtimer: 0xffffff bad_reads: 10
timer is broken; giving up.

This outcome was tested using qemu+kvm with UEFI (OVMF) firmware and
these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX

If pmtimer gives any other bit patterns but is not actually marching
forward fast enough to use for clock calibration, you will see:

pmtimer delta is 0x0 (1904 iterations)
tsc delta is implausible: 0x2626aa0

This outcome was tested using grub compiled with
GRUB_PMTIMER_IGNORE_BAD_READS defined (so as not to trip the bad read
test) using qemu+kvm with UEFI (OVMF) firmware, and these options:
-machine pc-q35-2.10 -cpu Broadwell-noTSX

If pmtimer actually works, you'll see something like:

pmtimer delta is 0xdff
tsc delta is 0x278756

This outcome was tested using qemu+kvm with UEFI (OVMF) firmware, and
these options: -machine pc-i440fx-2.4 -cpu Broadwell-noTSX

I've also tested this outcome on a real Intel Xeon E3-1275v3 on an Intel
Server Board S1200V3RPS using the SDV.RP.B8 "Release" build here:
https://firmware.intel.com/sites/default/files/UEFIDevKit_S1200RP_vB8.zip

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Robbie Harwood <rharwood@redhat.com>
---
 grub-core/kern/i386/tsc_pmtimer.c | 112 ++++++++++++++++++++++++------
 1 file changed, 92 insertions(+), 20 deletions(-)

diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c
index c9c3616997..d9b3765b01 100644
--- a/grub-core/kern/i386/tsc_pmtimer.c
+++ b/grub-core/kern/i386/tsc_pmtimer.c
@@ -28,40 +28,104 @@
 #include <grub/acpi.h>
 #include <grub/cpu/io.h>
 
+/*
+ * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's
+ * present but doesn't keep time well.
+ */
+// #define GRUB_PMTIMER_IGNORE_BAD_READS
+
 grub_uint64_t
 grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
 			     grub_uint16_t num_pm_ticks)
 {
   grub_uint32_t start;
-  grub_uint32_t last;
-  grub_uint32_t cur, end;
+  grub_uint64_t cur, end;
   grub_uint64_t start_tsc;
   grub_uint64_t end_tsc;
-  int num_iter = 0;
+  unsigned int num_iter = 0;
+#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
+  int bad_reads = 0;
+#endif
 
-  start = grub_inl (pmtimer) & 0xffffff;
-  last = start;
+  /*
+   * Some timers are 24-bit and some are 32-bit, but it doesn't make much
+   * difference to us.  Caring which one we have isn't really worth it since
+   * the low-order digits will give us enough data to calibrate TSC.  So just
+   * mask the top-order byte off.
+   */
+  cur = start = grub_inl (pmtimer) & 0x00ffffffUL;
   end = start + num_pm_ticks;
   start_tsc = grub_get_tsc ();
   while (1)
     {
-      cur = grub_inl (pmtimer) & 0xffffff;
-      if (cur < last)
-	cur |= 0x1000000;
-      num_iter++;
+      cur &= 0xffffffffff000000ULL;
+      /*
+       * Only take the low-order 24-bit for the reason explained above.
+       */
+      cur |= grub_inl (pmtimer) & 0x00ffffffUL;
+
+      end_tsc = grub_get_tsc();
+
+#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
+      /*
+       * If we get 10 reads in a row that are obviously dead pins, there's no
+       * reason to do this thousands of times.
+       */
+      if (cur == 0xffffffUL || cur == 0)
+	{
+	  bad_reads++;
+	  grub_dprintf ("pmtimer",
+			"pmtimer: 0x%"PRIxGRUB_UINT64_T" bad_reads: %d\n",
+			cur, bad_reads);
+	  grub_dprintf ("pmtimer", "timer is broken; giving up.\n");
+
+	  if (bad_reads == 10)
+	    return 0;
+	}
+#endif
+
+      if (cur < start)
+	cur += 0x1000000;
+
       if (cur >= end)
 	{
-	  end_tsc = grub_get_tsc ();
+	  grub_dprintf ("pmtimer", "pmtimer delta is 0x%"PRIxGRUB_UINT64_T"\n",
+			cur - start);
+	  grub_dprintf ("pmtimer", "tsc delta is 0x%"PRIxGRUB_UINT64_T"\n",
+			end_tsc - start_tsc);
 	  return end_tsc - start_tsc;
 	}
-      /* Check for broken PM timer.
-	 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz)
-	 if after this time we still don't have 1 ms on pmtimer, then
-	 pmtimer is broken.
+
+      /*
+       * Check for broken PM timer.  1ms at 10GHz should be 1E+7 TSCs; at
+       * 250MHz it should be 2.5E5.  So if after 4E+7 TSCs on a 10GHz machine,
+       * we should have seen pmtimer show 4ms of change (i.e. cur =~
+       * start+14320); on a 250MHz machine that should be 160ms (start+572800).
+       * If after this a time we still don't have 1ms on pmtimer, then pmtimer
+       * is broken.
+       *
+       * Likewise, if our code is perfectly efficient and introduces no delays
+       * whatsoever, on a 10GHz system we should see a TSC delta of 3580 in
+       * ~3580 iterations.  On a 250MHz machine that should be ~900 iterations.
+       *
+       * With those factors in mind, there are two limits here.  There's a hard
+       * limit here at 8x our desired pm timer delta. This limit was picked as
+       * an arbitrarily large value that's still not a lot of time to humans,
+       * because if we get that far this is either an implausibly fast machine
+       * or the pmtimer is not running.  And there is another limit on a 4 ms TSC
+       * delta on a 10 GHz clock, without seeing cur converge on our target value.
        */
-      if ((num_iter & 0xffffff) == 0 && grub_get_tsc () - start_tsc > 5000000) {
-	return 0;
-      }
+      if ((++num_iter > (grub_uint32_t)num_pm_ticks << 3UL) ||
+	  end_tsc - start_tsc > 40000000)
+	{
+	  grub_dprintf ("pmtimer",
+			"pmtimer delta is 0x%"PRIxGRUB_UINT64_T" (%u iterations)\n",
+			cur - start, num_iter);
+	  grub_dprintf ("pmtimer",
+			"tsc delta is implausible: 0x%"PRIxGRUB_UINT64_T"\n",
+			end_tsc - start_tsc);
+	  return 0;
+	}
     }
 }
 
@@ -74,12 +138,20 @@ grub_tsc_calibrate_from_pmtimer (void)
 
   fadt = grub_acpi_find_fadt ();
   if (!fadt)
-    return 0;
+    {
+      grub_dprintf ("pmtimer", "No FADT found; not using pmtimer.\n");
+      return 0;
+    }
   pmtimer = fadt->pmtimer;
   if (!pmtimer)
-    return 0;
+    {
+      grub_dprintf ("pmtimer", "FADT does not specify pmtimer; skipping.\n");
+      return 0;
+    }
 
-  /* It's 3.579545 MHz clock. Wait 1 ms.  */
+  /*
+   * It's 3.579545 MHz clock. Wait 1 ms.
+   */
   tsc_diff = grub_pmtimer_wait_count_tsc (pmtimer, 3580);
   if (tsc_diff == 0)
     return 0;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
  2022-07-14 21:42 ` [PATCH v3 1/1] " Robbie Harwood
@ 2022-07-19 13:28   ` Daniel Kiper
  2022-07-19 17:38     ` Robbie Harwood
  2022-07-20 13:09     ` Daniel Kiper
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Kiper @ 2022-07-19 13:28 UTC (permalink / raw)
  To: Robbie Harwood; +Cc: grub-devel, Peter Jones, Javier Martinez Canillas

On Thu, Jul 14, 2022 at 05:42:51PM -0400, Robbie Harwood wrote:
> From: Peter Jones <pjones@redhat.com>
>
> On my laptop running at 2.4GHz, if I run a VM where tsc calibration
> using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
> to do so (as measured with the stopwatch on my phone), with a tsc delta
> of 0x1cd1c85300, or around 125 billion cycles.
>
> If instead of trying to wait for 5-200ms to show up on the pmtimer, we try
> to wait for 5-200us, it decides it's broken in ~0x2626aa0 TSCs, aka ~2.4
> million cycles, or more or less instantly.
>
> Additionally, this reading the pmtimer was returning 0xffffffff anyway,
> and that's obviously an invalid return.  I've added a check for that and
> 0 so we don't bother waiting for the test if what we're seeing is dead
> pins with no response at all.
>
> If "debug" includes "pmtimer", you will see one of the following three
> outcomes.  If pmtimer gives all 0 or all 1 bits, you will see:
>
> pmtimer: 0xffffff bad_reads: 1
> pmtimer: 0xffffff bad_reads: 2
> pmtimer: 0xffffff bad_reads: 3
> pmtimer: 0xffffff bad_reads: 4
> pmtimer: 0xffffff bad_reads: 5
> pmtimer: 0xffffff bad_reads: 6
> pmtimer: 0xffffff bad_reads: 7
> pmtimer: 0xffffff bad_reads: 8
> pmtimer: 0xffffff bad_reads: 9
> pmtimer: 0xffffff bad_reads: 10
> timer is broken; giving up.
>
> This outcome was tested using qemu+kvm with UEFI (OVMF) firmware and
> these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX
>
> If pmtimer gives any other bit patterns but is not actually marching
> forward fast enough to use for clock calibration, you will see:
>
> pmtimer delta is 0x0 (1904 iterations)
> tsc delta is implausible: 0x2626aa0
>
> This outcome was tested using grub compiled with
> GRUB_PMTIMER_IGNORE_BAD_READS defined (so as not to trip the bad read
> test) using qemu+kvm with UEFI (OVMF) firmware, and these options:
> -machine pc-q35-2.10 -cpu Broadwell-noTSX
>
> If pmtimer actually works, you'll see something like:
>
> pmtimer delta is 0xdff
> tsc delta is 0x278756
>
> This outcome was tested using qemu+kvm with UEFI (OVMF) firmware, and
> these options: -machine pc-i440fx-2.4 -cpu Broadwell-noTSX
>
> I've also tested this outcome on a real Intel Xeon E3-1275v3 on an Intel
> Server Board S1200V3RPS using the SDV.RP.B8 "Release" build here:
> https://firmware.intel.com/sites/default/files/UEFIDevKit_S1200RP_vB8.zip
>
> Signed-off-by: Peter Jones <pjones@redhat.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> Signed-off-by: Robbie Harwood <rharwood@redhat.com>
> ---
>  grub-core/kern/i386/tsc_pmtimer.c | 112 ++++++++++++++++++++++++------
>  1 file changed, 92 insertions(+), 20 deletions(-)
>
> diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c
> index c9c3616997..d9b3765b01 100644
> --- a/grub-core/kern/i386/tsc_pmtimer.c
> +++ b/grub-core/kern/i386/tsc_pmtimer.c
> @@ -28,40 +28,104 @@
>  #include <grub/acpi.h>
>  #include <grub/cpu/io.h>
>
> +/*
> + * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's
> + * present but doesn't keep time well.
> + */
> +// #define GRUB_PMTIMER_IGNORE_BAD_READS
> +
>  grub_uint64_t
>  grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
>  			     grub_uint16_t num_pm_ticks)
>  {
>    grub_uint32_t start;
> -  grub_uint32_t last;
> -  grub_uint32_t cur, end;
> +  grub_uint64_t cur, end;
>    grub_uint64_t start_tsc;
>    grub_uint64_t end_tsc;
> -  int num_iter = 0;
> +  unsigned int num_iter = 0;
> +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
> +  int bad_reads = 0;
> +#endif
>
> -  start = grub_inl (pmtimer) & 0xffffff;
> -  last = start;
> +  /*
> +   * Some timers are 24-bit and some are 32-bit, but it doesn't make much
> +   * difference to us.  Caring which one we have isn't really worth it since
> +   * the low-order digits will give us enough data to calibrate TSC.  So just
> +   * mask the top-order byte off.
> +   */
> +  cur = start = grub_inl (pmtimer) & 0x00ffffffUL;
>    end = start + num_pm_ticks;
>    start_tsc = grub_get_tsc ();
>    while (1)
>      {
> -      cur = grub_inl (pmtimer) & 0xffffff;
> -      if (cur < last)
> -	cur |= 0x1000000;
> -      num_iter++;
> +      cur &= 0xffffffffff000000ULL;
> +      /*
> +       * Only take the low-order 24-bit for the reason explained above.
> +       */
> +      cur |= grub_inl (pmtimer) & 0x00ffffffUL;
> +
> +      end_tsc = grub_get_tsc();
> +
> +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS

I still do not understand why we need compile this code conditionally.
Why cannot we build it always and enable only when debug is set to
pmtimer/all? Or use another variable to enable this test?

> +      /*
> +       * If we get 10 reads in a row that are obviously dead pins, there's no
> +       * reason to do this thousands of times.
> +       */
> +      if (cur == 0xffffffUL || cur == 0)
> +	{
> +	  bad_reads++;
> +	  grub_dprintf ("pmtimer",
> +			"pmtimer: 0x%"PRIxGRUB_UINT64_T" bad_reads: %d\n",
> +			cur, bad_reads);
> +	  grub_dprintf ("pmtimer", "timer is broken; giving up.\n");

It seems to me this grub_dprintf() should be moved...

> +
> +	  if (bad_reads == 10)

... here.

> +	    return 0;
> +	}
> +#endif
> +
> +      if (cur < start)
> +	cur += 0x1000000;
> +
>        if (cur >= end)
>  	{
> -	  end_tsc = grub_get_tsc ();
> +	  grub_dprintf ("pmtimer", "pmtimer delta is 0x%"PRIxGRUB_UINT64_T"\n",
> +			cur - start);
> +	  grub_dprintf ("pmtimer", "tsc delta is 0x%"PRIxGRUB_UINT64_T"\n",
> +			end_tsc - start_tsc);
>  	  return end_tsc - start_tsc;
>  	}
> -      /* Check for broken PM timer.
> -	 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz)
> -	 if after this time we still don't have 1 ms on pmtimer, then
> -	 pmtimer is broken.
> +
> +      /*
> +       * Check for broken PM timer.  1ms at 10GHz should be 1E+7 TSCs; at
> +       * 250MHz it should be 2.5E5.  So if after 4E+7 TSCs on a 10GHz machine,
> +       * we should have seen pmtimer show 4ms of change (i.e. cur =~
> +       * start+14320); on a 250MHz machine that should be 160ms (start+572800).
> +       * If after this a time we still don't have 1ms on pmtimer, then pmtimer
> +       * is broken.
> +       *
> +       * Likewise, if our code is perfectly efficient and introduces no delays
> +       * whatsoever, on a 10GHz system we should see a TSC delta of 3580 in
> +       * ~3580 iterations.  On a 250MHz machine that should be ~900 iterations.
> +       *
> +       * With those factors in mind, there are two limits here.  There's a hard
> +       * limit here at 8x our desired pm timer delta. This limit was picked as
> +       * an arbitrarily large value that's still not a lot of time to humans,
> +       * because if we get that far this is either an implausibly fast machine
> +       * or the pmtimer is not running.  And there is another limit on a 4 ms TSC
> +       * delta on a 10 GHz clock, without seeing cur converge on our target value.
>         */
> -      if ((num_iter & 0xffffff) == 0 && grub_get_tsc () - start_tsc > 5000000) {
> -	return 0;
> -      }
> +      if ((++num_iter > (grub_uint32_t)num_pm_ticks << 3UL) ||

s/grub_uint32_t/unsigned int/? And please add space between ")" and num_pm_ticks.

> +	  end_tsc - start_tsc > 40000000)

You can put this in one line with "if" above. In general I am OK with
lines a bit longer than 80 chars...

> +	{
> +	  grub_dprintf ("pmtimer",
> +			"pmtimer delta is 0x%"PRIxGRUB_UINT64_T" (%u iterations)\n",
> +			cur - start, num_iter);
> +	  grub_dprintf ("pmtimer",
> +			"tsc delta is implausible: 0x%"PRIxGRUB_UINT64_T"\n",
> +			end_tsc - start_tsc);
> +	  return 0;
> +	}
>      }
>  }
>
> @@ -74,12 +138,20 @@ grub_tsc_calibrate_from_pmtimer (void)
>
>    fadt = grub_acpi_find_fadt ();
>    if (!fadt)
> -    return 0;
> +    {
> +      grub_dprintf ("pmtimer", "No FADT found; not using pmtimer.\n");
> +      return 0;
> +    }
>    pmtimer = fadt->pmtimer;
>    if (!pmtimer)
> -    return 0;
> +    {
> +      grub_dprintf ("pmtimer", "FADT does not specify pmtimer; skipping.\n");
> +      return 0;
> +    }
>
> -  /* It's 3.579545 MHz clock. Wait 1 ms.  */
> +  /*
> +   * It's 3.579545 MHz clock. Wait 1 ms.
> +   */

Please drop this change.

Daniel


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
  2022-07-19 13:28   ` Daniel Kiper
@ 2022-07-19 17:38     ` Robbie Harwood
  2022-07-20 13:28       ` Daniel Kiper
  2022-07-20 13:09     ` Daniel Kiper
  1 sibling, 1 reply; 6+ messages in thread
From: Robbie Harwood @ 2022-07-19 17:38 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: grub-devel, Peter Jones, Javier Martinez Canillas

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

Daniel Kiper <dkiper@net-space.pl> writes:

> On Thu, Jul 14, 2022 at 05:42:51PM -0400, Robbie Harwood wrote:
>
>> This outcome was tested using grub compiled with
>> GRUB_PMTIMER_IGNORE_BAD_READS defined (so as not to trip the bad read
>> test) using qemu+kvm with UEFI (OVMF) firmware, and these options:
>> -machine pc-q35-2.10 -cpu Broadwell-noTSX

>> +/*
>> + * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's
>> + * present but doesn't keep time well.
>> + */
>> +// #define GRUB_PMTIMER_IGNORE_BAD_READS

>> +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
>
> I still do not understand why we need compile this code conditionally.
> Why cannot we build it always and enable only when debug is set to
> pmtimer/all? Or use another variable to enable this test?

It's mentioned in the commit message and the comment I've quoted above
that this is for testing - there isn't a use case for defining it
otherwise.  (I also touched on this some in reply to Paul Menzel.)

As it changes more behavior than just debug prints, it doesn't make
sense to me to tie it to debug.  More generally, I'd rather remove it
than making the behavior runtime-configurable.  Would you prefer
removing the #define entirely to the patch as written now?

(Rest of comments make sense to me and I'll cut a v4 once we resolve
this.)

Be well,
--Robbie

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
  2022-07-19 13:28   ` Daniel Kiper
  2022-07-19 17:38     ` Robbie Harwood
@ 2022-07-20 13:09     ` Daniel Kiper
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Kiper @ 2022-07-20 13:09 UTC (permalink / raw)
  To: Robbie Harwood; +Cc: grub-devel, Peter Jones, Javier Martinez Canillas

On Tue, Jul 19, 2022 at 03:28:35PM +0200, Daniel Kiper wrote:
> On Thu, Jul 14, 2022 at 05:42:51PM -0400, Robbie Harwood wrote:
> > From: Peter Jones <pjones@redhat.com>
> >
> > On my laptop running at 2.4GHz, if I run a VM where tsc calibration
> > using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
> > to do so (as measured with the stopwatch on my phone), with a tsc delta
> > of 0x1cd1c85300, or around 125 billion cycles.
> >
> > If instead of trying to wait for 5-200ms to show up on the pmtimer, we try
> > to wait for 5-200us, it decides it's broken in ~0x2626aa0 TSCs, aka ~2.4
> > million cycles, or more or less instantly.
> >
> > Additionally, this reading the pmtimer was returning 0xffffffff anyway,
> > and that's obviously an invalid return.  I've added a check for that and
> > 0 so we don't bother waiting for the test if what we're seeing is dead
> > pins with no response at all.
> >
> > If "debug" includes "pmtimer", you will see one of the following three
> > outcomes.  If pmtimer gives all 0 or all 1 bits, you will see:
> >
> > pmtimer: 0xffffff bad_reads: 1
> > pmtimer: 0xffffff bad_reads: 2
> > pmtimer: 0xffffff bad_reads: 3
> > pmtimer: 0xffffff bad_reads: 4
> > pmtimer: 0xffffff bad_reads: 5
> > pmtimer: 0xffffff bad_reads: 6
> > pmtimer: 0xffffff bad_reads: 7
> > pmtimer: 0xffffff bad_reads: 8
> > pmtimer: 0xffffff bad_reads: 9
> > pmtimer: 0xffffff bad_reads: 10
> > timer is broken; giving up.
> >
> > This outcome was tested using qemu+kvm with UEFI (OVMF) firmware and
> > these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX
> >
> > If pmtimer gives any other bit patterns but is not actually marching
> > forward fast enough to use for clock calibration, you will see:
> >
> > pmtimer delta is 0x0 (1904 iterations)
> > tsc delta is implausible: 0x2626aa0
> >
> > This outcome was tested using grub compiled with
> > GRUB_PMTIMER_IGNORE_BAD_READS defined (so as not to trip the bad read
> > test) using qemu+kvm with UEFI (OVMF) firmware, and these options:
> > -machine pc-q35-2.10 -cpu Broadwell-noTSX
> >
> > If pmtimer actually works, you'll see something like:
> >
> > pmtimer delta is 0xdff
> > tsc delta is 0x278756
> >
> > This outcome was tested using qemu+kvm with UEFI (OVMF) firmware, and
> > these options: -machine pc-i440fx-2.4 -cpu Broadwell-noTSX
> >
> > I've also tested this outcome on a real Intel Xeon E3-1275v3 on an Intel
> > Server Board S1200V3RPS using the SDV.RP.B8 "Release" build here:
> > https://firmware.intel.com/sites/default/files/UEFIDevKit_S1200RP_vB8.zip
> >
> > Signed-off-by: Peter Jones <pjones@redhat.com>
> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> > Signed-off-by: Robbie Harwood <rharwood@redhat.com>
> > ---
> >  grub-core/kern/i386/tsc_pmtimer.c | 112 ++++++++++++++++++++++++------
> >  1 file changed, 92 insertions(+), 20 deletions(-)
> >
> > diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c
> > index c9c3616997..d9b3765b01 100644
> > --- a/grub-core/kern/i386/tsc_pmtimer.c
> > +++ b/grub-core/kern/i386/tsc_pmtimer.c
> > @@ -28,40 +28,104 @@
> >  #include <grub/acpi.h>
> >  #include <grub/cpu/io.h>
> >
> > +/*
> > + * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's
> > + * present but doesn't keep time well.
> > + */
> > +// #define GRUB_PMTIMER_IGNORE_BAD_READS
> > +
> >  grub_uint64_t
> >  grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
> >  			     grub_uint16_t num_pm_ticks)
> >  {
> >    grub_uint32_t start;
> > -  grub_uint32_t last;
> > -  grub_uint32_t cur, end;
> > +  grub_uint64_t cur, end;
> >    grub_uint64_t start_tsc;
> >    grub_uint64_t end_tsc;
> > -  int num_iter = 0;
> > +  unsigned int num_iter = 0;
> > +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
> > +  int bad_reads = 0;
> > +#endif
> >
> > -  start = grub_inl (pmtimer) & 0xffffff;
> > -  last = start;
> > +  /*
> > +   * Some timers are 24-bit and some are 32-bit, but it doesn't make much
> > +   * difference to us.  Caring which one we have isn't really worth it since
> > +   * the low-order digits will give us enough data to calibrate TSC.  So just
> > +   * mask the top-order byte off.
> > +   */
> > +  cur = start = grub_inl (pmtimer) & 0x00ffffffUL;
> >    end = start + num_pm_ticks;
> >    start_tsc = grub_get_tsc ();
> >    while (1)
> >      {
> > -      cur = grub_inl (pmtimer) & 0xffffff;
> > -      if (cur < last)
> > -	cur |= 0x1000000;
> > -      num_iter++;
> > +      cur &= 0xffffffffff000000ULL;
> > +      /*
> > +       * Only take the low-order 24-bit for the reason explained above.
> > +       */
> > +      cur |= grub_inl (pmtimer) & 0x00ffffffUL;
> > +
> > +      end_tsc = grub_get_tsc();
> > +
> > +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
>
> I still do not understand why we need compile this code conditionally.
> Why cannot we build it always and enable only when debug is set to
> pmtimer/all? Or use another variable to enable this test?
>
> > +      /*
> > +       * If we get 10 reads in a row that are obviously dead pins, there's no
> > +       * reason to do this thousands of times.
> > +       */
> > +      if (cur == 0xffffffUL || cur == 0)
> > +	{
> > +	  bad_reads++;
> > +	  grub_dprintf ("pmtimer",
> > +			"pmtimer: 0x%"PRIxGRUB_UINT64_T" bad_reads: %d\n",
> > +			cur, bad_reads);
> > +	  grub_dprintf ("pmtimer", "timer is broken; giving up.\n");
>
> It seems to me this grub_dprintf() should be moved...
>
> > +
> > +	  if (bad_reads == 10)
>
> ... here.
>
> > +	    return 0;
> > +	}
> > +#endif
> > +
> > +      if (cur < start)
> > +	cur += 0x1000000;
> > +
> >        if (cur >= end)
> >  	{
> > -	  end_tsc = grub_get_tsc ();
> > +	  grub_dprintf ("pmtimer", "pmtimer delta is 0x%"PRIxGRUB_UINT64_T"\n",
> > +			cur - start);
> > +	  grub_dprintf ("pmtimer", "tsc delta is 0x%"PRIxGRUB_UINT64_T"\n",
> > +			end_tsc - start_tsc);
> >  	  return end_tsc - start_tsc;
> >  	}
> > -      /* Check for broken PM timer.
> > -	 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz)
> > -	 if after this time we still don't have 1 ms on pmtimer, then
> > -	 pmtimer is broken.
> > +
> > +      /*
> > +       * Check for broken PM timer.  1ms at 10GHz should be 1E+7 TSCs; at
> > +       * 250MHz it should be 2.5E5.  So if after 4E+7 TSCs on a 10GHz machine,
> > +       * we should have seen pmtimer show 4ms of change (i.e. cur =~
> > +       * start+14320); on a 250MHz machine that should be 160ms (start+572800).
> > +       * If after this a time we still don't have 1ms on pmtimer, then pmtimer
> > +       * is broken.
> > +       *
> > +       * Likewise, if our code is perfectly efficient and introduces no delays
> > +       * whatsoever, on a 10GHz system we should see a TSC delta of 3580 in
> > +       * ~3580 iterations.  On a 250MHz machine that should be ~900 iterations.
> > +       *
> > +       * With those factors in mind, there are two limits here.  There's a hard
> > +       * limit here at 8x our desired pm timer delta. This limit was picked as
> > +       * an arbitrarily large value that's still not a lot of time to humans,
> > +       * because if we get that far this is either an implausibly fast machine
> > +       * or the pmtimer is not running.  And there is another limit on a 4 ms TSC
> > +       * delta on a 10 GHz clock, without seeing cur converge on our target value.
> >         */
> > -      if ((num_iter & 0xffffff) == 0 && grub_get_tsc () - start_tsc > 5000000) {
> > -	return 0;
> > -      }
> > +      if ((++num_iter > (grub_uint32_t)num_pm_ticks << 3UL) ||
>
> s/grub_uint32_t/unsigned int/? And please add space between ")" and num_pm_ticks.

Or would not it be more natural if num_iter is defined as grub_uint32_t
and we cast num_pm_ticks to grub_uint32_t here too?

Daniel


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail
  2022-07-19 17:38     ` Robbie Harwood
@ 2022-07-20 13:28       ` Daniel Kiper
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Kiper @ 2022-07-20 13:28 UTC (permalink / raw)
  To: Robbie Harwood; +Cc: grub-devel, Peter Jones, Javier Martinez Canillas

On Tue, Jul 19, 2022 at 01:38:51PM -0400, Robbie Harwood wrote:
> Daniel Kiper <dkiper@net-space.pl> writes:
> > On Thu, Jul 14, 2022 at 05:42:51PM -0400, Robbie Harwood wrote:
> >
> >> This outcome was tested using grub compiled with
> >> GRUB_PMTIMER_IGNORE_BAD_READS defined (so as not to trip the bad read
> >> test) using qemu+kvm with UEFI (OVMF) firmware, and these options:
> >> -machine pc-q35-2.10 -cpu Broadwell-noTSX
>
> >> +/*
> >> + * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's
> >> + * present but doesn't keep time well.
> >> + */
> >> +// #define GRUB_PMTIMER_IGNORE_BAD_READS
>
> >> +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS
> >
> > I still do not understand why we need compile this code conditionally.
> > Why cannot we build it always and enable only when debug is set to
> > pmtimer/all? Or use another variable to enable this test?
>
> It's mentioned in the commit message and the comment I've quoted above
> that this is for testing - there isn't a use case for defining it
> otherwise.  (I also touched on this some in reply to Paul Menzel.)

I took a look at your reply to Paul. I think something like that should
be added to the commit message. Otherwise it is not (entirely) clear why
this part of code has to be commented out (or disabled if we agree it).

> As it changes more behavior than just debug prints, it doesn't make
> sense to me to tie it to debug.  More generally, I'd rather remove it
> than making the behavior runtime-configurable.  Would you prefer
> removing the #define entirely to the patch as written now?

If you say it is untested code I do not feel comfortable to run it
unconditionally too. However, I can see some reasons to have it compiled
in. Then everybody could enable additional tests without recompiling the
GRUB. It would be especially convenient on machines with secure boot
enabled. Though I agree we should not tie this to debug variable and use
separate environment variable to enable this tests.

> (Rest of comments make sense to me and I'll cut a v4 once we resolve
> this.)

Cool! Thanks!

Daniel


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-20 13:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14 21:42 [PATCH v3 0/1] i386: Make pmtimer tsc calibration not take 51 seconds to fail Robbie Harwood
2022-07-14 21:42 ` [PATCH v3 1/1] " Robbie Harwood
2022-07-19 13:28   ` Daniel Kiper
2022-07-19 17:38     ` Robbie Harwood
2022-07-20 13:28       ` Daniel Kiper
2022-07-20 13:09     ` Daniel Kiper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.