All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Jones <pjones@redhat.com>
To: Daniel Kiper <daniel.kiper@oracle.com>
Cc: grub-devel@gnu.org, "David E . Box" <david.e.box@linux.intel.com>
Subject: Re: [PATCH] Make pmtimer tsc calibration not take 51 seconds to fail.
Date: Fri, 19 Jan 2018 16:10:01 -0500	[thread overview]
Message-ID: <20180119211000.ubntekuxuzhnxlac@redhat.com> (raw)
In-Reply-To: <20180118225253.GB21705@olila.local.net-space.pl>

On Thu, Jan 18, 2018 at 11:52:53PM +0100, Daniel Kiper wrote:
> On Tue, Jan 16, 2018 at 01:16:17PM -0500, Peter Jones wrote:
> > On my laptop running at 2.4GHz, if I run a VM where tsc calibration
> > using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
> > to do so (as measured with the stopwatch on my phone), with a tsc delta
> > of 0x1cd1c85300, or around 125 billion cycles.
> >
> > If instead of trying to wait for 5-200ms to show up on the pmtimer, we try
> > to wait for 5-200us, it decides it's broken in ~0x7998f9e TSCs, aka ~2
> > million cycles, or more or less instantly.
> >
> > Additionally, this reading the pmtimer was returning 0xffffffff anyway,
> > and that's obviously an invalid return.  I've added a check for that and
> > 0 so we don't bother waiting for the test if what we're seeing is dead
> > pins with no response at all.
> >
> > Signed-off-by: Peter Jones <pjones@redhat.com>
> > ---
> >  grub-core/kern/i386/tsc_pmtimer.c | 43 ++++++++++++++++++++++++++++++---------
> >  1 file changed, 33 insertions(+), 10 deletions(-)
> >
> > diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c
> > index c9c36169978..609402b8376 100644
> > --- a/grub-core/kern/i386/tsc_pmtimer.c
> > +++ b/grub-core/kern/i386/tsc_pmtimer.c
> > @@ -38,30 +38,53 @@ grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
> >    grub_uint64_t start_tsc;
> >    grub_uint64_t end_tsc;
> >    int num_iter = 0;
> > +  int bad_reads = 0;
> >
> > -  start = grub_inl (pmtimer) & 0xffffff;
> > +  start = grub_inl (pmtimer) & 0x3fff;
> 
> I am not sure why you are changing this to 0x3fff...

Upon re-reading, I am not either.  (As you see I wrote this patch 2+
months ago while working on something else, so the memory has already
faded.)

Obviously I need to make a version 2 of this patch to fix some issues.

> >    last = start;
> >    end = start + num_pm_ticks;
> >    start_tsc = grub_get_tsc ();
> >    while (1)
> >      {
> > -      cur = grub_inl (pmtimer) & 0xffffff;
> 
> What about 24-bit timers? I would leave this here...

I was blisfully unaware of 24-bit timers.  I'll put that back and add a
comment for future readers of the code, and check for 0xffffff below.

> > +      cur = grub_inl (pmtimer);
> > +
> > +      /* If we get 10 reads in a row that are obviously dead pins, there's no
> > +	 reason to do this thousands of times.
> > +       */
> > +      if (cur == 0xffffffff || cur == 0)
> 
> ...and here I would check for 0xffffff and 0.

Sure.

> 
> > +	{
> > +	  bad_reads++;
> > +	  grub_dprintf ("pmtimer", "cur: 0x%08x bad_reads: %d\n", cur, bad_reads);
> > +
> > +	  if (bad_reads == 10)
> > +	    return 0;
> > +	}
> > +      else if (bad_reads)
> > +	bad_reads = 0;
> 
> Do we really need to reset this?

I mean, it's technically correct, but I don't think you're going to see
0 and 0xffffff as valid values often enough to actually care.

> > +      cur &= 0x3fff;
> > +
> >        if (cur < last)
> > -	cur |= 0x1000000;
> > +	cur |= 0x4000;
> >        num_iter++;

You didn't spot this, but it's wrong for exactly the same reason the
initialization of start is.  I'm pretty sure when I did this part, I was
letting the comment below tell me what was going on, and completely
missed that the comment doesn't match the conditional at all.

> >        if (cur >= end)
> >  	{
> >  	  end_tsc = grub_get_tsc ();
> > +	  grub_dprintf ("pmtimer", "tsc delta is 0x%016lx\n",
> > +			end_tsc - start_tsc);
> >  	  return end_tsc - start_tsc;
> >  	}
> > -      /* Check for broken PM timer.
> > -	 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz)
> > -	 if after this time we still don't have 1 ms on pmtimer, then
> > -	 pmtimer is broken.
> > +      /* Check for broken PM timer.  5000 TSCs is between 5us (10GHz) and
>                                                              ^^^ 500ns?
> 
> > +	 200us (250 MHz).  If after this time we still don't have 1us on
>          ^^^^^ 20us?

You know, I stared at this for a surprising amount of time when writing
it trying to figure out why these numbers seemed weird.  While I have
the diff in front of me rather than the code as it is now, I see that
it's because I was starting with the comment that was previously there,
which had 50M instead of 5M for the TSC count.  So I wound up dividing
these by 10k instead of 1k.  Woops.  Will fix in v2.

> 
> > +	 pmtimer, then pmtimer is broken.
> >         */
> > -      if ((num_iter & 0xffffff) == 0 && grub_get_tsc () - start_tsc > 5000000) {
> > -	return 0;
> > -      }
> > +      end_tsc = grub_get_tsc();
> > +      if ((num_iter & 0x3fff) == 0 && end_tsc - start_tsc > 5000)
> 
> Why 0x3fff here which means, AIUI, 4000 iterations?

Er... 16384, surely?  But basically I just wanted to scale down the
limit by approximately the same order as I scaled down the TSC delta,
and I wanted to leave the code basically working the same way (not for
any good reason), so I still wanted a number with only low-order bits
set.  TSC is scaled down by 1000, so I scaled the PM tick iteration down
by 1024.  I could make it twice as wide of a window instead, if you're
worried about that particular race, but I don't think it's going to
matter.

If our loop takes 1 TSC tick per iteration, which is quite impluasibly
fast, and we're on a 10GHz machine, then we should normally see pmtimer
change by 3580 in ~2800 iterations.  If it takes 16k iterations (again,
with perfectly executing code that takes no time), the pmtimer is
running at 625KHz instead of 3.58MHz.  If we're on a 250MHz machine,
that number is ~70 iterations.  If we get to 16000 iterations without
seeing pmtimer change by 3580 then pmtimer is either broken or we're on
a 60GHz+ machine.

That said the logic here did not match the comment and it still doesn't;
it should be comparing cur and start, not the number of iterations.  On
the next version of the patch I'm going to do something like:

        if ((++num_iter > (grub_uint32_t)num_pm_ticks << 3UL) ||
            (cur == start && end_tsc - start_tsc > 5000))

So it'll have a fairly widely ranged total limit but also actually check if
pmtimer results have changed at all.

Tentative patch below, but I haven't had a chance to test it yet:

From 97e59bbe000961db5f823c4df074835d1165bd9d Mon Sep 17 00:00:00 2001
From: Peter Jones <pjones@redhat.com>
Date: Tue, 7 Nov 2017 17:12:17 -0500
Subject: [PATCH] Make pmtimer tsc calibration not take 51 seconds to fail.

On my laptop running at 2.4GHz, if I run a VM where tsc calibration
using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
to do so (as measured with the stopwatch on my phone), with a tsc delta
of 0x1cd1c85300, or around 125 billion cycles.

If instead of trying to wait for 5-200ms to show up on the pmtimer, we try
to wait for 500ns-20us, it decides it's broken in ~0x7998f9e TSCs, aka ~2
million cycles, or more or less instantly.

Additionally, this reading the pmtimer was returning 0xffffffff anyway,
and that's obviously an invalid return.  I've added a check for that and
0 so we don't bother waiting for the test if what we're seeing is dead
pins with no response at all.
---
 grub-core/kern/i386/tsc_pmtimer.c | 56 ++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c
index c9c36169978..b09c00316df 100644
--- a/grub-core/kern/i386/tsc_pmtimer.c
+++ b/grub-core/kern/i386/tsc_pmtimer.c
@@ -37,8 +37,14 @@ grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
   grub_uint32_t cur, end;
   grub_uint64_t start_tsc;
   grub_uint64_t end_tsc;
-  int num_iter = 0;
+  unsigned int num_iter = 0;
+  int bad_reads = 0;
 
+  /* Some timers are 24-bit and some are 32-bit, but it doesn't make much
+   * difference to us.  Caring which one we have isn't really worth it since
+   * the low-order digits will give us enough data to calibrate TSC.  So just
+   * mask the top-order byte off.
+   */
   start = grub_inl (pmtimer) & 0xffffff;
   last = start;
   end = start + num_pm_ticks;
@@ -46,22 +52,52 @@ grub_pmtimer_wait_count_tsc (grub_port_t pmtimer,
   while (1)
     {
       cur = grub_inl (pmtimer) & 0xffffff;
+
+      /* If we get 10 reads in a row that are obviously dead pins, there's no
+	 reason to do this thousands of times.
+       */
+      if (cur == 0xffffff || cur == 0)
+	{
+	  bad_reads++;
+	  grub_dprintf ("pmtimer", "pmtimer value: 0x%08x bad_reads: %d\n",
+			cur, bad_reads);
+
+	  if (bad_reads == 10)
+	    return 0;
+	}
+      else if (bad_reads)
+	bad_reads = 0;
+
+      end_tsc = grub_get_tsc();
+
       if (cur < last)
 	cur |= 0x1000000;
-      num_iter++;
       if (cur >= end)
 	{
-	  end_tsc = grub_get_tsc ();
+	  grub_dprintf ("pmtimer", "tsc delta is 0x%016llx\n",
+			end_tsc - start_tsc);
 	  return end_tsc - start_tsc;
 	}
-      /* Check for broken PM timer.
-	 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz)
-	 if after this time we still don't have 1 ms on pmtimer, then
-	 pmtimer is broken.
+
+      /* Check for broken PM timer.  5000 TSCs is between 500ns (10GHz) and
+	 20us (250 MHz).  If after this time we still don't have 1us on
+	 pmtimer, then pmtimer is broken.
+
+	 Likewise, if our code is perfectly efficient and introduces no delays
+	 whatsoever, on a 10GHz system we should see a TSC delta of 3580 in
+	 ~2800 iterations.  On a 250MHz machine that should be ~70 iterations.
+	 So there's a hard limit here at 8x our desired pm timer delta, because
+	 if we get that far, we're either on a 60GHz+ machine or we're never
+	 hitting our threshold.
        */
-      if ((num_iter & 0xffffff) == 0 && grub_get_tsc () - start_tsc > 5000000) {
-	return 0;
-      }
+      if ((++num_iter > (grub_uint32_t)num_pm_ticks << 3UL) ||
+	  (cur == start && end_tsc - start_tsc > 5000))
+	{
+	  grub_dprintf ("pmtimer",
+			"tsc delta is 0x%016llx, which is implausible\n",
+			end_tsc - start_tsc);
+	  return 0;
+	}
     }
 }
 
-- 
2.15.0


-- 
  Peter


  reply	other threads:[~2018-01-19 21:10 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-16 18:16 [PATCH] Make pmtimer tsc calibration not take 51 seconds to fail Peter Jones
2018-01-18 22:52 ` Daniel Kiper
2018-01-19 21:10   ` Peter Jones [this message]
2018-01-29 16:42     ` Daniel Kiper
2018-02-19 22:37       ` Peter Jones
2018-02-20 19:39         ` Daniel Kiper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180119211000.ubntekuxuzhnxlac@redhat.com \
    --to=pjones@redhat.com \
    --cc=daniel.kiper@oracle.com \
    --cc=david.e.box@linux.intel.com \
    --cc=grub-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.