* statistical time calibration
@ 2022-01-18 15:03 Jan Beulich
  2022-01-19  8:27 ` Jan Beulich
  2022-03-11 15:29 ` Roger Pau Monné
  0 siblings, 2 replies; 4+ messages in thread
From: Jan Beulich @ 2022-01-18 15:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Hello,

Roger pointed me to a FreeBSD commit [1] introducing such there. While
we don't start at 2000ms (but rather at 50), this still looked interesting
enough to take a closer look. I think I've mostly understood the idea and
implementation now, with the exception of three things:

1) When deciding whether to increment "passes", both variance values have
an arbitrary value of 4 added to them. There's a sentence about this in
the earlier (big) comment, but it lacks any justification as to the chosen
value. What's worse, variance is not a plain number, but a quantity in the
same units as the base values. Since typically both clocks will run at
very different frequencies, using the same (constant) value here has much
more of an effect on the lower frequency clock's value than on the higher
frequency one's.

2) The second of the "important formulas" is nothing I could recall or was
able to look up. All I could find are somewhat similar, but still
sufficiently different ones. Perhaps my "introductory statistics" have
meanwhile been too long ago ... (In this context I'd like to also mention
that it took me quite a while to prove to myself that the degenerate case
of, in particular, the first iteration wouldn't lead to an early exit
from the function.)

3) At the bottom of the loop there is some delaying logic, leading to
later data points coming in closer succession than earlier ones. I'm
afraid I don't understand the "theoretical risk of aliasing", and hence
I'm seeing more risks than benefits from this construct.

Beyond that there are implementation aspects that I'm not happy with,
like aforementioned delay loop not dealing with a TSC which did start
from a large "negative" value, and which hence would eventually wrap. Nor
is the SMI (or other long latency events) aspect being taken care of. But
any such concern could of course be dealt with as we port over this
logic, if we decided we want to go that route.

My main concern is with the goal of reaching accuracy of 1PPM, and the
loop ending only after a full second (if I got that right) if that
accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
reachable. My recent observations suggest that with HPET that's
feasible (but only barely), but with PMTMR it might be more like 3 or
more.

The other slight concern I have, as previously voiced on IRC, is the use
of floating point here.

Jan

[1] https://cgit.freebsd.org/src/commit/?id=c2705ceaeb09d8579661097fd358ffb5defb5624




* Re: statistical time calibration
  2022-01-18 15:03 statistical time calibration Jan Beulich
@ 2022-01-19  8:27 ` Jan Beulich
  2022-03-11 15:29 ` Roger Pau Monné
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2022-01-19  8:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

On 18.01.2022 16:03, Jan Beulich wrote:
> Hello,
> 
> Roger pointed me to a FreeBSD commit [1] introducing such there. While
> we don't start at 2000ms (but rather at 50), this still looked interesting
> enough to take a closer look. I think I've mostly understood the idea and
> implementation now, with the exception of three things:
> 
> 1) When deciding whether to increment "passes", both variance values have
> an arbitrary value of 4 added to them. There's a sentence about this in
> the earlier (big) comment, but it lacks any justification as to the chosen
> value. What's worse, variance is not a plain number, but a quantity in the
> same units as the base values.

While not relevant for the eventual usage, I'd like to correct myself here:
The unit of variance (and covariance) is the square of the base unit
(assuming, in the covariance case, the units of both values are the same,
as is the case here, where fundamentally both use Hz, and just the scales
may - and typically will - be different). Which ...

> Since typically both clocks will run at
> very different frequencies, using the same (constant) value here has much
> more of an effect on the lower frequency clock's value than on the higher
> frequency one's.

... means the difference in (relative) effect on the two values is even
more significant.
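
To make the scaling explicit, here is a small worked sketch. It assumes both counters are sampled over the same underlying time intervals, with the sample times having variance \sigma_t^2, and it writes c for the added constant (4 in the code under discussion). The frequencies f_1 and f_2 are illustrative symbols, not names from the FreeBSD source.

```latex
% Counter i reads x_i = f_i t, so its variance scales with the square of f_i:
\operatorname{Var}(x_i) = f_i^2 \, \sigma_t^2
% Relative contribution of the added constant c to each variance:
\frac{c}{\operatorname{Var}(x_i)} = \frac{c}{f_i^2 \, \sigma_t^2}
% Ratio of that relative contribution between the two clocks:
\frac{c / \operatorname{Var}(x_1)}{c / \operatorname{Var}(x_2)} = \left(\frac{f_2}{f_1}\right)^2
```

For example, with the 3.579545 MHz ACPI PM timer as clock 1 and a hypothetical 3 GHz TSC as clock 2, the ratio is roughly 7 x 10^5 rather than the roughly 840 a linear scaling would give.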

Jan

> 2) The second of the "important formulas" is nothing I could recall or was
> able to look up. All I could find are somewhat similar, but still
> sufficiently different ones. Perhaps my "introductory statistics" have
> meanwhile been too long ago ... (In this context I'd like to also mention
> that it took me quite a while to prove to myself that the degenerate case
> of, in particular, the first iteration wouldn't lead to an early exit
> from the function.)
> 
> 3) At the bottom of the loop there is some delaying logic, leading to
> later data points coming in closer succession than earlier ones. I'm
> afraid I don't understand the "theoretical risk of aliasing", and hence
> I'm seeing more risks than benefits from this construct.
> 
> Beyond that there are implementation aspects that I'm not happy with,
> like aforementioned delay loop not dealing with a TSC which did start
> from a large "negative" value, and which hence would eventually wrap. Nor
> is the SMI (or other long latency events) aspect being taken care of. But
> any such concern could of course be dealt with as we port over this
> logic, if we decided we want to go that route.
> 
> My main concern is with the goal of reaching accuracy of 1PPM, and the
> loop ending only after a full second (if I got that right) if that
> accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
> reachable. My recent observations suggest that with HPET that's
> feasible (but only barely), but with PMTMR it might be more like 3 or
> more.
> 
> The other slight concern I have, as previously voiced on IRC, is the use
> of floating point here.
> 
> Jan
> 
> [1] https://cgit.freebsd.org/src/commit/?id=c2705ceaeb09d8579661097fd358ffb5defb5624
> 
> 




* Re: statistical time calibration
  2022-01-18 15:03 statistical time calibration Jan Beulich
  2022-01-19  8:27 ` Jan Beulich
@ 2022-03-11 15:29 ` Roger Pau Monné
  2022-03-12  3:25   ` Colin Percival
  1 sibling, 1 reply; 4+ messages in thread
From: Roger Pau Monné @ 2022-03-11 15:29 UTC (permalink / raw)
  To: Jan Beulich, Colin Percival; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Tue, Jan 18, 2022 at 04:03:56PM +0100, Jan Beulich wrote:
> Hello,
> 
> Roger pointed me to a FreeBSD commit [1] introducing such there. While
> we don't start at 2000ms (but rather at 50), this still looked interesting
> enough to take a closer look. I think I've mostly understood the idea and
> implementation now, with the exception of three things:

I have to admit I didn't really look at the commit in detail, just saw
it go by at the same time you were working on improving our time
calibration, and assumed it could be interesting.

> 1) When deciding whether to increment "passes", both variance values have
> an arbitrary value of 4 added to them. There's a sentence about this in
> the earlier (big) comment, but it lacks any justification as to the chosen
> value. What's worse, variance is not a plain number, but a quantity in the
> same units as the base values. Since typically both clocks will run at
> very different frequencies, using the same (constant) value here has much
> more of an effect on the lower frequency clock's value than on the higher
> frequency one's.
> 
> 2) The second of the "important formulas" is nothing I could recall or was
> able to look up. All I could find are somewhat similar, but still
> sufficiently different ones. Perhaps my "introductory statistics" have
> meanwhile been too long ago ... (In this context I'd like to also mention
> that it took me quite a while to prove to myself that the degenerate case
> of, in particular, the first iteration wouldn't lead to an early exit
> from the function.)
> 
> 3) At the bottom of the loop there is some delaying logic, leading to
> later data points coming in closer succession than earlier ones. I'm
> afraid I don't understand the "theoretical risk of aliasing", and hence
> I'm seeing more risks than benefits from this construct.

Might be easier to just add Colin, he did the original commit and can
likely answer those questions much better than me. He has also done a
bunch of work for FreeBSD/Xen.

> Beyond that there are implementation aspects that I'm not happy with,
> like aforementioned delay loop not dealing with a TSC which did start
> from a large "negative" value, and which hence would eventually wrap. Nor
> is the SMI (or other long latency events) aspect being taken care of. But
> any such concern could of course be dealt with as we port over this
> logic, if we decided we want to go that route.
> 
> My main concern is with the goal of reaching accuracy of 1PPM, and the
> loop ending only after a full second (if I got that right) if that
> accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
> reachable. My recent observations suggest that with HPET that's
> feasible (but only barely), but with PMTMR it might be more like 3 or
> more.
> 
> The other slight concern I have, as previously voiced on IRC, is the use
> of floating point here.
> 
> Jan
> 
> [1] https://cgit.freebsd.org/src/commit/?id=c2705ceaeb09d8579661097fd358ffb5defb5624
> 



* Re: statistical time calibration
  2022-03-11 15:29 ` Roger Pau Monné
@ 2022-03-12  3:25   ` Colin Percival
  0 siblings, 0 replies; 4+ messages in thread
From: Colin Percival @ 2022-03-12  3:25 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

Hi everyone,

On 3/11/22 07:29, Roger Pau Monné wrote:
> On Tue, Jan 18, 2022 at 04:03:56PM +0100, Jan Beulich wrote:
>> 1) When deciding whether to increment "passes", both variance values have
>> an arbitrary value of 4 added to them. There's a sentence about this in
>> the earlier (big) comment, but it lacks any justification as to the chosen
>> value. What's worse, variance is not a plain number, but a quantity in the
>> same units as the base values. Since typically both clocks will run at
>> very different frequencies, using the same (constant) value here has much
>> more of an effect on the lower frequency clock's value than on the higher
>> frequency one's.

This additional variance arises from the quantization, and so it scales with
the timing quantum.  It makes sense that it has a larger effect on a lower
frequency clock -- if you imagine trying to calibrate against a clock which
runs at 1 Hz, without this term you would read several identical values from
that clock and conclude that your clock runs at infinity Hz.
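
The 1 Hz thought experiment can be sketched in a few lines of Python. This is only an illustration under assumptions: the convergence test below is a guess at the general shape of such a check (relative slope uncertainty below a tolerance, rearranged to avoid division), and the names `moments`, `looks_converged`, `floor` and `tol` are hypothetical, not taken from the FreeBSD code. Only the constant 4 comes from the thread.

```python
def moments(xs, ys):
    """Population variances of xs and ys, and their covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    va_x = sum((x - mx) ** 2 for x in xs) / n
    va_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return va_x, va_y, cov


def looks_converged(xs, ys, floor, tol=1e-6):
    """Assumed test: ((va_x * va_y) / cov^2 - 1) / (n - 2) <= tol^2,
    multiplied out so that cov == 0 needs no special case.
    `floor` is the quantization term added to both variances."""
    n = len(xs)
    va_x, va_y, cov = moments(xs, ys)
    return (va_x + floor) * (va_y + floor) <= cov * cov * (1 + tol * tol * (n - 2))


# Five quick reads: the fast counter advances, the 1 Hz clock never ticks.
tsc = [0, 1000, 2000, 3000, 4000]
ref = [7, 7, 7, 7, 7]

print(looks_converged(tsc, ref, floor=0.0))  # True: 0 <= 0, a bogus "perfect" fit
print(looks_converged(tsc, ref, floor=4.0))  # False: the added variance blocks it
```

Without the added term, identical readings give zero variance and zero covariance, so the check is trivially satisfied; with it, the loop has to keep sampling until the slow clock has actually moved.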

>> 2) The second of the "important formulas" is nothing I could recall or was
>> able to look up. All I could find are somewhat similar, but still
>> sufficiently different ones. Perhaps my "introductory statistics" have
>> meanwhile been too long ago ... (In this context I'd like to also mention
>> that it took me quite a while to prove to myself that the degenerate case
>> of, in particular, the first iteration wouldn't lead to an early exit
>> from the function.)

Most statistics courses present a formula for the absolute uncertainty in the
slope rather than the relative uncertainty.  But it's easy to derive one from
the other.
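
One way to carry out that derivation, using the textbook result for simple linear regression, is sketched below. The notation (S_xx, S_yy, S_xy for the centered sums of squares and cross products, b for the fitted slope, r for the correlation coefficient) is chosen here for illustration and is not claimed to match the comment in the FreeBSD source.

```latex
% Fitted slope and its usual (absolute) standard error:
b = \frac{S_{xy}}{S_{xx}}, \qquad
\sigma_b^2 = \frac{1}{n-2}\left(\frac{S_{yy}}{S_{xx}} - b^2\right)
% Divide by b^2 = S_{xy}^2 / S_{xx}^2 to get the relative uncertainty:
\left(\frac{\sigma_b}{b}\right)^2
  = \frac{1}{n-2}\left(\frac{S_{xx}\,S_{yy}}{S_{xy}^2} - 1\right)
  = \frac{1}{n-2}\left(\frac{1}{r^2} - 1\right)
```

The relative form depends only on n and on the product of the two variances divided by the squared covariance, so it is symmetric in the two clocks.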

>> 3) At the bottom of the loop there is some delaying logic, leading to
>> later data points coming in closer succession than earlier ones. I'm
>> afraid I don't understand the "theoretical risk of aliasing", and hence
>> I'm seeing more risks than benefits from this construct.

Suppose it takes exactly 1 us to run through the loop but one of the clocks
runs at exactly 1000001 Hz.  Without the extra delay, we'll probably observe
the clock incrementing by 1 every time through the loop (since it would only
increment by 2 once a second) and end up computing the wrong frequency.  The
"noise" introduced by adding small (variable) delays eliminates any chance
of this scenario and makes the data points behave like the *random* data
points which the statistical analysis needs.
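
The scenario can be reproduced with a small simulation. This is a simplified sketch, not the FreeBSD loop: it regresses one quantized counter against ideal sample times instead of against a second counter, and the helper names, the 0 to 999 ns jitter range and the sample count are all assumptions made for illustration.

```python
import random

REF_HZ = 1_000_001        # true frequency of the clock being read
NS_PER_S = 1_000_000_000


def read_clock(t_ns):
    """Counter value at time t_ns: whole ticks elapsed (exact integer math)."""
    return t_ns * REF_HZ // NS_PER_S


def estimate_hz(times_ns):
    """Least-squares slope of counter value against time, scaled to Hz."""
    ticks = [read_clock(t) for t in times_ns]
    n = len(times_ns)
    mt = sum(times_ns) / n
    mk = sum(ticks) / n
    cov = sum((t - mt) * (k - mk) for t, k in zip(times_ns, ticks))
    var = sum((t - mt) ** 2 for t in times_ns)
    return cov / var * NS_PER_S


def sample_times(n, jitter_ns, rng):
    """n sample times; each loop pass takes 1000 ns plus optional random jitter."""
    t, out = 0, []
    for _ in range(n):
        out.append(t)
        t += 1000 + (rng.randrange(jitter_ns) if jitter_ns else 0)
    return out


rng = random.Random(1)
N = 200_000
aliased = estimate_hz(sample_times(N, 0, rng))      # fixed 1 us loop
jittered = estimate_hz(sample_times(N, 1000, rng))  # 0..999 ns extra delay per pass

print(f"fixed 1 us loop: {aliased:.3f} Hz")
print(f"jittered loop:   {jittered:.3f} Hz")
```

With the fixed loop the counter advances by exactly one tick per pass for the whole run, so the fit reports 1000000 Hz and misses the extra 1 Hz entirely. With jitter, the sampling phase relative to the tick boundaries is randomized and the fit moves toward the true 1000001 Hz.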

> Might be easier to just add Colin, he did the original commit and can
> likely answer those questions much better than me. He has also done a
> bunch of work for FreeBSD/Xen.

You're too generous... I think the only real Xen work I did was adding support
for indirect segment I/Os to blkfront.  Mostly I was just packaging things up
for EC2 (back when EC2 used Xen).

>> My main concern is with the goal of reaching accuracy of 1PPM, and the
>> loop ending only after a full second (if I got that right) if that
>> accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
>> reachable. My recent observations suggest that with HPET that's
>> feasible (but only barely), but with PMTMR it might be more like 3 or
>> more.

The "give up after 1 second" thing is just "fall back to historical FreeBSD
behaviour".  In my experiments I found that calibrating against the i8254
we would get 1PPM in about 50 ms while HPET was 2-3 ms.

>> The other slight concern I have, as previously voiced on IRC, is the use
>> of floating point here.

FWIW, my first version of this code (about 5 years ago) used fixed-point
arithmetic.  It was far uglier so I was happy when the FreeBSD kernel became
able to use the FPU this early in the boot process.

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid


