statistical time calibration

* statistical time calibration
@ 2022-01-18 15:03 Jan Beulich
  2022-01-19  8:27 ` Jan Beulich
  2022-03-11 15:29 ` Roger Pau Monné
  0 siblings, 2 replies; 4+ messages in thread
From: Jan Beulich @ 2022-01-18 15:03 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Hello,

Roger pointed me to a FreeBSD commit [1] which introduces this there. While
we don't start at 2000ms (but rather at 50ms), this still looked interesting
enough to take a closer look. I think I've mostly understood the idea and
implementation now, with the exception of three things:

1) When deciding whether to increment "passes", both variance values have
an arbitrary value of 4 added to them. There's a sentence about this in
the earlier (big) comment, but it offers no justification for the chosen
value. What's worse, variance is not a plain number, but a quantity in the
same units as the base values. Since the two clocks will typically run at
very different frequencies, adding the same constant here has a much
larger effect on the lower-frequency clock's value than on the
higher-frequency one's.

2) I could neither recall nor look up the second of the "important
formulas". All I could find were somewhat similar, but still sufficiently
different, ones. Perhaps my introductory statistics course is simply too
long ago ... (In this context I'd also like to mention that it took me
quite a while to prove to myself that the degenerate case, in particular
the first iteration, wouldn't lead to an early exit from the function.)

3) At the bottom of the loop there is some delaying logic, leading to
later data points coming in closer succession than earlier ones. I'm
afraid I don't understand the "theoretical risk of aliasing", and hence
I'm seeing more risks than benefits in this construct.
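To put a number on the asymmetry described in 1), a small sketch that treats the constant 4 as a slack measured in each clock's own ticks and expresses it in PPM of the ticks that clock accumulates over a 50ms window. The 3.579545MHz figure is the ACPI PM timer rate; the 3GHz TSC is merely an assumed example, and `slack_ppm()` is a hypothetical helper.

```c
/* Weight, in parts per million, of a fixed slack of `slack_ticks`
 * relative to the number of ticks a clock running at `freq_hz`
 * accumulates over a measurement window of `window_s` seconds. */
double slack_ppm(double slack_ticks, double freq_hz, double window_s)
{
    return slack_ticks / (freq_hz * window_s) * 1e6;
}
```

With these assumptions, slack_ppm(4, 3579545.0, 0.05) comes out at roughly 22PPM for the PM timer, while slack_ppm(4, 3e9, 0.05) is below 0.03PPM for the TSC.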
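For comparison with the formulas questioned in 2), here is a sketch of the textbook ordinary least-squares slope and its estimated variance, var = RSS / ((n - 2) * Sxx), computed from running sums the way a calibration loop might accumulate them. Whether FreeBSD's second formula is equivalent to either of these is exactly the open question; `struct ols`, `ols_add()` and `ols_slope()` are hypothetical names. Note that n < 3 is degenerate here, as the residual variance needs n - 2 > 0.

```c
#include <stddef.h>

/* Running sums for an ordinary least-squares fit y = a * x + b. */
struct ols {
    size_t n;
    double sx, sy, sxx, sxy, syy;
};

void ols_add(struct ols *s, double x, double y)
{
    s->n++;
    s->sx += x;
    s->sy += y;
    s->sxx += x * x;
    s->sxy += x * y;
    s->syy += y * y;
}

/* On success returns 0 and stores the slope and its estimated variance.
 * Returns -1 for fewer than three points or if all x values are equal. */
int ols_slope(const struct ols *s, double *slope, double *var_slope)
{
    if (s->n < 3)
        return -1;

    double n = (double)s->n;
    double Sxx = s->sxx - s->sx * s->sx / n;  /* sum of (x - mean_x)^2 */
    double Sxy = s->sxy - s->sx * s->sy / n;
    double Syy = s->syy - s->sy * s->sy / n;

    if (Sxx <= 0)
        return -1;

    *slope = Sxy / Sxx;

    double rss = Syy - *slope * Sxy;  /* residual sum of squares */
    if (rss < 0)
        rss = 0;                      /* guard against rounding */

    *var_slope = rss / (n - 2) / Sxx;
    return 0;
}
```

The raw-sums form suffers from cancellation once x and y are large TSC or HPET readings; real code would at least subtract the first sample from all later ones before accumulating.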

Beyond that, there are implementation aspects I'm not happy with, like
the aforementioned delay loop not dealing with a TSC which started from a
large "negative" value and hence would eventually wrap. Nor are SMIs (or
other long-latency events) taken care of. But any such concern could of
course be dealt with as we port over this logic, if we decided we want to
go that route.
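Regarding the wrap concern, a minimal sketch of a wrap-safe busy-wait, assuming a 64-bit free-running counter. `delay_ticks()`, `read_counter_fn` and `sim_counter_read()` are hypothetical; the simulated counter only stands in for a TSC read so that behaviour near the wrap point can be exercised. It does nothing about SMIs.

```c
#include <stdint.h>

/* Reads a free-running 64-bit counter (a stand-in for the TSC here). */
typedef uint64_t (*read_counter_fn)(void *ctx);

/* Simulated counter for the sketch: each read returns the current value
 * and then advances it by one. */
uint64_t sim_counter_read(void *ctx)
{
    uint64_t *c = ctx;
    return (*c)++;
}

/* Busy-wait until `ticks` counts have elapsed since `start`. The unsigned
 * subtraction yields the correct elapsed count even if the counter wraps
 * past UINT64_MAX in between, whereas `now < start + ticks` breaks as
 * soon as `start + ticks` itself wraps. */
void delay_ticks(read_counter_fn rd, void *ctx, uint64_t start,
                 uint64_t ticks)
{
    while (rd(ctx) - start < ticks)
        ;  /* real code would add a pause hint here */
}
```

Starting the simulated counter at UINT64_MAX - 5 and waiting 10 ticks, the loop performs 11 reads and stops only after the wrap; the naive comparison would exit on the very first read.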

My main concern is with the goal of reaching an accuracy of 1PPM, and the
loop ending only after a full second (if I got that right) if that
accuracy cannot be reached. Afaict there's no guarantee that 1PPM is
reachable. My recent observations suggest that with HPET it's feasible
(but only barely), but with PMTMR it might be more like 3PPM or more.
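A back-of-the-envelope sketch of why a fixed 1PPM target may not be reachable within a second: if each read of the reference clock is uncertain by some amount (1µs is assumed below purely for illustration, e.g. due to slow reference-clock access), a two-point estimate has a relative error of up to 2 * err / window. `min_window_s()` is a hypothetical helper, and the bound ignores the averaging a regression over many samples can provide.

```c
/* Crude two-point bound: with up to `read_err_s` seconds of uncertainty
 * on each endpoint reading, the relative error of the measured ratio is
 * at most 2 * read_err_s / window_s. Returns the window (in seconds)
 * needed to bring that bound down to `target_ppm`. */
double min_window_s(double read_err_s, double target_ppm)
{
    return 2.0 * read_err_s / (target_ppm * 1e-6);
}
```

Under that assumption, min_window_s(1e-6, 1.0) gives 2s for 1PPM, while 3PPM would need only about 0.67s.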

The other slight concern I have, as previously voiced on IRC, is the use
of floating point here.

Jan

[1] https://cgit.freebsd.org/src/commit/?id=c2705ceaeb09d8579661097fd358ffb5defb5624
