[Qemu-devel] -rtc base=, migration and time jumps

* [Qemu-devel] -rtc base=, migration and time jumps
@ 2019-07-19 12:36 Dr. David Alan Gilbert
  2019-07-19 14:13 ` Paolo Bonzini
  0 siblings, 1 reply; 3+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-19 12:36 UTC (permalink / raw)
  To: qemu-devel, pbonzini; +Cc: jan.kiszka

Hi,
  I've just spent an unreasonable amount of time debugging
an rtc issue and come to the conclusion it's probably more
of a documentation problem than actual code - but I wondered if
anyone disagrees.

(ref: https://bugzilla.redhat.com/show_bug.cgi?id=1714143 )

The question revolves around -rtc base= and what the base=
passed to a destination qemu after migration should be
(particularly with the 'host' clock).

At startup, QEMU (vl.c) calculates offsets from the host clock
to the base - that value isn't migrated.

Most rtc calculations done afterwards don't reference it - they're
all based on the time since we last read the clock and a rolling time
since then.

There's code to detect host clock jumps, and trigger a notifier
- the only use of that is the mc146818rtc used on the x86.
It then reuses the base offset to reset the rtc to the current host
clock time.

a) If you start a destination qemu with the same base= value
as the source then the internal offset value will be different
by how much later you started the destination.

b) If you can trigger the host clock jump update, then on x86
that difference from (a) will become visible in reading the rtc
(hwclock) and thus the rtc will appear to have fallen behind.

c) libvirt (when using an 'adjustment' as oVirt does) recalculates
the base on the destination; so the base passed to the destination
qemu is different from the source; so even when (b) happens
you get a consistent value.  This may be an accident!

d) The host clock jump detection (b) is broken - it correctly detects
backwards jumps; but its detection of a forward jump is based
on two readings of the host clock being more than 60s apart - but
often on a qemu running a Linux guest the host clock isn't read at all;
so reading hwclock, waiting a minute and reading it again will trigger
the jump code.
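
To make the arithmetic in (a) to (c) concrete, here is a minimal sketch of
the offset calculation. It is a simplified model, not the code in vl.c: the
function names are hypothetical, times are plain seconds, and the
"recalculated base" for (c) is assumed to be the source base plus the
startup delay.

```python
# Simplified model of the startup offset and the post-jump RTC reset.
# All names are hypothetical; this is not QEMU's actual code.

def start_offset(host_now_at_start, base):
    """Offset computed once at startup: host clock minus requested base."""
    return host_now_at_start - base


def rtc_after_jump_reset(host_now, offset):
    """RTC value after the jump notifier resets it from the host clock."""
    return host_now - offset


BASE = 1000          # base= given to the source
SRC_START = 1000     # host time when the source qemu started
DST_START = 1030     # destination started 30s later
NOW = 1100           # host time when the jump notifier fires

src_offset = start_offset(SRC_START, BASE)
dst_offset_same_base = start_offset(DST_START, BASE)

# (a)+(b): same base= on the destination, so the RTC falls behind by the
# startup delay once the reset happens.
lag = (rtc_after_jump_reset(NOW, src_offset)
       - rtc_after_jump_reset(NOW, dst_offset_same_base))

# (c): assumed model of a recalculated base on the destination.
dst_offset_new_base = start_offset(DST_START, BASE + (DST_START - SRC_START))
consistent = (rtc_after_jump_reset(NOW, dst_offset_new_base)
              == rtc_after_jump_reset(NOW, src_offset))

print(lag, consistent)
```

With these numbers the destination started with the same base= ends up 30
seconds behind after the reset, while the recalculated base gives the same
reading as the source would have.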

So what to do?

1) Tell people to do what libvirt does and specify base= differently
on the dest.

2) Migrate the offset value such that the base= on the destination
is ignored

3) Fix the host clock jump detection

(3) is probably independent - the easiest fix would seem to be just
to set a timer to read the host clock at say 20 second intervals
which is wasteful but would avoid the false trigger.
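
The false trigger in (d) and the effect of a periodic read can be sketched
as below. This is a toy model under stated assumptions: the class and
function names are hypothetical, the 60s threshold is the one described
above, and a "jump" is reported purely by comparing each reading with the
previous one.

```python
# Toy model of jump detection based on consecutive host clock readings.
# Names are hypothetical; this is not QEMU's actual code.

MAX_JUMP_S = 60  # readings further apart than this count as a forward jump


class HostClock:
    def __init__(self, now):
        self.last = now

    def read(self, now):
        """Record a reading; return True if it would fire the notifier."""
        last, self.last = self.last, now
        return now < last or now > last + MAX_JUMP_S


def jumps_seen(read_times):
    """Return the reading times at which a jump would be reported."""
    clock = HostClock(read_times[0])
    return [t for t in read_times[1:] if clock.read(t)]


# Guest reads hwclock, waits 61s, reads again: reported as a jump even
# though the host clock never jumped.
idle_guest = jumps_seen([0, 61])

# Same guest reads, but a timer also reads the host clock every 20s.
with_timer = jumps_seen([0, 20, 40, 60, 61])

# A genuine backwards step is still detected.
backwards = jumps_seen([100, 90])

print(idle_guest, with_timer, backwards)
```

In this model the 20 second timer keeps consecutive readings well under the
threshold, so only real steps in the host clock are reported.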

Is (2) worth it, or do we just go with (1)? I'm tempted to just
specify the behaviour.

Mind you, we could kill the host clock jump detection code - only
the mc146818 registers on the notifier for it - so presumably
aarch/ppc/s390 etc don't see it.

Thoughts?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
