On Thu, 2023-10-19 at 08:40 -0700, Sean Christopherson wrote:
> 
> > Normally, it should be up to the hypervisor to tell the guest which
> > clock to use, i.e. if TSC is reliable or not. Let me put my question
> > this way: if TSC on the particular host is good for everything, why
> > does the hypervisor advertises 'kvmclock' to its guests?
> 
> I suspect there are two reasons.
> 
>   1. As is likely the case in our fleet, no one revisited the set of advertised
>      PV features when defining the VM shapes for a new generation of hardware, or
>      whoever did the reviews wasn't aware that advertising kvmclock is actually
>      suboptimal.  All the PV clock stuff in KVM is quite labyrinthian, so it's
>      not hard to imagine it getting overlooked.
> 
>   2. Legacy VMs.  If VMs have been running with a PV clock for years, forcing
>      them to switch to a new clocksource is high-risk, low-reward.

Doubly true for Xen guests (given that the Xen clocksource is identical
to the KVM clocksource).

> > If for some 'historical reasons' we can't revoke features we can always
> > introduce a new PV feature bit saying that TSC is preferred.

Don't we already have one? It's the PVCLOCK_TSC_STABLE_BIT. Why would a
guest ever use kvmclock if the PVCLOCK_TSC_STABLE_BIT is set?

The *point* in the kvmclock is that the hypervisor can mess with the
epoch/scaling to try to compensate for TSC brokenness as the host
scales/sleeps/etc.

And the *problem* with the kvmclock is that it does just that, even
when the host TSC hasn't done anything wrong and the kvmclock shouldn't
have changed at all.

If the PVCLOCK_TSC_STABLE_BIT is set, a guest should just use the guest
TSC directly without looking to the kvmclock for adjusting it.

No?