On Thu, 2023-10-19 at 08:40 -0700, Sean Christopherson wrote:
> > Normally, it should be up to the hypervisor to tell the guest which
> > clock to use, i.e. if TSC is reliable or not. Let me put my question
> > this way: if TSC on the particular host is good for everything, why
> > does the hypervisor advertises 'kvmclock' to its guests?
>
> I suspect there are two reasons.
>
>   1. As is likely the case in our fleet, no one revisited the set of advertised
>      PV features when defining the VM shapes for a new generation of hardware, or
>      whoever did the reviews wasn't aware that advertising kvmclock is actually
>      suboptimal.  All the PV clock stuff in KVM is quite labyrinthian, so it's
>      not hard to imagine it getting overlooked.
>
>   2. Legacy VMs.  If VMs have been running with a PV clock for years, forcing
>      them to switch to a new clocksource is high-risk, low-reward.

Doubly true for Xen guests (given that the Xen clocksource is identical to
the KVM clocksource).

> > If for some 'historical reasons' we can't revoke features we can always
> > introduce a new PV feature bit saying that TSC is preferred.

Don't we already have one? It's the PVCLOCK_TSC_STABLE_BIT. Why would a
guest ever use kvmclock if the PVCLOCK_TSC_STABLE_BIT is set?

The *point* in the kvmclock is that the hypervisor can mess with the
epoch/scaling to try to compensate for TSC brokenness as the host
scales/sleeps/etc.

And the *problem* with the kvmclock is that it does just that, even when
the host TSC hasn't done anything wrong and the kvmclock shouldn't have
changed at all.

If the PVCLOCK_TSC_STABLE_BIT is set, a guest should just use the guest
TSC directly without looking to the kvmclock for adjusting it.

No?
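For reference, a rough sketch of how a guest turns a raw TSC reading into kvmclock nanoseconds, and where a PVCLOCK_TSC_STABLE_BIT check would short-circuit that path. Assumptions: the struct layout, flag value and pvclock_scale_delta() arithmetic are written to mirror Linux's x86 pvclock ABI as I understand it; sketch_pvclock_read_ns() and sketch_guest_prefers_raw_tsc() are hypothetical names; the TSC value is passed in rather than read with RDTSC; and the version-field retry loop and PVCLOCK_GUEST_STOPPED handling a real guest needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bit mirroring the pvclock ABI: host says the TSC is stable. */
#define PVCLOCK_TSC_STABLE_BIT (1 << 0)

/* Per-vCPU time info shared by the hypervisor (layout mirrors the pvclock ABI). */
struct pvclock_vcpu_time_info {
    uint32_t version;           /* odd while the host is updating (seqlock) */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* guest TSC value when system_time was sampled */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point cycles->ns multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
    uint8_t  flags;
    uint8_t  pad[2];
};

/* Scale a TSC delta to ns: ((delta << shift) * mul_frac) >> 32,
 * computed as two 32x32 products so no 128-bit type is needed. */
static uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    assert(shift > -64 && shift < 64); /* avoid undefined shift counts */
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t lo = (uint64_t)(uint32_t)delta * mul_frac;
    uint64_t hi = (delta >> 32) * mul_frac;
    return hi + (lo >> 32);
}

/* Hypothetical: kvmclock time for a given TSC reading (no seqlock retry). */
static uint64_t sketch_pvclock_read_ns(const struct pvclock_vcpu_time_info *ti,
                                       uint64_t tsc)
{
    uint64_t delta = tsc - ti->tsc_timestamp;
    return ti->system_time +
           pvclock_scale_delta(delta, ti->tsc_to_system_mul, ti->tsc_shift);
}

/* Hypothetical: the policy argued for in the mail; if the host marks the
 * TSC stable, use the guest TSC directly instead of the kvmclock math. */
static bool sketch_guest_prefers_raw_tsc(const struct pvclock_vcpu_time_info *ti)
{
    return (ti->flags & PVCLOCK_TSC_STABLE_BIT) != 0;
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 1 the scaling is 1 cycle per ns (a 1 GHz TSC), so a reading 1000 cycles past tsc_timestamp yields system_time + 1000. The point of the mail is visible in sketch_pvclock_read_ns(): every result depends on system_time, tsc_timestamp and the multiplier the host writes, so any host-side update moves guest time even when the underlying TSC never misbehaved.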