From: Thomas Gleixner <tglx@linutronix.de>
To: speck@linutronix.de
Subject: Re: L1D-Fault KVM mitigation
Date: Sun, 27 May 2018 17:42:46 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1805271713230.1585@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20180526204319.GB4486@tassilo.jf.intel.com>

On Sat, 26 May 2018, speck for Andi Kleen wrote:
> > The PIO case _IS_ interesting because it highlights the problem with the
> > synchronization overhead. And it does not matter at all whether you VMEXIT
> > because of a PIO access or due to any other reason. So even if you optimize
> > it then you still have a gazillion of vm_exits on boot. The simple boot
> > tests I did have ~250k vm_exits in 5 seconds and only half of them are PIO.
> 
> Keep in mind that we don't need to synchronize when the other CPU is idle
> in the guest, so it's only a problem when all the CPUs are busy.

No. It does not matter at all what a guest CPU does. The allowed states are:

	CPU0			CPU1

	In host			In host

	In guest		In host forced idle

	In host forced idle	In guest

	In guest		In guest

Whatever the guest mode does is irrelevant.

> That should be the common case for boot.

> > Nevertheless it gave me very interesting insights via tracing the
> > synchronization mechanics. The interesting thing is that halfways
> > synchronous vmexits on both vCPUs are rather cheap. The slightly async ones
> 
> What's an async vmexit? One that blocks?

No, I'm talking about timing. Sorry, I should have said "simultaneous".
Let me rephrase.

If the vmexits of both guest CPUs happen almost at the same time,
i.e. simultaneously, then the overhead is pretty small. That's the case
for the tick, but the tick is pretty much the only event which has that
property.

All other vmexits I see are singular events on one guest CPU. There you
have the choice of busy waiting for the other vCPU to vmexit as well, or
forcing it out via an IPI. The method I use is the IPI, as busy waiting
would be horribly slow for obvious reasons.

Now there are situations which show the following behaviour:

CPU0	  vmexit
	  IPI CPU1
CPU0	  sync_exit
CPU1	  vmexit
CPU1	  sync_exit
CPU1	  sync_enter

CPU0	  do_stuff
CPU0	  sync_enter

CPU0	  vmenter
CPU1	  vmenter

CPU0	  vmexit	immediately after vmenter
	  IPI CPU1
....

and this ping pong goes on 10 times in a row, taking 2+ milliseconds,
while the progress made in the guest is minimal. The reason for this is
interrupts targeted at one of the vCPUs, or operations in one of the
guest threads which cause several exits in a row. And there is nothing
you can do about that; it's completely workload dependent.

So unless you have a fully controlled scenario where the guests almost
never exit, the whole synchronization approach is doomed. But fully
controlled means a 1:1 relationship of physical and virtual CPUs, like
David mentioned. Yes, that setup can benefit, but then we'd rather want
ucode assistance than the whole wait/IPI dance in software.

Thanks,

	tglx

