Re: [PATCH v7 08/12] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

From: Wei Wang <wei.w.wang@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	pbonzini@redhat.com, ak@linux.intel.com, kan.liang@intel.com,
	mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com,
	jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com
Subject: Re: [PATCH v7 08/12] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack
Date: Tue, 09 Jul 2019 19:34:26 +0800
Message-ID: <5D247BC2.70104@intel.com>
In-Reply-To: <20190709093917.GS3402@hirez.programming.kicks-ass.net>

On 07/09/2019 05:39 PM, Peter Zijlstra wrote:
> On Tue, Jul 09, 2019 at 11:04:21AM +0800, Wei Wang wrote:
>> On 07/08/2019 10:48 PM, Peter Zijlstra wrote:
>>> *WHY* does the host need to save/restore? Why not make VMENTER/VMEXIT do
>>> this?
>> Because the VMX transition is much more frequent than the vCPU switching.
>> On SKL, saving 32 LBR entries could add 3000~4000 cycles overhead, this
>> would be too large for the frequent VMX transitions.
>>
>> LBR state is saved when vCPU is scheduled out to ensure that this
>> vCPU's LBR data doesn't get lost (as another vCPU or host thread that
>> is scheduled in may use LBR)
> But VMENTER/VMEXIT still have to enable/disable the LBR, right?
> Otherwise the host will pollute LBR contents. And you then rely on this
> 'fake' event to ensure the host doesn't use LBR when the VCPU is
> running.

Yes, only the debugctl MSR is saved/restored on VMX transitions.
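
To illustrate the split, here is a rough user-space model (not code from
the patch; all type and function names are hypothetical). It only counts
register accesses as a crude cost proxy: a VMX transition switches the
debugctl value alone, while the full LBR stack is copied only when the
vCPU is scheduled out or in.

```c
#include <stdint.h>
#include <string.h>

#define LBR_NR 32	/* number of LBR entries on SKL, as quoted above */

/* Hypothetical model types, not the structures used by the patch. */
struct lbr_entry { uint64_t from, to, info; };

struct cpu_model {
	uint64_t debugctl;		/* stands in for the debugctl MSR */
	struct lbr_entry lbr[LBR_NR];	/* stands in for the LBR stack MSRs */
};

struct vcpu_model {
	uint64_t guest_debugctl;
	uint64_t host_debugctl;
	struct lbr_entry saved_lbr[LBR_NR];
	unsigned long msr_accesses;	/* rough cost proxy, not cycles */
};

/* VMX transition into the guest: only the debugctl value is switched. */
void model_vmenter(struct cpu_model *cpu, struct vcpu_model *v)
{
	v->host_debugctl = cpu->debugctl;
	cpu->debugctl = v->guest_debugctl;
	v->msr_accesses += 2;
}

/* VMX transition back to the host: again only debugctl is switched. */
void model_vmexit(struct cpu_model *cpu, struct vcpu_model *v)
{
	v->guest_debugctl = cpu->debugctl;
	cpu->debugctl = v->host_debugctl;
	v->msr_accesses += 2;
}

/* vCPU scheduled out: the whole LBR stack is saved. */
void model_sched_out(struct cpu_model *cpu, struct vcpu_model *v)
{
	memcpy(v->saved_lbr, cpu->lbr, sizeof(cpu->lbr));
	v->msr_accesses += 3 * LBR_NR;	/* from/to/info per entry */
}

/* vCPU scheduled in: the saved LBR stack is restored. */
void model_sched_in(struct cpu_model *cpu, struct vcpu_model *v)
{
	memcpy(cpu->lbr, v->saved_lbr, sizeof(cpu->lbr));
	v->msr_accesses += 3 * LBR_NR;
}
```

In this model a VMX transition costs 2 accesses and a vCPU switch costs
96, which is why the expensive part is kept off the frequent path.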

>
> But what about the counter scheduling rules;

The counter is emulated independently of the LBR emulation.

Here is the background:

The direction we are going in is architectural emulation, where features
are emulated based on the hardware behavior described in the spec. So the
LBR emulation path only offers the LBR feature to the guest (no counter is
associated with it, as the LBR feature essentially doesn't have a counter).

If the above isn't clear, consider this example: the guest could run any
software that uses the LBR feature (non-perf or non-Linux, or even a test
kernel module that tries LBR for its own purposes), and it could choose to
use a regular timer to do the sampling. If the LBR emulation took a counter
to generate a PMI to the guest for sampling, that PMI would be unexpected
from the guest's perspective.

So counter scheduling isn't considered by the LBR emulation here; it is
considered by the counter emulation. If the guest needs a counter, it
configures the related MSR, which traps to KVM, and the counter emulation
has its own emulation path (e.g. reprogram_gp_counter, which is called when
the guest writes to the emulated eventsel MSR).
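
As a rough sketch of the two independent paths (a standalone model, not
the KVM code; model_wrmsr(), struct pmu_model and the counter count of 4
are assumptions made up for illustration), a trapped MSR write could be
dispatched like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EVENTSEL0	0x186u		/* first eventsel MSR (IA32_PERFEVTSEL0) */
#define MSR_DEBUGCTL	0x1d9u		/* debugctl MSR (IA32_DEBUGCTL) */
#define DEBUGCTL_LBR	(1ull << 0)	/* LBR enable bit in debugctl */
#define NR_GP_COUNTERS	4		/* assumption for this model */

enum emu_path { EMU_UNHANDLED, EMU_COUNTER, EMU_LBR };

/* Hypothetical per-vCPU PMU state for the model. */
struct pmu_model {
	uint64_t eventsel[NR_GP_COUNTERS];
	unsigned int counter_reprograms; /* times the counter path ran */
	bool lbr_enabled;
};

/* Model of a guest MSR write that has trapped to the hypervisor. */
enum emu_path model_wrmsr(struct pmu_model *pmu, uint32_t msr, uint64_t val)
{
	if (msr >= MSR_EVENTSEL0 && msr < MSR_EVENTSEL0 + NR_GP_COUNTERS) {
		/* Counter emulation path: stands in for reprogram_gp_counter. */
		pmu->eventsel[msr - MSR_EVENTSEL0] = val;
		pmu->counter_reprograms++;
		return EMU_COUNTER;
	}
	if (msr == MSR_DEBUGCTL) {
		/* LBR emulation path: no counter is touched here. */
		pmu->lbr_enabled = (val & DEBUGCTL_LBR) != 0;
		return EMU_LBR;
	}
	return EMU_UNHANDLED;
}
```

Enabling LBR through debugctl leaves counter_reprograms untouched, and an
eventsel write leaves lbr_enabled untouched, which is the independence
described above.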

> what happens when a CPU
> event claims the LBR before the task event can claim it? CPU events have
> precedence over task events.

I think the precedence (CPU pinned and task pinned) is for counter
multiplexing, right?

For the LBR feature, could we think of it as first come, first served?
For example, if two host threads want to use LBR at the same time,
I think one of them would simply fail to get it.

So if the guest gets the LBR first, the host wouldn't take it over unless
some userspace command (which we added to QEMU) is executed to make the
vCPU actively stop using the LBR.
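
A minimal sketch of that first come, first served idea, as a standalone
model (the names lbr_claim() and lbr_release() are hypothetical; this only
illustrates the policy proposed here, not how perf arbitrates LBR today):

```c
#include <stdbool.h>

enum lbr_owner { LBR_FREE, LBR_HOST, LBR_GUEST };

/* Hypothetical per-CPU record of who currently holds the LBR facility. */
struct lbr_resource {
	enum lbr_owner owner;
};

/*
 * First come, first served: the claim succeeds only if the LBR is free
 * or already held by the same claimant. Nobody is preempted.
 */
bool lbr_claim(struct lbr_resource *res, enum lbr_owner who)
{
	if (res->owner == LBR_FREE || res->owner == who) {
		res->owner = who;
		return true;
	}
	return false;
}

/*
 * Only the current owner can give the LBR up, e.g. the guest after the
 * userspace command asks the vCPU to stop using it.
 */
void lbr_release(struct lbr_resource *res, enum lbr_owner who)
{
	if (res->owner == who)
		res->owner = LBR_FREE;
}
```

With the guest claiming first, a later host claim fails until the guest
releases the LBR.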

>
> I'm missing all these details in the Changelogs. Please describe the
> whole setup and explain why this approach.

OK, I've just shared some important background above.
I'll check whether any other important details are missing.

Best,
Wei