Re: [PATCH qemu-web] Add a blog post on "Micro-Optimizing KVM VM-Exits"

From: "Alex Bennée" <alex.bennee@linaro.org>
To: qemu-devel@nongnu.org
Cc: aarcange@redhat.com, Kashyap Chamarthy <kchamart@redhat.com>,
	dgilbert@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
	vkuznets@redhat.com
Subject: Re: [PATCH qemu-web] Add a blog post on "Micro-Optimizing KVM VM-Exits"
Date: Fri, 15 Nov 2019 12:25:56 +0000
Message-ID: <87zhgx5nsb.fsf@linaro.org>
In-Reply-To: <f8dce546-ea28-0619-a20a-62c762f99721@redhat.com>

Thomas Huth <thuth@redhat.com> writes:

> On 08/11/2019 10.22, Kashyap Chamarthy wrote:
>> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
>> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
>>
>
>  Hi Kashyap,
>
> first thanks for writing up this article! It's a really nice summary of
> the presentation, I think.
>
> But before we include it, let me ask a meta-question: Is an article
> about the KVM *kernel* code suitable for the *QEMU* blog? Or is there
> maybe a better place for this, like an article on www.linux-kvm.org ?
>
> Opinions? Ideas?

I don't think it is a particular problem hosting it on the QEMU blog
given the closeness of the two projects. It would get syndicated to
planet.libvirt as well ;-)

>
>  Thomas
>
>
>> ---
>>  ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++
>>  1 file changed, 115 insertions(+)
>>  create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
>>
>> diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976
>> --- /dev/null
>> +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
>> @@ -0,0 +1,115 @@
>> +---
>> +layout: post
>> +title: "Micro-Optimizing KVM VM-Exits"
>> +date:   2019-11-08
>> +categories: [kvm, optimization]
>> +---
>> +
>> +Background on VM-Exits
>> +----------------------
>> +
>> +KVM (Kernel-based Virtual Machine) is the Linux kernel module that
>> +allows a host to run virtualized guests (Linux, Windows, etc).  The KVM
>> +"guest execution loop", with QEMU (the open source emulator and
>> +virtualizer) as its user space, is roughly as follows: QEMU issues the
>> +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode"
>> +-- a special processor mode which allows guest code to safely run
>> +directly on the physical CPU.  The guest code, which is inside a "jail"
>> +and thus cannot interfere with the rest of the system, keeps running on
>> +the hardware until it encounters a request it cannot handle.  Then the
>> +processor gives the control back (referred to as "VM-Exit") either to
>> +kernel space, or to the user space to handle the request.  Once the
>> +request is handled, native execution of guest code on the processor
>> +resumes again.  And the loop goes on.
>> +
>> +There are dozens of reasons for VM-Exits (Intel's Software Developer
>> +Manual outlines 64 "Basic Exit Reasons").  For example, when a guest
>> +executes the CPUID instruction, it causes a "light-weight exit"
>> +to kernel space, because CPUID (among a few others) is emulated in the
>> +kernel itself, for performance reasons.  But when the kernel _cannot_
>> +handle a request, e.g. to emulate certain hardware, it results in a
>> +"heavy-weight exit" to QEMU, to perform the emulation.  These VM-Exits
>> +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can
>> +be expensive.  What can be done about it?
>> +
>> +Guest workloads that are hard to virtualize
>> +-------------------------------------------
>> +
>> +At the 2019 edition of the KVM Forum in Lyon, kernel developer Andrea
>> +Arcangeli set out to address the kernel part of minimizing VM-Exits.
>> +
>> +His talk touched on the cost of VM-Exits into the kernel, especially for
>> +guest workloads (e.g. enterprise databases) that are sensitive to the
>> +performance penalty of VM-Exits, yet cannot avoid triggering them at a
>> +high frequency.  Andrea then outlined some of the
>> +optimizations he's been working on to improve the VM-Exit performance in
>> +the KVM code path -- especially in light of applying mitigations for
>> +speculative execution flaws (Spectre v2, MDS, L1TF).
>> +
>> +Andrea gave a brief recap of the mitigations for the different kinds of
>> +speculative execution attacks (retpolines, IBPB, PTI, SSBD, etc).  He then
>> +outlined the performance impact of Spectre v2 mitigations in the context
>> +of KVM.
>> +
>> +The microbenchmark: CPUID in a one-million-iteration loop
>> +---------------------------------------------------------
>> +
>> +The synthetic microbenchmark (that is, one focused on measuring the
>> +performance of a specific area of code) Andrea used was to run the CPUID
>> +instruction one million times, without any GCC optimizations or caching.
>> +This was done to test the latency of VM-Exits.
>> +
>> +While stressing that the results of these microbenchmarks do not
>> +represent real-world workloads, he had two goals in mind with it: (a)
>> +to explain how the software mitigation works; and (b) to justify to the
>> +broader community the value of the software optimizations he's working
>> +on in KVM.
>> +
>> +Andrea then reasoned through several interesting graphs that show how
>> +CPU computation time gets impacted when you disable or enable the
>> +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.
>> +
>> +The proposal: "KVM Monolithic"
>> +------------------------------
>> +
>> +Based on his investigation, Andrea proposed a patch series, ["KVM
>> +monolithic"](https://lwn.net/Articles/800870/), to get rid of the KVM
>> +common module, 'kvm.ko'.  Instead, the KVM common code gets linked twice:
>> +once into each of the vendor-specific KVM modules, 'kvm-intel.ko' and
>> +'kvm-amd.ko'.
>> +
>> +The reason for doing this is that the 'kvm.ko' module makes indirect calls
>> +(via the "retpoline" technique) into the vendor-specific KVM modules
>> +several times at every VM-Exit.  These indirect calls were not optimal
>> +even before, but the "retpoline" mitigation for Spectre v2 (which isolates
>> +indirect branches, that allow a CPU to execute code from arbitrary
>> +locations, from speculative execution) compounds the problem, as it
>> +degrades performance.
>> +
>> +This approach will result in a few MiB of increased disk space for
>> +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls,
>> +and the elimination of "retpoline" overhead at run-time more than
>> +compensate for it.
>> +
>> +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks
>> +show a double-digit improvement in performance with default mitigations
>> +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'.  And
>> +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES
>> +"KVM monolithic" still improve[s] performance, albeit only on the order
>> +of 1%.
>> +
>> +Conclusion
>> +----------
>> +
>> +Removal of the common KVM module has a non-negligible positive
>> +performance impact.  And the "KVM Monolithic" patch series is still
>> +actively being reviewed, modulo some pending clean-ups.  Based on the
>> +upstream review discussion, KVM maintainer Paolo Bonzini and other
>> +reviewers seemed amenable to merging the series.
>> +
>> +Although we will still have to deal with mitigations for 'indirect branch
>> +prediction' for a long time, reducing the VM-Exit latency is important
>> +in general; and more specifically, for guest workloads that happen to
>> +trigger frequent VM-Exits, without having to disable Spectre v2
>> +mitigations on the host, as Andrea stated in the cover letter of his
>> +patch series.
>>
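
For readers skimming the thread, the "guest execution loop" from the
background section of the quoted post can be pictured with a toy model.
This is only a sketch, not KVM code: the exit-reason labels, the
HANDLED_IN_KERNEL set and the run_guest() helper are all made up for
illustration. The only fact taken from the post is that CPUID is handled
in the kernel (a "light-weight exit") while device emulation goes out to
QEMU (a "heavy-weight exit").

```python
from collections import Counter

# Assumption for this toy model: CPUID is the only exit reason handled
# in the kernel. The post says "CPUID (among a few others)".
HANDLED_IN_KERNEL = {"CPUID"}


def run_guest(exit_trace):
    """Classify each VM-Exit in a trace the way the post describes.

    An exit the "kernel" can handle is light-weight: the guest is
    re-entered straight away. Anything else is heavy-weight: control
    goes back to "user space" (QEMU in the post) before re-entry.
    """
    counts = Counter()
    for reason in exit_trace:
        if reason in HANDLED_IN_KERNEL:
            counts["light-weight"] += 1
        else:
            counts["heavy-weight"] += 1
    return counts


# Hypothetical trace: mostly CPUID, with two device accesses mixed in.
# "DEVICE_IO" is a placeholder label, not a real KVM exit reason.
trace = ["CPUID"] * 5 + ["DEVICE_IO"] + ["CPUID"] * 3 + ["DEVICE_IO"]
result = run_guest(trace)
print(dict(result))
```

The model has no notion of cost. It only shows that every exit, even a
light-weight one, takes a trip through the kernel, and that trip is the
path the quoted patch series tries to shorten.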

--
Alex Bennée