KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, vkuznets@redhat.com
Subject: Re: [PATCH 42/43] KVM: VMX: Leave preemption timer running when it's disabled
Date: Fri, 14 Jun 2019 09:34:16 -0700
Message-ID: <20190614163416.GH12191@linux.intel.com> (raw)
In-Reply-To: <1560445409-17363-43-git-send-email-pbonzini@redhat.com>

On Thu, Jun 13, 2019 at 07:03:28PM +0200, Paolo Bonzini wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> VMWRITEs to the major VMCS controls, pin controls included, are
> deceptively expensive.  CPUs with VMCS caching (Westmere and later) also
> optimize away consistency checks on VM-Entry, i.e. skip consistency
> checks if the relevant fields have not changed since the last successful
> VM-Entry (of the cached VMCS).  Because uops are a precious commodity,
> uCode's dirty VMCS field tracking isn't as precise as software would
> prefer.  Notably, writing any of the major VMCS fields effectively marks
> the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
> consistency checks, which consumes several hundred cycles.
> 
> As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
> doubles the latency of the next VM-Entry (and again when/if the flag is
> toggled back).  In a non-nested scenario, running a "standard" guest
> with the preemption timer enabled, toggling the timer flag is uncommon
> but not rare, e.g. roughly 1 in 10 entries.  Disabling the preemption
> timer can change these numbers due to its use for "immediate exits",
> even when explicitly disabled by userspace.
> 
> Nested virtualization in particular is painful, as the timer flag is set
> for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
> pin controls to *clear* the flag since its the timer's final state isn't
> known until vmx_vcpu_run().  I.e. the majority of nested VM-Enters end
> up unnecessarily writing pin controls *twice*.
> 
> Rather than toggle the timer flag in pin controls, set the timer value
> itself to the largest allowed value to put it into a "soft disabled"
> state, and ignore any spurious preemption timer exits.
> 
> Sadly, the timer is a 32-bit value and so theoretically it can fire
> before the head death of the universe, i.e. spurious exits are possible.

s/head/heat

> But because KVM does *not* save the timer value on VM-Exit and because
> the timer runs at a slower rate than the TSC, the maximuma timer value

s/maximuma/maximum

> is still sufficiently large for KVM's purposes.  E.g. on a modern CPU
> with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate
> TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
> execution.  In other words, spurious VM-Exits are effectively only
> possible if the *host* is tickless on the logical CPU, the guest is
> not using the preemption timer, and the guest is not generating VM-Exits
> for *any* other reason.
> 
> To be safe from bad/weird hardware, disable the preemption timer if its
> maximum delay is less than ten seconds.  Ten seconds is mostly arbitrary
> and was selected in no small part because it's a nice round number.
> For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
> if the preemption timer is disabled by KVM or userspace.  Previously
> KVM continued to use the preemption timer to force immediate exits even
> when the timer was disabled by userspace.  Now that KVM leaves the timer
> running instead of truly disabling it, allow userspace to kill it
> entirely in the unlikely event the timer (or KVM) malfunctions.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

  reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-13 17:02 [PATCH 00/43] VMX optimizations Paolo Bonzini
2019-06-13 17:02 ` [PATCH 01/43] KVM: VMX: Fix handling of #MC that occurs during VM-Entry Paolo Bonzini
2019-06-13 17:24   ` Jim Mattson
2019-06-13 17:02 ` [PATCH 02/43] kvm: nVMX: small cleanup in handle_exception Paolo Bonzini
2019-06-13 17:02 ` [PATCH 03/43] KVM: VMX: Read cached VM-Exit reason to detect external interrupt Paolo Bonzini
2019-06-13 17:02 ` [PATCH 04/43] KVM: VMX: Store the host kernel's IDT base in a global variable Paolo Bonzini
2019-06-13 17:02 ` [PATCH 05/43] KVM: x86: Move kvm_{before,after}_interrupt() calls to vendor code Paolo Bonzini
2019-06-13 17:02 ` [PATCH 06/43] KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn Paolo Bonzini
2019-06-13 17:02 ` [PATCH 07/43] KVM: nVMX: Intercept VMWRITEs to read-only shadow VMCS fields Paolo Bonzini
2019-06-13 17:02 ` [PATCH 08/43] KVM: nVMX: Intercept VMWRITEs to GUEST_{CS,SS}_AR_BYTES Paolo Bonzini
2019-06-13 17:02 ` [PATCH 09/43] KVM: nVMX: Track vmcs12 offsets for shadowed VMCS fields Paolo Bonzini
2019-06-13 17:02 ` [PATCH 10/43] KVM: nVMX: Lift sync_vmcs12() out of prepare_vmcs12() Paolo Bonzini
2019-06-13 17:02 ` [PATCH 11/43] KVM: nVMX: Use descriptive names for VMCS sync functions and flags Paolo Bonzini
2019-06-13 17:02 ` [PATCH 12/43] KVM: nVMX: Add helpers to identify shadowed VMCS fields Paolo Bonzini
2019-06-14 16:10   ` Sean Christopherson
2019-06-13 17:02 ` [PATCH 13/43] KVM: nVMX: Sync rarely accessed guest fields only when needed Paolo Bonzini
2019-06-13 17:03 ` [PATCH 14/43] KVM: nVMX: Rename prepare_vmcs02_*_full to prepare_vmcs02_*_rare Paolo Bonzini
2019-06-13 17:03 ` [PATCH 15/43] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value Paolo Bonzini
2019-06-13 17:03 ` [PATCH 16/43] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01 Paolo Bonzini
     [not found]   ` <20190615221602.93C5721851@mail.kernel.org>
2019-06-15 22:40     ` Liran Alon
2019-06-13 17:03 ` [PATCH 17/43] KVM: nVMX: Write ENCLS-exiting bitmap once per vmcs02 Paolo Bonzini
2019-06-13 17:03 ` [PATCH 18/43] KVM: nVMX: Don't rewrite GUEST_PML_INDEX during nested VM-Entry Paolo Bonzini
2019-06-13 17:03 ` [PATCH 19/43] KVM: VMX: simplify vmx_prepare_switch_to_{guest,host} Paolo Bonzini
2019-06-13 17:03 ` [PATCH 20/43] KVM: nVMX: Don't "put" vCPU or host state when switching VMCS Paolo Bonzini
2019-06-13 17:03 ` [PATCH 21/43] KVM: nVMX: Don't reread VMCS-agnostic " Paolo Bonzini
2019-06-14 16:25   ` Sean Christopherson
2019-06-13 17:03 ` [PATCH 22/43] KVM: nVMX: Don't dump VMCS if virtual APIC page can't be mapped Paolo Bonzini
2019-06-17 19:17   ` Radim Krčmář
2019-06-17 20:07     ` Sean Christopherson
2019-06-18  9:43       ` Paolo Bonzini
2019-06-13 17:03 ` [PATCH 23/43] KVM: nVMX: Don't speculatively write virtual-APIC page address Paolo Bonzini
2019-06-13 17:03 ` [PATCH 24/43] KVM: nVMX: Don't speculatively write APIC-access " Paolo Bonzini
2019-06-13 17:03 ` [PATCH 25/43] KVM: nVMX: Update vmcs12 for MSR_IA32_CR_PAT when it's written Paolo Bonzini
2019-06-13 17:03 ` [PATCH 26/43] KVM: nVMX: Update vmcs12 for SYSENTER MSRs when they're written Paolo Bonzini
2019-06-13 17:03 ` [PATCH 27/43] KVM: nVMX: Update vmcs12 for MSR_IA32_DEBUGCTLMSR when it's written Paolo Bonzini
2019-06-13 17:03 ` [PATCH 28/43] KVM: nVMX: Don't update GUEST_BNDCFGS if it's clean in HV eVMCS Paolo Bonzini
2019-06-13 17:03 ` [PATCH 29/43] KVM: x86: introduce is_pae_paging Paolo Bonzini
2019-06-13 17:03 ` [PATCH 30/43] KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary Paolo Bonzini
2019-06-13 17:03 ` [PATCH 31/43] KVM: nVMX: Use adjusted pin controls for vmcs02 Paolo Bonzini
2019-06-13 17:03 ` [PATCH 32/43] KVM: VMX: Add builder macros for shadowing controls Paolo Bonzini
2019-06-13 17:03 ` [PATCH 33/43] KVM: VMX: Shadow VMCS pin controls Paolo Bonzini
2019-06-13 17:03 ` [PATCH 34/43] KVM: VMX: Shadow VMCS primary execution controls Paolo Bonzini
2019-06-13 17:03 ` [PATCH 35/43] KVM: VMX: Shadow VMCS secondary " Paolo Bonzini
2019-06-13 17:03 ` [PATCH 36/43] KVM: nVMX: Shadow VMCS controls on a per-VMCS basis Paolo Bonzini
2019-06-13 17:03 ` [PATCH 37/43] KVM: nVMX: Don't reset VMCS controls shadow on VMCS switch Paolo Bonzini
2019-06-13 17:03 ` [PATCH 38/43] KVM: VMX: Explicitly initialize controls shadow at VMCS allocation Paolo Bonzini
2019-06-13 17:03 ` [PATCH 39/43] KVM: nVMX: Preserve last USE_MSR_BITMAPS when preparing vmcs02 Paolo Bonzini
2019-06-13 17:03 ` [PATCH 40/43] KVM: nVMX: Preset *DT exiting in vmcs02 when emulating UMIP Paolo Bonzini
2019-06-13 17:03 ` [PATCH 41/43] KVM: VMX: Drop hv_timer_armed from 'struct loaded_vmcs' Paolo Bonzini
2019-06-13 17:03 ` [PATCH 42/43] KVM: VMX: Leave preemption timer running when it's disabled Paolo Bonzini
2019-06-14 16:34   ` Sean Christopherson [this message]
2019-06-13 17:03 ` [PATCH 43/43] KVM: nVMX: shadow pin based execution controls Paolo Bonzini
2019-06-14 16:34   ` Sean Christopherson

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190614163416.GH12191@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox