kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Subject: Re: [PATCH 12/14] KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers
Date: Wed, 16 Oct 2019 09:07:39 +0200	[thread overview]
Message-ID: <27cc0d6b-6bd7-fcaf-10b4-37bb566871f8@redhat.com> (raw)
In-Reply-To: <20191015234229.GC6487@redhat.com>

On 16/10/19 01:42, Andrea Arcangeli wrote:
> On Wed, Oct 16, 2019 at 12:22:31AM +0200, Paolo Bonzini wrote:
>> Oh come on.  0.9 is not 12-years old.  virtio 1.0 is 3.5 years old
>> (March 2016).  Anything older than 2017 is going to use 0.9.
> 
> Sorry if I got the date wrong, but still I don't see the point in
> optimizing for legacy virtio. I can't justify forcing everyone to
> execute that additional branch for inb/outb, in the attempt to make
> legacy virtio faster that nobody should use in combination with
> bleeding edge KVM in the host.

Yet you would add CPUID to the list even though it is not even there in
your benchmarks, and is *never* invoked in a hot path by *any* sane
program? Some OSes have never gotten virtio 1.0 drivers.  OpenBSD only
got it earlier this year.

>> Your tables give:
>>
>> 	Samples	  Samples%  Time%     Min Time  Max time       Avg time
>> HLT     101128    75.33%    99.66%    0.43us    901000.66us    310.88us
>> HLT     118474    19.11%    95.88%    0.33us    707693.05us    43.56us
>>
>> If "avg time" means the average time to serve an HLT vmexit, I don't
>> understand how you can have an average time of 0.3ms (1/3000th of a
>> second) and 100000 samples per second.  Can you explain that to me?
> 
> I described it wrong, the bpftrace record was a sleep 5, not a sleep
> 1. The pipe loop was sure a sleep 1.

It still doesn't add up.  0.3ms / 5 is 1/15000th of a second; 43us is
1/25000th of a second.  Do you have multiple vCPU perhaps?

> The issue is that in production you get a flood more of those with
> hundred of CPUs, so the exact number doesn't move the needle.
> This just needs to be frequent enough that the branch cost pay itself off,
> but the sure thing is that HLT vmexit will not go away unless you execute
> mwait in guest mode by isolating the CPU in the host.

The number of vmexits doesn't count (for HLT).  What counts is how long
they take to be serviced, and as long as it's 1us or more the
optimization is pointless.

Consider these pictures

         w/o optimization                   with optimization
         ----------------------             -------------------------
0us      vmexit                             vmexit
500ns    retpoline                          call vmexit handler directly
600ns    retpoline                          kvm_vcpu_check_block()
700ns    retpoline                          kvm_vcpu_check_block()
800ns    kvm_vcpu_check_block()             kvm_vcpu_check_block()
900ns    kvm_vcpu_check_block()             kvm_vcpu_check_block()
...
39900ns  kvm_vcpu_check_block()             kvm_vcpu_check_block()

                            <interrupt arrives>

40000ns  kvm_vcpu_check_block()             kvm_vcpu_check_block()


Unless the interrupt arrives exactly in the few nanoseconds that it
takes to execute the retpoline, a direct handling of HLT vmexits makes
*absolutely no difference*.

>> Again: what is the real workload that does thousands of CPUIDs per second?
> 
> None, but there are always background CPUID vmexits while there are
> never inb/outb vmexits.
> 
> So the cpuid retpoline removal has a slight chance to pay for the cost
> of the branch, the inb/outb retpoline removal cannot pay off the cost
> of the branch.

Please stop considering only the exact configuration of your benchmarks.
 There are known, valid configurations where outb is a very hot vmexit.

Thanks,

Paolo

  reply	other threads:[~2019-10-16  7:07 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-28 17:23 [PATCH 00/14] KVM monolithic v2 Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 01/14] KVM: monolithic: x86: remove kvm.ko Andrea Arcangeli
2019-10-15  1:31   ` Sean Christopherson
2019-10-15  3:18     ` Sean Christopherson
2019-10-15  8:32       ` Paolo Bonzini
2019-09-28 17:23 ` [PATCH 02/14] KVM: monolithic: x86: disable linking vmx and svm at the same time into the kernel Andrea Arcangeli
2019-10-15  3:16   ` Sean Christopherson
2019-10-15  8:21     ` Paolo Bonzini
2019-10-15 15:23       ` Sean Christopherson
2019-09-28 17:23 ` [PATCH 04/14] KVM: monolithic: x86: handle the request_immediate_exit variation Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 05/14] KVM: monolithic: add more section prefixes in the KVM common code Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 06/14] KVM: monolithic: x86: remove __exit section prefix from machine_unsetup Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 07/14] KVM: monolithic: x86: remove __init section prefix from kvm_x86_cpu_has_kvm_support Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 08/14] KVM: monolithic: x86: remove exports Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 09/14] KVM: monolithic: remove exports from KVM common code Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 10/14] KVM: monolithic: x86: drop the kvm_pmu_ops structure Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 11/14] KVM: x86: optimize more exit handlers in vmx.c Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 12/14] KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers Andrea Arcangeli
2019-10-15  8:28   ` Paolo Bonzini
2019-10-15 16:49     ` Andrea Arcangeli
2019-10-15 19:46       ` Paolo Bonzini
2019-10-15 20:35         ` Andrea Arcangeli
2019-10-15 22:22           ` Paolo Bonzini
2019-10-15 23:42             ` Andrea Arcangeli
2019-10-16  7:07               ` Paolo Bonzini [this message]
2019-10-16 16:50                 ` Andrea Arcangeli
2019-10-16 17:01                   ` Paolo Bonzini
2019-09-28 17:23 ` [PATCH 13/14] KVM: retpolines: x86: eliminate retpoline from svm.c " Andrea Arcangeli
2019-09-28 17:23 ` [PATCH 14/14] x86: retpolines: eliminate retpoline from msr event handlers Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=27cc0d6b-6bd7-fcaf-10b4-37bb566871f8@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).