From: Like Xu <like.xu.linux@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
    Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
    Vitaly Kuznetsov <vkuznets@redhat.com>,
    Wanpeng Li <wanpengli@tencent.com>,
    Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
    Thomas Gleixner <tglx@linutronix.de>,
    kvm@vger.kernel.org, x86@kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces
Date: Thu, 29 Jul 2021 21:46:05 +0800
Message-ID: <1e4fdcd4-fea3-771a-f437-b0305951ca92@gmail.com>
In-Reply-To: <YQKl7/0I4p0o0TCY@hirez.programming.kicks-ass.net>

On 29/7/2021 8:58 pm, Peter Zijlstra wrote:
> On Wed, Jul 28, 2021 at 08:07:05PM +0800, Like Xu wrote:
>> From: Like Xu <likexu@tencent.com>
>>
>> Based on our observations, after any vm-exit associated with vPMU, there
>> are at least two or more perf interfaces to be called for guest counter
>> emulation, such as perf_event_{pause, read_value, period}(), and each one
>> will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes
>> more severe when guest use counters in a multiplexed manner.
>>
>> Holding a lock once and completing the KVM request operations in the perf
>> context would introduce a set of impractical new interfaces. So we can
>> further optimize the vPMU implementation by avoiding repeated calls to
>> these interfaces in the KVM context for at least one pattern:
>>
>> After we call perf_event_pause() once, the event will be disabled and its
>> internal count will be reset to 0. So there is no need to pause it again
>> or read its value. Once the event is paused, event period will not be
>> updated until the next time it's resumed or reprogrammed. And there is
>> also no need to call perf_event_period twice for a non-running counter,
>> considering the perf_event for a running counter is never paused.
>>
>> Based on this implementation, for the following common usage of
>> sampling 4 events using perf on a 4u8g guest:
>>
>>     echo 0 > /proc/sys/kernel/watchdog
>>     echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
>>     echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
>>     echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
>>     for i in `seq 1 1 10`
>>     do
>>         taskset -c 0 perf record \
>>             -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
>>             /root/br_instr a
>>     done
>>
>> the average latency of the guest NMI handler is reduced from
>> 37646.7 ns to 32929.3 ns (~1.14x speed up) on the Intel ICX server.
>> Also, in addition to collecting more samples, no loss of sampling
>> accuracy was observed compared to before the optimization.
>>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>
> Looks sane I suppose.
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> What kinds of VM-exits are the most common?
>
A typical vm-exit trace is as follows:

    146820 EXTERNAL_INTERRUPT
    126301 MSR_WRITE
     17009 MSR_READ
      9710 RDPMC
      7295 EXCEPTION_NMI
      2493 EPT_VIOLATION
      1357 EPT_MISCONFIG
       567 CPUID
       107 NMI_WINDOW
        59 IO_INSTRUCTION
         2 VMCALL

including the following kvm_msr trace:

     15822 msr_write, MSR_CORE_PERF_GLOBAL_CTRL
     14558 msr_read, MSR_CORE_PERF_GLOBAL_STATUS
      7315 msr_write, IA32_X2APIC_LVT_PMI
      7250 msr_write, MSR_CORE_PERF_GLOBAL_OVF_CTRL
      2922 msr_write, MSR_IA32_PMC0
      2912 msr_write, MSR_CORE_PERF_FIXED_CTR0
      2904 msr_write, MSR_CORE_PERF_FIXED_CTR1
      2390 msr_write, MSR_CORE_PERF_FIXED_CTR_CTRL
      2390 msr_read, MSR_CORE_PERF_FIXED_CTR_CTRL
      1195 msr_write, MSR_P6_EVNTSEL1
      1195 msr_write, MSR_P6_EVNTSEL0
       976 msr_write, MSR_IA32_PMC1
       618 msr_write, IA32_X2APIC_ICR

Because of the large number of MSR accesses, the latency of the guest
PMI handler is still far from that of the host handler.

I have two rough ideas for a third round of optimization that could
drastically reduce the vPMU overhead:

- Add a new paravirtualized PMU guest driver that saves the latest
  values of all MSRs to static physical memory for each vCPU, which
  reduces the number of vm-exits and also makes KVM emulation accesses
  easier;
- Bypass the host perf_event PMI callback injection path and inject the
  guest PMI directly after the EXCEPTION_N/PMI vm-exit (for TDX guests).

Any negative comments or help with additional details are welcome.