kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Like Xu <like.xu.linux@gmail.com>
To: Jim Mattson <jmattson@google.com>, Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stephane Eranian <eranian@google.com>,
	David Dunn <daviddunn@google.com>
Subject: KVM: x86: Reconsider the current approach of vPMU
Date: Wed, 9 Feb 2022 16:10:48 +0800	[thread overview]
Message-ID: <2db2ebbe-e552-b974-fc77-870d958465ba@gmail.com> (raw)
In-Reply-To: <CALMp9eRBOmwz=mspp0m5Q093K3rMUeAsF3vEL39MGV5Br9wEQQ@mail.gmail.com>

Changed the subject to attract more attention.

On 3/2/2022 6:35 am, Jim Mattson wrote:
> On Wed, Feb 2, 2022 at 6:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> Urgh... hate on kvm being a module again. We really need something like
>> EXPORT_SYMBOL_KVM() or something.
> 
> Perhaps we should reconsider the current approach of treating the
> guest as a client of the host perf subsystem via kvm as a proxy. There

The story of vPMU begins with the perf_event_create_kernel_counter()
interface which is a generic API in the kernel mode.

> are several drawbacks to the current approach:
> 1) If the guest actually sets the counter mask (and invert counter
> mask) or edge detect in the event selector, we ignore it, because we
> have no way of requesting that from perf.

We need more guest user cases and voices when it comes to vPMU
capabilities on a case-by-case basis (invert counter mask or edge detect).

KVM may set these bits before vm-entry if it does not affect the host.

> 2) If a system-wide pinned counter preempts one of kvm's thread-pinned
> counters, we have no way of letting the guest know, because the
> architectural specification doesn't allow counters to be suspended.

One such case is NMI watchdog. The truth is that KVM can check the status
of the event before vm-entry to know that the back-end counter has been
rescheduled to another perf user, but can't do anything about it.

I had drafted a vPMU notification patch set to synchronize the status of the
back-end counters to the guest, using the PV method with the help of vPMI.

I'm sceptical about this direction and the efficiency of the notification mechanism
I have designed but I do hope that others have better ideas and quality code.

The number of counters is relatively plenty, but it's a pain in the arse for LBR,
and I may post out a slow path with a high performance cost if you're interested in.

> 3) TDX is going to pull the rug out from under us anyway. When the TDX
> module usurps control of the PMU, any active host counters are going
> to stop counting. We are going to need a way of telling the host perf

I presume that performance counters data of TDX guest is isolated for host,
and host counters (from host perf agent) will not stop and keep counting
only for TDX guests in debug mode.

Off-topic, not all of the capabilities of the core-PMU can or should be
used by TDX guests (given that the behavior of firmware for PMU resource
is constantly changing and not even defined).

> subsystem what's happening, or other host perf clients are going to
> get bogus data.

I predict perf core will be patched to sense (via callback, KVM notifies perf,
smart perf_event running time or host stable TSC diff) and report this kind
of data holes from TDX, SGX, AMD-SEV in the report.

> 
> Given what's coming with TDX, I wonder if we should just bite the
> bullet and cede the PMU to the guest while it's running, even for
> non-TDX guests. That would solve (1) and (2) as well.

The idea of "cede the PMU to the guest" or "vPMU pass-through" is not really
new to us, there are two main obstacles: one is the isolation of NMI paths
(including GLOBAL_* MSRs, like STATUS); the other is avoiding guest sniffing
host data which is a perfect method for guest escape attackers.

At one time, we proposed to statically reserve counters from the host
perf view at guest startup, but this option was NAK-ed from PeterZ.

We may have a chance to reserve counters at the host startup
until we have a guest PMU more friendly hardware design. :D

I'd like to add one more vPMU discussion point: support for
heterogeneous VMX cores (run vPMU on the ADL or later).

The current capability of vPMU depends on which physical CPU
the KVM module is initialized on, and the current upstream
solution brings concerns in terms of vCPU compatibility and migration.

Thanks,
Like Xu

  parent reply	other threads:[~2022-02-09  8:11 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-17  8:53 [PATCH kvm/queue v2 0/3] KVM: x86/pmu: Fix out-of-date AMD amd_event_mapping[] Like Xu
2022-01-17  8:53 ` [PATCH kvm/queue v2 1/3] KVM: x86/pmu: Replace pmu->available_event_types with a new BITMAP Like Xu
2022-02-01 12:26   ` Paolo Bonzini
2022-01-17  8:53 ` [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly Like Xu
2022-02-01 12:27   ` Paolo Bonzini
2022-02-02 14:43   ` Peter Zijlstra
2022-02-02 22:35     ` Jim Mattson
2022-02-03 17:33       ` David Dunn
2022-02-09  8:10       ` Like Xu [this message]
2022-02-09 13:33         ` KVM: x86: Reconsider the current approach of vPMU Peter Zijlstra
2022-02-09 21:00           ` Sean Christopherson
2022-02-10 12:08             ` Like Xu
2022-02-10 17:12               ` Sean Christopherson
2022-02-16  3:33                 ` Like Xu
2022-02-16 17:53                   ` Jim Mattson
2022-02-09 13:21       ` [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly Peter Zijlstra
2022-02-09 15:40         ` Dave Hansen
2022-02-09 18:47           ` Jim Mattson
2022-02-09 18:57             ` Dave Hansen
2022-02-09 19:24               ` David Dunn
2022-02-10 13:29                 ` Like Xu
2022-02-10 15:34                 ` Liang, Kan
2022-02-10 16:34                   ` Jim Mattson
2022-02-10 18:30                     ` Liang, Kan
2022-02-10 19:16                       ` Jim Mattson
2022-02-10 19:46                         ` Liang, Kan
2022-02-10 19:55                           ` David Dunn
2022-02-11 14:11                             ` Liang, Kan
2022-02-11 18:08                               ` Jim Mattson
2022-02-11 21:47                                 ` Liang, Kan
2022-02-12 23:31                                   ` Jim Mattson
2022-02-14 21:55                                     ` Liang, Kan
2022-02-14 22:55                                       ` Jim Mattson
2022-02-16  7:36                                         ` Like Xu
2022-02-16 18:10                                           ` Jim Mattson
2022-02-16  7:30                           ` Like Xu
2022-02-16  5:08                   ` Like Xu
2022-02-10 12:55               ` Like Xu
2022-02-12 23:32               ` Jim Mattson
2022-02-08 11:52     ` Like Xu
2022-01-17  8:53 ` [PATCH kvm/queue v2 3/3] KVM: x86/pmu: Setup the {inte|amd}_event_mapping[] when hardware_setup Like Xu
2022-02-01 12:28   ` Paolo Bonzini
2022-02-08 10:10     ` Like Xu
2022-01-26 11:22 ` [PATCH kvm/queue v2 0/3] KVM: x86/pmu: Fix out-of-date AMD amd_event_mapping[] Like Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2db2ebbe-e552-b974-fc77-870d958465ba@gmail.com \
    --to=like.xu.linux@gmail.com \
    --cc=daviddunn@google.com \
    --cc=eranian@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).