From: Mingwei Zhang <mizhang@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Xiong Zhang <xiong.y.zhang@linux.intel.com>,
pbonzini@redhat.com, peterz@infradead.org, kan.liang@intel.com,
zhenyuw@linux.intel.com, dapeng1.mi@linux.intel.com,
jmattson@google.com, kvm@vger.kernel.org,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
zhiyuan.lv@intel.com, eranian@google.com, irogers@google.com,
samantha.alt@intel.com, like.xu.linux@gmail.com,
chao.gao@intel.com
Subject: Re: [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM
Date: Thu, 18 Apr 2024 21:52:39 +0000
Message-ID: <ZiGWJ3J1hYBpRjRQ@google.com>
In-Reply-To: <ZiGGiOspm6N-vIta@google.com>
On Thu, Apr 18, 2024, Mingwei Zhang wrote:
> On Thu, Apr 11, 2024, Sean Christopherson wrote:
> > <bikeshed>
> >
> > I think we should call this a mediated PMU, not a passthrough PMU. KVM still
> > emulates the control plane (controls and event selectors), while the data is
> > fully passed through (counters).
> >
> > </bikeshed>
> Sean,
>
> I feel "mediated PMU" is a little bit off the ..., no? In
> KVM, almost all features are mediated. In our specific case, the
> legacy PMU is mediated by KVM and the perf subsystem on the host. In the
> new design, it is mediated by KVM only.
>
> We intercept the control plane in the current design, but the only thing
> we do is event filtering. There is no fancy code to emulate the control
> registers. So, it is still passthrough logic.
>
> In some (rare) business cases, I think we could fully pass through
> the control plane as well, for instance a sole-tenant machine, or a
> full-machine VM + full offload. If there is a CPU erratum, KVM
> can force a VM-exit and dynamically intercept the selectors on all vCPUs
> with filters checked. This is not supported in the current RFC, but may
> be doable in later versions.
>
> With the above, I wonder if we can still use "passthrough PMU" for
> simplicity? But I have no strong opinion if you really want to keep this
> name. I would have to take some time to convince myself.
>
One proposal: maybe "direct vPMU"? I think there are many words that
focus on the "passthrough" side but not on the "interception/mediation"
side?
> Thanks.
> -Mingwei
> >
> > On Fri, Jan 26, 2024, Xiong Zhang wrote:
> >
> > > 1. Host system-wide / QEMU event handling while the VM is running
> > > At VM-entry, all host perf events that use the host x86 PMU are
> > > stopped. Events with attr.exclude_guest = 1 are stopped here and
> > > restarted after VM-exit. Events without attr.exclude_guest = 1 are
> > > put into an error state and cannot recover to the active state even
> > > after the guest stops running. This has a large impact on host perf
> > > and requires host system-wide perf events to have attr.exclude_guest = 1.
> > >
> > > This also requires the QEMU process's perf events to have
> > > attr.exclude_guest = 1.
> > >
> > > While the VM is running, creating a system-wide or QEMU-process perf
> > > event without attr.exclude_guest = 1 fails with -EBUSY.
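
The event handling described in item 1 can be modeled in a few lines of userspace C. This is a simplified sketch for illustration only: the enum, struct, and function names are hypothetical and do not correspond to perf core internals, and "while the VM is running" is reduced to a single flag set between VM-entry and VM-exit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event states; not the perf core's PERF_EVENT_STATE_* values. */
enum ev_state { EV_ACTIVE, EV_STOPPED, EV_ERROR };

struct host_event {
	bool exclude_guest;	/* mirrors perf_event_attr.exclude_guest */
	enum ev_state state;
};

/* Simplification of "the VM is running": set between VM-entry and VM-exit. */
static bool guest_owns_pmu;

/* VM-entry: every active host event loses the PMU hardware. */
void model_vm_enter(struct host_event *evs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (evs[i].state != EV_ACTIVE)
			continue;
		/* exclude_guest=1 events are parked; all others go to the error state. */
		evs[i].state = evs[i].exclude_guest ? EV_STOPPED : EV_ERROR;
	}
	guest_owns_pmu = true;
}

/* VM-exit: only parked events are restarted; EV_ERROR is sticky. */
void model_vm_exit(struct host_event *evs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (evs[i].state == EV_STOPPED)
			evs[i].state = EV_ACTIVE;
	guest_owns_pmu = false;
}

/* Creating a system-wide or QEMU-process event. */
int model_event_create(struct host_event *ev, bool exclude_guest)
{
	if (guest_owns_pmu && !exclude_guest)
		return -EBUSY;
	ev->exclude_guest = exclude_guest;
	/* An exclude_guest event created while the guest owns the PMU starts parked. */
	ev->state = guest_owns_pmu ? EV_STOPPED : EV_ACTIVE;
	return 0;
}
```

The sticky EV_ERROR state is what makes attr.exclude_guest = 1 effectively mandatory for host events in this design.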
> > >
> > > 2. NMI watchdog
> > > The perf event for the NMI watchdog is a system-wide, CPU-pinned event.
> > > It is also stopped while the VM is running, but it doesn't have
> > > attr.exclude_guest = 1, so this RFC adds it. This still means the NMI
> > > watchdog stops functioning while the VM is running.
> > >
> > > Two candidates exist for replacing the NMI watchdog's perf event:
> > > a. The buddy hardlockup detector [3] may not be reliable enough to replace it.
> > > b. The HPET-based hardlockup detector [4] isn't in the upstream kernel.
> >
> > I think the simplest solution is to allow mediated PMU usage if and only if
> > the NMI watchdog is disabled. Then whether or not the host replaces the NMI
> > watchdog with something else becomes an orthogonal discussion, i.e. not KVM's
> > problem to solve.
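
A minimal userspace sketch of the gating rule suggested above. The only kernel interface used is the /proc/sys/kernel/nmi_watchdog sysctl; mediated_pmu_allowed() and the other helpers are hypothetical, and a KVM-side check would have to consult kernel-internal watchdog state instead, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Interpret the contents of /proc/sys/kernel/nmi_watchdog ("0" or "1"). */
bool nmi_watchdog_enabled_from(const char *buf)
{
	return strtol(buf, NULL, 10) != 0;
}

/* Read the live sysctl; an unreadable value is conservatively treated as enabled. */
bool read_nmi_watchdog_enabled(void)
{
	bool enabled = true;
	char buf[16];
	FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");

	if (!f)
		return enabled;
	if (fgets(buf, sizeof(buf), f))
		enabled = nmi_watchdog_enabled_from(buf);
	fclose(f);
	return enabled;
}

/* Hypothetical gate: mediated PMU only if requested and the watchdog is off. */
bool mediated_pmu_allowed(bool requested, bool nmi_watchdog_enabled)
{
	return requested && !nmi_watchdog_enabled;
}
```

Keeping the policy a pure function of two booleans leaves the question of what replaces the watchdog entirely outside of it, which matches the point that this is orthogonal to KVM.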
> >
> > > 3. Dedicated kvm_pmi_vector
> > > In the emulated vPMU, the host PMI handler notifies KVM to inject a
> > > virtual PMI into the guest when a physical PMI belongs to a guest
> > > counter. If the same mechanism were used in the passthrough vPMU and
> > > PMI skid caused a physical PMI belonging to the guest to arrive after
> > > VM-exit, the host PMI handler couldn't tell whether the PMI belongs to
> > > the host or the guest.
> > > So this RFC uses a dedicated kvm_pmi_vector: PMIs belonging to the
> > > guest are delivered only on this vector, while PMIs belonging to the
> > > host still use the NMI vector.
> > >
> > > Setting PMI skid aside (especially on AMD), the host NMI vector could
> > > also be used for guest PMIs; this method is simpler and doesn't
> >
> > I don't see how multiplexing NMIs between guest and host is simpler. At best,
> > the complexity is a wash, just in different locations, and I highly doubt it's
> > a wash. AFAIK, there is no way to precisely know that an NMI came in via the
> > LVTPC.
> >
> > E.g. if an IPI NMI arrives before the host's PMU is loaded, confusion may ensue.
> > SVM has the luxury of running with GIF=0, but that simply isn't an option on VMX.
> >
> > > need the x86 subsystem to reserve a dedicated kvm_pmi_vector, and we
> > > haven't encountered the PMI skid issue on modern Intel processors.
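
To make the trade-off in item 3 concrete, here is a toy model of how a handler would attribute a PMI under each scheme. It assumes, as the RFC describes, that the LVTPC is switched to a dedicated vector while the guest PMU is loaded; the vector value and all function names are hypothetical. The shared-NMI router uses "is the guest PMU loaded?" as its only evidence, which is the kind of inference the reply above says cannot be made precise. Whether the dedicated vector fully covers skid depends on when the LVTPC is switched back, which this model does not capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NMI_VECTOR 2				/* architectural NMI vector */
#define HYPOTHETICAL_KVM_PMI_VECTOR 0xf5	/* illustrative value only */

enum pmi_owner { PMI_HOST, PMI_GUEST, PMI_OTHER };

/*
 * Dedicated vector: guest PMIs arrive on their own vector, so the
 * vector alone identifies the owner.
 */
enum pmi_owner route_dedicated(uint8_t vector)
{
	if (vector == HYPOTHETICAL_KVM_PMI_VECTOR)
		return PMI_GUEST;
	if (vector == NMI_VECTOR)
		return PMI_HOST;	/* host PMI or any other NMI source */
	return PMI_OTHER;
}

/*
 * Shared NMI: all NMIs look alike, so the handler can only guess the
 * owner from state such as whether the guest PMU is currently loaded.
 */
enum pmi_owner route_shared_nmi(bool guest_pmu_loaded)
{
	return guest_pmu_loaded ? PMI_GUEST : PMI_HOST;
}
```

In the shared case both failure modes fall out directly: a skidded guest PMI that lands after the host PMU is reloaded is blamed on the host, and an unrelated NMI (e.g. an IPI) that lands before the host PMU is reloaded is blamed on the guest.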
> > >
> > > 4. Per-VM passthrough mode configuration
> > > The current RFC uses a read-only KVM module parameter,
> > > enable_passthrough_pmu, which selects passthrough or emulated vPMU
> > > mode at KVM module load time.
> > > Do we need the capability to configure passthrough mode per VM?
> > > That would let an admin launch some non-passthrough VMs and profile
> > > them from the host, though the admin still cannot profile all VMs
> > > once a passthrough VM exists. It also means the passthrough vPMU and
> > > the emulated vPMU coexist on one platform, which is challenging to
> > > implement. As described in the commit message of patch 0011, the main
> > > challenge is that the passthrough vPMU and the emulated vPMU support
> > > different vPMU features, which requires two different values for
> > > kvm_caps.supported_perf_cap, and that value is initialized at module
> > > load time. Supporting this needs more refactoring.
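
One possible shape for the refactoring mentioned in item 4, sketched under assumptions: both capability values are computed once at module load and looked up per VM instead of through a single global. All struct and function names are hypothetical, and the capability values used below are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one PERF_CAPABILITIES value per vPMU flavor, both set at load. */
struct pmu_caps_by_mode {
	uint64_t emulated_perf_cap;
	uint64_t passthrough_perf_cap;
};

/* Hypothetical per-VM setting, e.g. chosen at VM creation. */
struct vm_pmu_config {
	bool passthrough;
};

/* Stands in for a read of the single global kvm_caps.supported_perf_cap. */
uint64_t vm_supported_perf_cap(const struct pmu_caps_by_mode *caps,
			       const struct vm_pmu_config *vm)
{
	return vm->passthrough ? caps->passthrough_perf_cap
			       : caps->emulated_perf_cap;
}

/* Host-wide profiling of guests stays possible only while no passthrough VM exists. */
bool host_can_profile_all_vms(unsigned int nr_passthrough_vms)
{
	return nr_passthrough_vms == 0;
}
```

Every existing reader of the global value would need to be converted to a per-VM lookup like this, which is where most of the refactoring effort would likely go.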
> >
> > I have no objection to an all-or-nothing setup. I'd honestly love to rip out the
> > existing vPMU support entirely, but that's probably not realistic, at least not
> > in the near future.
> >
> > > Remaining Work
> > > ===
> > > 1. To reduce passthrough vPMU overhead, optimize the PMU context switch.
> >
> > Before this gets out of its "RFC" phase, I would at least like line of sight to
> > a more optimized switch. I 100% agree that starting with a conservative
> > implementation is the way to go, and the kernel absolutely needs to be able to
> > profile KVM itself (and everything KVM calls into), i.e. _always_ keeping the
> > guest PMU loaded for the entirety of KVM_RUN isn't a viable option.
> >
> > But I also don't want to get into a situation where we can't figure out a clean,
> > robust way to do the optimized context switch without needing (another) massive
> > rewrite.
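
For the optimized context switch, one possible direction (not what the RFC implements, and not something agreed in this thread) is a lazy scheme in the spirit of lazy FPU switching: on VM-exit, only stop guest counting, and save the guest counters only when host perf actually claims the PMU. Because host perf can claim the PMU at any point, KVM itself stays profilable. The sketch models this with a struct standing in for the PMU MSRs; all names are hypothetical, host-side state is omitted, and a real implementation would also have to handle event selectors, fixed counters, overflow status, pending PMIs, and whether leaving guest values resident in hardware is acceptable (e.g. with host RDPMC enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_COUNTERS 4	/* illustrative; real PMUs report their own count */

/* Stand-in for the PMU MSRs; real code would use RDMSR/WRMSR. */
struct pmu_hw {
	uint64_t global_ctrl;
	uint64_t counters[NR_COUNTERS];
};

/* Hypothetical per-vCPU guest PMU context. */
struct guest_pmu {
	uint64_t global_ctrl;
	uint64_t counters[NR_COUNTERS];
	bool resident;		/* guest counter values are still live in hardware */
};

/* Number of full counter save/restore operations performed. */
static unsigned int full_switches;

/* VM-exit: stop guest counting immediately; counters stay resident. */
void lazy_vm_exit(struct pmu_hw *hw, struct guest_pmu *g)
{
	g->global_ctrl = hw->global_ctrl;
	hw->global_ctrl = 0;
}

/* Host perf wants the PMU (e.g. to profile KVM): save and scrub now. */
void host_claims_pmu(struct pmu_hw *hw, struct guest_pmu *g)
{
	if (!g->resident)
		return;
	memcpy(g->counters, hw->counters, sizeof(g->counters));
	memset(hw->counters, 0, sizeof(hw->counters));	/* don't expose guest values */
	g->resident = false;
	full_switches++;
}

/* VM-entry: restore counters only if the host took the PMU in between. */
void lazy_vm_enter(struct pmu_hw *hw, struct guest_pmu *g)
{
	if (!g->resident) {
		memcpy(hw->counters, g->counters, sizeof(hw->counters));
		g->resident = true;
		full_switches++;
	}
	hw->global_ctrl = g->global_ctrl;
}
```

On a fastpath exit where host perf never touches the PMU, the re-entry costs only the GLOBAL_CTRL write; the full save/restore is paid only when the host actually needs the counters.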